[PoC] Reducing planning time when tables have many partitions
Hello,
I found a problem that planning takes too much time when the tables
have many child partitions. According to my observation, the planning
time increases in the order of O(n^2). Here, n is the number of child
partitions. I attached the patch to solve this problem. Please be
noted that this patch is a PoC.
1. Problem Statement
The problem arises in the next simple query. This query is modeled
after a university's grade system, joining tables about students,
scores, and their GPAs to output academic records for each student.
=====
SELECT students.name, gpas.gpa AS gpa, sum(scores.score) AS total_score
FROM students, scores, gpas
WHERE students.id = scores.student_id AND students.id = gpas.student_id
GROUP BY students.id, gpas.student_id;
=====
Here, since there are so many students enrolled in the university, we
will partition each table. If so, the planning time of the above query
increases very rapidly as the number of partitions increases.
I conducted an experiment by varying the number of partitions of three
tables (students, scores, and gpas) from 2 to 1024. The attached
figure illustrates the result. The blue line annotated with "master"
stands for the result on the master branch. Obviously, its
computational complexity is large.
I attached SQL files to this e-mail as "sample-queries.zip". You can
reproduce my experiment by the next steps:
=====
$ unzip sample-queries.zip
$ cd sample-queries
# Create tables and insert sample data ('n' denotes the number of partitions)
$ psql -f create-table-n.sql
# Measure planning time
$ psql -f query-n.sql
=====
2. Where is Slow?
In order to identify bottlenecks, I ran a performance profiler(perf).
The "perf-master.png" is a call graph of planning of query-1024.sql.
From this figure, it can be seen that "bms_equal" and "bms_is_subset"
take up most of the running time. Most of these functions are called
when enumerating EquivalenceMembers in EquivalenceClass. The
enumerations exist in src/backend/optimizer/path/equivclass.c and have
the following form.
=====
EquivalenceClass *ec = /* given */;
EquivalenceMember *em;
ListCell *lc;
foreach(lc, ec->ec_members)
{
em = (EquivalenceMember *) lfirst(lc);
/* predicate is bms_equal or bms_is_subset, etc */
if (!predicate(em))
continue;
/* The predicate satisfies */
do something...;
}
=====
This foreach loop is a linear search, whose cost will become very high
when there are many EquivalenceMembers in ec_members. This is the case
when the number of partitions is large. Eliminating this heavy linear
search is a key to improving planning performance.
3. How to Solve?
In my patch, I made three different optimizations depending on the
predicate pattern.
3.1 When the predicate is "!em->em_is_child"
In equivclass.c, there are several processes performed when
em_is_child is false. If a table has many partitions, the number of
EquivalenceMembers which are not children is limited. Therefore, it is
useful to keep only the non-child members as a list in advance.
My patch adds the "ec_not_child_members" field to EquivalenceClass.
This field is a List containing non-child members. Taking advantage of
this, the previous loop can be rewritten as follows:
=====
foreach(lc, ec->ec_not_child_members)
{
em = (EquivalenceMember *) lfirst(lc);
Assert(!em->em_is_child);
do something...;
}
=====
3.2 When the predicate is "bms_equal(em->em_relids, relids)"
"bms_equal" is another example of the predicate. In this case,
processes will be done when the "em_relids" matches certain Relids.
This type of loop can be quickly handled by utilizing a hash table.
First, group EquivalenceMembers with the same Relids into a list.
Then, create an associative array whose key is Relids and whose value
is the list. In my patch, I added the "ec_members_htab" field to
EquivalenceClass, which plays a role of an associative array.
Based on this idea, the previous loop is transformed as follows. Here,
the FindEcMembersMatchingRelids function looks up the hash table and
returns the corresponding value, which is a list.
=====
foreach(lc, FindEcMembersMatchingRelids(ec, relids))
{
em = (EquivalenceMember *) lfirst(lc);
Assert(bms_equal(em->em_relids, relids));
do something...;
}
=====
3.3 When the predicate is "bms_is_subset(em->em_relids, relids)"
There are several processings performed on EquivalenceMembers whose
em_relids is a subset of the given "relids". In this case, the
predicate is "bms_is_subset". Optimizing this search is not as easy as
with bms_equal, but the technique above by hash tables can be applied.
There are 2^m subsets if the number of elements of the "relids" is m.
The key here is that m is not so large in most cases. For example, m
is up to 3 in the sample query, meaning that the number of subsets is
at most 2^3=8. Therefore, we can enumerate all subsets within a
realistic time. Looking up the hash table with each subset as a key
will drastically reduce unnecessary searches. My patch's optimization
is based on this notion.
This technique can be illustrated as the next pseudo-code. The code
iterates over all subsets and looks up the corresponding
EquivalenceMembers from the hash table. The actual code is more
complicated for performance reasons.
===
EquivalenceClass *ec = /* given */;
Relids relids = /* given */;
int num_members_in_relids = bms_num_members(relids);
for (int bit = 0; bit < (1 << num_members_in_relids); bit++)
{
EquivalenceMember *em;
ListCell *lc;
Relids subset = construct subset from 'bit';
foreach(lc, FindEcMembersMatchingRelids(ec, subset))
{
em = (EquivalenceMember *) lfirst(lc);
Assert(bms_is_subset(em->em_relids, relids));
do something...;
}
}
===
4. Experimental Result
The red line in the attached figure is the planning time with my
patch. The chart indicates that planning performance has been greatly
improved. The exact values are shown below.
Planning time of "query-n.sql" (n = number of partitions):
n | Master (s) | Patched (s) | Speed up
------------------------------------------
2 | 0.003 | 0.003 | 0.9%
4 | 0.004 | 0.004 | 1.0%
8 | 0.006 | 0.006 | 4.6%
16 | 0.011 | 0.010 | 5.3%
32 | 0.017 | 0.016 | 4.7%
64 | 0.032 | 0.030 | 8.0%
128 | 0.073 | 0.060 | 17.7%
256 | 0.216 | 0.142 | 34.2%
384 | 0.504 | 0.272 | 46.1%
512 | 0.933 | 0.462 | 50.4%
640 | 1.529 | 0.678 | 55.7%
768 | 2.316 | 1.006 | 56.6%
896 | 3.280 | 1.363 | 58.5%
1024 | 4.599 | 1.770 | 61.5%
With 1024 partitions, the planning time was reduced by 61.5%. Besides,
with 128 partitions, which is a realistic use case, the performance
increased by 17.7%.
5. Things to Be Discussed
5.1 Regressions
While my approach is effective for tables with a large number of
partitions, it may cause performance degradation otherwise. For small
cases, it is necessary to switch to a conventional algorithm. However,
its threshold is not self-evident.
5.2 Enumeration order
My patch may change the order in which members are enumerated. This
affects generated plans.
5.3 Code Quality
Source code quality should be improved.
=====
Again, I posted this patch as a PoC. I would appreciate it if you
would discuss the effectiveness of these optimizations with me.
Best regards,
Yuya Watari
Attachments:
v1-reducing-planning-time-when-tables-have-many-partitions.patchapplication/octet-stream; name=v1-reducing-planning-time-when-tables-have-many-partitions.patchDownload
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6bdad462c7..178464615b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2479,6 +2479,8 @@ _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
WRITE_NODE_FIELD(ec_opfamilies);
WRITE_OID_FIELD(ec_collation);
WRITE_NODE_FIELD(ec_members);
+ WRITE_NODE_FIELD(ec_members_htab);
+ WRITE_NODE_FIELD(ec_not_child_members);
WRITE_NODE_FIELD(ec_sources);
WRITE_NODE_FIELD(ec_derives);
WRITE_BITMAPSET_FIELD(ec_relids);
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 8c6770de97..a1c6e48145 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -31,6 +31,144 @@
#include "optimizer/restrictinfo.h"
#include "utils/lsyscache.h"
+/*
+ * Constant, state structs and enum for looping macros below.
+ */
+#define MAX_NUM_SUBSETS_OF_RELIDS (1 << 16)
+
+typedef struct
+{
+ const List *list;
+ int i;
+} EmForeachListState;
+
+typedef enum
+{
+ EM_FOREACH_NEXT_BIT, /* Proceed to next subset */
+ EM_FOREACH_NEXT_LIST_ENTRY, /* Proceed to next entry */
+ EM_FOREACH_LINEAR_SEARCH, /* Standard linear search */
+ EM_FOREACH_FINISHED /* Loop was done */
+} EmForeachBitStateStatus;
+
+typedef struct
+{
+ EmForeachBitStateStatus status;
+ EquivalenceClass *ec;
+ Relids relids;
+ int32 bit;
+ int num_members_in_relids;
+ int all_members_in_relids[MAX_NUM_SUBSETS_OF_RELIDS];
+ Relids subset;
+ List *list;
+ int i;
+} EmForeachBitState;
+
+/*
+ * The following three em_foreach_* macros help enumerate
+ * EquivalenceClass::ec_members.
+ *
+ * See the comments below for further information.
+ */
+
+/*
+ * em_foreach_not_children
+ *
+ * Optimized version of
+ * =====
+ * ListCell *lc;
+ * foreach(lc, ec->ec_members)
+ * {
+ * EquivalenceMember *em = lfirst(lc);
+ *
+ * if (em->em_is_child)
+ * continue;
+ *
+ * content of loop...
+ * }
+ * =====
+ *
+ * Usage:
+ * =====
+ * EquivalenceClass *ec = ...;
+ *
+ * EquivalenceMember *em;
+ * em_foreach_not_children(em, ec)
+ * {
+ * content of loop...
+ * }
+ * =====
+ */
+#define em_foreach_not_children(em, ec) \
+ for (EmForeachListState em##__state = { (ec)->ec_not_child_members, -1 }; \
+ em_foreach_list_move_next(&em##__state, &em); )
+
+/*
+ * em_foreach_relids_equals
+ *
+ * Optimized version of
+ * =====
+ * ListCell *lc;
+ * foreach(lc, ec->ec_members)
+ * {
+ * EquivalenceMember *em = lfirst(lc);
+ *
+ * if (!bms_equal(em->em_relids, relids))
+ * continue;
+ *
+ * content of loop...
+ * }
+ * =====
+ *
+ * Usage:
+ * =====
+ * EquivalenceClass *ec = ...;
+ * Relids relids = ...;
+ *
+ * EquivalenceMember *em;
+ * em_foreach_relids_equals(em, ec)
+ * {
+ * content of loop...
+ * }
+ * =====
+ */
+#define em_foreach_relids_equals(em, ec, relids) \
+ for (EmForeachListState em##__state = \
+ { FindEcMembersMatchingRelids(ec, relids), -1 }; \
+ em_foreach_list_move_next(&em##__state, &em); )
+
+/*
+ * em_foreach_relids_subset
+ *
+ * Optimized version of
+ * =====
+ * ListCell *lc;
+ * foreach(lc, ec->ec_members)
+ * {
+ * EquivalenceMember *em = lfirst(lc);
+ *
+ * if (!bms_is_subset(em->em_relids, relids))
+ * continue;
+ *
+ * content of loop...
+ * }
+ * =====
+ *
+ * Usage:
+ * =====
+ * EquivalenceClass *ec = ...;
+ * Relids relids = ...;
+ *
+ * EmForeachBitState state;
+ * EquivalenceMember *em;
+ * initialize_EmForeachBitState(&state, ec, relids);
+ * em_foreach_relids_subset(em, ec)
+ * {
+ * content of loop...
+ * }
+ * =====
+ */
+#define em_foreach_relids_subset(state, em) \
+ while (em_foreach_bit_move_next(&state, &em))
static EquivalenceMember *add_eq_member(EquivalenceClass *ec,
Expr *expr, Relids relids, Relids nullable_relids,
@@ -69,6 +207,21 @@ static Bitmapset *get_eclass_indexes_for_relids(PlannerInfo *root,
Relids relids);
static Bitmapset *get_common_eclass_indexes(PlannerInfo *root, Relids relids1,
Relids relids2);
+static inline bool em_foreach_list_move_next(
+ EmForeachListState *state, EquivalenceMember **em);
+static inline void initialize_EmForeachBitState(
+ EmForeachBitState *state, EquivalenceClass *ec, Relids relids);
+static inline bool em_foreach_bit_move_next(
+ EmForeachBitState *state, EquivalenceMember **em);
+static void InitializeEcMembersHtab(EquivalenceClass *ec);
+static void AppendEcMember(EquivalenceClass *ec,
+ EquivalenceMember *emem);
+static void ConcatEcMember(EquivalenceClass *ec1,
+ EquivalenceClass *ec2);
+static void DeleteNthEcMember(EquivalenceClass *ec,
+ int index);
+static List *FindEcMembersMatchingRelids(EquivalenceClass *ec,
+ Relids relids);
/*
@@ -364,7 +517,7 @@ process_equivalence(PlannerInfo *root,
* PathKeys. We leave it behind with a link so that the merged EC can
* be found.
*/
- ec1->ec_members = list_concat(ec1->ec_members, ec2->ec_members);
+ ConcatEcMember(ec1, ec2);
ec1->ec_sources = list_concat(ec1->ec_sources, ec2->ec_sources);
ec1->ec_derives = list_concat(ec1->ec_derives, ec2->ec_derives);
ec1->ec_relids = bms_join(ec1->ec_relids, ec2->ec_relids);
@@ -379,6 +532,8 @@ process_equivalence(PlannerInfo *root,
root->eq_classes = list_delete_nth_cell(root->eq_classes, ec2_idx);
/* just to avoid debugging confusion w/ dangling pointers: */
ec2->ec_members = NIL;
+ ec2->ec_members_htab = NULL;
+ ec2->ec_not_child_members = NIL;
ec2->ec_sources = NIL;
ec2->ec_derives = NIL;
ec2->ec_relids = NULL;
@@ -439,6 +594,8 @@ process_equivalence(PlannerInfo *root,
ec->ec_opfamilies = opfamilies;
ec->ec_collation = collation;
ec->ec_members = NIL;
+ ec->ec_members_htab = NULL;
+ ec->ec_not_child_members = NIL;
ec->ec_sources = list_make1(restrictinfo);
ec->ec_derives = NIL;
ec->ec_relids = NULL;
@@ -573,7 +730,7 @@ add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
{
ec->ec_relids = bms_add_members(ec->ec_relids, relids);
}
- ec->ec_members = lappend(ec->ec_members, em);
+ AppendEcMember(ec, em);
return em;
}
@@ -645,7 +802,7 @@ get_eclass_for_sort_expr(PlannerInfo *root,
foreach(lc1, root->eq_classes)
{
EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
- ListCell *lc2;
+ EquivalenceMember *cur_em = NULL;
/*
* Never match to a volatile EC, except when we are looking at another
@@ -660,17 +817,31 @@ get_eclass_for_sort_expr(PlannerInfo *root,
if (!equal(opfamilies, cur_ec->ec_opfamilies))
continue;
- foreach(lc2, cur_ec->ec_members)
+ /*
+ * Ignore child members unless they match the request.
+ */
+
+ em_foreach_not_children(cur_em, cur_ec)
{
- EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
+ Assert(!cur_em->em_is_child || bms_equal(cur_em->em_relids, rel));
/*
- * Ignore child members unless they match the request.
+ * If below an outer join, don't match constants: they're not as
+ * constant as they look.
*/
- if (cur_em->em_is_child &&
- !bms_equal(cur_em->em_relids, rel))
+ if (cur_ec->ec_below_outer_join &&
+ cur_em->em_is_const)
continue;
+ if (opcintype == cur_em->em_datatype &&
+ equal(expr, cur_em->em_expr))
+ return cur_ec; /* Match! */
+ }
+
+ em_foreach_relids_equals(cur_em, cur_ec, rel)
+ {
+ Assert(!cur_em->em_is_child || bms_equal(cur_em->em_relids, rel));
+
/*
* If below an outer join, don't match constants: they're not as
* constant as they look.
@@ -700,6 +871,8 @@ get_eclass_for_sort_expr(PlannerInfo *root,
newec->ec_opfamilies = list_copy(opfamilies);
newec->ec_collation = collation;
newec->ec_members = NIL;
+ newec->ec_members_htab = NULL;
+ newec->ec_not_child_members = NIL;
newec->ec_sources = NIL;
newec->ec_derives = NIL;
newec->ec_relids = NULL;
@@ -787,17 +960,23 @@ find_ec_member_matching_expr(EquivalenceClass *ec,
Expr *expr,
Relids relids)
{
- ListCell *lc;
+ EquivalenceMember *em;
+ EmForeachBitState state;
/* We ignore binary-compatible relabeling on both ends */
while (expr && IsA(expr, RelabelType))
expr = ((RelabelType *) expr)->arg;
- foreach(lc, ec->ec_members)
+ /*
+ * Ignore child members unless they belong to the requested rel.
+ */
+
+ em_foreach_not_children(em, ec)
{
- EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
Expr *emexpr;
+ Assert(!em->em_is_child || bms_is_subset(em->em_relids, relids));
+
/*
* We shouldn't be trying to sort by an equivalence class that
* contains a constant, so no need to consider such cases any further.
@@ -806,10 +985,28 @@ find_ec_member_matching_expr(EquivalenceClass *ec,
continue;
/*
- * Ignore child members unless they belong to the requested rel.
+ * Match if same expression (after stripping relabel).
*/
- if (em->em_is_child &&
- !bms_is_subset(em->em_relids, relids))
+ emexpr = em->em_expr;
+ while (emexpr && IsA(emexpr, RelabelType))
+ emexpr = ((RelabelType *) emexpr)->arg;
+
+ if (equal(emexpr, expr))
+ return em;
+ }
+
+ initialize_EmForeachBitState(&state, ec, relids);
+ em_foreach_relids_subset(state, em)
+ {
+ Expr *emexpr;
+
+ Assert(!em->em_is_child || bms_is_subset(em->em_relids, relids));
+
+ /*
+ * We shouldn't be trying to sort by an equivalence class that
+ * contains a constant, so no need to consider such cases any further.
+ */
+ if (em->em_is_const)
continue;
/*
@@ -1578,6 +1775,8 @@ generate_join_implied_equalities_normal(PlannerInfo *root,
List *outer_members = NIL;
List *inner_members = NIL;
ListCell *lc1;
+ EquivalenceMember *cur_em;
+ EmForeachBitState state;
/*
* First, scan the EC to identify member values that are computable at the
@@ -1588,17 +1787,16 @@ generate_join_implied_equalities_normal(PlannerInfo *root,
* as well as to at least one input member, plus enforce at least one
* outer-rel member equal to at least one inner-rel member.
*/
- foreach(lc1, ec->ec_members)
- {
- EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc1);
- /*
- * We don't need to check explicitly for child EC members. This test
- * against join_relids will cause them to be ignored except when
- * considering a child inner rel, which is what we want.
- */
- if (!bms_is_subset(cur_em->em_relids, join_relids))
- continue; /* not computable yet, or wrong child */
+ /*
+ * We don't need to check explicitly for child EC members. This test
+ * against join_relids will cause them to be ignored except when
+ * considering a child inner rel, which is what we want.
+ */
+ initialize_EmForeachBitState(&state, ec, join_relids);
+ em_foreach_relids_subset(state, cur_em)
+ {
+ Assert(bms_is_subset(cur_em->em_relids, join_relids));
if (bms_is_subset(cur_em->em_relids, outer_relids))
outer_members = lappend(outer_members, cur_em);
@@ -2354,7 +2552,7 @@ reconsider_full_join_clause(PlannerInfo *root, RestrictInfo *rinfo)
*/
if (matchleft && matchright)
{
- cur_ec->ec_members = list_delete_nth_cell(cur_ec->ec_members, coal_idx);
+ DeleteNthEcMember(cur_ec, coal_idx);
return true;
}
@@ -2875,7 +3073,7 @@ generate_implied_equalities_for_column(PlannerInfo *root,
{
EquivalenceClass *cur_ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
EquivalenceMember *cur_em;
- ListCell *lc2;
+ EquivalenceMember *other_em;
/* Sanity check eclass_indexes only contain ECs for rel */
Assert(is_child_rel || bms_is_subset(rel->relids, cur_ec->ec_relids));
@@ -2898,11 +3096,10 @@ generate_implied_equalities_for_column(PlannerInfo *root,
* match. See also get_eclass_for_sort_expr.)
*/
cur_em = NULL;
- foreach(lc2, cur_ec->ec_members)
+ em_foreach_relids_equals(cur_em, cur_ec, rel->relids)
{
- cur_em = (EquivalenceMember *) lfirst(lc2);
- if (bms_equal(cur_em->em_relids, rel->relids) &&
- callback(root, rel, cur_ec, cur_em, callback_arg))
+ Assert(bms_equal(cur_em->em_relids, rel->relids));
+ if (callback(root, rel, cur_ec, cur_em, callback_arg))
break;
cur_em = NULL;
}
@@ -2914,14 +3111,12 @@ generate_implied_equalities_for_column(PlannerInfo *root,
* Found our match. Scan the other EC members and attempt to generate
* joinclauses.
*/
- foreach(lc2, cur_ec->ec_members)
+ em_foreach_not_children(other_em, cur_ec)
{
- EquivalenceMember *other_em = (EquivalenceMember *) lfirst(lc2);
Oid eq_op;
RestrictInfo *rinfo;
- if (other_em->em_is_child)
- continue; /* ignore children here */
+ Assert(!other_em->em_is_child);
/* Make sure it'll be a join to a different rel */
if (other_em == cur_em ||
@@ -3253,3 +3448,407 @@ get_common_eclass_indexes(PlannerInfo *root, Relids relids1, Relids relids2)
/* Calculate and return the common EC indexes, recycling the left input. */
return bms_int_members(rel1ecs, rel2ecs);
}
+
+/*
+ * em_foreach_list_move_next
+ * Update EmForeachListState so that it points to the next
+ * EquivalenceMember.
+ * Return true if the next element was found with assigning it to *em.
+ */
+static inline bool em_foreach_list_move_next(
+ EmForeachListState *state, EquivalenceMember **em)
+{
+ int length = list_length(state->list);
+
+ while (++(state->i) < length)
+ {
+ EquivalenceMemberListEntry *listEntry
+ = (EquivalenceMemberListEntry *) list_nth(state->list, state->i);
+
+ if (!listEntry->deleted)
+ {
+ *em = listEntry->emem;
+ return true;
+ }
+ }
+
+ return false;
+}
+
+/*
+ * Helper functions for the em_foreach_relids_subset macro.
+ *
+ * em_foreach_relids_subset macro is based on enumeration of all possible
+ * subsets of the given relids. For each subset, we look up the hash table in
+ * EquivalenceClass and iterate over the corresponding EquivalenceMembers.
+ *
+ * This technique can be written like the next pasudo code.
+ * =====
+ * int num_members_in_relids = bms_num_members(relids);
+ * if (num_members_in_relids < MAX_NUM_SUBSETS_OF_RELIDS)
+ * {
+ * // We will use the special optimization technique
+ * for (int bit = 0; bit < (1 << num_members_in_relids); bit++)
+ * {
+ * EquivalenceMember *em;
+ * ListCell *lc;
+ * Relids subset = construct subset from 'bit';
+ *
+ * foreach(lc, FindEcMembersMatchingRelids(ec, subset))
+ * {
+ * em = (EquivalenceMember *) lfirst(lc);
+ * content of loop...
+ * }
+ * }
+ * }
+ * else
+ * {
+ * // There are too many subsets, so we will use simple linear search
+ * ListCell *lc;
+ * foreach(lc, ec->ec_members)
+ * {
+ * EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
+ * if (bms_is_subset(em->em_relids, relids))
+ * {
+ * em = (EquivalenceMember *) lfirst(lc);
+ * content of loop...
+ * }
+ * }
+ * }
+ * =====
+ *
+ * EmForeachBitState and related functions provide the state machine for this
+ * algorithm.
+ */
+
+/*
+ * initialize_EmForeachBitState
+ * Initialize the given EmForeachBitState.
+ * This function must be called before using the
+ * em_foreach_relids_subset macro.
+ */
+static inline void initialize_EmForeachBitState(
+ EmForeachBitState *state, EquivalenceClass *ec, Relids relids)
+{
+ const int num_ec_members = list_length(ec->ec_members);
+ int num_members_in_relids = 0;
+ int i;
+
+ state->ec = ec;
+ state->relids = relids;
+
+ /*
+ * Enumerate all members in 'relids'
+ */
+ i = -1;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ ++num_members_in_relids;
+
+ if (num_members_in_relids >= MAX_NUM_SUBSETS_OF_RELIDS
+ || ((1 << num_members_in_relids) >= num_ec_members))
+ {
+ /*
+ * There are too many subsets, so we will use simple linear search
+ * instead of the special optimization technique.
+ */
+ state->status = EM_FOREACH_LINEAR_SEARCH;
+ state->i = -1;
+ return;
+ }
+
+ state->all_members_in_relids[num_members_in_relids - 1] = i;
+ }
+
+ /*
+ * Prepare for the special optimization
+ */
+ state->status = EM_FOREACH_NEXT_BIT;
+ state->bit = -1;
+ state->num_members_in_relids = num_members_in_relids;
+ state->subset = NULL;
+ return;
+}
+
+/*
+ * em_foreach_bit_move_next
+ * Update EmForeachBitState so that it points to the next
+ * EquivalenceMember.
+ * Return true if the next element was found with assigning it to *em.
+ */
+static inline bool em_foreach_bit_move_next(
+ EmForeachBitState *state, EquivalenceMember **em)
+{
+ switch (state->status)
+ {
+ case EM_FOREACH_NEXT_BIT:
+ CASE_EM_FOREACH_NEXT_BIT:
+ {
+ /*
+ * We need to proceed the next subset.
+ */
+ int i;
+ const int num_members_in_relids = state->num_members_in_relids;
+
+ if (++(state->bit) >= (1 << num_members_in_relids))
+ {
+ /* Reached end */
+ state->status = EM_FOREACH_FINISHED;
+ return false;
+ }
+
+ /* Clear Bitmapset */
+ if (state->subset != NULL)
+ {
+ for (i = 0; i < state->subset->nwords; i++)
+ {
+ state->subset->words[i] = 0;
+ }
+ }
+
+ /* Set flags */
+ for (i = 0; i < num_members_in_relids; i++)
+ {
+ if (state->bit & (1 << i))
+ {
+ state->subset =
+ bms_add_member(state->subset,
+ state->all_members_in_relids[i]);
+ }
+ }
+
+ /* Update state and begin iteration on this subset.*/
+ state->status = EM_FOREACH_NEXT_LIST_ENTRY;
+ state->list = FindEcMembersMatchingRelids(
+ state->ec, state->subset);
+ state->i = -1;
+ goto CASE_EM_FOREACH_NEXT_LIST_ENTRY;
+ }
+ case EM_FOREACH_NEXT_LIST_ENTRY:
+ CASE_EM_FOREACH_NEXT_LIST_ENTRY:
+ {
+ /*
+ * Iterate on the current subset.
+ */
+ const List *list = state->list;
+ const int length = list_length(list);
+ int i = state->i;
+
+ while (++i < length)
+ {
+ EquivalenceMemberListEntry *listEntry
+ = (EquivalenceMemberListEntry *) list_nth(list, i);
+
+ if (!listEntry->deleted)
+ {
+ /* Found */
+ *em = listEntry->emem;
+ state->i = i;
+ return true;
+ }
+ }
+
+ /*
+ * There are no more members in the current subset.
+ * We need to proceed next one.
+ */
+ state->status = EM_FOREACH_NEXT_BIT;
+ goto CASE_EM_FOREACH_NEXT_BIT;
+ }
+ case EM_FOREACH_LINEAR_SEARCH:
+ {
+ /*
+ * Simple linear search.
+ */
+ const List *ec_members = state->ec->ec_members;
+ const Relids relids = state->relids;
+ const int length = list_length(ec_members);
+ int i = state->i;
+
+ while (++i < length)
+ {
+ EquivalenceMember *member =
+ (EquivalenceMember *) list_nth(ec_members, i);
+
+ if (bms_is_subset(member->em_relids, relids))
+ {
+ /* Found */
+ *em = member;
+ state->i = i;
+ return true;
+ }
+ }
+
+ /*
+ * Reached end
+ */
+ state->status = EM_FOREACH_FINISHED;
+ return false;
+ }
+ case EM_FOREACH_FINISHED:
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * InitializeEcMembersHtab
+ * Initialize a hash table in the given EquivalenceClass.
+ */
+static void InitializeEcMembersHtab(EquivalenceClass *ec)
+{
+ HASHCTL hash_ctl;
+
+ if (ec->ec_members_htab != NULL)
+ {
+ return;
+ }
+
+ hash_ctl.keysize = sizeof(Relids);
+ hash_ctl.entrysize = sizeof(EquivalenceMemberHashEntry);
+ hash_ctl.hash = bitmap_hash;
+ hash_ctl.match = bitmap_match;
+ hash_ctl.hcxt = CurrentMemoryContext;
+ ec->ec_members_htab =
+ hash_create("EquivalenceMemberHashTable",
+ 256L,
+ &hash_ctl,
+ HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
+}
+
+/*
+ * AppendEcMember
+ * Append EquivalenceMember emem to the EquivalenceClass ec.
+ * This function modifies ec->ec_members, ec->ec_not_child_members,
+ * and ec->ec_members_htab.
+ */
+static void AppendEcMember(EquivalenceClass *ec,
+ EquivalenceMember *emem)
+{
+ EquivalenceMemberHashEntry *hashEntry;
+ EquivalenceMemberListEntry *listEntry;
+ bool found;
+
+ if (ec->ec_members_htab == NULL)
+ {
+ InitializeEcMembersHtab(ec);
+ }
+
+ listEntry = palloc(sizeof(EquivalenceMemberListEntry));
+ listEntry->emem = emem;
+ listEntry->deleted = false;
+
+ ec->ec_members = lappend(ec->ec_members, emem);
+ if (!emem->em_is_child)
+ {
+ ec->ec_not_child_members =
+ lappend(ec->ec_not_child_members, listEntry);
+ }
+
+ hashEntry = hash_search(ec->ec_members_htab, &(emem->em_relids),
+ HASH_ENTER, &found);
+ if (!found)
+ {
+ hashEntry->list = NIL;
+ }
+
+ hashEntry->list = lappend(hashEntry->list, listEntry);
+}
+
+/*
+ * ConcatEcMember
+ * Append all EquivalenceMembers in ec2 to ec1 while updating as in
+ * AppendEcMember.
+ */
+static void ConcatEcMember(EquivalenceClass *ec1,
+ EquivalenceClass *ec2)
+{
+ ListCell *lc;
+
+ if (ec1->ec_members_htab == NULL)
+ {
+ InitializeEcMembersHtab(ec1);
+ }
+
+ foreach(lc, ec2->ec_members)
+ {
+ AppendEcMember(ec1, lfirst_node(EquivalenceMember, lc));
+ }
+}
+
+/*
+ * DeleteNthEcMember
+ * Delete the EquivalenceMember whose index in ec->ec_members is the
+ * 'index' argument.
+ */
+static void DeleteNthEcMember(EquivalenceClass *ec,
+ int index)
+{
+ EquivalenceMemberHashEntry *hashEntry;
+ EquivalenceMemberListEntry *listEntry;
+ bool found;
+ EquivalenceMember *emem;
+ ListCell *lc;
+#ifdef USE_ASSERT_CHECKING
+ bool memberToBeDeletedWasFound = false;
+#endif
+
+ emem = list_nth(ec->ec_members, index);
+ ec->ec_members = list_delete_nth_cell(ec->ec_members, index);
+
+ Assert(ec->ec_members_htab != NULL);
+ hashEntry = hash_search(ec->ec_members_htab, &(emem->em_relids),
+ HASH_FIND, &found);
+ Assert(found);
+
+ /*
+ * We mark the corresponding EquivalenceMemberListEntry deleted
+ * instead of removing it from the list.
+ */
+ foreach(lc, hashEntry->list)
+ {
+ listEntry = (EquivalenceMemberListEntry *) lfirst(lc);
+
+ if (listEntry->emem == emem)
+ {
+ listEntry->deleted = true;
+#ifdef USE_ASSERT_CHECKING
+ memberToBeDeletedWasFound = true;
+#endif
+ break;
+ }
+ }
+#ifdef USE_ASSERT_CHECKING
+ Assert(memberToBeDeletedWasFound);
+#endif
+}
+
+/*
+ * FindEcMembersMatchingRelids
+ * Looks up the hash table in the EquivalenceClass and returns a list
+ * containing EquivalenceMemberHashEntry whose em_relids is equal to the
+ * 'relids' argument. If there are no members satisfying the condition,
+ * NIL will be returned.
+ *
+ * Note: Please confirm the 'deleted' flag in EquivalenceMemberHashEntry.
+ * Further information is available in EquivalenceMemberHashEntry's
+ * comment.
+ */
+static List *FindEcMembersMatchingRelids(EquivalenceClass *ec,
+ Relids relids)
+{
+ EquivalenceMemberHashEntry *entry;
+ bool found;
+
+ if (ec->ec_members_htab == NULL)
+ {
+ return NIL;
+ }
+
+ entry = hash_search(ec->ec_members_htab, &relids,
+ HASH_FIND, &found);
+
+ return found ? entry->list : NIL;
+}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f3845b3fe..5f5aa1b314 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -19,6 +19,7 @@
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "storage/block.h"
+#include "utils/hsearch.h"
/*
@@ -988,6 +989,9 @@ typedef struct EquivalenceClass
List *ec_opfamilies; /* btree operator family OIDs */
Oid ec_collation; /* collation, if datatypes are collatable */
List *ec_members; /* list of EquivalenceMembers */
+ HTAB *ec_members_htab; /* Hash table for ec_members */
+ List *ec_not_child_members; /* list of EquivalenceMembers
+ which are not children */
List *ec_sources; /* list of generating RestrictInfos */
List *ec_derives; /* list of derived RestrictInfos */
Relids ec_relids; /* all relids appearing in ec_members, except
@@ -1043,6 +1047,34 @@ typedef struct EquivalenceMember
Oid em_datatype; /* the "nominal type" used by the opfamily */
} EquivalenceMember;
+/*
+ * EquivalenceMemberHashEntry
+ *
+ * Key and value of EquivalenceMember::ec_members_htab.
+ * The 'list' field contains EquivalenceMemberListEntries whose em_relids is
+ * equal to the key.
+ */
+typedef struct EquivalenceMemberHashEntry
+{
+ Relids em_relids; /* hash key --- MUST BE FIRST */
+ List *list; /* List of EquivalenceMemberListEntry */
+} EquivalenceMemberHashEntry;
+
+/*
+ * EquivalenceMemberListEntry
+ *
+ * Entry of EquivalenceMemberHashEntry::list.
+ *
+ * Note: Before accessing emem field, confirm that the 'deleted' flag is false.
+ * If the value is true, the corresponding EquivalenceMember no longer exists
+ * in EquivalenceClass::ec_members.
+ */
+typedef struct EquivalenceMemberListEntry
+{
+ EquivalenceMember *emem;
+ bool deleted;
+} EquivalenceMemberListEntry;
+
/*
* PathKeys
*
figure.pngimage/png; name=figure.pngDownload
�PNG
IHDR � 8 ���C sRGB ��� gAMA ���a ��IDATx^��|T����!����%� � (�A�X0�J���V,���U����������l���X��G����5�rP��@ ��!��C��������L�I2������>��J2�u�u�k��
�; ���0� �8
� "( @�� !�0 �
� "( @�� !�0 �
� "( @�� !�0 ��0��i�IXX�i P?���D� �o��s� !�0 �
� "( @���N�aaa���� ���N�\ @0+��0 �
� "( @�� !�0 �
� "( @�� !�0 �
� "( @�� !�0 �
� "( @�� !�0 �
� "( @�� !�0 �
��E�2��),�ST�z���Ac����������cV-s���UF��j�{�MU��I�S�u��ih�J�����?��|?�&�h�>X0^���(�c��t��^��6�k~���^���c���g j�k�*�{b� T��w�d�QG��+u�fg��
�
e3�Y������!U���y�S��f8�i�LM�9��-O�����A=�^vQq=�k���U������\���W�� ��?�\��qL����~�0(g�������^����������m��'=���e���.T��EZ���B3��07KY���� @�B�:���U�������g��<S����g ���������c��m�Q�3��L~��6����f �` h�(��I�b�z#�"0�T�F���VzV}�%�R7� *S��\���JO�?j� @C��+��i^�G��[Ru�1m ��0��i���{�� =4P��������&T�R~�;%�c��f+�����g�K�4}�������9��u�M�!J���&&�~�^nS���y���y9�5~^�4av�������n~H7�0A��8U��t������?������7v��V�G�j"�h]y�C�0�
um���Mn�#7�p��R��s��������������g�l�h�;�$�E������K������e��g�Kd����c����9\ow��H�"U�9��^�g=�_��g����/���?S'�p������k�g�i�Fh�s�u�4e���SR�Mt�o~��?s���k���u����Q
��\���W�� ��?�\��qpWT�3��Z��x��p>��4=���C<��!����G~*>���>6��1J�~��������������e]�dw !D�NbY�Cf�����S,��8����FU�@�ymSC|0�M�����S0����R�c��{�+�
�Gz<�s��s����O�����7��<�|��kY�N^�sTc���C<�r��������S<��!�������8&�����M��:\6C@7P��J�&��9��y�}T` �G���M�)Yw\�S&���E��Kr��M��!}��4 �$ X����g7h��!fF�3�����&