Redundant filter in index scan with a bool column
Hi hackers,
Consider this query plan:
create table t (i int, b bool);
create index on t(i, b);
set enable_bitmapscan to off;
explain select * from t where i = 300 and b;
QUERY PLAN
-------------------------------------------------------------------------
Index Only Scan using t_i_b_idx on t (cost=0.15..24.27 rows=6 width=5)
Index Cond: ((i = 300) AND (b = true))
Filter: b
The filter is not needed, why is it there? Turns out we can't recognize
that the restriction clause 'b' and the index clause 'b = true' are
equivalent. My first reaction was to patch operator_predicate_proof to
handle this case, but there is a more straightforward way: mark the
expanded index clause as potentially redundant when it is generated in
expand_indexqual_conditions. There is already RestrictInfo.parent_ec
that is used to mark redundant EC-derived join clauses. The patch
renames it to rinfo_parent and uses it to mark the expanded index
clauses. Namely, for both the expanded and the original clause,
rinfo_parent points to the original clause or its generating EC, if set.
The name is no good -- the original clause is not a parent of itself,
after all. I considered something like redundancy_tag, but some places
actually use the fact that it points to the generating EC, so I don't
like this name either.
What do you think?
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Recognize-that-an-expanded-bool-index-clause-is-e-v1.patchtext/x-patch; name=0001-Recognize-that-an-expanded-bool-index-clause-is-e-v1.patchDownload
From 144dd4a72fd8278aa6d44865d97239b9fad6965d Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Wed, 16 Jan 2019 14:31:26 +0300
Subject: [PATCH] Recognize that an expanded bool index clause is equivalent to
the original one
---
src/backend/nodes/copyfuncs.c | 2 +-
src/backend/nodes/outfuncs.c | 2 +-
src/backend/optimizer/path/costsize.c | 8 ++---
src/backend/optimizer/path/equivclass.c | 28 ++++++++---------
src/backend/optimizer/path/indxpath.c | 22 ++++++++++----
src/backend/optimizer/plan/createplan.c | 48 +++++++++++++++---------------
src/backend/optimizer/util/restrictinfo.c | 2 +-
src/include/nodes/relation.h | 10 +++++--
src/test/regress/expected/create_index.out | 9 ++----
9 files changed, 71 insertions(+), 60 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 006a3d1..9acb081 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2243,7 +2243,7 @@ _copyRestrictInfo(const RestrictInfo *from)
COPY_BITMAPSET_FIELD(right_relids);
COPY_NODE_FIELD(orclause);
/* EquivalenceClasses are never copied, so shallow-copy the pointers */
- COPY_SCALAR_FIELD(parent_ec);
+ COPY_SCALAR_FIELD(rinfo_parent);
COPY_SCALAR_FIELD(eval_cost);
COPY_SCALAR_FIELD(norm_selec);
COPY_SCALAR_FIELD(outer_selec);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 0fde876..980e9c6 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2436,7 +2436,7 @@ _outRestrictInfo(StringInfo str, const RestrictInfo *node)
WRITE_BITMAPSET_FIELD(left_relids);
WRITE_BITMAPSET_FIELD(right_relids);
WRITE_NODE_FIELD(orclause);
- /* don't write parent_ec, leads to infinite recursion in plan tree dump */
+ /* don't write rinfo_parent, leads to infinite recursion in plan tree dump */
WRITE_FLOAT_FIELD(norm_selec, "%.4f");
WRITE_FLOAT_FIELD(outer_selec, "%.4f");
WRITE_NODE_FIELD(mergeopfamilies);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 99c5ad9..f350919 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4673,7 +4673,7 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
/* Drop this clause if it matches any column of the FK */
for (i = 0; i < fkinfo->nkeys; i++)
{
- if (rinfo->parent_ec)
+ if (rinfo->rinfo_parent)
{
/*
* EC-derived clauses can only match by EC. It is okay to
@@ -4683,12 +4683,12 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
* have generated one equating the FK's Vars. So for
* purposes of estimation, we can act as though it did so.
*
- * Note: checking parent_ec is a bit of a cheat because
- * there are EC-derived clauses that don't have parent_ec
+ * Note: checking parent EC is a bit of a cheat because
+ * there are EC-derived clauses that don't have parent EC
* set; but such clauses must compare expressions that
* aren't just Vars, so they cannot match the FK anyway.
*/
- if (fkinfo->eclass[i] == rinfo->parent_ec)
+ if ((Node *) fkinfo->eclass[i] == rinfo->rinfo_parent)
{
remove_it = true;
break;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 6e134ae..b18d8b6 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1054,7 +1054,7 @@ generate_base_implied_equalities_broken(PlannerInfo *root,
* methods. We do not worry here about selecting clauses that are optimal
* for use in a parameterized indexscan. indxpath.c makes its own selections
* of clauses to use, and if the ones we pick here are redundant with those,
- * the extras will be eliminated at createplan time, using the parent_ec
+ * the extras will be eliminated at createplan time, using the rinfo_parent
* markers that we provide (see is_redundant_derived_clause()).
*
* Because the same join clauses are likely to be needed multiple times as
@@ -1264,7 +1264,7 @@ generate_join_implied_equalities_normal(PlannerInfo *root,
}
/*
- * Create clause, setting parent_ec to mark it as redundant with other
+ * Create clause, setting parent EC to mark it as redundant with other
* joinclauses
*/
rinfo = create_join_clause(root, ec, best_eq_op,
@@ -1310,7 +1310,7 @@ generate_join_implied_equalities_normal(PlannerInfo *root,
ec->ec_broken = true;
return NIL;
}
- /* do NOT set parent_ec, this qual is not redundant! */
+ /* do NOT set parent EC, this qual is not redundant! */
rinfo = create_join_clause(root, ec, eq_op,
prev_em, cur_em,
NULL);
@@ -1438,7 +1438,7 @@ create_join_clause(PlannerInfo *root,
rinfo = (RestrictInfo *) lfirst(lc);
if (rinfo->left_em == leftem &&
rinfo->right_em == rightem &&
- rinfo->parent_ec == parent_ec &&
+ rinfo->rinfo_parent == (Node *) parent_ec &&
opno == ((OpExpr *) rinfo->clause)->opno)
return rinfo;
}
@@ -1448,7 +1448,7 @@ create_join_clause(PlannerInfo *root,
rinfo = (RestrictInfo *) lfirst(lc);
if (rinfo->left_em == leftem &&
rinfo->right_em == rightem &&
- rinfo->parent_ec == parent_ec &&
+ rinfo->rinfo_parent == (Node *) parent_ec &&
opno == ((OpExpr *) rinfo->clause)->opno)
return rinfo;
}
@@ -1470,7 +1470,7 @@ create_join_clause(PlannerInfo *root,
ec->ec_min_security);
/* Mark the clause as redundant, or not */
- rinfo->parent_ec = parent_ec;
+ rinfo->rinfo_parent = (Node *) parent_ec;
/*
* We know the correct values for left_ec/right_ec, ie this particular EC,
@@ -2310,7 +2310,7 @@ generate_implied_equalities_for_column(PlannerInfo *root,
if (!OidIsValid(eq_op))
continue;
- /* set parent_ec to mark as redundant with other joinclauses */
+ /* set parent EC to mark as redundant with other joinclauses */
rinfo = create_join_clause(root, cur_ec, eq_op,
cur_em, other_em,
cur_ec);
@@ -2487,25 +2487,25 @@ eclass_useful_for_merging(PlannerInfo *root,
/*
* is_redundant_derived_clause
- * Test whether rinfo is derived from same EC as any clause in clauselist;
- * if so, it can be presumed to represent a condition that's redundant
- * with that member of the list.
+ * Test whether rinfo is derived from same parent as any clause in
+ * clauselist; if so, it can be presumed to represent a condition
+ * that's redundant with that member of the list.
*/
bool
is_redundant_derived_clause(RestrictInfo *rinfo, List *clauselist)
{
- EquivalenceClass *parent_ec = rinfo->parent_ec;
+ Node *rinfo_parent = rinfo->rinfo_parent;
ListCell *lc;
- /* Fail if it's not a potentially-redundant clause from some EC */
- if (parent_ec == NULL)
+ /* Fail if it's not a potentially-redundant clause */
+ if (rinfo_parent == NULL)
return false;
foreach(lc, clauselist)
{
RestrictInfo *otherrinfo = (RestrictInfo *) lfirst(lc);
- if (otherrinfo->parent_ec == parent_ec)
+ if (otherrinfo->rinfo_parent == rinfo_parent)
return true;
}
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f8e674c..0c4141e 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -559,9 +559,10 @@ consider_index_join_outer_rels(PlannerInfo *root, RelOptInfo *rel,
* parameterization; so skip if any clause derived from the same
* eclass would already have been included when using oldrelids.
*/
- if (rinfo->parent_ec &&
- eclass_already_used(rinfo->parent_ec, oldrelids,
- indexjoinclauses))
+ if (rinfo->rinfo_parent
+ && IsA(rinfo->rinfo_parent, EquivalenceClass)
+ && eclass_already_used((EquivalenceClass *) rinfo->rinfo_parent,
+ oldrelids, indexjoinclauses))
continue;
/*
@@ -691,7 +692,7 @@ eclass_already_used(EquivalenceClass *parent_ec, Relids oldrelids,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- if (rinfo->parent_ec == parent_ec &&
+ if (rinfo->rinfo_parent == (Node *) parent_ec &&
bms_is_subset(rinfo->clause_relids, oldrelids))
return true;
}
@@ -3608,8 +3609,17 @@ expand_indexqual_conditions(IndexOptInfo *index,
index);
if (boolqual)
{
- indexquals = lappend(indexquals,
- make_simple_restrictinfo(boolqual));
+ RestrictInfo *newRinfo = make_simple_restrictinfo(boolqual);
+ /*
+ * Mark the expanded and the original rinfo as redundant
+ * by setting the same rinfo_parent for them. If the original
+ * is itself derived from an EC, use that EC.
+ */
+ if (!rinfo->rinfo_parent)
+ rinfo->rinfo_parent = (Node *) newRinfo;
+ newRinfo->rinfo_parent = rinfo->rinfo_parent;
+
+ indexquals = lappend(indexquals, newRinfo);
indexqualcols = lappend_int(indexqualcols, indexcol);
continue;
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 97d0c28..8d0e603 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -121,7 +121,7 @@ static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
BitmapHeapPath *best_path,
List *tlist, List *scan_clauses);
static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
- List **qual, List **indexqual, List **indexECs);
+ List **qual, List **indexqual, List **indexParents);
static void bitmap_subplan_mark_shared(Plan *plan);
static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
List *tlist, List *scan_clauses);
@@ -2755,7 +2755,7 @@ create_bitmap_scan_plan(PlannerInfo *root,
Plan *bitmapqualplan;
List *bitmapqualorig;
List *indexquals;
- List *indexECs;
+ List *indexParents;
List *qpqual;
ListCell *l;
BitmapHeapScan *scan_plan;
@@ -2767,7 +2767,7 @@ create_bitmap_scan_plan(PlannerInfo *root,
/* Process the bitmapqual tree into a Plan tree and qual lists */
bitmapqualplan = create_bitmap_subplan(root, best_path->bitmapqual,
&bitmapqualorig, &indexquals,
- &indexECs);
+ &indexParents);
if (best_path->path.parallel_aware)
bitmap_subplan_mark_shared(bitmapqualplan);
@@ -2808,7 +2808,7 @@ create_bitmap_scan_plan(PlannerInfo *root,
continue; /* we may drop pseudoconstants here */
if (list_member(indexquals, clause))
continue; /* simple duplicate */
- if (rinfo->parent_ec && list_member_ptr(indexECs, rinfo->parent_ec))
+ if (rinfo->rinfo_parent && list_member_ptr(indexParents, rinfo->rinfo_parent))
continue; /* derived from same EquivalenceClass */
if (!contain_mutable_functions(clause) &&
predicate_implied_by(list_make1(clause), indexquals, false))
@@ -2867,17 +2867,17 @@ create_bitmap_scan_plan(PlannerInfo *root,
* predicates, because we have to recheck predicates as well as index
* conditions if the bitmap scan becomes lossy.
*
- * In addition, we return a list of EquivalenceClass pointers for all the
- * top-level indexquals that were possibly-redundantly derived from ECs.
- * This allows removal of scan_clauses that are redundant with such quals.
- * (We do not attempt to detect such redundancies for quals that are within
- * OR subtrees. This could be done in a less hacky way if we returned the
- * indexquals in RestrictInfo form, but that would be slower and still pretty
- * messy, since we'd have to build new RestrictInfos in many cases.)
+ * In addition, we return a list of parent pointers for all the
+ * top-level indexquals that were possibly-redundantly derived from ECs or
+ * other clauses. This allows removal of scan_clauses that are redundant with
+ * such quals. (We do not attempt to detect such redundancies for quals that
+ * are within OR subtrees. This could be done in a less hacky way if we returned
+ * the indexquals in RestrictInfo form, but that would be slower and still
+ * pretty messy, since we'd have to build new RestrictInfos in many cases.)
*/
static Plan *
create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
- List **qual, List **indexqual, List **indexECs)
+ List **qual, List **indexqual, List **indexParents)
{
Plan *plan;
@@ -2887,7 +2887,7 @@ create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
List *subplans = NIL;
List *subquals = NIL;
List *subindexquals = NIL;
- List *subindexECs = NIL;
+ List *subindexParents = NIL;
ListCell *l;
/*
@@ -2902,16 +2902,16 @@ create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
Plan *subplan;
List *subqual;
List *subindexqual;
- List *subindexEC;
+ List *subindexParent;
subplan = create_bitmap_subplan(root, (Path *) lfirst(l),
&subqual, &subindexqual,
- &subindexEC);
+ &subindexParent);
subplans = lappend(subplans, subplan);
subquals = list_concat_unique(subquals, subqual);
subindexquals = list_concat_unique(subindexquals, subindexqual);
- /* Duplicates in indexECs aren't worth getting rid of */
- subindexECs = list_concat(subindexECs, subindexEC);
+ /* Duplicates in indexParents aren't worth getting rid of */
+ subindexParents = list_concat(subindexParents, subindexParent);
}
plan = (Plan *) make_bitmap_and(subplans);
plan->startup_cost = apath->path.startup_cost;
@@ -2923,7 +2923,7 @@ create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
plan->parallel_safe = apath->path.parallel_safe;
*qual = subquals;
*indexqual = subindexquals;
- *indexECs = subindexECs;
+ *indexParents = subindexParents;
}
else if (IsA(bitmapqual, BitmapOrPath))
{
@@ -3004,13 +3004,13 @@ create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
*indexqual = subindexquals;
else
*indexqual = list_make1(make_orclause(subindexquals));
- *indexECs = NIL;
+ *indexParents = NIL;
}
else if (IsA(bitmapqual, IndexPath))
{
IndexPath *ipath = (IndexPath *) bitmapqual;
IndexScan *iscan;
- List *subindexECs;
+ List *subindexParents;
ListCell *l;
/* Use the regular indexscan plan build machinery... */
@@ -3049,15 +3049,15 @@ create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
*indexqual = lappend(*indexqual, pred);
}
}
- subindexECs = NIL;
+ subindexParents = NIL;
foreach(l, ipath->indexquals)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (rinfo->parent_ec)
- subindexECs = lappend(subindexECs, rinfo->parent_ec);
+ if (rinfo->rinfo_parent)
+ subindexParents = lappend(subindexParents, rinfo->rinfo_parent);
}
- *indexECs = subindexECs;
+ *indexParents = subindexParents;
}
else
{
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index e633881..a1bc1cc 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -179,7 +179,7 @@ make_restrictinfo_internal(Expr *clause,
* that happens only if it appears in the right context (top level of a
* joinclause list).
*/
- restrictinfo->parent_ec = NULL;
+ restrictinfo->rinfo_parent = NULL;
restrictinfo->eval_cost.startup = -1;
restrictinfo->norm_selec = -1;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3430061..a48da61 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1875,9 +1875,13 @@ typedef struct LimitPath
*
* When join clauses are generated from EquivalenceClasses, there may be
* several equally valid ways to enforce join equivalence, of which we need
- * apply only one. We mark clauses of this kind by setting parent_ec to
+ * apply only one. We mark clauses of this kind by setting rinfo_parent to
* point to the generating EquivalenceClass. Multiple clauses with the same
- * parent_ec in the same join are redundant.
+ * rinfo_parent in the same join are redundant.
+ *
+ * Another case when two clauses are marked as redundant is when we expand
+ * a clause to make it usable in index scans. The expanded clause may have
+ * the original clause as the parent (see expand_indexqual_conditions).
*/
typedef struct RestrictInfo
@@ -1918,7 +1922,7 @@ typedef struct RestrictInfo
Expr *orclause; /* modified clause with RestrictInfos */
/* This field is NULL unless clause is potentially redundant: */
- EquivalenceClass *parent_ec; /* generating EquivalenceClass */
+ Node *rinfo_parent; /* generating EquivalenceClass or RestrictInfo */
/* cache space for cost and selectivity */
QualCost eval_cost; /* eval cost of clause; -1 if not yet set */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 46deb55..f08eefc 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -3181,8 +3181,7 @@ explain (costs off)
Limit
-> Index Scan using boolindex_b_i_key on boolindex
Index Cond: (b = true)
- Filter: b
-(4 rows)
+(3 rows)
explain (costs off)
select * from boolindex where b = true order by i desc limit 10;
@@ -3191,8 +3190,7 @@ explain (costs off)
Limit
-> Index Scan Backward using boolindex_b_i_key on boolindex
Index Cond: (b = true)
- Filter: b
-(4 rows)
+(3 rows)
explain (costs off)
select * from boolindex where not b order by i limit 10;
@@ -3201,8 +3199,7 @@ explain (costs off)
Limit
-> Index Scan using boolindex_b_i_key on boolindex
Index Cond: (b = false)
- Filter: (NOT b)
-(4 rows)
+(3 rows)
--
-- Test for multilevel page deletion
--
2.7.4
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
The filter is not needed, why is it there? Turns out we can't recognize
that the restriction clause 'b' and the index clause 'b = true' are
equivalent.
Yeah, it's intentional that we don't get rid of the extra clause;
it doesn't really seem worth the expense and complexity to do so.
Indexes on bool columns are a pretty niche case in the first place.
Usually, if you are interested in just the rows where b = true,
you're better off using "where b" as an index predicate. In your
example, we can do this instead:
regression=# create index on t(i) where b;
CREATE INDEX
regression=# explain select * from t where i = 300 and b;
QUERY PLAN
------------------------------------------------------------------
Index Scan using t_i_idx on t (cost=0.12..24.19 rows=6 width=5)
Index Cond: (i = 300)
(2 rows)
resulting in a much smaller index, if the b=true condition is selective
enough to be worth indexing. Even in the case you showed, how much is
the redundant filter clause really costing?
My first reaction was to patch operator_predicate_proof to
handle this case, but there is a more straightforward way: mark the
expanded index clause as potentially redundant when it is generated in
expand_indexqual_conditions. There is already RestrictInfo.parent_ec
that is used to mark redundant EC-derived join clauses. The patch
renames it to rinfo_parent and uses it to mark the expanded index
clauses.
That's an unbelievable hack that almost certainly breaks existing uses.
The approach of teaching predtest.c that "b = true" implies "b" would
work, but it seems a bit brute-force because ordinarily such cases
would never be seen there, thanks to simplify_boolean_equality having
canonicalized the former into the latter. The problem we have is that
indxpath.c re-generates "b = true" in indexscan conditions. Thinking
about it now, I wonder if we could postpone that conversion till later,
say do it in create_indexscan_plan after having checked for redundant
clauses. Not sure how messy that'd be.
regards, tom lane