Redundant filter in index scan with a bool column

Started by Alexander Kuzmenkovabout 7 years ago2 messageshackers
Jump to latest
#1Alexander Kuzmenkov
a.kuzmenkov@postgrespro.ru

Hi hackers,

Consider this query plan:

create table t (i int, b bool);
create index on t(i, b);
set enable_bitmapscan to off;
explain select * from t where i = 300 and b;

                               QUERY PLAN
-------------------------------------------------------------------------
 Index Only Scan using t_i_b_idx on t  (cost=0.15..24.27 rows=6 width=5)
   Index Cond: ((i = 300) AND (b = true))
   Filter: b

The filter is not needed, why is it there? Turns out we can't recognize
that the restriction clause 'b' and the index clause 'b = true' are
equivalent. My first reaction was to patch operator_predicate_proof to
handle this case, but there is a more straightforward way: mark the
expanded index clause as potentially redundant when it is generated in
expand_indexqual_conditions. There is already RestrictInfo.parent_ec
that is used to mark redundant EC-derived join clauses. The patch
renames it to rinfo_parent and uses it to mark the expanded index
clauses. Namely, for both the expanded and the original clause,
rinfo_parent points to the original clause or its generating EC, if set.
The name is no good -- the original clause is not a parent of itself,
after all. I considered something like redundancy_tag, but some places
actually use the fact that it points to the generating EC, so I don't
like this name either.

What do you think?

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Recognize-that-an-expanded-bool-index-clause-is-e-v1.patchtext/x-patch; name=0001-Recognize-that-an-expanded-bool-index-clause-is-e-v1.patchDownload+71-61
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Kuzmenkov (#1)
Re: Redundant filter in index scan with a bool column

Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:

The filter is not needed, why is it there? Turns out we can't recognize
that the restriction clause 'b' and the index clause 'b = true' are
equivalent.

Yeah, it's intentional that we don't get rid of the extra clause;
it doesn't really seem worth the expense and complexity to do so.
Indexes on bool columns are a pretty niche case in the first place.
Usually, if you are interested in just the rows where b = true,
you're better off using "where b" as an index predicate. In your
example, we can do this instead:

regression=# create index on t(i) where b;
CREATE INDEX
regression=# explain select * from t where i = 300 and b;
QUERY PLAN
------------------------------------------------------------------
Index Scan using t_i_idx on t (cost=0.12..24.19 rows=6 width=5)
Index Cond: (i = 300)
(2 rows)

resulting in a much smaller index, if the b=true condition is selective
enough to be worth indexing. Even in the case you showed, how much is
the redundant filter clause really costing?

My first reaction was to patch operator_predicate_proof to
handle this case, but there is a more straightforward way: mark the
expanded index clause as potentially redundant when it is generated in
expand_indexqual_conditions. There is already RestrictInfo.parent_ec
that is used to mark redundant EC-derived join clauses. The patch
renames it to rinfo_parent and uses it to mark the expanded index
clauses.

That's an unbelievable hack that almost certainly breaks existing uses.

The approach of teaching predtest.c that "b = true" implies "b" would
work, but it seems a bit brute-force because ordinarily such cases
would never be seen there, thanks to simplify_boolean_equality having
canonicalized the former into the latter. The problem we have is that
indxpath.c re-generates "b = true" in indexscan conditions. Thinking
about it now, I wonder if we could postpone that conversion till later,
say do it in create_indexscan_plan after having checked for redundant
clauses. Not sure how messy that'd be.

regards, tom lane