BUG #17618: unnecessary filter column <> text even after adding index

Started by PG Bug reporting formover 3 years ago21 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17618
Logged by: Sindy Senorita
Email address: sindysenorita@gmail.com
PostgreSQL version: 13.7
Operating system: Ubuntu
Description:

Hi, I'm not sure if this is a bug or feature, but definitely not what I've
expected

So I have a table with "status" column which can contains 'valid',
'invalid', 'pending', 'unknown'.
A very simple table

CREATE TABLE public.test (
id varchar NOT NULL,
status varchar NOT NULL,
CONSTRAINT test__pkey PRIMARY KEY (id)
)
CREATE INDEX pending_test_4 ON public.test USING btree ((((status)::text <>
'invalid'::text)));

notice that I've created an index to guide statuses that is not 'invalid
my query is:
SELECT * FROM test WHERE status != 'invalid'

When I run explain analyze on that with SET enable_seqscan = off, I got
QUERY PLAN
|
------------------------------------------------------------------------------------------------------------------------+
Bitmap Heap Scan on test (cost=4.62..8.37 rows=120 width=160) (actual
time=0.088..0.134 rows=117 loops=1) |
Filter: ((status)::text <> 'invalid'::text)
|
Heap Blocks: exact=3
|
-> Bitmap Index Scan on pending_test_4 (cost=0.00..4.59 rows=60 width=0)
(actual time=0.073..0.073 rows=117 loops=1)|
Index Cond: (((status)::text <> 'invalid'::text) = true)
|
Planning Time: 0.222 ms
|
Execution Time: 0.172 ms
|

The plan has used the index condition just right, but it still perform
aditional bitmap heap scan just to filter for a clause that exactly match
the index. And worse, it double the query cost
My questions are:
1. Is this a bug? or intended feature by design? If it is by design, I'd be
very happy to learn the rationale behind it.
2. Is there any way to skip/avoid the additional bitmap scan?
3. Could there be a better solution for my query. Suppose that the variants
of the status is unknown so query SELECT .. WHERE STATUS IN (all status
beside 'invalid') is not possible

Many thanks!
Sindy

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #17618: unnecessary filter column <> text even after adding index

PG Bug reporting form <noreply@postgresql.org> writes:

When I run explain analyze on that with SET enable_seqscan = off, I got
QUERY PLAN
|
------------------------------------------------------------------------------------------------------------------------+
Bitmap Heap Scan on test (cost=4.62..8.37 rows=120 width=160) (actual
time=0.088..0.134 rows=117 loops=1) |
Filter: ((status)::text <> 'invalid'::text)
|
Heap Blocks: exact=3
|
-> Bitmap Index Scan on pending_test_4 (cost=0.00..4.59 rows=60 width=0)
(actual time=0.073..0.073 rows=117 loops=1)|
Index Cond: (((status)::text <> 'invalid'::text) = true)
|
Planning Time: 0.222 ms
|
Execution Time: 0.172 ms
|

This is exactly what is expected; it's not a bug.

The plan has used the index condition just right, but it still perform
aditional bitmap heap scan just to filter for a clause that exactly match
the index. And worse, it double the query cost

The filter condition is required because the bitmap produced by the index
can be lossy, ie it might identify more rows than actually satisfy the
condition. BitmapHeapNext will only actually apply the condition if
the index reports that that happened, so in practice for this sort of
query the filter condition probably never gets rechecked.

The "doubled cost" has nothing whatever to do with the filter condition;
most of that is concerned with the number of disk pages touched. It
might help you to read

https://www.postgresql.org/docs/current/using-explain.html

regards, tom lane

#3Sindy Senorita
sindysenorita@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #17618: unnecessary filter column <> text even after adding index

I see, quick google search takes me to BitmapHeapNext implementation here
https://doxygen.postgresql.org/nodeBitmapHeapscan_8c_source.html#l00072. I
hope this is what you mean
Noted. Thanks for the explanation

Cheers

On Mon, Sep 19, 2022 at 10:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

PG Bug reporting form <noreply@postgresql.org> writes:

When I run explain analyze on that with SET enable_seqscan = off, I got
QUERY PLAN

|

------------------------------------------------------------------------------------------------------------------------+

Bitmap Heap Scan on test (cost=4.62..8.37 rows=120 width=160) (actual
time=0.088..0.134 rows=117 loops=1) |
Filter: ((status)::text <> 'invalid'::text)

|
Heap Blocks: exact=3

|
-> Bitmap Index Scan on pending_test_4 (cost=0.00..4.59 rows=60

width=0)

(actual time=0.073..0.073 rows=117 loops=1)|
Index Cond: (((status)::text <> 'invalid'::text) = true)

|
Planning Time: 0.222 ms

|
Execution Time: 0.172 ms

|

This is exactly what is expected; it's not a bug.

The plan has used the index condition just right, but it still perform
aditional bitmap heap scan just to filter for a clause that exactly match
the index. And worse, it double the query cost

The filter condition is required because the bitmap produced by the index
can be lossy, ie it might identify more rows than actually satisfy the
condition. BitmapHeapNext will only actually apply the condition if
the index reports that that happened, so in practice for this sort of
query the filter condition probably never gets rechecked.

The "doubled cost" has nothing whatever to do with the filter condition;
most of that is concerned with the number of disk pages touched. It
might help you to read

https://www.postgresql.org/docs/current/using-explain.html

regards, tom lane

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Mon, Sep 19, 2022 at 8:15 AM PG Bug reporting form <
noreply@postgresql.org> wrote:

CREATE TABLE public.test (
id varchar NOT NULL,
status varchar NOT NULL,
CONSTRAINT test__pkey PRIMARY KEY (id)
)

CREATE INDEX pending_test_4 ON public.test USING btree ((((status)::text <>
'invalid'::text)));

notice that I've created an index to guide statuses that is not 'invalid
my query is:
SELECT * FROM test WHERE status != 'invalid'

Your index contains none of the fields in the original table so the system
can never answer your inquiry using only the index.

You may find this to be informative:

https://www.postgresql.org/docs/devel/indexes-index-only-scans.html

Usually on a "status" field doing a few partial indexes gets you the best
result. The more statuses you need to be concerned about the more likely
just scanning the table is going to win out in performance. But if you do
only care about a few the smaller index size will be of benefit to keep
them in memory. A covering index may be of use as well though for rapidly
changing statuses tuple visibility is going to be a challenge. In short,
you seem to be providing a non-real situation and asking for advice that is
situational in nature.

David J.

#5Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Mon, Sep 19, 2022 at 11:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The plan has used the index condition just right, but it still perform
aditional bitmap heap scan just to filter for a clause that exactly match
the index. And worse, it double the query cost

The filter condition is required because the bitmap produced by the index
can be lossy, ie it might identify more rows than actually satisfy the
condition. BitmapHeapNext will only actually apply the condition if
the index reports that that happened, so in practice for this sort of
query the filter condition probably never gets rechecked.

You are describing a Recheck, but attributing its properties to a Filter.
I think that the filter condition always gets checked. It does so if I
create a tattler function which raises a notice every time it is called and
then build an index on a Boolean expression over that function (but of
course that inevitably does change the code paths a bit).

I don't know about being a bug, but it is at least a mild mal-feature that
boolean index columns/expressions can't be dealt with better, and have to
be handed in a filter rather than in a recheck.

Cheers,

Jeff

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#5)
Re: BUG #17618: unnecessary filter column <> text even after adding index

Jeff Janes <jeff.janes@gmail.com> writes:

You are describing a Recheck, but attributing its properties to a Filter.

You're right of course, momentary brain fade on my part.

I don't know about being a bug, but it is at least a mild mal-feature that
boolean index columns/expressions can't be dealt with better, and have to
be handed in a filter rather than in a recheck.

Yeah ... looking at create_bitmap_scan_plan, I see that it does this:

/*
* When dealing with special operators, we will at this point have
* duplicate clauses in qpqual and bitmapqualorig. We may as well drop
* 'em from bitmapqualorig, since there's no point in making the tests
* twice.
*/
bitmapqualorig = list_difference_ptr(bitmapqualorig, qpqual);

I wonder if that isn't backwards, ie we should prefer to put duplicates
in bitmapqualorig (the recheck condition) instead of qpqual (the filter).
If my head is screwed on correctly today, that should allow us to skip
checking the condition much of the time, and the skip would be safe
if the index is correctly asserting that no recheck is needed.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: BUG #17618: unnecessary filter column <> text even after adding index

I wrote:

I wonder if that isn't backwards, ie we should prefer to put duplicates
in bitmapqualorig (the recheck condition) instead of qpqual (the filter).
If my head is screwed on correctly today, that should allow us to skip
checking the condition much of the time, and the skip would be safe
if the index is correctly asserting that no recheck is needed.

Flipping the removal around has the effect I expected on the plan shape,
but some of the regression test queries now give the wrong answer, so
there's something faulty about that analysis.

regards, tom lane

#8Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#7)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Wed, Sep 21, 2022 at 7:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Flipping the removal around has the effect I expected on the plan shape,
but some of the regression test queries now give the wrong answer, so
there's something faulty about that analysis.

I think we may have a minor mistake when constructing the qpqual list in
create_bitmap_scan_plan. The qpqual list is supposed to be scan_clauses
minus indexquals. So we check each scan clause to see if it is redundant
with any indexqual, by using equal, checking EC or using
predicate_implied_by.

Note that the indexqual here may not be the form that has been going
through constant folding. Such as in this case with a boolean index, the
indexqual would be converted to 'indexkey expression = TRUE' by
match_boolean_index_clause. And that may make us fail to tell the scan
clause is redundant.

The comment of predicate_implied_by() says

* The top-level List structure of each list corresponds to an AND list.
* We assume that eval_const_expressions() has been applied and so there
* are no un-flattened ANDs or ORs (e.g., no AND immediately within an AND,
* including AND just below the top-level List structure).

So I think we need to run eval_const_expressions on indexquals before we
check for duplicate clauses, something like attached.

Thanks
Richard

Attachments:

v1-0001-constant-folding-for-indexquals-in-bitmap-scan.patchapplication/octet-stream; name=v1-0001-constant-folding-for-indexquals-in-bitmap-scan.patchDownload+6-1
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#8)
Re: BUG #17618: unnecessary filter column <> text even after adding index

Richard Guo <guofenglinux@gmail.com> writes:

So I think we need to run eval_const_expressions on indexquals before we
check for duplicate clauses, something like attached.

[ squint... ] Surely that was done long before we ever get here?

regards, tom lane

#10Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#9)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Fri, Sep 23, 2022 at 10:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

So I think we need to run eval_const_expressions on indexquals before we
check for duplicate clauses, something like attached.

[ squint... ] Surely that was done long before we ever get here?

We should have already done that long before. It seems afterwards we may
do additional transformation on indexquals. In this case with a boolean
index, I can see we convert the indexqual to form 'indexkey = TRUE' in
match_boolean_index_clause.

Thanks
Richard

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#10)
Re: BUG #17618: unnecessary filter column <> text even after adding index

Richard Guo <guofenglinux@gmail.com> writes:

On Fri, Sep 23, 2022 at 10:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

So I think we need to run eval_const_expressions on indexquals before we
check for duplicate clauses, something like attached.

[ squint... ] Surely that was done long before we ever get here?

We should have already done that long before. It seems afterwards we may
do additional transformation on indexquals. In this case with a boolean
index, I can see we convert the indexqual to form 'indexkey = TRUE' in
match_boolean_index_clause.

Of course, but what about that transformation would introduce something
that eval_const_expressions could simplify? (Actually, now that I think
about it, I think eval_const_expressions would break it completely because
it'd re-canonicalize the expression as just 'indexkey', exactly what we
don't want here.) In any case, if there's something between the
eval_const_expressions pass and createplan.c that introduces simplifiable
expressions, I think it's on that something's head to re-simplify; we
don't want to do something so expensive in a main code path if it's
usually going to be a complete waste.

regards, tom lane

#12Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#8)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Fri, Sep 23, 2022 at 9:29 PM Richard Guo <guofenglinux@gmail.com> wrote:

So I think we need to run eval_const_expressions on indexquals before we
check for duplicate clauses, something like attached.

BTW, (revise to the v1 patch), if this is the right way to go, we should
do that before the foreach loop, so that we need to run
eval_const_expressions on indexquals only once rather than for each scan
clause.

Thanks
Richard

#13Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#11)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Sat, Sep 24, 2022 at 8:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

We should have already done that long before. It seems afterwards we may
do additional transformation on indexquals. In this case with a boolean
index, I can see we convert the indexqual to form 'indexkey = TRUE' in
match_boolean_index_clause.

Of course, but what about that transformation would introduce something
that eval_const_expressions could simplify? (Actually, now that I think
about it, I think eval_const_expressions would break it completely because
it'd re-canonicalize the expression as just 'indexkey', exactly what we
don't want here.) In any case, if there's something between the
eval_const_expressions pass and createplan.c that introduces simplifiable
expressions, I think it's on that something's head to re-simplify; we
don't want to do something so expensive in a main code path if it's
usually going to be a complete waste.

Yeah, I agree that running eval_const_expressions here is expensive.
Maybe we can just do the reverse transformation in
create_bitmap_scan_plan against what we do for boolean index in
match_boolean_index_clause?

I think it's necessary to re-simplify the indexquals here, otherwise we
may fail to compare scan_clauses to indexquals correctly.

Thanks
Richard

#14Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#13)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Sat, Sep 24, 2022 at 8:41 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Sat, Sep 24, 2022 at 8:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

We should have already done that long before. It seems afterwards we may
do additional transformation on indexquals. In this case with a boolean
index, I can see we convert the indexqual to form 'indexkey = TRUE' in
match_boolean_index_clause.

Of course, but what about that transformation would introduce something
that eval_const_expressions could simplify? (Actually, now that I think
about it, I think eval_const_expressions would break it completely because
it'd re-canonicalize the expression as just 'indexkey', exactly what we
don't want here.) In any case, if there's something between the
eval_const_expressions pass and createplan.c that introduces simplifiable
expressions, I think it's on that something's head to re-simplify; we
don't want to do something so expensive in a main code path if it's
usually going to be a complete waste.

Yeah, I agree that running eval_const_expressions here is expensive.
Maybe we can just do the reverse transformation in
create_bitmap_scan_plan against what we do for boolean index in
match_boolean_index_clause?

Following this idea, I come up with v2 patch. Is this the right
direction to go?

Thanks
Richard

Attachments:

v2-0001-constant-folding-for-indexquals-in-bitmap-scan.patchapplication/octet-stream; name=v2-0001-constant-folding-for-indexquals-in-bitmap-scan.patchDownload+38-9
#15Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#14)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Mon, Sep 26, 2022 at 7:42 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Sat, Sep 24, 2022 at 8:41 AM Richard Guo <guofenglinux@gmail.com>
wrote:

Yeah, I agree that running eval_const_expressions here is expensive.
Maybe we can just do the reverse transformation in
create_bitmap_scan_plan against what we do for boolean index in
match_boolean_index_clause?

Following this idea, I come up with v2 patch. Is this the right
direction to go?

Update with v3 patch, nothing changes except fixes a test failure
spotted by cfbot.

Thanks
Richard

Attachments:

v3-0001-constant-folding-for-indexquals-in-bitmap-scan.patchapplication/octet-stream; name=v3-0001-constant-folding-for-indexquals-in-bitmap-scan.patchDownload+39-10
#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#15)
Re: BUG #17618: unnecessary filter column <> text even after adding index

Richard Guo <guofenglinux@gmail.com> writes:

Update with v3 patch, nothing changes except fixes a test failure
spotted by cfbot.

I think this is pretty close to usable, except that I don't believe
reusing simplify_boolean_equality this way is a great idea.
It does more than we need (surely the LHS-is-Const case cannot occur here)
and it has assumptions that I'm not sure hold --- particularly the bit
about !constisnull. I'd be inclined to just copy-and-paste the three or
four lines we need.

regards, tom lane

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: BUG #17618: unnecessary filter column <> text even after adding index

I wrote:

I wonder if that isn't backwards, ie we should prefer to put duplicates
in bitmapqualorig (the recheck condition) instead of qpqual (the filter).
If my head is screwed on correctly today, that should allow us to skip
checking the condition much of the time, and the skip would be safe
if the index is correctly asserting that no recheck is needed.

Flipping the removal around has the effect I expected on the plan shape,
but some of the regression test queries now give the wrong answer, so
there's something faulty about that analysis.

BTW, after looking more closely I see my mistake. An example of the
sort of plan that fails with that change is

  Sort
    Sort Key: proname
    ->  Bitmap Heap Scan on pg_proc
-         Filter: (proname ~~ 'RI\_FKey%del'::text)
+         Recheck Cond: (proname ~~ 'RI\_FKey%del'::text)
          ->  Bitmap Index Scan on pg_proc_proname_args_nsp_index
                Index Cond: ((proname >= 'RI_FKey'::text) AND (proname < 'RI_FKez'::text))
 (6 rows)

The difficulty here is pretty obvious: the original clause is stricter
than the index conditions generated from it. So even if the index
enforces the index conditions exactly, we still need to check the
original clause, and so it can't be relegated to the recheck field.
To improve this, we'd need to track which elements of bitmapqualorig
correspond exactly to index conditions, which we don't do ATM.

regards, tom lane

#18Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#16)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Sun, Nov 6, 2022 at 1:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

Update with v3 patch, nothing changes except fixes a test failure
spotted by cfbot.

I think this is pretty close to usable, except that I don't believe
reusing simplify_boolean_equality this way is a great idea.
It does more than we need (surely the LHS-is-Const case cannot occur here)
and it has assumptions that I'm not sure hold --- particularly the bit
about !constisnull. I'd be inclined to just copy-and-paste the three or
four lines we need.

Thanks for the suggestion. Yes, simplify_boolean_equality is doing more
than we need. I think here we just intend to handle indexquals of form
"indexkey = true/false", which seems can only come out from function
match_boolean_index_clause. From what this function does, we are sure
the constant input can be only on right (as you pointed out), and the
operator can only be BooleanEqualOperator. Also it seems the assumption
about !constisnull holds, as match_boolean_index_clause would not make a
clause with a constant-NULL input.

I've updated the patch according to the suggestions as in v4. Thanks
for reviewing this patch!

Thanks
Richard

Attachments:

v4-0001-constant-folding-for-indexquals-in-bitmap-scan.patchapplication/octet-stream; name=v4-0001-constant-folding-for-indexquals-in-bitmap-scan.patchDownload+46-4
#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#18)
Re: BUG #17618: unnecessary filter column <> text even after adding index

Richard Guo <guofenglinux@gmail.com> writes:

I've updated the patch according to the suggestions as in v4. Thanks
for reviewing this patch!

I was about ready to commit this when I re-read your initial comment
and realized that there's a second way to fix it. We can improve
predtest.c so that it understands that "x = true" implies "x" and
so on, whereupon the existing logic in create_bitmap_scan_plan
handles the case correctly. This is pretty nearly the same code as
in your v4, except that it's in a considerably less hot code path, plus
there's at least some chance that it could be useful for other purposes.
So I think I like this way better. Thoughts?

regards, tom lane

Attachments:

v5-0001-constant-folding-for-indexquals-in-bitmap-scan.patchtext/x-diff; charset=us-ascii; name*0=v5-0001-constant-folding-for-indexquals-in-bitmap-scan.patc; name*1=hDownload+60-2
#20Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#19)
Re: BUG #17618: unnecessary filter column <> text even after adding index

On Tue, Nov 8, 2022 at 5:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

I've updated the patch according to the suggestions as in v4. Thanks
for reviewing this patch!

I was about ready to commit this when I re-read your initial comment
and realized that there's a second way to fix it. We can improve
predtest.c so that it understands that "x = true" implies "x" and
so on, whereupon the existing logic in create_bitmap_scan_plan
handles the case correctly. This is pretty nearly the same code as
in your v4, except that it's in a considerably less hot code path, plus
there's at least some chance that it could be useful for other purposes.
So I think I like this way better. Thoughts?

It works for me. predtest.c is a more common place so that there may be
other cases that can benefit from this change. Thanks for the new
patch!

Thanks
Richard

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#20)