BUG #14608: no index scan with NOT IN and ENUM

Started by Lucas Nussbaumabout 9 years ago5 messagesbugs

lucas@lucas-nussbaum.net

about 9 years ago

The following bug has been logged on the website:

Bug reference: 14608
Logged by: Lucas Nussbaum
Email address: lucas@lucas-nussbaum.net
PostgreSQL version: 9.6.2
Operating system: Linux (Debian stretch)
Description:

Hi,

I augment pgbench's pgbench_accounts table with an additional column that
only contains '0', '1', '2', '3' or '4'.
I set the type of this column to enum('0', '1', '2', '3', '4').
I forbid null values.
I set an index on that column.

When I do:
explain analyze select * from tt where d = '0';
the index is used, as expected.

When I do:
explain analyze select * from tt where d not in ('1', '2', '3', '4');
(which should always produce the same result as the previous query)
the index is not used and a seq scan + filter is used instead.
I would expect postgresql to figure out that "not in ('1', '2', '3', '4')"
is equivalent to "d = '0'" given that the column is NOT NULL. And use the
index.

The following code demonstrates this:

# initialize database
pgbench -i -s 100 bench

# create table from pgbench data
create table tt as select *, (aid%5)::text as d from pgbench_accounts ;
# remove lots of '0' rows (this isn't relevant, in the end, but I thought it
might)
delete from tt where d='0' and aid % 100000 <> 0;
# change column type and set NOT NULL
create type dt as enum('0', '1', '2', '3', '4');
alter table tt alter column d type dt using d::dt;
alter table tt alter column d set not null;
# add index
create index tt_d ON tt(d);

# check values and frequency:
bench=# select d, count(*) from tt group by d;
d | count
---+---------
0 | 100
3 | 2000000
4 | 2000000
2 | 2000000
1 | 2000000
(5 rows)

First case: explain analyze select * from tt where d = '0';
=> index scan, OK

Second case: explain analyze select * from tt where d not in ('1', '2', '3',
'4');
=> seq scan :-(

Thanks!

Lucas

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Andrew Gierth

andrew@tao11.riddles.org.uk

about 9 years ago

In reply to: Lucas Nussbaum (#1)

Re: BUG #14608: no index scan with NOT IN and ENUM

"lucas" == lucas <lucas@lucas-nussbaum.net> writes:

lucas> I would expect postgresql to figure out that "not in ('1', '2',
lucas> '3', '4')" is equivalent to "d = '0'" given that the column is
lucas> NOT NULL. And use the index.

Well, whatever you expect, nobody has ever written code for the planner
to make it figure that out (and it doesn't seem likely to be a major
win, so probably not surprising that nobody did it).

So this is not a bug.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Andrew Gierth (#2)

Re: BUG #14608: no index scan with NOT IN and ENUM

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"lucas" == lucas <lucas@lucas-nussbaum.net> writes:
lucas> I would expect postgresql to figure out that "not in ('1', '2',
lucas> '3', '4')" is equivalent to "d = '0'" given that the column is
lucas> NOT NULL. And use the index.

Well, whatever you expect, nobody has ever written code for the planner
to make it figure that out (and it doesn't seem likely to be a major
win, so probably not surprising that nobody did it).

TBH, even if someone submitted a patch that did that, it would probably
get rejected on the grounds that it consumed too many cycles in cases
where it failed to yield a win --- which would be most of the time.

Another problem is that ALTER TYPE ADD VALUE would have to invalidate
plans depending on such a deduction. The infrastructure needed to cope
with that would add still more distributed overhead, which would be paid
by everybody whether or not they ever issued a query that could benefit.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Lucas Nussbaum

lucas@lucas-nussbaum.net

about 9 years ago

In reply to: Tom Lane (#3)

Re: BUG #14608: no index scan with NOT IN and ENUM

On 31/03/17 at 15:49 -0400, Tom Lane wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"lucas" == lucas <lucas@lucas-nussbaum.net> writes:
lucas> I would expect postgresql to figure out that "not in ('1', '2',
lucas> '3', '4')" is equivalent to "d = '0'" given that the column is
lucas> NOT NULL. And use the index.

Well, whatever you expect, nobody has ever written code for the planner
to make it figure that out (and it doesn't seem likely to be a major
win, so probably not surprising that nobody did it).

TBH, even if someone submitted a patch that did that, it would probably
get rejected on the grounds that it consumed too many cycles in cases
where it failed to yield a win --- which would be most of the time.

Another problem is that ALTER TYPE ADD VALUE would have to invalidate
plans depending on such a deduction. The infrastructure needed to cope
with that would add still more distributed overhead, which would be paid
by everybody whether or not they ever issued a query that could benefit.

OK, I see your point.

Note, though, that such reasoning is done for boolean values. For
example, "b <> false" seems to be converted to "b = true" if that makes
it possible to use an index. But I also see how simplifying boolean
expressions makes more sense than rewriting ENUM-based expressions.

Thanks

Lucas

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

David G. Johnston

david.g.johnston@gmail.com

about 9 years ago

In reply to: Lucas Nussbaum (#4)

Re: BUG #14608: no index scan with NOT IN and ENUM

On Sat, Apr 1, 2017 at 10:16 AM, Lucas Nussbaum <lucas@lucas-nussbaum.net>
wrote:

Note, though, that such reasoning is done for boolean values. For
example, "b <> false" seems to be converted to "b = true" if that makes
it possible to use an index. But I also see how simplifying boolean
expressions makes more sense than rewriting ENUM-based expressions.

In particular one doesn't need a system catalog to know the possible
values of boolean and being only two never changing options there is an
implicit immutable negator function.

David J.