BUG #14676: neqsel is NULL dumb
The following bug has been logged on the website:
Bug reference: 14676
Logged by: Marko Tiikkaja
Email address: marko@joh.to
PostgreSQL version: 9.6.3
Operating system: Linux
Description:
I'm having an issue with a case where a column is mostly NULLs and I'm doing
an inequality query on the column:
=# create table foo(nullable int);
CREATE TABLE
=# insert into foo select case when i = 1 then i else null end from
generate_series(1, 1000) gs(i);
INSERT 0 1000
=# analyze foo;
ANALYZE
=# explain select * from foo where nullable <> 1;
QUERY PLAN
------------------------------------------------------
Seq Scan on foo (cost=0.00..16.50 rows=999 width=4)
Filter: (nullable <> 1)
(2 rows)
This seems to be because neqsel() doesn't take at all into account that both
operators will exclude NULL rows, and does a simple 1.0 - eqsel(). This
also means that a partial index such as:
create index on foo(othercolumn) where nullable <> 1
will never be used.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Mon, May 29, 2017 at 4:38 PM, <marko@joh.to> wrote:
The following bug has been logged on the website:
Bug reference: 14676
Logged by: Marko Tiikkaja
Email address: marko@joh.to
PostgreSQL version: 9.6.3
Operating system: Linux
Description:I'm having an issue with a case where a column is mostly NULLs and I'm
doing
an inequality query on the column:=# create table foo(nullable int);
CREATE TABLE=# insert into foo select case when i = 1 then i else null end from
generate_series(1, 1000) gs(i);
INSERT 0 1000=# analyze foo;
ANALYZE=# explain select * from foo where nullable <> 1;
QUERY PLAN
------------------------------------------------------
Seq Scan on foo (cost=0.00..16.50 rows=999 width=4)
Filter: (nullable <> 1)
(2 rows)This seems to be because neqsel() doesn't take at all into account that
both
operators will exclude NULL rows, and does a simple 1.0 - eqsel(). This
also means that a partial index such as:create index on foo(othercolumn) where nullable <> 1
will never be used.
Since you say that the majority of rows have NULL in nullable, I would try
a partial index with: WHERE (nullable IS NOT NULL)
create table foo(nullable int, a text);
create index fff on foo(nullable, a) where nullable is not null ;
insert into foo --- 12K rows ;
explain analyze select nullable, a from foo where nullable <> 1 ;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=11.23..55.35 rows=11997 width=20) (actual
time=0.040..0.048 rows=16 loops=1)
Recheck Cond: (nullable IS NOT NULL)
Filter: (nullable <> 1)
Rows Removed by Filter: 3
Heap Blocks: exact=3
-> Bitmap Index Scan on fff (cost=0.00..8.23 rows=19 width=0) (actual
time=0.018..0.018 rows=19 loops=1)
Planning time: 0.117 ms
Execution time: 0.081 ms
(8 rows)
Show quoted text
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Mon, May 29, 2017 at 7:08 PM, Pantelis Theodosiou <ypercube@gmail.com>
wrote:
Since you say that the majority of rows have NULL in nullable, I would try
a partial index with: WHERE (nullable IS NOT NULL)
Thanks, but I've already implemented a workaround. I'm more interested in
getting the actual problem fixed.
.m
marko@joh.to writes:
This seems to be because neqsel() doesn't take at all into account that both
operators will exclude NULL rows, and does a simple 1.0 - eqsel().
Yeah, that's clearly broken. A localized fix would be to re-fetch the
nullfrac statistic and subtract it off, but that seems pretty inefficient.
I'd be inclined to refactor things so that eqsel() and neqsel() call a
common routine that takes a "bool negate" argument, about like the way the
patternsel() functions work, and then the common routine could handle
the nullfrac correctly for both cases.
Hmm ... actually, I think patternsel() is broken for this case too ---
it has the information to do the right thing, but it doesn't look like
it actually is doing it:
return negate ? (1.0 - result) : result;
should be more like
return negate ? (1.0 - result - nullfrac) : result;
neqjoinsel has a similar issue, and I'm not sure what else.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
I wrote:
marko@joh.to writes:
This seems to be because neqsel() doesn't take at all into account that both
operators will exclude NULL rows, and does a simple 1.0 - eqsel().
Yeah, that's clearly broken. A localized fix would be to re-fetch the
nullfrac statistic and subtract it off, but that seems pretty inefficient.
I'd be inclined to refactor things so that eqsel() and neqsel() call a
common routine that takes a "bool negate" argument, about like the way the
patternsel() functions work, and then the common routine could handle
the nullfrac correctly for both cases.
Here's a proposed patch for that. I also fixed patternsel, but elected
not to mess with neqjoinsel; partly because I think joining on inequality
is so rare as to not be worth sweating over, and partly because I wasn't
too sure what's the appropriate correction, especially for semijoins.
Although this is clearly a bug fix, I'm leaning towards committing it
only in HEAD; given the lack of previous field complaints, I fear
back-patching might yield more complaints about destabilizing plans
than kudos for fixing things.
regards, tom lane