Allowing x IS NOT NULL as a btree search condition

Started by Tom Laneover 16 years ago2 messageshackers

tgl@sss.pgh.pa.us

over 16 years ago

Back when we put in the ability to use "x IS NULL" as a btree search
condition, we intentionally left out "x IS NOT NULL", on the grounds
that it is comparable to "x <> something" which is not btree-searchable
either. However, it occurs to me that we missed a bet here. The NOT
NULL condition could instead be treated like "x is less than NULL"
(in a NULLS LAST index) or "x is greater than NULL" (in a NULLS FIRST
index), which would make it work like a searchable inequality. It's
still true that except in the case of a mostly-null column, it would
seldom be worth doing such an indexscan. However, I can see an
application where an index search condition like this would be highly
worthwhile: namely, trying to extract a column min or max. Right now,
if there are a fair number of nulls at the end of the index you're
interested in, you have to stupidly scan through them --- but if the
btree code knew about doing this, it could descend the tree
intelligently and land right on the first or last non-null.

We have already seen field complaints about the performance of
index-optimized MAX in cases with many nulls, eg
http://archives.postgresql.org/pgsql-performance/2006-12/msg00099.php
which fixing this would take care of. This would also affect the
usefulness of the idea I proposed earlier today about automatically
updating the histogram bin boundaries when trying to estimate inequality
selectivity for comparison values near the ends of the range --- if we
can't rely on the index lookup for max or min to be cheap, doing that
stops looking quite so attractive.

While I haven't tried to code this yet, I'm guessing that it's just a
very small addition to the logic we already have. Any objections to
fixing this up?

regards, tom lane

Bruce Momjian

bruce@momjian.us

over 16 years ago

In reply to: Tom Lane (#1)

Re: Allowing x IS NOT NULL as a btree search condition

Tom Lane wrote:

Back when we put in the ability to use "x IS NULL" as a btree search
condition, we intentionally left out "x IS NOT NULL", on the grounds
that it is comparable to "x <> something" which is not btree-searchable
either. However, it occurs to me that we missed a bet here. The NOT
NULL condition could instead be treated like "x is less than NULL"
(in a NULLS LAST index) or "x is greater than NULL" (in a NULLS FIRST
index), which would make it work like a searchable inequality. It's
still true that except in the case of a mostly-null column, it would
seldom be worth doing such an indexscan. However, I can see an
application where an index search condition like this would be highly
worthwhile: namely, trying to extract a column min or max. Right now,
if there are a fair number of nulls at the end of the index you're
interested in, you have to stupidly scan through them --- but if the
btree code knew about doing this, it could descend the tree
intelligently and land right on the first or last non-null.

We have already seen field complaints about the performance of
index-optimized MAX in cases with many nulls, eg
http://archives.postgresql.org/pgsql-performance/2006-12/msg00099.php
which fixing this would take care of. This would also affect the
usefulness of the idea I proposed earlier today about automatically
updating the histogram bin boundaries when trying to estimate inequality
selectivity for comparison values near the ends of the range --- if we
can't rely on the index lookup for max or min to be cheap, doing that
stops looking quite so attractive.

While I haven't tried to code this yet, I'm guessing that it's just a
very small addition to the logic we already have. Any objections to
fixing this up?

Sounds good to me.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +