Null values in indexes

Started by Dann Corbitover 23 years ago12 messages
#1Dann Corbit
DCorbit@connx.com

With 7.1.3, large indexes with null values allowed in one or more of the
columns would cause crashes. (I have definitely seen this happen).

Here is a project that mentions repairs:
http://postgis.refractions.net/news/index.php?file=20020425.data

Have repairs been effected in 7.2? Are they delayed until 7.3?

#2Hannu Krosing
hannu@tm.ee
In reply to: Dann Corbit (#1)
Re: Null values in indexes

Not really a followup,but this has been on my mind for some time :

How hard would it be to _not_ include nulls in indexes
as they are not used anyway.

(IIRC postgres initially did not include nulls, but it
was added for multi-key btree indexes)

This would be a rough approximation of partial indexes
if used together with functions, i.e. the optimiser
would immediately realize that

WHERE my_ifunc(partfield) = 'header'

can use index on my_ifunc(partfield)

but my_ifunc() has an easy way of skipping indexing
overhaed for non-interesting fields by returning NULL for them.

The following seems to prove thet there is currently
no use of putting NULLS in a single-field index:

--------------------------

hannu=# create table itest (i int, n int);
CREATE
hannu=# create index itest_n_idx on itest(n);
CREATE

then I inserted 16k tuples

hannu=# insert into itest(i) select i+2 from itest;
INSERT 0 2
hannu=# insert into itest(i) select i+4 from itest;
INSERT 0 4
hannu=# insert into itest(i) select i+8 from itest;
INSERT 0 1024
...
hannu=# insert into itest(i) select i+2048 from itest;
INSERT 0 2048
hannu=# insert into itest(i) select i+4096 from itest;
INSERT 0 4096
hannu=# insert into itest(i) select i+8192 from itest;
UPDATE 16380

set most of n's to is but left 4 as NULLs

hannu=# update itest set n=1 where i>1;
UPDATE 16383

and vacuumed just in case

hannu=# vacuum analyze itest;
VACUUM

now selects for real value do use index

hannu=# explain select * from itest where n = 7;
NOTICE: QUERY PLAN:

Index Scan using itest_n_idx on itest (cost=0.00..2.01 rows=1 width=8)

but IS NULL does not.

hannu=# explain select * from itest where n is null;
NOTICE: QUERY PLAN:

Seq Scan on itest (cost=0.00..341.84 rows=16 width=8)

EXPLAIN

------------------------
Hannu

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dann Corbit (#1)
Re: Null values in indexes

"Dann Corbit" <DCorbit@connx.com> writes:

With 7.1.3, large indexes with null values allowed in one or more of the
columns would cause crashes. (I have definitely seen this happen).
Have repairs been effected in 7.2?

Submit a test case and we'll tell you ...

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#2)
Re: Null values in indexes

Hannu Krosing <hannu@tm.ee> writes:

How hard would it be to _not_ include nulls in indexes
as they are not used anyway.

Seems to me that would be a step backwards.

What should someday happen is to make IS NULL an indexable operator.
The fact that we haven't got around to doing so is not a reason to
rip out the underpinnings for it.

regards, tom lane

#5Jan Wieck
janwieck@yahoo.com
In reply to: Tom Lane (#4)
Re: Null values in indexes

Tom Lane wrote:

Hannu Krosing <hannu@tm.ee> writes:

How hard would it be to _not_ include nulls in indexes
as they are not used anyway.

Seems to me that would be a step backwards.

It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#5)
Re: Null values in indexes

Jan Wieck <janwieck@yahoo.com> writes:

Tom Lane wrote:

Hannu Krosing <hannu@tm.ee> writes:

How hard would it be to _not_ include nulls in indexes
as they are not used anyway.

Seems to me that would be a step backwards.

It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.

Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls. AFAIR the planner
will try to use partial qualification on any multicolumn index...
it had better avoid doing so for non-null-capable AMs.

Alternatively, we could fix GiST to support nulls. Oleg, Teodor:
how far away might that be?

regards, tom lane

#7Dann Corbit
DCorbit@connx.com
In reply to: Tom Lane (#6)
Re: Null values in indexes

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, May 29, 2002 9:07 AM
To: Jan Wieck
Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Null values in indexes

Jan Wieck <janwieck@yahoo.com> writes:

Tom Lane wrote:

Hannu Krosing <hannu@tm.ee> writes:

How hard would it be to _not_ include nulls in indexes
as they are not used anyway.

Seems to me that would be a step backwards.

It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.

Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls. AFAIR the planner
will try to use partial qualification on any multicolumn index...
it had better avoid doing so for non-null-capable AMs.

Alternatively, we could fix GiST to support nulls. Oleg, Teodor:
how far away might that be?

The PostGIS people have already fixed it. However, they may not be
willing to contribute the patch. On the other hand, I think it would be
in their interest, since the source code trees will fork if they don't
and they will have trouble staying in synch with PostgreSQL
developments. (See the 7.2 index project here:
http://postgis.refractions.net/
http://postgis.refractions.net/news/index.php?file=20020425.data
)

If they are not willing to commit a patch, I suspect that they will at
least tell you what they had to do to fix it and it could be performed
internally.

#8Paul Ramsey
pramsey@refractions.net
In reply to: Dann Corbit (#7)
Re: Null values in indexes

Just to clarify:

- With respect to null safety. My understanding is the Oleg and Teodor
put support for nulls into the GiST indexing prior to the 7.2 release,
so 7.2 GiST should already be null-safe. Our project was just to take
our GiST bindings in PostGIS and update them to the new 7.2 GiST API, we
did no work on null-safety, null-safety was just one of the side
benefits we received as a result of updating our code to the 7.2 GiST
indexes.

- With respect to code contribution. If we find ourselves making changes
to the mainline PgSQL distribution we will always submit back. End of
story. All our changes have been to PostGIS itself, with the aim of
supporting 7.2. 7.2 rocks, we love it. :)

- There is one outstanding bug which we identified and Oleg and Teodor
fixed, but it is to the code in contrib/rtree, not in the mainline, and
Oleg and Teodor have already submitted that patch to Bruce. I believe
there was some unresolved discussion regarding whether to cut a 7.2.2
release including that patch and a few other housekeeping items.

Paul

Dann Corbit wrote:

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, May 29, 2002 9:07 AM
To: Jan Wieck
Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Null values in indexes

Jan Wieck <janwieck@yahoo.com> writes:

Tom Lane wrote:

Hannu Krosing <hannu@tm.ee> writes:

How hard would it be to _not_ include nulls in indexes
as they are not used anyway.

Seems to me that would be a step backwards.

It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.

Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls. AFAIR the planner
will try to use partial qualification on any multicolumn index...
it had better avoid doing so for non-null-capable AMs.

Alternatively, we could fix GiST to support nulls. Oleg, Teodor:
how far away might that be?

The PostGIS people have already fixed it. However, they may not be
willing to contribute the patch. On the other hand, I think it would be
in their interest, since the source code trees will fork if they don't
and they will have trouble staying in synch with PostgreSQL
developments. (See the 7.2 index project here:
http://postgis.refractions.net/
http://postgis.refractions.net/news/index.php?file=20020425.data
)

If they are not willing to commit a patch, I suspect that they will at
least tell you what they had to do to fix it and it could be performed
internally.

--
__
/
| Paul Ramsey
| Refractions Research
| Email: pramsey@refractions.net
| Phone: (250) 885-0632
\_

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dann Corbit (#7)
Re: Null values in indexes

Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls.

Actually, it appears that 7.2 GiST does handle NULLs in columns after
the first one, which I think is enough to avoid the problem Jan
mentioned. The boolean column pg_am.amindexnulls is not really
sufficient to describe this behavior accurately. Looking at current
uses it seems correct to leave it set FALSE for GiST.

In short: false alarm; the 7.2 code is okay as-is, at least on this
particular point.

regards, tom lane

#10Chris Hodgson
chodgson@refractions.net
In reply to: Dann Corbit (#7)
Re: Null values in indexes

Hmm... I think there is some confusion here.

Oleg and Teodor updated the GiST indexing to be null safe for postgresql 7.2.
The changes we made to PostGIS were just to allow our spacial indexing
support functions to work with the changes made in the actual GiST indexing
code (the GiST interface changed somewhat from postgresql 7.1 -> 7.2).

And for the record, I'm confident that we would submit a patch for postgresql
if something like this did come up.

Chris Hodgson

Dann Corbit <DCorbit@connx.com> said:

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, May 29, 2002 9:07 AM
To: Jan Wieck
Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Null values in indexes

Jan Wieck <janwieck@yahoo.com> writes:

Tom Lane wrote:

Hannu Krosing <hannu@tm.ee> writes:

How hard would it be to _not_ include nulls in indexes
as they are not used anyway.

Seems to me that would be a step backwards.

It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.

Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls. AFAIR the planner
will try to use partial qualification on any multicolumn index...
it had better avoid doing so for non-null-capable AMs.

Alternatively, we could fix GiST to support nulls. Oleg, Teodor:
how far away might that be?

The PostGIS people have already fixed it. However, they may not be
willing to contribute the patch. On the other hand, I think it would be
in their interest, since the source code trees will fork if they don't
and they will have trouble staying in synch with PostgreSQL
developments. (See the 7.2 index project here:
http://postgis.refractions.net/
http://postgis.refractions.net/news/index.php?file=20020425.data
)

If they are not willing to commit a patch, I suspect that they will at
least tell you what they had to do to fix it and it could be performed
internally.

--

#11Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#9)
Re: Null values in indexes

Glad to hear GiST in 7.2 isn't broken :-)
We miss the topic, what was the problem ?
Do we need to fix GiST code for 7.3 ?

proposal for null-safe GiST interface is available
http://fts.postgresql.org/db/mw/msg.html?mid=1028327
and discussion
http://fts.postgresql.org/db/mw/msg.html?mid=1025848

Regards,
Oleg
On Wed, 29 May 2002, Tom Lane wrote:

Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls.

Actually, it appears that 7.2 GiST does handle NULLs in columns after
the first one, which I think is enough to avoid the problem Jan
mentioned. The boolean column pg_am.amindexnulls is not really
sufficient to describe this behavior accurately. Looking at current
uses it seems correct to leave it set FALSE for GiST.

In short: false alarm; the 7.2 code is okay as-is, at least on this
particular point.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#11)
Re: Null values in indexes

Oleg Bartunov <oleg@sai.msu.su> writes:

Do we need to fix GiST code for 7.3 ?

No, I think it's fine. I had forgotten that old discussion ...

regards, tom lane