Null values in indexes
With 7.1.3, large indexes with null values allowed in one or more of the
columns would cause crashes. (I have definitely seen this happen).
Here is a project that mentions repairs:
http://postgis.refractions.net/news/index.php?file=20020425.data
Have repairs been effected in 7.2? Are they delayed until 7.3?
Not really a followup,but this has been on my mind for some time :
How hard would it be to _not_ include nulls in indexes
as they are not used anyway.
(IIRC postgres initially did not include nulls, but it
was added for multi-key btree indexes)
This would be a rough approximation of partial indexes
if used together with functions, i.e. the optimiser
would immediately realize that
WHERE my_ifunc(partfield) = 'header'
can use index on my_ifunc(partfield)
but my_ifunc() has an easy way of skipping indexing
overhaed for non-interesting fields by returning NULL for them.
The following seems to prove thet there is currently
no use of putting NULLS in a single-field index:
--------------------------
hannu=# create table itest (i int, n int);
CREATE
hannu=# create index itest_n_idx on itest(n);
CREATE
then I inserted 16k tuples
hannu=# insert into itest(i) select i+2 from itest;
INSERT 0 2
hannu=# insert into itest(i) select i+4 from itest;
INSERT 0 4
hannu=# insert into itest(i) select i+8 from itest;
INSERT 0 1024
...
hannu=# insert into itest(i) select i+2048 from itest;
INSERT 0 2048
hannu=# insert into itest(i) select i+4096 from itest;
INSERT 0 4096
hannu=# insert into itest(i) select i+8192 from itest;
UPDATE 16380
set most of n's to is but left 4 as NULLs
hannu=# update itest set n=1 where i>1;
UPDATE 16383
and vacuumed just in case
hannu=# vacuum analyze itest;
VACUUM
now selects for real value do use index
hannu=# explain select * from itest where n = 7;
NOTICE: QUERY PLAN:
Index Scan using itest_n_idx on itest (cost=0.00..2.01 rows=1 width=8)
but IS NULL does not.
hannu=# explain select * from itest where n is null;
NOTICE: QUERY PLAN:
Seq Scan on itest (cost=0.00..341.84 rows=16 width=8)
EXPLAIN
------------------------
Hannu
Import Notes
Reply to msg id not found: 15896.1022613872@sss.pgh.pa.us
"Dann Corbit" <DCorbit@connx.com> writes:
With 7.1.3, large indexes with null values allowed in one or more of the
columns would cause crashes. (I have definitely seen this happen).
Have repairs been effected in 7.2?
Submit a test case and we'll tell you ...
regards, tom lane
Hannu Krosing <hannu@tm.ee> writes:
How hard would it be to _not_ include nulls in indexes
as they are not used anyway.
Seems to me that would be a step backwards.
What should someday happen is to make IS NULL an indexable operator.
The fact that we haven't got around to doing so is not a reason to
rip out the underpinnings for it.
regards, tom lane
Tom Lane wrote:
Hannu Krosing <hannu@tm.ee> writes:
How hard would it be to _not_ include nulls in indexes
as they are not used anyway.Seems to me that would be a step backwards.
It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
Jan Wieck <janwieck@yahoo.com> writes:
Tom Lane wrote:
Hannu Krosing <hannu@tm.ee> writes:
How hard would it be to _not_ include nulls in indexes
as they are not used anyway.Seems to me that would be a step backwards.
It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.
Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls. AFAIR the planner
will try to use partial qualification on any multicolumn index...
it had better avoid doing so for non-null-capable AMs.
Alternatively, we could fix GiST to support nulls. Oleg, Teodor:
how far away might that be?
regards, tom lane
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, May 29, 2002 9:07 AM
To: Jan Wieck
Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Null values in indexesJan Wieck <janwieck@yahoo.com> writes:
Tom Lane wrote:
Hannu Krosing <hannu@tm.ee> writes:
How hard would it be to _not_ include nulls in indexes
as they are not used anyway.Seems to me that would be a step backwards.
It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls. AFAIR the planner
will try to use partial qualification on any multicolumn index...
it had better avoid doing so for non-null-capable AMs.Alternatively, we could fix GiST to support nulls. Oleg, Teodor:
how far away might that be?
The PostGIS people have already fixed it. However, they may not be
willing to contribute the patch. On the other hand, I think it would be
in their interest, since the source code trees will fork if they don't
and they will have trouble staying in synch with PostgreSQL
developments. (See the 7.2 index project here:
http://postgis.refractions.net/
http://postgis.refractions.net/news/index.php?file=20020425.data
)
If they are not willing to commit a patch, I suspect that they will at
least tell you what they had to do to fix it and it could be performed
internally.
Import Notes
Resolved by subject fallback
Just to clarify:
- With respect to null safety. My understanding is the Oleg and Teodor
put support for nulls into the GiST indexing prior to the 7.2 release,
so 7.2 GiST should already be null-safe. Our project was just to take
our GiST bindings in PostGIS and update them to the new 7.2 GiST API, we
did no work on null-safety, null-safety was just one of the side
benefits we received as a result of updating our code to the 7.2 GiST
indexes.
- With respect to code contribution. If we find ourselves making changes
to the mainline PgSQL distribution we will always submit back. End of
story. All our changes have been to PostGIS itself, with the aim of
supporting 7.2. 7.2 rocks, we love it. :)
- There is one outstanding bug which we identified and Oleg and Teodor
fixed, but it is to the code in contrib/rtree, not in the mainline, and
Oleg and Teodor have already submitted that patch to Bruce. I believe
there was some unresolved discussion regarding whether to cut a 7.2.2
release including that patch and a few other housekeeping items.
Paul
Dann Corbit wrote:
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, May 29, 2002 9:07 AM
To: Jan Wieck
Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Null values in indexesJan Wieck <janwieck@yahoo.com> writes:
Tom Lane wrote:
Hannu Krosing <hannu@tm.ee> writes:
How hard would it be to _not_ include nulls in indexes
as they are not used anyway.Seems to me that would be a step backwards.
It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls. AFAIR the planner
will try to use partial qualification on any multicolumn index...
it had better avoid doing so for non-null-capable AMs.Alternatively, we could fix GiST to support nulls. Oleg, Teodor:
how far away might that be?The PostGIS people have already fixed it. However, they may not be
willing to contribute the patch. On the other hand, I think it would be
in their interest, since the source code trees will fork if they don't
and they will have trouble staying in synch with PostgreSQL
developments. (See the 7.2 index project here:
http://postgis.refractions.net/
http://postgis.refractions.net/news/index.php?file=20020425.data
)If they are not willing to commit a patch, I suspect that they will at
least tell you what they had to do to fix it and it could be performed
internally.
--
__
/
| Paul Ramsey
| Refractions Research
| Email: pramsey@refractions.net
| Phone: (250) 885-0632
\_
Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls.
Actually, it appears that 7.2 GiST does handle NULLs in columns after
the first one, which I think is enough to avoid the problem Jan
mentioned. The boolean column pg_am.amindexnulls is not really
sufficient to describe this behavior accurately. Looking at current
uses it seems correct to leave it set FALSE for GiST.
In short: false alarm; the 7.2 code is okay as-is, at least on this
particular point.
regards, tom lane
Hmm... I think there is some confusion here.
Oleg and Teodor updated the GiST indexing to be null safe for postgresql 7.2.
The changes we made to PostGIS were just to allow our spacial indexing
support functions to work with the changes made in the actual GiST indexing
code (the GiST interface changed somewhat from postgresql 7.1 -> 7.2).
And for the record, I'm confident that we would submit a patch for postgresql
if something like this did come up.
Chris Hodgson
Dann Corbit <DCorbit@connx.com> said:
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, May 29, 2002 9:07 AM
To: Jan Wieck
Cc: Oleg Bartunov; Teodor Sigaev; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Null values in indexesJan Wieck <janwieck@yahoo.com> writes:
Tom Lane wrote:
Hannu Krosing <hannu@tm.ee> writes:
How hard would it be to _not_ include nulls in indexes
as they are not used anyway.Seems to me that would be a step backwards.
It would cause multi-key indexes beeing unusable for partial
key lookup. Imagine you have a key over (a, b, c) and query
with WHERE a = 1 AND b = 2. This query cannot use the index
if a NULL value in c would cause the index entry to be
suppressed.Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls. AFAIR the planner
will try to use partial qualification on any multicolumn index...
it had better avoid doing so for non-null-capable AMs.Alternatively, we could fix GiST to support nulls. Oleg, Teodor:
how far away might that be?The PostGIS people have already fixed it. However, they may not be
willing to contribute the patch. On the other hand, I think it would be
in their interest, since the source code trees will fork if they don't
and they will have trouble staying in synch with PostgreSQL
developments. (See the 7.2 index project here:
http://postgis.refractions.net/
http://postgis.refractions.net/news/index.php?file=20020425.data
)If they are not willing to commit a patch, I suspect that they will at
least tell you what they had to do to fix it and it could be performed
internally.
--
Glad to hear GiST in 7.2 isn't broken :-)
We miss the topic, what was the problem ?
Do we need to fix GiST code for 7.3 ?
proposal for null-safe GiST interface is available
http://fts.postgresql.org/db/mw/msg.html?mid=1028327
and discussion
http://fts.postgresql.org/db/mw/msg.html?mid=1025848
Regards,
Oleg
On Wed, 29 May 2002, Tom Lane wrote:
Urgh ... that means GiST indexing is actually broken, because GiST
currently handles multicolumns but not nulls.Actually, it appears that 7.2 GiST does handle NULLs in columns after
the first one, which I think is enough to avoid the problem Jan
mentioned. The boolean column pg_am.amindexnulls is not really
sufficient to describe this behavior accurately. Looking at current
uses it seems correct to leave it set FALSE for GiST.In short: false alarm; the 7.2 code is okay as-is, at least on this
particular point.regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83