Bitmap indexes?
One of the reasons why I originally stated following the hackers list is
because I wanted to implement bitmap indexes. I found in the archives,
the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
was extracted from this,
http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1, archive thread.
At any rate, that was some number of months ago. I've started looking
at the results posted from their bitmap GiST efforts and found that they
were being tested rather poorly to be of real value (I also found it
annoying that the project seems to quote other people's work without
giving credit). Nonetheless, I thought I'd post to find out if anyone
feels there is still a need for this? That is, I'm not really sure that
data warehousing or DDS systems are currently very common with Postgres.
If the group here still see value in adding various types of bitmap
support, can someone please point me to some documentation. I had
several bookmarked but lost then when X crashed. Anything that outlines
cache strategy, index support, am overview, and any other documentation
that would help excel my understanding of the code as well as the
various structure relationships would be wonderful?
Oh yes, one last question, is the required method for adding index
support via GiST? I ask because it seems to me that inserts could be
exceptionally expensive, though as usual, I still have more to look at.
Thanks,
Greg
P.S. And yes, I have been reading lots of code...including
walk-throughs! :P
Greg Copeland wrote:
Checking application/pgp-signature: FAILURE
-- Start of PGP signed section.
One of the reasons why I originally stated following the hackers list is
because I wanted to implement bitmap indexes. I found in the archives,
the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
was extracted from this,
http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1, archive thread.At any rate, that was some number of months ago. I've started looking
at the results posted from their bitmap GiST efforts and found that they
were being tested rather poorly to be of real value (I also found it
annoying that the project seems to quote other people's work without
giving credit). Nonetheless, I thought I'd post to find out if anyone
feels there is still a need for this? That is, I'm not really sure that
data warehousing or DDS systems are currently very common with Postgres.If the group here still see value in adding various types of bitmap
support, can someone please point me to some documentation. I had
several bookmarked but lost then when X crashed. Anything that outlines
cache strategy, index support, am overview, and any other documentation
that would help excel my understanding of the code as well as the
various structure relationships would be wonderful?
The only thing I know is that there is discussion of bitmap indexes on
the TODO list linked to from the 'bitmap index' item. I also remember
that the intarray code in /contrib sort of simulates bitmapped indexes,
or something like that. :-)
Oh yes, one last question, is the required method for adding index
support via GiST? I ask because it seems to me that inserts could be
exceptionally expensive, though as usual, I still have more to look at.
I think we would recommend GIST because it is easier.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Oh yes, one last question, is the required method for adding index
support via GiST? I ask because it seems to me that inserts could be
exceptionally expensive, though as usual, I still have more to look at.I think we would recommend GIST because it is easier.
Would someone be able to explain to me exactly what GIST is? I thought it
was just a _type_ of index, but is it actually a generalised index-creating
framework? Do other DMBSs use it? Is it a cool thing I could talk about
when I give my talk at UWA tommorrow?
Chris
Here's a good spring board for information. They actually have some
pretty cool tools available to help develop. Using the libgist stuff
makes it pretty easy to prototype and play with various index schemes of
your own creation and the debugger lets you rapidly view the storage
results.
Show quoted text
On Wed, 2002-03-13 at 20:48, Christopher Kings-Lynne wrote:
Oh yes, one last question, is the required method for adding index
support via GiST? I ask because it seems to me that inserts could be
exceptionally expensive, though as usual, I still have more to look at.I think we would recommend GIST because it is easier.
Would someone be able to explain to me exactly what GIST is? I thought it
was just a _type_ of index, but is it actually a generalised index-creating
framework? Do other DMBSs use it? Is it a cool thing I could talk about
when I give my talk at UWA tommorrow?Chris
Read a collect of articles at bottom of http://www.sai.msu.su/~megera/postgres/gist/
Christopher Kings-Lynne wrote:
Oh yes, one last question, is the required method for adding index
support via GiST? I ask because it seems to me that inserts could be
exceptionally expensive, though as usual, I still have more to look at.I think we would recommend GIST because it is easier.
Would someone be able to explain to me exactly what GIST is? I thought it
was just a _type_ of index, but is it actually a generalised index-creating
framework? Do other DMBSs use it? Is it a cool thing I could talk about
when I give my talk at UWA tommorrow?Chris
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
--
Teodor Sigaev
teodor@stack.net
Greg,
if you're still in bitmap indices you may take a look on our
contrib/intarray module.
Regards,
Oleg
On 13 Mar 2002, Greg Copeland wrote:
Show quoted text
One of the reasons why I originally stated following the hackers list is
because I wanted to implement bitmap indexes. I found in the archives,
the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
was extracted from this,
http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1, archive thread.At any rate, that was some number of months ago. I've started looking
at the results posted from their bitmap GiST efforts and found that they
were being tested rather poorly to be of real value (I also found it
annoying that the project seems to quote other people's work without
giving credit). Nonetheless, I thought I'd post to find out if anyone
feels there is still a need for this? That is, I'm not really sure that
data warehousing or DDS systems are currently very common with Postgres.If the group here still see value in adding various types of bitmap
support, can someone please point me to some documentation. I had
several bookmarked but lost then when X crashed. Anything that outlines
cache strategy, index support, am overview, and any other documentation
that would help excel my understanding of the code as well as the
various structure relationships would be wonderful?Oh yes, one last question, is the required method for adding index
support via GiST? I ask because it seems to me that inserts could be
exceptionally expensive, though as usual, I still have more to look at.Thanks,
GregP.S. And yes, I have been reading lots of code...including
walk-throughs! :P
Christopher,
I'm sorry it's too late, but I haven't receive any messages from
postgres mailing lists for a month (don't know why). Just found your
message in archives.
GiST is a great thing, it's generalised search tree invented by
Hellerstein in 1995. It allows you to define custom data type,
index access to them and custom queries.
I have a little intro about GiST (not finished yest)
http://www.sai.msu.su/~megera/postgres/gist/doc/intro.html
Also, at the bottom of http://www.sai.msu.su/~megera/postgres/gist/
there are several seminal papers for reading.
Regards,
Oleg
On Thu, 14 Mar 2002, Christopher Kings-Lynne wrote:
Show quoted text
Oh yes, one last question, is the required method for adding index
support via GiST? I ask because it seems to me that inserts could be
exceptionally expensive, though as usual, I still have more to look at.I think we would recommend GIST because it is easier.
Would someone be able to explain to me exactly what GIST is? I thought it
was just a _type_ of index, but is it actually a generalised index-creating
framework? Do other DMBSs use it? Is it a cool thing I could talk about
when I give my talk at UWA tommorrow?Chris
On Tue, 19 Mar 2002, Oleg Bartunov wrote:
Sorry to reply over you, Oleg.
On 13 Mar 2002, Greg Copeland wrote:
One of the reasons why I originally stated following the hackers list is
because I wanted to implement bitmap indexes. I found in the archives,
the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
was extracted from this,
http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1, archive thread.
For every case I have used a bitmap index on Oracle, a
partial index[0]Is this a postgres-only feature; my tame Oracle and Sybase DBAs had never heard of such a thing, but were rather impressed at the idea. made more sense (especialy since it
could usefully be compound).
Our troublesome case (on Oracle) is a table of "events"
where maybe fifty to a couple of hundred are "published"
(ie. web-visible) at any time. The events are categorised
by sport (about a dozen) and by "event type" (about five).
We never really query events except by PK or by sport/type/
published.
We make a bitmap index on "published", and trust Oracle to
use it correctly, and hope that our other indexes are also
useful.
On Postgres[1]Disclaimer. Our system doesn't run on PG, though I do have a nearly equivalent prototype system which does. I'd love to hear any success (or otherwise) stories about PG partial indexes. we would make a partial compound index:
create index ... on events(sport_id,event_type_id)
where published='Y';
Matthew.
[0]: Is this a postgres-only feature; my tame Oracle and Sybase DBAs had never heard of such a thing, but were rather impressed at the idea.
Sybase DBAs had never heard of such a thing, but
were rather impressed at the idea.
[1]: Disclaimer. Our system doesn't run on PG, though I do have a nearly equivalent prototype system which does. I'd love to hear any success (or otherwise) stories about PG partial indexes.
do have a nearly equivalent prototype system which
does. I'd love to hear any success (or otherwise)
stories about PG partial indexes.
On Tue, 2002-03-19 at 15:30, Matthew Kirkwood wrote:
On Tue, 19 Mar 2002, Oleg Bartunov wrote:
Sorry to reply over you, Oleg.
On 13 Mar 2002, Greg Copeland wrote:
One of the reasons why I originally stated following the hackers list is
because I wanted to implement bitmap indexes. I found in the archives,
the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
was extracted from this,
http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1, archive thread.For every case I have used a bitmap index on Oracle, a
partial index[0] made more sense (especialy since it
could usefully be compound).
That's very true, however, often bitmap indexes are used where partial
indexes may not work well. It maybe you were trying to apply the cure
for the wrong disease. ;)
Our troublesome case (on Oracle) is a table of "events"
where maybe fifty to a couple of hundred are "published"
(ie. web-visible) at any time. The events are categorised
by sport (about a dozen) and by "event type" (about five).
We never really query events except by PK or by sport/type/
published.
The reason why bitmap indexes are primarily used for DSS and data
wherehousing applications is because they are best used on extremely
large to very large tables which have low cardinality (e.g, 10,000,000
rows having 200 distinct values). On top of that, bitmap indexes also
tend to be much smaller than their "standard" cousins. On large and
very tables tables, this can sometimes save gigs in index space alone
(serious space benefit). Plus, their small index size tends to result
in much less I/O (serious speed benefit). This, of course, can result
in several orders of magnitude speed improvements when index scans are
required. As an added bonus, using AND, OR, XOR and NOT predicates are
exceptionally fast and if implemented properly, can even take advantage
of some 64-bit hardware for further speed improvements. This, of
course, further speeds look ups. The primary down side is that inserts
and updates to bitmap indexes are very costly (comparatively) which is,
yet again, why they excel in read-only environments (DSS & data
wherehousing).
It should also be noted that RDMS's, such as Oracle, often use multiple
types of bitmap indexes. This further impedes insert/update
performance, however, the additional bitmap index types usually allow
for range predicates while still making use of the bitmap index. If I'm
not mistaken, several other types of bitmaps are available as well as
many ways to encode and compress (rle, quad compression, etc) bitmap
indexes which further save on an already compact indexing scheme.
Given the proper problem domain, index bitmaps can be a big win.
We make a bitmap index on "published", and trust Oracle to
use it correctly, and hope that our other indexes are also
useful.On Postgres[1] we would make a partial compound index:
create index ... on events(sport_id,event_type_id)
where published='Y';
Generally speaking, bitmap indexes will not serve you very will on
tables having a low row counts, high cardinality or where they are
attached to tables which are primarily used in an OLTP capacity.
Situations where you have a low row count and low cardinality or high
row count and high cardinality tend to be better addressed by partial
indexes; which seem to make much more sense. In your example, it sounds
like you did "the right thing"(tm). ;)
Greg