plans for bitmap indexes?

Started by Yann Michelover 21 years ago59 messageshackers
Jump to latest
#1Yann Michel
yann-postgresql@spline.de

Hi,

I'd like to know if there are any plans on introducing bitmap indexes
into postgresql. I think this could mean a big performance improvement
especially for datawarehousing applications. I know that there is an
index type hash but I don't know how both types are comparable due to
they are both best usable for equality expressions.

Thanks in advance!

Regards,
Yann

#2Reini Urban
rurban@x-ray.at
In reply to: Yann Michel (#1)
Re: plans for bitmap indexes?

Yann Michel schrieb:

I'd like to know if there are any plans on introducing bitmap indexes
into postgresql. I think this could mean a big performance improvement
especially for datawarehousing applications. I know that there is an
index type hash but I don't know how both types are comparable due to
they are both best usable for equality expressions.

Sure, and the next will come with musicbrainz.
Why not :)
If you don't want to code that in the app, the database backend is the
solution. The database is the golden hammer.
http://c2.com/cgi/wiki?GoldenHammer
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/

#3Bruce Momjian
bruce@momjian.us
In reply to: Yann Michel (#1)
Re: plans for bitmap indexes?

Yann Michel wrote:

Hi,

I'd like to know if there are any plans on introducing bitmap indexes
into postgresql. I think this could mean a big performance improvement
especially for datawarehousing applications. I know that there is an
index type hash but I don't know how both types are comparable due to
they are both best usable for equality expressions.

Lots of people have talked about it but I don't know anyone coding it.
-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#4Yann Michel
yann-postgresql@spline.de
In reply to: Bruce Momjian (#3)
Re: plans for bitmap indexes?

Hi,

On Thu, Oct 07, 2004 at 06:54:15PM -0400, Bruce Momjian wrote:

I'd like to know if there are any plans on introducing bitmap indexes
into postgresql. I think this could mean a big performance improvement
especially for datawarehousing applications. I know that there is an
index type hash but I don't know how both types are comparable due to
they are both best usable for equality expressions.

Lots of people have talked about it but I don't know anyone coding it.

have you ever discussed if bitmap indexes lead to better query
performance than hash indexes will do?

Regards,
Yann

#5Yann Michel
yann-postgresql@spline.de
In reply to: Reini Urban (#2)
Re: plans for bitmap indexes?

Hi,

On Thu, Oct 07, 2004 at 10:41:04PM +0200, Reini Urban wrote:

I'd like to know if there are any plans on introducing bitmap indexes
into postgresql. I think this could mean a big performance improvement
especially for datawarehousing applications. I know that there is an
index type hash but I don't know how both types are comparable due to
they are both best usable for equality expressions.

Sure, and the next will come with musicbrainz.
Why not :)

I don't know what this means to my questin, but anyway...

If you don't want to code that in the app, the database backend is the
solution. The database is the golden hammer.

Therefore I asked wheather there are any thoughts of implementing them.

Regards,
Yann

#6Dave Page
dpage@pgadmin.org
In reply to: Yann Michel (#5)
Re: plans for bitmap indexes?

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Yann Michel
Sent: 08 October 2004 08:28
To: Reini Urban
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] plans for bitmap indexes?

I don't know what this means to my questin, but anyway...

If you don't want to code that in the app, the database

backend is the

solution. The database is the golden hammer.

Therefore I asked wheather there are any thoughts of
implementing them.

I think what Reini was asking was why do you think you need bitmap
indexes as opposed to any existing type?

Regards, Dave.

#7Yann Michel
yann-postgresql@spline.de
In reply to: Dave Page (#6)
Re: plans for bitmap indexes?

Hi,

On Fri, Oct 08, 2004 at 10:09:18AM +0100, Dave Page wrote:

I think what Reini was asking was why do you think you need bitmap
indexes as opposed to any existing type?

due to I'm developing a datawarehousing application we have lots of
fact-data in our central fact-table. As I know how to improve
performance on Oracle based datawarehouses, I'm used to add bitmap
indexes for atributes having only a few distinct values.
So I was looking for any comparable indexing technology but didn't find
any so far.

Regards,
Yann

#8Hannu Krosing
hannu@tm.ee
In reply to: Yann Michel (#7)
Re: plans for bitmap indexes?

On R, 2004-10-08 at 12:45, Yann Michel wrote:

Hi,

On Fri, Oct 08, 2004 at 10:09:18AM +0100, Dave Page wrote:

I think what Reini was asking was why do you think you need bitmap
indexes as opposed to any existing type?

due to I'm developing a datawarehousing application we have lots of
fact-data in our central fact-table. As I know how to improve
performance on Oracle based datawarehouses, I'm used to add bitmap
indexes for atributes having only a few distinct values.
So I was looking for any comparable indexing technology but didn't find
any so far.

There is currently no suitable index type for this type of queries (huge
tables with a few distinct values).

You may try to optimise performance by partitioning your fact tables on
these few dimension values by using table inheritance or union all
views.

There was a discussion on partitioning postgres tables on
pgsql-performance list a few days ago.

-------------
Hannu

#9Oleg Bartunov
oleg@sai.msu.su
In reply to: Yann Michel (#7)
Re: plans for bitmap indexes?

On Fri, 8 Oct 2004, Yann Michel wrote:

Hi,

On Fri, Oct 08, 2004 at 10:09:18AM +0100, Dave Page wrote:

I think what Reini was asking was why do you think you need bitmap
indexes as opposed to any existing type?

due to I'm developing a datawarehousing application we have lots of
fact-data in our central fact-table. As I know how to improve
performance on Oracle based datawarehouses, I'm used to add bitmap
indexes for atributes having only a few distinct values.
So I was looking for any comparable indexing technology but didn't find
any so far.

Yann, you may try our contrib/intarray module

Regards,
Yann

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#10Josh Berkus
josh@agliodbs.com
In reply to: Oleg Bartunov (#9)
Re: plans for bitmap indexes?

Yann,

Lots of people have talked about it but I don't know anyone coding it.

I would love to have bitmap indexes in Postgres, as would a lot of other
community members. However, they are far from trivial to code. Are you
offering to help?

--
Josh Berkus
Aglio Database Solutions
San Francisco

#11Yann Michel
yann-postgresql@spline.de
In reply to: Josh Berkus (#10)
Re: plans for bitmap indexes?

Hi Josh,

On Fri, Oct 08, 2004 at 09:59:41AM -0700, Josh Berkus wrote:

Lots of people have talked about it but I don't know anyone coding it.

I would love to have bitmap indexes in Postgres, as would a lot of other
community members. However, they are far from trivial to code. Are you
offering to help?

I'd like to help you, but I think, that my C-Experience is not good
enough for beeing able to. I mean, I coded some C-stuff and I know how
bitmap indexes (should) work but I guess that this won't be enough. In
addidtion I never took a look into postgresql's sources.

Regards,
Yann

#12Josh Berkus
josh@agliodbs.com
In reply to: Yann Michel (#11)
Re: plans for bitmap indexes?

Yann,

I'd like to help you, but I think, that my C-Experience is not good
enough for beeing able to. I mean, I coded some C-stuff and I know how
bitmap indexes (should) work but I guess that this won't be enough. In
addidtion I never took a look into postgresql's sources.

Well, there's no time like the present to get started! ;-)

--
Josh Berkus
Aglio Database Solutions
San Francisco

#13Yann Michel
yann-postgresql@spline.de
In reply to: Josh Berkus (#12)
Re: plans for bitmap indexes?

Hi Josh,

On Fri, Oct 08, 2004 at 10:18:27AM -0700, Josh Berkus wrote:

I'd like to help you, but I think, that my C-Experience is not good
enough for beeing able to. I mean, I coded some C-stuff and I know how
bitmap indexes (should) work but I guess that this won't be enough. In
addidtion I never took a look into postgresql's sources.

Well, there's no time like the present to get started! ;-)

O.K. I downloaded it :-)
We will see if and how I can help....

Regards,
Yann

#14Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Yann Michel (#13)
Re: plans for bitmap indexes?

"Yann" == Yann Michel <yann-postgresql@spline.de> writes:

Yann> O.K. I downloaded it :-) We will see if and how I can
Yann> help....

FYI .. in case you aren't aware already:

http://portal.acm.org/citation.cfm?id=98720

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh

#15Chris Browne
cbbrowne@acm.org
In reply to: Dave Page (#6)
Re: plans for bitmap indexes?

yann-postgresql@spline.de (Yann Michel) writes:

On Fri, Oct 08, 2004 at 10:09:18AM +0100, Dave Page wrote:

I think what Reini was asking was why do you think you need bitmap
indexes as opposed to any existing type?

due to I'm developing a datawarehousing application we have lots of
fact-data in our central fact-table. As I know how to improve
performance on Oracle based datawarehouses, I'm used to add bitmap
indexes for atributes having only a few distinct values. So I was
looking for any comparable indexing technology but didn't find any
so far.

The most nearly comparable thing is be the notion of "partial
indexes," where, supposing you had 60 region codes (e.g. - 50 US
states, 10 Canadian provinces), you might set up indices thus:

TABLE=my_table
FIELD=stateprov
FILE=$HOME/regionlist.txt
for region in `cat $FILE`; do
query="create index ${TABLE}_partial_on_${region} on $TABLE($FIELD) where $FIELD = '$region';"
echo $query | psql -d datawarehouse
done

That would set up 60 (or whatever $HOME/regionlist.txt indicated)
partial indices on the table on that field.

By the way, I thought ahead a little, in that; doing the same thing
for country codes might be as easy as replacing:

FIELD=country
FILE=$HOME/countrylist.txt

The partial indexes will not ALWAYS be useful; in cases where they
aren't, it is not inconceivable that there are improvements to be made
in the query optimizer...
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxxian.html
A VAX is virtually a computer, but not quite.

#16Chris Browne
cbbrowne@acm.org
In reply to: Josh Berkus (#10)
Re: plans for bitmap indexes?

josh@agliodbs.com (Josh Berkus) writes:

Lots of people have talked about it but I don't know anyone coding it.

I would love to have bitmap indexes in Postgres, as would a lot of other
community members. However, they are far from trivial to code. Are you
offering to help?

I'm curious as to whether partial indexes might suffice as an
alternative.

There are doubtless cases where the optimizer won't use them where it
would be plausible to do so; that suggests, to me, possibilities for
enhancing the optimizer.
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxxian.html
A VAX is virtually a computer, but not quite.

#17Josh Berkus
josh@agliodbs.com
In reply to: Chris Browne (#16)
Re: plans for bitmap indexes?

Chris,

The most nearly comparable thing is be the notion of "partial
indexes," where, supposing you had 60 region codes (e.g. - 50 US
states, 10 Canadian provinces), you might set up indices thus:

I'm afraid that you're mistaken about the functionality of bitmap indexes.
The purpose of a bitmap index is not to partition an index, but to allow
multiple indexes to be used in the same operation.

For example, imagine you have a table on a dating website with 18 columns
representing 18 different characteristics for matching. Imagine that you
index each of those columns seperately. If you do:

SELECT * FROM people WHERE orientation = 'gay' AND gender = 'male' AND city =
'San Francisco';

... then the planner can use an index on orientation OR on gender OR on city,
but not all three. Multicolumn indexes are no solution for this use case
because you'd have to create a multicolumn index for each possible combo of
two or three columns ( 18! ).

The Bitmap index allows the query executor to use several indexes on the same
operation, comparing them and selecting rows where they "overlap" like a Venn
diagram.

...

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#18Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#17)
Re: plans for bitmap indexes?

Josh Berkus <josh@agliodbs.com> writes:

SELECT * FROM people WHERE orientation = 'gay' AND gender = 'male' AND city =
'San Francisco';

There are actually two TODOs here.

1) a "bitmap scan" that would be usable with any type of index. The tuple
locations can be read in for each criteria and sorted by location and built
into bitmaps. The results can be combined using bitmap operations and the
tuples scanned in physical order.

2) A persistent bitmap index that would enable skipping the first step of the
above.

In the case if all the columns were btree indexes it might still make sense to
scan through all the indexes and combine the results before reading in the
actual tuples. Especially if the tuples are very wide and each column
individually very unselective, but the combination very selective.

However it would work even better if gender and orientation could be stored on
disk in a bitmap representation. They're very low cardinality and could be
stored quite compactly. The result would read the data faster, skip the sort,
and be able to start returning tuples even before it finished reading the
entire index.

--
greg

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#17)
Re: plans for bitmap indexes?

Josh Berkus <josh@agliodbs.com> writes:

The Bitmap index allows the query executor to use several indexes on
the same operation, comparing them and selecting rows where they
"overlap" like a Venn diagram.

Note that what Josh is describing is not really a distinct index type,
but a different way of using an index: that is, you pull candidate tuple
locations from several indexes and intersect or union those sets before
you go to the heap. In principle this works whatever the index access
methods are.

I believe that the term "bitmap index" is also used with a different
meaning wherein it actually does describe a particular kind of on-disk
index structure, with one bit per table row.

IMHO building in-memory bitmaps (the first idea) is a very good idea to
pursue for Postgres. I'm not at all sold on on-disk bitmap indexes,
though ... those I suspect *are* sufficiently replaced by partial
indexes.

regards, tom lane

#20Gaetano Mendola
mendola@bigfoot.com
In reply to: Josh Berkus (#17)
Re: plans for bitmap indexes?

Josh Berkus wrote:

Chris,

The most nearly comparable thing is be the notion of "partial
indexes," where, supposing you had 60 region codes (e.g. - 50 US
states, 10 Canadian provinces), you might set up indices thus:

I'm afraid that you're mistaken about the functionality of bitmap indexes.
The purpose of a bitmap index is not to partition an index, but to allow
multiple indexes to be used in the same operation.

For example, imagine you have a table on a dating website with 18 columns
representing 18 different characteristics for matching. Imagine that you
index each of those columns seperately. If you do:

SELECT * FROM people WHERE orientation = 'gay' AND gender = 'male' AND city =
'San Francisco';

... then the planner can use an index on orientation OR on gender OR on city,
but not all three. Multicolumn indexes are no solution for this use case
because you'd have to create a multicolumn index for each possible combo of
two or three columns ( 18! ).

I'm wondering if in some cases the following query could be more efficient

select * from people where orientation = 'gay'
intersect
select * from people where gender = 'male'
intersect
select * from people where city =

Regards
Gaetano Mendola

#21Neil Conway
neilc@samurai.com
In reply to: Chris Browne (#16)
#22Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Neil Conway (#21)
#23Hannu Krosing
hannu@tm.ee
In reply to: Bruce Momjian (#18)
#24Reini Urban
rurban@x-ray.at
In reply to: Josh Berkus (#17)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#22)
#26Josh Berkus
josh@agliodbs.com
In reply to: Zeugswetter Andreas SB SD (#22)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#26)
#28Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#27)
#29Yann Michel
yann-postgresql@spline.de
In reply to: Chris Browne (#15)
#30Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Yann Michel (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#30)
#32Yann Michel
yann-postgresql@spline.de
In reply to: Tom Lane (#31)
#33Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#19)
#34Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#17)
#35Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Simon Riggs (#34)
#36Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Simon Riggs (#34)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#34)
#38Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#37)
#39Gavin Sherry
swm@linuxworld.com.au
In reply to: Josh Berkus (#38)
#40Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#17)
#41Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#17)
#42Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Tom Lane (#37)
#43Gavin Sherry
swm@linuxworld.com.au
In reply to: Sailesh Krishnamurthy (#42)
#44Sailesh Krishnamurthy
sailesh@cs.berkeley.edu
In reply to: Gavin Sherry (#43)
#45Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Alvaro Herrera (#35)
#46Hannu Krosing
hannu@tm.ee
In reply to: Simon Riggs (#40)
#47Hannu Krosing
hannu@tm.ee
In reply to: Mark Kirkwood (#36)
#48Bruce Momjian
bruce@momjian.us
In reply to: Hannu Krosing (#47)
#49Hannu Krosing
hannu@tm.ee
In reply to: Bruce Momjian (#48)
#50Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#49)
#51Andre Maasikas
andre@abs.ee
In reply to: Hannu Krosing (#49)
#52Hannu Krosing
hannu@tm.ee
In reply to: Andre Maasikas (#51)
#53Bruce Momjian
bruce@momjian.us
In reply to: Hannu Krosing (#52)
#54Yann Michel
yann-postgresql@spline.de
In reply to: Bruce Momjian (#53)
#55Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Bruce Momjian (#53)
#56Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#55)
#57Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#40)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#57)
#59Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#58)