Indexing columns with low cardinality: persistent bitmap indexes?

Started by Bruno Lavoie, about 17 years ago · 8 messages · general
#1Bruno Lavoie
bruno.lavoie@gmail.com

Hello,

I have a column with a small number of distinct values, so indexing it
with a standard B-tree is useless. How can I index this column
efficiently? I searched, and it seems that pg doesn't support the
creation of persistent bitmap indexes... Is that feature planned for the
next releases of pg?

Thanks
Bruno Lavoie

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruno Lavoie (#1)
Re: Indexing columns with low cardinality: persistent bitmap indexes?

Bruno Lavoie wrote:

Hello,

I have a column with a small number of distinct values, so indexing it
with a standard B-tree is useless. How can I index this column
efficiently? I searched, and it seems that pg doesn't support the
creation of persistent bitmap indexes...

It doesn't.

Is that feature planned for the next releases of pg?

There are some efforts to get it done, but don't hold your breath (it
won't be in 8.4 either, as it has major problems currently.)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3Reg Me Please
regmeplease@gmail.com
In reply to: Bruno Lavoie (#1)
Re: Indexing columns with low cardinality: persistent bitmap indexes?

On Monday 12 January 2009 21:38:02 Bruno Lavoie wrote:

Hello,

I have a column with a small number of distinct values, so indexing it
with a standard B-tree is useless. How can I index this column
efficiently? I searched, and it seems that pg doesn't support the
creation of persistent bitmap indexes... Is that feature planned for the
next releases of pg?

Thanks
Bruno Lavoie

I would try partial indexes, as many as the distinct values.
I'm not sure this would help, though.

--
The road is a grey band,
white stripes, green edge

#4Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Reg Me Please (#3)
Re: Indexing columns with low cardinality: persistent bitmap indexes?

On Mon, Jan 12, 2009 at 4:16 PM, Reg Me Please <regmeplease@gmail.com> wrote:

I would try partial indexes, as many as the distinct values.
I'm not sure this would help, though.

You should create partial indexes only on the values that account for a
small fraction of the table. For example, if you have:

value   fraction of the table that has this value
1       5%
2       3%
3       20%
4       25%
5       47%

then only partial indexes on values 1 and 2 are of some value.
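
A minimal sketch of that rule of thumb, in Python: it takes the distribution above and emits PostgreSQL `CREATE INDEX ... WHERE` statements only for the rare values. The table name `orders`, the column name `status`, the helper `partial_index_ddl`, and the 10% cutoff are all assumptions made for illustration; the right cutoff depends on your data and queries.

```python
def partial_index_ddl(table, column, fractions, max_fraction=0.10):
    """Return CREATE INDEX statements for values rarer than max_fraction.

    `fractions` maps each distinct (integer) value to the share of rows
    that hold it. The 10% default is an illustrative assumption, not a
    PostgreSQL rule.
    """
    statements = []
    for value, share in sorted(fractions.items()):
        if share < max_fraction:
            statements.append(
                f"CREATE INDEX {table}_{column}_{value}_idx "
                f"ON {table} ({column}) WHERE {column} = {value};"
            )
    return statements


# Distribution from the example above; "orders" and "status" are made-up names.
distribution = {1: 0.05, 2: 0.03, 3: 0.20, 4: 0.25, 5: 0.47}
ddl = partial_index_ddl("orders", "status", distribution)
for statement in ddl:
    print(statement)
```

Values 3, 4 and 5 get no index here because a query on any of them would touch a large share of the table anyway, so a sequential scan is likely to be chosen regardless.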

--
Sincerely,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157

#5Scott Marlowe
scott.marlowe@gmail.com
In reply to: Bruno Lavoie (#1)
Re: Indexing columns with low cardinality: persistent bitmap indexes?

On Mon, Jan 12, 2009 at 1:38 PM, Bruno Lavoie <bruno.lavoie@gmail.com> wrote:

Hello,

I have a column with a small number of distinct values, so indexing it
with a standard B-tree is useless. How can I index this column
efficiently? I searched, and it seems that pg doesn't support the creation of
persistent bitmap indexes... Is that feature planned for the next releases of pg?

You have a few options, depending on your access patterns. If you tend to
access just one of these values and fetch all of its rows at once, then
either clustering on this value or partitioning your table will help.

If you access your data using these values and other column values at
the same time, then partial or multi-column indexes might help.
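
A rough sketch of the second option, using SQLite through Python's sqlite3 module purely as a stand-in so the example runs anywhere. The `orders` table, its columns, and the index names are invented for illustration, and PostgreSQL's planner makes its own choices, so treat the plans printed here as a demonstration of the idea rather than a prediction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, status INTEGER, customer_id INTEGER)"
)
# 10,000 rows, five status values, 1,000 distinct customers (made-up data).
rows = [(i % 5 + 1, i % 1000) for i in range(10000)]
conn.executemany("INSERT INTO orders (status, customer_id) VALUES (?, ?)", rows)

# Multi-column index: the low-cardinality column plus a more selective one.
conn.execute(
    "CREATE INDEX orders_status_customer_idx ON orders (status, customer_id)"
)
# Partial index: only rows with status = 1, keyed by the other column.
conn.execute(
    "CREATE INDEX orders_status1_customer_idx "
    "ON orders (customer_id) WHERE status = 1"
)


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    details = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in details)


combined_plan = plan("SELECT id FROM orders WHERE status = 3 AND customer_id = 42")
partial_plan = plan("SELECT id FROM orders WHERE status = 1 AND customer_id = 42")
print(combined_plan)
print(partial_plan)
```

The point is that the low-cardinality column becomes useful once it is combined with a more selective column, either as a leading key or as the predicate of a partial index.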

#6Dann Corbit
DCorbit@connx.com
In reply to: Alvaro Herrera (#2)
Re: Indexing columns with low cardinality: persistent bitmap indexes?

On Monday, January 12, 2009 at 12:41 PM, Alvaro Herrera wrote:

Bruno Lavoie wrote:

Hello,

I have a column with a small number of distinct values, so indexing it
with a standard B-tree is useless. How can I index this column
efficiently? I searched, and it seems that pg doesn't support the
creation of persistent bitmap indexes...

It doesn't.

Is that feature planned for the next releases of pg?

There are some efforts to get it done, but don't hold your breath (it
won't be in 8.4 either, as it has major problems currently.)

Here is an interesting experiment:

Application of Bitmap Index to Information Retrieval. K. Fujioka, Y. Uematsu, and M. Onizuka. WWW 2008
Source: [ACM]
Synopsis: This paper proposes a hierarchical structure called the HS-bitmap index to represent a document-term matrix. The authors implemented their data structure on PostgreSQL and observed that it performed better than an inverted index. A shortcoming might be that the HS-bitmap index takes more space than the inverted index even after compression.
Note: this work makes use of PostgreSQL but is unrelated to the ongoing work of implementing bitmap indexes in PostgreSQL.

http://portal.acm.org/citation.cfm?doid=1367497.1367680

Here is the research page where I found the above:
http://www-users.cs.umn.edu/~kewu/annotated.html

#7Martin Gainty
mgainty@hotmail.com
In reply to: Jaime Casanova (#4)
Re: Indexing columns with low cardinality: persistent bitmap indexes?

Jaime-

Why don't you use a bitmap?

*Kind regards from the USA*
Martin

#8Adam Rich
adam.r@sbcglobal.net
In reply to: Martin Gainty (#7)
Re: Indexing columns with low cardinality: persistent bitmap indexes?

Hello,

I have a column with a small number of distinct values, so indexing it
with a standard B-tree is useless. How can I index this column
efficiently? I searched, and it seems that pg doesn't support the
creation of persistent bitmap indexes... Is that feature planned for the
next releases of pg?

You have a few options, depending on your access patterns. If you tend to
access just one of these values and fetch all of its rows at once, then
either clustering on this value or partitioning your table will help.

How will clustering benefit this pattern? Won't a full table scan be
required regardless of whether the table is clustered? I thought the point
of clustering was to organize the table by some indexed key, requiring
fewer seeks and increasing the likelihood of the pages being in the cache.
If the index is never used in this case (low cardinality), would it still
help?

If you access your data using these values and other column values at
the same time, then partial or multi-column indexes might help.