Indexing a Bit String column

Started by George Oakmanabout 17 years ago4 messagesgeneral
Jump to latest
#1George Oakman
oakmang@hotmail.com

Hi all,

I am planning to use the Bit String data type for a large number of binary strings, e.g.
CREATE TABLE myTable (myBitStringCol BIT(3));

I will need to perform & (bitwise AND) operations using SELECT on this column, e.g.
SELECT * FROM myTable WHERE myBitStringCol & B'101' = myBitStringCol;

To optimise this type of SELECT statement, I guess I’ll have to build an index on the Bit String column, e.g.
CREATE INDEX myBitStringCol_idx ON myTable (myBitStringCol);

Is it all I need to do? Will PgSQL know how to index properly a Bit String column? Should I build the index using a special method, e.g.
CREATE INDEX myBitStringCol_idx ON myTable USING gist(myBitStringCol);

Since we’re already talking of a Bit String column, the USING gist() statement looks a bit redundant to me. Basically, I though I would ask if I need to do anything special when indexing a BIT column.

Thanks for your comments.

George.

_________________________________________________________________
Twice the fun—Share photos while you chat with Windows Live Messenger. Learn more.
http://www.microsoft.com/uk/windows/windowslive/products/messenger.aspx

#2Bruce Momjian
bruce@momjian.us
In reply to: George Oakman (#1)
Re: Indexing a Bit String column

George Oakman <oakmang@hotmail.com> writes:

Is it all I need to do? Will PgSQL know how to index properly a Bit String
column? Should I build the index using a special method, e.g.
CREATE INDEX myBitStringCol_idx ON myTable USING gist(myBitStringCol);

No, the default will be to build a btree index which won't help these types of
queries at all.

You would want a GIST index if there was a built-in GIST opclass for these
kinds of queries, but sadly there isn't. You could add one fairly easily but
it would require C code. I think it would be a valuable addition to Postgres
if you do write one.

Note that something like "WHERE myBitStringCol & B'101'" might be selecting
too much of your table to make an index useful anyways. If each bit is set in
half the table then you're talking about selecting 3/4 of the table in which
case a full table scan would be more efficient than any index.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: Indexing a Bit String column

Gregory Stark <stark@enterprisedb.com> writes:

Note that something like "WHERE myBitStringCol & B'101'" might be selecting
too much of your table to make an index useful anyways. If each bit is set in
half the table then you're talking about selecting 3/4 of the table in which
case a full table scan would be more efficient than any index.

If the expectation is that the bitstring is mostly zeroes, I wonder
whether the OP wouldn't be better off using an integer-array
representation, ie instead of '00000101' store '{6,8}'. Then he could
use the existing GIST or GIN support for integer array operations.

regards, tom lane

#4George Oakman
oakmang@hotmail.com
In reply to: Bruce Momjian (#2)
Re: Indexing a Bit String column

Hi,

Thanks for the info. I new it would have been too easy! :)

Sorry, I made a mistake earlier, my queries will actually be more like

SELECT * FROM myTable WHERE myBitStringCol & B'101' = B'101';

This doesn't make much difference for the indexing problem, but it may help address the very good point you raised regarding the predicted number of results, and therefore the justification of needing an index at all. In practice, my BitStrings will be 1024 bits long - both A and B in "WHERE A & B = B" will be 1024 bits long.

Assuming that bits are independents and randomly distributed in the database, am I right in thinking that a typical search is expected to return N*0.5^n, where N is the total number of rows in the table and n is the number of bits set to 1 in B?

If this calculation is correct, even if 1% of the bits are set to 1 in B, then this would give N*0.5^10 results, i.e. roughly 0.1% of the database.

So it looks like I'll need the index in the end!

I am actually new to PgSQL - I would be very grateful if you could point me to resources/tutorials to help me build an index extension in C?

Many thanks for your help.

George.

To: oakmang@hotmail.com
CC: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Indexing a Bit String column
From: stark@enterprisedb.com
Date: Tue, 24 Feb 2009 15:35:58 +0000

George Oakman <oakmang@hotmail.com> writes:

Is it all I need to do? Will PgSQL know how to index properly a Bit String
column? Should I build the index using a special method, e.g.
CREATE INDEX myBitStringCol_idx ON myTable USING gist(myBitStringCol);

No, the default will be to build a btree index which won't help these types of
queries at all.

You would want a GIST index if there was a built-in GIST opclass for these
kinds of queries, but sadly there isn't. You could add one fairly easily but
it would require C code. I think it would be a valuable addition to Postgres
if you do write one.

Note that something like "WHERE myBitStringCol & B'101'" might be selecting
too much of your table to make an index useful anyways. If each bit is set in
half the table then you're talking about selecting 3/4 of the table in which
case a full table scan would be more efficient than any index.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

_________________________________________________________________
Windows Live Messenger just got better .Video display pics, contact updates & more.
http://www.download.live.com/messenger