Bitmap index status

Started by Heikki Linnakangasover 19 years ago15 messages
#1Heikki Linnakangas
heikki@enterprisedb.com

Hi,

What's the status of the bitmap index patch? Have you worked on it since
the last posted patch
(http://archives.postgresql.org/pgsql-patches/2006-08/msg00003.php)?

I've started to review it, to get it into CVS early in the 8.3 cycle. I
just want to make sure that I'm working on the latest version.

Beside the issues already discussed, I found two minor bugs:
* pg_am says that bitmap am supports unique indexes, while it doesn't.
Second,
* race condition in _bitmap_inserttuple if two backends try to insert
the same, new value. If they both find that there's no lov item for the
key, and try to create one, one backend will get a duplicate key error
on the lov index.

Also, vacuum actually does a reindex, which seems awfully wasteful. That
needs to be looked at.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#1)
Re: Bitmap index status

Heikki Linnakangas <heikki@enterprisedb.com> writes:

What's the status of the bitmap index patch? Have you worked on it since
the last posted patch
(http://archives.postgresql.org/pgsql-patches/2006-08/msg00003.php)?

Gavin and Jie have made major changes since that version (or at least
they'd better have something to show for the month since then ;-)).
I wouldn't recommend reviewing the patch until they post something
current ...

Also, vacuum actually does a reindex, which seems awfully wasteful. That
needs to be looked at.

Yikes. I imagine they've not tried to do anything about that; if you
want to help, maybe you could take that subproblem?

regards, tom lane

#3Jie Zhang
jzhang@greenplum.com
In reply to: Heikki Linnakangas (#1)
Re: Bitmap index status

Hi Heikki,

Gavin and I are trying to merge our changes together this week. We will post
a new patch by the end of this week. This patch will include some style
fixes, bug fixes, and the stream bitmap implementation.

I will look into the problems you have mentioned in this email. Yes, vacuum
currently does a reindex now. Gavin and I just talked about this yesterday.
We are looking into ways to improve this. One way is not to do reindex for
each vacuum. We maintain a list of updated tids along with the bitmap index.
Only when this list goes to a certain point, vacuum will re-build the index.

Thanks,
Jie

On 9/12/06 2:43 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

Show quoted text

Hi,

What's the status of the bitmap index patch? Have you worked on it since
the last posted patch
(http://archives.postgresql.org/pgsql-patches/2006-08/msg00003.php)?

I've started to review it, to get it into CVS early in the 8.3 cycle. I
just want to make sure that I'm working on the latest version.

Beside the issues already discussed, I found two minor bugs:
* pg_am says that bitmap am supports unique indexes, while it doesn't.
Second,
* race condition in _bitmap_inserttuple if two backends try to insert
the same, new value. If they both find that there's no lov item for the
key, and try to create one, one backend will get a duplicate key error
on the lov index.

Also, vacuum actually does a reindex, which seems awfully wasteful. That
needs to be looked at.

#4Jie Zhang
jzhang@greenplum.com
In reply to: Heikki Linnakangas (#1)
Re: Bitmap index status

Hi Heikki and all,

I just sent the latest bitmap index patch to the list. I am not sure if
there is any size limit for this mailing list. If you have received my
previous email, please let me know.

Thanks,
Jie

On 9/12/06 2:43 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

Show quoted text

Hi,

What's the status of the bitmap index patch? Have you worked on it since
the last posted patch
(http://archives.postgresql.org/pgsql-patches/2006-08/msg00003.php)?

I've started to review it, to get it into CVS early in the 8.3 cycle. I
just want to make sure that I'm working on the latest version.

Beside the issues already discussed, I found two minor bugs:
* pg_am says that bitmap am supports unique indexes, while it doesn't.
Second,
* race condition in _bitmap_inserttuple if two backends try to insert
the same, new value. If they both find that there's no lov item for the
key, and try to create one, one backend will get a duplicate key error
on the lov index.

Also, vacuum actually does a reindex, which seems awfully wasteful. That
needs to be looked at.

#5Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Jie Zhang (#4)
Re: Bitmap index status

Jie Zhang wrote:

Hi Heikki and all,

Please find the latest bitmap index patch in the attachment. This patch is
generated against the postgresql cvs head.

Thanks.

The handling of stream and hash bitmaps looks pretty complicated to me.
All the bitmap-related nodes have logic to handle both types slightly
differently. It all seems to come down to that if a subnode (or
amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the
caller needs to call the subnode many times, until it returns a NULL.
With a HashBitmap, the caller only calls the subnode once.

I think amgetbitmap should be called just once per index scan, and it
should return either a hash bitmap or a stream bitmap. The same applies
to all the executor nodes that return bitmaps, they would only return a
single HashBitmap or StreamBitmap, and the upper node would call
tbm_iterate repeatedly on that.

StreamBitmap would contain a callback (filled by the indexam) that
tbm_iterate would call to fill the next TBMIterateResult.

I would also move the code to AND and OR together different types of
bitmaps to tidbitmap.c, so that BitmapAnd and BitmapOr nodes wouldn't
need to care about different types either.

tbm_intersect and tbm_union with two HashBitmaps would work like they do
now. Calling tbm_intersect or tbm_union with two StreamBitmaps would
return a new StreamBitmap object that would pull results from the two
argument StreamBitmaps page at a time and AND or OR them together.
Combining a HashBitmap and a StreamBitmap would wrap the HashBitmap with
a new StreamBitmap that pulls the entries from the HashBitmap page at a
time.

What do you think? Would you like me to help implementing the above?

This patch includes code style changes, bug fixes, and the stream bitmap
implementation. I have also fixed the problems mentioned in Heikki's email.

It seems you fixed the race condition in inserttuple by holding a write
lock on the metapage while you find/create the lov item. While correct,
isn't that pretty bad for concurrency? I realize that the main use case
for bitmap indexes is large data warehouses where you don't do
concurrent updates, but still...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6Gavin Sherry
swm@linuxworld.com.au
In reply to: Heikki Linnakangas (#5)
Re: Bitmap index status

On Tue, 19 Sep 2006, Heikki Linnakangas wrote:

Jie Zhang wrote:

Hi Heikki and all,

Please find the latest bitmap index patch in the attachment. This patch is
generated against the postgresql cvs head.

Thanks.

The handling of stream and hash bitmaps looks pretty complicated to me.
All the bitmap-related nodes have logic to handle both types slightly
differently. It all seems to come down to that if a subnode (or
amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the
caller needs to call the subnode many times, until it returns a NULL.
With a HashBitmap, the caller only calls the subnode once.

I think amgetbitmap should be called just once per index scan, and it
should return either a hash bitmap or a stream bitmap. The same applies
to all the executor nodes that return bitmaps, they would only return a
single HashBitmap or StreamBitmap, and the upper node would call
tbm_iterate repeatedly on that.

StreamBitmap would contain a callback (filled by the indexam) that
tbm_iterate would call to fill the next TBMIterateResult.

Right, this was the approach taken by an earlier version of the patch I
had worked on. It was significantly uglified by the need to keep the index
state around and to be careful about what amrescan might do behind out
back. I will, however, introduce the idea again because it makes the code
much cleaner and more logical, as you seem to suggest.

Thanks,

Gavin

#7Jie Zhang
jzhang@greenplum.com
In reply to: Gavin Sherry (#6)
Re: Bitmap index status

On 9/19/06 5:15 AM, "Gavin Sherry" <swm@linuxworld.com.au> wrote:

On Tue, 19 Sep 2006, Heikki Linnakangas wrote:

Jie Zhang wrote:

Hi Heikki and all,

Please find the latest bitmap index patch in the attachment. This patch is
generated against the postgresql cvs head.

Thanks.

The handling of stream and hash bitmaps looks pretty complicated to me.
All the bitmap-related nodes have logic to handle both types slightly
differently. It all seems to come down to that if a subnode (or
amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the
caller needs to call the subnode many times, until it returns a NULL.
With a HashBitmap, the caller only calls the subnode once.

I think amgetbitmap should be called just once per index scan, and it
should return either a hash bitmap or a stream bitmap. The same applies
to all the executor nodes that return bitmaps, they would only return a
single HashBitmap or StreamBitmap, and the upper node would call
tbm_iterate repeatedly on that.

StreamBitmap would contain a callback (filled by the indexam) that
tbm_iterate would call to fill the next TBMIterateResult.

Right, this was the approach taken by an earlier version of the patch I
had worked on. It was significantly uglified by the need to keep the index
state around and to be careful about what amrescan might do behind out
back. I will, however, introduce the idea again because it makes the code
much cleaner and more logical, as you seem to suggest.

I will think about this approach. But I am not quite convinced that this
approach will be simpler and cleaner than the above approach. :-)

Thanks,
Jie

#8Mark Wong
markw@osdl.org
In reply to: Jie Zhang (#4)
Re: Bitmap index status

Jie Zhang wrote:

Hi Heikki and all,

I just sent the latest bitmap index patch to the list. I am not sure if
there is any size limit for this mailing list. If you have received my
previous email, please let me know.

Hi Jie,

I know I said I was going to get testing on this months ago but I've
been juggling between 3 systems due to disk failures and other hardware
configuration issues. Anyways, I've take a baseline run of only the
power test using a 1GB database with the patch 09-17 patch against a
snapshot of pgsql from 2006-09-17:

http://dbt.osdl.org/dbt/dbt3testing/results/dev8-007/2/

Do you think the 1GB scale factor will be sufficient for testing as it
will certainly be faster? Do you think testing with just a power test
will be sufficient for now? I really don't have a good reason why I
didn't run a throughput test other than to save time. :) I also wanted
to get your opinion again on which indexes we will want to try first.

Thanks,
Mark

#9Jie Zhang
jzhang@greenplum.com
In reply to: Gavin Sherry (#6)
Re: Bitmap index status

Gavin & Heikki,

The handling of stream and hash bitmaps looks pretty complicated to me.
All the bitmap-related nodes have logic to handle both types slightly
differently. It all seems to come down to that if a subnode (or
amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the
caller needs to call the subnode many times, until it returns a NULL.
With a HashBitmap, the caller only calls the subnode once.

I think amgetbitmap should be called just once per index scan, and it
should return either a hash bitmap or a stream bitmap. The same applies
to all the executor nodes that return bitmaps, they would only return a
single HashBitmap or StreamBitmap, and the upper node would call
tbm_iterate repeatedly on that.

StreamBitmap would contain a callback (filled by the indexam) that
tbm_iterate would call to fill the next TBMIterateResult.

Right, this was the approach taken by an earlier version of the patch I
had worked on. It was significantly uglified by the need to keep the index
state around and to be careful about what amrescan might do behind out
back. I will, however, introduce the idea again because it makes the code
much cleaner and more logical, as you seem to suggest.

I have been thinking about this approach some more. I do agree that this is
more attractive now. The following includes some more detailed design.
Please let me know if you have any comments. (My apologies to Gavin. You
talked to me about this approach before. But you introduced some on-disk
bitmap specific code into the tidbitmap.c, which prevented me from looking
more in this direction.)

Essentially, we want to have a stream bitmap object that has an iterator,
which will be able to iterate through the bitmaps. The BitmapIndexscan,
BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a
hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate through
the bitmaps.

The StreamBitmap structure will look like below.

struct StreamBitmap {
NodeTag type; /* to make it a valid Node */
PagetableEntry entry; /* a page of tids in this stream bitmap */

/* the iterator function */
void (*next)(StreamBitmap*);
Node* state; /* store how this stream bitmap generated,
and all necessary information to
obtain the next stream bitmap. */
};

Two new state objects will look like below. At the same time, we introduce
three new node types: T_StreamBitmapAND, T_StreamBitmapOR,
And T_StreamBitmapIndex, to define different states.

/*
* Stores the necessary information for iterating through the stream bitmaps
* generated by nodeBitmapAnd or nodeBitmapOr.
*/
struct StreamBitmapOp {
NodeTag type; /* handles T_StreamBitmapAND and T_StreamBitmapOR */
List* bitmaps;
};

/*
* Stores some necessary information for iterating through the stream
* bitmaps generated by nodeBitmapIndexscan.
*/
struct StreamBitmapIndex {
NodeTag type; /* handle T_StreamBitmapIndex */
IndexScanDesc scan;
BlockNumber nextBlockNo;/* next block no to be read */
};

Then we will have the iterator functions like the following:

void StreamBitmapAndNext(StreamBitmap* node) {
tbm_intersect_stream(((StreampBitmapOp*) node->state)->bitmaps, node);
}

void StreamBitmapOrNext(StreamBitmap* node) {
tbm_union_stream(((StreampBitmapOp*) node->state)->bitmaps, node);
}

void StreamBitmapIndexNext(StreamBitmap* node) {
StreamBitmapIndex* sbi = (StreamBitmapIndex*) node->state;
amgetbitmap(sbi->scan, NULL, sbi->nextBlockNo);
}

What do you think?

Thanks,
Jie

#10Jie Zhang
jzhang@greenplum.com
In reply to: Mark Wong (#8)
Re: Bitmap index status

Hi Mark,

Thanks for doing the test. I checked out the link you provided below. I am a
little confused about the goal of these tests. Do you plan to test the
overall performance of postgreSQL on handling TPC-H queries?

Thanks,
Jie

On 9/22/06 3:45 PM, "Mark Wong" <markw@osdl.org> wrote:

Show quoted text

Jie Zhang wrote:

Hi Heikki and all,

I just sent the latest bitmap index patch to the list. I am not sure if
there is any size limit for this mailing list. If you have received my
previous email, please let me know.

Hi Jie,

I know I said I was going to get testing on this months ago but I've
been juggling between 3 systems due to disk failures and other hardware
configuration issues. Anyways, I've take a baseline run of only the
power test using a 1GB database with the patch 09-17 patch against a
snapshot of pgsql from 2006-09-17:

http://dbt.osdl.org/dbt/dbt3testing/results/dev8-007/2/

Do you think the 1GB scale factor will be sufficient for testing as it
will certainly be faster? Do you think testing with just a power test
will be sufficient for now? I really don't have a good reason why I
didn't run a throughput test other than to save time. :) I also wanted
to get your opinion again on which indexes we will want to try first.

Thanks,
Mark

#11Mark Wong
markw@osdl.org
In reply to: Jie Zhang (#10)
Re: Bitmap index status

Hi Jie,

Yeah, basically gather as many stats as I can to accurately profile the
overall system performance. I thought it would be appropriate to use a
TPC-H based workload as one measuring stick to use for bitmap indexes.

Mark

Jie Zhang wrote:

Show quoted text

Hi Mark,

Thanks for doing the test. I checked out the link you provided below. I am a
little confused about the goal of these tests. Do you plan to test the
overall performance of postgreSQL on handling TPC-H queries?

Thanks,
Jie

On 9/22/06 3:45 PM, "Mark Wong" <markw@osdl.org> wrote:

Jie Zhang wrote:

Hi Heikki and all,

I just sent the latest bitmap index patch to the list. I am not sure if
there is any size limit for this mailing list. If you have received my
previous email, please let me know.

Hi Jie,

I know I said I was going to get testing on this months ago but I've
been juggling between 3 systems due to disk failures and other hardware
configuration issues. Anyways, I've take a baseline run of only the
power test using a 1GB database with the patch 09-17 patch against a
snapshot of pgsql from 2006-09-17:

http://dbt.osdl.org/dbt/dbt3testing/results/dev8-007/2/

Do you think the 1GB scale factor will be sufficient for testing as it
will certainly be faster? Do you think testing with just a power test
will be sufficient for now? I really don't have a good reason why I
didn't run a throughput test other than to save time. :) I also wanted
to get your opinion again on which indexes we will want to try first.

Thanks,
Mark

#12Luke Lonergan
llonergan@greenplum.com
In reply to: Mark Wong (#11)
Re: Bitmap index status

Mark,

On 9/25/06 11:32 AM, "Mark Wong" <markw@osdl.org> wrote:

Yeah, basically gather as many stats as I can to accurately profile the
overall system performance. I thought it would be appropriate to use a
TPC-H based workload as one measuring stick to use for bitmap indexes.

Note that the TPC-H queries don't follow the typical good use case for
bitmap indexes. You'd like to see queries that use multiple AND and OR
clauses, otherwise there may be no benefit.

Also, DBT-3/TPC-H on Postgres right now does not benefit from indices
overall. The planner has limitations WRT selectivity estimates and other
limitations that cause it to choose index access poorly for the query
workload. We have two new features coming (for 8.3) that fix this, but for
now we find that indexes are a net loss, in some queries a huge loss.

If you look at the whitepaper that Ayush Parashar published, he uses the
TPC-H data with some targeted queries that showcase the best use-cases for
bitmap index.

- Luke

#13Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Jie Zhang (#9)
Re: Bitmap index status

Looks a bit better now, though I think you need to think more about the
encapsulation of the structs. More detailed comments below.

Jie Zhang wrote:

Essentially, we want to have a stream bitmap object that has an iterator,
which will be able to iterate through the bitmaps. The BitmapIndexscan,
BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a
hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate
through
the bitmaps.

The StreamBitmap structure will look like below.

struct StreamBitmap {
NodeTag type; /* to make it a valid Node */
PagetableEntry entry; /* a page of tids in this stream bitmap */

/* the iterator function */
void (*next)(StreamBitmap*);
Node* state; /* store how this stream bitmap generated,
and all necessary information to
obtain the next stream bitmap. */
};

I'd suggest making state just a (void *). It's private to the producer
of the bitmap, and I don't see a reason to expose it. I assume that the
next-function fills in the PageTableEntry with the next set of tids.

Two new state objects will look like below. At the same time, we introduce
three new node types: T_StreamBitmapAND, T_StreamBitmapOR,
And T_StreamBitmapIndex, to define different states.

/*
* Stores the necessary information for iterating through the stream
bitmaps
* generated by nodeBitmapAnd or nodeBitmapOr.
*/
struct StreamBitmapOp {
NodeTag type; /* handles T_StreamBitmapAND and T_StreamBitmapOR */
List* bitmaps;
};

AFAICS, this struct is private to tidbitmap.c, where the new
stream-enabled tbm_intersect etc. functions are defined. Also, I don't
see a reason why it needs to by a valid Node.

/*
* Stores some necessary information for iterating through the stream
* bitmaps generated by nodeBitmapIndexscan.
*/
struct StreamBitmapIndex {
NodeTag type; /* handle T_StreamBitmapIndex */
IndexScanDesc scan;
BlockNumber nextBlockNo;/* next block no to be read */
};

Where would this struct be defined? I think different index access
methods might want to have different kind of states. The struct above
assumes that the position of an index scan is always represented by the
nextBlockNo. That seems to be the right thing for the bitmap indexam, so
this struct is fine for StreamBitmaps returned by bmgetbitmap, but not
necessary for others.

Then we will have the iterator functions like the following:

...

void StreamBitmapIndexNext(StreamBitmap* node) {
StreamBitmapIndex* sbi = (StreamBitmapIndex*) node->state;
amgetbitmap(sbi->scan, NULL, sbi->nextBlockNo);
}

This means that the amgetbitmap function would still be called many
times in each scan. What would amgetbitmap return? A new StreamBitmap
on each call?

I'd like to see just one call to amgetbitmap. It would a) fill in the
hash bitmap passed to it, b) return a new hash bitmap, or c) return a
new StreamBitmap, with a indexam specific next-function that returns the
tids one page at a time. I think we'll also need some kind of a
destructor in StreamBitmap that's called by the consumer of the bitmap
after it's done with it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#14Gavin Sherry
swm@linuxworld.com.au
In reply to: Heikki Linnakangas (#13)
Re: Bitmap index status

On Tue, 26 Sep 2006, Heikki Linnakangas wrote:

Looks a bit better now, though I think you need to think more about the
encapsulation of the structs. More detailed comments below.

Jie Zhang wrote:

Essentially, we want to have a stream bitmap object that has an iterator,
which will be able to iterate through the bitmaps. The BitmapIndexscan,
BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a
hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate
through
the bitmaps.

The StreamBitmap structure will look like below.

struct StreamBitmap {
NodeTag type; /* to make it a valid Node */
PagetableEntry entry; /* a page of tids in this stream bitmap */

/* the iterator function */
void (*next)(StreamBitmap*);
Node* state; /* store how this stream bitmap generated,
and all necessary information to
obtain the next stream bitmap. */
};

I'd suggest making state just a (void *). It's private to the producer
of the bitmap, and I don't see a reason to expose it. I assume that the
next-function fills in the PageTableEntry with the next set of tids.

Two new state objects will look like below. At the same time, we introduce
three new node types: T_StreamBitmapAND, T_StreamBitmapOR,
And T_StreamBitmapIndex, to define different states.

/*
* Stores the necessary information for iterating through the stream
bitmaps
* generated by nodeBitmapAnd or nodeBitmapOr.
*/
struct StreamBitmapOp {
NodeTag type; /* handles T_StreamBitmapAND and T_StreamBitmapOR */
List* bitmaps;
};

AFAICS, this struct is private to tidbitmap.c, where the new
stream-enabled tbm_intersect etc. functions are defined. Also, I don't
see a reason why it needs to by a valid Node.

Well, making it a valid nodes makes it easy to identify (IsA) and gives us
access to copy/equal frameworks. I do think that it is best to bury this
in the bitmap code however.

* Stores some necessary information for iterating through the stream
* bitmaps generated by nodeBitmapIndexscan.
*/
struct StreamBitmapIndex {
NodeTag type; /* handle T_StreamBitmapIndex */
IndexScanDesc scan;
BlockNumber nextBlockNo;/* next block no to be read */
};

Where would this struct be defined? I think different index access
methods might want to have different kind of states. The struct above
assumes that the position of an index scan is always represented by the
nextBlockNo. That seems to be the right thing for the bitmap indexam, so
this struct is fine for StreamBitmaps returned by bmgetbitmap, but not
necessary for others.

right.

Then we will have the iterator functions like the following:

...

void StreamBitmapIndexNext(StreamBitmap* node) {
StreamBitmapIndex* sbi = (StreamBitmapIndex*) node->state;
amgetbitmap(sbi->scan, NULL, sbi->nextBlockNo);
}

This means that the amgetbitmap function would still be called many
times in each scan. What would amgetbitmap return? A new StreamBitmap
on each call?

I'd like to see just one call to amgetbitmap. It would a) fill in the
hash bitmap passed to it, b) return a new hash bitmap, or c) return a
new StreamBitmap, with a indexam specific next-function that returns the
tids one page at a time. I think we'll also need some kind of a
destructor in StreamBitmap that's called by the consumer of the bitmap
after it's done with it.

Right, I agree. I am working on this now.

Thanks,

gavin

#15Mark Wong
markw@osdl.org
In reply to: Luke Lonergan (#12)
Re: Bitmap index status

Luke Lonergan wrote:

Mark,

On 9/25/06 11:32 AM, "Mark Wong" <markw@osdl.org> wrote:

Yeah, basically gather as many stats as I can to accurately profile the
overall system performance. I thought it would be appropriate to use a
TPC-H based workload as one measuring stick to use for bitmap indexes.

Note that the TPC-H queries don't follow the typical good use case for
bitmap indexes. You'd like to see queries that use multiple AND and OR
clauses, otherwise there may be no benefit.

Oh right, people keep telling me that and it keeps going in one ear and
out the other...

Also, DBT-3/TPC-H on Postgres right now does not benefit from indices
overall. The planner has limitations WRT selectivity estimates and other
limitations that cause it to choose index access poorly for the query
workload. We have two new features coming (for 8.3) that fix this, but for
now we find that indexes are a net loss, in some queries a huge loss.

Great, I'll keep my eye for those. :)

If you look at the whitepaper that Ayush Parashar published, he uses the
TPC-H data with some targeted queries that showcase the best use-cases for
bitmap index.

Mark