Need theory/comprehension help on Multi-Column indexes

Started by Josh Berkusabout 21 years ago4 messages
#1Josh Berkus
josh@agliodbs.com

Folks,

I've been poking around the indexing code, and I really don't understand the
page structure and splittng/branching for multi-column BTree indexes. I've
looked in a couple DB textbooks to get a theoretically underpinning of the
structure of multi-column indexes, but none of the ones I've seen cover them.
Can someone help me out?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#2Merlin Moncure
merlin.moncure@rcsonline.com
In reply to: Josh Berkus (#1)
Re: Need theory/comprehension help on Multi-Column indexes

Folks,

I've been poking around the indexing code, and I really don't

understand

the
page structure and splittng/branching for multi-column BTree indexes.
I've
looked in a couple DB textbooks to get a theoretically underpinning of

the

structure of multi-column indexes, but none of the ones I've seen

cover

them.
Can someone help me out?

Heh. You haven't done much programming in COBOL. The basic idea is to
combine the multiple fields in a sequence of bytes (reversible into the
original fields) and do a straight strcmp()
int c(6) n(2)
So you have key k on t(f1, f2, f3)
And do an insert to t(1, 'abc', 44)
The datum
******* **
"00000001 abc44" gets applied to the index. The values below the stars
are the lowest values supported by that particular type. The
requirement being that for a type to be indexible it must have able to
be mutated into a fixed length string.

At least, that is the simple way to do it. It is also possible to
create an index using discreet fields and the type's built in Boolean
comparison. This is more complicated, for example to find out if
t(a,b,c) > t(a1,b1,c1)
You have to check
a >= a1 and
(a > a1 or b >= b1) and
(a > a1 or b > b1 or c > c1)
or the Boolean reverse of the above:
a > a1 or
(a >= a1 and b > b1) or
(a >= a1 or b >= b1 or c > c1)

The above expression would have to be applied to generate a comparison
between an input value and a stored key value.
Merlin

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#1)
Re: Need theory/comprehension help on Multi-Column indexes

Josh Berkus <josh@agliodbs.com> writes:

I've been poking around the indexing code, and I really don't understand the
page structure and splittng/branching for multi-column BTree indexes.

It's not fundamentally different from single-column indexes. The only
aspect of a btree index that requires any knowledge about the content of
index entries is the "compare two index entries for lesser, equal, or
greater" operation. For that, we just compare the first columns, then
compare the second columns if the first are equal, etc. Plain
lexicographic sort semantics.

Everything else in the btree code just considers an index entry to be an
undifferentiated tuple.

regards, tom lane

#4Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#3)
Re: Need theory/comprehension help on Multi-Column indexes

Tom, Merlin,

It's not fundamentally different from single-column indexes. The only
aspect of a btree index that requires any knowledge about the content of
index entries is the "compare two index entries for lesser, equal, or
greater" operation. For that, we just compare the first columns, then
compare the second columns if the first are equal, etc. Plain
lexicographic sort semantics.

So the different columns of the index don't have seperate data pages? It's
just a partitioned index node?

Wow, no wonder I couldn't figure it out, I was looking for something more
complicated ...

BTW, while we're on the optimizer, what is random_page_cost supposed to
represent, exactly? I used to think it was the ratio of index page
retreivals to direct page retrievals, but I see that that's already being
calculated for. I'm wondering if it might be possible to calculate RPC and
eliminate it as a GUC.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco