Need theory/comprehension help on Multi-Column indexes
Folks,
I've been poking around the indexing code, and I really don't understand the
page structure and splittng/branching for multi-column BTree indexes. I've
looked in a couple DB textbooks to get a theoretically underpinning of the
structure of multi-column indexes, but none of the ones I've seen cover them.
Can someone help me out?
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Folks,
I've been poking around the indexing code, and I really don't
understand
the
page structure and splittng/branching for multi-column BTree indexes.
I've
looked in a couple DB textbooks to get a theoretically underpinning of
the
structure of multi-column indexes, but none of the ones I've seen
cover
them.
Can someone help me out?
Heh. You haven't done much programming in COBOL. The basic idea is to
combine the multiple fields in a sequence of bytes (reversible into the
original fields) and do a straight strcmp()
int c(6) n(2)
So you have key k on t(f1, f2, f3)
And do an insert to t(1, 'abc', 44)
The datum
******* **
"00000001 abc44" gets applied to the index. The values below the stars
are the lowest values supported by that particular type. The
requirement being that for a type to be indexible it must have able to
be mutated into a fixed length string.
At least, that is the simple way to do it. It is also possible to
create an index using discreet fields and the type's built in Boolean
comparison. This is more complicated, for example to find out if
t(a,b,c) > t(a1,b1,c1)
You have to check
a >= a1 and
(a > a1 or b >= b1) and
(a > a1 or b > b1 or c > c1)
or the Boolean reverse of the above:
a > a1 or
(a >= a1 and b > b1) or
(a >= a1 or b >= b1 or c > c1)
The above expression would have to be applied to generate a comparison
between an input value and a stored key value.
Merlin
Import Notes
Resolved by subject fallback
Josh Berkus <josh@agliodbs.com> writes:
I've been poking around the indexing code, and I really don't understand the
page structure and splittng/branching for multi-column BTree indexes.
It's not fundamentally different from single-column indexes. The only
aspect of a btree index that requires any knowledge about the content of
index entries is the "compare two index entries for lesser, equal, or
greater" operation. For that, we just compare the first columns, then
compare the second columns if the first are equal, etc. Plain
lexicographic sort semantics.
Everything else in the btree code just considers an index entry to be an
undifferentiated tuple.
regards, tom lane
Tom, Merlin,
It's not fundamentally different from single-column indexes. The only
aspect of a btree index that requires any knowledge about the content of
index entries is the "compare two index entries for lesser, equal, or
greater" operation. For that, we just compare the first columns, then
compare the second columns if the first are equal, etc. Plain
lexicographic sort semantics.
So the different columns of the index don't have seperate data pages? It's
just a partitioned index node?
Wow, no wonder I couldn't figure it out, I was looking for something more
complicated ...
BTW, while we're on the optimizer, what is random_page_cost supposed to
represent, exactly? I used to think it was the ratio of index page
retreivals to direct page retrievals, but I see that that's already being
calculated for. I'm wondering if it might be possible to calculate RPC and
eliminate it as a GUC.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco