Table clustering idea
There is a well known command called CLUSTER which organizes table
in specified index's order. It has a drawback, that new tuples added are
not in this order. Last night I had idea which could be interesting, I hope.
The idea is to make use of 'histogram_bounds' collected statistical data.
Instead of inserting row into first suitable spot in a table, a table would
be "divided" into sections, one for each of histogram_bounds ranges.
When inserting, the database would try to find most suitable section
to insert (using the histogram_bounds), and if there were free spots
there, would insert there. If not, it would either look for a tuple in
nearby
sections, or first suitable place.
What would it do? It would try to keep table somewhat organized,
keeping rows of similar values close together (within SET STATISTICS
resolution, so a common scenario would be 50 or 100 "sections").
It would make it a bit hard for a table to shrink (since new rows would
be added throughout the table, not at the beginning).
Other idea than using histogram_bounds would be using the position
of key inside the index to determine the "ideal" place of row inside
the table and find the closest free spot there. This would be of course
much more precise and wouldn't rely on statistic.
Regards,
Dawid
Dawid,
Other idea than using histogram_bounds would be using the
position of key inside the index to determine the "ideal"
place of row inside the table and find the closest free spot
there. This would be of course much more precise and wouldn't
rely on statistic.
This is generally known as "index organized tables" in other databases.
It implements a clustered storage of the table, which dramatically
speeds access on the chosen indexed column and makes inserts fast.
This also eases some of the visibility/MVCC implementation issues being
discussed on hackers, but does not help with the maintenance of
non-storage key indices.
Other DBMS have index organized tables that can use either hash or btree
organizations, both of which have their uses. We are planning to
implement btree organized tables sometime - anyone else interested in
this idea?
- Luke
Import Notes
Resolved by subject fallback
Luke,
Other DBMS have index organized tables that can use either hash or btree
organizations, both of which have their uses. We are planning to
implement btree organized tables sometime - anyone else interested in
this idea?
Search the archives. It's been dicussed on this list at least twice
before, that I know of.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
On Sun, Jun 25, 2006 at 08:04:18PM -0400, Luke Lonergan wrote:
Other DBMS have index organized tables that can use either hash or btree
organizations, both of which have their uses. We are planning to
implement btree organized tables sometime - anyone else interested in
this idea?
I'm curious how you'll do it, as I was once told that actually trying to
store heap data in a btree structure would be a non-starter (don't
remember why).
On a somewhat related note, I think that it would be advantageous if the
FSM had a means to prefer certain pages for a given tuple over other
pages. This would allow for a better way to keep heap and possibly index
data more compacted, and it would also be a means of keeping tables
loosely clustered. It would also make it far easier to shrink heaps that
have become bloated, because the FSM could be told to favor pages at the
beginning of the relation.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim,
On 6/26/06 8:15 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:
On a somewhat related note, I think that it would be advantageous if the
FSM had a means to prefer certain pages for a given tuple over other
pages. This would allow for a better way to keep heap and possibly index
data more compacted, and it would also be a means of keeping tables
loosely clustered. It would also make it far easier to shrink heaps that
have become bloated, because the FSM could be told to favor pages at the
beginning of the relation.
Interesting idea - page affinity implemented using the FSM.
WRT feasibility of BTREE organized tables, I'm not sure I see the problem.
Teradata implemented a hashing filesystem for their heap storage and I've
always wondered about how they handle collision and chaining efficiently,
but it's a solved problem for sure - knowing that makes the challenge that
much easier :-)
- Luke
Jim C. Nasby wrote:
On Sun, Jun 25, 2006 at 08:04:18PM -0400, Luke Lonergan wrote:
Other DBMS have index organized tables that can use either hash or btree
organizations, both of which have their uses. We are planning to
implement btree organized tables sometime - anyone else interested in
this idea?I'm curious how you'll do it, as I was once told that actually trying to
store heap data in a btree structure would be a non-starter (don't
remember why).
Ingres is now open source - they have clustering on btree/isam/hash
(it's called "modify table xx to btree on col1,col2")
Regards,
Kim
On Mon, Jun 26, 2006 at 11:31:24PM -0700, Luke Lonergan wrote:
Jim,
On 6/26/06 8:15 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:
On a somewhat related note, I think that it would be advantageous if the
FSM had a means to prefer certain pages for a given tuple over other
pages. This would allow for a better way to keep heap and possibly index
data more compacted, and it would also be a means of keeping tables
loosely clustered. It would also make it far easier to shrink heaps that
have become bloated, because the FSM could be told to favor pages at the
beginning of the relation.Interesting idea - page affinity implemented using the FSM.
WRT feasibility of BTREE organized tables, I'm not sure I see the problem.
Teradata implemented a hashing filesystem for their heap storage and I've
always wondered about how they handle collision and chaining efficiently,
but it's a solved problem for sure - knowing that makes the challenge that
much easier :-)
I know there were discussions in the past, though as per usual I can't
find them in the archives. At one point I had suggested clustering not
on a row level, but on a page level, since it doesn't really matter
terribly if the tuples in a page are clustered (worst case you can scan
the entire page).
I think one of the issues might have been: how will you handle other
indexes on the table when you can no longer point them at an item (since
items will need to move to maintain an IOT).
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
I think one of the issues might have been: how will you handle other
indexes on the table when you can no longer point them at an item (since
items will need to move to maintain an IOT).
I guess you shouldn't allow any other indexes. That's a perfectly
acceptable compromise I think... it would be still very useful for big
and narrow tables which would benefit from being clustered.
The other concern is how would you do sequential scans on the table if
items are allowed to move ? I think some other DBs have a facility to
make a "fast index scan" which is essentially a sequential scan of the
index file, something like that would be needed here too.
Cheers,
Csaba.
On Jun 27, 2006, at 9:39 AM, Jim C. Nasby wrote:
I think one of the issues might have been: how will you handle other
indexes on the table when you can no longer point them at an item
(since
items will need to move to maintain an IOT).
There are clean ways to handle this. The table is organized on the
primary key, a typical requirement for IOTs. Any indexes you add to
IOT reference the primary key of the heap tuple. Since the heap and
PK index are the same thing, external indexes use the PK as the tuple
identifier.
The only caveat is that this creates performance asymmetries. IOTs
have significantly faster access through their primary keys but
slower external index access since two B-Trees have to be traversed.
An IOT is typically only used for tables that are only accessed
through their primary key. Not supporting external indexes on IOTs
is a functional implementation (and probably recommended in
practice), though most real implementations allow external indexes if
not always in their first version.
J. Andrew Rogers
jrogers@neopolitan.com
Jim,
I know there were discussions in the past, though as per usual I can't
find them in the archives.
Search on "B-Tree Organized Tables".
From what I can find, this feature isn't prohibitively useless. It's just a
singnificant amount of effort for a result which is a tradeoff. That is,
you'd *only* want to use it on tables which are *always* accessed by their
primary key.
What stopped the features AFAICT is that the interested parties weren't up to
doing the code.
--
Josh Berkus
PostgreSQL @ Sun
San Francisco