Fractal tree indexing
Hi all,
Just a curiosity I couldn't control. I was recently reading about
Fractal tree indexing
(http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
how the TokuDB engine for MySQL is working really nicely with big data.
I was wondering, do we have support for fractal tree indexing? I mean,
it really does seem to help manage big data, so we could think of
supporting it in some form for our large data set clients (if it is
not happening already someplace which I have missed).
Regards,
Atri
--
Regards,
Atri
l'apprenant
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 13.02.2013 11:01, Atri Sharma wrote:
Hi all,
Just a curiosity I couldn't control. I was recently reading about
Fractal tree indexing
(http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
how the TokuDB engine for MySQL is working really nicely with big data.
Hmm, sounds very similar to the GiST buffering build work Alexander
Korotkov did for 9.2. Only the buffers are for B-trees rather than GiST,
and the buffers are permanent, rather than used only during index build.
It's also somewhat similar to the fast insert mechanism in GIN, except
that the gin fast insert buffer is just a single buffer, rather than a
buffer at each node.
I was wondering, do we have support for fractal tree indexing? I mean,
it really does seem to help manage big data, so we could think of
supporting it in some form for our large data set clients (if it is
not happening already someplace which I have missed).
There are no fractal trees in PostgreSQL today. Patches are welcome ;-).
- Heikki
On Wed, Feb 13, 2013 at 3:08 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
On 13.02.2013 11:01, Atri Sharma wrote:
Hi all,
Just a curiosity I couldn't control. I was recently reading about
Fractal tree indexing
(http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
how the TokuDB engine for MySQL is working really nicely with big data.

Hmm, sounds very similar to the GiST buffering build work Alexander Korotkov
did for 9.2. Only the buffers are for B-trees rather than GiST, and the
buffers are permanent, rather than used only during index build. It's also
somewhat similar to the fast insert mechanism in GIN, except that the gin
fast insert buffer is just a single buffer, rather than a buffer at each
node.

I was wondering, do we have support for fractal tree indexing? I mean,
it really does seem to help manage big data, so we could think of
supporting it in some form for our large data set clients (if it is
not happening already someplace which I have missed).

There are no fractal trees in PostgreSQL today. Patches are welcome ;-).
- Heikki
Hi Heikki,
Yeah, it is pretty close to GiST, but as you said, it still works on B-trees.
On the other hand, one thing I really liked about fractal trees is
that they attempt to address the problems with B-trees. I feel fractal
trees can provide us with a new way altogether to handle new data,
rather than building on top of B-trees.
I would love to chip in, but would require lots of help :)
Do you think building a new index in postgres with fractal trees as
the basis would serve the purpose? or is there something else we
should think of?
Atri
--
Regards,
Atri
l'apprenant
On Wed, Feb 13, 2013 at 1:38 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 13.02.2013 11:01, Atri Sharma wrote:
Hi all,
Just a curiosity I couldn't control. I was recently reading about
Fractal tree indexing
(http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
how the TokuDB engine for MySQL is working really nicely with big data.

Hmm, sounds very similar to the GiST buffering build work Alexander
Korotkov did for 9.2. Only the buffers are for B-trees rather than GiST,
and the buffers are permanent, rather than used only during index build.
It's also somewhat similar to the fast insert mechanism in GIN, except that
the gin fast insert buffer is just a single buffer, rather than a buffer at
each node.

I was wondering, do we have support for fractal tree indexing? I mean,
it really does seem to help manage big data, so we could think of
supporting it in some form for our large data set clients (if it is
not happening already someplace which I have missed).

There are no fractal trees in PostgreSQL today. Patches are welcome ;-).
I remember we have already discussed fractal trees privately. The short
conclusions were:
1) Fractal tree indexes are patented. They are distributed as a commercial
extension to MySQL, so we can't include them in the PostgreSQL core.
2) Tokutek can't provide full-fledged fractal tree indexes as a PostgreSQL
extension because of the lack of WAL extensibility.
We could think about WAL extensibility, which would help other applications
as well.
------
With best regards,
Alexander Korotkov.
On Wed, Alexander Korotkov wrote:
I remember we have already discussed fractal trees privately. The short
conclusions were:
1) Fractal tree indexes are patented. They are distributed as a commercial
extension to MySQL, so we can't include them in the PostgreSQL core.
2) Tokutek can't provide full-fledged fractal tree indexes as a PostgreSQL
extension because of the lack of WAL extensibility.
We could think about WAL extensibility, which would help other applications
as well.
Sounds nice. WAL extensibility can help.
Atri
--
Regards,
Atri
l'apprenant
On Wed, Feb 13, 2013 at 10:19 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
2) Tokutek can't provide full-fledged fractal tree indexes as a PostgreSQL
extension because of the lack of WAL extensibility.
We could think about WAL extensibility, which would help other applications
as well.

Sounds nice. WAL extensibility can help.
The problem with WAL extensibility is that extensions can come and go
and change over time. If the database can't interpret some WAL record
or misinterprets it because a module is missing or changed since that
record was written, then you could lose your whole database. I think a
fundamental part of extensibility is isolating the effects of the
extensions from the rest of the system so that problem would have to
be tackled. Perhaps making each file owned by a single resource
manager and having the database be able to deal with individual files
being corrupted. But that doesn't deal with all record types and there
are situations where you really want to have part of a file contain
data managed by another resource manager.
Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.
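A toy sketch of that binary-delta record (illustrative only, not PostgreSQL code; names and sizes are made up):

```python
# Diff the page image captured when the buffer was locked against the image
# at unlock time, keep only the changed byte runs, and replay by patching
# those runs back into the on-disk page.

PAGE_SIZE = 8192  # PostgreSQL's default block size

def make_delta(before: bytes, after: bytes):
    """Return the list of (offset, replacement_bytes) runs that differ."""
    assert len(before) == len(after) == PAGE_SIZE
    runs, i = [], 0
    while i < PAGE_SIZE:
        if before[i] != after[i]:
            j = i
            while j < PAGE_SIZE and before[j] != after[j]:
                j += 1
            runs.append((i, after[i:j]))
            i = j
        else:
            i += 1
    return runs

def apply_delta(page: bytes, runs) -> bytes:
    """Replay: patch the changed runs into the page image."""
    buf = bytearray(page)
    for offset, data in runs:
        buf[offset:offset + len(data)] = data
    return bytes(buf)
```

Because the delta is computed purely from the before and after images, it needs no knowledge of the extension's page layout, which is what would make the record type generic.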
--
greg
Sent from my iPad
On 13-Feb-2013, at 18:21, Greg Stark <stark@mit.edu> wrote:
Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.
How do we handle the case you mentioned, maybe a module that has been removed since a record was made? Is the solution that we encapsulate WAL from those kinds of changes, and keep the WAL records the same for everyone, irrespective of whether they use an external module or not? (I inferred this from Heikki's idea, or am I missing something here?)
Atri
On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark <stark@mit.edu> wrote:
Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.
It will, for sure, work well when atomic page changes are enough for us.
However, some operations, for example page splits, involve changes to
multiple pages. Replaying changes to only some of the pages is not correct. Right now,
it's hard for me to imagine how to generalize that into a generic WAL record
type.
------
With best regards,
Alexander Korotkov.
On 13.02.2013 15:31, Alexander Korotkov wrote:
On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark <stark@mit.edu> wrote:
Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.

It will, for sure, work well when atomic page changes are enough for us.
However, some operations, for example page splits, involve changes to
multiple pages. Replaying changes to only some of the pages is not correct. Right now,
it's hard for me to imagine how to generalize that into a generic WAL record
type.
You could have a generic WAL record that applies changes to multiple
pages atomically.
- Heikki
On 13-Feb-2013, at 19:05, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 13.02.2013 15:31, Alexander Korotkov wrote:
On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark <stark@mit.edu> wrote:
Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.

It will, for sure, work well when atomic page changes are enough for us.
However, some operations, for example page splits, involve changes to
multiple pages. Replaying changes to only some of the pages is not correct. Right now,
it's hard for me to imagine how to generalize that into a generic WAL record
type.

You could have a generic WAL record that applies changes to multiple pages atomically.

Sounds extremely interesting and fun. How would we go about implementing it?
Atri
On 13 February 2013 13:35, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
You could have a generic WAL record that applies changes to multiple pages
atomically.
I think it's a good idea, the best idea even, but we still have no idea
what the requirements are without a clear case for an external index.
It could easily turn out that we invent a plausible API that's not
actually of use because of requirements for locking. Whoever wants
that can do the legwork.
IIRC each of the new index types has required some changes to the
generic APIs, which makes sense.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 13-Feb-2013, at 19:31, Simon Riggs <simon@2ndQuadrant.com> wrote:
I think it's a good idea, the best idea even, but we still have no idea
what the requirements are without a clear case for an external index.
It could easily turn out that we invent a plausible API that's not
actually of use because of requirements for locking. Whoever wants
that can do the legwork.

IIRC each of the new index types has required some changes to the
generic APIs, which makes sense.

Does that mean we can add support for fractal tree indexes (or something along similar lines) in the regular way by changing the generic APIs?
IMO, we could design the fractal tree index and use it as the use case for the generic WAL record (I am kind of obsessed with the idea of seeing fractal indexes supported in Postgres).
Atri
On 02/13/2013 09:13 AM, Atri Sharma wrote:
On 13-Feb-2013, at 19:31, Simon Riggs <simon@2ndQuadrant.com> wrote:
I think it's a good idea, the best idea even, but we still have no idea
what the requirements are without a clear case for an external index.
It could easily turn out that we invent a plausible API that's not
actually of use because of requirements for locking. Whoever wants
that can do the legwork.

IIRC each of the new index types has required some changes to the
generic APIs, which makes sense.

Does that mean we can add support for fractal tree indexes (or something along similar lines) in the regular way by changing the generic APIs?

IMO, we could design the fractal tree index and use it as the use case for the generic WAL record (I am kind of obsessed with the idea of seeing fractal indexes supported in Postgres).
If they are patented as Alexander says upthread, then surely the idea is
dead in the water.
cheers
andrew
If they are patented as Alexander says upthread, then surely the idea is dead in the water.
True, I think so too.
But, the generic WAL seems an awesome idea and I would love to help.
Atri
On 02/13/2013 10:43 PM, Andrew Dunstan wrote:
On 02/13/2013 09:13 AM, Atri Sharma wrote:
On 13-Feb-2013, at 19:31, Simon Riggs <simon@2ndQuadrant.com> wrote:
I think it's a good idea, the best idea even, but we still have no idea
what the requirements are without a clear case for an external index.
It could easily turn out that we invent a plausible API that's not
actually of use because of requirements for locking. Whoever wants
that can do the legwork.

IIRC each of the new index types has required some changes to the
generic APIs, which makes sense.

Does that mean we can add support for fractal tree indexes (or something along similar lines) in the regular way by changing the generic APIs?

IMO, we could design the fractal tree index and use it as the use case for the generic WAL record (I am kind of obsessed with the idea of seeing fractal indexes supported in Postgres).

If they are patented as Alexander says upthread, then surely the idea
is dead in the water.
Isn't practically everything patented, with varying degrees of validity
and patent defensibility? Particularly the trick of "renewing" expired
patents by submitting tiny variations.
I realise that this general situation is different to knowing about a
specific patent applying to a specific proposed technique, particularly
regarding the USA's insane triple-damages-for-knowing-about-it thing,
and that a patent can well and truly block the adoption of a technique
into Pg core. It might not prevent its implementation as an out-of-tree
extension though, even if that requires some enhancements to core APIs
to make it possible.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Atri Sharma <atri.jiit@gmail.com> writes:
Do you think building a new index in postgres with fractal trees as
the basis would serve the purpose? or is there something else we
should think of?
First explain why you couldn't build it as an opclass for gist or
spgist ...
regards, tom lane
On 13-Feb-2013, at 20:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
First explain why you couldn't build it as an opclass for gist or
spgist ...
That needs thinking about a bit. I was confused about the current indexes because they all build on B-trees. But building an opclass with GiST should be a good solution.
Atri
On 13.02.2013 17:49, Atri Sharma wrote:
On 13-Feb-2013, at 20:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
First explain why you couldn't build it as an opclass for gist or
spgist ...

That needs thinking about a bit. I was confused about the current indexes because they all build on B-trees. But building an opclass with GiST should be a good solution.
That makes no sense. I don't see any way to implement this in an
opclass, and it wouldn't make sense to re-implement this for every
opclass anyway.
The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.
This speeds up insertions, as you don't need to fetch and update the
right leaf page on every insert; the lower-level pages are updated in
batch as a buffer fills up.
As I said earlier, this is very similar to the way the GiST buffering
build algorithm works. It could be applied to any tree-structured access
method, including b-tree, GiST and SP-GiST.
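As a toy illustration of that scheme (an invented sketch, not TokuDB's or PostgreSQL's code; the structure and names are made up), the insert path of such a buffered tree might look like:

```python
# Each internal node carries a buffer of pending tuples. Inserts land in the
# root's buffer; a full buffer is flushed one level down in a batch, and so
# on recursively, so leaf pages are updated in batches rather than per-insert.

BUFFER_CAPACITY = 4  # tiny, to make the flushes visible

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # separator keys (internal) or tuples (leaf)
        self.children = children or []  # empty for leaf nodes
        self.buffer = []                # pending tuples (internal nodes only)

    def is_leaf(self):
        return not self.children

def insert(root, key):
    """Instead of descending to a leaf, drop the key into the root's buffer."""
    if root.is_leaf():
        root.keys.append(key)
        root.keys.sort()
        return
    root.buffer.append(key)
    if len(root.buffer) >= BUFFER_CAPACITY:
        flush(root)

def flush(node):
    """Push buffered tuples one level down, recursing if a child fills up."""
    pending, node.buffer = node.buffer, []
    for key in pending:
        # route by separator keys, as in an ordinary B-tree descent
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        child = node.children[i]
        if child.is_leaf():
            child.keys.append(key)
            child.keys.sort()
        else:
            child.buffer.append(key)
            if len(child.buffer) >= BUFFER_CAPACITY:
                flush(child)
```

Searches would have to check the buffers along the root-to-leaf path as well as the leaf itself, which is the price paid for the cheaper inserts.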
- Heikki
That makes no sense. I don't see any way to implement this in an opclass, and it wouldn't make sense to re-implement this for every opclass anyway.
The basic idea of a fractal tree index is to attach a buffer to every non-leaf page. On insertion, instead of descending all the way down to the correct leaf page, the new tuple is put on the buffer at the root page. When that buffer fills up, all the tuples in the buffer are cascaded down to the buffers on the next level pages. And recursively, whenever a buffer fills up at any level, it's flushed to the next level. This speeds up insertions, as you don't need to fetch and update the right leaf page on every insert; the lower-level pages are updated in batch as a buffer fills up.
As I said earlier, this is very similar to the way the GiST buffering build algorithm works. It could be applied to any tree-structured access method, including b-tree, GiST and SP-GiST.
Can we implement it in a generic manner then? I mean, irrespective of the tree it is being applied to, be it B-tree, GiST or SP-GiST?
Another thing: in the case of a large tree which is split over multiple pages, how do we reduce the cost of the I/O needed to fetch and rewrite all those pages?
Atri
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.
[ scratches head... ] What's "fractal" about that? Or is that just a
content-free marketing name for this technique?
regards, tom lane
On 02/13/2013 11:20 AM, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ] What's "fractal" about that? Or is that just a
content-free marketing name for this technique?
And if that's all it is then I have some doubt about its patentability.
For one thing I'd be mildly surprised if there weren't prior art. But of
course, IANAL :-)
cheers
andrew
On 13.02.2013 18:20, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ] What's "fractal" about that? Or is that just a
content-free marketing name for this technique?
I'd call it a marketing name. I guess it's fractal in the sense
that all levels of the tree can hold "leaf tuples" in the buffers; the
structure looks the same no matter how deep you zoom, like a fractal.
But "Buffered" would be more appropriate IMO.
- Heikki
On 13.02.2013 18:43, Andrew Dunstan wrote:
On 02/13/2013 11:20 AM, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ] What's "fractal" about that? Or is that just a
content-free marketing name for this technique?

And if that's all it is then I have some doubt about its patentability.
For one thing I'd be mildly surprised if there weren't prior art. But of
course, IANAL :-)
Agreed, but IANAL either. The papers the GiST buffering build algorithm
was based on pre-date Tokutek's fractal tree indexes, for starters. Of course,
all I know about fractal tree indexes is what I've read in some presentation
slides on the 'net, so I might be missing something.
- Heikki
Yeah, it is just a fancy name for something that has nothing to do with fractals. I guess everything suave these days is fractal!

That said, the buffered concept itself looks really cool and should help us with large data sets. I am eager to get off the mark with it.

Will we be building the index from scratch? And how would we go about it?

We will need to be careful about the buffer size per node, in order to ensure that pushing tuples down between nodes in large data sets does not become a substantial overhead.
Atri
On 13-Feb-2013, at 22:21, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 13.02.2013 18:43, Andrew Dunstan wrote:
On 02/13/2013 11:20 AM, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ] What's "fractal" about that? Or is that just a
content-free marketing name for this technique?

And if that's all it is then I have some doubt about its patentability.
For one thing I'd be mildly surprised if there weren't prior art. But of
course, IANAL :-)

Agreed, but IANAL either. The papers the GiST buffering build algorithm was based on pre-date Tokutek's fractal tree indexes, for starters. Of course, all I know about fractal tree indexes is what I've read in some presentation slides on the 'net, so I might be missing something.
- Heikki
On 13-Feb-2013, at 22:21, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 13.02.2013 18:43, Andrew Dunstan wrote:
On 02/13/2013 11:20 AM, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
All I know about fractal tree indexes is what I've read in some presentation slides on the 'net, so I might be missing something.
I think that buffering can be applied to B-trees, R-trees and GiST in more or less the same manner.
Is there a way we can abstract the buffering part out of them all?
Atri
On 13 February 2013 16:48, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 13.02.2013 18:20, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ] What's "fractal" about that? Or is that just a
content-free marketing name for this technique?

I'd call it a marketing name. I guess it's fractal in the sense that
all levels of the tree can hold "leaf tuples" in the buffers; the structure
looks the same no matter how deep you zoom, like a fractal. But "Buffered"
would be more appropriate IMO.
I hope for their sake there is more to it than that. It's hard to see
how buffering can be patented.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 02/13/2013 09:54 AM, Simon Riggs wrote:
I'd call it a marketing name. I guess it's fractal in the sense that
all levels of the tree can hold "leaf tuples" in the buffers; the structure
looks the same no matter how deep you zoom, like a fractal. But "Buffered"
would be more appropriate IMO.

I hope for their sake there is more to it than that. It's hard to see
how buffering can be patented.
Talk to Apple about that. It only needs to be worded correctly.
JD
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
On 02/13/2013 01:01 AM, Atri Sharma wrote:
Hi all,
Just a curiosity I couldn't control. I was recently reading about
Fractal tree indexing
(http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
how the TokuDB engine for MySQL is working really nicely with big data.

I was wondering, do we have support for fractal tree indexing? I mean,
it really does seem to help manage big data, so we could think of
supporting it in some form for our large data set clients (if it is
not happening already someplace which I have missed).
Sadly, fractal trees are an invention of Tokutek and are heavily and
publicly patented. That's why I haven't pursued the idea, and indeed
why Tokutek hasn't developed a PostgreSQL plug-in.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Wed, Feb 13, 2013 at 8:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ] What's "fractal" about that? Or is that just a
content-free marketing name for this technique?
The name in the literature is "Cache-Oblivious Lookahead Array", aka
COLA. The authors are also founders of TokuTek, and seem to have
taken pains to ring-fence mentions of the algorithm with references to
its patent.
Well, at least nobody can blame them for submarine patent action.
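For reference, the basic COLA insert path from the literature can be sketched like this (an invented toy illustration, not TokuTek's implementation):

```python
# Level k holds either 0 or 2**k sorted items; inserting a new item merges
# full levels downward, much like binary addition with carries.
import bisect
import heapq

class COLA:
    def __init__(self):
        self.levels = []  # levels[k] is [] or a sorted list of 2**k items

    def insert(self, item):
        carry = [item]
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                self.levels[k] = carry
                return
            # level k is full: merge it into the carry and keep going down
            carry = list(heapq.merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1

    def search(self, item):
        # a real COLA speeds this up with fractional cascading; a plain
        # binary search per level keeps the sketch short
        for level in self.levels:
            i = bisect.bisect_left(level, item)
            if i < len(level) and level[i] == item:
                return True
        return False
```

After n inserts the occupied levels mirror the binary representation of n, and each item only ever moves downward, at most once per level, which is where the amortized insert speedup comes from.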
--
fdr
On Wed, Feb 13, 2013 at 1:31 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark <stark@mit.edu> wrote:
Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.

It will, for sure, work well when atomic page changes are enough for us.
However, some operations, for example page splits, involve changes to
multiple pages. Replaying changes to only some of the pages is not correct. Right now,
it's hard for me to imagine how to generalize that into a generic WAL record
type.
I think multiple-page operations where you have all the pages locked
at the same time, like tuple updates for example, are fairly simple.
The existing WAL format can handle at most three such buffers in a
single record so we can just have a fixed size buffer large enough to
hold three buffers and perform the diff on the three when the
extension says it has completed an atomic update. The simplest API
which I suspect would suffice for virtually all users would be to tie
this to buffer locking and unlocking.
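The binary-delta idea can be sketched as follows. This is a hypothetical byte-range diff over a single page image, not PostgreSQL's actual WAL record format: capture the image when the buffer is locked, diff it against the image at unlock time, and log only the changed span.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGESZ 8192

/* Hypothetical WAL delta: the smallest contiguous byte range that
 * differs between the page image at lock time and at unlock time. */
typedef struct {
    uint32_t off;
    uint32_t len;
    uint8_t  data[PAGESZ];
} PageDelta;

/* Compute the delta; returns 0 if the page is unchanged. */
static int delta_compute(const uint8_t *oldimg, const uint8_t *newimg,
                         PageDelta *d) {
    uint32_t lo = 0, hi = PAGESZ;
    while (lo < PAGESZ && oldimg[lo] == newimg[lo])
        lo++;                               /* skip common prefix */
    if (lo == PAGESZ)
        return 0;
    while (hi > lo && oldimg[hi - 1] == newimg[hi - 1])
        hi--;                               /* skip common suffix */
    d->off = lo;
    d->len = hi - lo;
    memcpy(d->data, newimg + lo, d->len);
    return 1;
}

/* Redo: applying the delta to the old image reproduces the new one. */
static void delta_apply(uint8_t *page, const PageDelta *d) {
    memcpy(page + d->off, d->data, d->len);
}
```

A real record would hold up to three such per-page deltas, as described above, and the diff/apply would be driven by the buffer lock and unlock calls rather than by the extension explicitly.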
The types of records which this would not suffice for would be
a) things that need extra book-keeping during recovery in case some
later record was not recorded. We have only a few such records -- page
splits as you say -- and hopefully extensions wouldn't need them
because frankly they're pretty scary.
b) records that are needed not just to maintain consistency of the
data in the database but to provide some other behaviour -- for
instance the WAL records for locks that 2PC needs or the snapshot
records that hot standby needs. But then these types of records might
be more amenable to an extensible WAL format. Since the loss of them
wouldn't leave the database corrupted, just prevent that feature from
operating correctly.
--
greg
Hi hackers!
Here is a prototype of a procrastinating GiST. Or lazy GiST. Or buffered GiST.
Indeed, there is nothing fractal about it.
The concept is that leaf tuples stall on internal pages. From time to time
they are pushed down in batches. This code is far from perfect; I've coded
it only for research purposes.
I have tested the code with insertion of random 3D cubes. At 1 million
insertions classic GiST is 8% faster, at 3M performance is equal, and at
30M lazy GiST is 12% faster.
I've tested it with an SSD; maybe with an HDD the difference would be more
significant.
Test scenario is here https://github.com/x4m/postgres_g/blob/bufdev/test.sql
In theory, this is a cache-friendly implementation (though not
cache-oblivious), and it should be faster even for small datasets without
heavy disk work. But it has the extra cost of copying a batch of tuples to
a palloc'd array; I couldn't avoid that.
This is just a proof-of-concept for performance measures:
1. make check fails for two tests
2. WAL is broken
3. code is a mess in some places
I'm not sure 12% of performance is worth it. I'd appreciate any ideas on
what to do next.
Best regards, Andrey Borodin.
2013-02-13 22:54 GMT+05:00 Simon Riggs <simon@2ndquadrant.com>:
On 13 February 2013 16:48, Heikki Linnakangas <hlinnakangas@vmware.com>
wrote:

On 13.02.2013 18:20, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ] What's "fractal" about that? Or is that just a
content-free marketing name for this technique?

I'd call it out as a marketing name. I guess it's fractal in the sense that
all levels of the tree can hold "leaf tuples" in the buffers; the structure
looks the same no matter how deep you zoom, like a fractal. But "Buffered"
would be more appropriate IMO.
I hope for their sake there is more to it than that. It's hard to see
how buffering can be patented.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
lazygist_v1.diff (text/plain; charset=UTF-8)
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index b8aa9bc..29770d2 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -265,6 +265,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
bool is_rootsplit;
int npage;
+
is_rootsplit = (blkno == GIST_ROOT_BLKNO);
/*
@@ -524,6 +525,8 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);
}
+ GistPageGetOpaque(page)->nlazy=1; //DO NOT FORGET: remark pages after split
+
MarkBufferDirty(buffer);
if (BufferIsValid(leftchildbuf))
@@ -589,6 +592,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
* this routine assumes it is invoked in a short-lived memory context,
* so it does not bother releasing palloc'd allocations.
*/
+
void
gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
{
@@ -597,7 +601,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
GISTInsertStack firststack;
GISTInsertStack *stack;
GISTInsertState state;
- bool xlocked = false;
memset(&state, 0, sizeof(GISTInsertState));
state.freespace = freespace;
@@ -610,6 +613,21 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
firststack.downlinkoffnum = InvalidOffsetNumber;
state.stack = stack = &firststack;
+
+ gistdoinsertloop(r,itup,freespace, giststate, stack, state, 0);
+}
+
+void
+gistdoinsertloop(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate,GISTInsertStack *givenstack, GISTInsertState state, GISTInsertStack *nolazy)
+{
+ ItemId iid;
+ IndexTuple idxtuple;
+
+ bool xlocked = false;
+ GISTInsertStack *stack = givenstack;
+
+
+
/*
* Walk down along the path of smallest penalty, updating the parent
* pointers with the key we're inserting as we go. If we crash in the
@@ -685,9 +703,86 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
GISTInsertStack *item;
OffsetNumber downlinkoffnum;
+
+
+ if(stack!=nolazy)
+ {
+ if(GistPageCanStoreLazy(stack->page, itup))
+ {
+ if (!xlocked)
+ {
+ LockBuffer(stack->buffer, GIST_UNLOCK);
+ LockBuffer(stack->buffer, GIST_EXCLUSIVE);
+ xlocked = true;
+ stack->page = (Page) BufferGetPage(stack->buffer);
+
+ if (PageGetLSN(stack->page) != stack->lsn)
+ {
+ /* the page was changed while we unlocked it, retry */
+ continue;
+ }
+ }
+
+ //elog(WARNING,"writing lazy tuple");
+
+ GistTupleSetLazy(itup);
+ gistinserttuple(&state, stack, giststate, itup,
+ InvalidOffsetNumber);
+ LockBuffer(stack->buffer, GIST_UNLOCK);
+ xlocked = false;
+
+ break;
+ }
+ else if (GistPageGetOpaque(stack->page)->nlazy)
+ {
+ OffsetNumber i;
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(stack->page);
+
+ //elog(WARNING,"starting push");
+ if (!xlocked)
+ {
+ //elog(WARNING,"xloc for push");
+ LockBuffer(stack->buffer, GIST_UNLOCK);
+ LockBuffer(stack->buffer, GIST_EXCLUSIVE);
+ xlocked = true;
+ stack->page = (Page) BufferGetPage(stack->buffer);
+
+ if (PageGetLSN(stack->page) != stack->lsn)
+ {
+ /* the page was changed while we unlocked it, retry */
+ continue;
+ }
+ }
+
+ int nlen;
+ IndexTuple* lazys = gistextractlazy(stack->page,&nlen);
+
+ LockBuffer(stack->buffer, GIST_UNLOCK);
+
+ xlocked = false;
+
+ for (i = 0; i < nlen; i++)
+ {
+ IndexTuple ltup = lazys[i];
+ //elog(WARNING,"push %d",i);
+ GistTupleUnsetLazy(ltup);
+ gistdoinsertloop(r,ltup,freespace,giststate,stack,state,stack);
+ //elog(WARNING,"push out, on page %d tuples", PageGetMaxOffsetNumber(stack->page));
+ }
+ LockBuffer(stack->buffer, GIST_SHARE);
+ GistPageGetOpaque(stack->page)->nlazy = 0;
+ }
+
+ }
+ else
+ {
+ //elog(WARNING,"skipped laziness");
+ }
+
downlinkoffnum = gistchoose(state.r, stack->page, itup, giststate);
iid = PageGetItemId(stack->page, downlinkoffnum);
idxtuple = (IndexTuple) PageGetItem(stack->page, iid);
+ //elog(WARNING,"picked %d",downlinkoffnum);
childblkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
/*
@@ -753,6 +848,7 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
continue;
}
}
+
LockBuffer(stack->buffer, GIST_UNLOCK);
xlocked = false;
@@ -825,12 +921,20 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
InvalidOffsetNumber);
LockBuffer(stack->buffer, GIST_UNLOCK);
- /* Release any pins we might still hold before exiting */
- for (; stack; stack = stack->parent)
- ReleaseBuffer(stack->buffer);
+
break;
}
}
+
+
+ /* Release any pins we might still hold before exiting */
+ for (; stack; stack = stack->parent)
+ {
+ if(nolazy != stack)
+ ReleaseBuffer(stack->buffer);
+ if(stack == givenstack)
+ break;
+ }
}
/*
@@ -1256,6 +1360,7 @@ gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
GISTPageSplitInfo *left;
IndexTuple tuples[2];
+
/* A split always contains at least two halves */
Assert(list_length(splitinfo) >= 2);
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 5ba7d0a..c51c842 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -197,7 +197,7 @@ gistindex_keytest(IndexScanDesc scan,
gistdentryinit(giststate, key->sk_attno - 1, &de,
datum, r, page, offset,
- FALSE, isNull);
+ FALSE, isNull, GistTupleIsLazy(tuple));
/*
* Call the Consistent function to evaluate the test. The
@@ -258,7 +258,7 @@ gistindex_keytest(IndexScanDesc scan,
gistdentryinit(giststate, key->sk_attno - 1, &de,
datum, r, page, offset,
- FALSE, isNull);
+ FALSE, isNull, GistTupleIsLazy(tuple));
/*
* Call the Distance function to evaluate the distance. The
@@ -422,7 +422,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
if (!match)
continue;
- if (tbm && GistPageIsLeaf(page))
+ if (tbm && (GistPageIsLeaf(page) || GistTupleIsLazy(it)))
{
/*
* getbitmap scan, so just push heap tuple TIDs into the bitmap
@@ -431,7 +431,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
tbm_add_tuples(tbm, &it->t_tid, 1, recheck);
(*ntids)++;
}
- else if (scan->numberOfOrderBys == 0 && GistPageIsLeaf(page))
+ else if (scan->numberOfOrderBys == 0 && (GistPageIsLeaf(page) || GistTupleIsLazy(it)))
{
/*
* Non-ordered scan, so report tuples in so->pageData[]
@@ -466,7 +466,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
/* Create new GISTSearchItem for this item */
item = palloc(SizeOfGISTSearchItem(scan->numberOfOrderBys));
- if (GistPageIsLeaf(page))
+ if ((GistPageIsLeaf(page) || GistTupleIsLazy(it)))
{
/* Creating heap-tuple GISTSearchItem */
item->blkno = InvalidBlockNumber;
diff --git a/src/backend/access/gist/gistsplit.c b/src/backend/access/gist/gistsplit.c
index d394969..c9680db 100644
--- a/src/backend/access/gist/gistsplit.c
+++ b/src/backend/access/gist/gistsplit.c
@@ -643,7 +643,7 @@ gistSplitByKey(Relation r, Page page, IndexTuple *itup, int len,
&IsNull);
gistdentryinit(giststate, attno, &(entryvec->vector[i]),
datum, r, page, i,
- FALSE, IsNull);
+ FALSE, IsNull, GistTupleIsLazy(itup[i-1]));
if (IsNull)
offNullTuples[nOffNullTuples++] = i;
}
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 887c58b..897e7f0 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -105,6 +105,45 @@ gistextractpage(Page page, int *len /* out */ )
return itvec;
}
+
+IndexTuple *
+gistextractlazy(Page page, int *len /* out */ )
+{
+ OffsetNumber i,
+ maxoff;
+ IndexTuple *itvec;
+ OffsetNumber* itemNos;
+ //elog(WARNING,"extracting for push");
+
+ maxoff = PageGetMaxOffsetNumber(page);
+ *len = 0;
+ itvec = palloc(sizeof(IndexTuple) * maxoff);
+ itemNos = palloc(sizeof(OffsetNumber) * maxoff);
+ for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ {
+ ItemId tid = PageGetItemId(page, i);
+ IndexTuple it = (IndexTuple) PageGetItem(page, tid);
+ if(GistTupleIsLazy(it))
+ {
+ //elog(WARNING,"extract item %d",i);
+ IndexTuple copy = palloc(tid->lp_len);
+ //elog(WARNING,"memmove",i);
+ memmove(copy,it,tid->lp_len);
+ //elog(WARNING,"offset %d to index %d",i,*len);
+ itemNos[*len] = i;
+ //elog(WARNING,"ptr",i);
+ itvec[*len] = copy;
+ *len = *len+1;
+ //elog(WARNING,"done",i);
+ }
+ }
+
+ //elog(WARNING,"deleting %d",*len);
+ PageIndexMultiDelete(page,itemNos,*len);
+
+ return itvec;
+}
+
/*
* join two vectors into one
*/
@@ -178,7 +217,7 @@ gistMakeUnionItVec(GISTSTATE *giststate, IndexTuple *itvec, int len,
evec->vector + evec->n,
datum,
NULL, NULL, (OffsetNumber) 0,
- FALSE, IsNull);
+ FALSE, IsNull, GistTupleIsLazy(itvec[j]));
evec->n++;
}
@@ -302,7 +341,7 @@ gistDeCompressAtt(GISTSTATE *giststate, Relation r, IndexTuple tuple, Page p,
datum = index_getattr(tuple, i + 1, giststate->tupdesc, &isnull[i]);
gistdentryinit(giststate, i, &attdata[i],
datum, r, p, o,
- FALSE, isnull[i]);
+ FALSE, isnull[i], GistTupleIsLazy(tuple));
}
}
@@ -438,6 +477,9 @@ gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */
bool zero_penalty;
int j;
+ if(GistTupleIsLazy(itup))
+ continue;
+
zero_penalty = true;
/* Loop over index attributes. */
@@ -450,7 +492,7 @@ gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */
/* Compute penalty for this column. */
datum = index_getattr(itup, j + 1, giststate->tupdesc, &IsNull);
gistdentryinit(giststate, j, &entry, datum, r, p, i,
- FALSE, IsNull);
+ FALSE, IsNull, GistTupleIsLazy(itup));
usize = gistpenalty(giststate, j, &entry, IsNull,
&identry[j], isnull[j]);
if (usize > 0)
@@ -542,7 +584,7 @@ gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */
void
gistdentryinit(GISTSTATE *giststate, int nkey, GISTENTRY *e,
Datum k, Relation r, Page pg, OffsetNumber o,
- bool l, bool isNull)
+ bool l, bool isNull, bool lazy)
{
if (!isNull)
{
@@ -557,9 +599,20 @@ gistdentryinit(GISTSTATE *giststate, int nkey, GISTENTRY *e,
if (dep != e)
gistentryinit(*e, dep->key, dep->rel, dep->page, dep->offset,
dep->leafkey);
+ if(lazy)
+ {
+ dep->leafpage = false;
+ }
}
else
+ {
gistentryinit(*e, (Datum) 0, r, pg, o, l);
+
+ if(lazy)
+ {
+ e->leafpage = false;
+ }
+ }
}
IndexTuple
diff --git a/src/include/access/gist.h b/src/include/access/gist.h
index 4343d6f..ffa9048 100644
--- a/src/include/access/gist.h
+++ b/src/include/access/gist.h
@@ -59,6 +59,7 @@ typedef struct GISTPageOpaqueData
{
PageGistNSN nsn; /* this value must change on page split */
BlockNumber rightlink; /* next page if any */
+ uint16 nlazy;
uint16 flags; /* see bit definitions above */
uint16 gist_page_id; /* for identification of GiST indexes */
} GISTPageOpaqueData;
@@ -125,12 +126,13 @@ typedef struct GISTENTRY
Page page;
OffsetNumber offset;
bool leafkey;
+ bool leafpage;
} GISTENTRY;
#define GistPageGetOpaque(page) ( (GISTPageOpaque) PageGetSpecialPointer(page) )
#define GistPageIsLeaf(page) ( GistPageGetOpaque(page)->flags & F_LEAF)
-#define GIST_LEAF(entry) (GistPageIsLeaf((entry)->page))
+#define GIST_LEAF(entry) (((entry)->leafpage))
#define GistPageIsDeleted(page) ( GistPageGetOpaque(page)->flags & F_DELETED)
#define GistPageSetDeleted(page) ( GistPageGetOpaque(page)->flags |= F_DELETED)
@@ -168,6 +170,6 @@ typedef struct
*/
#define gistentryinit(e, k, r, pg, o, l) \
do { (e).key = (k); (e).rel = (r); (e).page = (pg); \
- (e).offset = (o); (e).leafkey = (l); } while (0)
+ (e).offset = (o); (e).leafkey = (l); (e).leafpage = (pg) && GistPageIsLeaf(pg);} while (0)
#endif /* GIST_H */
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 78e87a6..669d697 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -320,6 +320,7 @@ typedef struct
#define GistTupleIsInvalid(itup) ( ItemPointerGetOffsetNumber( &((itup)->t_tid) ) == TUPLE_IS_INVALID )
#define GistTupleSetValid(itup) ItemPointerSetOffsetNumber( &((itup)->t_tid), TUPLE_IS_VALID )
+#define GistPageCanStoreLazy(page,itup) ((int) ((PageHeader) page)->pd_upper - (int) ((PageHeader) page)->pd_lower - IndexTupleSize(itup) > 1024)
@@ -435,6 +436,13 @@ extern void gistdoinsert(Relation r,
IndexTuple itup,
Size freespace,
GISTSTATE *GISTstate);
+extern void gistdoinsertloop(Relation r,
+ IndexTuple itup,
+ Size freespace,
+ GISTSTATE *GISTstate,
+ GISTInsertStack *stack,
+ GISTInsertState state,
+ GISTInsertStack *nolazy);
/* A List of these is returned from gistplacetopage() in *splitinfo */
typedef struct
@@ -498,6 +506,7 @@ extern Buffer gistNewBuffer(Relation r);
extern void gistfillbuffer(Page page, IndexTuple *itup, int len,
OffsetNumber off);
extern IndexTuple *gistextractpage(Page page, int *len /* out */ );
+extern IndexTuple *gistextractlazy(Page page, int *len /* out */ );
extern IndexTuple *gistjoinvector(
IndexTuple *itvec, int *len,
IndexTuple *additvec, int addlen);
@@ -519,7 +528,7 @@ extern OffsetNumber gistchoose(Relation r, Page p,
extern void GISTInitBuffer(Buffer b, uint32 f);
extern void gistdentryinit(GISTSTATE *giststate, int nkey, GISTENTRY *e,
Datum k, Relation r, Page pg, OffsetNumber o,
- bool l, bool isNull);
+ bool l, bool isNull, bool lazy);
extern float gistpenalty(GISTSTATE *giststate, int attno,
GISTENTRY *key1, bool isNull1,
diff --git a/src/include/access/itup.h b/src/include/access/itup.h
index 8350fa0..d96dd94 100644
--- a/src/include/access/itup.h
+++ b/src/include/access/itup.h
@@ -72,6 +72,11 @@ typedef IndexAttributeBitMapData *IndexAttributeBitMap;
#define IndexTupleHasNulls(itup) ((((IndexTuple) (itup))->t_info & INDEX_NULL_MASK))
#define IndexTupleHasVarwidths(itup) ((((IndexTuple) (itup))->t_info & INDEX_VAR_MASK))
+#define TUPLE_IS_LAZY 0x2000
+#define GistTupleIsLazy(itup) (((IndexTuple)itup)->t_info & TUPLE_IS_LAZY)
+#define GistTupleSetLazy(itup) do { ((IndexTuple)itup)->t_info = ((IndexTuple)itup)->t_info | TUPLE_IS_LAZY; } while(0)
+#define GistTupleUnsetLazy(itup) do { ((IndexTuple)itup)->t_info = ((IndexTuple)itup)->t_info & ~TUPLE_IS_LAZY; } while(0)
+
/*
* Takes an infomask as argument (primarily because this needs to be usable
diff --git a/test.sh b/test.sh
new file mode 100755
index 0000000..3767958
--- /dev/null
+++ b/test.sh
@@ -0,0 +1,2 @@
+#!/bin/sh
+/home/x4m/project/bin/psql postgres</home/x4m/pgsql/test.sql
diff --git a/test.sql b/test.sql
new file mode 100644
index 0000000..34a89d9
--- /dev/null
+++ b/test.sql
@@ -0,0 +1,27 @@
+\timing
+SET client_min_messages = 'DEBUG5';
+SET log_min_messages = 'DEBUG5';
+SET wal_level = 'minimal';
+
+create extension if not exists cube;
+
+begin transaction;
+SELECT setseed(.43);
+
+
+create table dataTable(c cube);
+create index idx on dataTable using gist(c);
+
+insert into dataTable(c) select cube(array[random(),random(),random()]) from generate_series(1,1e2,1) x,generate_series(1,1e2,1) y,generate_series(1,3e3,1) z;
+
+
+/*create table queries(id int,l1 float,l2 float,l3 float, u1 float,u2 float, u3 float, q cube);
+insert into queries(id,l1,l2,l3) select s,random()/1.3,random()/1.3,random()/1.3 from generate_series(1,1e4,1) s;
+update queries set q = cube(array[l1,l2,l3],array[l1+0.01001,l2+0.10001,l3+0.10001]);
+
+select id,(select count(*) from dataTable dt where dt.c<@q) from queries order by id;*/
+
+rollback;
+--drop index idx;
+--drop table dataTable;
+--drop table queries;