recursing down a tree
hi,
say i have a table with something like
id,parent,description
and parent points to an id of the very same table. now i have
a specific id and i want to recurse down the tree to the root.
is it in any way possible to do that without to doing a sql query
for each level ? until today i always saved the parent and formulated
a new query using it to get the next upper level. however, it seems to
me that this might be quite some work given a tree of a larger size.
thanks for your ideas
carl
Carl,
you certainly need our "famous" module contrib/tree
(http://www.sai.msu.su/~megera/postgres/gist/)
Documentation is sparse and written in english but some people in
mailing list already use it.
Oleg
On Fri, 28 Jun 2002, Carl Meyer wrote:
hi,
say i have a table with something like
id,parent,description
and parent points to an id of the very same table. now i have
a specific id and i want to recurse down the tree to the root.
is it in any way possible to do that without to doing a sql query
for each level ? until today i always saved the parent and formulated
a new query using it to get the next upper level. however, it seems to
me that this might be quite some work given a tree of a larger size.thanks for your ideas
carl---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Carl Meyer wrote:
say i have a table with something like
id,parent,description
and parent points to an id of the very same table. now i have
a specific id and i want to recurse down the tree to the root.
is it in any way possible to do that without to doing a sql query
for each level ? until today i always saved the parent and formulated
a new query using it to get the next upper level. however, it seems to
me that this might be quite some work given a tree of a larger size.thanks for your ideas
carl
Well, this calls for recursive union support as per SQL '99. The
query would be
CREATE TABLE MyTree(id, parent, description);
WITH tmp(id) AS (
SELECT parent FROM MyTree WHERE id=$myId
UNION ALL
SELECT MyTree.parent
FROM tmp INNER JOIN MyTree ON tmp.id = MyTree.id
) SELECT *;
This is most powerful if you search for children though, since
then you get the full power of the recursive union join to find
more and more stuff:
WITH tmp(id) AS (
SELECT id FROM MyTree WHERE id=$myId
UNION ALL
SELECT MyTree.id
FROM tmp INNER JOIN MyTree ON MyTree.parent = tmp.id
) SELECT *;
Now, this is not supported in PostgreSQL, and I only know of DB2
implementing it actually. But it's way cool!
In the absence of this feature ... and even to speed some of such
queries up, you can use some tree representation in a special
field. There are many approaches. An interval method is theoretically
nice because it has a constant storage requirement regardless of
depth and fan-out of the tree. Oleg's GiST tree type is certainly
very nice here. If you don't want to use special data types, you
can use some path expression string, but the problem is you'll
have to have leading zeroes in each path step.
regards
-Gunther
--
Gunther Schadow, M.D., Ph.D. gschadow@regenstrief.org
Medical Information Scientist Regenstrief Institute for Health Care
Adjunct Assistant Professor Indiana University School of Medicine
tel:1(317)630-7960 http://aurora.regenstrief.org
Carl Meyer wrote:
hi,
say i have a table with something like
id,parent,description
Depending on what your most common types of queries are, you might be
better off with Joe Celko's nested set model (from the book SQL for
Smarties). It would do:
id, left, right, description
where you traverse the tree depth-first and as you pass the left side of
each node you increment a counter (left) and as you pass by it again on
the right you increment it again (right).
This model provides much easier ways of making queries such as "give me
all descendants of id=2" because there's then no recursion (WHERE
child.left between parent.left and parent.right).
I had a table set up exactly as you have described, and when the
recursion proved too costly in terms of db performance, I went to nested
set and we're pleased with the results.
Just another option,
Fran
On Fri, 28 Jun 2002, Carl Meyer wrote:
id,parent,description
and parent points to an id of the very same table. now i have
a specific id and i want to recurse down the tree to the root.
is it in any way possible to do that without to doing a sql query
for each level ?
There are a bunch of different ways to do this (several of which
have been mentioned already), and a lot of it depends on the type
of data you store, how big your trees are, etc. etc.
One really, really fast option, if you don't need to be moving up
and down arbitrary levels, but just need to find the root, or find
all children in a given tree, or similar operations, is to add a
column holding the ID of the root of the tree the node is in:
id, root_id, parent_id, description
I was tending to use smaller trees, though, (typically a few hundred,
almost never more than a few thousand nodes), and so for any more
complex searches manipulation I'd just load the whole tree into
memory with one query. (I had a clustered index on root_id--this
was MS SQL Server--and so this load was extremely fast; just a
matter of reading a few disk blocks.) This turned out to be far
more efficient and faster than doing three or four queries to get
up to the root.
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
I did a little research on the subject and it seemed quite interesting.
However, it seemed that insertion would require updating most of the tree for
a minor insert. Can you send along a few more details about your setup and
algorithms? I would like to find more information about this.
Regards,
Jeff
Show quoted text
On Tuesday 02 July 2002 06:07 am, Fran Fabrizio wrote:
Carl Meyer wrote:
hi,
say i have a table with something like
id,parent,description
Depending on what your most common types of queries are, you might be
better off with Joe Celko's nested set model (from the book SQL for
Smarties). It would do:id, left, right, description
where you traverse the tree depth-first and as you pass the left side of
each node you increment a counter (left) and as you pass by it again on
the right you increment it again (right).This model provides much easier ways of making queries such as "give me
all descendants of id=2" because there's then no recursion (WHERE
child.left between parent.left and parent.right).I had a table set up exactly as you have described, and when the
recursion proved too costly in terms of db performance, I went to nested
set and we're pleased with the results.Just another option,
Fran---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
Jeff, some detailed information about nested set model by Celco is in
http://www.mip.berkeley.edu/mip/related/thesaurus/thesaurus.pdf
Generally, if you want to support fast online update of tree stucture,
you have to maintain link information separately - parent, child, level
but this is very slow for select. If you code link information into
something, you will have a very fast select but slow insert.
I didn't found good compomise :( It's a nature of relational database,
when tuple know nothing except itself. We develope our module
tree ( www.sai.msu.su/~megera/postgres/gist) to support tree data type
using path information, so tuple does know hierarchical information
and we was able to provide index support. Unfortunately, it's very fast
for select. I think solution could be separating working with tree
structure (editing) and the exporting into our data type for
public usage.
Oleg
On Thu, 4 Jul 2002, Jeff Davis wrote:
I did a little research on the subject and it seemed quite interesting.
However, it seemed that insertion would require updating most of the tree for
a minor insert. Can you send along a few more details about your setup and
algorithms? I would like to find more information about this.Regards,
JeffOn Tuesday 02 July 2002 06:07 am, Fran Fabrizio wrote:
Carl Meyer wrote:
hi,
say i have a table with something like
id,parent,description
Depending on what your most common types of queries are, you might be
better off with Joe Celko's nested set model (from the book SQL for
Smarties). It would do:id, left, right, description
where you traverse the tree depth-first and as you pass the left side of
each node you increment a counter (left) and as you pass by it again on
the right you increment it again (right).This model provides much easier ways of making queries such as "give me
all descendants of id=2" because there's then no recursion (WHERE
child.left between parent.left and parent.right).I had a table set up exactly as you have described, and when the
recursion proved too costly in terms of db performance, I went to nested
set and we're pleased with the results.Just another option,
Fran---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
I did a little research on the subject and it seemed quite
interesting. However, it seemed that insertion would require updating
most of the tree for
a minor insert. Can you send along a few more details about your setup
and algorithms? I would like to find more information about this. <<
I am writing a separate book on "Trees in SQL" which will cover
several different models; I hope to be done by the end of the year. I
also hope to win the lottery.
Updating is not the problem people think it is. The nodes are in one
table and the structure is in another. The Tree table has (node_id,
lft, rgt) in its rows and those the rows are very short; a lot of them
fit into main storage at once. Since the newest addition (i.e.
youngest sibling) is made on the right side of the immediate
subordinates (siblings are ordered in the Nested Set model), you do
not have to update half the nodes on average. Finally, in the real
world, the tree structure tends to stay the same and the nodes change
-- even dot-com firms don't re-organize faster than their personnel
turnover <g>.
However, someone did have a situation where they used the nested set
model for the message threads in a newsgroup front end. Their answer
to this "fixed nodes and changing structure" problem was to use large
gaps in the numbering of (lft, rgt) pairs instead of incrementing
(lft, rgt) by one.
Think about it; subordination is shown with the BETWEEN predicate; any
sequence of unique, nested numbers will work. Subtree size is shown
by the formula ((rgt-lft+1)/2) which does depend on an increment and
happens to also give you subordination.
On 12 Jul 2002, --CELKO-- wrote:
I did a little research on the subject and it seemed quite
interesting. However, it seemed that insertion would require updating
most of the tree for
a minor insert. Can you send along a few more details about your setup
and algorithms? I would like to find more information about this. <<I am writing a separate book on "Trees in SQL" which will cover
several different models; I hope to be done by the end of the year. I
also hope to win the lottery.
Cool. btw, we have developed contrib/tree module which is very fast
for r/o operations. It's available on our GiST development page
http://www.sai.msu.su/~megera/postgres/gist/tree/README.tree.english
The idea is simple - to code path information to the node name,
so tuple itself does know it's place in hierarchy. It's sort of
cheating of relational data model. Also, we're developing another
module - contrib/ltree which will be much more powerful (we use
real names in path instead of digits as in contrib/tree).
Updating is not the problem people think it is. The nodes are in one
table and the structure is in another. The Tree table has (node_id,
lft, rgt) in its rows and those the rows are very short; a lot of them
fit into main storage at once. Since the newest addition (i.e.
youngest sibling) is made on the right side of the immediate
subordinates (siblings are ordered in the Nested Set model), you do
not have to update half the nodes on average. Finally, in the real
world, the tree structure tends to stay the same and the nodes change
-- even dot-com firms don't re-organize faster than their personnel
turnover <g>.However, someone did have a situation where they used the nested set
model for the message threads in a newsgroup front end. Their answer
to this "fixed nodes and changing structure" problem was to use large
gaps in the numbering of (lft, rgt) pairs instead of incrementing
(lft, rgt) by one.Think about it; subordination is shown with the BETWEEN predicate; any
sequence of unique, nested numbers will work. Subtree size is shown
by the formula ((rgt-lft+1)/2) which does depend on an increment and
happens to also give you subordination.---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
On 12 Jul 2002, --CELKO-- wrote:
I am writing a separate book on "Trees in SQL" which will cover
several different models; I hope to be done by the end of the year. I
also hope to win the lottery.
Are you *the* Joe Celko? "I'm not worthy! I'm not worthy!" :-)
Though I hope you'll let us know when your book comes out.
Updating is not the problem people think it is. The nodes are in one
table and the structure is in another. The Tree table has (node_id,
lft, rgt) in its rows and those the rows are very short; a lot of them
fit into main storage at once.
Unfortunately, this is not such a great assumption for postgres,
unless you otherwise have longish rows. The row overhead is over
40 bytes, including the row pointers in the page (giving you about
150 rows per page, when all is said and done). So with smallish
rows anyway, and depending on the application, and yada yada yada,
you might be better off not using a separate table.
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC