WITH RECURSIVE doesn't work properly for me
I use following command to get a shortest-path query:
with recursive paths( src_id, dest_id, dist) as(
select n1,n2,1
from nodes
union
select src_id, dest_id, min(dist)
from ( select paths.src_id as src_id, nodes.n2 as dest_id,
paths.dist+1 as dist
from paths, nodes
where paths.dest_id=nodes.n1
and paths.src_id<>nodes.n2
) as Temp
group by src_id, dest_id
)
select paths.src_id, paths.dest_id, min(dist)
from paths
group by 1,2;
It seems that this query goes into infinite loops and finally run out of
disk space. However, I testrf every iteration seperately and found that it
will converge after 3-4 iterations. I wonder where is the problem. Could
anyone help with it? The attatchment is the test data.
Thank you very much:)
Attachments:
data.txttext/plain; charset=US-ASCII; name=data.txtDownload
Jing Fan wrote:
I use following command to get a shortest-path query:
with recursive paths( src_id, dest_id, dist) as(
select n1,n2,1
from nodes
union
select src_id, dest_id, min(dist)
from ( select paths.src_id as src_id, nodes.n2 as dest_id, paths.dist+1 as dist
from paths, nodes
where paths.dest_id=nodes.n1
and paths.src_id<>nodes.n2
) as Temp
group by src_id, dest_id
)
select paths.src_id, paths.dest_id, min(dist)
from paths
group by 1,2;It seems that this query goes into infinite loops and finally run out of disk space. However, I testrf
every iteration seperately and found that it will converge after 3-4 iterations. I wonder where is the
problem. Could anyone help with it? The attatchment is the test data.
The attached test data suggest different table and column names,
but I assume that you mean "edge" when you write "nodes" and
that columns "n1" and "n2" are really "src_id" and "dest_id".
The endless loop occurs because there are loops in your
directed graph, but you only exclude circles where beginning
is equal to end.
To quote three lines from your attachment:
INSERT INTO edge (src_id, dest_id) VALUES (1739, 6236);
INSERT INTO edge (src_id, dest_id) VALUES (6236, 1739);
INSERT INTO edge (src_id, dest_id) VALUES (3384, 6236);
Your recursive WITH clause (CTE) will now happily produce:
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11
and so on to infinity, which is why you will eventually run
out of space.
The grouping (and any other processing in your main query)
takes place only after the CTE has been calculated, so while
your query would in theory return the desired result, it does
so only after calculating infinitely many intermediate rows.
One solution I can envision is to put a limit on the distance,
for example the total count of nodes.
Yours,
Laurenz Albe
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I have two group operations.
One is inside the CTE ( union
select src_id, dest_id, min(dist) ),
another is outside the CTE.
Do you mean that even the grouping inside the CTE will be calculated only
after the CTE has been calculated?
Thank you very much:)
Best,
Jing
On Tue, Nov 5, 2013 at 5:04 AM, Albe Laurenz <laurenz.albe@wien.gv.at>wrote:
Show quoted text
Jing Fan wrote:
I use following command to get a shortest-path query:
with recursive paths( src_id, dest_id, dist) as(
select n1,n2,1
from nodes
union
select src_id, dest_id, min(dist)
from ( select paths.src_id as src_id, nodes.n2 as dest_id,paths.dist+1 as dist
from paths, nodes
where paths.dest_id=nodes.n1
and paths.src_id<>nodes.n2
) as Temp
group by src_id, dest_id
)
select paths.src_id, paths.dest_id, min(dist)
from paths
group by 1,2;It seems that this query goes into infinite loops and finally run out of
disk space. However, I testrf
every iteration seperately and found that it will converge after 3-4
iterations. I wonder where is the
problem. Could anyone help with it? The attatchment is the test data.
The attached test data suggest different table and column names,
but I assume that you mean "edge" when you write "nodes" and
that columns "n1" and "n2" are really "src_id" and "dest_id".The endless loop occurs because there are loops in your
directed graph, but you only exclude circles where beginning
is equal to end.To quote three lines from your attachment:
INSERT INTO edge (src_id, dest_id) VALUES (1739, 6236);
INSERT INTO edge (src_id, dest_id) VALUES (6236, 1739);
INSERT INTO edge (src_id, dest_id) VALUES (3384, 6236);Your recursive WITH clause (CTE) will now happily produce:
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11
and so on to infinity, which is why you will eventually run
out of space.The grouping (and any other processing in your main query)
takes place only after the CTE has been calculated, so while
your query would in theory return the desired result, it does
so only after calculating infinitely many intermediate rows.One solution I can envision is to put a limit on the distance,
for example the total count of nodes.Yours,
Laurenz Albe
Jing Fan wrote:
I have two group operations.
One is inside the CTE ( union
select src_id, dest_id, min(dist) ),
another is outside the CTE.
Do you mean that even the grouping inside the CTE will be calculated only after the CTE has been
calculated?
I mean the one outside the CTE.
The one inside does not do anything at all, you could omit it.
Yours,
Laurenz Albe
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Why the one inside does not do anything? It won't be executed?
Best,
Jing
On Tue, Nov 5, 2013 at 8:52 AM, Albe Laurenz <laurenz.albe@wien.gv.at>wrote:
Show quoted text
Jing Fan wrote:
I have two group operations.
One is inside the CTE ( union
select src_id, dest_id, min(dist) ),
another is outside the CTE.
Do you mean that even the grouping inside the CTE will be calculatedonly after the CTE has been
calculated?
I mean the one outside the CTE.
The one inside does not do anything at all, you could omit it.
Yours,
Laurenz Albe
Jing Fan wrote:
Why the one inside does not do anything? It won't be executed?
It is executed.
It might filter out the occasional row, but if you look at
the example I gave you, you'll see that it won't do anything
to keep it from recursing.
Yours,
Laurenz Albe
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
If the grouping inside CTE is executed, I don't think it would generate
result like
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11
for we have min(dist),
so it should be like
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
other values will be eliminated by min(). It actually generate no new
tuples and the iteration should stop.
Best,
Jing
On Tue, Nov 5, 2013 at 9:28 AM, Albe Laurenz <laurenz.albe@wien.gv.at>wrote:
Show quoted text
Jing Fan wrote:
Why the one inside does not do anything? It won't be executed?
It is executed.
It might filter out the occasional row, but if you look at
the example I gave you, you'll see that it won't do anything
to keep it from recursing.Yours,
Laurenz Albe
Jing Fan wrote:
If the grouping inside CTE is executed, I don't think it would generate result like
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11for we have min(dist),
so it should be likesrc_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2other values will be eliminated by min(). It actually generate no new tuples and the iteration should
stop.
You forget that the grouping query only spans the second branch
of the UNION, where you add the new entries.
So the new entries and the old entries won't be grouped together,
and the new paths that are longer than the old ones won't get removed.
Unfortunately you cannot have the UNION in a subquery for
recursive CTEs, but you could use arrays to achieve what you want:
WITH RECURSIVE paths (path) AS (
SELECT ARRAY[src_id, dest_id] FROM edge
UNION ALL
SELECT edge.src_id || paths.path
FROM paths, edge
WHERE edge.dest_id = paths.path[array_lower(paths.path, 1)]
AND edge.src_id <> ALL (paths.path)
)
SELECT path[1], path[array_upper(path, 1)], min(array_length(path, 1))
FROM paths
GROUP BY 1, 2;
The whole exercise sounds a bit like homework to me.
Yours,
Laurenz Albe
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I am sorry but I still don't understand why it doesn't work. Possibly I
misunderstand how with recursive works?
In my opinion,
with recursive table as{
seed statement
union
recursive statement
}
In every iteration, It will just generate results from seed statement union
recursive statement and put them into a new temporary table, and then
compare the results with the former temporary table and check if there are
any new tuples. If no new tuples, just stop iteration. Is there any tricky
things about recursive statement?
Thank you very much:)
Best,
Jing
On Wed, Nov 6, 2013 at 2:13 AM, Albe Laurenz <laurenz.albe@wien.gv.at>wrote:
Show quoted text
Jing Fan wrote:
If the grouping inside CTE is executed, I don't think it would generate
result like
src_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2
3384 | 6236 | 3
3384 | 1739 | 4
3384 | 6236 | 5
3384 | 1739 | 6
3384 | 6236 | 7
3384 | 1739 | 8
3384 | 6236 | 9
3384 | 1739 | 10
3384 | 6236 | 11for we have min(dist),
so it should be likesrc_id | dest_id | dist
--------+---------+------
3384 | 6236 | 1
3384 | 1739 | 2other values will be eliminated by min(). It actually generate no new
tuples and the iteration should
stop.
You forget that the grouping query only spans the second branch
of the UNION, where you add the new entries.
So the new entries and the old entries won't be grouped together,
and the new paths that are longer than the old ones won't get removed.Unfortunately you cannot have the UNION in a subquery for
recursive CTEs, but you could use arrays to achieve what you want:WITH RECURSIVE paths (path) AS (
SELECT ARRAY[src_id, dest_id] FROM edge
UNION ALL
SELECT edge.src_id || paths.path
FROM paths, edge
WHERE edge.dest_id = paths.path[array_lower(paths.path, 1)]
AND edge.src_id <> ALL (paths.path)
)
SELECT path[1], path[array_upper(path, 1)], min(array_length(path, 1))
FROM paths
GROUP BY 1, 2;The whole exercise sounds a bit like homework to me.
Yours,
Laurenz Albe
Jing Fan wrote:
I am sorry but I still don't understand why it doesn't work. Possibly I misunderstand how with
recursive works?
In my opinion,
with recursive table as{
seed statement
union
recursive statement
}
In every iteration, It will just generate results from seed statement union recursive statement and
put them into a new temporary table, and then compare the results with the former temporary table and
check if there are any new tuples. If no new tuples, just stop iteration. Is there any tricky things
about recursive statement?
That is correct.
Let's assume that we have three nodes A, B and C.
Also, A points to B, B points to C and C points to B.
Let's assume that we already generated (A, B, 1) and (A, C, 2)
in previous iterations.
Then the "recursive statement" will generate the new
rows (A, C, 2) and (A, B, 3).
The SELECT ... GROUP BY only surrounds the recursive statement,
So the result will still be (A, C, 2) and (A, B, 3).
Then the UNION will take care of the first triple, but the second
one will be added in this iteration.
And so on ad infinitum.
Yours,
Laurenz Albe
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
But after this iteration, the paths will be:
A B 1
B C 1
C B 1
A C 2
A B 3
in next iteration, the recursive statement will generate (A,C,2), (A,B,3),
and (A,C,4), after the group by, it will still be (A,C,2) and (A,B,3)
so I think it should stop after this iteration.
On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at>wrote:
Show quoted text
Jing Fan wrote:
I am sorry but I still don't understand why it doesn't work. Possibly I
misunderstand how with
recursive works?
In my opinion,
with recursive table as{
seed statement
union
recursive statement
}
In every iteration, It will just generate results from seed statementunion recursive statement and
put them into a new temporary table, and then compare the results with
the former temporary table and
check if there are any new tuples. If no new tuples, just stop
iteration. Is there any tricky things
about recursive statement?
That is correct.
Let's assume that we have three nodes A, B and C.
Also, A points to B, B points to C and C points to B.Let's assume that we already generated (A, B, 1) and (A, C, 2)
in previous iterations.Then the "recursive statement" will generate the new
rows (A, C, 2) and (A, B, 3).
The SELECT ... GROUP BY only surrounds the recursive statement,
So the result will still be (A, C, 2) and (A, B, 3).Then the UNION will take care of the first triple, but the second
one will be added in this iteration.And so on ad infinitum.
Yours,
Laurenz Albe
Jing Fan wrote:
On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Let's assume that we have three nodes A, B and C.
Also, A points to B, B points to C and C points to B.Let's assume that we already generated (A, B, 1) and (A, C, 2)
in previous iterations.Then the "recursive statement" will generate the new
rows (A, C, 2) and (A, B, 3).
The SELECT ... GROUP BY only surrounds the recursive statement,
So the result will still be (A, C, 2) and (A, B, 3).Then the UNION will take care of the first triple, but the second
one will be added in this iteration.And so on ad infinitum.
But after this iteration, the paths will be:
A B 1
B C 1
C B 1
A C 2
A B 3in next iteration, the recursive statement will generate (A,C,2), (A,B,3), and (A,C,4), after the
group by, it will still be (A,C,2) and (A,B,3)
so I think it should stop after this iteration.
I see, I didn't notice that.
Actually there is a mistake in my explanation above, see
http://www.postgresql.org/docs/9.3/static/queries-with.html#QUERIES-WITH-SELECT
"Recursive Query Evaluation" for a detailed explanation:
In step 2b, the "working table" is replaced with the "intermediate table",
so the next iteration does not see all previously generated rows,
but only the ones that were generated in the previous iteration.
So in our case, the working table will look like this:
Initially:
A B 1
B C 1
C B 1
After the first iteration:
A C 2
After the third iteration:
A B 3
After the fourth iteration:
A C 4
... and so on.
Your GROUP BY assumes that the working table contains
all previously generated rows.
Yours,
Laurenz Albe
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Yeah, that can explain why it doesn't work.
Thank you very much:)
On Wed, Nov 6, 2013 at 8:40 AM, Albe Laurenz <laurenz.albe@wien.gv.at>wrote:
Show quoted text
Jing Fan wrote:
On Wed, Nov 6, 2013 at 8:10 AM, Albe Laurenz <laurenz.albe@wien.gv.at>
wrote:
Let's assume that we have three nodes A, B and C.
Also, A points to B, B points to C and C points to B.Let's assume that we already generated (A, B, 1) and (A, C, 2)
in previous iterations.Then the "recursive statement" will generate the new
rows (A, C, 2) and (A, B, 3).
The SELECT ... GROUP BY only surrounds the recursive statement,
So the result will still be (A, C, 2) and (A, B, 3).Then the UNION will take care of the first triple, but the second
one will be added in this iteration.And so on ad infinitum.
But after this iteration, the paths will be:
A B 1
B C 1
C B 1
A C 2
A B 3in next iteration, the recursive statement will generate (A,C,2),
(A,B,3), and (A,C,4), after the
group by, it will still be (A,C,2) and (A,B,3)
so I think it should stop after this iteration.I see, I didn't notice that.
Actually there is a mistake in my explanation above, see
http://www.postgresql.org/docs/9.3/static/queries-with.html#QUERIES-WITH-SELECT
"Recursive Query Evaluation" for a detailed explanation:In step 2b, the "working table" is replaced with the "intermediate table",
so the next iteration does not see all previously generated rows,
but only the ones that were generated in the previous iteration.So in our case, the working table will look like this:
Initially:
A B 1
B C 1
C B 1After the first iteration:
A C 2After the third iteration:
A B 3After the fourth iteration:
A C 4... and so on.
Your GROUP BY assumes that the working table contains
all previously generated rows.Yours,
Laurenz Albe