recursive query too big to complete. are there any strategies to limit/partition?

Started by Jonathan Vanascoabout 9 years ago3 messagesgeneral
Jump to latest
#1Jonathan Vanasco
postgres@2xlp.com

There are over 20 million records in a self-referential database table, where one record may point to another record as a descendant.

Because of a bug in application code, there was no limit on recursion. The max was supposed to be 4. A few outlier records have between 5 and 5000 descendants (there could be more. I manually found one chain of 5000.

I need to find all the chains of 5+ and mark them for update/deletion. While the database is about 10GB, the recursive search is maxing out on diskspace and causing a failure (there was over over 100GB of workspace free)

Is there any way to make a recursive query work, or will I have to use another means and just iterate over the entire dataset (either in postgres or an external service)

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Jonathan Vanasco (#1)
Re: recursive query too big to complete. are there any strategies to limit/partition?

On Thu, Jan 26, 2017 at 4:37 PM, Jonathan Vanasco <postgres@2xlp.com> wrote:

There are over 20 million records in a self-referential database table,
where one record may point to another record as a descendant.

Because of a bug in application code, there was no limit on recursion.
The max was supposed to be 4. A few outlier records have between 5 and
5000 descendants (there could be more. I manually found one chain of 5000.

I need to find all the chains of 5+ and mark them for update/deletion.
While the database is about 10GB, the recursive search is maxing out on
diskspace and causing a failure (there was over over 100GB of workspace
free)

Is there any way to make a recursive query work, or will I have to use
another means and just iterate over the entire dataset (either in postgres
or an external service)

​Thinking aloud - why doesn't just finding every record with 5 descendants
not work? Any chain longer than 5 would have at least 5 items.

​Even without recursion you could build out a five-way self-join and any
records that make it that far are guilty. I suppose this assumes your
setup is non-cyclic.

David J.

#3Jonathan Vanasco
postgres@2xlp.com
In reply to: David G. Johnston (#2)
Re: recursive query too big to complete. are there any strategies to limit/partition?

On Jan 26, 2017, at 7:07 PM, David G. Johnston wrote:

​Thinking aloud - why doesn't just finding every record with 5 descendants not work? Any chain longer than 5 would have at least 5 items.

Oh it works. This is why I ask these questions -- new perspectives!

​Even without recursion you could build out a five-way self-join and any records that make it that far are guilty. I suppose this assumes your setup is non-cyclic.

There could be cyclic records, but that's easy to filter out. A first approach took 40 seconds to run. A little tweaking is necessary, but this is a great start.

THANK YOU! You saved me!