SQL advice needed

Started by Torsten Förtschabout 12 years ago5 messagesgeneral
Jump to latest
#1Torsten Förtsch
torsten.foertsch@gmx.net

Hi,

I have a volatile function that returns multiple rows. It may also
return nothing. Now, I want to write an SQL statement that calls this
function until it returns an empty result set and returns all the rows.

So, in principle I want to:

WITH RECURSIVE
t AS (
SELECT * FROM xx()
UNION ALL
SELECT * FROM xx()
)
SELECT * FROM t;

But that's not recursive because the union all part lacks a reference to t.

Next I tried this:

WITH RECURSIVE
t AS (
SELECT * FROM xx()
UNION ALL
SELECT * FROM xx() WHERE EXISTS (SELECT 1 FROM t)
)
SELECT * FROM t;

But the reference to t is not allowed in a subquery.

What's the best (or at least a working) way to achieve what I want?

I can do it in plpgsql. But that would mean to accumulate the complete
result in memory first, right? I need to avoid that.

Thanks,
Torsten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Torsten Förtsch (#1)
Re: SQL advice needed

On Mon, Mar 17, 2014 at 3:21 PM, Torsten Förtsch
<torsten.foertsch@gmx.net> wrote:

Hi,

I have a volatile function that returns multiple rows. It may also
return nothing. Now, I want to write an SQL statement that calls this
function until it returns an empty result set and returns all the rows.

So, in principle I want to:

WITH RECURSIVE
t AS (
SELECT * FROM xx()
UNION ALL
SELECT * FROM xx()
)
SELECT * FROM t;

But that's not recursive because the union all part lacks a reference to t.

Next I tried this:

WITH RECURSIVE
t AS (
SELECT * FROM xx()
UNION ALL
SELECT * FROM xx() WHERE EXISTS (SELECT 1 FROM t)
)
SELECT * FROM t;

But the reference to t is not allowed in a subquery.

What's the best (or at least a working) way to achieve what I want?

I can do it in plpgsql. But that would mean to accumulate the complete
result in memory first, right? I need to avoid that.

I would test that assumption. This is better handled in loop IMO.

LOOP
RETURN QUERY SELECT * FROM xx();
IF NOT found
THEN
RETURN;
END IF;
END LOOP;

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Torsten Förtsch (#1)
Re: SQL advice needed

Torsten Förtsch wrote

Hi,

I have a volatile function that returns multiple rows. It may also
return nothing. Now, I want to write an SQL statement that calls this
function until it returns an empty result set and returns all the rows.

What's the best (or at least a working) way to achieve what I want?

I can do it in plpgsql. But that would mean to accumulate the complete
result in memory first, right? I need to avoid that.

You are describing procedural logic. If you need intermediate steps before
"returns all the rows" then either those intermediate steps stay in memory
OR you stick them on a table somewhere and, when your procedure is done,
send back a cursor over that, possibly temporary, table.

I don't think abusing WITH/RECURSIVE is going to be viable.

You should also consider whether you can do what you need using set-logic
(i.e., pure SQL). At worse it will be a learning exercise and a performance
comparator.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/SQL-advice-needed-tp5796431p5796436.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Torsten Förtsch
torsten.foertsch@gmx.net
In reply to: Merlin Moncure (#2)
Re: SQL advice needed

On 17/03/14 21:42, Merlin Moncure wrote:

I can do it in plpgsql. But that would mean to accumulate the complete

result in memory first, right? I need to avoid that.

I would test that assumption. This is better handled in loop IMO.

LOOP
RETURN QUERY SELECT * FROM xx();
IF NOT found
THEN
RETURN;
END IF;
END LOOP;

At least according to the manual it is stored in memory:

<cite>
Note: The current implementation of RETURN NEXT and RETURN QUERY stores
the entire result set before returning from the function, as discussed
above. That means that if a PL/pgSQL function produces a very large
result set, performance might be poor: data will be written to disk to
avoid memory exhaustion, but the function itself will not return until
the entire result set has been generated. A future version of PL/pgSQL
might allow users to define set-returning functions that do not have
this limitation. Currently, the point at which data begins being written
to disk is controlled by the work_mem configuration variable.
Administrators who have sufficient memory to store larger result sets in
memory should consider increasing this parameter.
</cite>

I didn't test that, though.

Torsten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Torsten Förtsch (#4)
Re: SQL advice needed

On Mon, Mar 17, 2014 at 4:20 PM, Torsten Förtsch
<torsten.foertsch@gmx.net> wrote:

On 17/03/14 21:42, Merlin Moncure wrote:

I can do it in plpgsql. But that would mean to accumulate the complete

result in memory first, right? I need to avoid that.

I would test that assumption. This is better handled in loop IMO.

LOOP
RETURN QUERY SELECT * FROM xx();
IF NOT found
THEN
RETURN;
END IF;
END LOOP;

At least according to the manual it is stored in memory:

<cite>
Note: The current implementation of RETURN NEXT and RETURN QUERY stores
the entire result set before returning from the function, as discussed
above. That means that if a PL/pgSQL function produces a very large
result set, performance might be poor: data will be written to disk to
avoid memory exhaustion, but the function itself will not return until
the entire result set has been generated. A future version of PL/pgSQL
might allow users to define set-returning functions that do not have
this limitation. Currently, the point at which data begins being written
to disk is controlled by the work_mem configuration variable.
Administrators who have sufficient memory to store larger result sets in
memory should consider increasing this parameter.
</cite>

I didn't test that, though.

The manual says describes the exact opposite of what you said you
thought it does -- large result sets are paged out to disk, not stored
in memory (this is a feature). CTEs use a similar tactic so it's a
wash. The performance overhead of a tuplestore is probably not as bad
as you think -- just test it out some simulated results and monitor
performance. Either way, work_mem controls it. It's generally
dangerous to crank work_mem to huge values but it's ok to set it
temporarily via SET to huge values (say to 1GB) for a query
particularly if you know that it's only getting issued by one caller
at a time.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general