The flinfo->fn_extra question, from me this time.

Started by Chapman Flack, over 6 years ago, 22 messages

chap@anastigmatix.net

over 6 years ago

Hi hackers,

I see evidence on this list that it's sort of a rite of passage
to ask the flinfo->fn_extra question, and my time has come.

So please let me know if I seem to correctly understand the limits
on its use.

I gather that various extensions use it to stash various things. But
(I assume) ... they will only touch fn_extra in FmgrInfo structs that
pertain to *their own functions*. (Please say that's true?)

IOW, it is assured that, if I am a language handler, when I am called
to handle a function in my language, fn_extra is mine to use as I please ...

... with the one big exception, if I am handling a function in my language
that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave
fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there,
and then I can stash my stuff in gunk->user_fctx for the duration of that
SRF call.
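The layering described above can be sketched as a toy model in plain C. This is only an illustration under stated assumptions: `ToyFmgrInfo`, `ToyFuncCtx`, `ToyCounter` and `toy_count_up` are hypothetical stand-ins, not the PostgreSQL structs or the SRF macros. The sketch shows fn_extra starting out NULL, a framework-owned context being planted there on the first call, and the function's own state living one level down in user_fctx until the set is exhausted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; NOT the PostgreSQL definitions. */
typedef struct ToyFmgrInfo
{
	void	   *fn_extra;		/* NULL until the first call of a set */
} ToyFmgrInfo;

typedef struct ToyFuncCtx
{
	int			call_cntr;		/* rows returned so far */
	void	   *user_fctx;		/* slot left for the function's own state */
} ToyFuncCtx;

typedef struct ToyCounter
{
	int			next;
	int			limit;
} ToyCounter;

/* Returns true and sets *out for each row; returns false at end of set. */
bool
toy_count_up(ToyFmgrInfo *flinfo, int limit, int *out)
{
	ToyFuncCtx *ctx;
	ToyCounter *st;

	assert(flinfo != NULL && out != NULL);

	if (flinfo->fn_extra == NULL)
	{
		/* First call: the framework's context goes into fn_extra ... */
		ctx = calloc(1, sizeof(*ctx));
		st = malloc(sizeof(*st));
		if (ctx == NULL || st == NULL)
			abort();
		/* ... and the function's own state hangs off user_fctx. */
		st->next = 1;
		st->limit = limit;
		ctx->user_fctx = st;
		flinfo->fn_extra = ctx;
	}

	ctx = flinfo->fn_extra;
	st = ctx->user_fctx;

	if (st->next <= st->limit)
	{
		ctx->call_cntr++;
		*out = st->next++;
		return true;
	}

	/* End of set: free both layers and leave fn_extra NULL again. */
	free(st);
	free(ctx);
	flinfo->fn_extra = NULL;
	return false;
}
```

Calling `toy_count_up` repeatedly with the same `ToyFmgrInfo` yields 1, 2, 3, ... up to the limit and then reports end of set, at which point fn_extra is NULL again and a fresh set can start.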

Does that seem to catch the essentials?

Thanks,
-Chap

p.s.: noticed in fmgr/README: "Note that simple "strict" functions can
ignore both isnull and args[i].isnull, since they won't even get called
when there are any TRUE values in args[].isnull."

I get why a strict function can ignore args[i].isnull, but is the part
about ignoring isnull a mistake? A strict function can be passed all
non-null arguments and still return null if it wants to, right?
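The scenario in this p.s. can be made concrete with a small self-contained model. It is a sketch only: `ToyArg`, `toy_call_strict` and `toy_safe_div` are hypothetical names, and the caller-side null check is an assumption about how strictness is enforced, written here to mirror the README sentence rather than copied from fmgr. The function body never inspects the argument null flags, yet it can still set the result flag itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy argument and function types; NOT the fmgr calling convention. */
typedef struct ToyArg
{
	int			value;
	bool		isnull;
} ToyArg;

typedef int (*ToyFn) (const ToyArg *args, int nargs, bool *isnull);

/* Caller-side strictness check: the body never sees a NULL argument. */
int
toy_call_strict(ToyFn fn, const ToyArg *args, int nargs, bool *isnull)
{
	for (int i = 0; i < nargs; i++)
	{
		if (args[i].isnull)
		{
			*isnull = true;		/* skip the call entirely */
			return 0;
		}
	}
	*isnull = false;			/* the function may leave this untouched */
	return fn(args, nargs, isnull);
}

/* A strict function that still returns NULL for some non-null input. */
int
toy_safe_div(const ToyArg *args, int nargs, bool *isnull)
{
	assert(nargs == 2);
	if (args[1].value == 0)
	{
		*isnull = true;			/* NULL result despite non-null arguments */
		return 0;
	}
	return args[0].value / args[1].value;
}
```

In this model a strict function that never returns NULL can ignore the result flag because the caller has already cleared it, while one like `toy_safe_div` sets it explicitly.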

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Chapman Flack (#1)

Re: The flinfo->fn_extra question, from me this time.

Chapman Flack <chap@anastigmatix.net> writes:

So please let me know if I seem to correctly understand the limits
on its use.

I gather that various extensions use it to stash various things. But
(I assume) ... they will only touch fn_extra in FmgrInfo structs that
pertain to *their own functions*. (Please say that's true?)

IOW, it is assured that, if I am a language handler, when I am called
to handle a function in my language, fn_extra is mine to use as I please ...

Yup.

... with the one big exception, if I am handling a function in my language
that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave
fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there,
and then I can stash my stuff in gunk->user_fctx for the duration of that
SRF call.

Yup. (Of course, you don't have to use the SRF_FIRSTCALL_INIT
infrastructure.)

Keep in mind that in most contexts, whatever you cache in fn_extra
will only be there for the life of the current query.
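A toy model of that lifetime, for illustration only: `ToyFlinfo`, `ToyCache`, `toy_scaled` and `toy_end_query` are hypothetical names, and the explicit free at "end of query" stands in for whatever releases the memory in a real backend (real code would normally allocate in flinfo->fn_mcxt rather than use malloc). The cache is built lazily on the first call, reused within one query, and rebuilt when the next query arrives with a fresh struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; NOT the PostgreSQL FmgrInfo. */
typedef struct ToyFlinfo
{
	void	   *fn_extra;
} ToyFlinfo;

typedef struct ToyCache
{
	int			factor;			/* stands in for an expensive lookup result */
} ToyCache;

static int	toy_cache_builds = 0;	/* how many times the cache was built */

int
toy_scaled(ToyFlinfo *flinfo, int x)
{
	ToyCache   *cache;

	assert(flinfo != NULL);
	cache = flinfo->fn_extra;
	if (cache == NULL)
	{
		/* First call with this struct: build and remember the cache. */
		cache = malloc(sizeof(*cache));
		if (cache == NULL)
			abort();
		cache->factor = 10;
		toy_cache_builds++;
		flinfo->fn_extra = cache;
	}
	return x * cache->factor;
}

int
toy_cache_build_count(void)
{
	return toy_cache_builds;
}

/* Models the end of a query: the cached state goes away with it. */
void
toy_end_query(ToyFlinfo *flinfo)
{
	free(flinfo->fn_extra);
	flinfo->fn_extra = NULL;
}
```

Two calls under one `ToyFlinfo` build the cache once; a second "query" with its own `ToyFlinfo` builds it again, so anything meant to outlive a query has to be kept somewhere else.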

regards, tom lane

Chapman Flack

chap@anastigmatix.net

over 6 years ago

In reply to: Tom Lane (#2)

Re: The flinfo->fn_extra question, from me this time.

On 06/15/19 21:21, Tom Lane wrote:

Yup. (Of course, you don't have to use the SRF_FIRSTCALL_INIT
infrastructure.)

That had crossed my mind ... but it seems there's around 80 or 100
lines of good stuff there that'd be a shame to duplicate. If only
init_MultiFuncCall() took an extra void ** argument, and the stock
SRF_FIRSTCALL_INIT passed &(fcinfo->flinfo->fn_extra), seems like
most of it would be reusable. shutdown_MultiFuncCall would need to work
slightly differently, and a caller who wanted to be different would need
a customized variant of SRF_PERCALL_SETUP, but that's two lines.

Cheers,
-Chap

Chapman Flack

chap@anastigmatix.net

over 6 years ago

In reply to: Chapman Flack (#3)

Re: The flinfo->fn_extra question, from me this time.

On 06/15/19 21:46, Chapman Flack wrote:

On 06/15/19 21:21, Tom Lane wrote:

Yup. (Of course, you don't have to use the SRF_FIRSTCALL_INIT
infrastructure.)

That had crossed my mind ... but it seems there's around 80 or 100
lines of good stuff there that'd be a shame to duplicate. If only

I suppose that's only if I want to continue using SFRM_ValuePerCall mode.

SFRM_Materialize mode could remove a good deal of complexity currently
in PL/Java around managing memory contexts, SPI_connect, etc. through
multiple calls ... and I'd also have fn_extra all to myself.

Until now, I had assumed that SFRM_ValuePerCall mode might offer some
benefits, such as the possibility of pipelining certain queries and not
building up a whole tuplestore in advance.

But looking in the code, I'm getting the impression that those
benefits are only theoretical future ones, as ExecMakeTableFunctionResult
implements SFRM_ValuePerCall mode by ... repeatedly calling the function
to build up a whole tuplestore in advance.
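The behaviour described here can be modelled in a few lines of standalone C. This is a sketch under stated assumptions, not the executor code: `ToyStore`, `toy_materialize`, `ToyRange` and `toy_range_next` are hypothetical, and a growable int array stands in for the tuplestore. The point it illustrates is that the producer is called until it reports end of set before the consumer sees a single row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* A value-per-call producer: true and *out per row, false at end of set. */
typedef bool (*ToyNextFn) (void *state, int *out);

/* Growable array standing in for a tuplestore. */
typedef struct ToyStore
{
	int		   *rows;
	size_t		len;
	size_t		cap;
} ToyStore;

static void
toy_store_put(ToyStore *s, int v)
{
	if (s->len == s->cap)
	{
		size_t		ncap = s->cap ? s->cap * 2 : 8;
		int		   *nrows = realloc(s->rows, ncap * sizeof(*nrows));

		if (nrows == NULL)
			abort();
		s->rows = nrows;
		s->cap = ncap;
	}
	s->rows[s->len++] = v;
}

/* Drain the whole set into the store; *calls counts producer invocations. */
ToyStore
toy_materialize(ToyNextFn next, void *state, size_t *calls)
{
	ToyStore	s = {NULL, 0, 0};
	int			v;

	assert(next != NULL && calls != NULL);
	*calls = 0;
	for (;;)
	{
		(*calls)++;
		if (!next(state, &v))
			break;
		toy_store_put(&s, v);
	}
	return s;
}

/* Example producer yielding next..limit. */
typedef struct ToyRange
{
	int			next;
	int			limit;
} ToyRange;

bool
toy_range_next(void *state, int *out)
{
	ToyRange   *r = state;

	if (r->next > r->limit)
		return false;
	*out = r->next++;
	return true;
}
```

With a 1000-row producer, `toy_materialize` makes 1001 calls (1000 rows plus the end-of-set call) even if the consumer only ever reads the first row.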

Am I right about that? Are there other sites from which a SRF might be
called that I haven't found, where ValuePerCall mode might actually
support some form of pipelining? Are there actual cases where allowedModes
might not contain SFRM_Materialize?

Or is the ValuePerCall variant currently there just to support possible
future such cases, none of which exist at the moment?

Regards,
-Chap

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Chapman Flack (#4)

Re: The flinfo->fn_extra question, from me this time.

Chapman Flack <chap@anastigmatix.net> writes:

Until now, I had assumed that SFRM_ValuePerCall mode might offer some
benefits, such as the possibility of pipelining certain queries and not
building up a whole tuplestore in advance.

But looking in the code, I'm getting the impression that those
benefits are only theoretical future ones, as ExecMakeTableFunctionResult
implements SFRM_ValuePerCall mode by ... repeatedly calling the function
to build up a whole tuplestore in advance.

Yes, that's the case for a SRF in FROM. A SRF in the targetlist
actually does get the chance to pipeline, if it implements ValuePerCall.

The FROM case could be improved perhaps, if somebody wanted to put
time into it. You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but in principle you could return
rows immediately while stashing them aside in a tuplestore.
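One way to picture "return rows immediately while stashing them aside" is the following standalone sketch. Everything in it is hypothetical (`ToyTeeScan`, `toy_tee_next`, `toy_tee_rescan`, and an int array in place of a tuplestore); it is not a proposal for the executor, only a model of the idea. Each new row is handed to the consumer as soon as the producer yields it and is also appended to the stash, so a rescan replays from the stash without calling the producer again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef bool (*ToyTeeProducer) (void *state, int *out);

typedef struct ToyTeeScan
{
	ToyTeeProducer next;		/* value-per-call producer */
	void	   *state;			/* producer's private state */
	int		   *rows;			/* rows stashed so far */
	size_t		len;
	size_t		cap;
	size_t		pos;			/* read position within the stash */
	bool		exhausted;		/* producer has reported end of set */
	size_t		producer_calls; /* for demonstration only */
} ToyTeeScan;

void
toy_tee_init(ToyTeeScan *scan, ToyTeeProducer next, void *state)
{
	assert(scan != NULL && next != NULL);
	memset(scan, 0, sizeof(*scan));
	scan->next = next;
	scan->state = state;
}

bool
toy_tee_next(ToyTeeScan *scan, int *out)
{
	int			v;

	/* Replay already-stashed rows first (after a rescan). */
	if (scan->pos < scan->len)
	{
		*out = scan->rows[scan->pos++];
		return true;
	}
	if (scan->exhausted)
		return false;

	/* Otherwise pull one fresh row from the producer. */
	scan->producer_calls++;
	if (!scan->next(scan->state, &v))
	{
		scan->exhausted = true;
		return false;
	}

	/* Stash it aside, then return it immediately. */
	if (scan->len == scan->cap)
	{
		size_t		ncap = scan->cap ? scan->cap * 2 : 8;
		int		   *nrows = realloc(scan->rows, ncap * sizeof(*nrows));

		if (nrows == NULL)
			abort();
		scan->rows = nrows;
		scan->cap = ncap;
	}
	scan->rows[scan->len++] = v;
	scan->pos = scan->len;
	*out = v;
	return true;
}

/* Rescan: rewind to the start of the stash; the producer is not restarted. */
void
toy_tee_rescan(ToyTeeScan *scan)
{
	scan->pos = 0;
}

/* Example producer yielding next..limit. */
typedef struct ToyTeeRange
{
	int			next;
	int			limit;
} ToyTeeRange;

bool
toy_tee_range_next(void *state, int *out)
{
	ToyTeeRange *r = state;

	if (r->next > r->limit)
		return false;
	*out = r->next++;
	return true;
}
```

Reading two rows, rescanning, and reading three rows costs three producer calls in total in this model; the first two rows of the second pass come from the stash.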

regards, tom lane

Dent John

denty@QQdd.eu

over 6 years ago

In reply to: Tom Lane (#5)

Re: The flinfo->fn_extra question, from me this time.

On 21 Jul 2019, at 22:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chapman Flack <chap@anastigmatix.net> writes:

Until now, I had assumed that SFRM_ValuePerCall mode might offer some
benefits, such as the possibility of pipelining certain queries and not
building up a whole tuplestore in advance.

But looking in the code, I'm getting the impression that those
benefits are only theoretical future ones, as ExecMakeTableFunctionResult
implements SFRM_ValuePerCall mode by ... repeatedly calling the function
to build up a whole tuplestore in advance.

Yes, that's the case for a SRF in FROM. A SRF in the targetlist
actually does get the chance to pipeline, if it implements ValuePerCall.

The FROM case could be improved perhaps, if somebody wanted to put
time into it.

While looking at whether REFCURSOR output could be pipelined into the executor [1], I’ve stumbled upon the same.

By any chance, do either of you know if there are initiatives to make the changes mentioned?

You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but […]

I’m also interested in your comment here. If the function was STABLE, could not the function scan simply be restarted? (Rather than needing to create the tuplestore for all cases.)

I guess perhaps the backwards scan is where it falls down though...

[…] in principle you could return
rows immediately while stashing them aside in a tuplestore.

Does the planner have any view on this? When I first saw what was going on, I presumed the planner had decided the cost of multiple function scans was greater than the cost of materialising it in a temporary store.

It occurs to me that, if we made a switch towards pipelining the function scan results directly out, then we might lose efficiency where there are a significant number of scans and/or the function cost is high. Is that why you were suggesting to also stash them aside?

denty.

[1]: /messages/by-id/B2AFCAB5-FACD-44BF-963F-7DD2735FAB5D@QQdd.eu

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Dent John (#6)

Re: The flinfo->fn_extra question, from me this time.

Dent John <denty@QQdd.eu> writes:

On 21 Jul 2019, at 22:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chapman Flack <chap@anastigmatix.net> writes:

But looking in the code, I'm getting the impression that those
benefits are only theoretical future ones, as ExecMakeTableFunctionResult
implements SFRM_ValuePerCall mode by ... repeatedly calling the function
to build up a whole tuplestore in advance.

Yes, that's the case for a SRF in FROM. A SRF in the targetlist
actually does get the chance to pipeline, if it implements ValuePerCall.
The FROM case could be improved perhaps, if somebody wanted to put
time into it.

By any chance, do either of you know if there are initiatives to make the changes mentioned?

I don't know of anybody working on it.

You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but […]

I’m also interested in your comment here. If the function was STABLE, could not the function scan simply be restarted? (Rather than needing to create the tuplestore for all cases.)
I guess perhaps the backwards scan is where it falls down though...

My point was that you can't simply remove the tuplestore-building code
path. The exact boundary conditions for that might be negotiable.
But I'd be very dubious of an assumption that re-running the function
would be cheaper than building a tuplestore, regardless of whether it's
safe.

Does the planner have any view on this?

cost_functionscan and cost_rescan would likely need some adjustment if
possible. However, I'm not sure that the planner has any way to know
whether a given SRF will support ValuePerCall or not.

regards, tom lane

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Tom Lane (#5)

Re: The flinfo->fn_extra question, from me this time.

On Sun, Jul 21, 2019 at 5:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The FROM case could be improved perhaps, if somebody wanted to put
time into it. You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but in principle you could return
rows immediately while stashing them aside in a tuplestore.

But you could skip it if you could prove that no rescans or backward
fetches are possible for a particular node, something that we also
want for Gather, as discussed not long ago.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Dent John

denty@QQdd.eu

over 6 years ago

In reply to: Tom Lane (#7)

1 attachment(s)

Re: The flinfo->fn_extra question, from me this time.

On 22 Sep 2019, at 16:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hi Tom,

I don't know of anybody working on it.

Okay. I had a look at this. I tried to apply Andres’s patch [1] from some time ago, but that turned out not so easy. I guess the code has moved on since. So I’ve attempted to re-invent the same spirit, stealing from his patch, and from how the tSRF code does things. The patch isn’t final, but it demonstrates a concept.

However, given your comments below, I wonder if you might comment on the approach before I go further?

(Patch is presently still against 12beta2.)

You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but […]

I do recognise this. The patch teaches ExecMaterializesOutput() and ExecSupportsBackwardScan() that T_FunctionScan nodes don't materialise their output.

(Actually, Andres’s patch did the educating of ExecMaterializesOutput() and ExecSupportsBackwardScan(); it’s not my invention.)

I haven’t worked out how to easily demonstrate the backward scan case, but joins (which presumably are the typical cause of rescan) now yield an intermediate Materialize node.

postgres=# explain (analyze, buffers) select * from unnest (array_fill ('scanner'::text, array[10])) t1, unnest (array_fill ('dummy'::text, array[10000000])) t2 limit 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.01..1.36 rows=100 width=64) (actual time=0.009..0.067 rows=100 loops=1)
-> Nested Loop (cost=0.01..1350000.13 rows=100000000 width=64) (actual time=0.008..0.049 rows=100 loops=1)
-> Function Scan on unnest t2 (cost=0.00..100000.00 rows=10000000 width=32) (actual time=0.003..0.006 rows=10 loops=1)
-> Materialize (cost=0.00..0.15 rows=10 width=32) (actual time=0.000..0.002 rows=10 loops=10)
-> Function Scan on unnest t1 (cost=0.00..0.10 rows=10 width=32) (actual time=0.001..0.004 rows=10 loops=1)
Planning Time: 127.875 ms
Execution Time: 0.102 ms
(7 rows)

My point was that you can't simply remove the tuplestore-building code
path. The exact boundary conditions for that might be negotiable.
But I'd be very dubious of an assumption that re-running the function
would be cheaper than building a tuplestore, regardless of whether it's
safe.

Understood, and I agree. I think it’s preferable to allow the planner control over when to explicitly materialise.

But if I’m not wrong, at present, the planner doesn’t really trade off the cost of rescan versus materialisation, but instead adopts a simple heuristic of materialising one or the other side during a join. We can see this in the plans if the unnest()s are moved into the target list and buried in a subquery. For example:

postgres=# explain (analyze, buffers) select * from (select unnest (array_fill ('scanner'::text, array[10]))) t1, (select unnest (array_fill ('dummy'::text, array[10000000]))) t2 limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.40 rows=100 width=64) (actual time=0.011..0.106 rows=100 loops=1)
-> Nested Loop (cost=0.00..1400000.21 rows=100000000 width=64) (actual time=0.010..0.081 rows=100 loops=1)
-> ProjectSet (cost=0.00..50000.02 rows=10000000 width=32) (actual time=0.004..0.024 rows=10 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
-> Materialize (cost=0.00..0.22 rows=10 width=32) (actual time=0.001..0.002 rows=10 loops=10)
-> ProjectSet (cost=0.00..0.07 rows=10 width=32) (actual time=0.001..0.004 rows=10 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
Planning Time: 180.482 ms
Execution Time: 0.148 ms
(9 rows)

I am tempted to stop short of educating the planner about the possibility to re-scan (thus dropping the materialise node) during a join. It seems feasible, and sometimes advantageous. (Perhaps if the join quals would cause a huge amount of the output to be filtered??) But it also seems better to treat it as an entirely separate issue.

cost_functionscan and cost_rescan would likely need some adjustment if
possible.

I looked at cost_functionscan(), but I think it is already of the view that a function can pipeline. It computes a startup_cost and a run_cost, where run_cost is the per-tuple cost * num_rows. With this understanding, it is actually wrong given the current materialisation-always behaviour. I think this means I don’t need to make any fundamental changes in order to correctly cost the new behaviour.
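The shape of that cost model can be checked with a small worked example. This is a simplified sketch, not the planner's code: `ToyScanCost`, `toy_function_scan_cost` and `toy_cost_to_fetch` are hypothetical helpers, and the linear interpolation for a partial fetch is an assumption that only holds if rows really are pipelined. The per-row figure of 0.01 is PostgreSQL's default cpu_tuple_cost, which is consistent with the first plan above (rows=10000000, total cost 100000.00).

```c
#include <assert.h>

typedef struct ToyScanCost
{
	double		startup;		/* cost before the first row can be returned */
	double		total;			/* cost to return every row */
} ToyScanCost;

/* total = startup_cost + run_cost, where run_cost = cpu_per_tuple * rows */
ToyScanCost
toy_function_scan_cost(double startup_cost, double cpu_per_tuple, double rows)
{
	ToyScanCost c;
	double		run_cost;

	assert(rows >= 0.0);
	run_cost = cpu_per_tuple * rows;
	c.startup = startup_cost;
	c.total = startup_cost + run_cost;
	return c;
}

/*
 * Cost to fetch only the first `fetch` rows, ASSUMING the scan pipelines:
 * startup plus the matching fraction of the run cost.
 */
double
toy_cost_to_fetch(ToyScanCost c, double rows, double fetch)
{
	if (rows <= 0.0 || fetch >= rows)
		return c.total;
	return c.startup + (c.total - c.startup) * (fetch / rows);
}
```

Under these assumptions a 10,000,000-row function scan costs about 100000 in total, while fetching only the first 50 rows is costed at about 0.5. A scan that always materialises pays the full run cost before returning anything, which is the mismatch being pointed out here.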

However, I'm not sure that the planner has any way to know
whether a given SRF will support ValuePerCall or not.

Yes. There is a flaw. But with the costing support function interface, it’s possible to supply costs that correctly relate to the SRF’s abilities.

I guess there can be a case where the SRF supports ValuePerCall, and supplies costs accordingly, but at execution time, decides not to use it. That seems a curious situation, but it will, at worst, cost us a bit more buffer space.

In the opposite case, where the SRF can’t support ValuePerCall, the risk is that the planner has decided it wants to interject a Materialize node, and the result will be buffer-to-buffer copying. If the function has a costing support function, it should all be costed correctly, but it’s obviously not ideal. Currently, my patch doesn’t do anything about this case. My plan would be to allow the Materialize node to be supplied with a tuplestore from the FunctionScan node at execution time. I guess this optimisation would similarly help non-ValuePerCall tSRFs.

After all this, I’m wondering how you view the proposal?

For the sake of comparison, 12beta1 achieves the following plans:

postgres=# create or replace function test1() returns setof record language plpgsql as $$ begin return query (select 'a', generate_series (1, 1e6)); end; $$; -- using plpgsql because it can’t pipeline
CREATE FUNCTION
postgres=# explain (verbose, analyse, buffers) select key, count (value), sum (value) from test1() as (key text, value numeric) group by key;
...
Planning Time: 0.068 ms
Execution Time: 589.651 ms

postgres=# explain (verbose, analyse, buffers) select * from test1() as (key text, value numeric) limit 50;
...
Planning Time: 0.059 ms
Execution Time: 348.334 ms

postgres=# explain (analyze, buffers) select count (a.a), sum (a.a) from unnest (array_fill (1::numeric, array[10000000])) a;
...
Planning Time: 165.502 ms
Execution Time: 5629.094 ms

postgres=# explain (analyze, buffers) select * from unnest (array_fill (1::numeric, array[10000000])) limit 50;
...
Planning Time: 110.952 ms
Execution Time: 1080.609 ms

Versus 12beta2+patch, which seem favourable in the main, at least for these pathological cases:

postgres=# explain (verbose, analyse, buffers) select key, count (value), sum (value) from test1() as (key text, value numeric) group by key;
...
Planning Time: 0.068 ms
Execution Time: 591.749 ms

postgres=# explain (verbose, analyse, buffers) select * from test1() as (key text, value numeric) limit 50;
...
Planning Time: 0.051 ms
Execution Time: 289.820 ms

postgres=# explain (analyze, buffers) select count (a.a), sum (a.a) from unnest (array_fill (1::numeric, array[10000000])) a;
...
Planning Time: 169.260 ms
Execution Time: 4759.781 ms

postgres=# explain (analyze, buffers) select * from unnest (array_fill (1::numeric, array[10000000])) limit 50;
...
Planning Time: 163.374 ms
Execution Time: 0.051 ms

denty.

[1]: /messages/by-id/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de

Attachments:

pipeline-functionscan.patch (application/octet-stream)

diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 1f18e5d..f3205e4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -554,7 +554,6 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
-		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_Material:
@@ -613,7 +612,6 @@ ExecMaterializesOutput(NodeTag plantype)
 	switch (plantype)
 	{
 		case T_Material:
-		case T_FunctionScan:
 		case T_TableFuncScan:
 		case T_CteScan:
 		case T_NamedTuplestoreScan:
diff --git a/src/backend/executor/execSRF.c b/src/backend/executor/execSRF.c
index c8a3efc..86ef5ce 100644
--- a/src/backend/executor/execSRF.c
+++ b/src/backend/executor/execSRF.c
@@ -21,6 +21,7 @@
 #include "access/htup_details.h"
 #include "catalog/objectaccess.h"
 #include "executor/execdebug.h"
+#include "executor/nodeFunctionscan.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
@@ -45,6 +46,9 @@ static void ExecPrepareTuplestoreResult(SetExprState *sexpr,
 										Tuplestorestate *resultStore,
 										TupleDesc resultDesc);
 static void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
+static void slot_puttuple_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off, TupleDesc resultdesc, Datum result);
+static void slot_copyslot_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off, TupleDesc resultdesc, TupleTableSlot *result);
+static void slot_putscalar_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off, Datum result, bool isNull);
 
 
 /*
@@ -89,33 +93,102 @@ ExecInitTableFunctionResult(Expr *expr,
 	return state;
 }
 
+static void
+ExecFetchFromTableFunctionTuplestore(SetExprState *setexpr,
+									 TupleDesc expectedDesc,
+									 TupleTableSlot *resultslot,
+									 AttrNumber scanslot_off,
+									 ExprDoneCond *isDone)
+{
+	MemoryContext oldContext;
+	bool		foundTup;
+	
+	/*
+	 * Have to make sure tuple in slot lives long enough, otherwise
+	 * clearing the slot could end up trying to free something already
+	 * freed.
+	 */
+	oldContext = MemoryContextSwitchTo(resultslot->tts_mcxt);
+	foundTup = tuplestore_gettupleslot(setexpr->funcResultStore, true, false,
+									   setexpr->funcResultSlot);
+	MemoryContextSwitchTo(oldContext);
+	
+	if (foundTup)
+	{
+		*isDone = ExprMultipleResult;
+		
+		if (setexpr->funcReturnsTuple)
+		{
+			/* We must expand the whole tuple. */
+			/*
+			 * Copy it to the result cols.
+			 */
+			slot_getallattrs(setexpr->funcResultSlot);
+			
+			slot_copyslot_offset (resultslot, expectedDesc, scanslot_off, setexpr->funcResultSlot->tts_tupleDescriptor, setexpr->funcResultSlot);
+		}
+		else
+		{
+			bool		isNull = false;
+			
+			/* Extract the first column and return it as a scalar. */
+			Datum result = slot_getattr(setexpr->funcResultSlot, 1, &isNull);
+			
+			slot_putscalar_offset (resultslot, expectedDesc, scanslot_off, result, isNull);
+		}
+	}
+	else
+	{
+		/* Exhausted the tuplestore, so clean up */
+		tuplestore_end(setexpr->funcResultStore);
+		setexpr->funcResultStore = NULL;
+		*isDone = ExprEndResult;
+	}
+}
+
 /*
  *		ExecMakeTableFunctionResult
  *
- * Evaluate a table function, producing a materialized result in a Tuplestore
- * object.
+ * Evaluate a table function, storing a single row in scanslot starting at
+ * attribute scanslot_off.
  *
  * This is used by nodeFunctionscan.c.
  */
-Tuplestorestate *
-ExecMakeTableFunctionResult(SetExprState *setexpr,
+void
+ExecMakeTableFunctionResult(FunctionScanPerFuncState *fs,
 							ExprContext *econtext,
 							MemoryContext argContext,
-							TupleDesc expectedDesc,
-							bool randomAccess)
+							TupleTableSlot *resultslot,
+							AttrNumber scanslot_off,
+							ExprDoneCond *isDone)
 {
-	Tuplestorestate *tupstore = NULL;
-	TupleDesc	tupdesc = NULL;
+	SetExprState *setexpr = fs->setexpr;
 	Oid			funcrettype;
 	bool		returnsTuple;
 	bool		returnsSet = false;
 	FunctionCallInfo fcinfo;
 	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
-	HeapTupleData tmptup;
+	ReturnSetInfo *rsinfo;
 	MemoryContext callerContext;
 	MemoryContext oldcontext;
-	bool		first_time = true;
+
+restart:
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+	
+	/*
+	 * If a previous call of the function returned a set result in the form of
+	 * a tuplestore, continue reading rows from the tuplestore until it's
+	 * empty.
+	 */
+	if (setexpr->funcResultStore)
+	{
+		ExecFetchFromTableFunctionTuplestore(setexpr, fs->tupdesc, resultslot, scanslot_off, isDone);
+		
+		/* No matter what, we are done here. */
+		return;
+	}
 
 	callerContext = CurrentMemoryContext;
 
@@ -130,18 +203,21 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 	 * resultinfo, but set it up anyway because we use some of the fields as
 	 * our own state variables.
 	 */
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = expectedDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize | SFRM_Materialize_Preferred);
-	if (randomAccess)
-		rsinfo.allowedModes |= (int) SFRM_Materialize_Random;
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
-
-	fcinfo = palloc(SizeForFunctionCallInfo(list_length(setexpr->args)));
+	rsinfo = (ReturnSetInfo *) setexpr->fcinfo->resultinfo;
+	
+	if (rsinfo == NULL)
+	{
+		oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+		rsinfo = makeNode (ReturnSetInfo);
+		rsinfo->econtext = econtext;
+		rsinfo->expectedDesc = fs->tupdesc;
+		rsinfo->allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
+		rsinfo->returnMode = SFRM_ValuePerCall;
+		setexpr->fcinfo->resultinfo = (Node *) rsinfo;
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	fcinfo = setexpr->fcinfo;
 
 	/*
 	 * Normally the passed expression tree will be a SetExprState, since the
@@ -162,23 +238,32 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 		InitFunctionCallInfoData(*fcinfo, &(setexpr->func),
 								 list_length(setexpr->args),
 								 setexpr->fcinfo->fncollation,
-								 NULL, (Node *) &rsinfo);
+								 NULL, (Node *) rsinfo);
 
 		/*
 		 * Evaluate the function's argument list.
 		 *
-		 * We can't do this in the per-tuple context: the argument values
-		 * would disappear when we reset that context in the inner loop.  And
-		 * the caller's CurrentMemoryContext is typically a query-lifespan
-		 * context, so we don't want to leak memory there.  We require the
-		 * caller to pass a separate memory context that can be used for this,
-		 * and can be reset each time through to avoid bloat.
+		 * arguments is a list of expressions to evaluate before passing to the
+		 * function manager.  We skip the evaluation if it was already done in the
+		 * previous call (ie, we are continuing the evaluation of a set-valued
+		 * function).  Otherwise, collect the current argument values into fcinfo.
+		 *
+		 * The arguments have to live in a context that lives at least until all
+		 * rows from this SRF have been returned, otherwise ValuePerCall SRFs
+		 * would reference freed memory after the first returned row.
 		 */
-		MemoryContextReset(argContext);
-		oldcontext = MemoryContextSwitchTo(argContext);
-		ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
-		MemoryContextSwitchTo(oldcontext);
-
+		if (!setexpr->setArgsValid)
+		{
+			oldcontext = MemoryContextSwitchTo(argContext);
+			ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
+			MemoryContextSwitchTo(oldcontext);
+		}
+		else
+		{
+			/* Reset flag (we may set it again below) */
+			setexpr->setArgsValid = false;
+		}
+		
 		/*
 		 * If function is strict, and there are any NULL arguments, skip
 		 * calling the function and act like it returned NULL (or an empty
@@ -207,166 +292,161 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 	MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
 
 	/*
-	 * Loop to handle the ValuePerCall protocol (which is also the same
+	 * Handle the ValuePerCall protocol (which is also the same
 	 * behavior needed in the generic ExecEvalExpr path).
 	 */
-	for (;;)
+	Datum		result;
+
+	/* Call the function or expression one time */
+	if (!setexpr->elidedFuncState)
 	{
-		Datum		result;
+		pgstat_init_function_usage(fcinfo, &fcusage);
 
-		CHECK_FOR_INTERRUPTS();
+		fcinfo->isnull = false;
+		rsinfo->isDone = ExprSingleResult;
+		result = FunctionCallInvoke(fcinfo);
 
+		pgstat_end_function_usage(&fcusage,
+								  rsinfo->isDone != ExprMultipleResult);
+	}
+	else
+	{
+		result =
+			ExecEvalExpr(setexpr->elidedFuncState, econtext, &fcinfo->isnull);
+		rsinfo->isDone = ExprSingleResult;
+	}
+
+	/* Which protocol does function want to use? */
+	if (rsinfo->returnMode == SFRM_ValuePerCall)
+	{
 		/*
-		 * reset per-tuple memory context before each call of the function or
-		 * expression. This cleans up any local memory the function may leak
-		 * when called.
+		 * Check for end of result set.
 		 */
-		ResetExprContext(econtext);
-
-		/* Call the function or expression one time */
-		if (!setexpr->elidedFuncState)
-		{
-			pgstat_init_function_usage(fcinfo, &fcusage);
-
-			fcinfo->isnull = false;
-			rsinfo.isDone = ExprSingleResult;
-			result = FunctionCallInvoke(fcinfo);
-
-			pgstat_end_function_usage(&fcusage,
-									  rsinfo.isDone != ExprMultipleResult);
-		}
+		if (rsinfo->isDone == ExprEndResult)
+			goto no_function_result;
 		else
-		{
-			result =
-				ExecEvalExpr(setexpr->elidedFuncState, econtext, &fcinfo->isnull);
-			rsinfo.isDone = ExprSingleResult;
-		}
-
-		/* Which protocol does function want to use? */
-		if (rsinfo.returnMode == SFRM_ValuePerCall)
 		{
 			/*
-			 * Check for end of result set.
-			 */
-			if (rsinfo.isDone == ExprEndResult)
-				break;
-
-			/*
-			 * If first time through, build tuplestore for result.  For a
-			 * scalar function result type, also make a suitable tupdesc.
+			 * Save the current argument values to re-use on the next call.
 			 */
-			if (first_time)
+			if (*isDone == ExprMultipleResult)
 			{
-				oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-				tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-				rsinfo.setResult = tupstore;
-				if (!returnsTuple)
+				setexpr->setArgsValid = true;
+				/* Register cleanup callback if we didn't already */
+				if (!setexpr->shutdown_reg)
 				{
-					tupdesc = CreateTemplateTupleDesc(1);
-					TupleDescInitEntry(tupdesc,
-									   (AttrNumber) 1,
-									   "column",
-									   funcrettype,
-									   -1,
-									   0);
-					rsinfo.setDesc = tupdesc;
+					RegisterExprContextCallback(econtext,
+												ShutdownSetExpr,
+												PointerGetDatum(setexpr));
+					setexpr->shutdown_reg = true;
 				}
+			}
+		}
+
+		HeapTupleHeader td = NULL;
+
+		/*
+		 * Obtain a suitable tupdesc, when we first encounter a non-NULL result.
+		 */
+		if (fs->returned_tupdesc == NULL)
+		{
+			if (!returnsTuple)
+			{
+				/*
+				 * This is the first non-NULL result from the
+				 * function.  Use the type info embedded in the
+				 * rowtype Datum to look up the needed tupdesc.  Make
+				 * a copy for the query.
+				 */
+				// FIXME: is this a too-long lived context?
+				oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+				fs->returned_tupdesc = CreateTemplateTupleDesc(1);
+				TupleDescInitEntry(fs->tupdesc,
+								   (AttrNumber) 1,
+								   "column",
+								   funcrettype,
+								   -1,
+								   0);
+				MemoryContextSwitchTo(oldcontext);
+			}
+			else if (!fcinfo->isnull)
+			{
+				td = DatumGetHeapTupleHeader(result);
+				 
+				/*
+				 * This is the first non-NULL result from the
+				 * function.  Use the type info embedded in the
+				 * rowtype Datum to look up the needed tupdesc.  Make
+				 * a copy for the query.
+				 */
+				// FIXME: is this a too-long lived context?
+				oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+				fs->returned_tupdesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
+												   HeapTupleHeaderGetTypMod(td));
 				MemoryContextSwitchTo(oldcontext);
 			}
+		}
 
-			/*
-			 * Store current resultset item.
-			 */
-			if (returnsTuple)
+		/*
+		 * Store current resultset item.
+		 */
+		if (returnsTuple)
+		{
+			if (!fcinfo->isnull)
 			{
-				if (!fcinfo->isnull)
-				{
-					HeapTupleHeader td = DatumGetHeapTupleHeader(result);
-
-					if (tupdesc == NULL)
-					{
-						/*
-						 * This is the first non-NULL result from the
-						 * function.  Use the type info embedded in the
-						 * rowtype Datum to look up the needed tupdesc.  Make
-						 * a copy for the query.
-						 */
-						oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-						tupdesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
-															  HeapTupleHeaderGetTypMod(td));
-						rsinfo.setDesc = tupdesc;
-						MemoryContextSwitchTo(oldcontext);
-					}
-					else
-					{
-						/*
-						 * Verify all later returned rows have same subtype;
-						 * necessary in case the type is RECORD.
-						 */
-						if (HeapTupleHeaderGetTypeId(td) != tupdesc->tdtypeid ||
-							HeapTupleHeaderGetTypMod(td) != tupdesc->tdtypmod)
-							ereport(ERROR,
-									(errcode(ERRCODE_DATATYPE_MISMATCH),
-									 errmsg("rows returned by function are not all of the same row type")));
-					}
-
-					/*
-					 * tuplestore_puttuple needs a HeapTuple not a bare
-					 * HeapTupleHeader, but it doesn't need all the fields.
-					 */
-					tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
-					tmptup.t_data = td;
-
-					tuplestore_puttuple(tupstore, &tmptup);
-				}
-				else
-				{
-					/*
-					 * NULL result from a tuple-returning function; expand it
-					 * to a row of all nulls.  We rely on the expectedDesc to
-					 * form such rows.  (Note: this would be problematic if
-					 * tuplestore_putvalues saved the tdtypeid/tdtypmod from
-					 * the provided descriptor, since that might not match
-					 * what we get from the function itself.  But it doesn't.)
-					 */
-					int			natts = expectedDesc->natts;
-					bool	   *nullflags;
-
-					nullflags = (bool *) palloc(natts * sizeof(bool));
-					memset(nullflags, true, natts * sizeof(bool));
-					tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
-				}
+				if (td == NULL)
+					td = DatumGetHeapTupleHeader(result);
+				
+				/*
+				 * Verify all later returned rows have same subtype;
+				 * necessary in case the type is RECORD.
+				 */
+				if (HeapTupleHeaderGetTypeId(td) != fs->returned_tupdesc->tdtypeid ||
+					HeapTupleHeaderGetTypMod(td) != fs->returned_tupdesc->tdtypmod)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATATYPE_MISMATCH),
+							 errmsg("rows returned by function are not all of the same row type")));
+
+				slot_puttuple_offset (resultslot, fs->tupdesc, scanslot_off, rsinfo->setDesc, result);
 			}
 			else
 			{
-				/* Scalar-type case: just store the function result */
-				tuplestore_putvalues(tupstore, tupdesc, &result, &fcinfo->isnull);
+				/*
+				 * NULL result from a tuple-returning function; expand it
+				 * to a row of all nulls.  We rely on the expectedDesc to
+				 * form such rows.  (Note: this would be problematic if
+				 * tuplestore_putvalues saved the tdtypeid/tdtypmod from
+				 * the provided descriptor, since that might not match
+				 * what we get from the function itself.  But it doesn't.)
+				 */
+				slot_puttuple_offset (resultslot, fs->tupdesc, scanslot_off, rsinfo->setDesc, 0);
 			}
-
-			/*
-			 * Are we done?
-			 */
-			if (rsinfo.isDone != ExprMultipleResult)
-				break;
 		}
-		else if (rsinfo.returnMode == SFRM_Materialize)
+		else
 		{
-			/* check we're on the same page as the function author */
-			if (!first_time || rsinfo.isDone != ExprSingleResult)
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-						 errmsg("table-function protocol for materialize mode was not followed")));
-			/* Done evaluating the set result */
-			break;
+			/* Scalar-type case: just store the function result */
+			slot_putscalar_offset (resultslot, fs->tupdesc, scanslot_off, result, fcinfo->isnull);
 		}
-		else
+	}
+	else if (rsinfo->returnMode == SFRM_Materialize)
+	{
+		/* check we're on the same page as the function author */
+		if (rsinfo->isDone != ExprSingleResult)
 			ereport(ERROR,
 					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-					 errmsg("unrecognized table-function returnMode: %d",
-							(int) rsinfo.returnMode)));
-
-		first_time = false;
+					 errmsg("table-function protocol for materialize mode was not followed")));
+		/* prepare to return values from the tuplestore */
+		ExecPrepareTuplestoreResult(setexpr, econtext,
+									rsinfo->setResult,
+									rsinfo->setDesc);
+		/* Done evaluating the set result */
+		goto restart;
 	}
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+				 errmsg("unrecognized table-function returnMode: %d",
+						(int) rsinfo->returnMode)));
 
 no_function_result:
 
@@ -376,20 +456,11 @@ no_function_result:
 	 * non-set-returning function then insert a single all-nulls row.  As
 	 * above, we depend on the expectedDesc to manufacture the dummy row.
 	 */
-	if (rsinfo.setResult == NULL)
+	if (rsinfo->setResult == NULL)
 	{
-		MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-		tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-		rsinfo.setResult = tupstore;
 		if (!returnsSet)
 		{
-			int			natts = expectedDesc->natts;
-			bool	   *nullflags;
-
-			MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-			nullflags = (bool *) palloc(natts * sizeof(bool));
-			memset(nullflags, true, natts * sizeof(bool));
-			tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
+			slot_puttuple_offset (resultslot, fs->tupdesc, scanslot_off, rsinfo->setDesc, 0);
 		}
 	}
 
@@ -397,23 +468,25 @@ no_function_result:
 	 * If function provided a tupdesc, cross-check it.  We only really need to
 	 * do this for functions returning RECORD, but might as well do it always.
 	 */
-	if (rsinfo.setDesc)
+	if (rsinfo->setDesc)
 	{
-		tupledesc_match(expectedDesc, rsinfo.setDesc);
+		tupledesc_match(fs->tupdesc, rsinfo->setDesc);
 
 		/*
 		 * If it is a dynamically-allocated TupleDesc, free it: it is
 		 * typically allocated in a per-query context, so we must avoid
 		 * leaking it across multiple usages.
 		 */
-		if (rsinfo.setDesc->tdrefcount == -1)
-			FreeTupleDesc(rsinfo.setDesc);
+		//if (rsinfo->setDesc->tdrefcount == -1)
+		//	FreeTupleDesc(rsinfo->setDesc);
+		// FIXME: work out when to release this...
 	}
+	
+	*isDone = rsinfo->isDone;
 
 	MemoryContextSwitchTo(callerContext);
 
-	/* All done, pass back the tuplestore */
-	return rsinfo.setResult;
+	/* All done, result is in the tuplestore */
 }
 
 
@@ -486,7 +559,7 @@ ExecMakeFunctionResultSet(SetExprState *fcache,
 	Datum		result;
 	FunctionCallInfo fcinfo;
 	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
+	ReturnSetInfo *rsinfo;
 	bool		callit;
 	int			i;
 
@@ -569,16 +642,23 @@ restart:
 	 */
 
 	/* Prepare a resultinfo node for communication. */
-	fcinfo->resultinfo = (Node *) &rsinfo;
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = fcache->funcResultDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
-	/* note we do not set SFRM_Materialize_Random or _Preferred */
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	
+	if (rsinfo == NULL)
+	{
+		MemoryContext oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+		rsinfo = makeNode (ReturnSetInfo);
+		rsinfo->type = T_ReturnSetInfo;
+		rsinfo->econtext = econtext;
+		rsinfo->expectedDesc = fcache->funcResultDesc;
+		rsinfo->allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
+		rsinfo->returnMode = SFRM_ValuePerCall;
+		/* isDone is filled below */
+		rsinfo->setResult = NULL;
+		rsinfo->setDesc = NULL;
+		fcinfo->resultinfo = (Node *) rsinfo;
+		MemoryContextSwitchTo(oldcontext);
+	}
 
 	/*
 	 * If function is strict, and there are any NULL arguments, skip calling
@@ -602,13 +682,13 @@ restart:
 		pgstat_init_function_usage(fcinfo, &fcusage);
 
 		fcinfo->isnull = false;
-		rsinfo.isDone = ExprSingleResult;
+		rsinfo->isDone = ExprSingleResult;
 		result = FunctionCallInvoke(fcinfo);
 		*isNull = fcinfo->isnull;
-		*isDone = rsinfo.isDone;
+		*isDone = rsinfo->isDone;
 
 		pgstat_end_function_usage(&fcusage,
-								  rsinfo.isDone != ExprMultipleResult);
+								  rsinfo->isDone != ExprMultipleResult);
 	}
 	else
 	{
@@ -619,7 +699,7 @@ restart:
 	}
 
 	/* Which protocol does function want to use? */
-	if (rsinfo.returnMode == SFRM_ValuePerCall)
+	if (rsinfo->returnMode == SFRM_ValuePerCall)
 	{
 		if (*isDone != ExprEndResult)
 		{
@@ -640,19 +720,20 @@ restart:
 			}
 		}
 	}
-	else if (rsinfo.returnMode == SFRM_Materialize)
+	else if (rsinfo->returnMode == SFRM_Materialize)
 	{
 		/* check we're on the same page as the function author */
-		if (rsinfo.isDone != ExprSingleResult)
+		if (rsinfo->isDone != ExprSingleResult)
 			ereport(ERROR,
 					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
 					 errmsg("table-function protocol for materialize mode was not followed")));
-		if (rsinfo.setResult != NULL)
+		if (rsinfo->setResult != NULL)
 		{
 			/* prepare to return values from the tuplestore */
 			ExecPrepareTuplestoreResult(fcache, econtext,
-										rsinfo.setResult,
-										rsinfo.setDesc);
+										rsinfo->setResult,
+										rsinfo->setDesc);
+
 			/* loop back to top to start returning from tuplestore */
 			goto restart;
 		}
@@ -665,7 +746,7 @@ restart:
 		ereport(ERROR,
 				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
 				 errmsg("unrecognized table-function returnMode: %d",
-						(int) rsinfo.returnMode)));
+						(int) rsinfo->returnMode)));
 
 	return result;
 }
@@ -712,6 +793,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
 	InitFunctionCallInfoData(*sexpr->fcinfo, &(sexpr->func),
 							 numargs,
 							 input_collation, NULL, NULL);
+	sexpr->fcinfo->resultinfo = NULL;
 
 	/* If function returns set, check if that's allowed by caller */
 	if (sexpr->func.fn_retset && !allowSRF)
@@ -804,6 +886,12 @@ ShutdownSetExpr(Datum arg)
 
 	/* Clear any active set-argument state */
 	sexpr->setArgsValid = false;
+	
+	if (sexpr->fcinfo->resultinfo != NULL)
+	{
+		pfree (sexpr->fcinfo->resultinfo);
+		sexpr->fcinfo->resultinfo = NULL;
+	}
 
 	/* execUtils will deregister the callback... */
 	sexpr->shutdown_reg = false;
@@ -960,3 +1048,67 @@ tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
 							   i + 1)));
 	}
 }
+
+static void
+slot_puttuple_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off,
+					  TupleDesc resultdesc, Datum result)
+{
+	if (result != 0)
+	{
+		HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+
+		/*
+		 * heap_deform_tuple needs a HeapTuple not a bare
+		 * HeapTupleHeader, but it doesn't need all the fields.
+		 */
+		HeapTupleData tmptup;
+		tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
+		tmptup.t_data = td;
+
+		heap_deform_tuple (&tmptup, expectedDesc, &(scanslot->tts_values[scanslot_off]), &(scanslot->tts_isnull[scanslot_off]));
+	}
+	else
+	{
+		/* Ensure any remaining result cols are initialised to NULL. */
+		for (int i = 0; i < expectedDesc->natts; i++)
+		{
+			scanslot->tts_values[scanslot_off + i] = (Datum) 0;
+			scanslot->tts_isnull[scanslot_off + i] = true;
+		}
+	}
+}
+
+static void
+slot_copyslots_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off,
+					   int natts, Datum *datums, bool *isnulls)
+{
+	int i;
+	for (i = 0; i < natts; i++)
+	{
+		if (i >= expectedDesc->natts)
+			break;
+		
+		scanslot->tts_values[scanslot_off + i] = datums[i];
+		scanslot->tts_isnull[scanslot_off + i] = isnulls[i];
+	}
+	
+	/* Ensure any remaining result cols are initialised to NULL. */
+	for (; i < expectedDesc->natts; i++)
+	{
+		scanslot->tts_values[scanslot_off + i] = (Datum) 0;
+		scanslot->tts_isnull[scanslot_off + i] = true;
+	}
+}
+
+static void
+slot_copyslot_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off,
+					  TupleDesc resultdesc, TupleTableSlot *result)
+{
+	slot_copyslots_offset (scanslot, expectedDesc, scanslot_off, resultdesc->natts, &(result->tts_values[0]), &(result->tts_isnull[0]));
+}
+
+static void
+slot_putscalar_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off, Datum result, bool isNull)
+{
+	slot_copyslots_offset (scanslot, expectedDesc, scanslot_off, 1, &result, &isNull);
+}
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index 0370f2e..34054b7 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -30,19 +30,6 @@
 #include "utils/memutils.h"
 
 
-/*
- * Runtime data for each function being scanned.
- */
-typedef struct FunctionScanPerFuncState
-{
-	SetExprState *setexpr;		/* state of the expression being evaluated */
-	TupleDesc	tupdesc;		/* desc of the function result type */
-	int			colcount;		/* expected number of result columns */
-	Tuplestorestate *tstore;	/* holds the function result set */
-	int64		rowcount;		/* # of rows in result set, -1 if not known */
-	TupleTableSlot *func_slot;	/* function result slot (or NULL) */
-} FunctionScanPerFuncState;
-
 static TupleTableSlot *FunctionNext(FunctionScanState *node);
 
 
@@ -61,59 +48,42 @@ FunctionNext(FunctionScanState *node)
 {
 	EState	   *estate;
 	ScanDirection direction;
-	TupleTableSlot *scanslot;
+	TupleTableSlot *resultslot;
+	MemoryContext oldcontext;
+	ExprDoneCond		doneCond;
 	bool		alldone;
 	int64		oldpos;
 	int			funcno;
 	int			att;
 
+	// FIXME: assert not backwards
+
 	/*
 	 * get information from the estate and scan state
 	 */
 	estate = node->ss.ps.state;
 	direction = estate->es_direction;
-	scanslot = node->ss.ss_ScanTupleSlot;
+	resultslot = node->ss.ss_ScanTupleSlot;
 
+	ExecClearTuple(resultslot);
+
+	/* Call SRFs, as well as plain expressions, in per-tuple context */
+	oldcontext = MemoryContextSwitchTo(node->ss.ps.ps_ExprContext->ecxt_per_tuple_memory);
+	
 	if (node->simple)
 	{
 		/*
-		 * Fast path for the trivial case: the function return type and scan
-		 * result type are the same, so we fetch the function result straight
-		 * into the scan result slot. No need to update ordinality or
-		 * rowcounts either.
-		 */
-		Tuplestorestate *tstore = node->funcstates[0].tstore;
-
-		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
-		 */
-		if (tstore == NULL)
-		{
-			node->funcstates[0].tstore = tstore =
-				ExecMakeTableFunctionResult(node->funcstates[0].setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											node->funcstates[0].tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
-
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(tstore);
-		}
-
-		/*
-		 * Get the next tuple from tuplestore.
+		 * Read tuple from function and put it in the scanslot.
 		 */
-		(void) tuplestore_gettupleslot(tstore,
-									   ScanDirectionIsForward(direction),
-									   false,
-									   scanslot);
-		return scanslot;
+		ExecMakeTableFunctionResult(&node->funcstates[0],
+									node->ss.ps.ps_ExprContext,
+									node->argcontext,
+									resultslot,
+									0, &doneCond);
+
+		alldone = (doneCond == ExprEndResult);
+		
+		goto return_resultslot;
 	}
 
 	/*
@@ -135,93 +105,31 @@ FunctionNext(FunctionScanState *node)
 	 * return types), and then copy the values to scanslot (which matches the
 	 * scan result type), setting the ordinal column (if any) as well.
 	 */
-	ExecClearTuple(scanslot);
 	att = 0;
 	alldone = true;
 	for (funcno = 0; funcno < node->nfuncs; funcno++)
 	{
 		FunctionScanPerFuncState *fs = &node->funcstates[funcno];
-		int			i;
 
 		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
+		 * Read a tuple from the function and put it in the scanslot.
 		 */
-		if (fs->tstore == NULL)
-		{
-			fs->tstore =
-				ExecMakeTableFunctionResult(fs->setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											fs->tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
-
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(fs->tstore);
-		}
-
-		/*
-		 * Get the next tuple from tuplestore.
-		 *
-		 * If we have a rowcount for the function, and we know the previous
-		 * read position was out of bounds, don't try the read. This allows
-		 * backward scan to work when there are mixed row counts present.
-		 */
-		if (fs->rowcount != -1 && fs->rowcount < oldpos)
-			ExecClearTuple(fs->func_slot);
-		else
-			(void) tuplestore_gettupleslot(fs->tstore,
-										   ScanDirectionIsForward(direction),
-										   false,
-										   fs->func_slot);
-
-		if (TupIsNull(fs->func_slot))
-		{
-			/*
-			 * If we ran out of data for this function in the forward
-			 * direction then we now know how many rows it returned. We need
-			 * to know this in order to handle backwards scans. The row count
-			 * we store is actually 1+ the actual number, because we have to
-			 * position the tuplestore 1 off its end sometimes.
-			 */
-			if (ScanDirectionIsForward(direction) && fs->rowcount == -1)
-				fs->rowcount = node->ordinal;
-
-			/*
-			 * populate the result cols with nulls
-			 */
-			for (i = 0; i < fs->colcount; i++)
-			{
-				scanslot->tts_values[att] = (Datum) 0;
-				scanslot->tts_isnull[att] = true;
-				att++;
-			}
-		}
-		else
+		ExecMakeTableFunctionResult(fs,
+									node->ss.ps.ps_ExprContext,
+									node->argcontext,
+									resultslot,
+									att,
+									&doneCond);
+
+		if (doneCond != ExprEndResult)
 		{
-			/*
-			 * we have a result, so just copy it to the result cols.
-			 */
-			slot_getallattrs(fs->func_slot);
-
-			for (i = 0; i < fs->colcount; i++)
-			{
-				scanslot->tts_values[att] = fs->func_slot->tts_values[i];
-				scanslot->tts_isnull[att] = fs->func_slot->tts_isnull[i];
-				att++;
-			}
-
 			/*
 			 * We're not done until every function result is exhausted; we pad
 			 * the shorter results with nulls until then.
 			 */
 			alldone = false;
 		}
+		att += fs->colcount;
 	}
 
 	/*
@@ -229,18 +137,23 @@ FunctionNext(FunctionScanState *node)
 	 */
 	if (node->ordinality)
 	{
-		scanslot->tts_values[att] = Int64GetDatumFast(node->ordinal);
-		scanslot->tts_isnull[att] = false;
+		resultslot->tts_values[att] = Int64GetDatumFast(node->ordinal);
+		resultslot->tts_isnull[att] = false;
 	}
 
+return_resultslot:
+	MemoryContextSwitchTo(oldcontext);
+	
+	if (alldone)
+		return NULL;
+	
 	/*
 	 * If alldone, we just return the previously-cleared scanslot.  Otherwise,
 	 * finish creating the virtual tuple.
 	 */
-	if (!alldone)
-		ExecStoreVirtualTuple(scanslot);
+	ExecStoreVirtualTuple(resultslot);
 
-	return scanslot;
+	return resultslot;
 }
 
 /*
@@ -353,14 +266,6 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 										scanstate->ss.ps.ps_ExprContext,
 										&scanstate->ss.ps);
 
-		/*
-		 * Don't allocate the tuplestores; the actual calls to the functions
-		 * do that.  NULL means that we have not called the function yet (or
-		 * need to call it again after a rescan).
-		 */
-		fs->tstore = NULL;
-		fs->rowcount = -1;
-
 		/*
 		 * Now determine if the function returns a simple or composite type,
 		 * and build an appropriate tupdesc.  Note that in the composite case,
@@ -416,19 +321,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 
 		fs->tupdesc = tupdesc;
 		fs->colcount = colcount;
-
-		/*
-		 * We only need separate slots for the function results if we are
-		 * doing ordinality or multiple functions; otherwise, we'll fetch
-		 * function results directly into the scan slot.
-		 */
-		if (!scanstate->simple)
-		{
-			fs->func_slot = ExecInitExtraTupleSlot(estate, fs->tupdesc,
-												   &TTSOpsMinimalTuple);
-		}
-		else
-			fs->func_slot = NULL;
+		fs->returned_tupdesc = NULL; /* will be initialized during FunctionNext */
 
 		natts += colcount;
 		i++;
@@ -521,8 +414,6 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 void
 ExecEndFunctionScan(FunctionScanState *node)
 {
-	int			i;
-
 	/*
 	 * Free the exprcontext
 	 */
@@ -534,23 +425,6 @@ ExecEndFunctionScan(FunctionScanState *node)
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
-	/*
-	 * Release slots and tuplestore resources
-	 */
-	for (i = 0; i < node->nfuncs; i++)
-	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-
-		if (fs->tstore != NULL)
-		{
-			tuplestore_end(node->funcstates[i].tstore);
-			fs->tstore = NULL;
-		}
-	}
 }
 
 /* ----------------------------------------------------------------
@@ -571,9 +445,12 @@ ExecReScanFunctionScan(FunctionScanState *node)
 	for (i = 0; i < node->nfuncs; i++)
 	{
 		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
+		
+		if (fs->returned_tupdesc != NULL)
+		{
+			FreeTupleDesc (fs->returned_tupdesc);
+			fs->returned_tupdesc = NULL;
+		}
 	}
 
 	ExecScanReScan(&node->ss);
@@ -597,12 +474,7 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 			if (bms_overlap(chgparam, rtfunc->funcparams))
 			{
-				if (node->funcstates[i].tstore != NULL)
-				{
-					tuplestore_end(node->funcstates[i].tstore);
-					node->funcstates[i].tstore = NULL;
-				}
-				node->funcstates[i].rowcount = -1;
+				// FIXME: trigger something...!
 			}
 			i++;
 		}
@@ -614,7 +486,6 @@ ExecReScanFunctionScan(FunctionScanState *node)
 	/* Make sure we rewind any remaining tuplestores */
 	for (i = 0; i < node->nfuncs; i++)
 	{
-		if (node->funcstates[i].tstore != NULL)
-			tuplestore_rescan(node->funcstates[i].tstore);
+		// FIXME: trigger rewind
 	}
 }
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d056fd6..c40f6e6 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -405,11 +405,12 @@ extern bool ExecCheck(ExprState *state, ExprContext *context);
  */
 extern SetExprState *ExecInitTableFunctionResult(Expr *expr,
 												 ExprContext *econtext, PlanState *parent);
-extern Tuplestorestate *ExecMakeTableFunctionResult(SetExprState *setexpr,
+extern void ExecMakeTableFunctionResult(struct FunctionScanPerFuncState *fs,
 													ExprContext *econtext,
 													MemoryContext argContext,
-													TupleDesc expectedDesc,
-													bool randomAccess);
+													TupleTableSlot *scanslot,
+													AttrNumber scanslot_off,
+													ExprDoneCond *isDone);
 extern SetExprState *ExecInitFunctionResultSet(Expr *expr,
 											   ExprContext *econtext, PlanState *parent);
 extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
diff --git a/src/include/executor/nodeFunctionscan.h b/src/include/executor/nodeFunctionscan.h
index 4f7d60d..9362952 100644
--- a/src/include/executor/nodeFunctionscan.h
+++ b/src/include/executor/nodeFunctionscan.h
@@ -16,6 +16,17 @@
 
 #include "nodes/execnodes.h"
 
+/*
+ * Runtime data for each function being scanned.
+ */
+typedef struct FunctionScanPerFuncState
+{
+	SetExprState *setexpr;		/* state of the expression being evaluated */
+	TupleDesc	tupdesc;		/* desc of the function result type */
+	int			colcount;		/* expected number of result columns */
+	TupleDesc	returned_tupdesc;		/* desc of the function's last-returned result type */
+} FunctionScanPerFuncState;
+
 extern FunctionScanState *ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags);
 extern void ExecEndFunctionScan(FunctionScanState *node);
 extern void ExecReScanFunctionScan(FunctionScanState *node);

#10

Dent John

denty@QQdd.eu

about 6 years ago

In reply to: Dent John (#9)

Re: The flinfo->fn_extra question, from me this time.

Hi,

Turns out, to my embarrassment, that pretty much all of the regression tests failed with my patch. No idea if anyone spotted that and withheld a reply in revenge, but I wouldn’t blame you if you did!

I have spent a bit more time on it. The attached patch is a much better show, though there are still a few regressions and undoubtedly it’s still rough.

(Attached patch is against 12.0.)

As was perhaps predictable, some of the regression tests do indeed break in the case of rescan. To cite the specific class of failure, it’s this:

SELECT * FROM (VALUES (1),(2),(3)) v(r), ROWS FROM( rngfunc_sql(11,11), rngfunc_mat(10+r,13) );
  r | i  | s | i  | s
 ---+----+---+----+---
  1 | 11 | 1 | 11 | 1
  1 |    |   | 12 | 2
  1 |    |   | 13 | 3
- 2 | 11 | 1 | 12 | 4
+ 2 | 11 | 2 | 12 | 4
  2 |    |   | 13 | 5
- 3 | 11 | 1 | 13 | 6
+ 3 | 11 | 3 | 13 | 6
 (6 rows)

The reason for the change is that 's' comes from rngfunc_mat(), which computes s as nextval(). The patch currently prefers to re-execute the function in place of materialising it into a tuplestore.
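The effect can be reproduced outside PostgreSQL with a toy model. In the sketch below, `Sequence` and `seq_next()` stand in for a sequence and `nextval()`, and `toy_srf()` stands in for a one-row set-returning function whose arguments do not change between rescans. All of these names are hypothetical; none of this is PostgreSQL API.

```c
#include <stddef.h>

/* Hypothetical sequence: each call returns a fresh value, like nextval(). */
typedef struct
{
	int			last;
} Sequence;

static int
seq_next(Sequence *seq)
{
	return ++seq->last;
}

/* Toy one-row "SRF": its single result column is drawn from the sequence. */
static int
toy_srf(Sequence *seq)
{
	return seq_next(seq);
}

/* Materialising scan: run the function once, replay the stored row. */
static void
scan_materialized(Sequence *seq, size_t outer_rows, int *out)
{
	int			stored = 0;
	int			have_stored = 0;

	for (size_t r = 0; r < outer_rows; r++)
	{
		if (!have_stored)
		{
			stored = toy_srf(seq);
			have_stored = 1;
		}
		out[r] = stored;
	}
}

/* Re-executing scan: call the function again for every outer row. */
static void
scan_reexecuted(Sequence *seq, size_t outer_rows, int *out)
{
	for (size_t r = 0; r < outer_rows; r++)
		out[r] = toy_srf(seq);
}
```

With three outer rows and a fresh sequence, scan_materialized() fills out[] with 1, 1, 1, while scan_reexecuted() fills it with 1, 2, 3, which is the same shape as the regression diff above.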

Tom suggested not dropping the tuplestore creation logic. I can’t fathom a way of avoiding change for folk that have gotten used to the current behaviour without doing that. So I’m tempted to pipeline the rows back from a function (if it returns ValuePerCall), and also record them in a tuplestore, just in case rescan happens. There’s still wastage in this approach, but it would preserve the current behaviour, while still enabling the early abort of ValuePerCall SRFs at relatively low cost, which is certainly one of my goals.
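A minimal sketch of that "pipeline and record" idea, in plain C rather than executor code. `TeeScan`, `tee_next()`, `tee_rescan()` and the `Counter` producer are hypothetical names invented for illustration, and a fixed-size array stands in for a real tuplestore (which would spill to disk instead of refusing rows).

```c
#include <stdbool.h>
#include <stddef.h>

#define TEE_CAPACITY 16			/* fixed size keeps the sketch short */

/* A producer yields one value per call and returns false when exhausted. */
typedef bool (*Producer) (void *state, int *value);

typedef struct
{
	Producer	produce;
	void	   *state;
	int			rows[TEE_CAPACITY]; /* rows recorded so far */
	size_t		nstored;
	size_t		pos;			/* read position within rows[] */
	bool		exhausted;
} TeeScan;

static void
tee_init(TeeScan *t, Producer produce, void *state)
{
	t->produce = produce;
	t->state = state;
	t->nstored = 0;
	t->pos = 0;
	t->exhausted = false;
}

/* Return the next row: replay a recorded one if available, else produce. */
static bool
tee_next(TeeScan *t, int *value)
{
	if (t->pos < t->nstored)
	{
		*value = t->rows[t->pos++];
		return true;
	}
	if (t->exhausted || t->nstored == TEE_CAPACITY)
		return false;
	if (!t->produce(t->state, value))
	{
		t->exhausted = true;
		return false;
	}
	t->rows[t->nstored++] = *value;
	t->pos = t->nstored;
	return true;
}

/* Rescan rewinds the read position; the producer is not restarted. */
static void
tee_rescan(TeeScan *t)
{
	t->pos = 0;
}

/* Example producer: counts from next up to limit, tracking its call count. */
typedef struct
{
	int			next;
	int			limit;
	int			calls;
} Counter;

static bool
counter_produce(void *state, int *value)
{
	Counter    *c = state;

	c->calls++;
	if (c->next > c->limit)
		return false;
	*value = c->next++;
	return true;
}
```

If a caller reads two rows, rescans, and reads three rows, the producer is called only three times in total: the first two rows after the rescan come from the recorded copy. A caller that stops early never pays for rows it did not read, at the price of copying every row it did.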

I’d welcome opinions on whether there are downsides to that approach, as I might move to integrate that next.

But I would also like to kick around ideas for how to avoid the tuplestore entirely.

Earlier, I suggested that we might make the decision logic prefer to materialise a tuplestore for VOLATILE functions, and prefer to pipeline directly from STABLE (and IMMUTABLE) functions. The docs on volatility categories say that the optimiser will evaluate a VOLATILE function for every row where it is needed, whereas it might cache STABLE and IMMUTABLE results more aggressively. That is basically the polar opposite of what I want to achieve.

It is arguably also in conflict with current behaviour. I think we should make the docs clearer about that.

So, on second thoughts, I don’t think overloading the meaning of STABLE, et al., is the right thing to do. I wonder if we could invent a new modifier to CREATE FUNCTION, perhaps “PIPELINED”, which would simply declare a function's ability and preference for ValuePerCall mode.

Or perhaps modify the ROWS FROM extension, and adopt WITH’s [ NOT ] MATERIALIZED clause. For example, the following would achieve the above proposed behaviour:

ROWS FROM( rngfunc_sql(11,11) MATERIALIZED, rngfunc_mat(10+r,13) MATERIALIZED )

Of course, NOT MATERIALIZED would achieve ValuePerCall mode, and omit materialisation. I guess MATERIALIZED would have to be the default.

I wonder if another alternative would be to decide materialization based on what the outer plan includes. I guess we can tell if we’re part of a join, or if the plan requires the ability to scan backwards. Could that work?
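As a rough illustration of that last idea, the decision could reduce to a small predicate over facts about the enclosing plan. The enum, the function name and both flags below are hypothetical; how the planner or executor would actually derive them is exactly the open question.

```c
#include <stdbool.h>

typedef enum
{
	MODE_PIPELINE,				/* return rows straight from the function */
	MODE_MATERIALIZE			/* collect rows in a tuplestore first */
} ScanMode;

/*
 * Hypothetical policy: materialise only when the surrounding plan might
 * read a row more than once, i.e. on rescan or when scanning backwards.
 */
static ScanMode
choose_scan_mode(bool may_rescan, bool needs_backward_scan)
{
	if (may_rescan || needs_backward_scan)
		return MODE_MATERIALIZE;
	return MODE_PIPELINE;
}
```

Under this policy a plain SELECT * FROM srf() LIMIT n would pipeline, while the inner side of a nested loop would still materialise.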

denty.

#11

Dent John

denty@QQdd.eu

about 6 years ago

In reply to: Dent John (#10)

1 attachment(s)

Re: The flinfo->fn_extra question, from me this time.

(And here’s the aforementioned attachment… doh.)

Attachments:

pipeline-functionscan-v2.patch (application/octet-stream)

diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 1f18e5d..f3205e4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -554,7 +554,6 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
-		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_Material:
@@ -613,7 +612,6 @@ ExecMaterializesOutput(NodeTag plantype)
 	switch (plantype)
 	{
 		case T_Material:
-		case T_FunctionScan:
 		case T_TableFuncScan:
 		case T_CteScan:
 		case T_NamedTuplestoreScan:
diff --git a/src/backend/executor/execSRF.c b/src/backend/executor/execSRF.c
index c8a3efc..1037909 100644
--- a/src/backend/executor/execSRF.c
+++ b/src/backend/executor/execSRF.c
@@ -21,6 +21,7 @@
 #include "access/htup_details.h"
 #include "catalog/objectaccess.h"
 #include "executor/execdebug.h"
+#include "executor/nodeFunctionscan.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
@@ -45,6 +46,23 @@ static void ExecPrepareTuplestoreResult(SetExprState *sexpr,
 										Tuplestorestate *resultStore,
 										TupleDesc resultDesc);
 static void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
+static void ExecPrepareFuncResultslot(SetExprState *sexpr,
+										TupleDesc resultDesc);
+static void slot_puttuple_offset (TupleTableSlot *scanslot,
+								  TupleDesc expectedDesc,
+								  AttrNumber scanslot_off,
+								  TupleDesc resultdesc,
+								  Datum result);
+static void slot_copyslot_offset (TupleTableSlot *scanslot,
+								  TupleDesc expectedDesc,
+								  AttrNumber scanslot_off,
+								  TupleDesc resultdesc,
+								  TupleTableSlot *result);
+static void slot_putscalar_offset (TupleTableSlot *scanslot,
+								   TupleDesc expectedDesc,
+								   AttrNumber scanslot_off,
+								   Datum result,
+								   bool isNull);
 
 
 /*
@@ -89,39 +107,134 @@ ExecInitTableFunctionResult(Expr *expr,
 	return state;
 }
 
+static void
+ExecFetchFromTableFunctionTuplestore(SetExprState *setexpr,
+									 TupleDesc expectedDesc,
+									 TupleTableSlot *resultslot,
+									 AttrNumber scanslot_off,
+									 ExprDoneCond *isDone)
+{
+	MemoryContext oldContext;
+	bool		foundTup;
+	
+	/*
+	 * Have to make sure tuple in slot lives long enough, otherwise
+	 * clearing the slot could end up trying to free something already
+	 * freed.
+	 */
+	oldContext = MemoryContextSwitchTo(resultslot->tts_mcxt);
+	foundTup = tuplestore_gettupleslot(setexpr->funcResultStore, true, false,
+									   setexpr->funcResultSlot);
+	MemoryContextSwitchTo(oldContext);
+	
+	if (foundTup)
+	{
+		*isDone = ExprMultipleResult;
+		
+		if (setexpr->funcReturnsTuple)
+		{
+			/* We must expand the whole tuple. */
+			slot_getallattrs(setexpr->funcResultSlot);
+			
+			/*
+			 * Copy it to the result cols.
+			 */
+			slot_copyslot_offset (resultslot, expectedDesc, scanslot_off, setexpr->funcResultSlot->tts_tupleDescriptor, setexpr->funcResultSlot);
+		}
+		else
+		{
+			bool		isNull = false;
+			
+			/* Extract the first column and return it as a scalar. */
+			Datum result = slot_getattr(setexpr->funcResultSlot, 1, &isNull);
+			
+			slot_putscalar_offset (resultslot, expectedDesc, scanslot_off, result, isNull);
+		}
+	}
+	else
+	{
+		/* Exhausted the tuplestore, so clean up */
+		tuplestore_end(setexpr->funcResultStore);
+		setexpr->funcResultStore = NULL;
+		
+		/* We must store a row of NULLs in case we are used in ROWS FROM */
+		slot_puttuple_offset (resultslot, setexpr->funcResultDesc, scanslot_off, NULL, 0);
+
+		*isDone = ExprEndResult;
+	}
+}
+
 /*
  *		ExecMakeTableFunctionResult
  *
- * Evaluate a table function, producing a materialized result in a Tuplestore
- * object.
+ * Evaluate a table function, storing a single row in scanslot starting at
+ * attribute scanslot_off.
  *
  * This is used by nodeFunctionscan.c.
  */
-Tuplestorestate *
-ExecMakeTableFunctionResult(SetExprState *setexpr,
+void
+ExecMakeTableFunctionResult(FunctionScanPerFuncState *fs,
 							ExprContext *econtext,
 							MemoryContext argContext,
-							TupleDesc expectedDesc,
-							bool randomAccess)
+							TupleTableSlot *resultslot,
+							AttrNumber scanslot_off,
+							ExprDoneCond *isDone)
 {
-	Tuplestorestate *tupstore = NULL;
-	TupleDesc	tupdesc = NULL;
-	Oid			funcrettype;
-	bool		returnsTuple;
-	bool		returnsSet = false;
-	FunctionCallInfo fcinfo;
+	SetExprState *setexpr = fs->setexpr;
+	bool		call_fn = true; /* whether to actually call the SRF */
+	bool		already_stored = false; /* has the result been stored? */
+	FunctionCallInfo fcinfo = setexpr->fcinfo;
 	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
-	HeapTupleData tmptup;
-	MemoryContext callerContext;
+	ReturnSetInfo *rsinfo;
 	MemoryContext oldcontext;
-	bool		first_time = true;
+	Datum		result = 0;
+
+restart:
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+	
+	/*
+	 * If a previous call of the function returned a set result in the form of
+	 * a tuplestore, continue reading rows from the tuplestore until it's
+	 * empty.
+	 */
+	if (setexpr->funcResultStore)
+	{
+		rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; /* always set if funcResultStore is set */
+
+		ExecFetchFromTableFunctionTuplestore(setexpr, setexpr->funcResultDesc, resultslot, scanslot_off, &rsinfo->isDone);
+
+		/*
+		 * We are done here: fall through below with isDone as either
+		 * ExprMultipleResult or ExprEndResult.
+		 */
+
+		already_stored = true;
+		call_fn = false;
+	}
+	/*
+	 * The elidedFuncState case isn't related to the SFRM_Materialize/
+	 * FetchFromTuplestore decision, except that it cannot occur in that
+	 * case, so we code it as if/elseif, rather than if/if.
+	 */
+	else if (setexpr->elidedFuncState)
+	{
 
-	callerContext = CurrentMemoryContext;
+		/* For SRFs, fcinfo would have been allocated by init_sexpr(). */
+		if (fcinfo == NULL)
+		{
+			oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+			
+			/* By performing InitFunctionCallInfoData here, we avoid palloc0() */
+			setexpr->fcinfo = fcinfo = palloc(SizeForFunctionCallInfo(list_length(setexpr->args)));
 
-	funcrettype = exprType((Node *) setexpr->expr);
+			MemoryContextSwitchTo(oldcontext);
 
-	returnsTuple = type_is_rowtype(funcrettype);
+			/* Treat setexpr as a generic expression */
+			InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL);
+		}
+	}
 
 	/*
 	 * Prepare a resultinfo node for communication.  We always do this even if
@@ -130,18 +243,50 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 	 * resultinfo, but set it up anyway because we use some of the fields as
 	 * our own state variables.
 	 */
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = expectedDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize | SFRM_Materialize_Preferred);
-	if (randomAccess)
-		rsinfo.allowedModes |= (int) SFRM_Materialize_Random;
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
+	rsinfo = (ReturnSetInfo *) setexpr->fcinfo->resultinfo;
+	
+	if (rsinfo == NULL)
+	{
+		oldcontext = MemoryContextSwitchTo(argContext);
+
+		rsinfo = makeNode (ReturnSetInfo);
+		rsinfo->econtext = econtext;
+		rsinfo->expectedDesc = setexpr->funcResultDesc;
+		rsinfo->allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
+		rsinfo->returnMode = SFRM_ValuePerCall;
+		fcinfo->resultinfo = (Node *) rsinfo;
 
-	fcinfo = palloc(SizeForFunctionCallInfo(list_length(setexpr->args)));
+		MemoryContextSwitchTo(oldcontext);
+	}
+	else
+	{
+		/*
+		 * If rsinfo was already present, it means we're being asked
+		 * to continue projecting. In turn, if last time we projected
+		 * a SingleResult, then all future calls should be handled as
+		 * if it was the last row from an SRF.
+		 *
+		 * Note: this is different from the ProjectSet case, which
+		 * instead re-invokes the non-SRF function for each row.
+		 */
+		if (rsinfo->isDone == ExprSingleResult)
+			rsinfo->isDone = ExprEndResult;
+	}
+
+	/*
+	 * If we're asked to continue projecting output rows despite the SRF
+	 * being exhausted (indicated by isDone already being set to ExprEndResult),
+	 * return NULLs forever.
+	 */
+	if (rsinfo->isDone == ExprEndResult && !already_stored)
+	{
+		call_fn = false; /* don't invoke the function again */
+		rsinfo->returnMode = SFRM_ValuePerCall; /* returning a row at a time */
+		result = 0; /* dummy value; the NULL result is stored later */
+		fcinfo->isnull = true;
+
+		/* the actual result is written later */
+	}
 
 	/*
 	 * Normally the passed expression tree will be a SetExprState, since the
@@ -153,37 +298,46 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 	 * don't get a chance to pass a special ReturnSetInfo to any functions
 	 * buried in the expression.
 	 */
-	if (!setexpr->elidedFuncState)
+	if (call_fn && !setexpr->elidedFuncState)
 	{
 		/*
 		 * This path is similar to ExecMakeFunctionResultSet.
 		 */
-		returnsSet = setexpr->funcReturnsSet;
 		InitFunctionCallInfoData(*fcinfo, &(setexpr->func),
 								 list_length(setexpr->args),
 								 setexpr->fcinfo->fncollation,
-								 NULL, (Node *) &rsinfo);
+								 NULL, (Node *) rsinfo);
 
 		/*
 		 * Evaluate the function's argument list.
 		 *
-		 * We can't do this in the per-tuple context: the argument values
-		 * would disappear when we reset that context in the inner loop.  And
-		 * the caller's CurrentMemoryContext is typically a query-lifespan
-		 * context, so we don't want to leak memory there.  We require the
-		 * caller to pass a separate memory context that can be used for this,
-		 * and can be reset each time through to avoid bloat.
+		 * arguments is a list of expressions to evaluate before passing to the
+		 * function manager.  We skip the evaluation if it was already done in the
+		 * previous call (ie, we are continuing the evaluation of a set-valued
+		 * function).  Otherwise, collect the current argument values into fcinfo.
+		 *
+		 * The arguments have to live in a context that lives at least until all
+		 * rows from this SRF have been returned, otherwise ValuePerCall SRFs
+		 * would reference freed memory after the first returned row.
 		 */
-		MemoryContextReset(argContext);
-		oldcontext = MemoryContextSwitchTo(argContext);
-		ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
-		MemoryContextSwitchTo(oldcontext);
+		if (!setexpr->setArgsValid)
+		{
+			oldcontext = MemoryContextSwitchTo(argContext);
+			ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
+			MemoryContextSwitchTo(oldcontext);
+		}
+		else
+		{
+			/* Reset flag (we may set it again below) */
+			setexpr->setArgsValid = false;
+		}
 
 		/*
 		 * If function is strict, and there are any NULL arguments, skip
 		 * calling the function and act like it returned NULL (or an empty
 		 * set, in the returns-set case).
 		 */
+		call_fn = true;
 		if (setexpr->func.fn_strict)
 		{
 			int			i;
@@ -191,100 +345,90 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 			for (i = 0; i < fcinfo->nargs; i++)
 			{
 				if (fcinfo->args[i].isnull)
-					goto no_function_result;
+				{
+					call_fn = false;
+
+					result = 0;
+					fcinfo->isnull = true;
+					rsinfo->isDone = ExprEndResult;
+
+					break;
+				}
 			}
 		}
 	}
-	else
-	{
-		/* Treat setexpr as a generic expression */
-		InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL);
-	}
-
-	/*
-	 * Switch to short-lived context for calling the function or expression.
-	 */
-	MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
 
-	/*
-	 * Loop to handle the ValuePerCall protocol (which is also the same
-	 * behavior needed in the generic ExecEvalExpr path).
-	 */
-	for (;;)
+	/* Call the function or expression one time */
+	if (call_fn)
 	{
-		Datum		result;
-
-		CHECK_FOR_INTERRUPTS();
-
-		/*
-		 * reset per-tuple memory context before each call of the function or
-		 * expression. This cleans up any local memory the function may leak
-		 * when called.
-		 */
-		ResetExprContext(econtext);
-
-		/* Call the function or expression one time */
 		if (!setexpr->elidedFuncState)
 		{
 			pgstat_init_function_usage(fcinfo, &fcusage);
 
 			fcinfo->isnull = false;
-			rsinfo.isDone = ExprSingleResult;
+			rsinfo->isDone = ExprSingleResult;
 			result = FunctionCallInvoke(fcinfo);
 
 			pgstat_end_function_usage(&fcusage,
-									  rsinfo.isDone != ExprMultipleResult);
+									  rsinfo->isDone != ExprMultipleResult);
 		}
 		else
 		{
 			result =
 				ExecEvalExpr(setexpr->elidedFuncState, econtext, &fcinfo->isnull);
-			rsinfo.isDone = ExprSingleResult;
+			rsinfo->isDone = ExprSingleResult;
 		}
+	}
+	
+	/* Which protocol does function want to use? */
+	if (rsinfo->returnMode == SFRM_ValuePerCall)
+	{
+		HeapTupleHeader td = NULL;
 
-		/* Which protocol does function want to use? */
-		if (rsinfo.returnMode == SFRM_ValuePerCall)
+		if (rsinfo->isDone != ExprEndResult)
 		{
 			/*
-			 * Check for end of result set.
+			 * Save the current argument values to re-use on the next call.
 			 */
-			if (rsinfo.isDone == ExprEndResult)
-				break;
+			if (rsinfo->isDone == ExprMultipleResult)
+			{
+				setexpr->setArgsValid = true;
+				/* Register cleanup callback if we didn't already */
+				if (!setexpr->shutdown_reg)
+				{
+					RegisterExprContextCallback(econtext,
+												ShutdownSetExpr,
+												PointerGetDatum(setexpr));
+					setexpr->shutdown_reg = true;
+				}
+			}
 
 			/*
-			 * If first time through, build tuplestore for result.  For a
-			 * scalar function result type, also make a suitable tupdesc.
+			 * Obtain a suitable tupdesc, when we first encounter a non-NULL result.
 			 */
-			if (first_time)
+			if (rsinfo->setDesc == NULL)
 			{
-				oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-				tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-				rsinfo.setResult = tupstore;
-				if (!returnsTuple)
+				if (!setexpr->funcReturnsTuple)
 				{
-					tupdesc = CreateTemplateTupleDesc(1);
-					TupleDescInitEntry(tupdesc,
+					/*
+					 * Make a copy for the query.
+					 */
+					oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+					rsinfo->setDesc = CreateTemplateTupleDesc(1);
+					TupleDescInitEntry(rsinfo->setDesc,
 									   (AttrNumber) 1,
 									   "column",
-									   funcrettype,
+									   exprType((Node *) setexpr->expr),
 									   -1,
 									   0);
-					rsinfo.setDesc = tupdesc;
+					MemoryContextSwitchTo(oldcontext);
 				}
-				MemoryContextSwitchTo(oldcontext);
-			}
-
-			/*
-			 * Store current resultset item.
-			 */
-			if (returnsTuple)
-			{
-				if (!fcinfo->isnull)
+				else if (!fcinfo->isnull)
 				{
-					HeapTupleHeader td = DatumGetHeapTupleHeader(result);
-
-					if (tupdesc == NULL)
+					if (result != 0)
 					{
+						td = DatumGetHeapTupleHeader(result);
+
 						/*
 						 * This is the first non-NULL result from the
 						 * function.  Use the type info embedded in the
@@ -292,32 +436,45 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 						 * a copy for the query.
 						 */
 						oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-						tupdesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
+						rsinfo->setDesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
 															  HeapTupleHeaderGetTypMod(td));
-						rsinfo.setDesc = tupdesc;
 						MemoryContextSwitchTo(oldcontext);
 					}
-					else
-					{
-						/*
-						 * Verify all later returned rows have same subtype;
-						 * necessary in case the type is RECORD.
-						 */
-						if (HeapTupleHeaderGetTypeId(td) != tupdesc->tdtypeid ||
-							HeapTupleHeaderGetTypMod(td) != tupdesc->tdtypmod)
-							ereport(ERROR,
-									(errcode(ERRCODE_DATATYPE_MISMATCH),
-									 errmsg("rows returned by function are not all of the same row type")));
-					}
+				}
+			}
+
+			/* If we obtained a tupdesc, check it is appropriate */
+			if (rsinfo->setDesc && setexpr->funcResultDesc &&
+				!fs->tupdesc_checked)
+			{
+				tupledesc_match (setexpr->funcResultDesc, rsinfo->setDesc);
+				fs->tupdesc_checked = true;
+			}
+		}
 
+		if (!already_stored)
+		{
+			/*
+			 * Store current resultset item.
+			 */
+			if (setexpr->funcReturnsTuple)
+			{
+				if (!fcinfo->isnull)
+				{
+					if (td == NULL)
+						td = DatumGetHeapTupleHeader(result);
+					
 					/*
-					 * tuplestore_puttuple needs a HeapTuple not a bare
-					 * HeapTupleHeader, but it doesn't need all the fields.
+					 * Verify all later returned rows have same subtype;
+					 * necessary in case the type is RECORD.
 					 */
-					tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
-					tmptup.t_data = td;
+					if (HeapTupleHeaderGetTypeId(td) != rsinfo->setDesc->tdtypeid ||
+						HeapTupleHeaderGetTypMod(td) != rsinfo->setDesc->tdtypmod)
+						ereport(ERROR,
+								(errcode(ERRCODE_DATATYPE_MISMATCH),
+								 errmsg("rows returned by function are not all of the same row type")));
 
-					tuplestore_puttuple(tupstore, &tmptup);
+					slot_puttuple_offset (resultslot, setexpr->funcResultDesc, scanslot_off, rsinfo->setDesc, result);
 				}
 				else
 				{
@@ -329,91 +486,57 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 					 * the provided descriptor, since that might not match
 					 * what we get from the function itself.  But it doesn't.)
 					 */
-					int			natts = expectedDesc->natts;
-					bool	   *nullflags;
-
-					nullflags = (bool *) palloc(natts * sizeof(bool));
-					memset(nullflags, true, natts * sizeof(bool));
-					tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
+					slot_puttuple_offset (resultslot, setexpr->funcResultDesc, scanslot_off, rsinfo->setDesc, 0);
 				}
 			}
 			else
 			{
 				/* Scalar-type case: just store the function result */
-				tuplestore_putvalues(tupstore, tupdesc, &result, &fcinfo->isnull);
+				slot_putscalar_offset (resultslot, setexpr->funcResultDesc, scanslot_off, result, fcinfo->isnull);
 			}
-
-			/*
-			 * Are we done?
-			 */
-			if (rsinfo.isDone != ExprMultipleResult)
-				break;
 		}
-		else if (rsinfo.returnMode == SFRM_Materialize)
-		{
-			/* check we're on the same page as the function author */
-			if (!first_time || rsinfo.isDone != ExprSingleResult)
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-						 errmsg("table-function protocol for materialize mode was not followed")));
-			/* Done evaluating the set result */
-			break;
-		}
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-					 errmsg("unrecognized table-function returnMode: %d",
-							(int) rsinfo.returnMode)));
-
-		first_time = false;
 	}
-
-no_function_result:
-
-	/*
-	 * If we got nothing from the function (ie, an empty-set or NULL result),
-	 * we have to create the tuplestore to return, and if it's a
-	 * non-set-returning function then insert a single all-nulls row.  As
-	 * above, we depend on the expectedDesc to manufacture the dummy row.
-	 */
-	if (rsinfo.setResult == NULL)
+	else if (rsinfo->returnMode == SFRM_Materialize)
 	{
-		MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-		tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-		rsinfo.setResult = tupstore;
-		if (!returnsSet)
+		/* check we're on the same page as the function author */
+		if (rsinfo->isDone != ExprSingleResult)
+			ereport(ERROR,
+					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+					 errmsg("table-function protocol for materialize mode was not followed")));
+		/* prepare to return values from the tuplestore */
+		ExecPrepareFuncResultslot(setexpr, rsinfo->setDesc);
+		
+		setexpr->funcResultStore = rsinfo->setResult;
+		
+		if (rsinfo->setDesc && setexpr->funcResultDesc)
 		{
-			int			natts = expectedDesc->natts;
-			bool	   *nullflags;
-
-			MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-			nullflags = (bool *) palloc(natts * sizeof(bool));
-			memset(nullflags, true, natts * sizeof(bool));
-			tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
+			tupledesc_match(setexpr->funcResultDesc, rsinfo->setDesc);
+			fs->tupdesc_checked = true;
 		}
-	}
 
-	/*
-	 * If function provided a tupdesc, cross-check it.  We only really need to
-	 * do this for functions returning RECORD, but might as well do it always.
-	 */
-	if (rsinfo.setDesc)
-	{
-		tupledesc_match(expectedDesc, rsinfo.setDesc);
+		/* Register cleanup callback if we didn't already */
+		if (!setexpr->shutdown_reg)
+		{
+			RegisterExprContextCallback(econtext,
+										ShutdownSetExpr,
+										PointerGetDatum(setexpr));
+			setexpr->shutdown_reg = true;
+		}
 
-		/*
-		 * If it is a dynamically-allocated TupleDesc, free it: it is
-		 * typically allocated in a per-query context, so we must avoid
-		 * leaking it across multiple usages.
-		 */
-		if (rsinfo.setDesc->tdrefcount == -1)
-			FreeTupleDesc(rsinfo.setDesc);
+		/* Now process from tuplestore, returning one value per call */
+		rsinfo->returnMode = SFRM_ValuePerCall;
+		
+		goto restart;
 	}
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+				 errmsg("unrecognized table-function returnMode: %d",
+						(int) rsinfo->returnMode)));
 
-	MemoryContextSwitchTo(callerContext);
+	*isDone = rsinfo->isDone;
 
-	/* All done, pass back the tuplestore */
-	return rsinfo.setResult;
+	/* All done, result is in the tupleslot */
 }
 
 
@@ -650,6 +773,7 @@ restart:
 		if (rsinfo.setResult != NULL)
 		{
 			/* prepare to return values from the tuplestore */
+			ExecPrepareFuncResultslot(fcache, rsinfo.setDesc);
 			ExecPrepareTuplestoreResult(fcache, econtext,
 										rsinfo.setResult,
 										rsinfo.setDesc);
@@ -712,6 +836,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
 	InitFunctionCallInfoData(*sexpr->fcinfo, &(sexpr->func),
 							 numargs,
 							 input_collation, NULL, NULL);
+	sexpr->fcinfo->resultinfo = NULL;
 
 	/* If function returns set, check if that's allowed by caller */
 	if (sexpr->func.fn_retset && !allowSRF)
@@ -835,21 +960,16 @@ ExecEvalFuncArgs(FunctionCallInfo fcinfo,
 }
 
 /*
- *		ExecPrepareTuplestoreResult
+ *		ExecPrepareFuncResultslot
  *
- * Subroutine for ExecMakeFunctionResultSet: prepare to extract rows from a
- * tuplestore function result.  We must set up a funcResultSlot (unless
- * already done in a previous call cycle) and verify that the function
- * returned the expected tuple descriptor.
+ * Subroutine for ExecMakeFunctionResultSet: in preparation to extract rows from a
+ * tuplestore function result, we must set up a funcResultSlot (unless
+ * already done in a previous call cycle).
  */
 static void
-ExecPrepareTuplestoreResult(SetExprState *sexpr,
-							ExprContext *econtext,
-							Tuplestorestate *resultStore,
+ExecPrepareFuncResultslot(SetExprState *sexpr,
 							TupleDesc resultDesc)
 {
-	sexpr->funcResultStore = resultStore;
-
 	if (sexpr->funcResultSlot == NULL)
 	{
 		/* Create a slot so we can read data out of the tuplestore */
@@ -882,6 +1002,23 @@ ExecPrepareTuplestoreResult(SetExprState *sexpr,
 														 &TTSOpsMinimalTuple);
 		MemoryContextSwitchTo(oldcontext);
 	}
+}
+
+/*
+ *		ExecPrepareTuplestoreResult
+ *
+ * Subroutine for ExecMakeFunctionResultSet: in preparation to extract rows from a
+ * tuplestore function result, we must verify that the function
+ * returned the expected tuple descriptor, and ensure we are called back to clean up
+ * at the end of the scan.
+ */
+static void
+ExecPrepareTuplestoreResult(SetExprState *sexpr,
+							ExprContext *econtext,
+							Tuplestorestate *resultStore,
+							TupleDesc resultDesc)
+{
+	sexpr->funcResultStore = resultStore;
 
 	/*
 	 * If function provided a tupdesc, cross-check it.  We only really need to
@@ -960,3 +1097,68 @@ tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
 							   i + 1)));
 	}
 }
+
+static void
+slot_puttuple_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off,
+					  TupleDesc resultdesc, Datum result)
+{
+	if (result != 0)
+	{
+		HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+
+		/*
+		 * heap_deform_tuple needs a HeapTuple not a bare
+		 * HeapTupleHeader, but it doesn't need all the fields.
+		 */
+		HeapTupleData tmptup;
+		tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
+		tmptup.t_data = td;
+
+		/* FIXME: seems we may be able to optimise the case where there is just one Function being scanned. Presently, this path causes the tuple to be read from disk, and it happens because we place results into a VirtualTupleSlot. In turn, this is needed because of the multiple-function ROWS FROM (...) case. In a single function case, we could perhaps simply pass on the returned TupleSlot, regardless of whether it has been read into memory. */
+		heap_deform_tuple (&tmptup, expectedDesc, &(scanslot->tts_values[scanslot_off]), &(scanslot->tts_isnull[scanslot_off]));
+	}
+	else
+	{
+		/* Ensure all result cols are initialised to NULL. */
+		for (int i = 0; i < expectedDesc->natts; i++)
+		{
+			scanslot->tts_values[scanslot_off + i] = (Datum) 0;
+			scanslot->tts_isnull[scanslot_off + i] = true;
+		}
+	}
+}
+
+static void
+slot_copyslots_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off,
+					   int natts, Datum *datums, bool *isnulls)
+{
+	int i;
+	for (i = 0; i < natts; i++)
+	{
+		if (i >= expectedDesc->natts)
+			break;
+		
+		scanslot->tts_values[scanslot_off + i] = datums[i];
+		scanslot->tts_isnull[scanslot_off + i] = isnulls[i];
+	}
+	
+	/* Ensure any remaining result cols are initialised to NULL. */
+	for (; i < expectedDesc->natts; i++)
+	{
+		scanslot->tts_values[scanslot_off + i] = (Datum) 0;
+		scanslot->tts_isnull[scanslot_off + i] = true;
+	}
+}
+
+static void
+slot_copyslot_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off,
+					  TupleDesc resultdesc, TupleTableSlot *result)
+{
+	slot_copyslots_offset (scanslot, expectedDesc, scanslot_off, resultdesc->natts, &(result->tts_values[0]), &(result->tts_isnull[0]));
+}
+
+static void
+slot_putscalar_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off, Datum result, bool isNull)
+{
+	slot_copyslots_offset (scanslot, expectedDesc, scanslot_off, 1, &result, &isNull);
+}
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index 0370f2e..d5472ac 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -30,19 +30,6 @@
 #include "utils/memutils.h"
 
 
-/*
- * Runtime data for each function being scanned.
- */
-typedef struct FunctionScanPerFuncState
-{
-	SetExprState *setexpr;		/* state of the expression being evaluated */
-	TupleDesc	tupdesc;		/* desc of the function result type */
-	int			colcount;		/* expected number of result columns */
-	Tuplestorestate *tstore;	/* holds the function result set */
-	int64		rowcount;		/* # of rows in result set, -1 if not known */
-	TupleTableSlot *func_slot;	/* function result slot (or NULL) */
-} FunctionScanPerFuncState;
-
 static TupleTableSlot *FunctionNext(FunctionScanState *node);
 
 
@@ -62,8 +49,9 @@ FunctionNext(FunctionScanState *node)
 	EState	   *estate;
 	ScanDirection direction;
 	TupleTableSlot *scanslot;
+	MemoryContext oldcontext;
+	ExprDoneCond		doneCond;
 	bool		alldone;
-	int64		oldpos;
 	int			funcno;
 	int			att;
 
@@ -74,59 +62,34 @@ FunctionNext(FunctionScanState *node)
 	direction = estate->es_direction;
 	scanslot = node->ss.ss_ScanTupleSlot;
 
-	if (node->simple)
-	{
-		/*
-		 * Fast path for the trivial case: the function return type and scan
-		 * result type are the same, so we fetch the function result straight
-		 * into the scan result slot. No need to update ordinality or
-		 * rowcounts either.
-		 */
-		Tuplestorestate *tstore = node->funcstates[0].tstore;
-
-		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
-		 */
-		if (tstore == NULL)
-		{
-			node->funcstates[0].tstore = tstore =
-				ExecMakeTableFunctionResult(node->funcstates[0].setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											node->funcstates[0].tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
-
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(tstore);
-		}
+	ExecClearTuple(scanslot);
 
-		/*
-		 * Get the next tuple from tuplestore.
-		 */
-		(void) tuplestore_gettupleslot(tstore,
-									   ScanDirectionIsForward(direction),
-									   false,
-									   scanslot);
-		return scanslot;
+	/* Call SRFs, as well as plain expressions, in per-tuple context */
+	oldcontext = MemoryContextSwitchTo(node->ss.ps.ps_ExprContext->ecxt_per_tuple_memory);
+	
+	/*
+	 * Check to see if we're still projecting out tuples from a previous scan
+	 * tuple (because there is a function-returning-set in the projection
+	 * expressions). If not, indicate we are finished now.
+	 */
+	if (!node->pending_srf_tuples)
+	{
+		alldone = true;
+		goto return_resultslot;
 	}
 
 	/*
-	 * Increment or decrement ordinal counter before checking for end-of-data,
-	 * so that we can move off either end of the result by 1 (and no more than
-	 * 1) without losing correct count.  See PortalRunSelect for why we can
+	 * Assume no further tuples are produced unless an ExprMultipleResult is
+	 * encountered from a set returning function.
+	 */
+	node->pending_srf_tuples = false;
+
+	/*
+	 * Increment ordinal counter before checking for end-of-data.
+	 * See PortalRunSelect for why we can
 	 * assume that we won't be called repeatedly in the end-of-data state.
 	 */
-	oldpos = node->ordinal;
-	if (ScanDirectionIsForward(direction))
-		node->ordinal++;
-	else
-		node->ordinal--;
+	node->ordinal++;
 
 	/*
 	 * Main loop over functions.
@@ -141,87 +104,29 @@ FunctionNext(FunctionScanState *node)
 	for (funcno = 0; funcno < node->nfuncs; funcno++)
 	{
 		FunctionScanPerFuncState *fs = &node->funcstates[funcno];
-		int			i;
 
 		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
+		 * Read a tuple from the function and put it in the scanslot.
 		 */
-		if (fs->tstore == NULL)
+		ExecMakeTableFunctionResult(fs,
+									node->ss.ps.ps_ExprContext,
+									node->argcontext,
+									scanslot,
+									att,
+									&doneCond);
+
+		if (doneCond != ExprEndResult)
 		{
-			fs->tstore =
-				ExecMakeTableFunctionResult(fs->setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											fs->tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
-
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(fs->tstore);
-		}
-
-		/*
-		 * Get the next tuple from tuplestore.
-		 *
-		 * If we have a rowcount for the function, and we know the previous
-		 * read position was out of bounds, don't try the read. This allows
-		 * backward scan to work when there are mixed row counts present.
-		 */
-		if (fs->rowcount != -1 && fs->rowcount < oldpos)
-			ExecClearTuple(fs->func_slot);
-		else
-			(void) tuplestore_gettupleslot(fs->tstore,
-										   ScanDirectionIsForward(direction),
-										   false,
-										   fs->func_slot);
-
-		if (TupIsNull(fs->func_slot))
-		{
-			/*
-			 * If we ran out of data for this function in the forward
-			 * direction then we now know how many rows it returned. We need
-			 * to know this in order to handle backwards scans. The row count
-			 * we store is actually 1+ the actual number, because we have to
-			 * position the tuplestore 1 off its end sometimes.
-			 */
-			if (ScanDirectionIsForward(direction) && fs->rowcount == -1)
-				fs->rowcount = node->ordinal;
-
-			/*
-			 * populate the result cols with nulls
-			 */
-			for (i = 0; i < fs->colcount; i++)
-			{
-				scanslot->tts_values[att] = (Datum) 0;
-				scanslot->tts_isnull[att] = true;
-				att++;
-			}
-		}
-		else
-		{
-			/*
-			 * we have a result, so just copy it to the result cols.
-			 */
-			slot_getallattrs(fs->func_slot);
-
-			for (i = 0; i < fs->colcount; i++)
-			{
-				scanslot->tts_values[att] = fs->func_slot->tts_values[i];
-				scanslot->tts_isnull[att] = fs->func_slot->tts_isnull[i];
-				att++;
-			}
-
 			/*
 			 * We're not done until every function result is exhausted; we pad
 			 * the shorter results with nulls until then.
 			 */
 			alldone = false;
 		}
+		if (doneCond == ExprMultipleResult)
+			node->pending_srf_tuples = true;
+
+		att += fs->colcount;
 	}
 
 	/*
@@ -233,6 +138,9 @@ FunctionNext(FunctionScanState *node)
 		scanslot->tts_isnull[att] = false;
 	}
 
+return_resultslot:
+	MemoryContextSwitchTo(oldcontext);
+	
 	/*
 	 * If alldone, we just return the previously-cleared scanslot.  Otherwise,
 	 * finish creating the virtual tuple.
@@ -353,14 +261,6 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 										scanstate->ss.ps.ps_ExprContext,
 										&scanstate->ss.ps);
 
-		/*
-		 * Don't allocate the tuplestores; the actual calls to the functions
-		 * do that.  NULL means that we have not called the function yet (or
-		 * need to call it again after a rescan).
-		 */
-		fs->tstore = NULL;
-		fs->rowcount = -1;
-
 		/*
 		 * Now determine if the function returns a simple or composite type,
 		 * and build an appropriate tupdesc.  Note that in the composite case,
@@ -371,6 +271,12 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 											&funcrettype,
 											&tupdesc);
 
+		/*
+		 * FIXME: we set funcReturnsTuple, but it is a slightly different
+		 * check to what type_is_rowtype() executes. Don't know if it is
+		 * a problem.
+		 */
+		
 		if (functypclass == TYPEFUNC_COMPOSITE ||
 			functypclass == TYPEFUNC_COMPOSITE_DOMAIN)
 		{
@@ -379,6 +285,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			Assert(tupdesc->natts >= colcount);
 			/* Must copy it out of typcache for safety */
 			tupdesc = CreateTupleDescCopy(tupdesc);
+			fs->setexpr->funcReturnsTuple = true;
 		}
 		else if (functypclass == TYPEFUNC_SCALAR)
 		{
@@ -393,6 +300,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			TupleDescInitEntryCollation(tupdesc,
 										(AttrNumber) 1,
 										exprCollation(funcexpr));
+			fs->setexpr->funcReturnsTuple = false;
 		}
 		else if (functypclass == TYPEFUNC_RECORD)
 		{
@@ -407,6 +315,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			 * case it doesn't.)
 			 */
 			BlessTupleDesc(tupdesc);
+			fs->setexpr->funcReturnsTuple = true;
 		}
 		else
 		{
@@ -414,21 +323,10 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			elog(ERROR, "function in FROM has unsupported return type");
 		}
 
-		fs->tupdesc = tupdesc;
+		fs->setexpr->funcResultDesc = tupdesc;
 		fs->colcount = colcount;
 
-		/*
-		 * We only need separate slots for the function results if we are
-		 * doing ordinality or multiple functions; otherwise, we'll fetch
-		 * function results directly into the scan slot.
-		 */
-		if (!scanstate->simple)
-		{
-			fs->func_slot = ExecInitExtraTupleSlot(estate, fs->tupdesc,
-												   &TTSOpsMinimalTuple);
-		}
-		else
-			fs->func_slot = NULL;
+		fs->tupdesc_checked = false;
 
 		natts += colcount;
 		i++;
@@ -443,7 +341,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	 */
 	if (scanstate->simple)
 	{
-		scan_tupdesc = CreateTupleDescCopy(scanstate->funcstates[0].tupdesc);
+		scan_tupdesc = CreateTupleDescCopy(scanstate->funcstates[0].setexpr->funcResultDesc);
 		scan_tupdesc->tdtypeid = RECORDOID;
 		scan_tupdesc->tdtypmod = -1;
 	}
@@ -458,7 +356,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 
 		for (i = 0; i < nfuncs; i++)
 		{
-			TupleDesc	tupdesc = scanstate->funcstates[i].tupdesc;
+			TupleDesc	tupdesc = scanstate->funcstates[i].setexpr->funcResultDesc;
 			int			colcount = scanstate->funcstates[i].colcount;
 			int			j;
 
@@ -497,6 +395,11 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	 */
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
+	
+	/*
+	 * Start out assuming there will be tuples returned.
+	 */
+	scanstate->pending_srf_tuples = true;
 
 	/*
 	 * Create a memory context that ExecMakeTableFunctionResult can use to
@@ -521,8 +424,6 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 void
 ExecEndFunctionScan(FunctionScanState *node)
 {
-	int			i;
-
 	/*
 	 * Free the exprcontext
 	 */
@@ -534,23 +435,6 @@ ExecEndFunctionScan(FunctionScanState *node)
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
-	/*
-	 * Release slots and tuplestore resources
-	 */
-	for (i = 0; i < node->nfuncs; i++)
-	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-
-		if (fs->tstore != NULL)
-		{
-			tuplestore_end(node->funcstates[i].tstore);
-			fs->tstore = NULL;
-		}
-	}
 }
 
 /* ----------------------------------------------------------------
@@ -568,13 +452,6 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	for (i = 0; i < node->nfuncs; i++)
-	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-	}
 
 	ExecScanReScan(&node->ss);
 
@@ -597,12 +474,11 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 			if (bms_overlap(chgparam, rtfunc->funcparams))
 			{
-				if (node->funcstates[i].tstore != NULL)
+				if (node->funcstates[i].setexpr->funcResultStore != NULL)
 				{
-					tuplestore_end(node->funcstates[i].tstore);
-					node->funcstates[i].tstore = NULL;
+					tuplestore_end(node->funcstates[i].setexpr->funcResultStore);
+					node->funcstates[i].setexpr->funcResultStore = NULL;
 				}
-				node->funcstates[i].rowcount = -1;
 			}
 			i++;
 		}
@@ -614,7 +490,16 @@ ExecReScanFunctionScan(FunctionScanState *node)
 	/* Make sure we rewind any remaining tuplestores */
 	for (i = 0; i < node->nfuncs; i++)
 	{
-		if (node->funcstates[i].tstore != NULL)
-			tuplestore_rescan(node->funcstates[i].tstore);
+		if (node->funcstates[i].setexpr->funcResultStore != NULL)
+			tuplestore_rescan (node->funcstates[i].setexpr->funcResultStore);
+
+		/* No matter what, we renew the ReturnSetInfo structure */
+		if (node->funcstates[i].setexpr->fcinfo != NULL)
+			node->funcstates[i].setexpr->fcinfo->resultinfo = NULL;
 	}
+	
+	/*
+	 * Start out assuming there will be tuples returned.
+	 */
+	node->pending_srf_tuples = true;
 }
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9be0b38..6d79394 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -405,11 +405,12 @@ extern bool ExecCheck(ExprState *state, ExprContext *context);
  */
 extern SetExprState *ExecInitTableFunctionResult(Expr *expr,
 												 ExprContext *econtext, PlanState *parent);
-extern Tuplestorestate *ExecMakeTableFunctionResult(SetExprState *setexpr,
+extern void ExecMakeTableFunctionResult(struct FunctionScanPerFuncState *fs,
 													ExprContext *econtext,
 													MemoryContext argContext,
-													TupleDesc expectedDesc,
-													bool randomAccess);
+													TupleTableSlot *scanslot,
+													AttrNumber scanslot_off,
+													ExprDoneCond *isDone);
 extern SetExprState *ExecInitFunctionResultSet(Expr *expr,
 											   ExprContext *econtext, PlanState *parent);
 extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
diff --git a/src/include/executor/nodeFunctionscan.h b/src/include/executor/nodeFunctionscan.h
index 4f7d60d..8fd572f 100644
--- a/src/include/executor/nodeFunctionscan.h
+++ b/src/include/executor/nodeFunctionscan.h
@@ -16,6 +16,16 @@
 
 #include "nodes/execnodes.h"
 
+/*
+ * Runtime data for each function being scanned.
+ */
+typedef struct FunctionScanPerFuncState
+{
+	SetExprState *setexpr;		/* state of the expression being evaluated */
+	int			colcount;		/* expected number of result columns */
+	bool		tupdesc_checked; /* has the return tupdesc been checked? */
+} FunctionScanPerFuncState;
+
 extern FunctionScanState *ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags);
 extern void ExecEndFunctionScan(FunctionScanState *node);
 extern void ExecReScanFunctionScan(FunctionScanState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9ac7bc1..0ff58fc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1652,6 +1652,7 @@ typedef struct SubqueryScanState
  *		funcstates			per-function execution states (private in
  *							nodeFunctionscan.c)
  *		argcontext			memory context to evaluate function arguments in
+ *		pending_srf_tuples	still evaluating any SRFs?
  * ----------------
  */
 struct FunctionScanPerFuncState;
@@ -1666,6 +1667,7 @@ typedef struct FunctionScanState
 	int			nfuncs;
 	struct FunctionScanPerFuncState *funcstates;	/* array of length nfuncs */
 	MemoryContext argcontext;
+	bool		pending_srf_tuples;
 } FunctionScanState;
 
 /* ----------------

#12

Pavel Stehule

pavel.stehule@gmail.com

about 6 years ago

In reply to: Dent John (#11)

Re: The flinfo->fn_extra question, from me this time.

On Sun, 3 Nov 2019 at 12:51, Dent John <denty@qqdd.eu> wrote:

(And here’s aforementioned attachment… doh.)

It would be nice if the patch had some regression tests; they are a good
reminder of what the patch is targeting.

Regards

Pavel

#13

Dent John

denty@QQdd.eu

about 6 years ago

In reply to: Pavel Stehule (#12)

Re: The flinfo->fn_extra question, from me this time.

On 3 Nov 2019, at 13:33, Pavel Stehule <pavel.stehule@gmail.com> wrote:

It would be nice if the patch had some regression tests; they are a good reminder of what the patch is targeting.

With a suitably small work_mem constraint, it is possible to show the absence of buffers resulting from the tuplestore. It’ll need some commentary explaining what is being looked for, and why. But it’s a good idea.

I’ll take a look.
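
As a rough illustration of what such a test would be looking for, here is a toy model in C. It is not PostgreSQL code and makes no claim about how the patch is implemented: `ToySrf`, `toy_srf_next`, `scan_materialized` and `scan_pipelined` are hypothetical names invented for this sketch. It contrasts a consumer that drains a value-per-call function into a buffer (the role the tuplestore plays) with one that pulls a row at a time and stops once a limit is met.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ValuePerCall SRF: yields 0 .. total-1. */
typedef struct ToySrf
{
	long		next;			/* next value to emit */
	long		total;			/* number of rows the function can produce */
	long		calls;			/* how many times the function was invoked */
} ToySrf;

/* Returns 1 and sets *value if a row was produced, 0 once exhausted. */
int
toy_srf_next(ToySrf *srf, long *value)
{
	srf->calls++;
	if (srf->next >= srf->total)
		return 0;
	*value = srf->next++;
	return 1;
}

/*
 * Materializing consumer: drain the whole function into a buffer first,
 * then hand back at most "limit" rows.  Returns the peak number of rows
 * buffered at once, or -1 if the buffer could not be allocated.
 */
long
scan_materialized(ToySrf *srf, long limit, long *rows_out)
{
	long	   *buf;
	long		nbuf = 0;
	long		value;

	assert(limit >= 0 && srf->total >= 0);

	buf = malloc(sizeof(long) * (size_t) (srf->total > 0 ? srf->total : 1));
	if (buf == NULL)
		return -1;

	while (toy_srf_next(srf, &value))
		buf[nbuf++] = value;

	*rows_out = (nbuf < limit) ? nbuf : limit;
	free(buf);
	return nbuf;
}

/*
 * Pipelined consumer: pull one row per call and stop as soon as the limit
 * is satisfied, so only the current row is ever held.  Returns the peak
 * number of rows held at once (0 if no row was produced).
 */
long
scan_pipelined(ToySrf *srf, long limit, long *rows_out)
{
	long		n = 0;
	long		value;

	assert(limit >= 0);

	while (n < limit && toy_srf_next(srf, &value))
		n++;

	*rows_out = n;
	return (n > 0) ? 1 : 0;
}
```

With a 1000-row function and a limit of 5, the materializing consumer calls the function until it is exhausted and buffers every row, while the pipelined one calls it only five times and never holds more than one row. A regression test under a small work_mem would be checking for the real-world analogue of that difference.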

denty.

#14

Dent John

denty@QQdd.eu

about 6 years ago

In reply to: Dent John (#13)

1 attachment(s)

Re: The flinfo->fn_extra question, from me this time.

On 3 Nov 2019, at 13:33, Pavel Stehule <pavel.stehule@gmail.com> wrote:

It would be nice if the patch had some regression tests; they are a good reminder of what the patch is targeting.

I’ve updated the patch, and added some regression tests.

denty.

Attachments:

pipeline-functionscan-v3.patch (application/octet-stream; name=pipeline-functionscan-v3.patch; x-unix-mode=0644)

diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 1f18e5d..f3205e4 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -554,7 +554,6 @@ ExecSupportsBackwardScan(Plan *node)
 
 		case T_SeqScan:
 		case T_TidScan:
-		case T_FunctionScan:
 		case T_ValuesScan:
 		case T_CteScan:
 		case T_Material:
@@ -613,7 +612,6 @@ ExecMaterializesOutput(NodeTag plantype)
 	switch (plantype)
 	{
 		case T_Material:
-		case T_FunctionScan:
 		case T_TableFuncScan:
 		case T_CteScan:
 		case T_NamedTuplestoreScan:
diff --git a/src/backend/executor/execSRF.c b/src/backend/executor/execSRF.c
index c8a3efc..1037909 100644
--- a/src/backend/executor/execSRF.c
+++ b/src/backend/executor/execSRF.c
@@ -21,6 +21,7 @@
 #include "access/htup_details.h"
 #include "catalog/objectaccess.h"
 #include "executor/execdebug.h"
+#include "executor/nodeFunctionscan.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
@@ -45,6 +46,23 @@ static void ExecPrepareTuplestoreResult(SetExprState *sexpr,
 										Tuplestorestate *resultStore,
 										TupleDesc resultDesc);
 static void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
+static void ExecPrepareFuncResultslot(SetExprState *sexpr,
+										TupleDesc resultDesc);
+static void slot_puttuple_offset (TupleTableSlot *scanslot,
+								  TupleDesc expectedDesc,
+								  AttrNumber scanslot_off,
+								  TupleDesc resultdesc,
+								  Datum result);
+static void slot_copyslot_offset (TupleTableSlot *scanslot,
+								  TupleDesc expectedDesc,
+								  AttrNumber scanslot_off,
+								  TupleDesc resultdesc,
+								  TupleTableSlot *result);
+static void slot_putscalar_offset (TupleTableSlot *scanslot,
+								   TupleDesc expectedDesc,
+								   AttrNumber scanslot_off,
+								   Datum result,
+								   bool isNull);
 
 
 /*
@@ -89,39 +107,134 @@ ExecInitTableFunctionResult(Expr *expr,
 	return state;
 }
 
+static void
+ExecFetchFromTableFunctionTuplestore(SetExprState *setexpr,
+									 TupleDesc expectedDesc,
+									 TupleTableSlot *resultslot,
+									 AttrNumber scanslot_off,
+									 ExprDoneCond *isDone)
+{
+	MemoryContext oldContext;
+	bool		foundTup;
+	
+	/*
+	 * Have to make sure tuple in slot lives long enough, otherwise
+	 * clearing the slot could end up trying to free something already
+	 * freed.
+	 */
+	oldContext = MemoryContextSwitchTo(resultslot->tts_mcxt);
+	foundTup = tuplestore_gettupleslot(setexpr->funcResultStore, true, false,
+									   setexpr->funcResultSlot);
+	MemoryContextSwitchTo(oldContext);
+	
+	if (foundTup)
+	{
+		*isDone = ExprMultipleResult;
+		
+		if (setexpr->funcReturnsTuple)
+		{
+			/* We must expand the whole tuple. */
+			slot_getallattrs(setexpr->funcResultSlot);
+			
+			/*
+			 * Copy it to the result cols.
+			 */
+			slot_copyslot_offset (resultslot, expectedDesc, scanslot_off, setexpr->funcResultSlot->tts_tupleDescriptor, setexpr->funcResultSlot);
+		}
+		else
+		{
+			bool		isNull = false;
+			
+			/* Extract the first column and return it as a scalar. */
+			Datum result = slot_getattr(setexpr->funcResultSlot, 1, &isNull);
+			
+			slot_putscalar_offset (resultslot, expectedDesc, scanslot_off, result, isNull);
+		}
+	}
+	else
+	{
+		/* Exhausted the tuplestore, so clean up */
+		tuplestore_end(setexpr->funcResultStore);
+		setexpr->funcResultStore = NULL;
+		
+		/* We must store a row of NULLs in case we are used in ROWS FROM */
+		slot_puttuple_offset (resultslot, setexpr->funcResultDesc, scanslot_off, NULL, 0);
+
+		*isDone = ExprEndResult;
+	}
+}
+
 /*
  *		ExecMakeTableFunctionResult
  *
- * Evaluate a table function, producing a materialized result in a Tuplestore
- * object.
+ * Evaluate a table function, storing a single row in scanslot starting at
+ * attribute scanslot_off.
  *
  * This is used by nodeFunctionscan.c.
  */
-Tuplestorestate *
-ExecMakeTableFunctionResult(SetExprState *setexpr,
+void
+ExecMakeTableFunctionResult(FunctionScanPerFuncState *fs,
 							ExprContext *econtext,
 							MemoryContext argContext,
-							TupleDesc expectedDesc,
-							bool randomAccess)
+							TupleTableSlot *resultslot,
+							AttrNumber scanslot_off,
+							ExprDoneCond *isDone)
 {
-	Tuplestorestate *tupstore = NULL;
-	TupleDesc	tupdesc = NULL;
-	Oid			funcrettype;
-	bool		returnsTuple;
-	bool		returnsSet = false;
-	FunctionCallInfo fcinfo;
+	SetExprState *setexpr = fs->setexpr;
+	bool		call_fn = true; /* whether to actually call the SRF */
+	bool		already_stored = false; /* has the result been stored? */
+	FunctionCallInfo fcinfo = setexpr->fcinfo;
 	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
-	HeapTupleData tmptup;
-	MemoryContext callerContext;
+	ReturnSetInfo *rsinfo;
 	MemoryContext oldcontext;
-	bool		first_time = true;
+	Datum		result = 0;
+
+restart:
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+	
+	/*
+	 * If a previous call of the function returned a set result in the form of
+	 * a tuplestore, continue reading rows from the tuplestore until it's
+	 * empty.
+	 */
+	if (setexpr->funcResultStore)
+	{
+		rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; /* always set if funcResultStore is set */
+
+		ExecFetchFromTableFunctionTuplestore(setexpr, setexpr->funcResultDesc, resultslot, scanslot_off, &rsinfo->isDone);
+
+		/*
+		 * We are done here: fall through below with isDone as either
+		 * ExprMultipleResult or ExprEndResult.
+		 */
+
+		already_stored = true;
+		call_fn = false;
+	}
+	/*
+	 * The elidedFuncState case isn't related to the SFRM_Materialize/
+	 * FetchFromTuplestore decision, except that it cannot occur in that
+	 * case, so we code it as if/elseif, rather than if/if.
+	 */
+	else if (setexpr->elidedFuncState)
+	{
 
-	callerContext = CurrentMemoryContext;
+		/* For SRFs, fcinfo would have been allocated by init_sexpr(). */
+		if (fcinfo == NULL)
+		{
+			oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+			
+			/* By performing InitFunctionCallInfoData here, we avoid palloc0() */
+			setexpr->fcinfo = fcinfo = palloc(SizeForFunctionCallInfo(list_length(setexpr->args)));
 
-	funcrettype = exprType((Node *) setexpr->expr);
+			MemoryContextSwitchTo(oldcontext);
 
-	returnsTuple = type_is_rowtype(funcrettype);
+			/* Treat setexpr as a generic expression */
+			InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL);
+		}
+	}
 
 	/*
 	 * Prepare a resultinfo node for communication.  We always do this even if
@@ -130,18 +243,50 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 	 * resultinfo, but set it up anyway because we use some of the fields as
 	 * our own state variables.
 	 */
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = expectedDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize | SFRM_Materialize_Preferred);
-	if (randomAccess)
-		rsinfo.allowedModes |= (int) SFRM_Materialize_Random;
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
+	rsinfo = (ReturnSetInfo *) setexpr->fcinfo->resultinfo;
+	
+	if (rsinfo == NULL)
+	{
+		oldcontext = MemoryContextSwitchTo(argContext);
+
+		rsinfo = makeNode (ReturnSetInfo);
+		rsinfo->econtext = econtext;
+		rsinfo->expectedDesc = setexpr->funcResultDesc;
+		rsinfo->allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
+		rsinfo->returnMode = SFRM_ValuePerCall;
+		fcinfo->resultinfo = (Node *) rsinfo;
 
-	fcinfo = palloc(SizeForFunctionCallInfo(list_length(setexpr->args)));
+		MemoryContextSwitchTo(oldcontext);
+	}
+	else
+	{
+		/*
+		 * If rsinfo was already present, it means we're being asked
+		 * to continue projecting. In turn, if last time we projected
+		 * a SingleResult, then all future calls should be handled as
+		 * if it was the last row from an SRF.
+		 *
+		 * Note: this is different from the ProjectSet case, which
+		 * instead re-invokes the non-SRF function for each row.
+		 */
+		if (rsinfo->isDone == ExprSingleResult)
+			rsinfo->isDone = ExprEndResult;
+	}
+
+	/*
+	 * If we're asked to continue projecting output rows despite the SRF
+	 * being exhausted (indicated by isDone being already set to ExprEndResult),
+	 * return NULLs forever.
+	 */
+	if (rsinfo->isDone == ExprEndResult && !already_stored)
+	{
+		call_fn = false; /* don't invoke the function again */
+		rsinfo->returnMode = SFRM_ValuePerCall; /* returning a row at a time */
+		result = 0; /* result and NULL is set later */
+		fcinfo->isnull = true;
+
+		/* the actual result is written later */
+	}
 
 	/*
 	 * Normally the passed expression tree will be a SetExprState, since the
@@ -153,37 +298,46 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 	 * don't get a chance to pass a special ReturnSetInfo to any functions
 	 * buried in the expression.
 	 */
-	if (!setexpr->elidedFuncState)
+	if (call_fn && !setexpr->elidedFuncState)
 	{
 		/*
 		 * This path is similar to ExecMakeFunctionResultSet.
 		 */
-		returnsSet = setexpr->funcReturnsSet;
 		InitFunctionCallInfoData(*fcinfo, &(setexpr->func),
 								 list_length(setexpr->args),
 								 setexpr->fcinfo->fncollation,
-								 NULL, (Node *) &rsinfo);
+								 NULL, (Node *) rsinfo);
 
 		/*
 		 * Evaluate the function's argument list.
 		 *
-		 * We can't do this in the per-tuple context: the argument values
-		 * would disappear when we reset that context in the inner loop.  And
-		 * the caller's CurrentMemoryContext is typically a query-lifespan
-		 * context, so we don't want to leak memory there.  We require the
-		 * caller to pass a separate memory context that can be used for this,
-		 * and can be reset each time through to avoid bloat.
+		 * arguments is a list of expressions to evaluate before passing to the
+		 * function manager.  We skip the evaluation if it was already done in the
+		 * previous call (ie, we are continuing the evaluation of a set-valued
+		 * function).  Otherwise, collect the current argument values into fcinfo.
+		 *
+		 * The arguments have to live in a context that lives at least until all
+		 * rows from this SRF have been returned, otherwise ValuePerCall SRFs
+		 * would reference freed memory after the first returned row.
 		 */
-		MemoryContextReset(argContext);
-		oldcontext = MemoryContextSwitchTo(argContext);
-		ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
-		MemoryContextSwitchTo(oldcontext);
+		if (!setexpr->setArgsValid)
+		{
+			oldcontext = MemoryContextSwitchTo(argContext);
+			ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
+			MemoryContextSwitchTo(oldcontext);
+		}
+		else
+		{
+			/* Reset flag (we may set it again below) */
+			setexpr->setArgsValid = false;
+		}
 
 		/*
 		 * If function is strict, and there are any NULL arguments, skip
 		 * calling the function and act like it returned NULL (or an empty
 		 * set, in the returns-set case).
 		 */
+		call_fn = true;
 		if (setexpr->func.fn_strict)
 		{
 			int			i;
@@ -191,100 +345,90 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 			for (i = 0; i < fcinfo->nargs; i++)
 			{
 				if (fcinfo->args[i].isnull)
-					goto no_function_result;
+				{
+					call_fn = false;
+
+					result = 0;
+					fcinfo->isnull = true;
+					rsinfo->isDone = ExprEndResult;
+
+					break;
+				}
 			}
 		}
 	}
-	else
-	{
-		/* Treat setexpr as a generic expression */
-		InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL);
-	}
-
-	/*
-	 * Switch to short-lived context for calling the function or expression.
-	 */
-	MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
 
-	/*
-	 * Loop to handle the ValuePerCall protocol (which is also the same
-	 * behavior needed in the generic ExecEvalExpr path).
-	 */
-	for (;;)
+	/* Call the function or expression one time */
+	if (call_fn)
 	{
-		Datum		result;
-
-		CHECK_FOR_INTERRUPTS();
-
-		/*
-		 * reset per-tuple memory context before each call of the function or
-		 * expression. This cleans up any local memory the function may leak
-		 * when called.
-		 */
-		ResetExprContext(econtext);
-
-		/* Call the function or expression one time */
 		if (!setexpr->elidedFuncState)
 		{
 			pgstat_init_function_usage(fcinfo, &fcusage);
 
 			fcinfo->isnull = false;
-			rsinfo.isDone = ExprSingleResult;
+			rsinfo->isDone = ExprSingleResult;
 			result = FunctionCallInvoke(fcinfo);
 
 			pgstat_end_function_usage(&fcusage,
-									  rsinfo.isDone != ExprMultipleResult);
+									  rsinfo->isDone != ExprMultipleResult);
 		}
 		else
 		{
 			result =
 				ExecEvalExpr(setexpr->elidedFuncState, econtext, &fcinfo->isnull);
-			rsinfo.isDone = ExprSingleResult;
+			rsinfo->isDone = ExprSingleResult;
 		}
+	}
+	
+	/* Which protocol does function want to use? */
+	if (rsinfo->returnMode == SFRM_ValuePerCall)
+	{
+		HeapTupleHeader td = NULL;
 
-		/* Which protocol does function want to use? */
-		if (rsinfo.returnMode == SFRM_ValuePerCall)
+		if (rsinfo->isDone != ExprEndResult)
 		{
 			/*
-			 * Check for end of result set.
+			 * Save the current argument values to re-use on the next call.
 			 */
-			if (rsinfo.isDone == ExprEndResult)
-				break;
+			if (rsinfo->isDone == ExprMultipleResult)
+			{
+				setexpr->setArgsValid = true;
+				/* Register cleanup callback if we didn't already */
+				if (!setexpr->shutdown_reg)
+				{
+					RegisterExprContextCallback(econtext,
+												ShutdownSetExpr,
+												PointerGetDatum(setexpr));
+					setexpr->shutdown_reg = true;
+				}
+			}
 
 			/*
-			 * If first time through, build tuplestore for result.  For a
-			 * scalar function result type, also make a suitable tupdesc.
+			 * Obtain a suitable tupdesc, when we first encounter a non-NULL result.
 			 */
-			if (first_time)
+			if (rsinfo->setDesc == NULL)
 			{
-				oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-				tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-				rsinfo.setResult = tupstore;
-				if (!returnsTuple)
+				if (!setexpr->funcReturnsTuple)
 				{
-					tupdesc = CreateTemplateTupleDesc(1);
-					TupleDescInitEntry(tupdesc,
+					/*
+					 * Make a copy for the query.
+					 */
+					oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+					rsinfo->setDesc = CreateTemplateTupleDesc(1);
+					TupleDescInitEntry(rsinfo->setDesc,
 									   (AttrNumber) 1,
 									   "column",
-									   funcrettype,
+									   exprType((Node *) setexpr->expr),
 									   -1,
 									   0);
-					rsinfo.setDesc = tupdesc;
+					MemoryContextSwitchTo(oldcontext);
 				}
-				MemoryContextSwitchTo(oldcontext);
-			}
-
-			/*
-			 * Store current resultset item.
-			 */
-			if (returnsTuple)
-			{
-				if (!fcinfo->isnull)
+				else if (!fcinfo->isnull)
 				{
-					HeapTupleHeader td = DatumGetHeapTupleHeader(result);
-
-					if (tupdesc == NULL)
+					if (result != 0)
 					{
+						td = DatumGetHeapTupleHeader(result);
+
 						/*
 						 * This is the first non-NULL result from the
 						 * function.  Use the type info embedded in the
@@ -292,32 +436,45 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 						 * a copy for the query.
 						 */
 						oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-						tupdesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
+						rsinfo->setDesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
 															  HeapTupleHeaderGetTypMod(td));
-						rsinfo.setDesc = tupdesc;
 						MemoryContextSwitchTo(oldcontext);
 					}
-					else
-					{
-						/*
-						 * Verify all later returned rows have same subtype;
-						 * necessary in case the type is RECORD.
-						 */
-						if (HeapTupleHeaderGetTypeId(td) != tupdesc->tdtypeid ||
-							HeapTupleHeaderGetTypMod(td) != tupdesc->tdtypmod)
-							ereport(ERROR,
-									(errcode(ERRCODE_DATATYPE_MISMATCH),
-									 errmsg("rows returned by function are not all of the same row type")));
-					}
+				}
+			}
+
+			/* If we obtained a tupdesc, check it is appropriate */
+			if (rsinfo->setDesc && setexpr->funcResultDesc &&
+				!fs->tupdesc_checked)
+			{
+				tupledesc_match (setexpr->funcResultDesc, rsinfo->setDesc);
+				fs->tupdesc_checked = true;
+			}
+		}
 
+		if (!already_stored)
+		{
+			/*
+			 * Store current resultset item.
+			 */
+			if (setexpr->funcReturnsTuple)
+			{
+				if (!fcinfo->isnull)
+				{
+					if (td == NULL)
+						td = DatumGetHeapTupleHeader(result);
+					
 					/*
-					 * tuplestore_puttuple needs a HeapTuple not a bare
-					 * HeapTupleHeader, but it doesn't need all the fields.
+					 * Verify all later returned rows have same subtype;
+					 * necessary in case the type is RECORD.
 					 */
-					tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
-					tmptup.t_data = td;
+					if (HeapTupleHeaderGetTypeId(td) != rsinfo->setDesc->tdtypeid ||
+						HeapTupleHeaderGetTypMod(td) != rsinfo->setDesc->tdtypmod)
+						ereport(ERROR,
+								(errcode(ERRCODE_DATATYPE_MISMATCH),
+								 errmsg("rows returned by function are not all of the same row type")));
 
-					tuplestore_puttuple(tupstore, &tmptup);
+					slot_puttuple_offset (resultslot, setexpr->funcResultDesc, scanslot_off, rsinfo->setDesc, result);
 				}
 				else
 				{
@@ -329,91 +486,57 @@ ExecMakeTableFunctionResult(SetExprState *setexpr,
 					 * the provided descriptor, since that might not match
 					 * what we get from the function itself.  But it doesn't.)
 					 */
-					int			natts = expectedDesc->natts;
-					bool	   *nullflags;
-
-					nullflags = (bool *) palloc(natts * sizeof(bool));
-					memset(nullflags, true, natts * sizeof(bool));
-					tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
+					slot_puttuple_offset (resultslot, setexpr->funcResultDesc, scanslot_off, rsinfo->setDesc, 0);
 				}
 			}
 			else
 			{
 				/* Scalar-type case: just store the function result */
-				tuplestore_putvalues(tupstore, tupdesc, &result, &fcinfo->isnull);
+				slot_putscalar_offset (resultslot, setexpr->funcResultDesc, scanslot_off, result, fcinfo->isnull);
 			}
-
-			/*
-			 * Are we done?
-			 */
-			if (rsinfo.isDone != ExprMultipleResult)
-				break;
 		}
-		else if (rsinfo.returnMode == SFRM_Materialize)
-		{
-			/* check we're on the same page as the function author */
-			if (!first_time || rsinfo.isDone != ExprSingleResult)
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-						 errmsg("table-function protocol for materialize mode was not followed")));
-			/* Done evaluating the set result */
-			break;
-		}
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-					 errmsg("unrecognized table-function returnMode: %d",
-							(int) rsinfo.returnMode)));
-
-		first_time = false;
 	}
-
-no_function_result:
-
-	/*
-	 * If we got nothing from the function (ie, an empty-set or NULL result),
-	 * we have to create the tuplestore to return, and if it's a
-	 * non-set-returning function then insert a single all-nulls row.  As
-	 * above, we depend on the expectedDesc to manufacture the dummy row.
-	 */
-	if (rsinfo.setResult == NULL)
+	else if (rsinfo->returnMode == SFRM_Materialize)
 	{
-		MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-		tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-		rsinfo.setResult = tupstore;
-		if (!returnsSet)
+		/* check we're on the same page as the function author */
+		if (rsinfo->isDone != ExprSingleResult)
+			ereport(ERROR,
+					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+					 errmsg("table-function protocol for materialize mode was not followed")));
+		/* prepare to return values from the tuplestore */
+		ExecPrepareFuncResultslot(setexpr, rsinfo->setDesc);
+		
+		setexpr->funcResultStore = rsinfo->setResult;
+		
+		if (rsinfo->setDesc && setexpr->funcResultDesc)
 		{
-			int			natts = expectedDesc->natts;
-			bool	   *nullflags;
-
-			MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-			nullflags = (bool *) palloc(natts * sizeof(bool));
-			memset(nullflags, true, natts * sizeof(bool));
-			tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
+			tupledesc_match(setexpr->funcResultDesc, rsinfo->setDesc);
+			fs->tupdesc_checked = true;
 		}
-	}
 
-	/*
-	 * If function provided a tupdesc, cross-check it.  We only really need to
-	 * do this for functions returning RECORD, but might as well do it always.
-	 */
-	if (rsinfo.setDesc)
-	{
-		tupledesc_match(expectedDesc, rsinfo.setDesc);
+		/* Register cleanup callback if we didn't already */
+		if (!setexpr->shutdown_reg)
+		{
+			RegisterExprContextCallback(econtext,
+										ShutdownSetExpr,
+										PointerGetDatum(setexpr));
+			setexpr->shutdown_reg = true;
+		}
 
-		/*
-		 * If it is a dynamically-allocated TupleDesc, free it: it is
-		 * typically allocated in a per-query context, so we must avoid
-		 * leaking it across multiple usages.
-		 */
-		if (rsinfo.setDesc->tdrefcount == -1)
-			FreeTupleDesc(rsinfo.setDesc);
+		/* Now process from tuplestore, returning one value per call */
+		rsinfo->returnMode = SFRM_ValuePerCall;
+		
+		goto restart;
 	}
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+				 errmsg("unrecognized table-function returnMode: %d",
+						(int) rsinfo->returnMode)));
 
-	MemoryContextSwitchTo(callerContext);
+	*isDone = rsinfo->isDone;
 
-	/* All done, pass back the tuplestore */
-	return rsinfo.setResult;
+	/* All done, result is in the tupleslot */
 }
 
 
@@ -650,6 +773,7 @@ restart:
 		if (rsinfo.setResult != NULL)
 		{
 			/* prepare to return values from the tuplestore */
+			ExecPrepareFuncResultslot(fcache, rsinfo.setDesc);
 			ExecPrepareTuplestoreResult(fcache, econtext,
 										rsinfo.setResult,
 										rsinfo.setDesc);
@@ -712,6 +836,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
 	InitFunctionCallInfoData(*sexpr->fcinfo, &(sexpr->func),
 							 numargs,
 							 input_collation, NULL, NULL);
+	sexpr->fcinfo->resultinfo = NULL;
 
 	/* If function returns set, check if that's allowed by caller */
 	if (sexpr->func.fn_retset && !allowSRF)
@@ -835,21 +960,16 @@ ExecEvalFuncArgs(FunctionCallInfo fcinfo,
 }
 
 /*
- *		ExecPrepareTuplestoreResult
+ *		ExecPrepareFuncResultslot
  *
- * Subroutine for ExecMakeFunctionResultSet: prepare to extract rows from a
- * tuplestore function result.  We must set up a funcResultSlot (unless
- * already done in a previous call cycle) and verify that the function
- * returned the expected tuple descriptor.
+ * Subroutine for ExecMakeFunctionResultSet: in preparation to extract rows from a
+ * tuplestore function result, we must set up a funcResultSlot (unless
+ * already done in a previous call cycle).
  */
 static void
-ExecPrepareTuplestoreResult(SetExprState *sexpr,
-							ExprContext *econtext,
-							Tuplestorestate *resultStore,
+ExecPrepareFuncResultslot(SetExprState *sexpr,
 							TupleDesc resultDesc)
 {
-	sexpr->funcResultStore = resultStore;
-
 	if (sexpr->funcResultSlot == NULL)
 	{
 		/* Create a slot so we can read data out of the tuplestore */
@@ -882,6 +1002,23 @@ ExecPrepareTuplestoreResult(SetExprState *sexpr,
 														 &TTSOpsMinimalTuple);
 		MemoryContextSwitchTo(oldcontext);
 	}
+}
+
+/*
+ *		ExecPrepareTuplestoreResult
+ *
+ * Subroutine for ExecMakeFunctionResultSet: in preparation to extract rows from a
+ * tuplestore function result, we must verify that the function
+ * returned the expected tuple descriptor, and ensure we are called back to clean up
+ * at the end of the scan.
+ */
+static void
+ExecPrepareTuplestoreResult(SetExprState *sexpr,
+							ExprContext *econtext,
+							Tuplestorestate *resultStore,
+							TupleDesc resultDesc)
+{
+	sexpr->funcResultStore = resultStore;
 
 	/*
 	 * If function provided a tupdesc, cross-check it.  We only really need to
@@ -960,3 +1097,68 @@ tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
 							   i + 1)));
 	}
 }
+
+static void
+slot_puttuple_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off,
+					  TupleDesc resultdesc, Datum result)
+{
+	if (result != 0)
+	{
+		HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+
+		/*
+		 * heap_deform_tuple needs a HeapTuple not a bare
+		 * HeapTupleHeader, but it doesn't need all the fields.
+		 */
+		HeapTupleData tmptup;
+		tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
+		tmptup.t_data = td;
+
+		/* FIXME: seems we may be able to optimise the case where there is just one Function being scanned. Presently, this path causes the tuple to be read from disk, and it happens because we place results into a VirtualTupleSlot. In turn, this is needed because of the multiple-function ROWS FROM (...) case. In a single function case, we could perhaps simply pass on the returned TupleSlot, regardless of whether it has been read into memory. */
+		heap_deform_tuple (&tmptup, expectedDesc, &(scanslot->tts_values[scanslot_off]), &(scanslot->tts_isnull[scanslot_off]));
+	}
+	else
+	{
+		/* Ensure any remaining result cols are initialised to NULL. */
+		for (int i = 0; i < expectedDesc->natts; i++)
+		{
+			scanslot->tts_values[scanslot_off + i] = (Datum) 0;
+			scanslot->tts_isnull[scanslot_off + i] = true;
+		}
+	}
+}
+
+static void
+slot_copyslots_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off,
+					   int natts, Datum *datums, bool *isnulls)
+{
+	int i;
+	for (i = 0; i < natts; i++)
+	{
+		if (i >= expectedDesc->natts)
+			break;
+		
+		scanslot->tts_values[scanslot_off + i] = datums[i];
+		scanslot->tts_isnull[scanslot_off + i] = isnulls[i];
+	}
+	
+	/* Ensure any remaining result cols are initialised to NULL. */
+	for (; i < expectedDesc->natts; i++)
+	{
+		scanslot->tts_values[scanslot_off + i] = (Datum) 0;
+		scanslot->tts_isnull[scanslot_off + i] = true;
+	}
+}
+
+static void
+slot_copyslot_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off,
+					  TupleDesc resultdesc, TupleTableSlot *result)
+{
+	slot_copyslots_offset (scanslot, expectedDesc, scanslot_off, resultdesc->natts, &(result->tts_values[0]), &(result->tts_isnull[0]));
+}
+
+static void
+slot_putscalar_offset (TupleTableSlot *scanslot, TupleDesc expectedDesc, AttrNumber scanslot_off, Datum result, bool isNull)
+{
+	slot_copyslots_offset (scanslot, expectedDesc, scanslot_off, 1, &result, &isNull);
+}
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index 0370f2e..d5472ac 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -30,19 +30,6 @@
 #include "utils/memutils.h"
 
 
-/*
- * Runtime data for each function being scanned.
- */
-typedef struct FunctionScanPerFuncState
-{
-	SetExprState *setexpr;		/* state of the expression being evaluated */
-	TupleDesc	tupdesc;		/* desc of the function result type */
-	int			colcount;		/* expected number of result columns */
-	Tuplestorestate *tstore;	/* holds the function result set */
-	int64		rowcount;		/* # of rows in result set, -1 if not known */
-	TupleTableSlot *func_slot;	/* function result slot (or NULL) */
-} FunctionScanPerFuncState;
-
 static TupleTableSlot *FunctionNext(FunctionScanState *node);
 
 
@@ -62,8 +49,9 @@ FunctionNext(FunctionScanState *node)
 	EState	   *estate;
 	ScanDirection direction;
 	TupleTableSlot *scanslot;
+	MemoryContext oldcontext;
+	ExprDoneCond		doneCond;
 	bool		alldone;
-	int64		oldpos;
 	int			funcno;
 	int			att;
 
@@ -74,59 +62,34 @@ FunctionNext(FunctionScanState *node)
 	direction = estate->es_direction;
 	scanslot = node->ss.ss_ScanTupleSlot;
 
-	if (node->simple)
-	{
-		/*
-		 * Fast path for the trivial case: the function return type and scan
-		 * result type are the same, so we fetch the function result straight
-		 * into the scan result slot. No need to update ordinality or
-		 * rowcounts either.
-		 */
-		Tuplestorestate *tstore = node->funcstates[0].tstore;
-
-		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
-		 */
-		if (tstore == NULL)
-		{
-			node->funcstates[0].tstore = tstore =
-				ExecMakeTableFunctionResult(node->funcstates[0].setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											node->funcstates[0].tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
-
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(tstore);
-		}
+	ExecClearTuple(scanslot);
 
-		/*
-		 * Get the next tuple from tuplestore.
-		 */
-		(void) tuplestore_gettupleslot(tstore,
-									   ScanDirectionIsForward(direction),
-									   false,
-									   scanslot);
-		return scanslot;
+	/* Call SRFs, as well as plain expressions, in per-tuple context */
+	oldcontext = MemoryContextSwitchTo(node->ss.ps.ps_ExprContext->ecxt_per_tuple_memory);
+	
+	/*
+	 * Check to see if we're still projecting out tuples from a previous scan
+	 * tuple (because there is a function-returning-set in the projection
+	 * expressions). If not, indicate we are finished now.
+	 */
+	if (!node->pending_srf_tuples)
+	{
+		alldone = true;
+		goto return_resultslot;
 	}
 
 	/*
-	 * Increment or decrement ordinal counter before checking for end-of-data,
-	 * so that we can move off either end of the result by 1 (and no more than
-	 * 1) without losing correct count.  See PortalRunSelect for why we can
+	 * Assume no further tuples are produced unless an ExprMultipleResult is
+	 * encountered from a set returning function.
+	 */
+	node->pending_srf_tuples = false;
+
+	/*
+	 * Increment ordinal counter before checking for end-of-data.
+	 * See PortalRunSelect for why we can
 	 * assume that we won't be called repeatedly in the end-of-data state.
 	 */
-	oldpos = node->ordinal;
-	if (ScanDirectionIsForward(direction))
-		node->ordinal++;
-	else
-		node->ordinal--;
+	node->ordinal++;
 
 	/*
 	 * Main loop over functions.
@@ -141,87 +104,29 @@ FunctionNext(FunctionScanState *node)
 	for (funcno = 0; funcno < node->nfuncs; funcno++)
 	{
 		FunctionScanPerFuncState *fs = &node->funcstates[funcno];
-		int			i;
 
 		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
+		 * Read a tuple from the function and put it in the scanslot.
 		 */
-		if (fs->tstore == NULL)
+		ExecMakeTableFunctionResult(fs,
+									node->ss.ps.ps_ExprContext,
+									node->argcontext,
+									scanslot,
+									att,
+									&doneCond);
+
+		if (doneCond != ExprEndResult)
 		{
-			fs->tstore =
-				ExecMakeTableFunctionResult(fs->setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											fs->tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
-
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(fs->tstore);
-		}
-
-		/*
-		 * Get the next tuple from tuplestore.
-		 *
-		 * If we have a rowcount for the function, and we know the previous
-		 * read position was out of bounds, don't try the read. This allows
-		 * backward scan to work when there are mixed row counts present.
-		 */
-		if (fs->rowcount != -1 && fs->rowcount < oldpos)
-			ExecClearTuple(fs->func_slot);
-		else
-			(void) tuplestore_gettupleslot(fs->tstore,
-										   ScanDirectionIsForward(direction),
-										   false,
-										   fs->func_slot);
-
-		if (TupIsNull(fs->func_slot))
-		{
-			/*
-			 * If we ran out of data for this function in the forward
-			 * direction then we now know how many rows it returned. We need
-			 * to know this in order to handle backwards scans. The row count
-			 * we store is actually 1+ the actual number, because we have to
-			 * position the tuplestore 1 off its end sometimes.
-			 */
-			if (ScanDirectionIsForward(direction) && fs->rowcount == -1)
-				fs->rowcount = node->ordinal;
-
-			/*
-			 * populate the result cols with nulls
-			 */
-			for (i = 0; i < fs->colcount; i++)
-			{
-				scanslot->tts_values[att] = (Datum) 0;
-				scanslot->tts_isnull[att] = true;
-				att++;
-			}
-		}
-		else
-		{
-			/*
-			 * we have a result, so just copy it to the result cols.
-			 */
-			slot_getallattrs(fs->func_slot);
-
-			for (i = 0; i < fs->colcount; i++)
-			{
-				scanslot->tts_values[att] = fs->func_slot->tts_values[i];
-				scanslot->tts_isnull[att] = fs->func_slot->tts_isnull[i];
-				att++;
-			}
-
 			/*
 			 * We're not done until every function result is exhausted; we pad
 			 * the shorter results with nulls until then.
 			 */
 			alldone = false;
 		}
+		if (doneCond == ExprMultipleResult)
+			node->pending_srf_tuples = true;
+
+		att += fs->colcount;
 	}
 
 	/*
@@ -233,6 +138,9 @@ FunctionNext(FunctionScanState *node)
 		scanslot->tts_isnull[att] = false;
 	}
 
+return_resultslot:
+	MemoryContextSwitchTo(oldcontext);
+	
 	/*
 	 * If alldone, we just return the previously-cleared scanslot.  Otherwise,
 	 * finish creating the virtual tuple.
@@ -353,14 +261,6 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 										scanstate->ss.ps.ps_ExprContext,
 										&scanstate->ss.ps);
 
-		/*
-		 * Don't allocate the tuplestores; the actual calls to the functions
-		 * do that.  NULL means that we have not called the function yet (or
-		 * need to call it again after a rescan).
-		 */
-		fs->tstore = NULL;
-		fs->rowcount = -1;
-
 		/*
 		 * Now determine if the function returns a simple or composite type,
 		 * and build an appropriate tupdesc.  Note that in the composite case,
@@ -371,6 +271,12 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 											&funcrettype,
 											&tupdesc);
 
+		/*
+		 * FIXME: we set funcReturnsTuple, but it is a slightly different
+		 * check to what type_is_rowtype() executes. Don't know if it is
+		 * a problem.
+		 */
+		
 		if (functypclass == TYPEFUNC_COMPOSITE ||
 			functypclass == TYPEFUNC_COMPOSITE_DOMAIN)
 		{
@@ -379,6 +285,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			Assert(tupdesc->natts >= colcount);
 			/* Must copy it out of typcache for safety */
 			tupdesc = CreateTupleDescCopy(tupdesc);
+			fs->setexpr->funcReturnsTuple = true;
 		}
 		else if (functypclass == TYPEFUNC_SCALAR)
 		{
@@ -393,6 +300,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			TupleDescInitEntryCollation(tupdesc,
 										(AttrNumber) 1,
 										exprCollation(funcexpr));
+			fs->setexpr->funcReturnsTuple = false;
 		}
 		else if (functypclass == TYPEFUNC_RECORD)
 		{
@@ -407,6 +315,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			 * case it doesn't.)
 			 */
 			BlessTupleDesc(tupdesc);
+			fs->setexpr->funcReturnsTuple = true;
 		}
 		else
 		{
@@ -414,21 +323,10 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			elog(ERROR, "function in FROM has unsupported return type");
 		}
 
-		fs->tupdesc = tupdesc;
+		fs->setexpr->funcResultDesc = tupdesc;
 		fs->colcount = colcount;
 
-		/*
-		 * We only need separate slots for the function results if we are
-		 * doing ordinality or multiple functions; otherwise, we'll fetch
-		 * function results directly into the scan slot.
-		 */
-		if (!scanstate->simple)
-		{
-			fs->func_slot = ExecInitExtraTupleSlot(estate, fs->tupdesc,
-												   &TTSOpsMinimalTuple);
-		}
-		else
-			fs->func_slot = NULL;
+		fs->tupdesc_checked = false;
 
 		natts += colcount;
 		i++;
@@ -443,7 +341,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	 */
 	if (scanstate->simple)
 	{
-		scan_tupdesc = CreateTupleDescCopy(scanstate->funcstates[0].tupdesc);
+		scan_tupdesc = CreateTupleDescCopy(scanstate->funcstates[0].setexpr->funcResultDesc);
 		scan_tupdesc->tdtypeid = RECORDOID;
 		scan_tupdesc->tdtypmod = -1;
 	}
@@ -458,7 +356,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 
 		for (i = 0; i < nfuncs; i++)
 		{
-			TupleDesc	tupdesc = scanstate->funcstates[i].tupdesc;
+			TupleDesc	tupdesc = scanstate->funcstates[i].setexpr->funcResultDesc;
 			int			colcount = scanstate->funcstates[i].colcount;
 			int			j;
 
@@ -497,6 +395,11 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	 */
 	scanstate->ss.ps.qual =
 		ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);
+	
+	/*
+	 * Start out assuming there will be tuples returned.
+	 */
+	scanstate->pending_srf_tuples = true;
 
 	/*
 	 * Create a memory context that ExecMakeTableFunctionResult can use to
@@ -521,8 +424,6 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 void
 ExecEndFunctionScan(FunctionScanState *node)
 {
-	int			i;
-
 	/*
 	 * Free the exprcontext
 	 */
@@ -534,23 +435,6 @@ ExecEndFunctionScan(FunctionScanState *node)
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
-
-	/*
-	 * Release slots and tuplestore resources
-	 */
-	for (i = 0; i < node->nfuncs; i++)
-	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-
-		if (fs->tstore != NULL)
-		{
-			tuplestore_end(node->funcstates[i].tstore);
-			fs->tstore = NULL;
-		}
-	}
 }
 
 /* ----------------------------------------------------------------
@@ -568,13 +452,6 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	for (i = 0; i < node->nfuncs; i++)
-	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-	}
 
 	ExecScanReScan(&node->ss);
 
@@ -597,12 +474,11 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 			if (bms_overlap(chgparam, rtfunc->funcparams))
 			{
-				if (node->funcstates[i].tstore != NULL)
+				if (node->funcstates[i].setexpr->funcResultStore != NULL)
 				{
-					tuplestore_end(node->funcstates[i].tstore);
-					node->funcstates[i].tstore = NULL;
+					tuplestore_end(node->funcstates[i].setexpr->funcResultStore);
+					node->funcstates[i].setexpr->funcResultStore = NULL;
 				}
-				node->funcstates[i].rowcount = -1;
 			}
 			i++;
 		}
@@ -614,7 +490,16 @@ ExecReScanFunctionScan(FunctionScanState *node)
 	/* Make sure we rewind any remaining tuplestores */
 	for (i = 0; i < node->nfuncs; i++)
 	{
-		if (node->funcstates[i].tstore != NULL)
-			tuplestore_rescan(node->funcstates[i].tstore);
+		if (node->funcstates[i].setexpr->funcResultStore != NULL)
+			tuplestore_rescan (node->funcstates[i].setexpr->funcResultStore);
+
+		/* No matter what, we renew the ResultSetInfo structure */
+		if (node->funcstates[i].setexpr->fcinfo != NULL)
+			node->funcstates[i].setexpr->fcinfo->resultinfo = NULL;
 	}
+	
+	/*
+	 * Start out assuming there will be tuples returned.
+	 */
+	node->pending_srf_tuples = true;
 }
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9be0b38..6d79394 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -405,11 +405,12 @@ extern bool ExecCheck(ExprState *state, ExprContext *context);
  */
 extern SetExprState *ExecInitTableFunctionResult(Expr *expr,
 												 ExprContext *econtext, PlanState *parent);
-extern Tuplestorestate *ExecMakeTableFunctionResult(SetExprState *setexpr,
+extern void ExecMakeTableFunctionResult(struct FunctionScanPerFuncState *fs,
 													ExprContext *econtext,
 													MemoryContext argContext,
-													TupleDesc expectedDesc,
-													bool randomAccess);
+													TupleTableSlot *scanslot,
+													AttrNumber scanslot_off,
+													ExprDoneCond *isDone);
 extern SetExprState *ExecInitFunctionResultSet(Expr *expr,
 											   ExprContext *econtext, PlanState *parent);
 extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
diff --git a/src/include/executor/nodeFunctionscan.h b/src/include/executor/nodeFunctionscan.h
index 4f7d60d..8fd572f 100644
--- a/src/include/executor/nodeFunctionscan.h
+++ b/src/include/executor/nodeFunctionscan.h
@@ -16,6 +16,16 @@
 
 #include "nodes/execnodes.h"
 
+/*
+ * Runtime data for each function being scanned.
+ */
+typedef struct FunctionScanPerFuncState
+{
+	SetExprState *setexpr;		/* state of the expression being evaluated */
+	int			colcount;		/* expected number of result columns */
+	bool		tupdesc_checked; /* has the return tupdesc been checked? */
+} FunctionScanPerFuncState;
+
 extern FunctionScanState *ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags);
 extern void ExecEndFunctionScan(FunctionScanState *node);
 extern void ExecReScanFunctionScan(FunctionScanState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9ac7bc1..0ff58fc 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1652,6 +1652,7 @@ typedef struct SubqueryScanState
  *		funcstates			per-function execution states (private in
  *							nodeFunctionscan.c)
  *		argcontext			memory context to evaluate function arguments in
+ *		pending_srf_tuples	still evaluating any SRFs?
  * ----------------
  */
 struct FunctionScanPerFuncState;
@@ -1666,6 +1667,7 @@ typedef struct FunctionScanState
 	int			nfuncs;
 	struct FunctionScanPerFuncState *funcstates;	/* array of length nfuncs */
 	MemoryContext argcontext;
+	bool		pending_srf_tuples;
 } FunctionScanState;
 
 /* ----------------
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index 36a5929..6defb8a 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -2098,3 +2098,98 @@ select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;
 (0 rows)
 
 drop type rngfunc2;
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+create function rngfunc_vpc (n integer, out a text, out b text)
+  returns setof record
+  immutable
+  language sql
+  as $$ select 'foo ' || i, 'bar ' || i from generate_series(1,$1) i $$;
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+create function rngfunc_mat (n integer, out a text, out b text)
+  returns setof record
+  immutable
+  language plpgsql
+  as $$ begin return query select 'foo ' || i, 'bar ' || i from generate_series(1,$1) i; end; $$;
+-- A VPC SRF that runs to completion should spill buffers to disk.
+-- 
+-- To illustrate this, we set work_mem unreasonably low, and emit a
+-- sufficiently large amount of data to force buffers to be written out. A
+-- positive result is the presence of temporary buffers.
+--
+-- FIXME: note that current implementation ExecMakeTableFunctionResult() does not
+-- create a backing store, even if the SRF returns a large resut set.
+--
+set work_mem='64kB';
+explain (verbose, analyze, buffers, costs off, timing off, summary off) select * from rngfunc_vpc(100000) t;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Function Scan on pg_catalog.generate_series i (actual rows=100000 loops=1)
+   Output: ('foo '::text || (i.i)::text), ('bar '::text || (i.i)::text)
+   Function Call: generate_series(1, 100000)
+   Buffers: temp read=364 written=364
+(4 rows)
+
+reset work_mem;
+-- A VPC SRF that aborts early should not spill any buffers to disk.
+-- 
+-- To illustrate this, we set work_mem unreasonably low, and emit a
+-- sufficiently large set that it would, if run to completion, force
+-- buffers to be written out. A positive result is the absence of
+-- any temporary buffers.
+--
+set work_mem='64kB';
+explain (verbose, analyze, buffers, costs off, timing off, summary off) select * from rngfunc_vpc(100000) t limit 50;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Limit (actual rows=50 loops=1)
+   Output: (('foo '::text || (i.i)::text)), (('bar '::text || (i.i)::text))
+   ->  Function Scan on pg_catalog.generate_series i (actual rows=50 loops=1)
+         Output: ('foo '::text || (i.i)::text), ('bar '::text || (i.i)::text)
+         Function Call: generate_series(1, 100000)
+(5 rows)
+
+reset work_mem;
+-- A Marerialize SRF that runs to completion should spill buffers to disk.
+--
+-- To illustrate this, we set work_mem unreasonably low, and emit a
+-- sufficiently large amount of data to force buffers to be written out. A
+-- positive result is the presence of temporary buffers.
+--
+-- FIXME: the buffers value seems to bounce between being shared and not,
+-- giving an unstable plan, which means this test shows fail. We might
+-- have to work out a more reliable way of testing.
+-- 
+set work_mem='64kB';
+explain (verbose, analyze, buffers, costs off, timing off, summary off) select * from rngfunc_mat(100000) t;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Function Scan on public.rngfunc_mat t (actual rows=100000 loops=1)
+   Output: a, b
+   Function Call: rngfunc_mat(100000)
+   Buffers: temp read=364 written=364
+(4 rows)
+
+reset work_mem;
+-- A Marerialize SRF that aborts early should will still spill buffers to disk.
+--
+-- To illustrate this, we set work_mem unreasonably low, and emit a
+-- sufficiently large amount of data to force buffers to be written out. A
+-- positive result is the presence of temporary buffers.
+set work_mem='64kB';
+explain (verbose, analyze, buffers, costs off, timing off, summary off) select * from rngfunc_mat(100000) t limit 50;
+                              QUERY PLAN                              
+----------------------------------------------------------------------
+ Limit (actual rows=50 loops=1)
+   Output: a, b
+   Buffers: temp read=1 written=364
+   ->  Function Scan on public.rngfunc_mat t (actual rows=50 loops=1)
+         Output: a, b
+         Function Call: rngfunc_mat(100000)
+         Buffers: temp read=1 written=364
+(7 rows)
+
+reset work_mem;
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------
diff --git a/src/test/regress/sql/rangefuncs.sql b/src/test/regress/sql/rangefuncs.sql
index 5d29d2e..4ee3587 100644
--- a/src/test/regress/sql/rangefuncs.sql
+++ b/src/test/regress/sql/rangefuncs.sql
@@ -656,3 +656,70 @@ select *, row_to_json(u) from unnest(array[null::rngfunc2, (1,'foo')::rngfunc2,
 select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;
 
 drop type rngfunc2;
+
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+create function rngfunc_vpc (n integer, out a text, out b text)
+  returns setof record
+  immutable
+  language sql
+  as $$ select 'foo ' || i, 'bar ' || i from generate_series(1,$1) i $$;
+
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+create function rngfunc_mat (n integer, out a text, out b text)
+  returns setof record
+  immutable
+  language plpgsql
+  as $$ begin return query select 'foo ' || i, 'bar ' || i from generate_series(1,$1) i; end; $$;
+
+-- A VPC SRF that runs to completion should spill buffers to disk.
+-- 
+-- To illustrate this, we set work_mem unreasonably low, and emit a
+-- sufficiently large amount of data to force buffers to be written out. A
+-- positive result is the presence of temporary buffers.
+--
+-- FIXME: note that current implementation ExecMakeTableFunctionResult() does not
+-- create a backing store, even if the SRF returns a large resut set.
+--
+set work_mem='64kB';
+explain (verbose, analyze, buffers, costs off, timing off, summary off) select * from rngfunc_vpc(100000) t;
+reset work_mem;
+
+-- A VPC SRF that aborts early should not spill any buffers to disk.
+-- 
+-- To illustrate this, we set work_mem unreasonably low, and emit a
+-- sufficiently large set that it would, if run to completion, force
+-- buffers to be written out. A positive result is the absence of
+-- any temporary buffers.
+--
+set work_mem='64kB';
+explain (verbose, analyze, buffers, costs off, timing off, summary off) select * from rngfunc_vpc(100000) t limit 50;
+reset work_mem;
+
+-- A Marerialize SRF that runs to completion should spill buffers to disk.
+--
+-- To illustrate this, we set work_mem unreasonably low, and emit a
+-- sufficiently large amount of data to force buffers to be written out. A
+-- positive result is the presence of temporary buffers.
+--
+-- FIXME: the buffers value seems to bounce between being shared and not,
+-- giving an unstable plan, which means this test shows fail. We might
+-- have to work out a more reliable way of testing.
+-- 
+set work_mem='64kB';
+explain (verbose, analyze, buffers, costs off, timing off, summary off) select * from rngfunc_mat(100000) t;
+reset work_mem;
+
+-- A Marerialize SRF that aborts early should will still spill buffers to disk.
+--
+-- To illustrate this, we set work_mem unreasonably low, and emit a
+-- sufficiently large amount of data to force buffers to be written out. A
+-- positive result is the presence of temporary buffers.
+set work_mem='64kB';
+explain (verbose, analyze, buffers, costs off, timing off, summary off) select * from rngfunc_mat(100000) t limit 50;
+reset work_mem;
+
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------

#15

Dent John

denty@QQdd.eu

about 6 years ago

In reply to: Dent John (#14)

1 attachment(s)

Re: The flinfo->fn_extra question, from me this time.

Hi folks,

I’ve updated the patch, addressed the rescan issue, and restructured the tests.

I’ve taken a slightly different approach this time, re-using the (already pipeline-supporting) machinery of the Materialize node, and extending it to allow an SFRM_Materialize SRF to donate the tuplestore it returns. I feel this yields a better code structure, as well as getting more reuse.

It also opens up more informative and transparent EXPLAIN output. For example, the following shows Materialize explicitly, whereas previously a FunctionScan would have silently materialised the result of both generate_series() invocations.

postgres=# explain (analyze, costs off, timing off, summary off)
select * from generate_series(11,15) r, generate_series(11,14) s;
                            QUERY PLAN
------------------------------------------------------------------
 Nested Loop (actual rows=20 loops=1)
   ->  Function Scan on generate_series s (actual rows=4 loops=1)
         ->  SRF Scan (actual rows=4 loops=1)
               SFRM: ValuePerCall
   ->  Function Scan on generate_series r (actual rows=5 loops=4)
         ->  Materialize (actual rows=5 loops=4)
               ->  SRF Scan (actual rows=5 loops=1)
                     SFRM: ValuePerCall

I also thought again about when to materialise, and particularly about Robert’s suggestion [1] (which is also in this thread, but whose implication I didn’t originally understand). If I’m not wrong, between occasional explicit use of a Materialize node by the planner, and more careful observation of EXEC_FLAG_REWIND and EXEC_FLAG_BACKWARD in FunctionScan’s initialisation, we do actually have what is needed to pipeline without materialisation in at least some cases. There is no mechanism to preferentially re-execute an SRF rather than materialise it, but because materialisation only seems to be necessary in the face of a join or a scrollable cursor, I no longer consider that much of a problem.
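To make the condition in the paragraph above concrete, here is a minimal standalone sketch (not code from the patch) of the decision it describes: a result only has to be kept around when the caller may rewind it or scan it backward. The SKETCH_FLAG_* constants and both function names are hypothetical stand-ins; the real EXEC_FLAG_REWIND and EXEC_FLAG_BACKWARD live in executor.h, and their numeric values are not assumed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for EXEC_FLAG_REWIND and EXEC_FLAG_BACKWARD. */
#define SKETCH_FLAG_REWIND   0x01
#define SKETCH_FLAG_BACKWARD 0x02

/*
 * Return true when the caller may re-read or step backward through the
 * result, in which case rows must be kept (materialised) rather than
 * streamed once and discarded.
 */
bool
sketch_needs_materialise(int eflags)
{
	return (eflags & (SKETCH_FLAG_REWIND | SKETCH_FLAG_BACKWARD)) != 0;
}

/* Small self-check covering the flag combinations. */
void
sketch_check_materialise(void)
{
	assert(!sketch_needs_materialise(0));
	assert(sketch_needs_materialise(SKETCH_FLAG_REWIND));
	assert(sketch_needs_materialise(SKETCH_FLAG_BACKWARD));
	assert(sketch_needs_materialise(SKETCH_FLAG_REWIND | SKETCH_FLAG_BACKWARD));
}
```

A plain forward-only scan passes neither flag, so under this assumption it is the case that can be pipelined.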

The EXPLAIN output needs a bit of work, costing is still a sore point, and it’s not quite as straight-line performant as my first attempt. There are undoubtedly also some unanticipated breakages and rough edges.

But the concept seems to work roughly as I intended (i.e., allowing FunctionScan to pipeline). Unless there are any objections, I will push it into the January commit fest for progressing.
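As an illustration of what pipelining buys, the following toy sketch (again not code from the patch) contrasts pulling rows on demand with draining the producer first, using the same shape as the LIMIT 50 over 100000 rows regression tests. SketchSeries and the sketch_* functions are hypothetical names; the producer merely imitates a value-per-call SRF by yielding one value per call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value-per-call producer: yields next..last, one per call. */
typedef struct SketchSeries
{
	int			next;			/* next value to yield */
	int			last;			/* final value to yield */
	int			calls;			/* how many times the producer was called */
} SketchSeries;

/* Produce one value; return false once the series is exhausted. */
bool
sketch_series_next(SketchSeries *s, int *out)
{
	s->calls++;
	if (s->next > s->last)
		return false;
	*out = s->next++;
	return true;
}

/* Pipelined consumer: pull on demand, stop as soon as 'limit' rows are seen. */
int
sketch_pipelined_limit(SketchSeries *s, int limit)
{
	int			rows = 0;
	int			value;

	while (rows < limit && sketch_series_next(s, &value))
		rows++;
	return rows;
}

/* Materialising consumer: drain the producer fully, then apply the limit. */
int
sketch_materialised_limit(SketchSeries *s, int limit)
{
	int			total = 0;
	int			value;

	while (sketch_series_next(s, &value))
		total++;
	return total < limit ? total : limit;
}
```

Both consumers return the same number of rows; the difference is only in how often the producer runs, which is what the "aborts early" tests try to observe through temporary buffer usage.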

(Revised patch attached.)

denty.

[1]: /messages/by-id/CA+Tgmobw+PhNVciLesd-mQQ4As9D8L2-F7AiKqv465RhDkPf2Q@mail.gmail.com

Attachments:

pipeline-functionscan-v4.patch (application/octet-stream)

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 6bc4e4c..2333ce0 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "miscadmin.h"
+#include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -927,3 +928,53 @@ BuildDescFromLists(List *names, List *types, List *typmods, List *collations)
 
 	return desc;
 }
+
+/*
+ * Check that function result tuple type (src_tupdesc) matches or can
+ * be considered to match what the query expects (dst_tupdesc). If
+ * they don't match, ereport.
+ *
+ * We really only care about number of attributes and data type.
+ * Also, we can ignore type mismatch on columns that are dropped in the
+ * destination type, so long as the physical storage matches.  This is
+ * helpful in some cases involving out-of-date cached plans.
+ */
+void
+tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
+{
+	int			i;
+
+	if (dst_tupdesc->natts != src_tupdesc->natts)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("function return row and query-specified return row do not match"),
+				 errdetail_plural("Returned row contains %d attribute, but query expects %d.",
+								  "Returned row contains %d attributes, but query expects %d.",
+								  src_tupdesc->natts,
+								  src_tupdesc->natts, dst_tupdesc->natts)));
+
+	for (i = 0; i < dst_tupdesc->natts; i++)
+	{
+		Form_pg_attribute dattr = TupleDescAttr(dst_tupdesc, i);
+		Form_pg_attribute sattr = TupleDescAttr(src_tupdesc, i);
+
+		if (IsBinaryCoercible(sattr->atttypid, dattr->atttypid))
+			continue;			/* no worries */
+		if (!dattr->attisdropped)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("function return row and query-specified return row do not match"),
+					 errdetail("Returned type %s at ordinal position %d, but query expects %s.",
+							   format_type_be(sattr->atttypid),
+							   i + 1,
+							   format_type_be(dattr->atttypid))));
+
+		if (dattr->attlen != sattr->attlen ||
+			dattr->attalign != sattr->attalign)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("function return row and query-specified return row do not match"),
+					 errdetail("Physical storage mismatch on dropped attribute at ordinal position %d.",
+							   i + 1)));
+	}
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9296963..0aea788 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,8 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
 #include "nodes/extensible.h"
@@ -1158,6 +1160,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
+		case T_SRFScanPlan:
+			pname = sname = "SRF Scan";
+			break;
 		case T_FunctionScan:
 			pname = sname = "Function Scan";
 			break;
@@ -1714,6 +1719,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				}
 			}
 			break;
+		case T_SRFScanPlan:
+			if (es->analyze)
+			{
+				SRFScanState *sss = (SRFScanState *) planstate;
+
+				if (sss->setexpr)
+				{
+					SetExprState *setexpr = (SetExprState *) sss->setexpr;
+					FunctionCallInfo fcinfo = setexpr->fcinfo;
+					ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+					if (rsinfo)
+					{
+						ExplainPropertyText("SFRM",
+							rsinfo->returnMode == SFRM_ValuePerCall ? "ValuePerCall" :
+								rsinfo->returnMode == SFRM_Materialize ? "Materialize" :
+									"Unknown",
+											es);
+
+						if (rsinfo->returnMode == SFRM_Materialize)
+							ExplainPropertyBool("Donated tuplestore",
+												setexpr->funcResultStoreDonated, es);
+					}
+				}
+			}
 		case T_FunctionScan:
 			if (es->verbose)
 			{
@@ -1947,6 +1977,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		IsA(plan, BitmapAnd) ||
 		IsA(plan, BitmapOr) ||
 		IsA(plan, SubqueryScan) ||
+		IsA(plan, FunctionScan) ||
 		(IsA(planstate, CustomScanState) &&
 		 ((CustomScanState *) planstate)->custom_ps != NIL) ||
 		planstate->subPlan;
@@ -1971,6 +2002,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainNode(innerPlanState(planstate), ancestors,
 					"Inner", NULL, es);
 
+	/* FunctionScan subnodes */
+	if (IsA(planstate, FunctionScanState))
+		for(int i=0; i<((FunctionScanState *)planstate)->nfuncs; i++)
+		{
+			bool oldverbose = es->verbose;
+			es->verbose = false;
+			ExplainNode(&((FunctionScanState *)planstate)->funcstates[i].scanstate->ps,
+						ancestors, "Function", NULL, es);
+			es->verbose = oldverbose;
+		}
+
 	/* special child plans */
 	switch (nodeTag(plan))
 	{
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index cc09895..e946239 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -19,7 +19,7 @@ OBJS = execAmi.o execCurrent.o execExpr.o execExprInterp.o \
        execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
        nodeBitmapAnd.o nodeBitmapOr.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o \
-       nodeCustom.o nodeFunctionscan.o nodeGather.o \
+       nodeCustom.o nodeFunctionscan.o nodeSRFScan.o nodeGather.o \
        nodeHash.o nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o nodeGatherMerge.o \
        nodeMaterial.o nodeMergeAppend.o nodeMergejoin.o nodeModifyTable.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 1f18e5d..a5fd2d3 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -25,6 +25,7 @@
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "executor/nodeGather.h"
 #include "executor/nodeGatherMerge.h"
 #include "executor/nodeGroup.h"
@@ -206,6 +207,10 @@ ExecReScan(PlanState *node)
 			ExecReScanFunctionScan((FunctionScanState *) node);
 			break;
 
+		case T_SRFScanState:
+			ExecReScanSRF((SRFScanState *) node);
+			break;
+
 		case T_TableFuncScanState:
 			ExecReScanTableFuncScan((TableFuncScanState *) node);
 			break;
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c227282..f548bd6 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -83,6 +83,7 @@
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "executor/nodeGather.h"
 #include "executor/nodeGatherMerge.h"
 #include "executor/nodeGroup.h"
@@ -253,6 +254,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														 estate, eflags);
 			break;
 
+		case T_SRFScanPlan:
+			result = (PlanState *) ExecInitSRFScan((SRFScanPlan *) node,
+														 estate, eflags);
+			break;
+
 		case T_ValuesScan:
 			result = (PlanState *) ExecInitValuesScan((ValuesScan *) node,
 													  estate, eflags);
@@ -640,6 +646,10 @@ ExecEndNode(PlanState *node)
 			ExecEndFunctionScan((FunctionScanState *) node);
 			break;
 
+		case T_SRFScanState:
+			ExecEndSRFScan((SRFScanState *) node);
+			break;
+
 		case T_TableFuncScanState:
 			ExecEndTableFuncScan((TableFuncScanState *) node);
 			break;
diff --git a/src/backend/executor/execSRF.c b/src/backend/executor/execSRF.c
index c8a3efc..a3df390 100644
--- a/src/backend/executor/execSRF.c
+++ b/src/backend/executor/execSRF.c
@@ -21,6 +21,9 @@
 #include "access/htup_details.h"
 #include "catalog/objectaccess.h"
 #include "executor/execdebug.h"
+#include "executor/nodeMaterial.h"
+#include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
@@ -44,17 +47,17 @@ static void ExecPrepareTuplestoreResult(SetExprState *sexpr,
 										ExprContext *econtext,
 										Tuplestorestate *resultStore,
 										TupleDesc resultDesc);
-static void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
 
 
 /*
- * Prepare function call in FROM (ROWS FROM) for execution.
+ * Prepare function call in FROM (ROWS FROM) or targetlist SRF function
+ * call for execution.
  *
- * This is used by nodeFunctionscan.c.
+ * This is used by nodeFunctionscan.c and nodeProjectSet.c.
  */
 SetExprState *
-ExecInitTableFunctionResult(Expr *expr,
-							ExprContext *econtext, PlanState *parent)
+ExecInitFunctionResultSet(Expr *expr,
+						  ExprContext *econtext, PlanState *parent)
 {
 	SetExprState *state = makeNode(SetExprState);
 
@@ -62,402 +65,52 @@ ExecInitTableFunctionResult(Expr *expr,
 	state->expr = expr;
 	state->func.fn_oid = InvalidOid;
 
-	/*
-	 * Normally the passed expression tree will be a FuncExpr, since the
-	 * grammar only allows a function call at the top level of a table
-	 * function reference.  However, if the function doesn't return set then
-	 * the planner might have replaced the function call via constant-folding
-	 * or inlining.  So if we see any other kind of expression node, execute
-	 * it via the general ExecEvalExpr() code.  That code path will not
-	 * support set-returning functions buried in the expression, though.
-	 */
 	if (IsA(expr, FuncExpr))
 	{
+		/*
+		 * For a FunctionScan or ProjectSet, the passed expression tree can be a
+		 * FuncExpr, since the grammar only allows a function call at the top
+		 * level of a table function reference.
+		 */
 		FuncExpr   *func = (FuncExpr *) expr;
 
 		state->funcReturnsSet = func->funcretset;
 		state->args = ExecInitExprList(func->args, parent);
-
 		init_sexpr(func->funcid, func->inputcollid, expr, state, parent,
-				   econtext->ecxt_per_query_memory, func->funcretset, false);
+				   econtext->ecxt_per_query_memory, func->funcretset, true);
 	}
-	else
-	{
-		state->elidedFuncState = ExecInitExpr(expr, parent);
-	}
-
-	return state;
-}
-
-/*
- *		ExecMakeTableFunctionResult
- *
- * Evaluate a table function, producing a materialized result in a Tuplestore
- * object.
- *
- * This is used by nodeFunctionscan.c.
- */
-Tuplestorestate *
-ExecMakeTableFunctionResult(SetExprState *setexpr,
-							ExprContext *econtext,
-							MemoryContext argContext,
-							TupleDesc expectedDesc,
-							bool randomAccess)
-{
-	Tuplestorestate *tupstore = NULL;
-	TupleDesc	tupdesc = NULL;
-	Oid			funcrettype;
-	bool		returnsTuple;
-	bool		returnsSet = false;
-	FunctionCallInfo fcinfo;
-	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
-	HeapTupleData tmptup;
-	MemoryContext callerContext;
-	MemoryContext oldcontext;
-	bool		first_time = true;
-
-	callerContext = CurrentMemoryContext;
-
-	funcrettype = exprType((Node *) setexpr->expr);
-
-	returnsTuple = type_is_rowtype(funcrettype);
-
-	/*
-	 * Prepare a resultinfo node for communication.  We always do this even if
-	 * not expecting a set result, so that we can pass expectedDesc.  In the
-	 * generic-expression case, the expression doesn't actually get to see the
-	 * resultinfo, but set it up anyway because we use some of the fields as
-	 * our own state variables.
-	 */
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = expectedDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize | SFRM_Materialize_Preferred);
-	if (randomAccess)
-		rsinfo.allowedModes |= (int) SFRM_Materialize_Random;
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
-
-	fcinfo = palloc(SizeForFunctionCallInfo(list_length(setexpr->args)));
-
-	/*
-	 * Normally the passed expression tree will be a SetExprState, since the
-	 * grammar only allows a function call at the top level of a table
-	 * function reference.  However, if the function doesn't return set then
-	 * the planner might have replaced the function call via constant-folding
-	 * or inlining.  So if we see any other kind of expression node, execute
-	 * it via the general ExecEvalExpr() code; the only difference is that we
-	 * don't get a chance to pass a special ReturnSetInfo to any functions
-	 * buried in the expression.
-	 */
-	if (!setexpr->elidedFuncState)
+	else if (IsA(expr, OpExpr))
 	{
 		/*
-		 * This path is similar to ExecMakeFunctionResultSet.
-		 */
-		returnsSet = setexpr->funcReturnsSet;
-		InitFunctionCallInfoData(*fcinfo, &(setexpr->func),
-								 list_length(setexpr->args),
-								 setexpr->fcinfo->fncollation,
-								 NULL, (Node *) &rsinfo);
-
-		/*
-		 * Evaluate the function's argument list.
-		 *
-		 * We can't do this in the per-tuple context: the argument values
-		 * would disappear when we reset that context in the inner loop.  And
-		 * the caller's CurrentMemoryContext is typically a query-lifespan
-		 * context, so we don't want to leak memory there.  We require the
-		 * caller to pass a separate memory context that can be used for this,
-		 * and can be reset each time through to avoid bloat.
-		 */
-		MemoryContextReset(argContext);
-		oldcontext = MemoryContextSwitchTo(argContext);
-		ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
-		MemoryContextSwitchTo(oldcontext);
-
-		/*
-		 * If function is strict, and there are any NULL arguments, skip
-		 * calling the function and act like it returned NULL (or an empty
-		 * set, in the returns-set case).
+		 * For ProjectSet, the expression node could be an OpExpr.
 		 */
-		if (setexpr->func.fn_strict)
-		{
-			int			i;
+		OpExpr	   *op = (OpExpr *) expr;
 
-			for (i = 0; i < fcinfo->nargs; i++)
-			{
-				if (fcinfo->args[i].isnull)
-					goto no_function_result;
-			}
-		}
+		state->funcReturnsSet = op->opretset;
+		state->args = ExecInitExprList(op->args, parent);
+		init_sexpr(op->opfuncid, op->inputcollid, expr, state, parent,
+				   econtext->ecxt_per_query_memory, op->opretset, true);
 	}
 	else
 	{
-		/* Treat setexpr as a generic expression */
-		InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL);
-	}
-
-	/*
-	 * Switch to short-lived context for calling the function or expression.
-	 */
-	MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
-	/*
-	 * Loop to handle the ValuePerCall protocol (which is also the same
-	 * behavior needed in the generic ExecEvalExpr path).
-	 */
-	for (;;)
-	{
-		Datum		result;
-
-		CHECK_FOR_INTERRUPTS();
-
 		/*
-		 * reset per-tuple memory context before each call of the function or
-		 * expression. This cleans up any local memory the function may leak
-		 * when called.
+		 * However, again for FunctionScan, if the function doesn't return set
+		 * then the planner might have replaced the function call via constant-
+		 * folding or inlining.  So if we see any other kind of expression node,
+		 * execute it via the general ExecEvalExpr() code.  That code path will
+		 * not support set-returning functions buried in the expression, though.
 		 */
-		ResetExprContext(econtext);
-
-		/* Call the function or expression one time */
-		if (!setexpr->elidedFuncState)
-		{
-			pgstat_init_function_usage(fcinfo, &fcusage);
-
-			fcinfo->isnull = false;
-			rsinfo.isDone = ExprSingleResult;
-			result = FunctionCallInvoke(fcinfo);
-
-			pgstat_end_function_usage(&fcusage,
-									  rsinfo.isDone != ExprMultipleResult);
-		}
-		else
-		{
-			result =
-				ExecEvalExpr(setexpr->elidedFuncState, econtext, &fcinfo->isnull);
-			rsinfo.isDone = ExprSingleResult;
-		}
-
-		/* Which protocol does function want to use? */
-		if (rsinfo.returnMode == SFRM_ValuePerCall)
-		{
-			/*
-			 * Check for end of result set.
-			 */
-			if (rsinfo.isDone == ExprEndResult)
-				break;
-
-			/*
-			 * If first time through, build tuplestore for result.  For a
-			 * scalar function result type, also make a suitable tupdesc.
-			 */
-			if (first_time)
-			{
-				oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-				tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-				rsinfo.setResult = tupstore;
-				if (!returnsTuple)
-				{
-					tupdesc = CreateTemplateTupleDesc(1);
-					TupleDescInitEntry(tupdesc,
-									   (AttrNumber) 1,
-									   "column",
-									   funcrettype,
-									   -1,
-									   0);
-					rsinfo.setDesc = tupdesc;
-				}
-				MemoryContextSwitchTo(oldcontext);
-			}
-
-			/*
-			 * Store current resultset item.
-			 */
-			if (returnsTuple)
-			{
-				if (!fcinfo->isnull)
-				{
-					HeapTupleHeader td = DatumGetHeapTupleHeader(result);
-
-					if (tupdesc == NULL)
-					{
-						/*
-						 * This is the first non-NULL result from the
-						 * function.  Use the type info embedded in the
-						 * rowtype Datum to look up the needed tupdesc.  Make
-						 * a copy for the query.
-						 */
-						oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-						tupdesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
-															  HeapTupleHeaderGetTypMod(td));
-						rsinfo.setDesc = tupdesc;
-						MemoryContextSwitchTo(oldcontext);
-					}
-					else
-					{
-						/*
-						 * Verify all later returned rows have same subtype;
-						 * necessary in case the type is RECORD.
-						 */
-						if (HeapTupleHeaderGetTypeId(td) != tupdesc->tdtypeid ||
-							HeapTupleHeaderGetTypMod(td) != tupdesc->tdtypmod)
-							ereport(ERROR,
-									(errcode(ERRCODE_DATATYPE_MISMATCH),
-									 errmsg("rows returned by function are not all of the same row type")));
-					}
-
-					/*
-					 * tuplestore_puttuple needs a HeapTuple not a bare
-					 * HeapTupleHeader, but it doesn't need all the fields.
-					 */
-					tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
-					tmptup.t_data = td;
-
-					tuplestore_puttuple(tupstore, &tmptup);
-				}
-				else
-				{
-					/*
-					 * NULL result from a tuple-returning function; expand it
-					 * to a row of all nulls.  We rely on the expectedDesc to
-					 * form such rows.  (Note: this would be problematic if
-					 * tuplestore_putvalues saved the tdtypeid/tdtypmod from
-					 * the provided descriptor, since that might not match
-					 * what we get from the function itself.  But it doesn't.)
-					 */
-					int			natts = expectedDesc->natts;
-					bool	   *nullflags;
-
-					nullflags = (bool *) palloc(natts * sizeof(bool));
-					memset(nullflags, true, natts * sizeof(bool));
-					tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
-				}
-			}
-			else
-			{
-				/* Scalar-type case: just store the function result */
-				tuplestore_putvalues(tupstore, tupdesc, &result, &fcinfo->isnull);
-			}
-
-			/*
-			 * Are we done?
-			 */
-			if (rsinfo.isDone != ExprMultipleResult)
-				break;
-		}
-		else if (rsinfo.returnMode == SFRM_Materialize)
-		{
-			/* check we're on the same page as the function author */
-			if (!first_time || rsinfo.isDone != ExprSingleResult)
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-						 errmsg("table-function protocol for materialize mode was not followed")));
-			/* Done evaluating the set result */
-			break;
-		}
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-					 errmsg("unrecognized table-function returnMode: %d",
-							(int) rsinfo.returnMode)));
-
-		first_time = false;
-	}
-
-no_function_result:
-
-	/*
-	 * If we got nothing from the function (ie, an empty-set or NULL result),
-	 * we have to create the tuplestore to return, and if it's a
-	 * non-set-returning function then insert a single all-nulls row.  As
-	 * above, we depend on the expectedDesc to manufacture the dummy row.
-	 */
-	if (rsinfo.setResult == NULL)
-	{
-		MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-		tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-		rsinfo.setResult = tupstore;
-		if (!returnsSet)
-		{
-			int			natts = expectedDesc->natts;
-			bool	   *nullflags;
-
-			MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-			nullflags = (bool *) palloc(natts * sizeof(bool));
-			memset(nullflags, true, natts * sizeof(bool));
-			tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
-		}
-	}
-
-	/*
-	 * If function provided a tupdesc, cross-check it.  We only really need to
-	 * do this for functions returning RECORD, but might as well do it always.
-	 */
-	if (rsinfo.setDesc)
-	{
-		tupledesc_match(expectedDesc, rsinfo.setDesc);
-
-		/*
-		 * If it is a dynamically-allocated TupleDesc, free it: it is
-		 * typically allocated in a per-query context, so we must avoid
-		 * leaking it across multiple usages.
-		 */
-		if (rsinfo.setDesc->tdrefcount == -1)
-			FreeTupleDesc(rsinfo.setDesc);
-	}
-
-	MemoryContextSwitchTo(callerContext);
-
-	/* All done, pass back the tuplestore */
-	return rsinfo.setResult;
-}
-
-
-/*
- * Prepare targetlist SRF function call for execution.
- *
- * This is used by nodeProjectSet.c.
- */
-SetExprState *
-ExecInitFunctionResultSet(Expr *expr,
-						  ExprContext *econtext, PlanState *parent)
-{
-	SetExprState *state = makeNode(SetExprState);
+		state->elidedFuncState = ExecInitExpr(expr, parent);
 
-	state->funcReturnsSet = true;
-	state->expr = expr;
-	state->func.fn_oid = InvalidOid;
+		MemoryContext oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
 
-	/*
-	 * Initialize metadata.  The expression node could be either a FuncExpr or
-	 * an OpExpr.
-	 */
-	if (IsA(expr, FuncExpr))
-	{
-		FuncExpr   *func = (FuncExpr *) expr;
+		/* By performing InitFunctionCallInfoData here, we avoid palloc0() */
+		state->fcinfo = palloc(SizeForFunctionCallInfo(list_length(state->args)));
 
-		state->args = ExecInitExprList(func->args, parent);
-		init_sexpr(func->funcid, func->inputcollid, expr, state, parent,
-				   econtext->ecxt_per_query_memory, true, true);
-	}
-	else if (IsA(expr, OpExpr))
-	{
-		OpExpr	   *op = (OpExpr *) expr;
+		MemoryContextSwitchTo(oldcontext);
 
-		state->args = ExecInitExprList(op->args, parent);
-		init_sexpr(op->opfuncid, op->inputcollid, expr, state, parent,
-				   econtext->ecxt_per_query_memory, true, true);
+		InitFunctionCallInfoData(*state->fcinfo, NULL, 0, InvalidOid, NULL, NULL);
 	}
-	else
-		elog(ERROR, "unrecognized node type: %d",
-			 (int) nodeTag(expr));
-
-	/* shouldn't get here unless the selected function returns set */
-	Assert(state->func.fn_retset);
 
 	return state;
 }
@@ -473,7 +126,7 @@ ExecInitFunctionResultSet(Expr *expr,
  * needs to live until all rows have been returned (i.e. *isDone set to
  * ExprEndResult or ExprSingleResult).
  *
- * This is used by nodeProjectSet.c.
+ * This is used by nodeProjectSet.c and nodeFunctionscan.c.
  */
 Datum
 ExecMakeFunctionResultSet(SetExprState *fcache,
@@ -486,7 +139,7 @@ ExecMakeFunctionResultSet(SetExprState *fcache,
 	Datum		result;
 	FunctionCallInfo fcinfo;
 	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
+	ReturnSetInfo *rsinfo;
 	bool		callit;
 	int			i;
 
@@ -539,6 +192,28 @@ restart:
 		return (Datum) 0;
 	}
 
+	/*
+	 * Prepare a resultinfo node for communication.  We always do this even if
+	 * not expecting a set result, so that we can pass expectedDesc.  In the
+	 * generic-expression case, the expression doesn't actually get to see the
+	 * resultinfo, but set it up anyway because we use some of the fields as
+	 * our own state variables.
+	 */
+	fcinfo = fcache->fcinfo;
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	if (rsinfo == NULL)
+	{
+		MemoryContext oldContext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+
+		rsinfo = makeNode(ReturnSetInfo);
+		rsinfo->econtext = econtext;
+		rsinfo->expectedDesc = fcache->funcResultDesc;
+		fcinfo->resultinfo = (Node *) rsinfo;
+
+		MemoryContextSwitchTo(oldContext);
+	}
+
 	/*
 	 * arguments is a list of expressions to evaluate before passing to the
 	 * function manager.  We skip the evaluation if it was already done in the
@@ -549,7 +224,6 @@ restart:
 	 * rows from this SRF have been returned, otherwise ValuePerCall SRFs
 	 * would reference freed memory after the first returned row.
 	 */
-	fcinfo = fcache->fcinfo;
 	arguments = fcache->args;
 	if (!fcache->setArgsValid)
 	{
@@ -557,6 +231,14 @@ restart:
 
 		ExecEvalFuncArgs(fcinfo, arguments, econtext);
 		MemoryContextSwitchTo(oldContext);
+
+		/* Reset the rsinfo structure */
+		rsinfo->allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
+		/* note we do not set SFRM_Materialize_Random or _Preferred */
+		rsinfo->returnMode = SFRM_ValuePerCall;
+		/* isDone is filled below */
+		rsinfo->setResult = NULL;
+		rsinfo->setDesc = NULL;
 	}
 	else
 	{
@@ -568,18 +250,6 @@ restart:
 	 * Now call the function, passing the evaluated parameter values.
 	 */
 
-	/* Prepare a resultinfo node for communication. */
-	fcinfo->resultinfo = (Node *) &rsinfo;
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = fcache->funcResultDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
-	/* note we do not set SFRM_Materialize_Random or _Preferred */
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
-
 	/*
 	 * If function is strict, and there are any NULL arguments, skip calling
 	 * the function.
@@ -599,16 +269,25 @@ restart:
 
 	if (callit)
 	{
-		pgstat_init_function_usage(fcinfo, &fcusage);
+		if (!fcache->elidedFuncState)
+		{
+			pgstat_init_function_usage(fcinfo, &fcusage);
 
-		fcinfo->isnull = false;
-		rsinfo.isDone = ExprSingleResult;
-		result = FunctionCallInvoke(fcinfo);
-		*isNull = fcinfo->isnull;
-		*isDone = rsinfo.isDone;
+			fcinfo->isnull = false;
+			rsinfo->isDone = ExprSingleResult;
+			result = FunctionCallInvoke(fcinfo);
+			*isNull = fcinfo->isnull;
+			*isDone = rsinfo->isDone;
 
-		pgstat_end_function_usage(&fcusage,
-								  rsinfo.isDone != ExprMultipleResult);
+			pgstat_end_function_usage(&fcusage,
+									  rsinfo->isDone != ExprMultipleResult);
+		}
+		else
+		{
+			result =
+				ExecEvalExpr(fcache->elidedFuncState, econtext, isNull);
+			*isDone = ExprSingleResult;
+		}
 	}
 	else
 	{
@@ -619,10 +298,31 @@ restart:
 	}
 
 	/* Which protocol does function want to use? */
-	if (rsinfo.returnMode == SFRM_ValuePerCall)
+	if (rsinfo->returnMode == SFRM_ValuePerCall)
 	{
 		if (*isDone != ExprEndResult)
 		{
+			/*
+			 * Obtain a suitable tupdesc when we first encounter a non-NULL result.
+			 */
+			if (rsinfo->setDesc == NULL)
+			{
+				if (fcache->funcReturnsTuple && !*isNull)
+				{
+					HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+
+					/*
+					 * This is the first non-NULL result from the
+					 * function.  Use the type info embedded in the
+					 * rowtype Datum to look up the needed tupdesc.  Make
+					 * a copy for the query.
+					 */
+					MemoryContext oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+					rsinfo->setDesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td), HeapTupleHeaderGetTypMod(td));
+					MemoryContextSwitchTo(oldcontext);
+				}
+			}
+
 			/*
 			 * Save the current argument values to re-use on the next call.
 			 */
@@ -640,21 +340,34 @@ restart:
 			}
 		}
 	}
-	else if (rsinfo.returnMode == SFRM_Materialize)
+	else if (rsinfo->returnMode == SFRM_Materialize)
 	{
 		/* check we're on the same page as the function author */
-		if (rsinfo.isDone != ExprSingleResult)
+		if (rsinfo->isDone != ExprSingleResult)
 			ereport(ERROR,
 					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
 					 errmsg("table-function protocol for materialize mode was not followed")));
-		if (rsinfo.setResult != NULL)
+		if (rsinfo->setResult != NULL)
 		{
 			/* prepare to return values from the tuplestore */
 			ExecPrepareTuplestoreResult(fcache, econtext,
-										rsinfo.setResult,
-										rsinfo.setDesc);
-			/* loop back to top to start returning from tuplestore */
-			goto restart;
+										rsinfo->setResult,
+										rsinfo->setDesc);
+
+			/*
+			 * If we are being invoked by a Materialize node, attempt
+			 * to donate the returned tuplestore to it.
+			 */
+			if (ExecSRFDonateResultTuplestore(fcache))
+			{
+				*isDone = ExprMultipleResult;
+				return 0;
+			}
+			else
+			{
+				/* loop back to top to start returning from tuplestore */
+				goto restart;
+			}
 		}
 		/* if setResult was left null, treat it as empty set */
 		*isDone = ExprEndResult;
@@ -665,7 +378,7 @@ restart:
 		ereport(ERROR,
 				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
 				 errmsg("unrecognized table-function returnMode: %d",
-						(int) rsinfo.returnMode)));
+						(int) rsinfo->returnMode)));
 
 	return result;
 }
@@ -712,6 +425,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
 	InitFunctionCallInfoData(*sexpr->fcinfo, &(sexpr->func),
 							 numargs,
 							 input_collation, NULL, NULL);
+	sexpr->fcinfo->resultinfo = NULL;
 
 	/* If function returns set, check if that's allowed by caller */
 	if (sexpr->func.fn_retset && !allowSRF)
@@ -782,6 +496,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
 	sexpr->funcResultStore = NULL;
 	sexpr->funcResultSlot = NULL;
 	sexpr->shutdown_reg = false;
+	sexpr->funcResultStoreDonationEnabled = false;
 }
 
 /*
@@ -792,6 +507,7 @@ static void
 ShutdownSetExpr(Datum arg)
 {
 	SetExprState *sexpr = castNode(SetExprState, DatumGetPointer(arg));
+	ReturnSetInfo *rsinfo = castNode(ReturnSetInfo, sexpr->fcinfo->resultinfo);
 
 	/* If we have a slot, make sure it's let go of any tuplestore pointer */
 	if (sexpr->funcResultSlot)
@@ -802,6 +518,13 @@ ShutdownSetExpr(Datum arg)
 		tuplestore_end(sexpr->funcResultStore);
 	sexpr->funcResultStore = NULL;
 
+	/* Release the ReturnSetInfo structure */
+	if (rsinfo != NULL)
+	{
+		pfree(rsinfo);
+		sexpr->fcinfo->resultinfo = NULL;
+	}
+
 	/* Clear any active set-argument state */
 	sexpr->setArgsValid = false;
 
@@ -910,53 +633,3 @@ ExecPrepareTuplestoreResult(SetExprState *sexpr,
 		sexpr->shutdown_reg = true;
 	}
 }
-
-/*
- * Check that function result tuple type (src_tupdesc) matches or can
- * be considered to match what the query expects (dst_tupdesc). If
- * they don't match, ereport.
- *
- * We really only care about number of attributes and data type.
- * Also, we can ignore type mismatch on columns that are dropped in the
- * destination type, so long as the physical storage matches.  This is
- * helpful in some cases involving out-of-date cached plans.
- */
-static void
-tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
-{
-	int			i;
-
-	if (dst_tupdesc->natts != src_tupdesc->natts)
-		ereport(ERROR,
-				(errcode(ERRCODE_DATATYPE_MISMATCH),
-				 errmsg("function return row and query-specified return row do not match"),
-				 errdetail_plural("Returned row contains %d attribute, but query expects %d.",
-								  "Returned row contains %d attributes, but query expects %d.",
-								  src_tupdesc->natts,
-								  src_tupdesc->natts, dst_tupdesc->natts)));
-
-	for (i = 0; i < dst_tupdesc->natts; i++)
-	{
-		Form_pg_attribute dattr = TupleDescAttr(dst_tupdesc, i);
-		Form_pg_attribute sattr = TupleDescAttr(src_tupdesc, i);
-
-		if (IsBinaryCoercible(sattr->atttypid, dattr->atttypid))
-			continue;			/* no worries */
-		if (!dattr->attisdropped)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("function return row and query-specified return row do not match"),
-					 errdetail("Returned type %s at ordinal position %d, but query expects %s.",
-							   format_type_be(sattr->atttypid),
-							   i + 1,
-							   format_type_be(dattr->atttypid))));
-
-		if (dattr->attlen != sattr->attlen ||
-			dattr->attalign != sattr->attalign)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("function return row and query-specified return row do not match"),
-					 errdetail("Physical storage mismatch on dropped attribute at ordinal position %d.",
-							   i + 1)));
-	}
-}
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index 0370f2e..f616a00 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -1,7 +1,23 @@
 /*-------------------------------------------------------------------------
  *
  * nodeFunctionscan.c
- *	  Support routines for scanning RangeFunctions (functions in rangetable).
+ *	  Coordinates a scan over PL functions. It supports several use cases:
+ *
+ *      - single function scan, and multiple functions in ROWS FROM;
+ *      - SRFs and regular functions;
+ *      - tuple- and scalar-returning functions;
+ *      - it will materialise if eflags call for it;
+ *      - if possible, it will pipeline its output;
+ *      - it avoids double-materialisation in case of SFRM_Materialize.
+ *
+ *    To achieve these, it depends upon the Materialize node (for
+ *    materialisation and pipelining) and the SRFScan node (for SRF handling,
+ *    tuple expansion and double-materialisation avoidance).  The function
+ *    invocation itself (SRF or regular function alike) is done in execSRF.c.
+ *
+ *    The Planner knows nothing of the Materialize and SRFScan structures.
+ *    They are constructed by the Executor at execution time, and are reported
+ *    in the EXPLAIN output.
  *
  * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -24,26 +40,15 @@
 
 #include "catalog/pg_type.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
+#include "executor/nodeMaterial.h"
 #include "funcapi.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/makefuncs.h"
+#include "parser/parse_type.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
-
-
-/*
- * Runtime data for each function being scanned.
- */
-typedef struct FunctionScanPerFuncState
-{
-	SetExprState *setexpr;		/* state of the expression being evaluated */
-	TupleDesc	tupdesc;		/* desc of the function result type */
-	int			colcount;		/* expected number of result columns */
-	Tuplestorestate *tstore;	/* holds the function result set */
-	int64		rowcount;		/* # of rows in result set, -1 if not known */
-	TupleTableSlot *func_slot;	/* function result slot (or NULL) */
-} FunctionScanPerFuncState;
-
-static TupleTableSlot *FunctionNext(FunctionScanState *node);
+#include "utils/syscache.h"
 
 
 /* ----------------------------------------------------------------
@@ -82,37 +87,22 @@ FunctionNext(FunctionScanState *node)
 		 * into the scan result slot. No need to update ordinality or
 		 * rowcounts either.
 		 */
-		Tuplestorestate *tstore = node->funcstates[0].tstore;
+		TupleTableSlot *rs = node->funcstates[0].scanstate->ps.ps_ResultTupleSlot;
 
 		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
+		 * Get the next tuple from the Scan node.
+		 *
+		 * If we have a rowcount for the function, and we know the previous
+		 * read position was out of bounds, don't try the read. This allows
+		 * backward scan to work when there are mixed row counts present.
 		 */
-		if (tstore == NULL)
-		{
-			node->funcstates[0].tstore = tstore =
-				ExecMakeTableFunctionResult(node->funcstates[0].setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											node->funcstates[0].tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
+		rs = ExecProcNode(&node->funcstates[0].scanstate->ps);
 
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(tstore);
-		}
+		if (TupIsNull(rs))
+			return NULL;
+
+		ExecCopySlot(scanslot, rs);
 
-		/*
-		 * Get the next tuple from tuplestore.
-		 */
-		(void) tuplestore_gettupleslot(tstore,
-									   ScanDirectionIsForward(direction),
-									   false,
-									   scanslot);
 		return scanslot;
 	}
 
@@ -141,46 +131,22 @@ FunctionNext(FunctionScanState *node)
 	for (funcno = 0; funcno < node->nfuncs; funcno++)
 	{
 		FunctionScanPerFuncState *fs = &node->funcstates[funcno];
+		TupleTableSlot *func_slot = fs->scanstate->ps.ps_ResultTupleSlot;
 		int			i;
 
 		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
-		 */
-		if (fs->tstore == NULL)
-		{
-			fs->tstore =
-				ExecMakeTableFunctionResult(fs->setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											fs->tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
-
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(fs->tstore);
-		}
-
-		/*
-		 * Get the next tuple from tuplestore.
+		 * Get the next tuple from the Scan node.
 		 *
 		 * If we have a rowcount for the function, and we know the previous
 		 * read position was out of bounds, don't try the read. This allows
 		 * backward scan to work when there are mixed row counts present.
 		 */
 		if (fs->rowcount != -1 && fs->rowcount < oldpos)
-			ExecClearTuple(fs->func_slot);
+			ExecClearTuple(func_slot);
 		else
-			(void) tuplestore_gettupleslot(fs->tstore,
-										   ScanDirectionIsForward(direction),
-										   false,
-										   fs->func_slot);
+			func_slot = ExecProcNode(&fs->scanstate->ps);
 
-		if (TupIsNull(fs->func_slot))
+		if (TupIsNull(func_slot))
 		{
 			/*
 			 * If we ran out of data for this function in the forward
@@ -207,12 +173,12 @@ FunctionNext(FunctionScanState *node)
 			/*
 			 * we have a result, so just copy it to the result cols.
 			 */
-			slot_getallattrs(fs->func_slot);
+			slot_getallattrs(func_slot);
 
 			for (i = 0; i < fs->colcount; i++)
 			{
-				scanslot->tts_values[att] = fs->func_slot->tts_values[i];
-				scanslot->tts_isnull[att] = fs->func_slot->tts_isnull[i];
+				scanslot->tts_values[att] = func_slot->tts_values[i];
+				scanslot->tts_isnull[att] = func_slot->tts_isnull[i];
 				att++;
 			}
 
@@ -272,6 +238,53 @@ ExecFunctionScan(PlanState *pstate)
 					(ExecScanRecheckMtd) FunctionRecheck);
 }
 
+/*
+ * Helper function to build the target lists from the tupdesc; they are
+ * required for normal processing by ExecInit.
+ */
+static void
+build_tlist_for_tupdesc(TupleDesc tupdesc, int colcount,
+						List **mat_tlist, List **scan_tlist)
+{
+	Form_pg_attribute attr;
+	int attno;
+
+	for (attno = 1; attno <= colcount; attno++)
+	{
+		attr = TupleDescAttr(tupdesc, attno - 1);
+
+		if (attr->attisdropped)
+		{
+			*scan_tlist = lappend(*scan_tlist,
+							  makeTargetEntry((Expr *)
+								  makeConst(INT2OID, -1,
+											0,
+											attr->attlen,
+											0 /* value */, true /* isnull */,
+											true),
+								  attno, attr->attname.data,
+								  attr->attisdropped));
+			*mat_tlist = lappend(*mat_tlist,
+							 makeTargetEntry((Expr *)
+								 makeVar(1 /* varno */, attno, INT2OID, -1, 0, 0),
+								 attno, attr->attname.data, attr->attisdropped));
+		}
+		else
+		{
+			*scan_tlist = lappend(*scan_tlist,
+							  makeTargetEntry((Expr *)
+								  makeVar(1 /* varno */, attno, attr->atttypid,
+										  attr->atttypmod, attr->attcollation, 0),
+								  attno, attr->attname.data, attr->attisdropped));
+			*mat_tlist = lappend(*mat_tlist,
+							 makeTargetEntry((Expr *)
+								 makeVar(1 /* varno */, attno, attr->atttypid,
+										 attr->atttypmod, attr->attcollation, 0),
+								 attno, attr->attname.data, attr->attisdropped));
+		}
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecInitFunctionScan
  * ----------------------------------------------------------------
@@ -285,6 +298,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	int			i,
 				natts;
 	ListCell   *lc;
+	bool 		needs_material;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & EXEC_FLAG_MARK));
@@ -315,6 +329,9 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	else
 		scanstate->simple = false;
 
+	/* Only add a Materialize node if required */
+	needs_material = (eflags & (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD)) != 0;
+
 	/*
 	 * Ordinal 0 represents the "before the first row" position.
 	 *
@@ -347,23 +364,15 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 		TypeFuncClass functypclass;
 		Oid			funcrettype;
 		TupleDesc	tupdesc;
+		List /* TargetEntry* */ *mat_tlist = NIL;
+		List /* TargetEntry* */ *scan_tlist = NIL;
+		bool funcReturnsTuple;
 
-		fs->setexpr =
-			ExecInitTableFunctionResult((Expr *) funcexpr,
-										scanstate->ss.ps.ps_ExprContext,
-										&scanstate->ss.ps);
-
-		/*
-		 * Don't allocate the tuplestores; the actual calls to the functions
-		 * do that.  NULL means that we have not called the function yet (or
-		 * need to call it again after a rescan).
-		 */
-		fs->tstore = NULL;
 		fs->rowcount = -1;
 
 		/*
 		 * Now determine if the function returns a simple or composite type,
-		 * and build an appropriate tupdesc.  Note that in the composite case,
+		 * and build an appropriate targetlist.  Note that in the composite case,
 		 * the function may now return more columns than it did when the plan
 		 * was made; we have to ignore any columns beyond "colcount".
 		 */
@@ -379,6 +388,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			Assert(tupdesc->natts >= colcount);
 			/* Must copy it out of typcache for safety */
 			tupdesc = CreateTupleDescCopy(tupdesc);
+			funcReturnsTuple = true;
 		}
 		else if (functypclass == TYPEFUNC_SCALAR)
 		{
@@ -393,6 +403,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			TupleDescInitEntryCollation(tupdesc,
 										(AttrNumber) 1,
 										exprCollation(funcexpr));
+			funcReturnsTuple = false;
 		}
 		else if (functypclass == TYPEFUNC_RECORD)
 		{
@@ -407,6 +418,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			 * case it doesn't.)
 			 */
 			BlessTupleDesc(tupdesc);
+			funcReturnsTuple = true;
 		}
 		else
 		{
@@ -414,21 +426,45 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			elog(ERROR, "function in FROM has unsupported return type");
 		}
 
-		fs->tupdesc = tupdesc;
 		fs->colcount = colcount;
 
-		/*
-		 * We only need separate slots for the function results if we are
-		 * doing ordinality or multiple functions; otherwise, we'll fetch
-		 * function results directly into the scan slot.
-		 */
-		if (!scanstate->simple)
+		/* Expand tupdesc into targetlists for the scan nodes */
+		build_tlist_for_tupdesc(tupdesc, colcount, &mat_tlist, &scan_tlist);
+
+		SRFScanPlan *srfscan = makeNode(SRFScanPlan);
+		srfscan->funcexpr = funcexpr;
+		srfscan->rtfunc = (Node *) rtfunc;
+		srfscan->plan.targetlist = scan_tlist;
+		srfscan->plan.extParam = rtfunc->funcparams;
+		srfscan->plan.allParam = rtfunc->funcparams;
+		srfscan->funcResultDesc = tupdesc;
+		srfscan->funcReturnsTuple = funcReturnsTuple;
+		Plan *scan = &srfscan->plan;
+
+		if (needs_material)
 		{
-			fs->func_slot = ExecInitExtraTupleSlot(estate, fs->tupdesc,
-												   &TTSOpsMinimalTuple);
+			Material *fscan = makeNode(Material);
+			fscan->plan.lefttree = scan;
+			fscan->plan.targetlist = mat_tlist;
+			fscan->plan.extParam = rtfunc->funcparams;
+			fscan->plan.allParam = rtfunc->funcparams;
+			scan = &fscan->plan;
+		}
+
+		fs->scanstate = (ScanState *) ExecInitNode (scan, estate, eflags);
+
+		if (needs_material)
+		{
+			/*
+			 * Tell the SRFScan about its parent, so that it can donate
+			 * the SRF's tuplestore if the SRF uses SFRM_Materialize.
+			 */
+			MaterialState *ms = (MaterialState *) fs->scanstate;
+			SRFScanState *sss = (SRFScanState *) outerPlanState(ms);
+
+			sss->setexpr->funcResultStoreDonationEnabled = true;
+			sss->setexpr->funcResultStoreDonationTarget = &ms->ss.ps;
 		}
-		else
-			fs->func_slot = NULL;
 
 		natts += colcount;
 		i++;
@@ -443,7 +479,11 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	 */
 	if (scanstate->simple)
 	{
-		scan_tupdesc = CreateTupleDescCopy(scanstate->funcstates[0].tupdesc);
+		SRFScanState *sss = IsA(scanstate->funcstates[0].scanstate, MaterialState) ?
+				(SRFScanState *) outerPlanState((MaterialState *) scanstate->funcstates[0].scanstate) :
+				(SRFScanState *) scanstate->funcstates[0].scanstate;
+
+		scan_tupdesc = CreateTupleDescCopy(sss->setexpr->funcResultDesc);
 		scan_tupdesc->tdtypeid = RECORDOID;
 		scan_tupdesc->tdtypmod = -1;
 	}
@@ -458,8 +498,12 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 
 		for (i = 0; i < nfuncs; i++)
 		{
-			TupleDesc	tupdesc = scanstate->funcstates[i].tupdesc;
-			int			colcount = scanstate->funcstates[i].colcount;
+			SRFScanState *sss = IsA(scanstate->funcstates[i].scanstate, MaterialState) ?
+					(SRFScanState *) outerPlanState((MaterialState *) scanstate->funcstates[i].scanstate) :
+					(SRFScanState *) scanstate->funcstates[i].scanstate;
+
+			TupleDesc	tupdesc = sss->setexpr->funcResultDesc;
+			int			colcount = sss->colcount;
 			int			j;
 
 			for (j = 1; j <= colcount; j++)
@@ -536,20 +580,11 @@ ExecEndFunctionScan(FunctionScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/*
-	 * Release slots and tuplestore resources
+	 * Release the Material scan resources
 	 */
 	for (i = 0; i < node->nfuncs; i++)
 	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-
-		if (fs->tstore != NULL)
-		{
-			tuplestore_end(node->funcstates[i].tstore);
-			fs->tstore = NULL;
-		}
+		ExecEndNode(&node->funcstates[i].scanstate->ps);
 	}
 }
 
@@ -568,23 +603,12 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	for (i = 0; i < node->nfuncs; i++)
-	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-	}
 
 	ExecScanReScan(&node->ss);
 
 	/*
-	 * Here we have a choice whether to drop the tuplestores (and recompute
-	 * the function outputs) or just rescan them.  We must recompute if an
-	 * expression contains changed parameters, else we rescan.
-	 *
-	 * XXX maybe we should recompute if the function is volatile?  But in
-	 * general the executor doesn't conditionalize its actions on that.
+	 * We must recompute the function outputs if an expression contains
+	 * changed parameters.
 	 */
 	if (chgparam)
 	{
@@ -597,11 +621,9 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 			if (bms_overlap(chgparam, rtfunc->funcparams))
 			{
-				if (node->funcstates[i].tstore != NULL)
-				{
-					tuplestore_end(node->funcstates[i].tstore);
-					node->funcstates[i].tstore = NULL;
-				}
+				UpdateChangedParamSet(&node->funcstates[i].scanstate->ps,
+									  node->ss.ps.chgParam);
+
 				node->funcstates[i].rowcount = -1;
 			}
 			i++;
@@ -611,10 +633,9 @@ ExecReScanFunctionScan(FunctionScanState *node)
 	/* Reset ordinality counter */
 	node->ordinal = 0;
 
-	/* Make sure we rewind any remaining tuplestores */
+	/* Rescan them all */
 	for (i = 0; i < node->nfuncs; i++)
 	{
-		if (node->funcstates[i].tstore != NULL)
-			tuplestore_rescan(node->funcstates[i].tstore);
+		ExecReScan(&node->funcstates[i].scanstate->ps);
 	}
 }
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index cc93bbe..93271d1 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -45,9 +45,12 @@ ExecMaterial(PlanState *pstate)
 	Tuplestorestate *tuplestorestate;
 	bool		eof_tuplestore;
 	TupleTableSlot *slot;
+	bool 		first_time = true;
 
 	CHECK_FOR_INTERRUPTS();
 
+restart:
+
 	/*
 	 * get state info from node
 	 */
@@ -126,12 +129,24 @@ ExecMaterial(PlanState *pstate)
 		PlanState  *outerNode;
 		TupleTableSlot *outerslot;
 
+		if (!first_time)
+			ereport(ERROR,
+					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+					 errmsg("attempt to scan donated result store failed")));
+
 		/*
 		 * We can only get here with forward==true, so no need to worry about
 		 * which direction the subplan will go.
 		 */
 		outerNode = outerPlanState(node);
 		outerslot = ExecProcNode(outerNode);
+
+		if (node->tuplestore_donated)
+		{
+			first_time = false;
+			goto restart;
+		}
+
 		if (TupIsNull(outerslot))
 		{
 			node->eof_underlying = true;
@@ -196,6 +211,7 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
 
 	matstate->eof_underlying = false;
 	matstate->tuplestorestate = NULL;
+	matstate->tuplestore_donated = false;
 
 	/*
 	 * Miscellaneous initialization
@@ -346,6 +362,7 @@ ExecReScanMaterial(MaterialState *node)
 		{
 			tuplestore_end(node->tuplestorestate);
 			node->tuplestorestate = NULL;
+			node->tuplestore_donated = false;
 			if (outerPlan->chgParam == NULL)
 				ExecReScan(outerPlan);
 			node->eof_underlying = false;
@@ -361,8 +378,30 @@ ExecReScanMaterial(MaterialState *node)
 		 * if chgParam of subnode is not null then plan will be re-scanned by
 		 * first ExecProcNode.
 		 */
+		node->tuplestore_donated = false;
 		if (outerPlan->chgParam == NULL)
 			ExecReScan(outerPlan);
 		node->eof_underlying = false;
 	}
 }
+
+void
+ExecMaterialReceiveResultStore(MaterialState *node, Tuplestorestate *store)
+{
+	if (!node->tuplestore_donated)
+	{
+		if (node->tuplestorestate)
+		{
+			tuplestore_end(node->tuplestorestate);
+		}
+
+		node->tuplestorestate = store;
+		node->tuplestore_donated = true;
+		node->eof_underlying = true;
+	}
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+				 errmsg("result tuplestore donated more than once")));
+}
+
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index fc6667e..84cacb3 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -293,9 +293,16 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	 * such parameters, then there is no point in REWIND support at all in the
 	 * inner child, because it will always be rescanned with fresh parameter
 	 * values.
+	 *
+	 * The exception to this simple rule is a ROWS FROM function scan where it
+	 * is possible that only some of the involved functions are affected by the
+	 * parameters. In this case, we blanket request support for REWIND. A more
+	 * intelligent approach would request REWIND only for nodes unaffected by
+	 * the parameters, but we aren't so intelligent yet.
 	 */
 	outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
-	if (node->nestParams == NIL)
+	if (node->nestParams == NIL ||
+		IsA(innerPlan(node), FunctionScan))
 		eflags |= EXEC_FLAG_REWIND;
 	else
 		eflags &= ~EXEC_FLAG_REWIND;
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index 277d278..8f5aa86 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -283,6 +283,7 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
 			state->elems[off] = (Node *)
 				ExecInitFunctionResultSet(expr, state->ps.ps_ExprContext,
 										  &state->ps);
+			Assert (((SetExprState *) state->elems[off])->funcReturnsSet);
 		}
 		else
 		{
diff --git a/src/backend/executor/nodeSRFScan.c b/src/backend/executor/nodeSRFScan.c
new file mode 100644
index 0000000..4d61a95
--- /dev/null
+++ b/src/backend/executor/nodeSRFScan.c
@@ -0,0 +1,262 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSRFScan.c
+ *	  Coordinates a scan over a single SRF function, or a non-SRF as if it
+ *    were an SRF returning a single row.
+ *
+ *    SRFScan expands the function's output if it returns a tuple. If the
+ *    SRF uses SFRM_Materialize, it will donate the returned tuplestore to
+ *    the parent Materialize node, if there is one, to avoid double-
+ *    materialisation.
+ *
+ *    The Planner knows nothing of the SRFScan structure. It is constructed
+ *    by the Executor at execution time, and is reported in the EXPLAIN
+ *    output.
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeSRFScan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "catalog/pg_type.h"
+#include "executor/nodeSRFScan.h"
+#include "executor/nodeMaterial.h"
+#include "funcapi.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/makefuncs.h"
+#include "parser/parse_type.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+static TupleTableSlot *			/* result tuple from subplan */
+ExecSRF(PlanState *node)
+{
+	SRFScanState *pstate = (SRFScanState *) node;
+	ExprContext *econtext = pstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *resultSlot = pstate->ss.ps.ps_ResultTupleSlot;
+	Datum result;
+	ExprDoneCond *isdone = &pstate->elemdone;
+	bool	   isnull;
+	SetExprState *setexpr = pstate->setexpr;
+	FunctionCallInfo fcinfo;
+	ReturnSetInfo *rsinfo;
+
+	/* We only support forward scans. */
+	Assert(ScanDirectionIsForward(pstate->ss.ps.state->es_direction));
+
+	ExecClearTuple(resultSlot);
+
+	/*
+	 * Only execute something if we are not already complete...
+	 */
+	if (*isdone == ExprEndResult)
+		return NULL;
+
+	/*
+	 * Evaluate SRF - possibly continuing previously started output.
+	 */
+	result = ExecMakeFunctionResultSet((SetExprState *) setexpr,
+										econtext, pstate->argcontext,
+										&isnull, isdone);
+
+	if (*isdone == ExprEndResult)
+		return NULL;
+
+	fcinfo = setexpr->fcinfo;
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	/* Have we donated the result store? */
+	if (setexpr->funcResultStoreDonated)
+		return NULL;
+
+	/*
+	 * If we obtained a tupdesc, check it is appropriate, but not in
+	 * the case of SFRM_Materialize because it will have been checked
+	 * already.
+	 */
+	if (!pstate->tupdesc_checked &&
+		setexpr->funcReturnsTuple &&
+		rsinfo->returnMode != SFRM_Materialize &&
+		rsinfo->setDesc && setexpr->funcResultDesc)
+	{
+		tupledesc_match (setexpr->funcResultDesc, rsinfo->setDesc);
+		pstate->tupdesc_checked = true;
+	}
+
+	/*
+	 * If the function returned a tuple, expand it into multiple columns.
+	 */
+	if (setexpr->funcReturnsTuple)
+	{
+		if (!isnull)
+		{
+			HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+
+			/*
+			 * In SFRM_Materialize mode, the type will have been checked
+			 * already.
+			 */
+			if (rsinfo->returnMode != SFRM_Materialize)
+			{
+				/*
+				 * Verify all later returned rows have same subtype;
+				 * necessary in case the type is RECORD.
+				 */
+				if (HeapTupleHeaderGetTypeId(td) != rsinfo->setDesc->tdtypeid ||
+					HeapTupleHeaderGetTypMod(td) != rsinfo->setDesc->tdtypmod)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATATYPE_MISMATCH),
+							 errmsg("rows returned by function are not all of the same row type")));
+			}
+
+			/*
+			 * heap_deform_tuple needs a HeapTuple not a bare
+			 * HeapTupleHeader, but it doesn't need all the fields.
+			 */
+			HeapTupleData tmptup;
+			tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
+			tmptup.t_data = td;
+
+			heap_deform_tuple (&tmptup, setexpr->funcResultDesc,
+							   resultSlot->tts_values,
+							   resultSlot->tts_isnull);
+		}
+		else
+		{
+			/*
+			 * populate the result cols with nulls
+			 */
+			int i;
+			for (i = 0; i < pstate->colcount; i++)
+			{
+				resultSlot->tts_values[i] = (Datum) 0;
+				resultSlot->tts_isnull[i] = true;
+			}
+		}
+	}
+	else
+	{
+		/* Scalar-type case: just store the function result */
+		resultSlot->tts_values[0] = result;
+		resultSlot->tts_isnull[0] = isnull;
+	}
+
+	/*
+	 * If we obtained a single result, don't execute again.
+	 */
+	if (*isdone == ExprSingleResult)
+		*isdone = ExprEndResult;
+
+	ExecStoreVirtualTuple(resultSlot);
+	return resultSlot;
+}
+
+SRFScanState *
+ExecInitSRFScan(SRFScanPlan *node, EState *estate, int eflags)
+{
+	RangeTblFunction *rtfunc = (RangeTblFunction *) node->rtfunc;
+
+	SRFScanState *srfstate;
+
+	/*
+	 * SRFScan should not have any children.
+	 */
+	Assert(outerPlan(node) == NULL);
+	Assert(innerPlan(node) == NULL);
+
+	/*
+	 * create state structure
+	 */
+	srfstate = makeNode(SRFScanState);
+	srfstate->ss.ps.plan = (Plan *) node;
+	srfstate->ss.ps.state = estate;
+	srfstate->ss.ps.ExecProcNode = ExecSRF;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &srfstate->ss.ps);
+
+	srfstate->setexpr =
+		ExecInitFunctionResultSet((Expr *) node->funcexpr,
+								  srfstate->ss.ps.ps_ExprContext,
+								  &srfstate->ss.ps);
+
+	srfstate->setexpr->funcResultDesc = node->funcResultDesc;
+	srfstate->setexpr->funcReturnsTuple = node->funcReturnsTuple;
+
+	srfstate->colcount = rtfunc->funccolcount;
+
+	srfstate->tupdesc_checked = false;
+
+	/* Start with the assumption we will get some result. */
+	srfstate->elemdone = ExprSingleResult;
+
+	/*
+	 * Initialize the scan slot from the function's result descriptor, then
+	 * the result type and slot from the plan's target list.
+	 *
+	 * Set up projection info for the scan as well.
+	 */
+	ExecInitScanTupleSlot(estate, &srfstate->ss, srfstate->setexpr->funcResultDesc,
+						  &TTSOpsMinimalTuple);
+	ExecInitResultTupleSlotTL(&srfstate->ss.ps, &TTSOpsMinimalTuple);
+	ExecAssignScanProjectionInfo(&srfstate->ss);
+
+	/*
+	 * Create a memory context that ExecMakeFunctionResultSet can use to
+	 * evaluate function arguments in.  We can't use the per-tuple context for
+	 * this because it gets reset too often; but we don't want to leak
+	 * evaluation results into the query-lifespan context either.  This node
+	 * evaluates a single function, so the context holds only that function's
+	 * arguments.
+	 */
+	srfstate->argcontext = AllocSetContextCreate(CurrentMemoryContext,
+											  "SRF function arguments",
+											  ALLOCSET_DEFAULT_SIZES);
+	return srfstate;
+}
+
+void
+ExecEndSRFScan(SRFScanState *node)
+{
+	/* Nothing to do */
+}
+
+void
+ExecReScanSRF(SRFScanState *node)
+{
+	/* Expecting some results. */
+	node->elemdone = ExprSingleResult;
+
+	/* We must re-evaluate function call arguments. */
+	node->setexpr->setArgsValid = false;
+}
+
+bool
+ExecSRFDonateResultTuplestore(SetExprState *fcache)
+{
+	if (fcache->funcResultStoreDonationEnabled)
+	{
+		if (IsA (fcache->funcResultStoreDonationTarget, MaterialState))
+		{
+			MaterialState *target = (MaterialState *) fcache->funcResultStoreDonationTarget;
+
+			ExecMaterialReceiveResultStore(target, fcache->funcResultStore);
+
+			fcache->funcResultStore = NULL;
+
+			fcache->funcResultStoreDonated = true;
+
+			return true;
+		}
+	}
+
+	return false;
+}
diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h
index a068005..58a4c1c 100644
--- a/src/include/access/tupdesc.h
+++ b/src/include/access/tupdesc.h
@@ -151,4 +151,6 @@ extern TupleDesc BuildDescForRelation(List *schema);
 
 extern TupleDesc BuildDescFromLists(List *names, List *types, List *typmods, List *collations);
 
+extern void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
+
 #endif							/* TUPDESC_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9be0b38..f41aa6c 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -403,13 +403,6 @@ extern bool ExecCheck(ExprState *state, ExprContext *context);
 /*
  * prototypes from functions in execSRF.c
  */
-extern SetExprState *ExecInitTableFunctionResult(Expr *expr,
-												 ExprContext *econtext, PlanState *parent);
-extern Tuplestorestate *ExecMakeTableFunctionResult(SetExprState *setexpr,
-													ExprContext *econtext,
-													MemoryContext argContext,
-													TupleDesc expectedDesc,
-													bool randomAccess);
 extern SetExprState *ExecInitFunctionResultSet(Expr *expr,
 											   ExprContext *econtext, PlanState *parent);
 extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
diff --git a/src/include/executor/nodeFunctionscan.h b/src/include/executor/nodeFunctionscan.h
index 4f7d60d..af9c709 100644
--- a/src/include/executor/nodeFunctionscan.h
+++ b/src/include/executor/nodeFunctionscan.h
@@ -16,6 +16,16 @@
 
 #include "nodes/execnodes.h"
 
+/*
+ * Runtime data for each function being scanned.
+ */
+typedef struct FunctionScanPerFuncState
+{
+	int			colcount;		/* expected number of result columns */
+	int64		rowcount;		/* # of rows in result set, -1 if not known */
+	ScanState  *scanstate;		/* scan node: either SRFScan or Materialize */
+} FunctionScanPerFuncState;
+
 extern FunctionScanState *ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags);
 extern void ExecEndFunctionScan(FunctionScanState *node);
 extern void ExecReScanFunctionScan(FunctionScanState *node);
diff --git a/src/include/executor/nodeMaterial.h b/src/include/executor/nodeMaterial.h
index e558c34..b051815 100644
--- a/src/include/executor/nodeMaterial.h
+++ b/src/include/executor/nodeMaterial.h
@@ -21,5 +21,6 @@ extern void ExecEndMaterial(MaterialState *node);
 extern void ExecMaterialMarkPos(MaterialState *node);
 extern void ExecMaterialRestrPos(MaterialState *node);
 extern void ExecReScanMaterial(MaterialState *node);
+extern void ExecMaterialReceiveResultStore(MaterialState *node, Tuplestorestate *store);
 
 #endif							/* NODEMATERIAL_H */
diff --git a/src/include/executor/nodeSRFScan.h b/src/include/executor/nodeSRFScan.h
new file mode 100644
index 0000000..2430de5
--- /dev/null
+++ b/src/include/executor/nodeSRFScan.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * IDENTIFICATION
+ *	  src/include/executor/nodeSRFScan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef nodeSRFScan_h
+#define nodeSRFScan_h
+
+#include "nodes/execnodes.h"
+
+typedef struct
+{
+	ScanState		ss;					/* its first field is NodeTag */
+	SetExprState 	*setexpr;			/* state of the expression being evaluated */
+	ExprDoneCond	elemdone;
+	int				colcount;			/* # of columns */
+	bool			tupdesc_checked;	/* has the return tupdesc been checked? */
+	MemoryContext 	argcontext;			/* context for SRF arguments */
+	PlanState		*parent;			/* the plan's parent node */
+} SRFScanState;
+
+extern SRFScanState *ExecInitSRFScan(SRFScanPlan *node, EState *estate, int eflags);
+extern void ExecEndSRFScan(SRFScanState *node);
+extern void ExecReScanSRF(SRFScanState *node);
+extern bool ExecSRFDonateResultTuplestore(SetExprState *fcache);
+
+#endif /* nodeSRFScan_h */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9ac7bc1..0225241 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -794,10 +794,16 @@ typedef struct SetExprState
 	/*
 	 * For a set-returning function (SRF) that returns a tuplestore, we keep
 	 * the tuplestore here and dole out the result rows one at a time. The
-	 * slot holds the row currently being returned.
+	 * slot holds the row currently being returned. The boolean
+	 * funcResultStoreDonationEnabled indicates whether an SRF using
+	 * SFRM_Materialize should attempt to donate its result tuplestore
+	 * to a higher-level Materialize node.
 	 */
 	Tuplestorestate *funcResultStore;
 	TupleTableSlot *funcResultSlot;
+	bool 		funcResultStoreDonationEnabled;
+	bool 		funcResultStoreDonated;
+	struct PlanState *funcResultStoreDonationTarget;
 
 	/*
 	 * In some cases we need to compute a tuple descriptor for the function's
@@ -1652,6 +1658,7 @@ typedef struct SubqueryScanState
  *		funcstates			per-function execution states (private in
  *							nodeFunctionscan.c)
  *		argcontext			memory context to evaluate function arguments in
+ *		pending_srf_tuples	still evaluating any SRFs?
  * ----------------
  */
 struct FunctionScanPerFuncState;
@@ -1971,6 +1978,7 @@ typedef struct MaterialState
 	int			eflags;			/* capability flags to pass to tuplestore */
 	bool		eof_underlying; /* reached end of underlying plan? */
 	Tuplestorestate *tuplestorestate;
+	bool		tuplestore_donated; /* was tuplestore donated by another node? */
 } MaterialState;
 
 /* ----------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4e2fb39..18c2ec2 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -512,7 +512,9 @@ typedef enum NodeTag
 	T_SupportRequestSelectivity,	/* in nodes/supportnodes.h */
 	T_SupportRequestCost,		/* in nodes/supportnodes.h */
 	T_SupportRequestRows,		/* in nodes/supportnodes.h */
-	T_SupportRequestIndexCondition	/* in nodes/supportnodes.h */
+	T_SupportRequestIndexCondition,	/* in nodes/supportnodes.h */
+	T_SRFScanPlan,
+	T_SRFScanState
 } NodeTag;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8e6594e..ef81b06 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -542,6 +542,14 @@ typedef struct TableFuncScan
 	TableFunc  *tablefunc;		/* table function node */
 } TableFuncScan;
 
+typedef struct SRFScanPlan {
+	Plan		plan;
+	Node		*funcexpr;
+	Node 		*rtfunc;
+	TupleDesc	funcResultDesc;		/* function output columns tuple descriptor */
+	bool		funcReturnsTuple;
+} SRFScanPlan;
+
 /* ----------------
  *		CteScan node
  * ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index ef8eec3..7acf02d 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -514,13 +514,15 @@ order by 1, 2;
          ->  Function Scan on pg_catalog.generate_series s1
                Output: s1.s1
                Function Call: generate_series(1, 3)
+               ->  SRF Scan
          ->  HashAggregate
                Output: s2.s2, sum((s1.s1 + s2.s2))
                Group Key: s2.s2
                ->  Function Scan on pg_catalog.generate_series s2
                      Output: s2.s2
                      Function Call: generate_series(1, 3)
-(14 rows)
+                     ->  SRF Scan
+(16 rows)
 
 select s1, s2, sm
 from generate_series(1, 3) s1,
@@ -549,6 +551,7 @@ select array(select sum(x+y) s
  Function Scan on pg_catalog.generate_series x
    Output: (SubPlan 1)
    Function Call: generate_series(1, 3)
+   ->  SRF Scan
    SubPlan 1
      ->  Sort
            Output: (sum((x.x + y.y))), y.y
@@ -559,7 +562,8 @@ select array(select sum(x+y) s
                  ->  Function Scan on pg_catalog.generate_series y
                        Output: y.y
                        Function Call: generate_series(1, 3)
-(13 rows)
+                       ->  SRF Scan
+(15 rows)
 
 select array(select sum(x+y) s
             from generate_series(1,3) y group by y order by s)
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index c1f802c..5eb7dba 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -374,7 +374,8 @@ select g as alias1, g as alias2
    ->  Sort
          Sort Key: g
          ->  Function Scan on generate_series g
-(6 rows)
+               ->  SRF Scan
+(7 rows)
 
 select g as alias1, g as alias2
   from generate_series(1,3) g
@@ -1234,7 +1235,9 @@ explain (costs off)
          ->  Nested Loop
                ->  Values Scan on "*VALUES*"
                ->  Function Scan on gstest_data
-(8 rows)
+                     ->  Materialize
+                           ->  SRF Scan
+(10 rows)
 
 select *
   from (values (1),(2)) v(x),
@@ -1358,7 +1361,9 @@ explain (costs off)
          ->  Nested Loop
                ->  Values Scan on "*VALUES*"
                ->  Function Scan on gstest_data
-(10 rows)
+                     ->  Materialize
+                           ->  SRF Scan
+(12 rows)
 
 -- Verify that we correctly handle the child node returning a
 -- non-minimal slot, which happens if the input is pre-sorted,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 44d51ed..c56b7bd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1684,6 +1684,7 @@ FROM generate_series(1, 3) g(i);
                            QUERY PLAN                           
 ----------------------------------------------------------------
  Function Scan on generate_series g
+   ->  SRF Scan
    SubPlan 1
      ->  Limit
            ->  Merge Append
@@ -1691,10 +1692,12 @@ FROM generate_series(1, 3) g(i);
                  ->  Sort
                        Sort Key: ((d.d + g.i))
                        ->  Function Scan on generate_series d
+                             ->  SRF Scan
                  ->  Sort
                        Sort Key: ((d_1.d + g.i))
                        ->  Function Scan on generate_series d_1
-(11 rows)
+                             ->  SRF Scan
+(14 rows)
 
 SELECT
     ARRAY(SELECT f.i FROM (
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 07e631d..e61a798 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3073,7 +3073,10 @@ where q1 = thousand or q2 = thousand;
    ->  Nested Loop
          ->  Nested Loop
                ->  Function Scan on q1
+                     ->  SRF Scan
                ->  Function Scan on q2
+                     ->  Materialize
+                           ->  SRF Scan
          ->  Bitmap Heap Scan on tenk1
                Recheck Cond: ((q1.q1 = thousand) OR (q2.q2 = thousand))
                ->  BitmapOr
@@ -3083,7 +3086,7 @@ where q1 = thousand or q2 = thousand;
                            Index Cond: (thousand = q2.q2)
    ->  Hash
          ->  Seq Scan on int4_tbl
-(15 rows)
+(18 rows)
 
 explain (costs off)
 select * from
@@ -3098,14 +3101,17 @@ where thousand = (q1 + q2);
    ->  Nested Loop
          ->  Nested Loop
                ->  Function Scan on q1
+                     ->  SRF Scan
                ->  Function Scan on q2
+                     ->  Materialize
+                           ->  SRF Scan
          ->  Bitmap Heap Scan on tenk1
                Recheck Cond: (thousand = (q1.q1 + q2.q2))
                ->  Bitmap Index Scan on tenk1_thous_tenthous
                      Index Cond: (thousand = (q1.q1 + q2.q2))
    ->  Hash
          ->  Seq Scan on int4_tbl
-(12 rows)
+(15 rows)
 
 --
 -- test ability to generate a suitable plan for a star-schema query
@@ -3472,9 +3478,10 @@ left join unnest(v1ys) as u1(u1y) on u1y = v2y;
          Hash Cond: (u1.u1y = "*VALUES*_1".column2)
          Filter: ("*VALUES*_1".column1 = "*VALUES*".column1)
          ->  Function Scan on unnest u1
+               ->  SRF Scan
          ->  Hash
                ->  Values Scan on "*VALUES*_1"
-(8 rows)
+(9 rows)
 
 select * from
 (values (1, array[10,20]), (2, array[20,30])) as v1(v1x,v1ys)
@@ -4287,7 +4294,9 @@ select 1 from (select a.id FROM a left join b on a.b_id = b.id) q,
    ->  Seq Scan on a
    ->  Function Scan on generate_series gs
          Filter: (a.id = i)
-(4 rows)
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
 
 rollback;
 create temp table parent (k int primary key, pd int);
@@ -4626,7 +4635,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
@@ -4636,7 +4647,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
@@ -4647,7 +4660,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -4658,12 +4673,13 @@ explain (costs off)
 ------------------------------------------
  Nested Loop
    ->  Function Scan on generate_series g
+         ->  SRF Scan
    ->  Append
          ->  Seq Scan on int8_tbl a
                Filter: (g.g = q1)
          ->  Seq Scan on int8_tbl b
                Filter: (g.g = q2)
-(7 rows)
+(8 rows)
 
 select * from generate_series(100,200) g,
   lateral (select * from int8_tbl a where g = q1 union all
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 0879c88..172a55d 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -179,9 +179,10 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
  Hash Join
    Hash Cond: (g.g = a.unique1)
    ->  Function Scan on my_gen_series g
+         ->  SRF Scan
    ->  Hash
          ->  Seq Scan on tenk1 a
-(5 rows)
+(6 rows)
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
@@ -189,7 +190,8 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
 -------------------------------------------------
  Nested Loop
    ->  Function Scan on my_gen_series g
+         ->  SRF Scan
    ->  Index Scan using tenk1_unique1 on tenk1 a
          Index Cond: (unique1 = g.g)
-(4 rows)
+(5 rows)
 
diff --git a/src/test/regress/expected/pg_lsn.out b/src/test/regress/expected/pg_lsn.out
index 2854cfd..9b2003f 100644
--- a/src/test/regress/expected/pg_lsn.out
+++ b/src/test/regress/expected/pg_lsn.out
@@ -80,13 +80,17 @@ SELECT DISTINCT (i || '/' || j)::pg_lsn f
          Group Key: ((((i.i)::text || '/'::text) || (j.j)::text))::pg_lsn
          ->  Nested Loop
                ->  Function Scan on generate_series k
+                     ->  SRF Scan
                ->  Materialize
                      ->  Nested Loop
                            ->  Function Scan on generate_series j
                                  Filter: ((j > 0) AND (j <= 10))
+                                 ->  SRF Scan
                            ->  Function Scan on generate_series i
                                  Filter: (i <= 10)
-(12 rows)
+                                 ->  Materialize
+                                       ->  SRF Scan
+(16 rows)
 
 SELECT DISTINCT (i || '/' || j)::pg_lsn f
   FROM generate_series(1, 10) i,
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index e85b294..d97897e 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -3107,7 +3107,7 @@ select * from sc_test();
 
 create or replace function sc_test() returns setof integer as $$
 declare
-  c cursor for select * from generate_series(1, 10);
+  c scroll cursor for select * from generate_series(1, 10);
   x integer;
 begin
   open c;
@@ -4838,7 +4838,9 @@ select i, a from
    ->  Function Scan on public.consumes_rw_array i
          Output: i.i
          Function Call: consumes_rw_array((returns_rw_array(1)))
-(7 rows)
+         ->  Materialize
+               ->  SRF Scan
+(9 rows)
 
 select i, a from
   (select returns_rw_array(1) as a offset 0) ss,
@@ -4855,7 +4857,8 @@ select consumes_rw_array(a), a from returns_rw_array(1) a;
  Function Scan on public.returns_rw_array a
    Output: consumes_rw_array(a), a
    Function Call: returns_rw_array(1)
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select consumes_rw_array(a), a from returns_rw_array(1) a;
  consumes_rw_array |   a   
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index 36a5929..9dfb642 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -2009,7 +2009,9 @@ select x from int8_tbl, extractq2(int8_tbl) f(x);
    ->  Function Scan on f
          Output: f.x
          Function Call: int8_tbl.q2
-(7 rows)
+         ->  Materialize
+               ->  SRF Scan
+(9 rows)
 
 select x from int8_tbl, extractq2(int8_tbl) f(x);
          x         
@@ -2098,3 +2100,155 @@ select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;
 (0 rows)
 
 drop type rngfunc2;
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+CREATE TEMPORARY SEQUENCE rngfunc_vpc_seq;
+CREATE TEMPORARY SEQUENCE rngfunc_mat_seq;
+CREATE TYPE rngfunc_vpc_t AS (i integer, s bigint);
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+CREATE FUNCTION rngfunc_vpc(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+	SELECT i, nextval('rngfunc_vpc_seq')
+		FROM generate_series($1,$2) i;
+$$
+LANGUAGE SQL;
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+CREATE FUNCTION rngfunc_mat(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+begin
+	for i in $1..$2 loop
+		return next (i, nextval('rngfunc_mat_seq'));
+	end loop;
+end;
+$$
+LANGUAGE plpgsql;
+-- A VPC SRF that is not part of a complex query should not materialize.
+-- 
+-- To illustrate this, we explain a simple VPC SRF scan, and note the
+-- absence of a Materialize node.
+--
+explain (costs off)
+	select * from rngfunc_vpc(1, 3) t;
+           QUERY PLAN           
+--------------------------------
+ Function Scan on rngfunc_vpc t
+   ->  SRF Scan
+(2 rows)
+
+-- A VPC SRF that aborts early should do so without emitting all results.
+-- 
+-- To illustrate this, we show that an SRF that uses a sequence does not
+-- have its value incremented if the SRF is not invoked to generate a row.
+--
+select nextval('rngfunc_vpc_seq');
+ nextval 
+---------
+       1
+(1 row)
+
+select * from rngfunc_vpc(1, 3) t limit 2;
+ i | s 
+---+---
+ 1 | 2
+ 2 | 3
+(2 rows)
+
+select nextval('rngfunc_vpc_seq');
+ nextval 
+---------
+       4
+(1 row)
+
+-- A Marerialize SRF should show Materialization if the query demand rescan.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- The same plan should be generated for both VPC and Materialize mode SRFs.
+--
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+                QUERY PLAN                
+------------------------------------------
+ Nested Loop
+   ->  Function Scan on generate_series n
+         ->  SRF Scan
+   ->  Function Scan on rngfunc_vpc t
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
+
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+                QUERY PLAN                
+------------------------------------------
+ Nested Loop
+   ->  Function Scan on generate_series n
+         ->  SRF Scan
+   ->  Function Scan on rngfunc_mat t
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
+
+-- A Marerialize SRF should show donation of the returned tuplestore.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- Only the Materialize mode SRF should show donation.
+--
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Nested Loop (actual rows=9 loops=1)
+   ->  Function Scan on generate_series n (actual rows=3 loops=1)
+         ->  SRF Scan (actual rows=3 loops=1)
+               SFRM: ValuePerCall
+   ->  Function Scan on rngfunc_vpc t (actual rows=3 loops=3)
+         ->  Materialize (actual rows=3 loops=3)
+               ->  SRF Scan (actual rows=3 loops=1)
+                     SFRM: ValuePerCall
+(8 rows)
+
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Nested Loop (actual rows=9 loops=1)
+   ->  Function Scan on generate_series n (actual rows=3 loops=1)
+         ->  SRF Scan (actual rows=3 loops=1)
+               SFRM: ValuePerCall
+   ->  Function Scan on rngfunc_mat t (actual rows=3 loops=3)
+         ->  Materialize (actual rows=3 loops=3)
+               ->  SRF Scan (actual rows=0 loops=1)
+                     SFRM: Materialize
+                     Donated tuplestore: true
+(9 rows)
+
+-- A Marerialize SRF that aborts early should still generate all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence still has
+-- its value incremented if even when SRF's rows are not emitted.
+--
+select nextval('rngfunc_mat_seq');
+ nextval 
+---------
+       4
+(1 row)
+
+select * from rngfunc_mat(1, 3) t limit 2;
+ i | s 
+---+---
+ 1 | 5
+ 2 | 6
+(2 rows)
+
+select nextval('rngfunc_mat_seq');
+ nextval 
+---------
+       8
+(1 row)
+
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 7189f5b..0f5fcc7 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -577,8 +577,10 @@ select from generate_series(1,5) union select from generate_series(1,3);
  HashAggregate
    ->  Append
          ->  Function Scan on generate_series
+               ->  SRF Scan
          ->  Function Scan on generate_series generate_series_1
-(4 rows)
+               ->  SRF Scan
+(6 rows)
 
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
@@ -588,9 +590,11 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Function Scan on generate_series
+                     ->  SRF Scan
          ->  Subquery Scan on "*SELECT* 2"
                ->  Function Scan on generate_series generate_series_1
-(6 rows)
+                     ->  SRF Scan
+(8 rows)
 
 select from generate_series(1,5) union select from generate_series(1,3);
 --
@@ -626,8 +630,10 @@ select from generate_series(1,5) union select from generate_series(1,3);
  Unique
    ->  Append
          ->  Function Scan on generate_series
+               ->  SRF Scan
          ->  Function Scan on generate_series generate_series_1
-(4 rows)
+               ->  SRF Scan
+(6 rows)
 
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
@@ -637,9 +643,11 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Function Scan on generate_series
+                     ->  SRF Scan
          ->  Subquery Scan on "*SELECT* 2"
                ->  Function Scan on generate_series generate_series_1
-(6 rows)
+                     ->  SRF Scan
+(8 rows)
 
 select from generate_series(1,5) union select from generate_series(1,3);
 --
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index 70deadf..9036b7d 100644
--- a/src/test/regress/sql/plpgsql.sql
+++ b/src/test/regress/sql/plpgsql.sql
@@ -2655,7 +2655,7 @@ select * from sc_test();
 
 create or replace function sc_test() returns setof integer as $$
 declare
-  c cursor for select * from generate_series(1, 10);
+  c scroll cursor for select * from generate_series(1, 10);
   x integer;
 begin
   open c;
diff --git a/src/test/regress/sql/rangefuncs.sql b/src/test/regress/sql/rangefuncs.sql
index 5d29d2e..4180603 100644
--- a/src/test/regress/sql/rangefuncs.sql
+++ b/src/test/regress/sql/rangefuncs.sql
@@ -656,3 +656,82 @@ select *, row_to_json(u) from unnest(array[null::rngfunc2, (1,'foo')::rngfunc2,
 select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;
 
 drop type rngfunc2;
+
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+
+CREATE TEMPORARY SEQUENCE rngfunc_vpc_seq;
+CREATE TEMPORARY SEQUENCE rngfunc_mat_seq;
+CREATE TYPE rngfunc_vpc_t AS (i integer, s bigint);
+
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+CREATE FUNCTION rngfunc_vpc(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+	SELECT i, nextval('rngfunc_vpc_seq')
+		FROM generate_series($1,$2) i;
+$$
+LANGUAGE SQL;
+
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+CREATE FUNCTION rngfunc_mat(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+begin
+	for i in $1..$2 loop
+		return next (i, nextval('rngfunc_mat_seq'));
+	end loop;
+end;
+$$
+LANGUAGE plpgsql;
+
+-- A VPC SRF that is not part of a complex query should not materialize.
+-- 
+-- To illustrate this, we explain a simple VPC SRF scan, and note the
+-- absence of a Materialize node.
+--
+explain (costs off)
+	select * from rngfunc_vpc(1, 3) t;
+
+-- A VPC SRF that aborts early should do so without emitting all results.
+-- 
+-- To illustrate this, we show that an SRF that uses a sequence does not
+-- have its value incremented if the SRF is not invoked to generate a row.
+--
+select nextval('rngfunc_vpc_seq');
+select * from rngfunc_vpc(1, 3) t limit 2;
+select nextval('rngfunc_vpc_seq');
+
+-- A Marerialize SRF should show Materialization if the query demand rescan.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- The same plan should be generated for both VPC and Materialize mode SRFs.
+--
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+
+-- A Marerialize SRF should show donation of the returned tuplestore.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- Only the Materialize mode SRF should show donation.
+--
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+
+-- A Marerialize SRF that aborts early should still generate all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence still has
+-- its value incremented if even when SRF's rows are not emitted.
+--
+select nextval('rngfunc_mat_seq');
+select * from rngfunc_mat(1, 3) t limit 2;
+select nextval('rngfunc_mat_seq');
+
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------
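
For readers following the regression tests above, here is a minimal standalone C sketch of why the sequence values come out differently under early abort in the two modes. It is not PostgreSQL code: DemoSeq, vpc_next, mat_fill and the two demo_*_limit helpers are hypothetical names invented for illustration, and the "sequence" is just a counter standing in for rngfunc_vpc_seq and rngfunc_mat_seq.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sequence; "last" is the last value handed out. */
typedef struct { long last; } DemoSeq;

static long demo_nextval(DemoSeq *seq) { return ++seq->last; }

/* ValuePerCall style: state is kept between calls, one row per call. */
typedef struct { int next; int stop; DemoSeq *seq; } VpcState;

/* Returns 1 and fills *i and *s when a row is produced, 0 when exhausted. */
static int vpc_next(VpcState *st, int *i, long *s)
{
    if (st->next > st->stop)
        return 0;
    *i = st->next++;
    *s = demo_nextval(st->seq);
    return 1;
}

/* Materialize style: every row is produced up front into the caller's buffers. */
static size_t mat_fill(int start, int stop, DemoSeq *seq,
                       int *is, long *ss, size_t cap)
{
    size_t n = 0;

    for (int i = start; i <= stop; i++)
    {
        assert(n < cap);        /* the sketch uses small fixed buffers */
        is[n] = i;
        ss[n] = demo_nextval(seq);
        n++;
    }
    return n;
}

/* Pull at most "limit" rows one at a time; returns the last sequence value used. */
static long demo_vpc_limit(int start, int stop, size_t limit, DemoSeq *seq)
{
    VpcState st = { start, stop, seq };
    int i;
    long s;
    size_t emitted = 0;

    /* The limit is checked first, so no extra row is generated once it is hit. */
    while (emitted < limit && vpc_next(&st, &i, &s))
        emitted++;
    return seq->last;
}

/* Build the whole result, then emit at most "limit" rows of it. */
static long demo_mat_limit(int start, int stop, size_t limit, DemoSeq *seq)
{
    int is[16];
    long ss[16];
    size_t n = mat_fill(start, stop, seq, is, ss, 16);
    size_t emitted = n < limit ? n : limit;

    (void) emitted;             /* the rows past the limit were still generated */
    return seq->last;
}
```

Starting from a sequence that has already handed out 1, demo_vpc_limit(1, 3, 2, ...) stops at 3, so the next value is 4, as in the rngfunc_vpc test. Starting from one that has handed out 4, demo_mat_limit(1, 3, 2, ...) runs on to 7, so the next value is 8, as in the rngfunc_mat test.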

#16

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Dent John (#15)

Re: The flinfo->fn_extra question, from me this time.

Dent John <denty@QQdd.eu> writes:

I’ve updated the patch, addressed the rescan issue, and restructured the tests.
[ pipeline-functionscan-v4.patch ]

FWIW, this patch doesn't apply to HEAD anymore. The cfbot
has failed to notice because it is still testing the v3 patch.
Apparently the formatting of this email is weird enough that
neither the archives nor the CF app notice the embedded patch.

Please fix and repost.

regards, tom lane

#17

Dent John

denty@qqdd.eu

almost 6 years ago

In reply to: Tom Lane (#16)

Re: The flinfo->fn_extra question, from me this time.

Thanks Tom.

I’ll look at it. Probably won’t be able to until after the commitfest closes though.

On 28 Jan 2020, at 02:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dent John <denty@QQdd.eu> writes:

I’ve updated the patch, addressed the rescan issue, and restructured the tests.
[ pipeline-functionscan-v4.patch ]

FWIW, this patch doesn't apply to HEAD anymore. The cfbot
has failed to notice because it is still testing the v3 patch.
Apparently the formatting of this email is weird enough that
neither the archives nor the CF app notice the embedded patch.

Please fix and repost.

regards, tom lane

#18

Thomas Munro

thomas.munro@gmail.com

almost 6 years ago

In reply to: Dent John (#17)

1 attachment(s)

Re: The flinfo->fn_extra question, from me this time.

On Tue, Jan 28, 2020 at 9:59 PM Dent John <denty@qqdd.eu> wrote:

I’ll look at it. Probably won’t be able to until after the commitfest closes though.

(We've seen that hidden attachment problem from Apple Mail before,
discussion of the MIME details in the archives somewhere. I have no
idea what GUI interaction causes that, but most Apple Mail attachments
seem to be fine.)

Here's a quick rebase in case it helps. It mostly applied fine (see
below). The conflicts were just Makefile and expected output files,
which I tried to do the obvious thing with. I had to add a #include
"access/tupdesc.h" to plannodes.h to make something compile (because
it uses TupleDesc). Passes check-world here.

$ gpatch --merge -p1 < ~/pipeline-functionscan-v4.patch
patching file src/backend/access/common/tupdesc.c
patching file src/backend/commands/explain.c
patching file src/backend/executor/Makefile
Hunk #1 NOT MERGED at 19-29.
patching file src/backend/executor/execAmi.c
patching file src/backend/executor/execProcnode.c
patching file src/backend/executor/execSRF.c
patching file src/backend/executor/nodeFunctionscan.c
Hunk #1 merged at 4-20.
patching file src/backend/executor/nodeMaterial.c
patching file src/backend/executor/nodeNestloop.c
patching file src/backend/executor/nodeProjectSet.c
patching file src/backend/executor/nodeSRFScan.c
patching file src/include/access/tupdesc.h
patching file src/include/executor/executor.h
patching file src/include/executor/nodeFunctionscan.h
patching file src/include/executor/nodeMaterial.h
patching file src/include/executor/nodeSRFScan.h
patching file src/include/nodes/execnodes.h
patching file src/include/nodes/nodes.h
patching file src/include/nodes/plannodes.h
patching file src/test/regress/expected/aggregates.out
patching file src/test/regress/expected/groupingsets.out
patching file src/test/regress/expected/inherit.out
patching file src/test/regress/expected/join.out
Hunk #1 NOT MERGED at 3078-3087.
Hunk #3 NOT MERGED at 3111-3120, merged at 3127.
patching file src/test/regress/expected/misc_functions.out
patching file src/test/regress/expected/pg_lsn.out
patching file src/test/regress/expected/plpgsql.out
patching file src/test/regress/expected/rangefuncs.out
patching file src/test/regress/expected/union.out
patching file src/test/regress/sql/plpgsql.sql
patching file src/test/regress/sql/rangefuncs.sql

Attachments:

0001-pipeline-functionscan-v5.patch (application/octet-stream)

From a98f7f3cd0adb39ec377e3afa3e3fdd5fad68adf Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 28 Jan 2020 22:28:32 +1300
Subject: [PATCH] pipeline-functionscan

---
 src/backend/access/common/tupdesc.c          |  51 ++
 src/backend/commands/explain.c               |  42 ++
 src/backend/executor/Makefile                |   1 +
 src/backend/executor/execAmi.c               |   5 +
 src/backend/executor/execProcnode.c          |  10 +
 src/backend/executor/execSRF.c               | 595 +++++--------------
 src/backend/executor/nodeFunctionscan.c      | 287 ++++-----
 src/backend/executor/nodeMaterial.c          |  39 ++
 src/backend/executor/nodeNestloop.c          |   9 +-
 src/backend/executor/nodeProjectSet.c        |   1 +
 src/backend/executor/nodeSRFScan.c           | 262 ++++++++
 src/include/access/tupdesc.h                 |   2 +
 src/include/executor/executor.h              |   7 -
 src/include/executor/nodeFunctionscan.h      |  10 +
 src/include/executor/nodeMaterial.h          |   1 +
 src/include/executor/nodeSRFScan.h           |  30 +
 src/include/nodes/execnodes.h                |  10 +-
 src/include/nodes/nodes.h                    |   4 +-
 src/include/nodes/plannodes.h                |   9 +
 src/test/regress/expected/aggregates.out     |   8 +-
 src/test/regress/expected/groupingsets.out   |  11 +-
 src/test/regress/expected/inherit.out        |   5 +-
 src/test/regress/expected/join.out           |  28 +-
 src/test/regress/expected/misc_functions.out |   6 +-
 src/test/regress/expected/pg_lsn.out         |   6 +-
 src/test/regress/expected/plpgsql.out        |   9 +-
 src/test/regress/expected/rangefuncs.out     | 171 +++++-
 src/test/regress/expected/tsearch.out        |   3 +-
 src/test/regress/expected/union.out          |  16 +-
 src/test/regress/expected/window.out         |   3 +-
 src/test/regress/sql/plpgsql.sql             |   2 +-
 src/test/regress/sql/rangefuncs.sql          |  79 +++
 32 files changed, 1085 insertions(+), 637 deletions(-)
 create mode 100644 src/backend/executor/nodeSRFScan.c
 create mode 100644 src/include/executor/nodeSRFScan.h

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 00bb4cb53d..b870ab6b76 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "miscadmin.h"
+#include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -927,3 +928,53 @@ BuildDescFromLists(List *names, List *types, List *typmods, List *collations)
 
 	return desc;
 }
+
+/*
+ * Check that function result tuple type (src_tupdesc) matches or can
+ * be considered to match what the query expects (dst_tupdesc). If
+ * they don't match, ereport.
+ *
+ * We really only care about number of attributes and data type.
+ * Also, we can ignore type mismatch on columns that are dropped in the
+ * destination type, so long as the physical storage matches.  This is
+ * helpful in some cases involving out-of-date cached plans.
+ */
+void
+tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
+{
+	int			i;
+
+	if (dst_tupdesc->natts != src_tupdesc->natts)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("function return row and query-specified return row do not match"),
+				 errdetail_plural("Returned row contains %d attribute, but query expects %d.",
+								  "Returned row contains %d attributes, but query expects %d.",
+								  src_tupdesc->natts,
+								  src_tupdesc->natts, dst_tupdesc->natts)));
+
+	for (i = 0; i < dst_tupdesc->natts; i++)
+	{
+		Form_pg_attribute dattr = TupleDescAttr(dst_tupdesc, i);
+		Form_pg_attribute sattr = TupleDescAttr(src_tupdesc, i);
+
+		if (IsBinaryCoercible(sattr->atttypid, dattr->atttypid))
+			continue;			/* no worries */
+		if (!dattr->attisdropped)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("function return row and query-specified return row do not match"),
+					 errdetail("Returned type %s at ordinal position %d, but query expects %s.",
+							   format_type_be(sattr->atttypid),
+							   i + 1,
+							   format_type_be(dattr->atttypid))));
+
+		if (dattr->attlen != sattr->attlen ||
+			dattr->attalign != sattr->attalign)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("function return row and query-specified return row do not match"),
+					 errdetail("Physical storage mismatch on dropped attribute at ordinal position %d.",
+							   i + 1)));
+	}
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c367c750b1..6f7546a7ae 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,8 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
 #include "nodes/extensible.h"
@@ -1181,6 +1183,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
+		case T_SRFScanPlan:
+			pname = sname = "SRF Scan";
+			break;
 		case T_FunctionScan:
 			pname = sname = "Function Scan";
 			break;
@@ -1769,6 +1774,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				}
 			}
 			break;
+		case T_SRFScanPlan:
+			if (es->analyze)
+			{
+				SRFScanState *sss = (SRFScanState *) planstate;
+
+				if (sss->setexpr)
+				{
+					SetExprState *setexpr = (SetExprState *) sss->setexpr;
+					FunctionCallInfo fcinfo = setexpr->fcinfo;
+					ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+					if (rsinfo)
+					{
+						ExplainPropertyText("SFRM",
+							rsinfo->returnMode == SFRM_ValuePerCall ? "ValuePerCall" :
+								rsinfo->returnMode == SFRM_Materialize ? "Materialize" :
+									"Unknown",
+											es);
+
+						if (rsinfo->returnMode == SFRM_Materialize)
+							ExplainPropertyBool("Donated tuplestore",
+												setexpr->funcResultStoreDonated, es);
+					}
+				}
+			}
 		case T_FunctionScan:
 			if (es->verbose)
 			{
@@ -1977,6 +2007,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		IsA(plan, BitmapAnd) ||
 		IsA(plan, BitmapOr) ||
 		IsA(plan, SubqueryScan) ||
+		IsA(plan, FunctionScan) ||
 		(IsA(planstate, CustomScanState) &&
 		 ((CustomScanState *) planstate)->custom_ps != NIL) ||
 		planstate->subPlan;
@@ -2001,6 +2032,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainNode(innerPlanState(planstate), ancestors,
 					"Inner", NULL, es);
 
+	/* FunctionScan subnodes */
+	if (IsA(planstate, FunctionScanState))
+		for(int i=0; i<((FunctionScanState *)planstate)->nfuncs; i++)
+		{
+			bool oldverbose = es->verbose;
+			es->verbose = false;
+			ExplainNode(&((FunctionScanState *)planstate)->funcstates[i].scanstate->ps,
+						ancestors, "Function", NULL, es);
+			es->verbose = oldverbose;
+		}
+
 	/* special child plans */
 	switch (nodeTag(plan))
 	{
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index a983800e4b..9dae142f71 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -65,6 +65,7 @@ OBJS = \
 	nodeSort.o \
 	nodeSubplan.o \
 	nodeSubqueryscan.o \
+	nodeSRFScan.o \
 	nodeTableFuncscan.o \
 	nodeTidscan.o \
 	nodeUnique.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index b12aeb3334..07ccca7507 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -25,6 +25,7 @@
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "executor/nodeGather.h"
 #include "executor/nodeGatherMerge.h"
 #include "executor/nodeGroup.h"
@@ -204,6 +205,10 @@ ExecReScan(PlanState *node)
 			ExecReScanFunctionScan((FunctionScanState *) node);
 			break;
 
+		case T_SRFScanState:
+			ExecReScanSRF((SRFScanState *) node);
+			break;
+
 		case T_TableFuncScanState:
 			ExecReScanTableFuncScan((TableFuncScanState *) node);
 			break;
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 7b2e84f402..da39593b27 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -83,6 +83,7 @@
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "executor/nodeGather.h"
 #include "executor/nodeGatherMerge.h"
 #include "executor/nodeGroup.h"
@@ -252,6 +253,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														 estate, eflags);
 			break;
 
+		case T_SRFScanPlan:
+			result = (PlanState *) ExecInitSRFScan((SRFScanPlan *) node,
+														 estate, eflags);
+			break;
+
 		case T_ValuesScan:
 			result = (PlanState *) ExecInitValuesScan((ValuesScan *) node,
 													  estate, eflags);
@@ -639,6 +645,10 @@ ExecEndNode(PlanState *node)
 			ExecEndFunctionScan((FunctionScanState *) node);
 			break;
 
+		case T_SRFScanState:
+			ExecEndSRFScan((SRFScanState *) node);
+			break;
+
 		case T_TableFuncScanState:
 			ExecEndTableFuncScan((TableFuncScanState *) node);
 			break;
diff --git a/src/backend/executor/execSRF.c b/src/backend/executor/execSRF.c
index 2312cc7142..51c08412e9 100644
--- a/src/backend/executor/execSRF.c
+++ b/src/backend/executor/execSRF.c
@@ -21,6 +21,9 @@
 #include "access/htup_details.h"
 #include "catalog/objectaccess.h"
 #include "executor/execdebug.h"
+#include "executor/nodeMaterial.h"
+#include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
@@ -44,17 +47,17 @@ static void ExecPrepareTuplestoreResult(SetExprState *sexpr,
 										ExprContext *econtext,
 										Tuplestorestate *resultStore,
 										TupleDesc resultDesc);
-static void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
 
 
 /*
- * Prepare function call in FROM (ROWS FROM) for execution.
+ * Prepare function call in FROM (ROWS FROM) or targetlist SRF function
+ * call for execution.
  *
- * This is used by nodeFunctionscan.c.
+ * This is used by nodeFunctionscan.c and nodeProjectSet.c.
  */
 SetExprState *
-ExecInitTableFunctionResult(Expr *expr,
-							ExprContext *econtext, PlanState *parent)
+ExecInitFunctionResultSet(Expr *expr,
+						  ExprContext *econtext, PlanState *parent)
 {
 	SetExprState *state = makeNode(SetExprState);
 
@@ -62,402 +65,52 @@ ExecInitTableFunctionResult(Expr *expr,
 	state->expr = expr;
 	state->func.fn_oid = InvalidOid;
 
-	/*
-	 * Normally the passed expression tree will be a FuncExpr, since the
-	 * grammar only allows a function call at the top level of a table
-	 * function reference.  However, if the function doesn't return set then
-	 * the planner might have replaced the function call via constant-folding
-	 * or inlining.  So if we see any other kind of expression node, execute
-	 * it via the general ExecEvalExpr() code.  That code path will not
-	 * support set-returning functions buried in the expression, though.
-	 */
 	if (IsA(expr, FuncExpr))
 	{
+		/*
+		 * For a FunctionScan or ProjectSet, the passed expression tree can be a
+		 * FuncExpr, since the grammar only allows a function call at the top
+		 * level of a table function reference.
+		 */
 		FuncExpr   *func = (FuncExpr *) expr;
 
 		state->funcReturnsSet = func->funcretset;
 		state->args = ExecInitExprList(func->args, parent);
-
 		init_sexpr(func->funcid, func->inputcollid, expr, state, parent,
-				   econtext->ecxt_per_query_memory, func->funcretset, false);
+				   econtext->ecxt_per_query_memory, func->funcretset, true);
 	}
-	else
-	{
-		state->elidedFuncState = ExecInitExpr(expr, parent);
-	}
-
-	return state;
-}
-
-/*
- *		ExecMakeTableFunctionResult
- *
- * Evaluate a table function, producing a materialized result in a Tuplestore
- * object.
- *
- * This is used by nodeFunctionscan.c.
- */
-Tuplestorestate *
-ExecMakeTableFunctionResult(SetExprState *setexpr,
-							ExprContext *econtext,
-							MemoryContext argContext,
-							TupleDesc expectedDesc,
-							bool randomAccess)
-{
-	Tuplestorestate *tupstore = NULL;
-	TupleDesc	tupdesc = NULL;
-	Oid			funcrettype;
-	bool		returnsTuple;
-	bool		returnsSet = false;
-	FunctionCallInfo fcinfo;
-	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
-	HeapTupleData tmptup;
-	MemoryContext callerContext;
-	MemoryContext oldcontext;
-	bool		first_time = true;
-
-	callerContext = CurrentMemoryContext;
-
-	funcrettype = exprType((Node *) setexpr->expr);
-
-	returnsTuple = type_is_rowtype(funcrettype);
-
-	/*
-	 * Prepare a resultinfo node for communication.  We always do this even if
-	 * not expecting a set result, so that we can pass expectedDesc.  In the
-	 * generic-expression case, the expression doesn't actually get to see the
-	 * resultinfo, but set it up anyway because we use some of the fields as
-	 * our own state variables.
-	 */
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = expectedDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize | SFRM_Materialize_Preferred);
-	if (randomAccess)
-		rsinfo.allowedModes |= (int) SFRM_Materialize_Random;
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
-
-	fcinfo = palloc(SizeForFunctionCallInfo(list_length(setexpr->args)));
-
-	/*
-	 * Normally the passed expression tree will be a SetExprState, since the
-	 * grammar only allows a function call at the top level of a table
-	 * function reference.  However, if the function doesn't return set then
-	 * the planner might have replaced the function call via constant-folding
-	 * or inlining.  So if we see any other kind of expression node, execute
-	 * it via the general ExecEvalExpr() code; the only difference is that we
-	 * don't get a chance to pass a special ReturnSetInfo to any functions
-	 * buried in the expression.
-	 */
-	if (!setexpr->elidedFuncState)
+	else if (IsA(expr, OpExpr))
 	{
 		/*
-		 * This path is similar to ExecMakeFunctionResultSet.
-		 */
-		returnsSet = setexpr->funcReturnsSet;
-		InitFunctionCallInfoData(*fcinfo, &(setexpr->func),
-								 list_length(setexpr->args),
-								 setexpr->fcinfo->fncollation,
-								 NULL, (Node *) &rsinfo);
-
-		/*
-		 * Evaluate the function's argument list.
-		 *
-		 * We can't do this in the per-tuple context: the argument values
-		 * would disappear when we reset that context in the inner loop.  And
-		 * the caller's CurrentMemoryContext is typically a query-lifespan
-		 * context, so we don't want to leak memory there.  We require the
-		 * caller to pass a separate memory context that can be used for this,
-		 * and can be reset each time through to avoid bloat.
-		 */
-		MemoryContextReset(argContext);
-		oldcontext = MemoryContextSwitchTo(argContext);
-		ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
-		MemoryContextSwitchTo(oldcontext);
-
-		/*
-		 * If function is strict, and there are any NULL arguments, skip
-		 * calling the function and act like it returned NULL (or an empty
-		 * set, in the returns-set case).
+		 * For ProjectSet, the expression node could be an OpExpr.
 		 */
-		if (setexpr->func.fn_strict)
-		{
-			int			i;
+		OpExpr	   *op = (OpExpr *) expr;
 
-			for (i = 0; i < fcinfo->nargs; i++)
-			{
-				if (fcinfo->args[i].isnull)
-					goto no_function_result;
-			}
-		}
+		state->funcReturnsSet = op->opretset;
+		state->args = ExecInitExprList(op->args, parent);
+		init_sexpr(op->opfuncid, op->inputcollid, expr, state, parent,
+				   econtext->ecxt_per_query_memory, op->opretset, true);
 	}
 	else
 	{
-		/* Treat setexpr as a generic expression */
-		InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL);
-	}
-
-	/*
-	 * Switch to short-lived context for calling the function or expression.
-	 */
-	MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
-	/*
-	 * Loop to handle the ValuePerCall protocol (which is also the same
-	 * behavior needed in the generic ExecEvalExpr path).
-	 */
-	for (;;)
-	{
-		Datum		result;
-
-		CHECK_FOR_INTERRUPTS();
-
 		/*
-		 * reset per-tuple memory context before each call of the function or
-		 * expression. This cleans up any local memory the function may leak
-		 * when called.
+		 * However, again for FunctionScan, if the function doesn't return set
+		 * then the planner might have replaced the function call via constant-
+		 * folding or inlining.  So if we see any other kind of expression node,
+		 * execute it via the general ExecEvalExpr() code.  That code path will
+		 * not support set-returning functions buried in the expression, though.
 		 */
-		ResetExprContext(econtext);
-
-		/* Call the function or expression one time */
-		if (!setexpr->elidedFuncState)
-		{
-			pgstat_init_function_usage(fcinfo, &fcusage);
-
-			fcinfo->isnull = false;
-			rsinfo.isDone = ExprSingleResult;
-			result = FunctionCallInvoke(fcinfo);
-
-			pgstat_end_function_usage(&fcusage,
-									  rsinfo.isDone != ExprMultipleResult);
-		}
-		else
-		{
-			result =
-				ExecEvalExpr(setexpr->elidedFuncState, econtext, &fcinfo->isnull);
-			rsinfo.isDone = ExprSingleResult;
-		}
-
-		/* Which protocol does function want to use? */
-		if (rsinfo.returnMode == SFRM_ValuePerCall)
-		{
-			/*
-			 * Check for end of result set.
-			 */
-			if (rsinfo.isDone == ExprEndResult)
-				break;
-
-			/*
-			 * If first time through, build tuplestore for result.  For a
-			 * scalar function result type, also make a suitable tupdesc.
-			 */
-			if (first_time)
-			{
-				oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-				tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-				rsinfo.setResult = tupstore;
-				if (!returnsTuple)
-				{
-					tupdesc = CreateTemplateTupleDesc(1);
-					TupleDescInitEntry(tupdesc,
-									   (AttrNumber) 1,
-									   "column",
-									   funcrettype,
-									   -1,
-									   0);
-					rsinfo.setDesc = tupdesc;
-				}
-				MemoryContextSwitchTo(oldcontext);
-			}
-
-			/*
-			 * Store current resultset item.
-			 */
-			if (returnsTuple)
-			{
-				if (!fcinfo->isnull)
-				{
-					HeapTupleHeader td = DatumGetHeapTupleHeader(result);
-
-					if (tupdesc == NULL)
-					{
-						/*
-						 * This is the first non-NULL result from the
-						 * function.  Use the type info embedded in the
-						 * rowtype Datum to look up the needed tupdesc.  Make
-						 * a copy for the query.
-						 */
-						oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-						tupdesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
-															  HeapTupleHeaderGetTypMod(td));
-						rsinfo.setDesc = tupdesc;
-						MemoryContextSwitchTo(oldcontext);
-					}
-					else
-					{
-						/*
-						 * Verify all later returned rows have same subtype;
-						 * necessary in case the type is RECORD.
-						 */
-						if (HeapTupleHeaderGetTypeId(td) != tupdesc->tdtypeid ||
-							HeapTupleHeaderGetTypMod(td) != tupdesc->tdtypmod)
-							ereport(ERROR,
-									(errcode(ERRCODE_DATATYPE_MISMATCH),
-									 errmsg("rows returned by function are not all of the same row type")));
-					}
-
-					/*
-					 * tuplestore_puttuple needs a HeapTuple not a bare
-					 * HeapTupleHeader, but it doesn't need all the fields.
-					 */
-					tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
-					tmptup.t_data = td;
-
-					tuplestore_puttuple(tupstore, &tmptup);
-				}
-				else
-				{
-					/*
-					 * NULL result from a tuple-returning function; expand it
-					 * to a row of all nulls.  We rely on the expectedDesc to
-					 * form such rows.  (Note: this would be problematic if
-					 * tuplestore_putvalues saved the tdtypeid/tdtypmod from
-					 * the provided descriptor, since that might not match
-					 * what we get from the function itself.  But it doesn't.)
-					 */
-					int			natts = expectedDesc->natts;
-					bool	   *nullflags;
-
-					nullflags = (bool *) palloc(natts * sizeof(bool));
-					memset(nullflags, true, natts * sizeof(bool));
-					tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
-				}
-			}
-			else
-			{
-				/* Scalar-type case: just store the function result */
-				tuplestore_putvalues(tupstore, tupdesc, &result, &fcinfo->isnull);
-			}
-
-			/*
-			 * Are we done?
-			 */
-			if (rsinfo.isDone != ExprMultipleResult)
-				break;
-		}
-		else if (rsinfo.returnMode == SFRM_Materialize)
-		{
-			/* check we're on the same page as the function author */
-			if (!first_time || rsinfo.isDone != ExprSingleResult)
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-						 errmsg("table-function protocol for materialize mode was not followed")));
-			/* Done evaluating the set result */
-			break;
-		}
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-					 errmsg("unrecognized table-function returnMode: %d",
-							(int) rsinfo.returnMode)));
-
-		first_time = false;
-	}
-
-no_function_result:
-
-	/*
-	 * If we got nothing from the function (ie, an empty-set or NULL result),
-	 * we have to create the tuplestore to return, and if it's a
-	 * non-set-returning function then insert a single all-nulls row.  As
-	 * above, we depend on the expectedDesc to manufacture the dummy row.
-	 */
-	if (rsinfo.setResult == NULL)
-	{
-		MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-		tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-		rsinfo.setResult = tupstore;
-		if (!returnsSet)
-		{
-			int			natts = expectedDesc->natts;
-			bool	   *nullflags;
-
-			MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-			nullflags = (bool *) palloc(natts * sizeof(bool));
-			memset(nullflags, true, natts * sizeof(bool));
-			tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
-		}
-	}
-
-	/*
-	 * If function provided a tupdesc, cross-check it.  We only really need to
-	 * do this for functions returning RECORD, but might as well do it always.
-	 */
-	if (rsinfo.setDesc)
-	{
-		tupledesc_match(expectedDesc, rsinfo.setDesc);
-
-		/*
-		 * If it is a dynamically-allocated TupleDesc, free it: it is
-		 * typically allocated in a per-query context, so we must avoid
-		 * leaking it across multiple usages.
-		 */
-		if (rsinfo.setDesc->tdrefcount == -1)
-			FreeTupleDesc(rsinfo.setDesc);
-	}
-
-	MemoryContextSwitchTo(callerContext);
-
-	/* All done, pass back the tuplestore */
-	return rsinfo.setResult;
-}
-
-
-/*
- * Prepare targetlist SRF function call for execution.
- *
- * This is used by nodeProjectSet.c.
- */
-SetExprState *
-ExecInitFunctionResultSet(Expr *expr,
-						  ExprContext *econtext, PlanState *parent)
-{
-	SetExprState *state = makeNode(SetExprState);
+		state->elidedFuncState = ExecInitExpr(expr, parent);
 
-	state->funcReturnsSet = true;
-	state->expr = expr;
-	state->func.fn_oid = InvalidOid;
+		MemoryContext oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
 
-	/*
-	 * Initialize metadata.  The expression node could be either a FuncExpr or
-	 * an OpExpr.
-	 */
-	if (IsA(expr, FuncExpr))
-	{
-		FuncExpr   *func = (FuncExpr *) expr;
+		/* By performing InitFunctionCallInfoData here, we avoid palloc0() */
+		state->fcinfo = palloc(SizeForFunctionCallInfo(list_length(state->args)));
 
-		state->args = ExecInitExprList(func->args, parent);
-		init_sexpr(func->funcid, func->inputcollid, expr, state, parent,
-				   econtext->ecxt_per_query_memory, true, true);
-	}
-	else if (IsA(expr, OpExpr))
-	{
-		OpExpr	   *op = (OpExpr *) expr;
+		MemoryContextSwitchTo(oldcontext);
 
-		state->args = ExecInitExprList(op->args, parent);
-		init_sexpr(op->opfuncid, op->inputcollid, expr, state, parent,
-				   econtext->ecxt_per_query_memory, true, true);
+		InitFunctionCallInfoData(*state->fcinfo, NULL, 0, InvalidOid, NULL, NULL);
 	}
-	else
-		elog(ERROR, "unrecognized node type: %d",
-			 (int) nodeTag(expr));
-
-	/* shouldn't get here unless the selected function returns set */
-	Assert(state->func.fn_retset);
 
 	return state;
 }
@@ -473,7 +126,7 @@ ExecInitFunctionResultSet(Expr *expr,
  * needs to live until all rows have been returned (i.e. *isDone set to
  * ExprEndResult or ExprSingleResult).
  *
- * This is used by nodeProjectSet.c.
+ * This is used by nodeProjectSet.c and nodeFunctionscan.c.
  */
 Datum
 ExecMakeFunctionResultSet(SetExprState *fcache,
@@ -486,7 +139,7 @@ ExecMakeFunctionResultSet(SetExprState *fcache,
 	Datum		result;
 	FunctionCallInfo fcinfo;
 	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
+	ReturnSetInfo *rsinfo;
 	bool		callit;
 	int			i;
 
@@ -539,6 +192,28 @@ restart:
 		return (Datum) 0;
 	}
 
+	/*
+	 * Prepare a resultinfo node for communication.  We always do this even if
+	 * not expecting a set result, so that we can pass expectedDesc.  In the
+	 * generic-expression case, the expression doesn't actually get to see the
+	 * resultinfo, but set it up anyway because we use some of the fields as
+	 * our own state variables.
+	 */
+	fcinfo = fcache->fcinfo;
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	if (rsinfo == NULL)
+	{
+		MemoryContext oldContext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+
+		rsinfo = makeNode(ReturnSetInfo);
+		rsinfo->econtext = econtext;
+		rsinfo->expectedDesc = fcache->funcResultDesc;
+		fcinfo->resultinfo = (Node *) rsinfo;
+
+		MemoryContextSwitchTo(oldContext);
+	}
+
 	/*
 	 * arguments is a list of expressions to evaluate before passing to the
 	 * function manager.  We skip the evaluation if it was already done in the
@@ -549,7 +224,6 @@ restart:
 	 * rows from this SRF have been returned, otherwise ValuePerCall SRFs
 	 * would reference freed memory after the first returned row.
 	 */
-	fcinfo = fcache->fcinfo;
 	arguments = fcache->args;
 	if (!fcache->setArgsValid)
 	{
@@ -557,6 +231,14 @@ restart:
 
 		ExecEvalFuncArgs(fcinfo, arguments, econtext);
 		MemoryContextSwitchTo(oldContext);
+
+		/* Reset the rsinfo structure */
+		rsinfo->allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
+		/* note we do not set SFRM_Materialize_Random or _Preferred */
+		rsinfo->returnMode = SFRM_ValuePerCall;
+		/* isDone is filled below */
+		rsinfo->setResult = NULL;
+		rsinfo->setDesc = NULL;
 	}
 	else
 	{
@@ -568,18 +250,6 @@ restart:
 	 * Now call the function, passing the evaluated parameter values.
 	 */
 
-	/* Prepare a resultinfo node for communication. */
-	fcinfo->resultinfo = (Node *) &rsinfo;
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = fcache->funcResultDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
-	/* note we do not set SFRM_Materialize_Random or _Preferred */
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
-
 	/*
 	 * If function is strict, and there are any NULL arguments, skip calling
 	 * the function.
@@ -599,16 +269,25 @@ restart:
 
 	if (callit)
 	{
-		pgstat_init_function_usage(fcinfo, &fcusage);
+		if (!fcache->elidedFuncState)
+		{
+			pgstat_init_function_usage(fcinfo, &fcusage);
 
-		fcinfo->isnull = false;
-		rsinfo.isDone = ExprSingleResult;
-		result = FunctionCallInvoke(fcinfo);
-		*isNull = fcinfo->isnull;
-		*isDone = rsinfo.isDone;
+			fcinfo->isnull = false;
+			rsinfo->isDone = ExprSingleResult;
+			result = FunctionCallInvoke(fcinfo);
+			*isNull = fcinfo->isnull;
+			*isDone = rsinfo->isDone;
 
-		pgstat_end_function_usage(&fcusage,
-								  rsinfo.isDone != ExprMultipleResult);
+			pgstat_end_function_usage(&fcusage,
+									  rsinfo->isDone != ExprMultipleResult);
+		}
+		else
+		{
+			result =
+				ExecEvalExpr(fcache->elidedFuncState, econtext, isNull);
+			*isDone = ExprSingleResult;
+		}
 	}
 	else
 	{
@@ -619,10 +298,31 @@ restart:
 	}
 
 	/* Which protocol does function want to use? */
-	if (rsinfo.returnMode == SFRM_ValuePerCall)
+	if (rsinfo->returnMode == SFRM_ValuePerCall)
 	{
 		if (*isDone != ExprEndResult)
 		{
+			/*
+			 * Obtain a suitable tupdesc, when we first encounter a non-NULL result.
+			 */
+			if (rsinfo->setDesc == NULL)
+			{
+				if (fcache->funcReturnsTuple && !*isNull)
+				{
+					HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+
+					/*
+					 * This is the first non-NULL result from the
+					 * function.  Use the type info embedded in the
+					 * rowtype Datum to look up the needed tupdesc.  Make
+					 * a copy for the query.
+					 */
+					MemoryContext oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+					rsinfo->setDesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td), HeapTupleHeaderGetTypMod(td));
+					MemoryContextSwitchTo(oldcontext);
+				}
+			}
+
 			/*
 			 * Save the current argument values to re-use on the next call.
 			 */
@@ -640,21 +340,34 @@ restart:
 			}
 		}
 	}
-	else if (rsinfo.returnMode == SFRM_Materialize)
+	else if (rsinfo->returnMode == SFRM_Materialize)
 	{
 		/* check we're on the same page as the function author */
-		if (rsinfo.isDone != ExprSingleResult)
+		if (rsinfo->isDone != ExprSingleResult)
 			ereport(ERROR,
 					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
 					 errmsg("table-function protocol for materialize mode was not followed")));
-		if (rsinfo.setResult != NULL)
+		if (rsinfo->setResult != NULL)
 		{
 			/* prepare to return values from the tuplestore */
 			ExecPrepareTuplestoreResult(fcache, econtext,
-										rsinfo.setResult,
-										rsinfo.setDesc);
-			/* loop back to top to start returning from tuplestore */
-			goto restart;
+										rsinfo->setResult,
+										rsinfo->setDesc);
+
+			/*
+			 * If we are being invoked by a Materialize node, attempt
+			 * to donate the returned tuplestore to it.
+			 */
+			if (ExecSRFDonateResultTuplestore(fcache))
+			{
+				*isDone = ExprMultipleResult;
+				return 0;
+			}
+			else
+			{
+				/* loop back to top to start returning from tuplestore */
+				goto restart;
+			}
 		}
 		/* if setResult was left null, treat it as empty set */
 		*isDone = ExprEndResult;
@@ -665,7 +378,7 @@ restart:
 		ereport(ERROR,
 				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
 				 errmsg("unrecognized table-function returnMode: %d",
-						(int) rsinfo.returnMode)));
+						(int) rsinfo->returnMode)));
 
 	return result;
 }
@@ -712,6 +425,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
 	InitFunctionCallInfoData(*sexpr->fcinfo, &(sexpr->func),
 							 numargs,
 							 input_collation, NULL, NULL);
+	sexpr->fcinfo->resultinfo = NULL;
 
 	/* If function returns set, check if that's allowed by caller */
 	if (sexpr->func.fn_retset && !allowSRF)
@@ -782,6 +496,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
 	sexpr->funcResultStore = NULL;
 	sexpr->funcResultSlot = NULL;
 	sexpr->shutdown_reg = false;
+	sexpr->funcResultStoreDonationEnabled = false;
 }
 
 /*
@@ -792,6 +507,7 @@ static void
 ShutdownSetExpr(Datum arg)
 {
 	SetExprState *sexpr = castNode(SetExprState, DatumGetPointer(arg));
+	ReturnSetInfo *rsinfo = castNode(ReturnSetInfo, sexpr->fcinfo->resultinfo);
 
 	/* If we have a slot, make sure it's let go of any tuplestore pointer */
 	if (sexpr->funcResultSlot)
@@ -802,6 +518,13 @@ ShutdownSetExpr(Datum arg)
 		tuplestore_end(sexpr->funcResultStore);
 	sexpr->funcResultStore = NULL;
 
+	/* Release the ReturnSetInfo structure */
+	if (rsinfo != NULL)
+	{
+		pfree(rsinfo);
+		sexpr->fcinfo->resultinfo = NULL;
+	}
+
 	/* Clear any active set-argument state */
 	sexpr->setArgsValid = false;
 
@@ -910,53 +633,3 @@ ExecPrepareTuplestoreResult(SetExprState *sexpr,
 		sexpr->shutdown_reg = true;
 	}
 }
-
-/*
- * Check that function result tuple type (src_tupdesc) matches or can
- * be considered to match what the query expects (dst_tupdesc). If
- * they don't match, ereport.
- *
- * We really only care about number of attributes and data type.
- * Also, we can ignore type mismatch on columns that are dropped in the
- * destination type, so long as the physical storage matches.  This is
- * helpful in some cases involving out-of-date cached plans.
- */
-static void
-tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
-{
-	int			i;
-
-	if (dst_tupdesc->natts != src_tupdesc->natts)
-		ereport(ERROR,
-				(errcode(ERRCODE_DATATYPE_MISMATCH),
-				 errmsg("function return row and query-specified return row do not match"),
-				 errdetail_plural("Returned row contains %d attribute, but query expects %d.",
-								  "Returned row contains %d attributes, but query expects %d.",
-								  src_tupdesc->natts,
-								  src_tupdesc->natts, dst_tupdesc->natts)));
-
-	for (i = 0; i < dst_tupdesc->natts; i++)
-	{
-		Form_pg_attribute dattr = TupleDescAttr(dst_tupdesc, i);
-		Form_pg_attribute sattr = TupleDescAttr(src_tupdesc, i);
-
-		if (IsBinaryCoercible(sattr->atttypid, dattr->atttypid))
-			continue;			/* no worries */
-		if (!dattr->attisdropped)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("function return row and query-specified return row do not match"),
-					 errdetail("Returned type %s at ordinal position %d, but query expects %s.",
-							   format_type_be(sattr->atttypid),
-							   i + 1,
-							   format_type_be(dattr->atttypid))));
-
-		if (dattr->attlen != sattr->attlen ||
-			dattr->attalign != sattr->attalign)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("function return row and query-specified return row do not match"),
-					 errdetail("Physical storage mismatch on dropped attribute at ordinal position %d.",
-							   i + 1)));
-	}
-}
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index ccb66ce1aa..84e34d969f 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -1,7 +1,23 @@
 /*-------------------------------------------------------------------------
  *
  * nodeFunctionscan.c
- *	  Support routines for scanning RangeFunctions (functions in rangetable).
+ *	  Coordinates a scan over PL functions. It supports several use cases:
+ *
+ *      - single function scan, and multiple functions in ROWS FROM;
+ *      - SRFs and regular functions;
+ *      - tuple- and scalar-returning functions;
+ *      - it will materialise if eflags call for it;
+ *      - if possible, it will pipeline its output;
+ *      - it avoids double-materialisation in case of SFRM_Materialize.
+ *
+ *    To achieve these, it depends upon the Materialize node (materialisation
+ *    and pipelining) and the SRFScan node (SRF handling, tuple expansion, and
+ *    double-materialisation avoidance).  The actual function invocation (for
+ *    SRFs and regular functions alike) is done in execSRF.c.
+ *
+ *    The Planner knows nothing of the Materialize and SRFScan structures.
+ *    They are constructed by the Executor at execution time, and are reported
+ *    in the EXPLAIN output.
  *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -24,26 +40,15 @@
 
 #include "catalog/pg_type.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
+#include "executor/nodeMaterial.h"
 #include "funcapi.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/makefuncs.h"
+#include "parser/parse_type.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
-
-
-/*
- * Runtime data for each function being scanned.
- */
-typedef struct FunctionScanPerFuncState
-{
-	SetExprState *setexpr;		/* state of the expression being evaluated */
-	TupleDesc	tupdesc;		/* desc of the function result type */
-	int			colcount;		/* expected number of result columns */
-	Tuplestorestate *tstore;	/* holds the function result set */
-	int64		rowcount;		/* # of rows in result set, -1 if not known */
-	TupleTableSlot *func_slot;	/* function result slot (or NULL) */
-} FunctionScanPerFuncState;
-
-static TupleTableSlot *FunctionNext(FunctionScanState *node);
+#include "utils/syscache.h"
 
 
 /* ----------------------------------------------------------------
@@ -82,37 +87,22 @@ FunctionNext(FunctionScanState *node)
 		 * into the scan result slot. No need to update ordinality or
 		 * rowcounts either.
 		 */
-		Tuplestorestate *tstore = node->funcstates[0].tstore;
+		TupleTableSlot *rs = node->funcstates[0].scanstate->ps.ps_ResultTupleSlot;
 
 		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
+		 * Get the next tuple from the Scan node.
+		 *
+		 * If we have a rowcount for the function, and we know the previous
+		 * read position was out of bounds, don't try the read. This allows
+		 * backward scan to work when there are mixed row counts present.
 		 */
-		if (tstore == NULL)
-		{
-			node->funcstates[0].tstore = tstore =
-				ExecMakeTableFunctionResult(node->funcstates[0].setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											node->funcstates[0].tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
+		rs = ExecProcNode(&node->funcstates[0].scanstate->ps);
 
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(tstore);
-		}
+		if (TupIsNull(rs))
+			return NULL;
+
+		ExecCopySlot(scanslot, rs);
 
-		/*
-		 * Get the next tuple from tuplestore.
-		 */
-		(void) tuplestore_gettupleslot(tstore,
-									   ScanDirectionIsForward(direction),
-									   false,
-									   scanslot);
 		return scanslot;
 	}
 
@@ -141,46 +131,22 @@ FunctionNext(FunctionScanState *node)
 	for (funcno = 0; funcno < node->nfuncs; funcno++)
 	{
 		FunctionScanPerFuncState *fs = &node->funcstates[funcno];
+		TupleTableSlot *func_slot = fs->scanstate->ps.ps_ResultTupleSlot;
 		int			i;
 
 		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
-		 */
-		if (fs->tstore == NULL)
-		{
-			fs->tstore =
-				ExecMakeTableFunctionResult(fs->setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											fs->tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
-
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(fs->tstore);
-		}
-
-		/*
-		 * Get the next tuple from tuplestore.
+		 * Get the next tuple from the Scan node.
 		 *
 		 * If we have a rowcount for the function, and we know the previous
 		 * read position was out of bounds, don't try the read. This allows
 		 * backward scan to work when there are mixed row counts present.
 		 */
 		if (fs->rowcount != -1 && fs->rowcount < oldpos)
-			ExecClearTuple(fs->func_slot);
+			ExecClearTuple(func_slot);
 		else
-			(void) tuplestore_gettupleslot(fs->tstore,
-										   ScanDirectionIsForward(direction),
-										   false,
-										   fs->func_slot);
+			func_slot = ExecProcNode(&fs->scanstate->ps);
 
-		if (TupIsNull(fs->func_slot))
+		if (TupIsNull(func_slot))
 		{
 			/*
 			 * If we ran out of data for this function in the forward
@@ -207,12 +173,12 @@ FunctionNext(FunctionScanState *node)
 			/*
 			 * we have a result, so just copy it to the result cols.
 			 */
-			slot_getallattrs(fs->func_slot);
+			slot_getallattrs(func_slot);
 
 			for (i = 0; i < fs->colcount; i++)
 			{
-				scanslot->tts_values[att] = fs->func_slot->tts_values[i];
-				scanslot->tts_isnull[att] = fs->func_slot->tts_isnull[i];
+				scanslot->tts_values[att] = func_slot->tts_values[i];
+				scanslot->tts_isnull[att] = func_slot->tts_isnull[i];
 				att++;
 			}
 
@@ -272,6 +238,53 @@ ExecFunctionScan(PlanState *pstate)
 					(ExecScanRecheckMtd) FunctionRecheck);
 }
 
+/*
+ * Helper function to build the target lists from the tupdesc; they are
+ * required for normal ExecInit processing.
+ */
+static void
+build_tlist_for_tupdesc(TupleDesc tupdesc, int colcount,
+						List **mat_tlist, List **scan_tlist)
+{
+	Form_pg_attribute attr;
+	int attno;
+
+	for (attno = 1; attno <= colcount; attno++)
+	{
+		attr = TupleDescAttr(tupdesc, attno - 1);
+
+		if (attr->attisdropped)
+		{
+			*scan_tlist = lappend(*scan_tlist,
+							  makeTargetEntry((Expr *)
+								  makeConst(INT2OID, -1,
+											0,
+											attr->attlen,
+											0 /* value */, true /* isnull */,
+											true),
+								  attno, attr->attname.data,
+								  attr->attisdropped));
+			*mat_tlist = lappend(*mat_tlist,
+							 makeTargetEntry((Expr *)
+								 makeVar(1 /* varno */, attno, INT2OID, -1, 0, 0),
+								 attno, attr->attname.data, attr->attisdropped));
+		}
+		else
+		{
+			*scan_tlist = lappend(*scan_tlist,
+							  makeTargetEntry((Expr *)
+								  makeVar(1 /* varno */, attno, attr->atttypid,
+										  attr->atttypmod, attr->attcollation, 0),
+								  attno, attr->attname.data, attr->attisdropped));
+			*mat_tlist = lappend(*mat_tlist,
+							 makeTargetEntry((Expr *)
+								 makeVar(1 /* varno */, attno, attr->atttypid,
+										 attr->atttypmod, attr->attcollation, 0),
+								 attno, attr->attname.data, attr->attisdropped));
+		}
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecInitFunctionScan
  * ----------------------------------------------------------------
@@ -285,6 +298,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	int			i,
 				natts;
 	ListCell   *lc;
+	bool 		needs_material;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & EXEC_FLAG_MARK));
@@ -315,6 +329,9 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	else
 		scanstate->simple = false;
 
+	/* Only add a Materialize node if required */
+	needs_material = eflags & (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD);
+
 	/*
 	 * Ordinal 0 represents the "before the first row" position.
 	 *
@@ -347,23 +364,15 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 		TypeFuncClass functypclass;
 		Oid			funcrettype;
 		TupleDesc	tupdesc;
+		List /* TargetEntry* */ *mat_tlist = NIL;
+		List /* TargetEntry* */ *scan_tlist = NIL;
+		bool funcReturnsTuple;
 
-		fs->setexpr =
-			ExecInitTableFunctionResult((Expr *) funcexpr,
-										scanstate->ss.ps.ps_ExprContext,
-										&scanstate->ss.ps);
-
-		/*
-		 * Don't allocate the tuplestores; the actual calls to the functions
-		 * do that.  NULL means that we have not called the function yet (or
-		 * need to call it again after a rescan).
-		 */
-		fs->tstore = NULL;
 		fs->rowcount = -1;
 
 		/*
 		 * Now determine if the function returns a simple or composite type,
-		 * and build an appropriate tupdesc.  Note that in the composite case,
+		 * and build an appropriate targetlist.  Note that in the composite case,
 		 * the function may now return more columns than it did when the plan
 		 * was made; we have to ignore any columns beyond "colcount".
 		 */
@@ -379,6 +388,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			Assert(tupdesc->natts >= colcount);
 			/* Must copy it out of typcache for safety */
 			tupdesc = CreateTupleDescCopy(tupdesc);
+			funcReturnsTuple = true;
 		}
 		else if (functypclass == TYPEFUNC_SCALAR)
 		{
@@ -393,6 +403,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			TupleDescInitEntryCollation(tupdesc,
 										(AttrNumber) 1,
 										exprCollation(funcexpr));
+			funcReturnsTuple = false;
 		}
 		else if (functypclass == TYPEFUNC_RECORD)
 		{
@@ -407,6 +418,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			 * case it doesn't.)
 			 */
 			BlessTupleDesc(tupdesc);
+			funcReturnsTuple = true;
 		}
 		else
 		{
@@ -414,21 +426,45 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			elog(ERROR, "function in FROM has unsupported return type");
 		}
 
-		fs->tupdesc = tupdesc;
 		fs->colcount = colcount;
 
-		/*
-		 * We only need separate slots for the function results if we are
-		 * doing ordinality or multiple functions; otherwise, we'll fetch
-		 * function results directly into the scan slot.
-		 */
-		if (!scanstate->simple)
+		/* Expand tupdesc into targetlists for the scan nodes */
+		build_tlist_for_tupdesc(tupdesc, colcount, &mat_tlist, &scan_tlist);
+
+		SRFScanPlan *srfscan = makeNode(SRFScanPlan);
+		srfscan->funcexpr = funcexpr;
+		srfscan->rtfunc = (Node *) rtfunc;
+		srfscan->plan.targetlist = scan_tlist;
+		srfscan->plan.extParam = rtfunc->funcparams;
+		srfscan->plan.allParam = rtfunc->funcparams;
+		srfscan->funcResultDesc = tupdesc;
+		srfscan->funcReturnsTuple = funcReturnsTuple;
+		Plan *scan = &srfscan->plan;
+
+		if (needs_material)
 		{
-			fs->func_slot = ExecInitExtraTupleSlot(estate, fs->tupdesc,
-												   &TTSOpsMinimalTuple);
+			Material *fscan = makeNode(Material);
+			fscan->plan.lefttree = scan;
+			fscan->plan.targetlist = mat_tlist;
+			fscan->plan.extParam = rtfunc->funcparams;
+			fscan->plan.allParam = rtfunc->funcparams;
+			scan = &fscan->plan;
+		}
+
+		fs->scanstate = (ScanState *) ExecInitNode (scan, estate, eflags);
+
+		if (needs_material)
+		{
+			/*
+			 * Tell the SRFScan about its parent, so that it can donate
+			 * the SRF's tuplestore if the SRF uses SFRM_Materialize.
+			 */
+			MaterialState *ms = (MaterialState *) fs->scanstate;
+			SRFScanState *sss = (SRFScanState *) outerPlanState(ms);
+
+			sss->setexpr->funcResultStoreDonationEnabled = true;
+			sss->setexpr->funcResultStoreDonationTarget = &ms->ss.ps;
 		}
-		else
-			fs->func_slot = NULL;
 
 		natts += colcount;
 		i++;
@@ -443,7 +479,11 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	 */
 	if (scanstate->simple)
 	{
-		scan_tupdesc = CreateTupleDescCopy(scanstate->funcstates[0].tupdesc);
+		SRFScanState *sss = IsA(scanstate->funcstates[0].scanstate, MaterialState) ?
+				(SRFScanState *) outerPlanState((MaterialState *) scanstate->funcstates[0].scanstate) :
+				(SRFScanState *) scanstate->funcstates[0].scanstate;
+
+		scan_tupdesc = CreateTupleDescCopy(sss->setexpr->funcResultDesc);
 		scan_tupdesc->tdtypeid = RECORDOID;
 		scan_tupdesc->tdtypmod = -1;
 	}
@@ -458,8 +498,12 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 
 		for (i = 0; i < nfuncs; i++)
 		{
-			TupleDesc	tupdesc = scanstate->funcstates[i].tupdesc;
-			int			colcount = scanstate->funcstates[i].colcount;
+			SRFScanState *sss = IsA(scanstate->funcstates[i].scanstate, MaterialState) ?
+					(SRFScanState *) outerPlanState((MaterialState *) scanstate->funcstates[i].scanstate) :
+					(SRFScanState *) scanstate->funcstates[i].scanstate;
+
+			TupleDesc	tupdesc = sss->setexpr->funcResultDesc;
+			int			colcount = sss->colcount;
 			int			j;
 
 			for (j = 1; j <= colcount; j++)
@@ -536,20 +580,11 @@ ExecEndFunctionScan(FunctionScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/*
-	 * Release slots and tuplestore resources
+	 * Release the Material scan resources
 	 */
 	for (i = 0; i < node->nfuncs; i++)
 	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-
-		if (fs->tstore != NULL)
-		{
-			tuplestore_end(node->funcstates[i].tstore);
-			fs->tstore = NULL;
-		}
+		ExecEndNode(&node->funcstates[i].scanstate->ps);
 	}
 }
 
@@ -568,23 +603,12 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	for (i = 0; i < node->nfuncs; i++)
-	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-	}
 
 	ExecScanReScan(&node->ss);
 
 	/*
-	 * Here we have a choice whether to drop the tuplestores (and recompute
-	 * the function outputs) or just rescan them.  We must recompute if an
-	 * expression contains changed parameters, else we rescan.
-	 *
-	 * XXX maybe we should recompute if the function is volatile?  But in
-	 * general the executor doesn't conditionalize its actions on that.
+	 * We must recompute if an
+	 * expression contains changed parameters.
 	 */
 	if (chgparam)
 	{
@@ -597,11 +621,9 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 			if (bms_overlap(chgparam, rtfunc->funcparams))
 			{
-				if (node->funcstates[i].tstore != NULL)
-				{
-					tuplestore_end(node->funcstates[i].tstore);
-					node->funcstates[i].tstore = NULL;
-				}
+				UpdateChangedParamSet(&node->funcstates[i].scanstate->ps,
+									  node->ss.ps.chgParam);
+
 				node->funcstates[i].rowcount = -1;
 			}
 			i++;
@@ -611,10 +633,9 @@ ExecReScanFunctionScan(FunctionScanState *node)
 	/* Reset ordinality counter */
 	node->ordinal = 0;
 
-	/* Make sure we rewind any remaining tuplestores */
+	/* Rescan them all */
 	for (i = 0; i < node->nfuncs; i++)
 	{
-		if (node->funcstates[i].tstore != NULL)
-			tuplestore_rescan(node->funcstates[i].tstore);
+		ExecReScan(&node->funcstates[i].scanstate->ps);
 	}
 }
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index dd077f4323..fdec8521ad 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -45,9 +45,12 @@ ExecMaterial(PlanState *pstate)
 	Tuplestorestate *tuplestorestate;
 	bool		eof_tuplestore;
 	TupleTableSlot *slot;
+	bool 		first_time = true;
 
 	CHECK_FOR_INTERRUPTS();
 
+restart:
+
 	/*
 	 * get state info from node
 	 */
@@ -126,12 +129,24 @@ ExecMaterial(PlanState *pstate)
 		PlanState  *outerNode;
 		TupleTableSlot *outerslot;
 
+		if (!first_time)
+			ereport(ERROR,
+					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+					 errmsg("attempt to scan donated result store failed")));
+
 		/*
 		 * We can only get here with forward==true, so no need to worry about
 		 * which direction the subplan will go.
 		 */
 		outerNode = outerPlanState(node);
 		outerslot = ExecProcNode(outerNode);
+
+		if (node->tuplestore_donated)
+		{
+			first_time = false;
+			goto restart;
+		}
+
 		if (TupIsNull(outerslot))
 		{
 			node->eof_underlying = true;
@@ -196,6 +211,7 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
 
 	matstate->eof_underlying = false;
 	matstate->tuplestorestate = NULL;
+	matstate->tuplestore_donated = false;
 
 	/*
 	 * Miscellaneous initialization
@@ -346,6 +362,7 @@ ExecReScanMaterial(MaterialState *node)
 		{
 			tuplestore_end(node->tuplestorestate);
 			node->tuplestorestate = NULL;
+			node->tuplestore_donated = false;
 			if (outerPlan->chgParam == NULL)
 				ExecReScan(outerPlan);
 			node->eof_underlying = false;
@@ -361,8 +378,30 @@ ExecReScanMaterial(MaterialState *node)
 		 * if chgParam of subnode is not null then plan will be re-scanned by
 		 * first ExecProcNode.
 		 */
+		node->tuplestore_donated = false;
 		if (outerPlan->chgParam == NULL)
 			ExecReScan(outerPlan);
 		node->eof_underlying = false;
 	}
 }
+
+void
+ExecMaterialReceiveResultStore(MaterialState *node, Tuplestorestate *store)
+{
+	if (!node->tuplestore_donated)
+	{
+		if (node->tuplestorestate)
+		{
+			tuplestore_end(node->tuplestorestate);
+		}
+
+		node->tuplestorestate = store;
+		node->tuplestore_donated = true;
+		node->eof_underlying = true;
+	}
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+				 errmsg("result tuplestore donated more than once")));
+}
+
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b07c2996d4..48d7db54d3 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -293,9 +293,16 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	 * such parameters, then there is no point in REWIND support at all in the
 	 * inner child, because it will always be rescanned with fresh parameter
 	 * values.
+	 *
+	 * The exception to this simple rule is a ROWS FROM function scan where it
+	 * is possible that only some of the involved functions are affected by the
+	 * parameters. In this case, we blanket request support for REWIND. A more
+	 * intelligent approach would request REWIND only for nodes unaffected by
+	 * the parameters, but we aren't so intelligent yet.
 	 */
 	outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
-	if (node->nestParams == NIL)
+	if (node->nestParams == NIL ||
+		IsA(innerPlan(node), FunctionScan))
 		eflags |= EXEC_FLAG_REWIND;
 	else
 		eflags &= ~EXEC_FLAG_REWIND;
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index 4a1b060fde..66a1d30778 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -283,6 +283,7 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
 			state->elems[off] = (Node *)
 				ExecInitFunctionResultSet(expr, state->ps.ps_ExprContext,
 										  &state->ps);
+			Assert (((SetExprState *) state->elems[off])->funcReturnsSet);
 		}
 		else
 		{
diff --git a/src/backend/executor/nodeSRFScan.c b/src/backend/executor/nodeSRFScan.c
new file mode 100644
index 0000000000..4d61a95ed7
--- /dev/null
+++ b/src/backend/executor/nodeSRFScan.c
@@ -0,0 +1,262 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSRFScan.c
+ *	  Coordinates a scan over a single SRF function, or a non-SRF as if it
+ *    were an SRF returning a single row.
+ *
+ *    SRFScan expands the function’s output if it returns a tuple. If the
+ *    SRF uses SFRM_Materialize, it will donate the returned tuplestore to
+ *    the parent Materialize node, if there is one, to avoid double-
+ *    materialisation.
+ *
+ *    The Planner knows nothing of the SRFScan structure. It is constructed
+ *    by the Executor at execution time, and is reported in the EXPLAIN
+ *    output.
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeSRFScan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "catalog/pg_type.h"
+#include "executor/nodeSRFScan.h"
+#include "executor/nodeMaterial.h"
+#include "funcapi.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/makefuncs.h"
+#include "parser/parse_type.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+static TupleTableSlot *			/* result tuple from subplan */
+ExecSRF(PlanState *node)
+{
+	SRFScanState *pstate = (SRFScanState *) node;
+	ExprContext *econtext = pstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *resultSlot = pstate->ss.ps.ps_ResultTupleSlot;
+	Datum result;
+	ExprDoneCond *isdone = &pstate->elemdone;
+	bool	   isnull;
+	SetExprState *setexpr = pstate->setexpr;
+	FunctionCallInfo fcinfo;
+	ReturnSetInfo *rsinfo;
+
+	/* We only support forward scans. */
+	Assert(ScanDirectionIsForward(pstate->ss.ps.state->es_direction));
+
+	ExecClearTuple(resultSlot);
+
+	/*
+	 * Only execute something if we are not already complete...
+	 */
+	if (*isdone == ExprEndResult)
+		return NULL;
+
+	/*
+	 * Evaluate SRF - possibly continuing previously started output.
+	 */
+	result = ExecMakeFunctionResultSet((SetExprState *) setexpr,
+										econtext, pstate->argcontext,
+										&isnull, isdone);
+
+	if (*isdone == ExprEndResult)
+		return NULL;
+
+	fcinfo = setexpr->fcinfo;
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	/* Have we donated the result store? */
+	if (setexpr->funcResultStoreDonated)
+		return NULL;
+
+	/*
+	 * If we obtained a tupdesc, check it is appropriate, but not in
+	 * the case of SFRM_Materialize because it will have been checked
+	 * already.
+	 */
+	if (!pstate->tupdesc_checked &&
+		setexpr->funcReturnsTuple &&
+		rsinfo->returnMode != SFRM_Materialize &&
+		rsinfo->setDesc && setexpr->funcResultDesc)
+	{
+		tupledesc_match (setexpr->funcResultDesc, rsinfo->setDesc);
+		pstate->tupdesc_checked = true;
+	}
+
+	/*
+	 * If the function returned a tuple, expand it into multiple columns.
+	 */
+	if (setexpr->funcReturnsTuple)
+	{
+		if (!isnull)
+		{
+			HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+
+			/*
+			 * In SFRM_Materialize mode, the type will have been checked
+			 * already.
+			 */
+			if (rsinfo->returnMode != SFRM_Materialize)
+			{
+				/*
+				 * Verify all later returned rows have same subtype;
+				 * necessary in case the type is RECORD.
+				 */
+				if (HeapTupleHeaderGetTypeId(td) != rsinfo->setDesc->tdtypeid ||
+					HeapTupleHeaderGetTypMod(td) != rsinfo->setDesc->tdtypmod)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATATYPE_MISMATCH),
+							 errmsg("rows returned by function are not all of the same row type")));
+			}
+
+			/*
+			 * heap_deform_tuple needs a HeapTuple not a bare
+			 * HeapTupleHeader, but it doesn't need all the fields.
+			 */
+			HeapTupleData tmptup;
+			tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
+			tmptup.t_data = td;
+
+			heap_deform_tuple (&tmptup, setexpr->funcResultDesc,
+							   resultSlot->tts_values,
+							   resultSlot->tts_isnull);
+		}
+		else
+		{
+			/*
+			 * populate the result cols with nulls
+			 */
+			int i;
+			for (i = 0; i < pstate->colcount; i++)
+			{
+				resultSlot->tts_values[i] = (Datum) 0;
+				resultSlot->tts_isnull[i] = true;
+			}
+		}
+	}
+	else
+	{
+		/* Scalar-type case: just store the function result */
+		resultSlot->tts_values[0] = result;
+		resultSlot->tts_isnull[0] = isnull;
+	}
+
+	/*
+	 * If we obtained a single result, don't execute again.
+	 */
+	if (*isdone == ExprSingleResult)
+		*isdone = ExprEndResult;
+
+	ExecStoreVirtualTuple(resultSlot);
+	return resultSlot;
+}
+
+SRFScanState *
+ExecInitSRFScan(SRFScanPlan *node, EState *estate, int eflags)
+{
+	RangeTblFunction *rtfunc = (RangeTblFunction *) node->rtfunc;
+
+	SRFScanState *srfstate;
+
+	/*
+	 * SRFScan should not have any children.
+	 */
+	Assert(outerPlan(node) == NULL);
+	Assert(innerPlan(node) == NULL);
+
+	/*
+	 * create state structure
+	 */
+	srfstate = makeNode(SRFScanState);
+	srfstate->ss.ps.plan = (Plan *) node;
+	srfstate->ss.ps.state = estate;
+	srfstate->ss.ps.ExecProcNode = ExecSRF;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &srfstate->ss.ps);
+
+	srfstate->setexpr =
+		ExecInitFunctionResultSet((Expr *) node->funcexpr,
+								  srfstate->ss.ps.ps_ExprContext,
+								  &srfstate->ss.ps);
+
+	srfstate->setexpr->funcResultDesc = node->funcResultDesc;
+	srfstate->setexpr->funcReturnsTuple = node->funcReturnsTuple;
+
+	srfstate->colcount = rtfunc->funccolcount;
+
+	srfstate->tupdesc_checked = false;
+
+	/* Start with the assumption we will get some result. */
+	srfstate->elemdone = ExprSingleResult;
+
+	/*
+	 * Initialize result type and slot. No need to initialize projection info
+	 * because this node doesn't do projections (ps_ResultTupleSlot).
+	 *
+	 * material nodes only return tuples from their materialized relation.
+	 */
+	ExecInitScanTupleSlot(estate, &srfstate->ss, srfstate->setexpr->funcResultDesc,
+						  &TTSOpsMinimalTuple);
+	ExecInitResultTupleSlotTL(&srfstate->ss.ps, &TTSOpsMinimalTuple);
+	ExecAssignScanProjectionInfo(&srfstate->ss);
+
+	/*
+	 * Create a memory context that ExecMakeFunctionResultSet can use to
+	 * evaluate function arguments in.  We can't use the per-tuple context for
+	 * this because it gets reset too often; but we don't want to leak
+	 * evaluation results into the query-lifespan context either.  We use one
+	 * context for the arguments of all tSRFs, as they have roughly equivalent
+	 * lifetimes.
+	 */
+	srfstate->argcontext = AllocSetContextCreate(CurrentMemoryContext,
+											  "SRF function arguments",
+											  ALLOCSET_DEFAULT_SIZES);
+	return srfstate;
+}
+
+void
+ExecEndSRFScan(SRFScanState *node)
+{
+	/* Nothing to do */
+}
+
+void
+ExecReScanSRF(SRFScanState *node)
+{
+	/* Expecting some results. */
+	node->elemdone = ExprSingleResult;
+
+	/* We must re-evaluate function call arguments. */
+	node->setexpr->setArgsValid = false;
+}
+
+bool
+ExecSRFDonateResultTuplestore(SetExprState *fcache)
+{
+	if (fcache->funcResultStoreDonationEnabled)
+	{
+		if (IsA (fcache->funcResultStoreDonationTarget, MaterialState))
+		{
+			MaterialState *target = (MaterialState *) fcache->funcResultStoreDonationTarget;
+
+			ExecMaterialReceiveResultStore(target, fcache->funcResultStore);
+
+			fcache->funcResultStore = NULL;
+
+			fcache->funcResultStoreDonated = true;
+
+			return true;
+		}
+	}
+
+	return false;
+}
diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h
index d17af13ee3..118fdcb52f 100644
--- a/src/include/access/tupdesc.h
+++ b/src/include/access/tupdesc.h
@@ -151,4 +151,6 @@ extern TupleDesc BuildDescForRelation(List *schema);
 
 extern TupleDesc BuildDescFromLists(List *names, List *types, List *typmods, List *collations);
 
+extern void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
+
 #endif							/* TUPDESC_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6ef3e1fe06..f0fb68dd6a 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -404,13 +404,6 @@ extern bool ExecCheck(ExprState *state, ExprContext *context);
 /*
  * prototypes from functions in execSRF.c
  */
-extern SetExprState *ExecInitTableFunctionResult(Expr *expr,
-												 ExprContext *econtext, PlanState *parent);
-extern Tuplestorestate *ExecMakeTableFunctionResult(SetExprState *setexpr,
-													ExprContext *econtext,
-													MemoryContext argContext,
-													TupleDesc expectedDesc,
-													bool randomAccess);
 extern SetExprState *ExecInitFunctionResultSet(Expr *expr,
 											   ExprContext *econtext, PlanState *parent);
 extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
diff --git a/src/include/executor/nodeFunctionscan.h b/src/include/executor/nodeFunctionscan.h
index 74e8eefd38..ca89980edf 100644
--- a/src/include/executor/nodeFunctionscan.h
+++ b/src/include/executor/nodeFunctionscan.h
@@ -16,6 +16,16 @@
 
 #include "nodes/execnodes.h"
 
+/*
+ * Runtime data for each function being scanned.
+ */
+typedef struct FunctionScanPerFuncState
+{
+	int			colcount;		/* expected number of result columns */
+	int64		rowcount;		/* # of rows in result set, -1 if not known */
+	ScanState  *scanstate;		/* scan node: either SRFScan or Materialize */
+} FunctionScanPerFuncState;
+
 extern FunctionScanState *ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags);
 extern void ExecEndFunctionScan(FunctionScanState *node);
 extern void ExecReScanFunctionScan(FunctionScanState *node);
diff --git a/src/include/executor/nodeMaterial.h b/src/include/executor/nodeMaterial.h
index 99e7cbfc94..f55922c5bd 100644
--- a/src/include/executor/nodeMaterial.h
+++ b/src/include/executor/nodeMaterial.h
@@ -21,5 +21,6 @@ extern void ExecEndMaterial(MaterialState *node);
 extern void ExecMaterialMarkPos(MaterialState *node);
 extern void ExecMaterialRestrPos(MaterialState *node);
 extern void ExecReScanMaterial(MaterialState *node);
+extern void ExecMaterialReceiveResultStore(MaterialState *node, Tuplestorestate *store);
 
 #endif							/* NODEMATERIAL_H */
diff --git a/src/include/executor/nodeSRFScan.h b/src/include/executor/nodeSRFScan.h
new file mode 100644
index 0000000000..2430de5976
--- /dev/null
+++ b/src/include/executor/nodeSRFScan.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * IDENTIFICATION
+ *	  src/include/executor/nodeSRFScan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef nodeSRFScan_h
+#define nodeSRFScan_h
+
+#include "nodes/execnodes.h"
+
+typedef struct
+{
+	ScanState		ss;					/* its first field is NodeTag */
+	SetExprState 	*setexpr;			/* state of the expression being evaluated */
+	ExprDoneCond	elemdone;
+	int				colcount;			/* # of columns */
+	bool			tupdesc_checked;	/* has the return tupdesc been checked? */
+	MemoryContext 	argcontext;			/* context for SRF arguments */
+	PlanState		*parent;			/* the plan's parent node */
+} SRFScanState;
+
+extern SRFScanState *ExecInitSRFScan(SRFScanPlan *node, EState *estate, int eflags);
+extern void ExecEndSRFScan(SRFScanState *node);
+extern void ExecReScanSRF(SRFScanState *node);
+extern bool ExecSRFDonateResultTuplestore(SetExprState *fcache);
+
+#endif /* nodeSRFScan_h */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 1f6f5bbc20..973115aefe 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -793,10 +793,16 @@ typedef struct SetExprState
 	/*
 	 * For a set-returning function (SRF) that returns a tuplestore, we keep
 	 * the tuplestore here and dole out the result rows one at a time. The
-	 * slot holds the row currently being returned.
+	 * slot holds the row currently being returned. The boolean
+	 * funcResultStoreDonationEnabled indicates whether an SRF that returns
+	 * its result in SFRM_Materialize mode should attempt to donate its
+	 * tuplestore to a higher-level Materialize node.
 	 */
 	Tuplestorestate *funcResultStore;
 	TupleTableSlot *funcResultSlot;
+	bool 		funcResultStoreDonationEnabled;
+	bool 		funcResultStoreDonated;
+	struct PlanState *funcResultStoreDonationTarget;
 
 	/*
 	 * In some cases we need to compute a tuple descriptor for the function's
@@ -1647,6 +1653,7 @@ typedef struct SubqueryScanState
  *		funcstates			per-function execution states (private in
  *							nodeFunctionscan.c)
  *		argcontext			memory context to evaluate function arguments in
+ *		pending_srf_tuples	still evaluating any SRFs?
  * ----------------
  */
 struct FunctionScanPerFuncState;
@@ -1974,6 +1981,7 @@ typedef struct MaterialState
 	int			eflags;			/* capability flags to pass to tuplestore */
 	bool		eof_underlying; /* reached end of underlying plan? */
 	Tuplestorestate *tuplestorestate;
+	bool		tuplestore_donated; /* was tuplestore donated by another node? */
 } MaterialState;
 
 /* ----------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..9df64fb325 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -513,7 +513,9 @@ typedef enum NodeTag
 	T_SupportRequestSelectivity,	/* in nodes/supportnodes.h */
 	T_SupportRequestCost,		/* in nodes/supportnodes.h */
 	T_SupportRequestRows,		/* in nodes/supportnodes.h */
-	T_SupportRequestIndexCondition	/* in nodes/supportnodes.h */
+	T_SupportRequestIndexCondition,	/* in nodes/supportnodes.h */
+	T_SRFScanPlan,
+	T_SRFScanState
 } NodeTag;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 32c0d87f80..07ae669e7a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -16,6 +16,7 @@
 
 #include "access/sdir.h"
 #include "access/stratnum.h"
+#include "access/tupdesc.h"
 #include "lib/stringinfo.h"
 #include "nodes/bitmapset.h"
 #include "nodes/lockoptions.h"
@@ -546,6 +547,14 @@ typedef struct TableFuncScan
 	TableFunc  *tablefunc;		/* table function node */
 } TableFuncScan;
 
+typedef struct SRFScanPlan {
+	Plan		plan;
+	Node		*funcexpr;
+	Node 		*rtfunc;
+	TupleDesc	funcResultDesc;		/* function output columns tuple descriptor */
+	bool		funcReturnsTuple;
+} SRFScanPlan;
+
 /* ----------------
  *		CteScan node
  * ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f457b5b150..ab8e222f3b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -514,13 +514,15 @@ order by 1, 2;
          ->  Function Scan on pg_catalog.generate_series s1
                Output: s1.s1
                Function Call: generate_series(1, 3)
+               ->  SRF Scan
          ->  HashAggregate
                Output: s2.s2, sum((s1.s1 + s2.s2))
                Group Key: s2.s2
                ->  Function Scan on pg_catalog.generate_series s2
                      Output: s2.s2
                      Function Call: generate_series(1, 3)
-(14 rows)
+                     ->  SRF Scan
+(16 rows)
 
 select s1, s2, sm
 from generate_series(1, 3) s1,
@@ -549,6 +551,7 @@ select array(select sum(x+y) s
  Function Scan on pg_catalog.generate_series x
    Output: (SubPlan 1)
    Function Call: generate_series(1, 3)
+   ->  SRF Scan
    SubPlan 1
      ->  Sort
            Output: (sum((x.x + y.y))), y.y
@@ -559,7 +562,8 @@ select array(select sum(x+y) s
                  ->  Function Scan on pg_catalog.generate_series y
                        Output: y.y
                        Function Call: generate_series(1, 3)
-(13 rows)
+                       ->  SRF Scan
+(15 rows)
 
 select array(select sum(x+y) s
             from generate_series(1,3) y group by y order by s)
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index c1f802c88a..5eb7dba0a8 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -374,7 +374,8 @@ select g as alias1, g as alias2
    ->  Sort
          Sort Key: g
          ->  Function Scan on generate_series g
-(6 rows)
+               ->  SRF Scan
+(7 rows)
 
 select g as alias1, g as alias2
   from generate_series(1,3) g
@@ -1234,7 +1235,9 @@ explain (costs off)
          ->  Nested Loop
                ->  Values Scan on "*VALUES*"
                ->  Function Scan on gstest_data
-(8 rows)
+                     ->  Materialize
+                           ->  SRF Scan
+(10 rows)
 
 select *
   from (values (1),(2)) v(x),
@@ -1358,7 +1361,9 @@ explain (costs off)
          ->  Nested Loop
                ->  Values Scan on "*VALUES*"
                ->  Function Scan on gstest_data
-(10 rows)
+                     ->  Materialize
+                           ->  SRF Scan
+(12 rows)
 
 -- Verify that we correctly handle the child node returning a
 -- non-minimal slot, which happens if the input is pre-sorted,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index dfd0ee414f..c339722149 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1684,6 +1684,7 @@ FROM generate_series(1, 3) g(i);
                            QUERY PLAN                           
 ----------------------------------------------------------------
  Function Scan on generate_series g
+   ->  SRF Scan
    SubPlan 1
      ->  Limit
            ->  Merge Append
@@ -1691,10 +1692,12 @@ FROM generate_series(1, 3) g(i);
                  ->  Sort
                        Sort Key: ((d.d + g.i))
                        ->  Function Scan on generate_series d
+                             ->  SRF Scan
                  ->  Sort
                        Sort Key: ((d_1.d + g.i))
                        ->  Function Scan on generate_series d_1
-(11 rows)
+                             ->  SRF Scan
+(14 rows)
 
 SELECT
     ARRAY(SELECT f.i FROM (
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..3650aeefe0 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3403,7 +3403,8 @@ select * from mki8(1,2);
  Function Scan on mki8
    Output: q1, q2
    Function Call: '(1,2)'::int8_tbl
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from mki8(1,2);
  q1 | q2 
@@ -3418,7 +3419,8 @@ select * from mki4(42);
  Function Scan on mki4
    Output: f1
    Function Call: '(42)'::int4_tbl
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from mki4(42);
  f1 
@@ -3660,9 +3662,10 @@ left join unnest(v1ys) as u1(u1y) on u1y = v2y;
          Hash Cond: (u1.u1y = "*VALUES*_1".column2)
          Filter: ("*VALUES*_1".column1 = "*VALUES*".column1)
          ->  Function Scan on unnest u1
+               ->  SRF Scan
          ->  Hash
                ->  Values Scan on "*VALUES*_1"
-(8 rows)
+(9 rows)
 
 select * from
 (values (1, array[10,20]), (2, array[20,30])) as v1(v1x,v1ys)
@@ -4475,7 +4478,9 @@ select 1 from (select a.id FROM a left join b on a.b_id = b.id) q,
    ->  Seq Scan on a
    ->  Function Scan on generate_series gs
          Filter: (a.id = i)
-(4 rows)
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
 
 rollback;
 create temp table parent (k int primary key, pd int);
@@ -4814,7 +4819,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
@@ -4824,7 +4831,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
@@ -4835,7 +4844,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -4846,12 +4857,13 @@ explain (costs off)
 ------------------------------------------
  Nested Loop
    ->  Function Scan on generate_series g
+         ->  SRF Scan
    ->  Append
          ->  Seq Scan on int8_tbl a
                Filter: (g.g = q1)
          ->  Seq Scan on int8_tbl b
                Filter: (g.g = q2)
-(7 rows)
+(8 rows)
 
 select * from generate_series(100,200) g,
   lateral (select * from int8_tbl a where g = q1 union all
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 0879c885eb..172a55d975 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -179,9 +179,10 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
  Hash Join
    Hash Cond: (g.g = a.unique1)
    ->  Function Scan on my_gen_series g
+         ->  SRF Scan
    ->  Hash
          ->  Seq Scan on tenk1 a
-(5 rows)
+(6 rows)
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
@@ -189,7 +190,8 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
 -------------------------------------------------
  Nested Loop
    ->  Function Scan on my_gen_series g
+         ->  SRF Scan
    ->  Index Scan using tenk1_unique1 on tenk1 a
          Index Cond: (unique1 = g.g)
-(4 rows)
+(5 rows)
 
diff --git a/src/test/regress/expected/pg_lsn.out b/src/test/regress/expected/pg_lsn.out
index 64d41dfdad..e68adc1f24 100644
--- a/src/test/regress/expected/pg_lsn.out
+++ b/src/test/regress/expected/pg_lsn.out
@@ -87,13 +87,17 @@ SELECT DISTINCT (i || '/' || j)::pg_lsn f
          Group Key: ((((i.i)::text || '/'::text) || (j.j)::text))::pg_lsn
          ->  Nested Loop
                ->  Function Scan on generate_series k
+                     ->  SRF Scan
                ->  Materialize
                      ->  Nested Loop
                            ->  Function Scan on generate_series j
                                  Filter: ((j > 0) AND (j <= 10))
+                                 ->  SRF Scan
                            ->  Function Scan on generate_series i
                                  Filter: (i <= 10)
-(12 rows)
+                                 ->  Materialize
+                                       ->  SRF Scan
+(16 rows)
 
 SELECT DISTINCT (i || '/' || j)::pg_lsn f
   FROM generate_series(1, 10) i,
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index cd2c79f4d5..67a6f39ae2 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -3094,7 +3094,7 @@ select * from sc_test();
 
 create or replace function sc_test() returns setof integer as $$
 declare
-  c cursor for select * from generate_series(1, 10);
+  c scroll cursor for select * from generate_series(1, 10);
   x integer;
 begin
   open c;
@@ -4852,7 +4852,9 @@ select i, a from
    ->  Function Scan on public.consumes_rw_array i
          Output: i.i
          Function Call: consumes_rw_array((returns_rw_array(1)))
-(7 rows)
+         ->  Materialize
+               ->  SRF Scan
+(9 rows)
 
 select i, a from
   (select returns_rw_array(1) as a offset 0) ss,
@@ -4869,7 +4871,8 @@ select consumes_rw_array(a), a from returns_rw_array(1) a;
  Function Scan on public.returns_rw_array a
    Output: consumes_rw_array(a), a
    Function Call: returns_rw_array(1)
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select consumes_rw_array(a), a from returns_rw_array(1) a;
  consumes_rw_array |   a   
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index a70060ba01..7f96baaee8 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -1841,7 +1841,8 @@ explain (verbose, costs off)
  Function Scan on public.array_to_set t
    Output: f1, f2
    Function Call: array_to_set('{one,two}'::text[])
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 -- but without, it can be:
 create or replace function array_to_set(anyarray) returns setof record as $$
@@ -1879,7 +1880,8 @@ explain (verbose, costs off)
  Function Scan on pg_catalog.generate_subscripts i
    Output: i.i, ('{one,two}'::text[])[i.i]
    Function Call: generate_subscripts('{one,two}'::text[], 1)
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 create temp table rngfunc(f1 int8, f2 int8);
 create function testrngfunc() returns record as $$
@@ -1950,7 +1952,8 @@ select * from testrngfunc();
  Function Scan on testrngfunc
    Output: f1, f2
    Function Call: '(7.136178,7.14)'::rngfunc_type
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from testrngfunc();
     f1    |  f2  
@@ -1982,7 +1985,8 @@ select * from testrngfunc();
  Function Scan on public.testrngfunc
    Output: f1, f2
    Function Call: testrngfunc()
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from testrngfunc();
     f1    |  f2  
@@ -2048,7 +2052,8 @@ select * from testrngfunc();
  Function Scan on public.testrngfunc
    Output: f1, f2
    Function Call: testrngfunc()
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from testrngfunc();
     f1    |  f2  
@@ -2217,7 +2222,9 @@ select x from int8_tbl, extractq2(int8_tbl) f(x);
    ->  Function Scan on f
          Output: f.x
          Function Call: int8_tbl.q2
-(7 rows)
+         ->  Materialize
+               ->  SRF Scan
+(9 rows)
 
 select x from int8_tbl, extractq2(int8_tbl) f(x);
          x         
@@ -2306,3 +2313,155 @@ select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;
 (0 rows)
 
 drop type rngfunc2;
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+CREATE TEMPORARY SEQUENCE rngfunc_vpc_seq;
+CREATE TEMPORARY SEQUENCE rngfunc_mat_seq;
+CREATE TYPE rngfunc_vpc_t AS (i integer, s bigint);
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+CREATE FUNCTION rngfunc_vpc(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+	SELECT i, nextval('rngfunc_vpc_seq')
+		FROM generate_series($1,$2) i;
+$$
+LANGUAGE SQL;
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+CREATE FUNCTION rngfunc_mat(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+begin
+	for i in $1..$2 loop
+		return next (i, nextval('rngfunc_mat_seq'));
+	end loop;
+end;
+$$
+LANGUAGE plpgsql;
+-- A VPC SRF that is not part of a complex query should not materialize.
+-- 
+-- To illustrate this, we explain a simple VPC SRF scan, and note the
+-- absence of a Materialize node.
+--
+explain (costs off)
+	select * from rngfunc_vpc(1, 3) t;
+           QUERY PLAN           
+--------------------------------
+ Function Scan on rngfunc_vpc t
+   ->  SRF Scan
+(2 rows)
+
+-- A VPC SRF that aborts early should do so without emitting all results.
+-- 
+-- To illustrate this, we show that an SRF that uses a sequence does not
+-- have its value incremented if the SRF is not invoked to generate a row.
+--
+select nextval('rngfunc_vpc_seq');
+ nextval 
+---------
+       1
+(1 row)
+
+select * from rngfunc_vpc(1, 3) t limit 2;
+ i | s 
+---+---
+ 1 | 2
+ 2 | 3
+(2 rows)
+
+select nextval('rngfunc_vpc_seq');
+ nextval 
+---------
+       4
+(1 row)
+
+-- A Marerialize SRF should show Materialization if the query demand rescan.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- The same plan should be generated for both VPC and Materialize mode SRFs.
+--
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+                QUERY PLAN                
+------------------------------------------
+ Nested Loop
+   ->  Function Scan on generate_series n
+         ->  SRF Scan
+   ->  Function Scan on rngfunc_vpc t
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
+
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+                QUERY PLAN                
+------------------------------------------
+ Nested Loop
+   ->  Function Scan on generate_series n
+         ->  SRF Scan
+   ->  Function Scan on rngfunc_mat t
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
+
+-- A Marerialize SRF should show donation of the returned tuplestore.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- Only the Materialize mode SRF should show donation.
+--
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Nested Loop (actual rows=9 loops=1)
+   ->  Function Scan on generate_series n (actual rows=3 loops=1)
+         ->  SRF Scan (actual rows=3 loops=1)
+               SFRM: ValuePerCall
+   ->  Function Scan on rngfunc_vpc t (actual rows=3 loops=3)
+         ->  Materialize (actual rows=3 loops=3)
+               ->  SRF Scan (actual rows=3 loops=1)
+                     SFRM: ValuePerCall
+(8 rows)
+
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Nested Loop (actual rows=9 loops=1)
+   ->  Function Scan on generate_series n (actual rows=3 loops=1)
+         ->  SRF Scan (actual rows=3 loops=1)
+               SFRM: ValuePerCall
+   ->  Function Scan on rngfunc_mat t (actual rows=3 loops=3)
+         ->  Materialize (actual rows=3 loops=3)
+               ->  SRF Scan (actual rows=0 loops=1)
+                     SFRM: Materialize
+                     Donated tuplestore: true
+(9 rows)
+
+-- A Marerialize SRF that aborts early should still generate all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence still has
+-- its value incremented if even when SRF's rows are not emitted.
+--
+select nextval('rngfunc_mat_seq');
+ nextval 
+---------
+       4
+(1 row)
+
+select * from rngfunc_mat(1, 3) t limit 2;
+ i | s 
+---+---
+ 1 | 5
+ 2 | 6
+(2 rows)
+
+select nextval('rngfunc_mat_seq');
+ nextval 
+---------
+       8
+(1 row)
+
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index fe1cd9deb0..9f6deff81e 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -1669,8 +1669,9 @@ select * from test_tsquery, to_tsquery('new') q where txtsample @@ q;
  Nested Loop
    Join Filter: (test_tsquery.txtsample @@ q.q)
    ->  Function Scan on to_tsquery q
+         ->  SRF Scan
    ->  Seq Scan on test_tsquery
-(4 rows)
+(5 rows)
 
 -- to_tsquery(regconfig, text) is an immutable function.
 -- That allows us to get rid of using function scan and join at all.
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6e72e92d80..6828582d1e 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -577,8 +577,10 @@ select from generate_series(1,5) union select from generate_series(1,3);
  HashAggregate
    ->  Append
          ->  Function Scan on generate_series
+               ->  SRF Scan
          ->  Function Scan on generate_series generate_series_1
-(4 rows)
+               ->  SRF Scan
+(6 rows)
 
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
@@ -588,9 +590,11 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Function Scan on generate_series
+                     ->  SRF Scan
          ->  Subquery Scan on "*SELECT* 2"
                ->  Function Scan on generate_series generate_series_1
-(6 rows)
+                     ->  SRF Scan
+(8 rows)
 
 select from generate_series(1,5) union select from generate_series(1,3);
 --
@@ -626,8 +630,10 @@ select from generate_series(1,5) union select from generate_series(1,3);
  Unique
    ->  Append
          ->  Function Scan on generate_series
+               ->  SRF Scan
          ->  Function Scan on generate_series generate_series_1
-(4 rows)
+               ->  SRF Scan
+(6 rows)
 
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
@@ -637,9 +643,11 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Function Scan on generate_series
+                     ->  SRF Scan
          ->  Subquery Scan on "*SELECT* 2"
                ->  Function Scan on generate_series generate_series_1
-(6 rows)
+                     ->  SRF Scan
+(8 rows)
 
 select from generate_series(1,5) union select from generate_series(1,3);
 --
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index d5fd4045f9..d2cd0b529f 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3851,7 +3851,8 @@ EXPLAIN (costs off) SELECT * FROM pg_temp.f(2);
          ->  Sort
                Sort Key: s.s
                ->  Function Scan on generate_series s
-(5 rows)
+                     ->  SRF Scan
+(6 rows)
 
 SELECT * FROM pg_temp.f(2);
     f    
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index d841d8c0f9..4717b069be 100644
--- a/src/test/regress/sql/plpgsql.sql
+++ b/src/test/regress/sql/plpgsql.sql
@@ -2646,7 +2646,7 @@ select * from sc_test();
 
 create or replace function sc_test() returns setof integer as $$
 declare
-  c cursor for select * from generate_series(1, 10);
+  c scroll cursor for select * from generate_series(1, 10);
   x integer;
 begin
   open c;
diff --git a/src/test/regress/sql/rangefuncs.sql b/src/test/regress/sql/rangefuncs.sql
index 476b4f27e2..4d39f39b57 100644
--- a/src/test/regress/sql/rangefuncs.sql
+++ b/src/test/regress/sql/rangefuncs.sql
@@ -730,3 +730,82 @@ select *, row_to_json(u) from unnest(array[null::rngfunc2, (1,'foo')::rngfunc2,
 select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;
 
 drop type rngfunc2;
+
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+
+CREATE TEMPORARY SEQUENCE rngfunc_vpc_seq;
+CREATE TEMPORARY SEQUENCE rngfunc_mat_seq;
+CREATE TYPE rngfunc_vpc_t AS (i integer, s bigint);
+
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+CREATE FUNCTION rngfunc_vpc(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+	SELECT i, nextval('rngfunc_vpc_seq')
+		FROM generate_series($1,$2) i;
+$$
+LANGUAGE SQL;
+
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+CREATE FUNCTION rngfunc_mat(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+begin
+	for i in $1..$2 loop
+		return next (i, nextval('rngfunc_mat_seq'));
+	end loop;
+end;
+$$
+LANGUAGE plpgsql;
+
+-- A VPC SRF that is not part of a complex query should not materialize.
+-- 
+-- To illustrate this, we explain a simple VPC SRF scan, and note the
+-- absence of a Materialize node.
+--
+explain (costs off)
+	select * from rngfunc_vpc(1, 3) t;
+
+-- A VPC SRF that aborts early should do so without emitting all results.
+-- 
+-- To illustrate this, we show that an SRF that uses a sequence does not
+-- have its value incremented if the SRF is not invoked to generate a row.
+--
+select nextval('rngfunc_vpc_seq');
+select * from rngfunc_vpc(1, 3) t limit 2;
+select nextval('rngfunc_vpc_seq');
+
+-- A Marerialize SRF should show Materialization if the query demand rescan.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- The same plan should be generated for both VPC and Materialize mode SRFs.
+--
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+
+-- A Marerialize SRF should show donation of the returned tuplestore.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- Only the Materialize mode SRF should show donation.
+--
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+
+-- A Marerialize SRF that aborts early should still generate all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence still has
+-- its value incremented if even when SRF's rows are not emitted.
+--
+select nextval('rngfunc_mat_seq');
+select * from rngfunc_mat(1, 3) t limit 2;
+select nextval('rngfunc_mat_seq');
+
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------
-- 
2.23.0
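
As an aside on the rangefuncs tests in the patch above: the early-abort behaviour they check can be sketched outside PostgreSQL in a few lines of C. This is only an illustration under stated assumptions. The Sequence struct and the functions seq_nextval, vpc_values_drawn and mat_values_drawn are hypothetical stand-ins, not part of the patch or of any PostgreSQL API. The point is that a value-per-call producer draws only as many sequence values as the consumer reads, while a materializing producer draws all of them before LIMIT gets a say.

```c
#include <assert.h>

/* Hypothetical stand-in for a sequence: remembers the last value drawn. */
typedef struct Sequence
{
	long		last;
} Sequence;

static long
seq_nextval(Sequence *seq)
{
	return ++seq->last;
}

/*
 * Value-per-call: each row is computed only when the caller asks for it,
 * so stopping after "limit" rows leaves the rest of the sequence untouched.
 * Returns how many sequence values were drawn.
 */
long
vpc_values_drawn(int start, int stop, int limit)
{
	Sequence	seq = {0};
	int			emitted = 0;
	int			i;

	assert(limit >= 0);

	for (i = start; i <= stop && emitted < limit; i++)
	{
		(void) seq_nextval(&seq);	/* row built on demand */
		emitted++;
	}

	return seq.last;
}

/*
 * Materialize: the whole result set is computed up front; the limit only
 * trims what is read back afterwards.  Returns how many sequence values
 * were drawn.
 */
long
mat_values_drawn(int start, int stop, int limit)
{
	Sequence	seq = {0};
	int			emitted = 0;
	int			i;

	assert(limit >= 0);

	for (i = start; i <= stop; i++)
		(void) seq_nextval(&seq);	/* every row is built first */

	for (i = start; i <= stop && emitted < limit; i++)
		emitted++;				/* reading back a prefix draws nothing */

	return seq.last;
}
```

With start = 1, stop = 3 and limit = 2, the first function draws two values and the second draws three, which mirrors the sequence deltas the regression tests expect for rngfunc_vpc and rngfunc_mat.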

#19

Dent John

denty@QQdd.eu

almost 6 years ago

In reply to: Thomas Munro (#18)

Re: The flinfo->fn_extra question, from me this time.

On 28 Jan 2020, at 09:56, Thomas Munro <thomas.munro@gmail.com> wrote:

([…] I have no
idea what GUI interaction causes that, but most Apple Mail attachments
seem to be fine.)

I gathered from the other thread that posting plain text seems to attach the patches in a way that’s more acceptable. Seems to work, but doesn’t explain exactly what the issue is, and I’m pretty sure I’ve not always had to go via the “make plain text” menu item before.

Here's a quick rebase in case it helps. It mostly applied fine (see
below). The conflicts were just Makefile and expected output files,
which I tried to do the obvious thing with. I had to add a #include
"access/tupdesc.h" to plannodes.h to make something compile (because
it uses TupleDesc). Passes check-world here.

Thanks a lot for doing that. I tried it against 530609a, and indeed it seems to work.

I’m also watching the polymorphic table functions light thread[0], which at first glance would also seem to make SRF RECORD-returning functions useful when employed in the SELECT list. It’s not doing what this patch does, but people might be happy enough to transform their queries into SELECT … FROM (SELECT fn(…)) to achieve pipelining, at least in the short term.

[0]: /messages/by-id/46a1cb32-e9c6-e7a8-f3c0-78e6b3f70cfe@2ndquadrant.com

denty.

#20

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Dent John (#19)

1 attachment(s)

Re: The flinfo->fn_extra question, from me this time.

The cfbot is still not happy with this, because you're ignoring the
project style rule against C99-like mixing of code and declarations.
I went to fix that, and soon found that the code doesn't compile,
much less pass regression tests, with --enable-cassert. That's
really a serious error on your part: basically, nobody should ever
do backend code development in non-cassert builds, because there is
too much useful error checking you forego that way. (Performance
testing is a different matter ... but you need to make the code
work before you worry about speed.)
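
For readers unfamiliar with the two points above, here is a minimal sketch. The function sum_first_n is hypothetical, and plain assert() from <assert.h> stands in for the backend's Assert() macro, which is only compiled in when the tree is configured with --enable-cassert. All declarations sit at the top of the block, before any statement, and the assertions document the assumptions the code relies on.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum the first n entries of values[].  Hypothetical example function,
 * written with declarations first and no mid-block declarations.
 */
int
sum_first_n(const int *values, int nvalues, int n)
{
	int			total = 0;		/* declarations come before any code */
	int			i;

	/* In an assert-enabled build these catch caller mistakes early. */
	assert(values != NULL);
	assert(n >= 0 && n <= nvalues);

	for (i = 0; i < n; i++)		/* not "for (int i = 0; ...)" */
		total += values[i];

	return total;
}
```

In a build without assertions the two checks vanish, which is why a patch that has only ever been run in such a build can hide failures of this kind.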

Anyway, attached is a marginal update that gets this to the point
where it should compile in the cfbot, but it'll still fail regression
tests there. (At least on the Linux side. I guess the cfbot's
Windows builds are sans cassert, which seems like an odd choice.)

I didn't want to spend any more effort on it than that, because I'm
not really on board with this line of attack. This patch seems
awfully invasive for what it's accomplishing, both at the code level
and in terms of what users will see in EXPLAIN. No, I don't think
that adding additional "SRF Scan" nodes below FunctionScan is an
improvement, nor do I like your repurposing/abusing of Materialize.
It might be okay if you were just using Materialize as-is, but if
it's sort-of-materialize-but-not-always, I don't think that's going
to make anyone less confused.

More locally, this business with creating new "plan nodes" below the
FunctionScan at executor startup is a real abuse of a whole lot of stuff,
and I suspect that it's not unrelated to the assertion failures I'm
seeing. Don't do that. If you want to build some data structures at
executor start, fine, but they're not plans and shouldn't be mislabeled as
that. On the other hand, if they do need to be plan nodes, they should be
made by the planner (which in turn would require a lot of infrastructure
you haven't built, eg copyfuncs/outfuncs/readfuncs/setrefs/...).

The v3 patch seemed closer to the sort of thing I was expecting
to get out of this (though I've not read it in any detail).

regards, tom lane

Attachments:

0001-pipeline-functionscan-v6.patch (text/x-diff; charset=us-ascii)

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 1e743d7..86bb80a 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
 #include "miscadmin.h"
+#include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -927,3 +928,53 @@ BuildDescFromLists(List *names, List *types, List *typmods, List *collations)
 
 	return desc;
 }
+
+/*
+ * Check that function result tuple type (src_tupdesc) matches or can
+ * be considered to match what the query expects (dst_tupdesc). If
+ * they don't match, ereport.
+ *
+ * We really only care about number of attributes and data type.
+ * Also, we can ignore type mismatch on columns that are dropped in the
+ * destination type, so long as the physical storage matches.  This is
+ * helpful in some cases involving out-of-date cached plans.
+ */
+void
+tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
+{
+	int			i;
+
+	if (dst_tupdesc->natts != src_tupdesc->natts)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("function return row and query-specified return row do not match"),
+				 errdetail_plural("Returned row contains %d attribute, but query expects %d.",
+								  "Returned row contains %d attributes, but query expects %d.",
+								  src_tupdesc->natts,
+								  src_tupdesc->natts, dst_tupdesc->natts)));
+
+	for (i = 0; i < dst_tupdesc->natts; i++)
+	{
+		Form_pg_attribute dattr = TupleDescAttr(dst_tupdesc, i);
+		Form_pg_attribute sattr = TupleDescAttr(src_tupdesc, i);
+
+		if (IsBinaryCoercible(sattr->atttypid, dattr->atttypid))
+			continue;			/* no worries */
+		if (!dattr->attisdropped)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("function return row and query-specified return row do not match"),
+					 errdetail("Returned type %s at ordinal position %d, but query expects %s.",
+							   format_type_be(sattr->atttypid),
+							   i + 1,
+							   format_type_be(dattr->atttypid))));
+
+		if (dattr->attlen != sattr->attlen ||
+			dattr->attalign != sattr->attalign)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("function return row and query-specified return row do not match"),
+					 errdetail("Physical storage mismatch on dropped attribute at ordinal position %d.",
+							   i + 1)));
+	}
+}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d901dc4..71e7ab5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -19,6 +19,8 @@
 #include "commands/defrem.h"
 #include "commands/prepare.h"
 #include "executor/nodeHash.h"
+#include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
 #include "nodes/extensible.h"
@@ -1182,6 +1184,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SubqueryScan:
 			pname = sname = "Subquery Scan";
 			break;
+		case T_SRFScanPlan:
+			pname = sname = "SRF Scan";
+			break;
 		case T_FunctionScan:
 			pname = sname = "Function Scan";
 			break;
@@ -1770,6 +1775,31 @@ ExplainNode(PlanState *planstate, List *ancestors,
 				}
 			}
 			break;
+		case T_SRFScanPlan:
+			if (es->analyze)
+			{
+				SRFScanState *sss = (SRFScanState *) planstate;
+
+				if (sss->setexpr)
+				{
+					SetExprState *setexpr = (SetExprState *) sss->setexpr;
+					FunctionCallInfo fcinfo = setexpr->fcinfo;
+					ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+					if (rsinfo)
+					{
+						ExplainPropertyText("SFRM",
+							rsinfo->returnMode == SFRM_ValuePerCall ? "ValuePerCall" :
+								rsinfo->returnMode == SFRM_Materialize ? "Materialize" :
+									"Unknown",
+											es);
+
+						if (rsinfo->returnMode == SFRM_Materialize)
+							ExplainPropertyBool("Donated tuplestore",
+												setexpr->funcResultStoreDonated, es);
+					}
+				}
+			}
 		case T_FunctionScan:
 			if (es->verbose)
 			{
@@ -2002,6 +2032,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		IsA(plan, BitmapAnd) ||
 		IsA(plan, BitmapOr) ||
 		IsA(plan, SubqueryScan) ||
+		IsA(plan, FunctionScan) ||
 		(IsA(planstate, CustomScanState) &&
 		 ((CustomScanState *) planstate)->custom_ps != NIL) ||
 		planstate->subPlan;
@@ -2026,6 +2057,17 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainNode(innerPlanState(planstate), ancestors,
 					"Inner", NULL, es);
 
+	/* FunctionScan subnodes */
+	if (IsA(planstate, FunctionScanState))
+		for(int i=0; i<((FunctionScanState *)planstate)->nfuncs; i++)
+		{
+			bool oldverbose = es->verbose;
+			es->verbose = false;
+			ExplainNode(&((FunctionScanState *)planstate)->funcstates[i].scanstate->ps,
+						ancestors, "Function", NULL, es);
+			es->verbose = oldverbose;
+		}
+
 	/* special child plans */
 	switch (nodeTag(plan))
 	{
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index a983800..9dae142 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -65,6 +65,7 @@ OBJS = \
 	nodeSort.o \
 	nodeSubplan.o \
 	nodeSubqueryscan.o \
+	nodeSRFScan.o \
 	nodeTableFuncscan.o \
 	nodeTidscan.o \
 	nodeUnique.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index b12aeb3..07ccca75 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -25,6 +25,7 @@
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "executor/nodeGather.h"
 #include "executor/nodeGatherMerge.h"
 #include "executor/nodeGroup.h"
@@ -204,6 +205,10 @@ ExecReScan(PlanState *node)
 			ExecReScanFunctionScan((FunctionScanState *) node);
 			break;
 
+		case T_SRFScanState:
+			ExecReScanSRF((SRFScanState *) node);
+			break;
+
 		case T_TableFuncScanState:
 			ExecReScanTableFuncScan((TableFuncScanState *) node);
 			break;
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 7b2e84f..da39593 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -83,6 +83,7 @@
 #include "executor/nodeCustom.h"
 #include "executor/nodeForeignscan.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "executor/nodeGather.h"
 #include "executor/nodeGatherMerge.h"
 #include "executor/nodeGroup.h"
@@ -252,6 +253,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 														 estate, eflags);
 			break;
 
+		case T_SRFScanPlan:
+			result = (PlanState *) ExecInitSRFScan((SRFScanPlan *) node,
+														 estate, eflags);
+			break;
+
 		case T_ValuesScan:
 			result = (PlanState *) ExecInitValuesScan((ValuesScan *) node,
 													  estate, eflags);
@@ -639,6 +645,10 @@ ExecEndNode(PlanState *node)
 			ExecEndFunctionScan((FunctionScanState *) node);
 			break;
 
+		case T_SRFScanState:
+			ExecEndSRFScan((SRFScanState *) node);
+			break;
+
 		case T_TableFuncScanState:
 			ExecEndTableFuncScan((TableFuncScanState *) node);
 			break;
diff --git a/src/backend/executor/execSRF.c b/src/backend/executor/execSRF.c
index 2312cc7..e29296a 100644
--- a/src/backend/executor/execSRF.c
+++ b/src/backend/executor/execSRF.c
@@ -21,6 +21,9 @@
 #include "access/htup_details.h"
 #include "catalog/objectaccess.h"
 #include "executor/execdebug.h"
+#include "executor/nodeMaterial.h"
+#include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
@@ -44,17 +47,17 @@ static void ExecPrepareTuplestoreResult(SetExprState *sexpr,
 										ExprContext *econtext,
 										Tuplestorestate *resultStore,
 										TupleDesc resultDesc);
-static void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
 
 
 /*
- * Prepare function call in FROM (ROWS FROM) for execution.
+ * Prepare function call in FROM (ROWS FROM) or targetlist SRF function
+ * call for execution.
  *
- * This is used by nodeFunctionscan.c.
+ * This is used by nodeFunctionscan.c and nodeProjectSet.c.
  */
 SetExprState *
-ExecInitTableFunctionResult(Expr *expr,
-							ExprContext *econtext, PlanState *parent)
+ExecInitFunctionResultSet(Expr *expr,
+						  ExprContext *econtext, PlanState *parent)
 {
 	SetExprState *state = makeNode(SetExprState);
 
@@ -62,402 +65,54 @@ ExecInitTableFunctionResult(Expr *expr,
 	state->expr = expr;
 	state->func.fn_oid = InvalidOid;
 
-	/*
-	 * Normally the passed expression tree will be a FuncExpr, since the
-	 * grammar only allows a function call at the top level of a table
-	 * function reference.  However, if the function doesn't return set then
-	 * the planner might have replaced the function call via constant-folding
-	 * or inlining.  So if we see any other kind of expression node, execute
-	 * it via the general ExecEvalExpr() code.  That code path will not
-	 * support set-returning functions buried in the expression, though.
-	 */
 	if (IsA(expr, FuncExpr))
 	{
+		/*
+		 * For a FunctionScan or ProjectSet, the passed expression tree can be a
+		 * FuncExpr, since the grammar only allows a function call at the top
+		 * level of a table function reference.
+		 */
 		FuncExpr   *func = (FuncExpr *) expr;
 
 		state->funcReturnsSet = func->funcretset;
 		state->args = ExecInitExprList(func->args, parent);
-
 		init_sexpr(func->funcid, func->inputcollid, expr, state, parent,
-				   econtext->ecxt_per_query_memory, func->funcretset, false);
+				   econtext->ecxt_per_query_memory, func->funcretset, true);
 	}
-	else
-	{
-		state->elidedFuncState = ExecInitExpr(expr, parent);
-	}
-
-	return state;
-}
-
-/*
- *		ExecMakeTableFunctionResult
- *
- * Evaluate a table function, producing a materialized result in a Tuplestore
- * object.
- *
- * This is used by nodeFunctionscan.c.
- */
-Tuplestorestate *
-ExecMakeTableFunctionResult(SetExprState *setexpr,
-							ExprContext *econtext,
-							MemoryContext argContext,
-							TupleDesc expectedDesc,
-							bool randomAccess)
-{
-	Tuplestorestate *tupstore = NULL;
-	TupleDesc	tupdesc = NULL;
-	Oid			funcrettype;
-	bool		returnsTuple;
-	bool		returnsSet = false;
-	FunctionCallInfo fcinfo;
-	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
-	HeapTupleData tmptup;
-	MemoryContext callerContext;
-	MemoryContext oldcontext;
-	bool		first_time = true;
-
-	callerContext = CurrentMemoryContext;
-
-	funcrettype = exprType((Node *) setexpr->expr);
-
-	returnsTuple = type_is_rowtype(funcrettype);
-
-	/*
-	 * Prepare a resultinfo node for communication.  We always do this even if
-	 * not expecting a set result, so that we can pass expectedDesc.  In the
-	 * generic-expression case, the expression doesn't actually get to see the
-	 * resultinfo, but set it up anyway because we use some of the fields as
-	 * our own state variables.
-	 */
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = expectedDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize | SFRM_Materialize_Preferred);
-	if (randomAccess)
-		rsinfo.allowedModes |= (int) SFRM_Materialize_Random;
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
-
-	fcinfo = palloc(SizeForFunctionCallInfo(list_length(setexpr->args)));
-
-	/*
-	 * Normally the passed expression tree will be a SetExprState, since the
-	 * grammar only allows a function call at the top level of a table
-	 * function reference.  However, if the function doesn't return set then
-	 * the planner might have replaced the function call via constant-folding
-	 * or inlining.  So if we see any other kind of expression node, execute
-	 * it via the general ExecEvalExpr() code; the only difference is that we
-	 * don't get a chance to pass a special ReturnSetInfo to any functions
-	 * buried in the expression.
-	 */
-	if (!setexpr->elidedFuncState)
+	else if (IsA(expr, OpExpr))
 	{
 		/*
-		 * This path is similar to ExecMakeFunctionResultSet.
-		 */
-		returnsSet = setexpr->funcReturnsSet;
-		InitFunctionCallInfoData(*fcinfo, &(setexpr->func),
-								 list_length(setexpr->args),
-								 setexpr->fcinfo->fncollation,
-								 NULL, (Node *) &rsinfo);
-
-		/*
-		 * Evaluate the function's argument list.
-		 *
-		 * We can't do this in the per-tuple context: the argument values
-		 * would disappear when we reset that context in the inner loop.  And
-		 * the caller's CurrentMemoryContext is typically a query-lifespan
-		 * context, so we don't want to leak memory there.  We require the
-		 * caller to pass a separate memory context that can be used for this,
-		 * and can be reset each time through to avoid bloat.
-		 */
-		MemoryContextReset(argContext);
-		oldcontext = MemoryContextSwitchTo(argContext);
-		ExecEvalFuncArgs(fcinfo, setexpr->args, econtext);
-		MemoryContextSwitchTo(oldcontext);
-
-		/*
-		 * If function is strict, and there are any NULL arguments, skip
-		 * calling the function and act like it returned NULL (or an empty
-		 * set, in the returns-set case).
+		 * For ProjectSet, the expression node could be an OpExpr.
 		 */
-		if (setexpr->func.fn_strict)
-		{
-			int			i;
+		OpExpr	   *op = (OpExpr *) expr;
 
-			for (i = 0; i < fcinfo->nargs; i++)
-			{
-				if (fcinfo->args[i].isnull)
-					goto no_function_result;
-			}
-		}
+		state->funcReturnsSet = op->opretset;
+		state->args = ExecInitExprList(op->args, parent);
+		init_sexpr(op->opfuncid, op->inputcollid, expr, state, parent,
+				   econtext->ecxt_per_query_memory, op->opretset, true);
 	}
 	else
 	{
-		/* Treat setexpr as a generic expression */
-		InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL);
-	}
-
-	/*
-	 * Switch to short-lived context for calling the function or expression.
-	 */
-	MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-
-	/*
-	 * Loop to handle the ValuePerCall protocol (which is also the same
-	 * behavior needed in the generic ExecEvalExpr path).
-	 */
-	for (;;)
-	{
-		Datum		result;
-
-		CHECK_FOR_INTERRUPTS();
-
 		/*
-		 * reset per-tuple memory context before each call of the function or
-		 * expression. This cleans up any local memory the function may leak
-		 * when called.
+		 * However, again for FunctionScan, if the function doesn't return set
+		 * then the planner might have replaced the function call via constant-
+		 * folding or inlining.  So if we see any other kind of expression node,
+		 * execute it via the general ExecEvalExpr() code.  That code path will
+		 * not support set-returning functions buried in the expression, though.
 		 */
-		ResetExprContext(econtext);
-
-		/* Call the function or expression one time */
-		if (!setexpr->elidedFuncState)
-		{
-			pgstat_init_function_usage(fcinfo, &fcusage);
-
-			fcinfo->isnull = false;
-			rsinfo.isDone = ExprSingleResult;
-			result = FunctionCallInvoke(fcinfo);
-
-			pgstat_end_function_usage(&fcusage,
-									  rsinfo.isDone != ExprMultipleResult);
-		}
-		else
-		{
-			result =
-				ExecEvalExpr(setexpr->elidedFuncState, econtext, &fcinfo->isnull);
-			rsinfo.isDone = ExprSingleResult;
-		}
-
-		/* Which protocol does function want to use? */
-		if (rsinfo.returnMode == SFRM_ValuePerCall)
-		{
-			/*
-			 * Check for end of result set.
-			 */
-			if (rsinfo.isDone == ExprEndResult)
-				break;
-
-			/*
-			 * If first time through, build tuplestore for result.  For a
-			 * scalar function result type, also make a suitable tupdesc.
-			 */
-			if (first_time)
-			{
-				oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-				tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-				rsinfo.setResult = tupstore;
-				if (!returnsTuple)
-				{
-					tupdesc = CreateTemplateTupleDesc(1);
-					TupleDescInitEntry(tupdesc,
-									   (AttrNumber) 1,
-									   "column",
-									   funcrettype,
-									   -1,
-									   0);
-					rsinfo.setDesc = tupdesc;
-				}
-				MemoryContextSwitchTo(oldcontext);
-			}
-
-			/*
-			 * Store current resultset item.
-			 */
-			if (returnsTuple)
-			{
-				if (!fcinfo->isnull)
-				{
-					HeapTupleHeader td = DatumGetHeapTupleHeader(result);
-
-					if (tupdesc == NULL)
-					{
-						/*
-						 * This is the first non-NULL result from the
-						 * function.  Use the type info embedded in the
-						 * rowtype Datum to look up the needed tupdesc.  Make
-						 * a copy for the query.
-						 */
-						oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-						tupdesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td),
-															  HeapTupleHeaderGetTypMod(td));
-						rsinfo.setDesc = tupdesc;
-						MemoryContextSwitchTo(oldcontext);
-					}
-					else
-					{
-						/*
-						 * Verify all later returned rows have same subtype;
-						 * necessary in case the type is RECORD.
-						 */
-						if (HeapTupleHeaderGetTypeId(td) != tupdesc->tdtypeid ||
-							HeapTupleHeaderGetTypMod(td) != tupdesc->tdtypmod)
-							ereport(ERROR,
-									(errcode(ERRCODE_DATATYPE_MISMATCH),
-									 errmsg("rows returned by function are not all of the same row type")));
-					}
-
-					/*
-					 * tuplestore_puttuple needs a HeapTuple not a bare
-					 * HeapTupleHeader, but it doesn't need all the fields.
-					 */
-					tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
-					tmptup.t_data = td;
-
-					tuplestore_puttuple(tupstore, &tmptup);
-				}
-				else
-				{
-					/*
-					 * NULL result from a tuple-returning function; expand it
-					 * to a row of all nulls.  We rely on the expectedDesc to
-					 * form such rows.  (Note: this would be problematic if
-					 * tuplestore_putvalues saved the tdtypeid/tdtypmod from
-					 * the provided descriptor, since that might not match
-					 * what we get from the function itself.  But it doesn't.)
-					 */
-					int			natts = expectedDesc->natts;
-					bool	   *nullflags;
-
-					nullflags = (bool *) palloc(natts * sizeof(bool));
-					memset(nullflags, true, natts * sizeof(bool));
-					tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
-				}
-			}
-			else
-			{
-				/* Scalar-type case: just store the function result */
-				tuplestore_putvalues(tupstore, tupdesc, &result, &fcinfo->isnull);
-			}
-
-			/*
-			 * Are we done?
-			 */
-			if (rsinfo.isDone != ExprMultipleResult)
-				break;
-		}
-		else if (rsinfo.returnMode == SFRM_Materialize)
-		{
-			/* check we're on the same page as the function author */
-			if (!first_time || rsinfo.isDone != ExprSingleResult)
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-						 errmsg("table-function protocol for materialize mode was not followed")));
-			/* Done evaluating the set result */
-			break;
-		}
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
-					 errmsg("unrecognized table-function returnMode: %d",
-							(int) rsinfo.returnMode)));
-
-		first_time = false;
-	}
-
-no_function_result:
-
-	/*
-	 * If we got nothing from the function (ie, an empty-set or NULL result),
-	 * we have to create the tuplestore to return, and if it's a
-	 * non-set-returning function then insert a single all-nulls row.  As
-	 * above, we depend on the expectedDesc to manufacture the dummy row.
-	 */
-	if (rsinfo.setResult == NULL)
-	{
-		MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
-		tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-		rsinfo.setResult = tupstore;
-		if (!returnsSet)
-		{
-			int			natts = expectedDesc->natts;
-			bool	   *nullflags;
-
-			MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
-			nullflags = (bool *) palloc(natts * sizeof(bool));
-			memset(nullflags, true, natts * sizeof(bool));
-			tuplestore_putvalues(tupstore, expectedDesc, NULL, nullflags);
-		}
-	}
-
-	/*
-	 * If function provided a tupdesc, cross-check it.  We only really need to
-	 * do this for functions returning RECORD, but might as well do it always.
-	 */
-	if (rsinfo.setDesc)
-	{
-		tupledesc_match(expectedDesc, rsinfo.setDesc);
-
-		/*
-		 * If it is a dynamically-allocated TupleDesc, free it: it is
-		 * typically allocated in a per-query context, so we must avoid
-		 * leaking it across multiple usages.
-		 */
-		if (rsinfo.setDesc->tdrefcount == -1)
-			FreeTupleDesc(rsinfo.setDesc);
-	}
-
-	MemoryContextSwitchTo(callerContext);
-
-	/* All done, pass back the tuplestore */
-	return rsinfo.setResult;
-}
-
+		MemoryContext oldcontext;
 
-/*
- * Prepare targetlist SRF function call for execution.
- *
- * This is used by nodeProjectSet.c.
- */
-SetExprState *
-ExecInitFunctionResultSet(Expr *expr,
-						  ExprContext *econtext, PlanState *parent)
-{
-	SetExprState *state = makeNode(SetExprState);
+		state->elidedFuncState = ExecInitExpr(expr, parent);
 
-	state->funcReturnsSet = true;
-	state->expr = expr;
-	state->func.fn_oid = InvalidOid;
+		oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
 
-	/*
-	 * Initialize metadata.  The expression node could be either a FuncExpr or
-	 * an OpExpr.
-	 */
-	if (IsA(expr, FuncExpr))
-	{
-		FuncExpr   *func = (FuncExpr *) expr;
+		/* By performing InitFunctionCallInfoData here, we avoid palloc0() */
+		state->fcinfo = palloc(SizeForFunctionCallInfo(list_length(state->args)));
 
-		state->args = ExecInitExprList(func->args, parent);
-		init_sexpr(func->funcid, func->inputcollid, expr, state, parent,
-				   econtext->ecxt_per_query_memory, true, true);
-	}
-	else if (IsA(expr, OpExpr))
-	{
-		OpExpr	   *op = (OpExpr *) expr;
+		MemoryContextSwitchTo(oldcontext);
 
-		state->args = ExecInitExprList(op->args, parent);
-		init_sexpr(op->opfuncid, op->inputcollid, expr, state, parent,
-				   econtext->ecxt_per_query_memory, true, true);
+		InitFunctionCallInfoData(*state->fcinfo, NULL, 0, InvalidOid, NULL, NULL);
 	}
-	else
-		elog(ERROR, "unrecognized node type: %d",
-			 (int) nodeTag(expr));
-
-	/* shouldn't get here unless the selected function returns set */
-	Assert(state->func.fn_retset);
 
 	return state;
 }
@@ -473,7 +128,7 @@ ExecInitFunctionResultSet(Expr *expr,
  * needs to live until all rows have been returned (i.e. *isDone set to
  * ExprEndResult or ExprSingleResult).
  *
- * This is used by nodeProjectSet.c.
+ * This is used by nodeProjectSet.c and nodeFunctionscan.c.
  */
 Datum
 ExecMakeFunctionResultSet(SetExprState *fcache,
@@ -486,7 +141,7 @@ ExecMakeFunctionResultSet(SetExprState *fcache,
 	Datum		result;
 	FunctionCallInfo fcinfo;
 	PgStat_FunctionCallUsage fcusage;
-	ReturnSetInfo rsinfo;
+	ReturnSetInfo *rsinfo;
 	bool		callit;
 	int			i;
 
@@ -540,6 +195,28 @@ restart:
 	}
 
 	/*
+	 * Prepare a resultinfo node for communication.  We always do this even if
+	 * not expecting a set result, so that we can pass expectedDesc.  In the
+	 * generic-expression case, the expression doesn't actually get to see the
+	 * resultinfo, but set it up anyway because we use some of the fields as
+	 * our own state variables.
+	 */
+	fcinfo = fcache->fcinfo;
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	if (rsinfo == NULL)
+	{
+		MemoryContext oldContext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+
+		rsinfo = makeNode(ReturnSetInfo);
+		rsinfo->econtext = econtext;
+		rsinfo->expectedDesc = fcache->funcResultDesc;
+		fcinfo->resultinfo = (Node *) rsinfo;
+
+		MemoryContextSwitchTo(oldContext);
+	}
+
+	/*
 	 * arguments is a list of expressions to evaluate before passing to the
 	 * function manager.  We skip the evaluation if it was already done in the
 	 * previous call (ie, we are continuing the evaluation of a set-valued
@@ -549,7 +226,6 @@ restart:
 	 * rows from this SRF have been returned, otherwise ValuePerCall SRFs
 	 * would reference freed memory after the first returned row.
 	 */
-	fcinfo = fcache->fcinfo;
 	arguments = fcache->args;
 	if (!fcache->setArgsValid)
 	{
@@ -557,6 +233,14 @@ restart:
 
 		ExecEvalFuncArgs(fcinfo, arguments, econtext);
 		MemoryContextSwitchTo(oldContext);
+
+		/* Reset the rsinfo structure */
+		rsinfo->allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
+		/* note we do not set SFRM_Materialize_Random or _Preferred */
+		rsinfo->returnMode = SFRM_ValuePerCall;
+		/* isDone is filled below */
+		rsinfo->setResult = NULL;
+		rsinfo->setDesc = NULL;
 	}
 	else
 	{
@@ -568,18 +252,6 @@ restart:
 	 * Now call the function, passing the evaluated parameter values.
 	 */
 
-	/* Prepare a resultinfo node for communication. */
-	fcinfo->resultinfo = (Node *) &rsinfo;
-	rsinfo.type = T_ReturnSetInfo;
-	rsinfo.econtext = econtext;
-	rsinfo.expectedDesc = fcache->funcResultDesc;
-	rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
-	/* note we do not set SFRM_Materialize_Random or _Preferred */
-	rsinfo.returnMode = SFRM_ValuePerCall;
-	/* isDone is filled below */
-	rsinfo.setResult = NULL;
-	rsinfo.setDesc = NULL;
-
 	/*
 	 * If function is strict, and there are any NULL arguments, skip calling
 	 * the function.
@@ -599,16 +271,25 @@ restart:
 
 	if (callit)
 	{
-		pgstat_init_function_usage(fcinfo, &fcusage);
+		if (!fcache->elidedFuncState)
+		{
+			pgstat_init_function_usage(fcinfo, &fcusage);
 
-		fcinfo->isnull = false;
-		rsinfo.isDone = ExprSingleResult;
-		result = FunctionCallInvoke(fcinfo);
-		*isNull = fcinfo->isnull;
-		*isDone = rsinfo.isDone;
+			fcinfo->isnull = false;
+			rsinfo->isDone = ExprSingleResult;
+			result = FunctionCallInvoke(fcinfo);
+			*isNull = fcinfo->isnull;
+			*isDone = rsinfo->isDone;
 
-		pgstat_end_function_usage(&fcusage,
-								  rsinfo.isDone != ExprMultipleResult);
+			pgstat_end_function_usage(&fcusage,
+									  rsinfo->isDone != ExprMultipleResult);
+		}
+		else
+		{
+			result =
+				ExecEvalExpr(fcache->elidedFuncState, econtext, isNull);
+			*isDone = ExprSingleResult;
+		}
 	}
 	else
 	{
@@ -619,11 +300,32 @@ restart:
 	}
 
 	/* Which protocol does function want to use? */
-	if (rsinfo.returnMode == SFRM_ValuePerCall)
+	if (rsinfo->returnMode == SFRM_ValuePerCall)
 	{
 		if (*isDone != ExprEndResult)
 		{
 			/*
+			 * Obtain a suitable tupdesc, when we first encounter a non-NULL result.
+			 */
+			if (rsinfo->setDesc == NULL)
+			{
+				if (fcache->funcReturnsTuple && !*isNull)
+				{
+					HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+
+					/*
+					 * This is the first non-NULL result from the
+					 * function.  Use the type info embedded in the
+					 * rowtype Datum to look up the needed tupdesc.  Make
+					 * a copy for the query.
+					 */
+					MemoryContext oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
+					rsinfo->setDesc = lookup_rowtype_tupdesc_copy(HeapTupleHeaderGetTypeId(td), HeapTupleHeaderGetTypMod(td));
+					MemoryContextSwitchTo(oldcontext);
+				}
+			}
+
+			/*
 			 * Save the current argument values to re-use on the next call.
 			 */
 			if (*isDone == ExprMultipleResult)
@@ -640,21 +342,34 @@ restart:
 			}
 		}
 	}
-	else if (rsinfo.returnMode == SFRM_Materialize)
+	else if (rsinfo->returnMode == SFRM_Materialize)
 	{
 		/* check we're on the same page as the function author */
-		if (rsinfo.isDone != ExprSingleResult)
+		if (rsinfo->isDone != ExprSingleResult)
 			ereport(ERROR,
 					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
 					 errmsg("table-function protocol for materialize mode was not followed")));
-		if (rsinfo.setResult != NULL)
+		if (rsinfo->setResult != NULL)
 		{
 			/* prepare to return values from the tuplestore */
 			ExecPrepareTuplestoreResult(fcache, econtext,
-										rsinfo.setResult,
-										rsinfo.setDesc);
-			/* loop back to top to start returning from tuplestore */
-			goto restart;
+										rsinfo->setResult,
+										rsinfo->setDesc);
+
+			/*
+			 * If we are being invoked by a Materialize node, attempt
+			 * to donate the returned tuplestore to it.
+			 */
+			if (ExecSRFDonateResultTuplestore(fcache))
+			{
+				*isDone = ExprMultipleResult;
+				return 0;
+			}
+			else
+			{
+				/* loop back to top to start returning from tuplestore */
+				goto restart;
+			}
 		}
 		/* if setResult was left null, treat it as empty set */
 		*isDone = ExprEndResult;
@@ -665,7 +380,7 @@ restart:
 		ereport(ERROR,
 				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
 				 errmsg("unrecognized table-function returnMode: %d",
-						(int) rsinfo.returnMode)));
+						(int) rsinfo->returnMode)));
 
 	return result;
 }
@@ -712,6 +427,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
 	InitFunctionCallInfoData(*sexpr->fcinfo, &(sexpr->func),
 							 numargs,
 							 input_collation, NULL, NULL);
+	sexpr->fcinfo->resultinfo = NULL;
 
 	/* If function returns set, check if that's allowed by caller */
 	if (sexpr->func.fn_retset && !allowSRF)
@@ -782,6 +498,7 @@ init_sexpr(Oid foid, Oid input_collation, Expr *node,
 	sexpr->funcResultStore = NULL;
 	sexpr->funcResultSlot = NULL;
 	sexpr->shutdown_reg = false;
+	sexpr->funcResultStoreDonationEnabled = false;
 }
 
 /*
@@ -792,6 +509,7 @@ static void
 ShutdownSetExpr(Datum arg)
 {
 	SetExprState *sexpr = castNode(SetExprState, DatumGetPointer(arg));
+	ReturnSetInfo *rsinfo = castNode(ReturnSetInfo, sexpr->fcinfo->resultinfo);
 
 	/* If we have a slot, make sure it's let go of any tuplestore pointer */
 	if (sexpr->funcResultSlot)
@@ -802,6 +520,13 @@ ShutdownSetExpr(Datum arg)
 		tuplestore_end(sexpr->funcResultStore);
 	sexpr->funcResultStore = NULL;
 
+	/* Release the ReturnSetInfo structure */
+	if (rsinfo != NULL)
+	{
+		pfree(rsinfo);
+		sexpr->fcinfo->resultinfo = NULL;
+	}
+
 	/* Clear any active set-argument state */
 	sexpr->setArgsValid = false;
 
@@ -910,53 +635,3 @@ ExecPrepareTuplestoreResult(SetExprState *sexpr,
 		sexpr->shutdown_reg = true;
 	}
 }
-
-/*
- * Check that function result tuple type (src_tupdesc) matches or can
- * be considered to match what the query expects (dst_tupdesc). If
- * they don't match, ereport.
- *
- * We really only care about number of attributes and data type.
- * Also, we can ignore type mismatch on columns that are dropped in the
- * destination type, so long as the physical storage matches.  This is
- * helpful in some cases involving out-of-date cached plans.
- */
-static void
-tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc)
-{
-	int			i;
-
-	if (dst_tupdesc->natts != src_tupdesc->natts)
-		ereport(ERROR,
-				(errcode(ERRCODE_DATATYPE_MISMATCH),
-				 errmsg("function return row and query-specified return row do not match"),
-				 errdetail_plural("Returned row contains %d attribute, but query expects %d.",
-								  "Returned row contains %d attributes, but query expects %d.",
-								  src_tupdesc->natts,
-								  src_tupdesc->natts, dst_tupdesc->natts)));
-
-	for (i = 0; i < dst_tupdesc->natts; i++)
-	{
-		Form_pg_attribute dattr = TupleDescAttr(dst_tupdesc, i);
-		Form_pg_attribute sattr = TupleDescAttr(src_tupdesc, i);
-
-		if (IsBinaryCoercible(sattr->atttypid, dattr->atttypid))
-			continue;			/* no worries */
-		if (!dattr->attisdropped)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("function return row and query-specified return row do not match"),
-					 errdetail("Returned type %s at ordinal position %d, but query expects %s.",
-							   format_type_be(sattr->atttypid),
-							   i + 1,
-							   format_type_be(dattr->atttypid))));
-
-		if (dattr->attlen != sattr->attlen ||
-			dattr->attalign != sattr->attalign)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("function return row and query-specified return row do not match"),
-					 errdetail("Physical storage mismatch on dropped attribute at ordinal position %d.",
-							   i + 1)));
-	}
-}
diff --git a/src/backend/executor/nodeFunctionscan.c b/src/backend/executor/nodeFunctionscan.c
index ccb66ce..7499705 100644
--- a/src/backend/executor/nodeFunctionscan.c
+++ b/src/backend/executor/nodeFunctionscan.c
@@ -1,7 +1,23 @@
 /*-------------------------------------------------------------------------
  *
  * nodeFunctionscan.c
- *	  Support routines for scanning RangeFunctions (functions in rangetable).
+ *	  Coordinates a scan over PL functions. It supports several use cases:
+ *
+ *      - single function scan, and multiple functions in ROWS FROM;
+ *      - SRFs and regular functions;
+ *      - tuple- and scalar-returning functions;
+ *      - it will materialise if eflags call for it;
+ *      - if possible, it will pipeline its output;
+ *      - it avoids double-materialisation in case of SFRM_Materialize.
+ *
+ *    To achieve this, it relies on the Materialize node (materialisation and
+ *    pipelining) and the SRFScan node (SRF handling, tuple expansion, and
+ *    double-materialisation avoidance).  The actual function invocation (for
+ *    SRFs and regular functions alike) is done in execSRF.c.
+ *
+ *    The Planner knows nothing of the Materialize and SRFScan structures.
+ *    They are constructed by the Executor at execution time, and are reported
+ *    in the EXPLAIN output.
  *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -24,26 +40,15 @@
 
 #include "catalog/pg_type.h"
 #include "executor/nodeFunctionscan.h"
+#include "executor/nodeSRFScan.h"
+#include "executor/nodeMaterial.h"
 #include "funcapi.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/makefuncs.h"
+#include "parser/parse_type.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
-
-
-/*
- * Runtime data for each function being scanned.
- */
-typedef struct FunctionScanPerFuncState
-{
-	SetExprState *setexpr;		/* state of the expression being evaluated */
-	TupleDesc	tupdesc;		/* desc of the function result type */
-	int			colcount;		/* expected number of result columns */
-	Tuplestorestate *tstore;	/* holds the function result set */
-	int64		rowcount;		/* # of rows in result set, -1 if not known */
-	TupleTableSlot *func_slot;	/* function result slot (or NULL) */
-} FunctionScanPerFuncState;
-
-static TupleTableSlot *FunctionNext(FunctionScanState *node);
+#include "utils/syscache.h"
 
 
 /* ----------------------------------------------------------------
@@ -82,37 +87,22 @@ FunctionNext(FunctionScanState *node)
 		 * into the scan result slot. No need to update ordinality or
 		 * rowcounts either.
 		 */
-		Tuplestorestate *tstore = node->funcstates[0].tstore;
+		TupleTableSlot *rs = node->funcstates[0].scanstate->ps.ps_ResultTupleSlot;
 
 		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
+		 * Get the next tuple from the Scan node.
+		 *
+		 * In this simple case there is no ordinality column and only one
+		 * function, so no rowcount bookkeeping is needed; an empty slot
+		 * simply means the scan is exhausted.
 		 */
-		if (tstore == NULL)
-		{
-			node->funcstates[0].tstore = tstore =
-				ExecMakeTableFunctionResult(node->funcstates[0].setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											node->funcstates[0].tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
+		rs = ExecProcNode(&node->funcstates[0].scanstate->ps);
 
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(tstore);
-		}
+		if (TupIsNull(rs))
+			return NULL;
+
+		ExecCopySlot(scanslot, rs);
 
-		/*
-		 * Get the next tuple from tuplestore.
-		 */
-		(void) tuplestore_gettupleslot(tstore,
-									   ScanDirectionIsForward(direction),
-									   false,
-									   scanslot);
 		return scanslot;
 	}
 
@@ -141,46 +131,22 @@ FunctionNext(FunctionScanState *node)
 	for (funcno = 0; funcno < node->nfuncs; funcno++)
 	{
 		FunctionScanPerFuncState *fs = &node->funcstates[funcno];
+		TupleTableSlot *func_slot = fs->scanstate->ps.ps_ResultTupleSlot;
 		int			i;
 
 		/*
-		 * If first time through, read all tuples from function and put them
-		 * in a tuplestore. Subsequent calls just fetch tuples from
-		 * tuplestore.
-		 */
-		if (fs->tstore == NULL)
-		{
-			fs->tstore =
-				ExecMakeTableFunctionResult(fs->setexpr,
-											node->ss.ps.ps_ExprContext,
-											node->argcontext,
-											fs->tupdesc,
-											node->eflags & EXEC_FLAG_BACKWARD);
-
-			/*
-			 * paranoia - cope if the function, which may have constructed the
-			 * tuplestore itself, didn't leave it pointing at the start. This
-			 * call is fast, so the overhead shouldn't be an issue.
-			 */
-			tuplestore_rescan(fs->tstore);
-		}
-
-		/*
-		 * Get the next tuple from tuplestore.
+		 * Get the next tuple from the Scan node.
 		 *
 		 * If we have a rowcount for the function, and we know the previous
 		 * read position was out of bounds, don't try the read. This allows
 		 * backward scan to work when there are mixed row counts present.
 		 */
 		if (fs->rowcount != -1 && fs->rowcount < oldpos)
-			ExecClearTuple(fs->func_slot);
+			ExecClearTuple(func_slot);
 		else
-			(void) tuplestore_gettupleslot(fs->tstore,
-										   ScanDirectionIsForward(direction),
-										   false,
-										   fs->func_slot);
+			func_slot = ExecProcNode(&fs->scanstate->ps);
 
-		if (TupIsNull(fs->func_slot))
+		if (TupIsNull(func_slot))
 		{
 			/*
 			 * If we ran out of data for this function in the forward
@@ -207,12 +173,12 @@ FunctionNext(FunctionScanState *node)
 			/*
 			 * we have a result, so just copy it to the result cols.
 			 */
-			slot_getallattrs(fs->func_slot);
+			slot_getallattrs(func_slot);
 
 			for (i = 0; i < fs->colcount; i++)
 			{
-				scanslot->tts_values[att] = fs->func_slot->tts_values[i];
-				scanslot->tts_isnull[att] = fs->func_slot->tts_isnull[i];
+				scanslot->tts_values[att] = func_slot->tts_values[i];
+				scanslot->tts_isnull[att] = func_slot->tts_isnull[i];
 				att++;
 			}
 
@@ -272,6 +238,53 @@ ExecFunctionScan(PlanState *pstate)
 					(ExecScanRecheckMtd) FunctionRecheck);
 }
 
+/*
+ * Helper function to build the target lists from the tupdesc; they are
+ * required for normal ExecInit processing.
+ */
+static void
+build_tlist_for_tupdesc(TupleDesc tupdesc, int colcount,
+						List **mat_tlist, List **scan_tlist)
+{
+	Form_pg_attribute attr;
+	int attno;
+
+	for (attno = 1; attno <= colcount; attno++)
+	{
+		attr = TupleDescAttr(tupdesc, attno - 1);
+
+		if (attr->attisdropped)
+		{
+			*scan_tlist = lappend(*scan_tlist,
+							  makeTargetEntry((Expr *)
+								  makeConst(INT2OID, -1,
+											0,
+											attr->attlen,
+											0 /* value */, true /* isnull */,
+											true),
+								  attno, attr->attname.data,
+								  attr->attisdropped));
+			*mat_tlist = lappend(*mat_tlist,
+							 makeTargetEntry((Expr *)
+								 makeVar(1 /* varno */, attno, INT2OID, -1, 0, 0),
+								 attno, attr->attname.data, attr->attisdropped));
+		}
+		else
+		{
+			*scan_tlist = lappend(*scan_tlist,
+							  makeTargetEntry((Expr *)
+								  makeVar(1 /* varno */, attno, attr->atttypid,
+										  attr->atttypmod, attr->attcollation, 0),
+								  attno, attr->attname.data, attr->attisdropped));
+			*mat_tlist = lappend(*mat_tlist,
+							 makeTargetEntry((Expr *)
+								 makeVar(1 /* varno */, attno, attr->atttypid,
+										 attr->atttypmod, attr->attcollation, 0),
+								 attno, attr->attname.data, attr->attisdropped));
+		}
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecInitFunctionScan
  * ----------------------------------------------------------------
@@ -285,6 +298,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	int			i,
 				natts;
 	ListCell   *lc;
+	bool 		needs_material;
 
 	/* check for unsupported flags */
 	Assert(!(eflags & EXEC_FLAG_MARK));
@@ -315,6 +329,9 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	else
 		scanstate->simple = false;
 
+	/* Only add a Materialize node if required */
+	needs_material = (eflags & (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD)) != 0;
+
 	/*
 	 * Ordinal 0 represents the "before the first row" position.
 	 *
@@ -347,23 +364,17 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 		TypeFuncClass functypclass;
 		Oid			funcrettype;
 		TupleDesc	tupdesc;
+		SRFScanPlan *srfscan;
+		Plan *scan;
+		List /* TargetEntry* */ *mat_tlist = NIL;
+		List /* TargetEntry* */ *scan_tlist = NIL;
+		bool funcReturnsTuple;
 
-		fs->setexpr =
-			ExecInitTableFunctionResult((Expr *) funcexpr,
-										scanstate->ss.ps.ps_ExprContext,
-										&scanstate->ss.ps);
-
-		/*
-		 * Don't allocate the tuplestores; the actual calls to the functions
-		 * do that.  NULL means that we have not called the function yet (or
-		 * need to call it again after a rescan).
-		 */
-		fs->tstore = NULL;
 		fs->rowcount = -1;
 
 		/*
 		 * Now determine if the function returns a simple or composite type,
-		 * and build an appropriate tupdesc.  Note that in the composite case,
+		 * and build an appropriate targetlist.  Note that in the composite case,
 		 * the function may now return more columns than it did when the plan
 		 * was made; we have to ignore any columns beyond "colcount".
 		 */
@@ -379,6 +390,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			Assert(tupdesc->natts >= colcount);
 			/* Must copy it out of typcache for safety */
 			tupdesc = CreateTupleDescCopy(tupdesc);
+			funcReturnsTuple = true;
 		}
 		else if (functypclass == TYPEFUNC_SCALAR)
 		{
@@ -393,6 +405,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			TupleDescInitEntryCollation(tupdesc,
 										(AttrNumber) 1,
 										exprCollation(funcexpr));
+			funcReturnsTuple = false;
 		}
 		else if (functypclass == TYPEFUNC_RECORD)
 		{
@@ -407,6 +420,7 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			 * case it doesn't.)
 			 */
 			BlessTupleDesc(tupdesc);
+			funcReturnsTuple = true;
 		}
 		else
 		{
@@ -414,21 +428,45 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 			elog(ERROR, "function in FROM has unsupported return type");
 		}
 
-		fs->tupdesc = tupdesc;
 		fs->colcount = colcount;
 
-		/*
-		 * We only need separate slots for the function results if we are
-		 * doing ordinality or multiple functions; otherwise, we'll fetch
-		 * function results directly into the scan slot.
-		 */
-		if (!scanstate->simple)
+		/* Expand tupdesc into targetlists for the scan nodes */
+		build_tlist_for_tupdesc(tupdesc, colcount, &mat_tlist, &scan_tlist);
+
+		srfscan = makeNode(SRFScanPlan);
+		srfscan->funcexpr = funcexpr;
+		srfscan->rtfunc = (Node *) rtfunc;
+		srfscan->plan.targetlist = scan_tlist;
+		srfscan->plan.extParam = rtfunc->funcparams;
+		srfscan->plan.allParam = rtfunc->funcparams;
+		srfscan->funcResultDesc = tupdesc;
+		srfscan->funcReturnsTuple = funcReturnsTuple;
+		scan = &srfscan->plan;
+
+		if (needs_material)
 		{
-			fs->func_slot = ExecInitExtraTupleSlot(estate, fs->tupdesc,
-												   &TTSOpsMinimalTuple);
+			Material *fscan = makeNode(Material);
+			fscan->plan.lefttree = scan;
+			fscan->plan.targetlist = mat_tlist;
+			fscan->plan.extParam = rtfunc->funcparams;
+			fscan->plan.allParam = rtfunc->funcparams;
+			scan = &fscan->plan;
+		}
+
+		fs->scanstate = (ScanState *) ExecInitNode (scan, estate, eflags);
+
+		if (needs_material)
+		{
+			/*
+			 * Tell the SRFScan about its parent, so that it can donate
+			 * the SRF's tuplestore if the SRF uses SFRM_Materialize.
+			 */
+			MaterialState *ms = (MaterialState *) fs->scanstate;
+			SRFScanState *sss = (SRFScanState *) outerPlanState(ms);
+
+			sss->setexpr->funcResultStoreDonationEnabled = true;
+			sss->setexpr->funcResultStoreDonationTarget = &ms->ss.ps;
 		}
-		else
-			fs->func_slot = NULL;
 
 		natts += colcount;
 		i++;
@@ -443,7 +481,11 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 	 */
 	if (scanstate->simple)
 	{
-		scan_tupdesc = CreateTupleDescCopy(scanstate->funcstates[0].tupdesc);
+		SRFScanState *sss = IsA(scanstate->funcstates[0].scanstate, MaterialState) ?
+				(SRFScanState *) outerPlanState((MaterialState *) scanstate->funcstates[0].scanstate) :
+				(SRFScanState *) scanstate->funcstates[0].scanstate;
+
+		scan_tupdesc = CreateTupleDescCopy(sss->setexpr->funcResultDesc);
 		scan_tupdesc->tdtypeid = RECORDOID;
 		scan_tupdesc->tdtypmod = -1;
 	}
@@ -458,8 +500,12 @@ ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags)
 
 		for (i = 0; i < nfuncs; i++)
 		{
-			TupleDesc	tupdesc = scanstate->funcstates[i].tupdesc;
-			int			colcount = scanstate->funcstates[i].colcount;
+			SRFScanState *sss = IsA(scanstate->funcstates[i].scanstate, MaterialState) ?
+					(SRFScanState *) outerPlanState((MaterialState *) scanstate->funcstates[i].scanstate) :
+					(SRFScanState *) scanstate->funcstates[i].scanstate;
+
+			TupleDesc	tupdesc = sss->setexpr->funcResultDesc;
+			int			colcount = sss->colcount;
 			int			j;
 
 			for (j = 1; j <= colcount; j++)
@@ -536,20 +582,11 @@ ExecEndFunctionScan(FunctionScanState *node)
 	ExecClearTuple(node->ss.ss_ScanTupleSlot);
 
 	/*
-	 * Release slots and tuplestore resources
+	 * Release the Material scan resources
 	 */
 	for (i = 0; i < node->nfuncs; i++)
 	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-
-		if (fs->tstore != NULL)
-		{
-			tuplestore_end(node->funcstates[i].tstore);
-			fs->tstore = NULL;
-		}
+		ExecEndNode(&node->funcstates[i].scanstate->ps);
 	}
 }
 
@@ -568,23 +605,12 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 	if (node->ss.ps.ps_ResultTupleSlot)
 		ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
-	for (i = 0; i < node->nfuncs; i++)
-	{
-		FunctionScanPerFuncState *fs = &node->funcstates[i];
-
-		if (fs->func_slot)
-			ExecClearTuple(fs->func_slot);
-	}
 
 	ExecScanReScan(&node->ss);
 
 	/*
-	 * Here we have a choice whether to drop the tuplestores (and recompute
-	 * the function outputs) or just rescan them.  We must recompute if an
-	 * expression contains changed parameters, else we rescan.
-	 *
-	 * XXX maybe we should recompute if the function is volatile?  But in
-	 * general the executor doesn't conditionalize its actions on that.
+	 * We must recompute if an
+	 * expression contains changed parameters.
 	 */
 	if (chgparam)
 	{
@@ -597,11 +623,9 @@ ExecReScanFunctionScan(FunctionScanState *node)
 
 			if (bms_overlap(chgparam, rtfunc->funcparams))
 			{
-				if (node->funcstates[i].tstore != NULL)
-				{
-					tuplestore_end(node->funcstates[i].tstore);
-					node->funcstates[i].tstore = NULL;
-				}
+				UpdateChangedParamSet(&node->funcstates[i].scanstate->ps,
+									  node->ss.ps.chgParam);
+
 				node->funcstates[i].rowcount = -1;
 			}
 			i++;
@@ -611,10 +635,9 @@ ExecReScanFunctionScan(FunctionScanState *node)
 	/* Reset ordinality counter */
 	node->ordinal = 0;
 
-	/* Make sure we rewind any remaining tuplestores */
+	/* Rescan them all */
 	for (i = 0; i < node->nfuncs; i++)
 	{
-		if (node->funcstates[i].tstore != NULL)
-			tuplestore_rescan(node->funcstates[i].tstore);
+		ExecReScan(&node->funcstates[i].scanstate->ps);
 	}
 }
diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c
index dd077f4..2fde0dc 100644
--- a/src/backend/executor/nodeMaterial.c
+++ b/src/backend/executor/nodeMaterial.c
@@ -45,9 +45,12 @@ ExecMaterial(PlanState *pstate)
 	Tuplestorestate *tuplestorestate;
 	bool		eof_tuplestore;
 	TupleTableSlot *slot;
+	bool 		first_time = true;
 
 	CHECK_FOR_INTERRUPTS();
 
+restart:
+
 	/*
 	 * get state info from node
 	 */
@@ -126,12 +129,24 @@ ExecMaterial(PlanState *pstate)
 		PlanState  *outerNode;
 		TupleTableSlot *outerslot;
 
+		if (!first_time)
+			ereport(ERROR,
+					(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+					 errmsg("attempt to scan donated result store failed")));
+
 		/*
 		 * We can only get here with forward==true, so no need to worry about
 		 * which direction the subplan will go.
 		 */
 		outerNode = outerPlanState(node);
 		outerslot = ExecProcNode(outerNode);
+
+		if (node->tuplestore_donated)
+		{
+			first_time = false;
+			goto restart;
+		}
+
 		if (TupIsNull(outerslot))
 		{
 			node->eof_underlying = true;
@@ -196,6 +211,7 @@ ExecInitMaterial(Material *node, EState *estate, int eflags)
 
 	matstate->eof_underlying = false;
 	matstate->tuplestorestate = NULL;
+	matstate->tuplestore_donated = false;
 
 	/*
 	 * Miscellaneous initialization
@@ -346,6 +362,7 @@ ExecReScanMaterial(MaterialState *node)
 		{
 			tuplestore_end(node->tuplestorestate);
 			node->tuplestorestate = NULL;
+			node->tuplestore_donated = false;
 			if (outerPlan->chgParam == NULL)
 				ExecReScan(outerPlan);
 			node->eof_underlying = false;
@@ -361,8 +378,29 @@ ExecReScanMaterial(MaterialState *node)
 		 * if chgParam of subnode is not null then plan will be re-scanned by
 		 * first ExecProcNode.
 		 */
+		node->tuplestore_donated = false;
 		if (outerPlan->chgParam == NULL)
 			ExecReScan(outerPlan);
 		node->eof_underlying = false;
 	}
 }
+
+void
+ExecMaterialReceiveResultStore(MaterialState *node, Tuplestorestate *store)
+{
+	if (!node->tuplestore_donated)
+	{
+		if (node->tuplestorestate)
+		{
+			tuplestore_end(node->tuplestorestate);
+		}
+
+		node->tuplestorestate = store;
+		node->tuplestore_donated = true;
+		node->eof_underlying = true;
+	}
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_SRF_PROTOCOL_VIOLATED),
+				 errmsg("Result tuplestore donated more than once")));
+}
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index b07c299..48d7db5 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -293,9 +293,16 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
 	 * such parameters, then there is no point in REWIND support at all in the
 	 * inner child, because it will always be rescanned with fresh parameter
 	 * values.
+	 *
+	 * The exception to this simple rule is a ROWS FROM function scan where it
+	 * is possible that only some of the involved functions are affected by the
+	 * parameters. In this case, we blanket request support for REWIND. A more
+	 * intelligent approach would request REWIND only for nodes unaffected by
+	 * the parameters, but we aren't so intelligent yet.
 	 */
 	outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
-	if (node->nestParams == NIL)
+	if (node->nestParams == NIL ||
+		IsA(innerPlan(node), FunctionScan))
 		eflags |= EXEC_FLAG_REWIND;
 	else
 		eflags &= ~EXEC_FLAG_REWIND;
diff --git a/src/backend/executor/nodeProjectSet.c b/src/backend/executor/nodeProjectSet.c
index 4a1b060..66a1d30 100644
--- a/src/backend/executor/nodeProjectSet.c
+++ b/src/backend/executor/nodeProjectSet.c
@@ -283,6 +283,7 @@ ExecInitProjectSet(ProjectSet *node, EState *estate, int eflags)
 			state->elems[off] = (Node *)
 				ExecInitFunctionResultSet(expr, state->ps.ps_ExprContext,
 										  &state->ps);
+			Assert (((SetExprState *) state->elems[off])->funcReturnsSet);
 		}
 		else
 		{
diff --git a/src/backend/executor/nodeSRFScan.c b/src/backend/executor/nodeSRFScan.c
new file mode 100644
index 0000000..35ef67e
--- /dev/null
+++ b/src/backend/executor/nodeSRFScan.c
@@ -0,0 +1,262 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeSRFScan.c
+ *	  Coordinates a scan over a single SRF function, or a non-SRF as if it
+ *    were an SRF returning a single row.
+ *
+ *    SRFScan expands the function's output if it returns a tuple. If the
+ *    SRF uses SFRM_Materialize, it will donate the returned tuplestore to
+ *    the parent Materialize node, if there is one, to avoid double-
+ *    materialisation.
+ *
+ *    The Planner knows nothing of the SRFScan structure. It is constructed
+ *    by the Executor at execution time, and is reported in the EXPLAIN
+ *    output.
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeSRFScan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "catalog/pg_type.h"
+#include "executor/nodeSRFScan.h"
+#include "executor/nodeMaterial.h"
+#include "funcapi.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/makefuncs.h"
+#include "parser/parse_type.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/syscache.h"
+
+static TupleTableSlot *			/* result tuple from subplan */
+ExecSRF(PlanState *node)
+{
+	SRFScanState *pstate = (SRFScanState *) node;
+	ExprContext *econtext = pstate->ss.ps.ps_ExprContext;
+	TupleTableSlot *resultSlot = pstate->ss.ps.ps_ResultTupleSlot;
+	Datum result;
+	ExprDoneCond *isdone = &pstate->elemdone;
+	bool	   isnull;
+	SetExprState *setexpr = pstate->setexpr;
+	FunctionCallInfo fcinfo;
+	ReturnSetInfo *rsinfo;
+
+	/* We only support forward scans. */
+	Assert(ScanDirectionIsForward(pstate->ss.ps.state->es_direction));
+
+	ExecClearTuple(resultSlot);
+
+	/*
+	 * Only execute something if we are not already complete...
+	 */
+	if (*isdone == ExprEndResult)
+		return NULL;
+
+	/*
+	 * Evaluate SRF - possibly continuing previously started output.
+	 */
+	result = ExecMakeFunctionResultSet((SetExprState *) setexpr,
+										econtext, pstate->argcontext,
+										&isnull, isdone);
+
+	if (*isdone == ExprEndResult)
+		return NULL;
+
+	fcinfo = setexpr->fcinfo;
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	/* Have we donated the result store? */
+	if (setexpr->funcResultStoreDonated)
+		return NULL;
+
+	/*
+	 * If we obtained a tupdesc, check it is appropriate, but not in
+	 * the case of SFRM_Materialize because it will have been checked
+	 * already.
+	 */
+	if (!pstate->tupdesc_checked &&
+		setexpr->funcReturnsTuple &&
+		rsinfo->returnMode != SFRM_Materialize &&
+		rsinfo->setDesc && setexpr->funcResultDesc)
+	{
+		tupledesc_match (setexpr->funcResultDesc, rsinfo->setDesc);
+		pstate->tupdesc_checked = true;
+	}
+
+	/*
+	 * If the function returned a tuple, expand it into multiple columns.
+	 */
+	if (setexpr->funcReturnsTuple)
+	{
+		if (!isnull)
+		{
+			HeapTupleHeader td = DatumGetHeapTupleHeader(result);
+			HeapTupleData tmptup;
+
+			/*
+			 * In SFRM_Materialize mode, the type will have been checked
+			 * already.
+			 */
+			if (rsinfo->returnMode != SFRM_Materialize)
+			{
+				/*
+				 * Verify all later returned rows have same subtype;
+				 * necessary in case the type is RECORD.
+				 */
+				if (HeapTupleHeaderGetTypeId(td) != rsinfo->setDesc->tdtypeid ||
+					HeapTupleHeaderGetTypMod(td) != rsinfo->setDesc->tdtypmod)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATATYPE_MISMATCH),
+							 errmsg("rows returned by function are not all of the same row type")));
+			}
+
+			/*
+			 * heap_deform_tuple needs a HeapTuple not a bare
+			 * HeapTupleHeader, but it doesn't need all the fields.
+			 */
+			tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
+			tmptup.t_data = td;
+
+			heap_deform_tuple (&tmptup, setexpr->funcResultDesc,
+							   resultSlot->tts_values,
+							   resultSlot->tts_isnull);
+		}
+		else
+		{
+			/*
+			 * populate the result cols with nulls
+			 */
+			int i;
+			for (i = 0; i < pstate->colcount; i++)
+			{
+				resultSlot->tts_values[i] = (Datum) 0;
+				resultSlot->tts_isnull[i] = true;
+			}
+		}
+	}
+	else
+	{
+		/* Scalar-type case: just store the function result */
+		resultSlot->tts_values[0] = result;
+		resultSlot->tts_isnull[0] = isnull;
+	}
+
+	/*
+	 * If we obtained a single result, don't execute again.
+	 */
+	if (*isdone == ExprSingleResult)
+		*isdone = ExprEndResult;
+
+	ExecStoreVirtualTuple(resultSlot);
+	return resultSlot;
+}
+
+SRFScanState *
+ExecInitSRFScan(SRFScanPlan *node, EState *estate, int eflags)
+{
+	RangeTblFunction *rtfunc = (RangeTblFunction *) node->rtfunc;
+
+	SRFScanState *srfstate;
+
+	/*
+	 * SRFScan should not have any children.
+	 */
+	Assert(outerPlan(node) == NULL);
+	Assert(innerPlan(node) == NULL);
+
+	/*
+	 * create state structure
+	 */
+	srfstate = makeNode(SRFScanState);
+	srfstate->ss.ps.plan = (Plan *) node;
+	srfstate->ss.ps.state = estate;
+	srfstate->ss.ps.ExecProcNode = ExecSRF;
+
+	/*
+	 * Miscellaneous initialization
+	 *
+	 * create expression context for node
+	 */
+	ExecAssignExprContext(estate, &srfstate->ss.ps);
+
+	srfstate->setexpr =
+		ExecInitFunctionResultSet((Expr *) node->funcexpr,
+								  srfstate->ss.ps.ps_ExprContext,
+								  &srfstate->ss.ps);
+
+	srfstate->setexpr->funcResultDesc = node->funcResultDesc;
+	srfstate->setexpr->funcReturnsTuple = node->funcReturnsTuple;
+
+	srfstate->colcount = rtfunc->funccolcount;
+
+	srfstate->tupdesc_checked = false;
+
+	/* Start with the assumption we will get some result. */
+	srfstate->elemdone = ExprSingleResult;
+
+	/*
+	 * Initialize result type and slot. No need to initialize projection info
+	 * because this node doesn't do projections (ps_ResultTupleSlot).
+	 *
+	 * SRFScan nodes only return tuples produced by the function call.
+	 */
+	ExecInitScanTupleSlot(estate, &srfstate->ss, srfstate->setexpr->funcResultDesc,
+						  &TTSOpsMinimalTuple);
+	ExecInitResultTupleSlotTL(&srfstate->ss.ps, &TTSOpsMinimalTuple);
+	ExecAssignScanProjectionInfo(&srfstate->ss);
+
+	/*
+	 * Create a memory context that ExecMakeFunctionResultSet can use to
+	 * evaluate function arguments in.  We can't use the per-tuple context for
+	 * this because it gets reset too often; but we don't want to leak
+	 * evaluation results into the query-lifespan context either.  We use one
+	 * context for the arguments of all tSRFs, as they have roughly equivalent
+	 * lifetimes.
+	 */
+	srfstate->argcontext = AllocSetContextCreate(CurrentMemoryContext,
+											  "SRF function arguments",
+											  ALLOCSET_DEFAULT_SIZES);
+	return srfstate;
+}
+
+void
+ExecEndSRFScan(SRFScanState *node)
+{
+	/* Nothing to do */
+}
+
+void
+ExecReScanSRF(SRFScanState *node)
+{
+	/* Expecting some results. */
+	node->elemdone = ExprSingleResult;
+
+	/* We must re-evaluate function call arguments. */
+	node->setexpr->setArgsValid = false;
+}
+
+bool
+ExecSRFDonateResultTuplestore(SetExprState *fcache)
+{
+	if (fcache->funcResultStoreDonationEnabled)
+	{
+		if (IsA (fcache->funcResultStoreDonationTarget, MaterialState))
+		{
+			MaterialState *target = (MaterialState *) fcache->funcResultStoreDonationTarget;
+
+			ExecMaterialReceiveResultStore(target, fcache->funcResultStore);
+
+			fcache->funcResultStore = NULL;
+
+			fcache->funcResultStoreDonated = true;
+
+			return true;
+		}
+	}
+
+	return false;
+}
diff --git a/src/include/access/tupdesc.h b/src/include/access/tupdesc.h
index d17af13..118fdcb 100644
--- a/src/include/access/tupdesc.h
+++ b/src/include/access/tupdesc.h
@@ -151,4 +151,6 @@ extern TupleDesc BuildDescForRelation(List *schema);
 
 extern TupleDesc BuildDescFromLists(List *names, List *types, List *typmods, List *collations);
 
+extern void tupledesc_match(TupleDesc dst_tupdesc, TupleDesc src_tupdesc);
+
 #endif							/* TUPDESC_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9489051..8ada13e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -409,13 +409,6 @@ extern bool ExecCheck(ExprState *state, ExprContext *context);
 /*
  * prototypes from functions in execSRF.c
  */
-extern SetExprState *ExecInitTableFunctionResult(Expr *expr,
-												 ExprContext *econtext, PlanState *parent);
-extern Tuplestorestate *ExecMakeTableFunctionResult(SetExprState *setexpr,
-													ExprContext *econtext,
-													MemoryContext argContext,
-													TupleDesc expectedDesc,
-													bool randomAccess);
 extern SetExprState *ExecInitFunctionResultSet(Expr *expr,
 											   ExprContext *econtext, PlanState *parent);
 extern Datum ExecMakeFunctionResultSet(SetExprState *fcache,
diff --git a/src/include/executor/nodeFunctionscan.h b/src/include/executor/nodeFunctionscan.h
index 74e8eef..ca89980 100644
--- a/src/include/executor/nodeFunctionscan.h
+++ b/src/include/executor/nodeFunctionscan.h
@@ -16,6 +16,16 @@
 
 #include "nodes/execnodes.h"
 
+/*
+ * Runtime data for each function being scanned.
+ */
+typedef struct FunctionScanPerFuncState
+{
+	int			colcount;		/* expected number of result columns */
+	int64		rowcount;		/* # of rows in result set, -1 if not known */
+	ScanState  *scanstate;		/* scan node: either SRFScan or Materialize */
+} FunctionScanPerFuncState;
+
 extern FunctionScanState *ExecInitFunctionScan(FunctionScan *node, EState *estate, int eflags);
 extern void ExecEndFunctionScan(FunctionScanState *node);
 extern void ExecReScanFunctionScan(FunctionScanState *node);
diff --git a/src/include/executor/nodeMaterial.h b/src/include/executor/nodeMaterial.h
index 99e7cbf..f55922c 100644
--- a/src/include/executor/nodeMaterial.h
+++ b/src/include/executor/nodeMaterial.h
@@ -21,5 +21,6 @@ extern void ExecEndMaterial(MaterialState *node);
 extern void ExecMaterialMarkPos(MaterialState *node);
 extern void ExecMaterialRestrPos(MaterialState *node);
 extern void ExecReScanMaterial(MaterialState *node);
+extern void ExecMaterialReceiveResultStore(MaterialState *node, Tuplestorestate *store);
 
 #endif							/* NODEMATERIAL_H */
diff --git a/src/include/executor/nodeSRFScan.h b/src/include/executor/nodeSRFScan.h
new file mode 100644
index 0000000..2430de5
--- /dev/null
+++ b/src/include/executor/nodeSRFScan.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * IDENTIFICATION
+ *	  src/include/executor/nodeSRFScan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef nodeSRFScan_h
+#define nodeSRFScan_h
+
+#include "nodes/execnodes.h"
+
+typedef struct
+{
+	ScanState		ss;					/* its first field is NodeTag */
+	SetExprState 	*setexpr;			/* state of the expression being evaluated */
+	ExprDoneCond	elemdone;
+	int				colcount;			/* # of columns */
+	bool			tupdesc_checked;	/* has the return tupdesc been checked? */
+	MemoryContext 	argcontext;			/* context for SRF arguments */
+	PlanState		*parent;			/* the plan's parent node */
+} SRFScanState;
+
+extern SRFScanState *ExecInitSRFScan(SRFScanPlan *node, EState *estate, int eflags);
+extern void ExecEndSRFScan(SRFScanState *node);
+extern void ExecReScanSRF(SRFScanState *node);
+extern bool ExecSRFDonateResultTuplestore(SetExprState *fcache);
+
+#endif /* nodeSRFScan_h */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cd3ddf7..f1c8085 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -797,10 +797,16 @@ typedef struct SetExprState
 	/*
 	 * For a set-returning function (SRF) that returns a tuplestore, we keep
 	 * the tuplestore here and dole out the result rows one at a time. The
-	 * slot holds the row currently being returned.
+	 * slot holds the row currently being returned. The boolean
+	 * funcResultStoreDonationEnabled indicates whether an SRF that
+	 * returns a tuplestore (SFRM_Materialize) should attempt to donate
+	 * that tuplestore to a higher-level Materialize node.
 	 */
 	Tuplestorestate *funcResultStore;
 	TupleTableSlot *funcResultSlot;
+	bool 		funcResultStoreDonationEnabled;
+	bool 		funcResultStoreDonated;
+	struct PlanState *funcResultStoreDonationTarget;
 
 	/*
 	 * In some cases we need to compute a tuple descriptor for the function's
@@ -1651,6 +1657,7 @@ typedef struct SubqueryScanState
  *		funcstates			per-function execution states (private in
  *							nodeFunctionscan.c)
  *		argcontext			memory context to evaluate function arguments in
+ *		pending_srf_tuples	still evaluating any SRFs?
  * ----------------
  */
 struct FunctionScanPerFuncState;
@@ -1978,6 +1985,7 @@ typedef struct MaterialState
 	int			eflags;			/* capability flags to pass to tuplestore */
 	bool		eof_underlying; /* reached end of underlying plan? */
 	Tuplestorestate *tuplestorestate;
+	bool		tuplestore_donated; /* was tuplestore donated by another node? */
 } MaterialState;
 
 /* ----------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe..24a72a2 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -514,7 +514,9 @@ typedef enum NodeTag
 	T_SupportRequestSelectivity,	/* in nodes/supportnodes.h */
 	T_SupportRequestCost,		/* in nodes/supportnodes.h */
 	T_SupportRequestRows,		/* in nodes/supportnodes.h */
-	T_SupportRequestIndexCondition	/* in nodes/supportnodes.h */
+	T_SupportRequestIndexCondition,	/* in nodes/supportnodes.h */
+	T_SRFScanPlan,
+	T_SRFScanState
 } NodeTag;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4869fe7..3486ede 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -16,6 +16,7 @@
 
 #include "access/sdir.h"
 #include "access/stratnum.h"
+#include "access/tupdesc.h"
 #include "lib/stringinfo.h"
 #include "nodes/bitmapset.h"
 #include "nodes/lockoptions.h"
@@ -546,6 +547,14 @@ typedef struct TableFuncScan
 	TableFunc  *tablefunc;		/* table function node */
 } TableFuncScan;
 
+typedef struct SRFScanPlan {
+	Plan		plan;
+	Node		*funcexpr;
+	Node 		*rtfunc;
+	TupleDesc	funcResultDesc;		/* tuple descriptor for the function's output columns */
+	bool		funcReturnsTuple;
+} SRFScanPlan;
+
 /* ----------------
  *		CteScan node
  * ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f457b5b..ab8e222 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -514,13 +514,15 @@ order by 1, 2;
          ->  Function Scan on pg_catalog.generate_series s1
                Output: s1.s1
                Function Call: generate_series(1, 3)
+               ->  SRF Scan
          ->  HashAggregate
                Output: s2.s2, sum((s1.s1 + s2.s2))
                Group Key: s2.s2
                ->  Function Scan on pg_catalog.generate_series s2
                      Output: s2.s2
                      Function Call: generate_series(1, 3)
-(14 rows)
+                     ->  SRF Scan
+(16 rows)
 
 select s1, s2, sm
 from generate_series(1, 3) s1,
@@ -549,6 +551,7 @@ select array(select sum(x+y) s
  Function Scan on pg_catalog.generate_series x
    Output: (SubPlan 1)
    Function Call: generate_series(1, 3)
+   ->  SRF Scan
    SubPlan 1
      ->  Sort
            Output: (sum((x.x + y.y))), y.y
@@ -559,7 +562,8 @@ select array(select sum(x+y) s
                  ->  Function Scan on pg_catalog.generate_series y
                        Output: y.y
                        Function Call: generate_series(1, 3)
-(13 rows)
+                       ->  SRF Scan
+(15 rows)
 
 select array(select sum(x+y) s
             from generate_series(1,3) y group by y order by s)
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index c1f802c..5eb7dba 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -374,7 +374,8 @@ select g as alias1, g as alias2
    ->  Sort
          Sort Key: g
          ->  Function Scan on generate_series g
-(6 rows)
+               ->  SRF Scan
+(7 rows)
 
 select g as alias1, g as alias2
   from generate_series(1,3) g
@@ -1234,7 +1235,9 @@ explain (costs off)
          ->  Nested Loop
                ->  Values Scan on "*VALUES*"
                ->  Function Scan on gstest_data
-(8 rows)
+                     ->  Materialize
+                           ->  SRF Scan
+(10 rows)
 
 select *
   from (values (1),(2)) v(x),
@@ -1358,7 +1361,9 @@ explain (costs off)
          ->  Nested Loop
                ->  Values Scan on "*VALUES*"
                ->  Function Scan on gstest_data
-(10 rows)
+                     ->  Materialize
+                           ->  SRF Scan
+(12 rows)
 
 -- Verify that we correctly handle the child node returning a
 -- non-minimal slot, which happens if the input is pre-sorted,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index dfd0ee4..c339722 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1684,6 +1684,7 @@ FROM generate_series(1, 3) g(i);
                            QUERY PLAN                           
 ----------------------------------------------------------------
  Function Scan on generate_series g
+   ->  SRF Scan
    SubPlan 1
      ->  Limit
            ->  Merge Append
@@ -1691,10 +1692,12 @@ FROM generate_series(1, 3) g(i);
                  ->  Sort
                        Sort Key: ((d.d + g.i))
                        ->  Function Scan on generate_series d
+                             ->  SRF Scan
                  ->  Sort
                        Sort Key: ((d_1.d + g.i))
                        ->  Function Scan on generate_series d_1
-(11 rows)
+                             ->  SRF Scan
+(14 rows)
 
 SELECT
     ARRAY(SELECT f.i FROM (
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b..3650aee 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3403,7 +3403,8 @@ select * from mki8(1,2);
  Function Scan on mki8
    Output: q1, q2
    Function Call: '(1,2)'::int8_tbl
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from mki8(1,2);
  q1 | q2 
@@ -3418,7 +3419,8 @@ select * from mki4(42);
  Function Scan on mki4
    Output: f1
    Function Call: '(42)'::int4_tbl
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from mki4(42);
  f1 
@@ -3660,9 +3662,10 @@ left join unnest(v1ys) as u1(u1y) on u1y = v2y;
          Hash Cond: (u1.u1y = "*VALUES*_1".column2)
          Filter: ("*VALUES*_1".column1 = "*VALUES*".column1)
          ->  Function Scan on unnest u1
+               ->  SRF Scan
          ->  Hash
                ->  Values Scan on "*VALUES*_1"
-(8 rows)
+(9 rows)
 
 select * from
 (values (1, array[10,20]), (2, array[20,30])) as v1(v1x,v1ys)
@@ -4475,7 +4478,9 @@ select 1 from (select a.id FROM a left join b on a.b_id = b.id) q,
    ->  Seq Scan on a
    ->  Function Scan on generate_series gs
          Filter: (a.id = i)
-(4 rows)
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
 
 rollback;
 create temp table parent (k int primary key, pd int);
@@ -4814,7 +4819,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)
 
 explain (costs off)
   select count(*) from tenk1 a cross join lateral generate_series(1,two) g;
@@ -4824,7 +4831,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)
 
 -- don't need the explicit LATERAL keyword for functions
 explain (costs off)
@@ -4835,7 +4844,9 @@ explain (costs off)
    ->  Nested Loop
          ->  Seq Scan on tenk1 a
          ->  Function Scan on generate_series g
-(4 rows)
+               ->  Materialize
+                     ->  SRF Scan
+(6 rows)
 
 -- lateral with UNION ALL subselect
 explain (costs off)
@@ -4846,12 +4857,13 @@ explain (costs off)
 ------------------------------------------
  Nested Loop
    ->  Function Scan on generate_series g
+         ->  SRF Scan
    ->  Append
          ->  Seq Scan on int8_tbl a
                Filter: (g.g = q1)
          ->  Seq Scan on int8_tbl b
                Filter: (g.g = q2)
-(7 rows)
+(8 rows)
 
 select * from generate_series(100,200) g,
   lateral (select * from int8_tbl a where g = q1 union all
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index e217b67..19b14f8 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -226,9 +226,10 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
  Hash Join
    Hash Cond: (g.g = a.unique1)
    ->  Function Scan on my_gen_series g
+         ->  SRF Scan
    ->  Hash
          ->  Seq Scan on tenk1 a
-(5 rows)
+(6 rows)
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
@@ -236,7 +237,8 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
 -------------------------------------------------
  Nested Loop
    ->  Function Scan on my_gen_series g
+         ->  SRF Scan
    ->  Index Scan using tenk1_unique1 on tenk1 a
          Index Cond: (unique1 = g.g)
-(4 rows)
+(5 rows)
 
diff --git a/src/test/regress/expected/pg_lsn.out b/src/test/regress/expected/pg_lsn.out
index 64d41df..e68adc1 100644
--- a/src/test/regress/expected/pg_lsn.out
+++ b/src/test/regress/expected/pg_lsn.out
@@ -87,13 +87,17 @@ SELECT DISTINCT (i || '/' || j)::pg_lsn f
          Group Key: ((((i.i)::text || '/'::text) || (j.j)::text))::pg_lsn
          ->  Nested Loop
                ->  Function Scan on generate_series k
+                     ->  SRF Scan
                ->  Materialize
                      ->  Nested Loop
                            ->  Function Scan on generate_series j
                                  Filter: ((j > 0) AND (j <= 10))
+                                 ->  SRF Scan
                            ->  Function Scan on generate_series i
                                  Filter: (i <= 10)
-(12 rows)
+                                 ->  Materialize
+                                       ->  SRF Scan
+(16 rows)
 
 SELECT DISTINCT (i || '/' || j)::pg_lsn f
   FROM generate_series(1, 10) i,
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index cd2c79f..67a6f39 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -3094,7 +3094,7 @@ select * from sc_test();
 
 create or replace function sc_test() returns setof integer as $$
 declare
-  c cursor for select * from generate_series(1, 10);
+  c scroll cursor for select * from generate_series(1, 10);
   x integer;
 begin
   open c;
@@ -4852,7 +4852,9 @@ select i, a from
    ->  Function Scan on public.consumes_rw_array i
          Output: i.i
          Function Call: consumes_rw_array((returns_rw_array(1)))
-(7 rows)
+         ->  Materialize
+               ->  SRF Scan
+(9 rows)
 
 select i, a from
   (select returns_rw_array(1) as a offset 0) ss,
@@ -4869,7 +4871,8 @@ select consumes_rw_array(a), a from returns_rw_array(1) a;
  Function Scan on public.returns_rw_array a
    Output: consumes_rw_array(a), a
    Function Call: returns_rw_array(1)
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select consumes_rw_array(a), a from returns_rw_array(1) a;
  consumes_rw_array |   a   
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index a70060b..7f96baa 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -1841,7 +1841,8 @@ explain (verbose, costs off)
  Function Scan on public.array_to_set t
    Output: f1, f2
    Function Call: array_to_set('{one,two}'::text[])
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 -- but without, it can be:
 create or replace function array_to_set(anyarray) returns setof record as $$
@@ -1879,7 +1880,8 @@ explain (verbose, costs off)
  Function Scan on pg_catalog.generate_subscripts i
    Output: i.i, ('{one,two}'::text[])[i.i]
    Function Call: generate_subscripts('{one,two}'::text[], 1)
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 create temp table rngfunc(f1 int8, f2 int8);
 create function testrngfunc() returns record as $$
@@ -1950,7 +1952,8 @@ select * from testrngfunc();
  Function Scan on testrngfunc
    Output: f1, f2
    Function Call: '(7.136178,7.14)'::rngfunc_type
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from testrngfunc();
     f1    |  f2  
@@ -1982,7 +1985,8 @@ select * from testrngfunc();
  Function Scan on public.testrngfunc
    Output: f1, f2
    Function Call: testrngfunc()
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from testrngfunc();
     f1    |  f2  
@@ -2048,7 +2052,8 @@ select * from testrngfunc();
  Function Scan on public.testrngfunc
    Output: f1, f2
    Function Call: testrngfunc()
-(3 rows)
+   ->  SRF Scan
+(4 rows)
 
 select * from testrngfunc();
     f1    |  f2  
@@ -2217,7 +2222,9 @@ select x from int8_tbl, extractq2(int8_tbl) f(x);
    ->  Function Scan on f
          Output: f.x
          Function Call: int8_tbl.q2
-(7 rows)
+         ->  Materialize
+               ->  SRF Scan
+(9 rows)
 
 select x from int8_tbl, extractq2(int8_tbl) f(x);
          x         
@@ -2306,3 +2313,155 @@ select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;
 (0 rows)
 
 drop type rngfunc2;
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+CREATE TEMPORARY SEQUENCE rngfunc_vpc_seq;
+CREATE TEMPORARY SEQUENCE rngfunc_mat_seq;
+CREATE TYPE rngfunc_vpc_t AS (i integer, s bigint);
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+CREATE FUNCTION rngfunc_vpc(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+	SELECT i, nextval('rngfunc_vpc_seq')
+		FROM generate_series($1,$2) i;
+$$
+LANGUAGE SQL;
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+CREATE FUNCTION rngfunc_mat(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+begin
+	for i in $1..$2 loop
+		return next (i, nextval('rngfunc_mat_seq'));
+	end loop;
+end;
+$$
+LANGUAGE plpgsql;
+-- A VPC SRF that is not part of a complex query should not materialize.
+-- 
+-- To illustrate this, we explain a simple VPC SRF scan, and note the
+-- absence of a Materialize node.
+--
+explain (costs off)
+	select * from rngfunc_vpc(1, 3) t;
+           QUERY PLAN           
+--------------------------------
+ Function Scan on rngfunc_vpc t
+   ->  SRF Scan
+(2 rows)
+
+-- A VPC SRF that aborts early should do so without emitting all results.
+-- 
+-- To illustrate this, we show that an SRF that uses a sequence does not
+-- have its value incremented if the SRF is not invoked to generate a row.
+--
+select nextval('rngfunc_vpc_seq');
+ nextval 
+---------
+       1
+(1 row)
+
+select * from rngfunc_vpc(1, 3) t limit 2;
+ i | s 
+---+---
+ 1 | 2
+ 2 | 3
+(2 rows)
+
+select nextval('rngfunc_vpc_seq');
+ nextval 
+---------
+       4
+(1 row)
+
+-- A Materialize SRF should show Materialization if the query demands rescan.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- The same plan should be generated for both VPC and Materialize mode SRFs.
+--
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+                QUERY PLAN                
+------------------------------------------
+ Nested Loop
+   ->  Function Scan on generate_series n
+         ->  SRF Scan
+   ->  Function Scan on rngfunc_vpc t
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
+
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+                QUERY PLAN                
+------------------------------------------
+ Nested Loop
+   ->  Function Scan on generate_series n
+         ->  SRF Scan
+   ->  Function Scan on rngfunc_mat t
+         ->  Materialize
+               ->  SRF Scan
+(6 rows)
+
+-- A Materialize SRF should show donation of the returned tuplestore.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- Only the Materialize mode SRF should show donation.
+--
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Nested Loop (actual rows=9 loops=1)
+   ->  Function Scan on generate_series n (actual rows=3 loops=1)
+         ->  SRF Scan (actual rows=3 loops=1)
+               SFRM: ValuePerCall
+   ->  Function Scan on rngfunc_vpc t (actual rows=3 loops=3)
+         ->  Materialize (actual rows=3 loops=3)
+               ->  SRF Scan (actual rows=3 loops=1)
+                     SFRM: ValuePerCall
+(8 rows)
+
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Nested Loop (actual rows=9 loops=1)
+   ->  Function Scan on generate_series n (actual rows=3 loops=1)
+         ->  SRF Scan (actual rows=3 loops=1)
+               SFRM: ValuePerCall
+   ->  Function Scan on rngfunc_mat t (actual rows=3 loops=3)
+         ->  Materialize (actual rows=3 loops=3)
+               ->  SRF Scan (actual rows=0 loops=1)
+                     SFRM: Materialize
+                     Donated tuplestore: true
+(9 rows)
+
+-- A Materialize SRF that aborts early should still generate all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence still has
+-- its value incremented even when the SRF's rows are not emitted.
+--
+select nextval('rngfunc_mat_seq');
+ nextval 
+---------
+       4
+(1 row)
+
+select * from rngfunc_mat(1, 3) t limit 2;
+ i | s 
+---+---
+ 1 | 5
+ 2 | 6
+(2 rows)
+
+select nextval('rngfunc_mat_seq');
+ nextval 
+---------
+       8
+(1 row)
+
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index fe1cd9d..9f6deff 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -1669,8 +1669,9 @@ select * from test_tsquery, to_tsquery('new') q where txtsample @@ q;
  Nested Loop
    Join Filter: (test_tsquery.txtsample @@ q.q)
    ->  Function Scan on to_tsquery q
+         ->  SRF Scan
    ->  Seq Scan on test_tsquery
-(4 rows)
+(5 rows)
 
 -- to_tsquery(regconfig, text) is an immutable function.
 -- That allows us to get rid of using function scan and join at all.
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6e72e92..6828582 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -577,8 +577,10 @@ select from generate_series(1,5) union select from generate_series(1,3);
  HashAggregate
    ->  Append
          ->  Function Scan on generate_series
+               ->  SRF Scan
          ->  Function Scan on generate_series generate_series_1
-(4 rows)
+               ->  SRF Scan
+(6 rows)
 
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
@@ -588,9 +590,11 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Function Scan on generate_series
+                     ->  SRF Scan
          ->  Subquery Scan on "*SELECT* 2"
                ->  Function Scan on generate_series generate_series_1
-(6 rows)
+                     ->  SRF Scan
+(8 rows)
 
 select from generate_series(1,5) union select from generate_series(1,3);
 --
@@ -626,8 +630,10 @@ select from generate_series(1,5) union select from generate_series(1,3);
  Unique
    ->  Append
          ->  Function Scan on generate_series
+               ->  SRF Scan
          ->  Function Scan on generate_series generate_series_1
-(4 rows)
+               ->  SRF Scan
+(6 rows)
 
 explain (costs off)
 select from generate_series(1,5) intersect select from generate_series(1,3);
@@ -637,9 +643,11 @@ select from generate_series(1,5) intersect select from generate_series(1,3);
    ->  Append
          ->  Subquery Scan on "*SELECT* 1"
                ->  Function Scan on generate_series
+                     ->  SRF Scan
          ->  Subquery Scan on "*SELECT* 2"
                ->  Function Scan on generate_series generate_series_1
-(6 rows)
+                     ->  SRF Scan
+(8 rows)
 
 select from generate_series(1,5) union select from generate_series(1,3);
 --
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index d5fd404..d2cd0b5 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3851,7 +3851,8 @@ EXPLAIN (costs off) SELECT * FROM pg_temp.f(2);
          ->  Sort
                Sort Key: s.s
                ->  Function Scan on generate_series s
-(5 rows)
+                     ->  SRF Scan
+(6 rows)
 
 SELECT * FROM pg_temp.f(2);
     f    
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index d841d8c..4717b06 100644
--- a/src/test/regress/sql/plpgsql.sql
+++ b/src/test/regress/sql/plpgsql.sql
@@ -2646,7 +2646,7 @@ select * from sc_test();
 
 create or replace function sc_test() returns setof integer as $$
 declare
-  c cursor for select * from generate_series(1, 10);
+  c scroll cursor for select * from generate_series(1, 10);
   x integer;
 begin
   open c;
diff --git a/src/test/regress/sql/rangefuncs.sql b/src/test/regress/sql/rangefuncs.sql
index 476b4f2..4d39f39 100644
--- a/src/test/regress/sql/rangefuncs.sql
+++ b/src/test/regress/sql/rangefuncs.sql
@@ -730,3 +730,82 @@ select *, row_to_json(u) from unnest(array[null::rngfunc2, (1,'foo')::rngfunc2,
 select *, row_to_json(u) from unnest(array[]::rngfunc2[]) u;
 
 drop type rngfunc2;
+
+--------------------------------------------------------------------------------
+-- Start of tests for support of ValuePerCall-mode SRFs
+
+CREATE TEMPORARY SEQUENCE rngfunc_vpc_seq;
+CREATE TEMPORARY SEQUENCE rngfunc_mat_seq;
+CREATE TYPE rngfunc_vpc_t AS (i integer, s bigint);
+
+-- rngfunc_vpc is SQL, so will yield a ValuePerCall SRF
+CREATE FUNCTION rngfunc_vpc(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+	SELECT i, nextval('rngfunc_vpc_seq')
+		FROM generate_series($1,$2) i;
+$$
+LANGUAGE SQL;
+
+-- rngfunc_mat is plpgsql, so will yield a Materialize SRF
+CREATE FUNCTION rngfunc_mat(int,int)
+	RETURNS setof rngfunc_vpc_t AS
+$$
+begin
+	for i in $1..$2 loop
+		return next (i, nextval('rngfunc_mat_seq'));
+	end loop;
+end;
+$$
+LANGUAGE plpgsql;
+
+-- A VPC SRF that is not part of a complex query should not materialize.
+-- 
+-- To illustrate this, we explain a simple VPC SRF scan, and note the
+-- absence of a Materialize node.
+--
+explain (costs off)
+	select * from rngfunc_vpc(1, 3) t;
+
+-- A VPC SRF that aborts early should do so without emitting all results.
+-- 
+-- To illustrate this, we show that an SRF that uses a sequence does not
+-- have its value incremented if the SRF is not invoked to generate a row.
+--
+select nextval('rngfunc_vpc_seq');
+select * from rngfunc_vpc(1, 3) t limit 2;
+select nextval('rngfunc_vpc_seq');
+
+-- A Materialize SRF should show Materialization if the query demands rescan.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- The same plan should be generated for both VPC and Materialize mode SRFs.
+--
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+explain (costs off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+
+-- A Materialize SRF should show donation of the returned tuplestore.
+--
+-- To illustrate this, we construct a cross join, which forces rescan.
+--
+-- Only the Materialize mode SRF should show donation.
+--
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_vpc(1, 3) t;
+explain (analyze, timing off, costs off, summary off)
+	select * from generate_series (1, 3) n, rngfunc_mat(1, 3) t;
+
+-- A Materialize SRF that aborts early should still generate all results.
+--
+-- To illustrate this, we show that an SRF that uses a sequence still has
+-- its value incremented even when the SRF's rows are not emitted.
+--
+select nextval('rngfunc_mat_seq');
+select * from rngfunc_mat(1, 3) t limit 2;
+select nextval('rngfunc_mat_seq');
+
+-- End of tests for support of ValuePerCall-mode SRFs
+--------------------------------------------------------------------------------

#21

Thomas Munro

thomas.munro@gmail.com

almost 6 years ago

In reply to: Tom Lane (#20)

Re: The flinfo->fn_extra question, from me this time.

On Fri, Mar 13, 2020 at 7:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

... (At least on the Linux side. I guess the cfbot's
Windows builds are sans cassert, which seems like an odd choice.)

I tried turning that on by adding $config{asserts} = 1 in the build
script and adding some scripting to dump all relevant logs on
appveyor. It had the desired effect, but I had some trouble getting
any useful information out of it. Somehow the FailedAssertion message
is not making it to the log, which seems to be the bare minimum you'd
need for this to be useful, and ideally you'd also want a backtrace.
I'll look into that next week with the help of a Windows-enabled
colleague.

#22

Dent John

denty@QQdd.eu

almost 6 years ago

In reply to: Tom Lane (#20)

Re: The flinfo->fn_extra question, from me this time.

On 12 Mar 2020, at 18:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[…]

I didn't want to spend any more effort on it than that, because I'm
not really on board with this line of attack.

Appreciate that. The approach was what I was most keen to get feedback on.

This patch seems
awfully invasive for what it's accomplishing, both at the code level
and in terms of what users will see in EXPLAIN. No, I don't think
that adding additional "SRF Scan" nodes below FunctionScan is an
improvement, nor do I like your repurposing/abusing of Materialize.
It might be okay if you were just using Materialize as-is, but if
it's sort-of-materialize-but-not-always, I don't think that's going
to make anyone less confused.

Okay. Makes sense.

Do you think it's valuable to retain, in the EXPLAIN output, the mode the SRF operated in?

That information is not available to end users, yet it is important to understand when trying to create a pipeline-able plan.
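
To make the point about modes concrete, here is a small standalone sketch (plain C, not PostgreSQL code) of why the mode matters to a consumer that stops early. Every name in it (Sequence, VpcState, vpc_next, mat_fill, run_vpc_limit, run_mat_limit) is hypothetical and invented for illustration; it only mimics the sequence-counting idea used in the rngfunc_vpc / rngfunc_mat tests above, under the assumption that a value-per-call producer is driven one row at a time while a materializing producer fills its whole result before the consumer sees anything.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a sequence such as rngfunc_vpc_seq. */
typedef struct { long last; } Sequence;

static long seq_nextval(Sequence *s) { return ++s->last; }

/* One result row: (i, s), like the rngfunc_vpc_t type in the tests. */
typedef struct { int i; long s; } Row;

/* Value-per-call style: state survives between calls, one row per call. */
typedef struct { int next; int stop; Sequence *seq; } VpcState;

/* Returns 1 and fills *out if a row was produced, 0 when exhausted. */
static int vpc_next(VpcState *st, Row *out)
{
    if (st->next > st->stop)
        return 0;
    out->i = st->next++;
    out->s = seq_nextval(st->seq);   /* side effect happens per row */
    return 1;
}

/* Materialize style: produce every row up front into a buffer. */
static int mat_fill(int start, int stop, Sequence *seq, Row *buf, int cap)
{
    int n = 0;

    assert(cap > 0);
    for (int i = start; i <= stop && n < cap; i++, n++)
    {
        buf[n].i = i;
        buf[n].s = seq_nextval(seq); /* side effect happens for all rows */
    }
    return n;
}

/* Consume at most 'limit' rows; return how many sequence values were used. */
static long run_vpc_limit(int start, int stop, int limit)
{
    Sequence seq = {0};
    VpcState st = {start, stop, &seq};
    Row      r;
    int      emitted = 0;

    while (emitted < limit && vpc_next(&st, &r))
        emitted++;
    return seq.last;
}

static long run_mat_limit(int start, int stop, int limit)
{
    Sequence seq = {0};
    Row      buf[16];
    int      n = mat_fill(start, stop, &seq, buf, 16);
    int      emitted = 0;

    /* The consumer stops early, but the producer has already run fully. */
    while (emitted < n && emitted < limit)
        emitted++;
    return seq.last;
}
```

With a range of 1..3 and a limit of 2, run_vpc_limit(1, 3, 2) returns 2 while run_mat_limit(1, 3, 2) returns 3, which is the same asymmetry the sequence-based tests above are meant to expose. This is only a model of the behaviour; it says nothing about how the executor actually implements either mode.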

More locally, this business with creating new "plan nodes" below the
FunctionScan at executor startup is a real abuse of a whole lot of stuff,
and I suspect that it's not unrelated to the assertion failures I'm
seeing. Don't do that. If you want to build some data structures at
executor start, fine, but they're not plans and shouldn't be mislabeled as
that.

I felt that FunctionScan was duplicating a bunch of stuff that the Materialize node could be doing for it. But in the end, I agree. Actually reusing Materialize turned out to be quite invasive.

On the other hand, if they do need to be plan nodes, they should be
made by the planner (which in turn would require a lot of infrastructure
you haven't built, eg copyfuncs/outfuncs/readfuncs/setrefs/...).

The v3 patch seemed closer to the sort of thing I was expecting
to get out of this (though I've not read it in any detail).

I did a bit more exploration down the route of pushing it into the planner. I figured perhaps some of the complexities would shake out by approaching it at the planner level, but I learned enough along the way to realise that it is a long journey.

I’ll dust off the v3 approach and resubmit. While I’m doing that, I'll pull it back from the CF.