Accessing original TupleDesc from SRF

Started by John Grayover 23 years ago14 messages
#1John Gray
jgray@azuli.co.uk

Please correct me if I've got this wrong, but it appears from the SRF
API, that a SRF cannot readily refer to the TupleDesc to which it is
expected to conform (i.e. the TupleDesc derived from the FROM clause of
an original SELECT statement) because that is held in the executor state
and not copied or linked into the function context.

The reason I'm interested (and this might be a crazy idea) is that a
function might choose to adapt its output based on what it is asked for.
i.e. the attribute names and types which it is asked to provide might
have some significance to the function.

The application in this case is the querying of an XML document (this
relates to the contrib/xml XPath functions) where you might want a
function which gives you a "virtual view" of the document. In order to
do so, you specify a query such as:

SELECT * FROM xmlquery_func('some text here') AS xq(document_author
text, document_publisher text, document_date text);

(this would likely be part of a subquery or join in practice.)

The function would see the requested attribute "document_author" and
translate that to '//document/author/text()' for use by the XPath
engine. This avoids having to have a function with varying arguments
-instead you have a 'virtual table' that generates only the attributes
requested.

Does this sound completely crazy?

Regards

John

--
John Gray
Azuli IT
www.azuli.co.uk

#2Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: John Gray (#1)
Re: Accessing original TupleDesc from SRF

Please correct me if I've got this wrong, but it appears from the SRF
API, that a SRF cannot readily refer to the TupleDesc to which it is
expected to conform (i.e. the TupleDesc derived from the FROM
clause of
an original SELECT statement) because that is held in the
executor state
and not copied or linked into the function context.

The reason I'm interested (and this might be a crazy idea) is that a
function might choose to adapt its output based on what it is
asked for.
i.e. the attribute names and types which it is asked to provide might
have some significance to the function.

The application in this case is the querying of an XML document (this
relates to the contrib/xml XPath functions) where you might want a
function which gives you a "virtual view" of the document. In order to
do so, you specify a query such as:

SELECT * FROM xmlquery_func('some text here') AS xq(document_author
text, document_publisher text, document_date text);

(this would likely be part of a subquery or join in practice.)

The function would see the requested attribute "document_author" and
translate that to '//document/author/text()' for use by the XPath
engine. This avoids having to have a function with varying arguments
-instead you have a 'virtual table' that generates only the attributes
requested.

Does this sound completely crazy?

Nope, sounds really useful.

Andreas

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Gray (#1)
Re: Accessing original TupleDesc from SRF

John Gray <jgray@azuli.co.uk> writes:

Please correct me if I've got this wrong, but it appears from the SRF
API, that a SRF cannot readily refer to the TupleDesc to which it is
expected to conform (i.e. the TupleDesc derived from the FROM clause of
an original SELECT statement) because that is held in the executor state
and not copied or linked into the function context.

The reason I'm interested (and this might be a crazy idea) is that a
function might choose to adapt its output based on what it is asked for.

Seems like a cool idea.

We could fairly readily add a field to ReturnSetInfo, but that would
only be available to functions defined as returning a set. That'd
probably cover most useful cases but it still seems a bit unclean.

I suppose that ExecMakeTableFunctionResult could be changed to *always*
pass ReturnSetInfo, even if it's not expecting the function to return
a set. That seems even less clean; but it would work, at least in the
current implementation.

Anyone have a better idea?

regards, tom lane

#4Joe Conway
mail@joeconway.com
In reply to: John Gray (#1)
Re: Accessing original TupleDesc from SRF

John Gray wrote:

Please correct me if I've got this wrong, but it appears from the SRF
API, that a SRF cannot readily refer to the TupleDesc to which it is
expected to conform (i.e. the TupleDesc derived from the FROM clause of
an original SELECT statement) because that is held in the executor state
and not copied or linked into the function context.

[snip]

Does this sound completely crazy?

Not crazy at all. I asked the same question a few days ago:
http://archives.postgresql.org/pgsql-hackers/2002-08/msg01914.php

Tom suggested a workaround for my purpose, but I do agree that this is
needed in the long run. I looked at it briefly, but there was no easy
answer I could spot. I'll take another look today.

Joe

#5Joe Conway
mail@joeconway.com
In reply to: John Gray (#1)
Re: Accessing original TupleDesc from SRF

Tom Lane wrote:

John Gray <jgray@azuli.co.uk> writes:

Please correct me if I've got this wrong, but it appears from the SRF
API, that a SRF cannot readily refer to the TupleDesc to which it is
expected to conform (i.e. the TupleDesc derived from the FROM clause of
an original SELECT statement) because that is held in the executor state
and not copied or linked into the function context.

The reason I'm interested (and this might be a crazy idea) is that a
function might choose to adapt its output based on what it is asked for.

Seems like a cool idea.

We could fairly readily add a field to ReturnSetInfo, but that would
only be available to functions defined as returning a set. That'd
probably cover most useful cases but it still seems a bit unclean.

I thought about that, but had the same concern.

I suppose that ExecMakeTableFunctionResult could be changed to *always*
pass ReturnSetInfo, even if it's not expecting the function to return
a set. That seems even less clean; but it would work, at least in the
current implementation.

Hmmm. I hadn't thought about this possibility. Why is it unclean? Are
there places where the lack of ReturnSetInfo is used to indicate that
the function does not return a set? Doesn't seem like there should be.

Anyone have a better idea?

I was looking to see if it could be added to FunctionCallInfoData, but
you might find that more unclean still ;-).

Actually, I left off trying to figure out how to pass the columndef to
ExecMakeFunctionResult in the first place. It wasn't obvious to me, and
since you offered an easy alternative solution I stopped trying. Any
suggestions? Preference of extending FunctionCallInfoData or
ReturnSetInfo? I'd really like to do this for 7.3.

Joe

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#4)
Re: Accessing original TupleDesc from SRF

Joe Conway <mail@joeconway.com> writes:

John Gray wrote:

Does this sound completely crazy?

Not crazy at all. I asked the same question a few days ago:
http://archives.postgresql.org/pgsql-hackers/2002-08/msg01914.php

I've been thinking more about this, and wondering if we should not
only make the tupdesc available but rely more heavily on it than we
do. Most of the C-coded functions do fairly substantial pushups to
construct tupdescs that are just going to duplicate what
nodeFunctionscan already has in its back pocket. They could save some
time by just picking that up and using it.

On the other hand, your experience yesterday with debugging a mismatched
function declaration suggests that it's still a good idea to make the
functions build the tupdesc they think they are returning.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#5)
Re: Accessing original TupleDesc from SRF

Joe Conway <mail@joeconway.com> writes:

I suppose that ExecMakeTableFunctionResult could be changed to *always*
pass ReturnSetInfo, even if it's not expecting the function to return
a set. That seems even less clean; but it would work, at least in the
current implementation.

Hmmm. I hadn't thought about this possibility. Why is it unclean? Are
there places where the lack of ReturnSetInfo is used to indicate that
the function does not return a set? Doesn't seem like there should be.

Probably not. If the function itself doesn't know whether it should
return a set, it can always look at the FmgrInfo struct to find out.

Actually, I left off trying to figure out how to pass the columndef to
ExecMakeFunctionResult in the first place.

That was hard yesterday, but it's easy today because nodeFunctionscan
isn't using ExecEvalExpr anymore --- we'd only have to add one more
parameter to ExecMakeTableFunctionResult and we're there.

Preference of extending FunctionCallInfoData or ReturnSetInfo?

Definitely ReturnSetInfo. If we put it in FunctionCallInfoData then
that's an extra word to zero for *every* fmgr function call, not only
table functions.

One thing to notice is that a table function that's depending on this
info being available will have to punt if it's invoked via
ExecMakeFunctionResult (ie, it's being called in a SELECT targetlist).
That doesn't bother me too much, but maybe others will see it
differently.

regards, tom lane

#8Joe Conway
mail@joeconway.com
In reply to: John Gray (#1)
Re: Accessing original TupleDesc from SRF

Tom Lane wrote:

I've been thinking more about this, and wondering if we should not
only make the tupdesc available but rely more heavily on it than we
do. Most of the C-coded functions do fairly substantial pushups to
construct tupdescs that are just going to duplicate what
nodeFunctionscan already has in its back pocket. They could save some
time by just picking that up and using it.

On the other hand, your experience yesterday with debugging a mismatched
function declaration suggests that it's still a good idea to make the
functions build the tupdesc they think they are returning.

In a function which *can* know what the tupledec should look like based
on independent information (contrib/tablefunc.c:crosstab), or based on a
priori knowledge (guc.c:show_all_settings), then the passed in tupdesc
could be used by the function to validate that it has been acceptably
declared (for named types) or called (for anonymous types).

But it is also interesting to let the function try to adapt to the way
in which it has been called, and punt if it can't deal with it. And in
some cases, like John's example, there *is* no other way.

Joe

#9Joe Conway
mail@joeconway.com
In reply to: John Gray (#1)
Re: Accessing original TupleDesc from SRF

Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

Actually, I left off trying to figure out how to pass the columndef to
ExecMakeFunctionResult in the first place.

That was hard yesterday, but it's easy today because nodeFunctionscan
isn't using ExecEvalExpr anymore --- we'd only have to add one more
parameter to ExecMakeTableFunctionResult and we're there.

I didn't even realize you had changed that! Things move quickly around
here ;-). I'll take a look this morning.

Preference of extending FunctionCallInfoData or ReturnSetInfo?

Definitely ReturnSetInfo. If we put it in FunctionCallInfoData then
that's an extra word to zero for *every* fmgr function call, not only
table functions.

OK.

One thing to notice is that a table function that's depending on this
info being available will have to punt if it's invoked via
ExecMakeFunctionResult (ie, it's being called in a SELECT targetlist).
That doesn't bother me too much, but maybe others will see it
differently.

It's an important safety tip, but it doesn't bother me either. I think
you have suggested before that SRFs in SELECT targetlists should be
deprecated/removed. I'd take that one step further and say that SELECT
targetlists should only allow a scalar result, but obviously there are
some backwards compatibility issues there.

Joe

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#8)
Re: Accessing original TupleDesc from SRF

Joe Conway <mail@joeconway.com> writes:

On the other hand, your experience yesterday with debugging a mismatched
function declaration suggests that it's still a good idea to make the
functions build the tupdesc they think they are returning.

In a function which *can* know what the tupledec should look like based
on independent information (contrib/tablefunc.c:crosstab), or based on a
priori knowledge (guc.c:show_all_settings), then the passed in tupdesc
could be used by the function to validate that it has been acceptably
declared (for named types) or called (for anonymous types).

Yeah, I had also considered the idea of pushing the responsibility of
verifying the tupdesc matching out to the function (ie, nodeFunctionscan
wouldn't call tupdesc_mismatch anymore, but the function could).

I think this is a bad idea on balance though; it would save few cycles
and probably create lots more debugging headaches like the one you had.

regards, tom lane

#11Joe Conway
mail@joeconway.com
In reply to: John Gray (#1)
1 attachment(s)
Re: Accessing original TupleDesc from SRF

Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

Preference of extending FunctionCallInfoData or ReturnSetInfo?

Definitely ReturnSetInfo. If we put it in FunctionCallInfoData then
that's an extra word to zero for *every* fmgr function call, not only
table functions.

Attached adds:
+ TupleDesc queryDesc; /* descriptor for planned query */
to ReturnSetInfo, and populates ReturnSetInfo for every function call to
ExecMakeTableFunctionResult, not just when fn_retset.

One thing to notice is that a table function that's depending on this
info being available will have to punt if it's invoked via
ExecMakeFunctionResult (ie, it's being called in a SELECT targetlist).
That doesn't bother me too much, but maybe others will see it
differently.

I haven't done it yet, but I suppose this should be documented in
xfunc.sgml. With this patch the behavior of a function called through
ExecMakeFunctionResult will be:

if (fn_retset)
ReturnSetInfo is populated but queryDesc is set to NULL

if (!fn_retset)
ReturnSetInfo is NULL

If there are no objections, please apply.

Thanks,

Joe

Attachments:

rsi-querydesc.1.patchtext/plain; name=rsi-querydesc.1.patchDownload
Index: src/backend/executor/execQual.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/executor/execQual.c,v
retrieving revision 1.102
diff -c -r1.102 execQual.c
*** src/backend/executor/execQual.c	30 Aug 2002 00:28:41 -0000	1.102
--- src/backend/executor/execQual.c	30 Aug 2002 19:30:38 -0000
***************
*** 709,714 ****
--- 709,715 ----
  		rsinfo.econtext = econtext;
  		rsinfo.allowedModes = (int) SFRM_ValuePerCall;
  		rsinfo.returnMode = SFRM_ValuePerCall;
+ 		rsinfo.queryDesc = NULL;
  		/* isDone is filled below */
  		rsinfo.setResult = NULL;
  		rsinfo.setDesc = NULL;
***************
*** 851,856 ****
--- 852,858 ----
  Tuplestorestate *
  ExecMakeTableFunctionResult(Expr *funcexpr,
  							ExprContext *econtext,
+ 							TupleDesc queryDesc,
  							TupleDesc *returnDesc)
  {
  	Tuplestorestate *tupstore = NULL;
***************
*** 859,865 ****
  	List	   *argList;
  	FunctionCachePtr fcache;
  	FunctionCallInfoData fcinfo;
! 	ReturnSetInfo rsinfo;		/* for functions returning sets */
  	ExprDoneCond argDone;
  	MemoryContext callerContext;
  	MemoryContext oldcontext;
--- 861,867 ----
  	List	   *argList;
  	FunctionCachePtr fcache;
  	FunctionCallInfoData fcinfo;
! 	ReturnSetInfo rsinfo;
  	ExprDoneCond argDone;
  	MemoryContext callerContext;
  	MemoryContext oldcontext;
***************
*** 918,938 ****
  	}
  
  	/*
! 	 * If function returns set, prepare a resultinfo node for
! 	 * communication
  	 */
! 	if (fcache->func.fn_retset)
! 	{
! 		fcinfo.resultinfo = (Node *) &rsinfo;
! 		rsinfo.type = T_ReturnSetInfo;
! 		rsinfo.econtext = econtext;
! 		rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
! 	}
! 	/* we set these fields always since we examine them below */
  	rsinfo.returnMode = SFRM_ValuePerCall;
  	/* isDone is filled below */
  	rsinfo.setResult = NULL;
  	rsinfo.setDesc = NULL;
  
  	/*
  	 * Switch to short-lived context for calling the function.
--- 920,942 ----
  	}
  
  	/*
! 	 * prepare a resultinfo node for communication
  	 */
! 	fcinfo.resultinfo = (Node *) &rsinfo;
! 	rsinfo.type = T_ReturnSetInfo;
! 	rsinfo.econtext = econtext;
! 	rsinfo.queryDesc = queryDesc;
  	rsinfo.returnMode = SFRM_ValuePerCall;
  	/* isDone is filled below */
  	rsinfo.setResult = NULL;
  	rsinfo.setDesc = NULL;
+ 
+ 	/*
+ 	 * If function returns set, we need to tell it what modes of return
+ 	 * we will accept
+ 	 */
+ 	if (fcache->func.fn_retset)
+ 		rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
  
  	/*
  	 * Switch to short-lived context for calling the function.
Index: src/backend/executor/nodeFunctionscan.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/executor/nodeFunctionscan.c,v
retrieving revision 1.8
diff -c -r1.8 nodeFunctionscan.c
*** src/backend/executor/nodeFunctionscan.c	30 Aug 2002 00:28:41 -0000	1.8
--- src/backend/executor/nodeFunctionscan.c	30 Aug 2002 19:01:25 -0000
***************
*** 79,84 ****
--- 79,85 ----
  		scanstate->tuplestorestate = tuplestorestate =
  			ExecMakeTableFunctionResult((Expr *) scanstate->funcexpr,
  										econtext,
+ 										scanstate->tupdesc,
  										&funcTupdesc);
  
  		/*
Index: src/include/executor/executor.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/executor/executor.h,v
retrieving revision 1.75
diff -c -r1.75 executor.h
*** src/include/executor/executor.h	30 Aug 2002 00:28:41 -0000	1.75
--- src/include/executor/executor.h	30 Aug 2002 19:02:02 -0000
***************
*** 82,87 ****
--- 82,88 ----
  					   ExprDoneCond *isDone);
  extern Tuplestorestate *ExecMakeTableFunctionResult(Expr *funcexpr,
  													ExprContext *econtext,
+ 													TupleDesc queryDesc,
  													TupleDesc *returnDesc);
  extern Datum ExecEvalExpr(Node *expression, ExprContext *econtext,
  			 bool *isNull, ExprDoneCond *isDone);
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/nodes/execnodes.h,v
retrieving revision 1.73
diff -c -r1.73 execnodes.h
*** src/include/nodes/execnodes.h	30 Aug 2002 00:28:41 -0000	1.73
--- src/include/nodes/execnodes.h	30 Aug 2002 19:29:36 -0000
***************
*** 152,157 ****
--- 152,158 ----
  	SetFunctionReturnMode	returnMode;	/* actual return mode */
  	ExprDoneCond isDone;		/* status for ValuePerCall mode */
  	Tuplestorestate *setResult;	/* return object for Materialize mode */
+ 	TupleDesc	queryDesc;		/* descriptor for planned query */
  	TupleDesc	setDesc;		/* descriptor for Materialize mode */
  } ReturnSetInfo;
  
#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#11)
Re: Accessing original TupleDesc from SRF

Joe Conway <mail@joeconway.com> writes:

Attached adds:
+ TupleDesc queryDesc; /* descriptor for planned query */
to ReturnSetInfo, and populates ReturnSetInfo for every function call to
ExecMakeTableFunctionResult, not just when fn_retset.

I thought "expectedDesc" was a more sensible choice of name, so I made
it that. Otherwise, patch applied.

I haven't done it yet, but I suppose this should be documented in
xfunc.sgml.

Actually, most of what's in src/backend/utils/fmgr/README should be
transposed into xfunc.sgml someday.

regards, tom lane

#13Joe Conway
mail@joeconway.com
In reply to: John Gray (#1)
1 attachment(s)
Re: Accessing original TupleDesc from SRF

Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

Attached adds:
+ TupleDesc queryDesc; /* descriptor for planned query */
to ReturnSetInfo, and populates ReturnSetInfo for every function call to
ExecMakeTableFunctionResult, not just when fn_retset.

I thought "expectedDesc" was a more sensible choice of name, so I made
it that. Otherwise, patch applied.

I was trying to actually use this new feature today, and ran into a
little bug in nodeFunctionscan.c that prevented it from actually working.

For anonymous composite types, ExecInitFunctionScan builds the tuple
description using:
tupdesc = BuildDescForRelation(coldeflist);

But BuildDescForRelation leaves initializes the tupdesc like this:
desc = CreateTemplateTupleDesc(natts, UNDEFOID);

The UNDEFOID later causes an assertion failure in heap_formtuple when
you try to use the tupdesc to build a tuple.

Attached is a very small patch to fix.

I haven't done it yet, but I suppose this should be documented in
xfunc.sgml.

Actually, most of what's in src/backend/utils/fmgr/README should be
transposed into xfunc.sgml someday.

After beta starts I'll work on migrating this to the sgml docs.

Joe

Attachments:

anontyp-bugfix.1.patchtext/plain; name=anontyp-bugfix.1.patchDownload
Index: src/backend/executor/nodeFunctionscan.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/executor/nodeFunctionscan.c,v
retrieving revision 1.9
diff -c -r1.9 nodeFunctionscan.c
*** src/backend/executor/nodeFunctionscan.c	30 Aug 2002 23:59:46 -0000	1.9
--- src/backend/executor/nodeFunctionscan.c	31 Aug 2002 18:01:48 -0000
***************
*** 226,231 ****
--- 226,232 ----
  		List *coldeflist = rte->coldeflist;
  
  		tupdesc = BuildDescForRelation(coldeflist);
+ 		tupdesc->tdhasoid = WITHOUTOID;
  	}
  	else
  		elog(ERROR, "Unknown kind of return type specified for function");
#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#13)
Re: Accessing original TupleDesc from SRF

Joe Conway <mail@joeconway.com> writes:

But BuildDescForRelation leaves initializes the tupdesc like this:
desc = CreateTemplateTupleDesc(natts, UNDEFOID);

The UNDEFOID later causes an assertion failure in heap_formtuple when
you try to use the tupdesc to build a tuple.

So far, I haven't seen any reason to think that that three-way has-OID
stuff accomplishes anything but causing trouble ... but I've applied
this patch for the moment. I hope to get around to reviewing the
HeapTupleHeader hacks later in the weekend.

regards, tom lane