functions returning sets

Started by Jeff Rogersover 22 years ago17 messagesgeneral
Jump to latest
#1Jeff Rogers
jrogers@findlaw.com

I have a set-returning function that takes a text string and returns multiple
values from it. Its a small hack on the pgxml_xpath function from contrib/xml
that returns all matching nodes from an xml document rather than just a single
specified node.

I currently use it as follows:

create table xml_files (file text, doc text) ;
create view xnodes as select file, pgxml_xpath(doc, '/top/node') from
xml_files;

select * from xnodes ;

This gives me a table of all the matching nodes in all the documents.
However, the documentation says that using a SRF in the select list of a
query, but this capability is deprecated. I can't figure out how to call this
function in the from clause with it referring to a column in a table, I get an
error like
ERROR: FROM function expression may not refer to other relations of same
query level. Is there another way to accomplish this?

Another useful way to call my function would be
select file from xml_files where 'foo' in pgxml_xpath(doc,'/top/node')
but that gives be a parse error.
select file from xml_files where 'foo' in (select pgxml_xpath(doc,'/top/node'))
parses, but it doesn't seem to give correct results.

Thanks
-J

PS: Here's the function. This goes with pgxml.c, not pgxml_dom.c, bt could
probably be modified to work there as well.

PG_FUNCTION_INFO_V1(pgxml_xpath_all);

Datum
pgxml_xpath_all(PG_FUNCTION_ARGS) {
/* called as pgxml_xpath(document,pathstr) */
/* returns set of all matching results */

XPath_Results *xpresults;
text *restext;
MemoryContext oldContext;
FuncCallContext *funcctx;

text *t = PG_GETARG_TEXT_P(0); /* document buffer */
text *t2 = PG_GETARG_TEXT_P(1);
int32 ind;

if (SRF_IS_FIRSTCALL()) {
funcctx=SRF_FIRSTCALL_INIT();
oldContext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
xpresults = build_xpath_results(t, t2);
funcctx->user_fctx=xpresults;
MemoryContextSwitchTo(oldContext);
}

funcctx = SRF_PERCALL_SETUP();
xpresults=funcctx->user_fctx;

ind=funcctx->call_cntr;

if (xpresults == NULL || ind >= xpresults->rescount) {
if (xpresults != NULL) {
pfree(xpresults->resbuf);
pfree(xpresults);
}
SRF_RETURN_DONE(funcctx);
}

restext = (text *) palloc(xpresults->reslens[ind] + VARHDRSZ);
memcpy(VARDATA(restext), xpresults->results[ind], xpresults->reslens[ind]);

VARATT_SIZEP(restext) = xpresults->reslens[ind] + VARHDRSZ;

SRF_RETURN_NEXT(funcctx,restext);
}

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Rogers (#1)
Re: functions returning sets

Jeff Rogers <jrogers@findlaw.com> writes:

However, the documentation says that using a SRF in the select list of
a query, but this capability is deprecated. I can't figure out how to
call this function in the from clause with it referring to a column in
a table, I get an error like
ERROR: FROM function expression may not refer to other relations of same
query level. Is there another way to accomplish this?

There isn't any good alternative at the moment (which is why SRFs in
select lists aren't deprecated yet). There has been some discussion
about implementing SQL99's LATERAL clause to support this, but it's
not done yet.

select file from xml_files where 'foo' in (select pgxml_xpath(doc,'/top/node'))
parses, but it doesn't seem to give correct results.

That should work as far as I know. Can you give more detail?

regards, tom lane

#3Jeff Rogers
jrogers@findlaw.com
In reply to: Tom Lane (#2)
Re: functions returning sets

select file from xml_files where 'foo' in (select pgxml_xpath(doc,'/top/nod

e'))

parses, but it doesn't seem to give correct results.

That should work as far as I know. Can you give more detail?

regards, tom lane

Cut & paste from a window:

xml=# create table foo (id text, doc text) ;
CREATE TABLE
xml=# insert into foo values (1,'<top><node>a</node><node>b</node></top>');
INSERT 22106552 1
xml=# insert into foo values (2,'<top><node>a</node><node>b</node></top>');
INSERT 22106553 1
xml=# insert into foo values (3,'<top><node>a</node><node>b</node></top>');
INSERT 22106554 1
xml=# select id,pgxml_xpath(doc,'/top/node') from foo ;
id | pgxml_xpath
----+-------------
1 | a
1 | b
2 | a
2 | b
3 | a
3 | b
(6 rows)

xml=# select id from foo where 'a' in (select pgxml_xpath(doc,'/top/node'));
id
----
1
3
(2 rows)

I expect this latter to return all the ids, but it seems to only return every
other one.

-J

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Rogers (#3)
Re: functions returning sets

Jeff Rogers <jrogers@findlaw.com> writes:

I expect this latter to return all the ids, but it seems to only return every
other one.

Hmm, this looks like a bug having to do with not resetting state
correctly for the next invocation of the SRF. Hard to tell whether
the bug is in your code or the system's support for SRFs though.

Joe, did you see
http://archives.postgresql.org/pgsql-general/2003-12/msg00709.php
? I don't have time to look at this right now ...

regards, tom lane

#5Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#4)
Re: functions returning sets

Tom Lane wrote:

Jeff Rogers <jrogers@findlaw.com> writes:

I expect this latter to return all the ids, but it seems to only return every
other one.

Hmm, this looks like a bug having to do with not resetting state
correctly for the next invocation of the SRF. Hard to tell whether
the bug is in your code or the system's support for SRFs though.

Joe, did you see
http://archives.postgresql.org/pgsql-general/2003-12/msg00709.php
? I don't have time to look at this right now ...

I'll try to take a look in the next day or so -- hopefully tonight.

Joe

#6Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#4)
Re: functions returning sets

Tom Lane wrote:

Jeff Rogers <jrogers@findlaw.com> writes:

I expect this latter to return all the ids, but it seems to only return every
other one.

Hmm, this looks like a bug having to do with not resetting state
correctly for the next invocation of the SRF. Hard to tell whether
the bug is in your code or the system's support for SRFs though.

Joe, did you see
http://archives.postgresql.org/pgsql-general/2003-12/msg00709.php
? I don't have time to look at this right now ...

I've not gotten to the bottom of this, but I can confirm the issue.

The way I would expect this query to be written does work:

regression=# select id from foo where 'a' in (select * from
pgxml_xpath(doc,'/top/node'));
id
----
1
2
3
(3 rows)

So the problem seems to be related specifically to "SRF returning setof
scalar in targetlist of subselect". Maybe some difference between
ExecMakeFunctionResult and ExecMakeTableFunctionResult? SRFs returning
complex types are not allowed in targetlists, but in this case there is
no restriction because the return type is scalar.

I'll keep digging as time allows.

Joe

#7Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#4)
Re: functions returning sets

Tom Lane wrote:

Hmm, this looks like a bug having to do with not resetting state
correctly for the next invocation of the SRF. Hard to tell whether
the bug is in your code or the system's support for SRFs though.

Looks like a bug in ExecScanSubPlan() around line 401:
8<-----------------------------------------------------
if (subLinkType == ANY_SUBLINK)
{
/* combine across rows per OR semantics */
if (rownull)
*isNull = true;
else if (DatumGetBool(rowresult))
{
result = BoolGetDatum(true);
*isNull = false;
break; /* needn't look at any more rows */
}
}
8<-----------------------------------------------------

If the function returns a set, that comment is wrong, isn't it? I
removed the "break" to run out all the available tuples and got the
correct result:

regression=# select id from foo where 'a' in (select
pgxml_xpath(doc,'/top/node'));
id
----
1
2
3
(3 rows)

Any guidance on the preferred fix? I don't see an obvious way to
determine if the plan includes a set returning function in
ExecScanSubPlan(), and it seems unfortunate to remove the optimization
for all other cases, especially since targetlist SRFs are deprecated.

Joe

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#7)
Re: functions returning sets

Joe Conway <mail@joeconway.com> writes:

Any guidance on the preferred fix?

We cannot fix this by changing ExecScanSubPlan as you suggest.
That would amount to saying that all plans have to be run to completion,
which destroys LIMIT to name just one unpleasant consequence.

There is a mechanism available for functions to arrange to get cleanup
callbacks when a containing plan is shut down early --- see
RegisterExprContextCallback and ShutdownExprContext. It looks like this
is not used by the existing SRF support, but I suspect it should be.

[ scratches head ... ] Right at the moment I don't see where
ShutdownExprContext gets called during a ReScan operation. I'm quite
sure it once was ... there may be another bug here ...

regards, tom lane

#9Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#8)
Re: functions returning sets

Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

Any guidance on the preferred fix?

We cannot fix this by changing ExecScanSubPlan as you suggest.
That would amount to saying that all plans have to be run to completion,
which destroys LIMIT to name just one unpleasant consequence.

I suspected as much.

There is a mechanism available for functions to arrange to get cleanup
callbacks when a containing plan is shut down early --- see
RegisterExprContextCallback and ShutdownExprContext. It looks like this
is not used by the existing SRF support, but I suspect it should be.

OK -- I'll take a look.

[ scratches head ... ] Right at the moment I don't see where
ShutdownExprContext gets called during a ReScan operation. I'm quite
sure it once was ... there may be another bug here ...

Thanks, I'll keep that in mind too.

Joe

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#9)
Re: functions returning sets

Tom Lane wrote:

[ scratches head ... ] Right at the moment I don't see where
ShutdownExprContext gets called during a ReScan operation. I'm quite
sure it once was ... there may be another bug here ...

After further looking I've realized that memory is misserving me here;
having ReScan call ShutdownExprContext was not something that ever got
done. Instead I have an entry on my personal todo list that says

: Need to invent ExprContextRescan and call it at appropriate places.
: Knowing where all the econtexts are seems like the hard part ... though
: maybe we only care about econtexts that might contain set-returning
: functions, which might limit it to the targetlist...

A perfectly clean solution would require being careful to reset *all*
econtexts, which might be thought rather a lot of work to support a
feature that's eventually going to be deprecated anyway (viz, SRFs
outside of FROM). I'll see about the tlist-only case though.

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: functions returning sets

I said:

: Need to invent ExprContextRescan and call it at appropriate places.

A perfectly clean solution would require being careful to reset *all*
econtexts, which might be thought rather a lot of work to support a
feature that's eventually going to be deprecated anyway (viz, SRFs
outside of FROM). I'll see about the tlist-only case though.

Actually, some study shows that we *only* support set-returning
functions in ExecProject(), which is only used with the ps_ExprContext
econtext of plan nodes, which makes it pretty easy to ensure that ReScan
gets everything it needs to. I have committed a patch into 7.4 and HEAD
branches that ensures shutdown callback functions are called during
ReScan. I've tested that the problem is fixed for SQL-language
functions; for example in the regression database do

create function foo(int) returns setof int as
'select unique1 from tenk1' language sql;

and compare the output of

select foo(1) limit 10;
select unique1 from tenk1 where unique1 in (select foo(1) limit 10);
select unique1 from tenk1 where unique1 in (select foo(ten) limit 10);

These should be the same (up to ordering) but are not without the patch.

We still need a code addition that uses an ExprState shutdown callback
to reset the state of an SRF that uses the SRF_XXX macros. Joe, have
you had time to give that any thought?

regards, tom lane

#12Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#11)
Re: functions returning sets

Tom Lane wrote:

We still need a code addition that uses an ExprState shutdown callback
to reset the state of an SRF that uses the SRF_XXX macros. Joe, have
you had time to give that any thought?

Yeah, I've gotten something started, but couldn't do much without the
shutdown getting called. I hadn't yet figured out the best way to make
that happen, so I'm glad you did ;-). I'll update to cvs tip and try
Jeff's function with my changes and your committed changes. Will get
back shortly with the results.

Joe

#13Joe Conway
mail@joeconway.com
In reply to: Joe Conway (#12)
Re: functions returning sets

Joe Conway wrote:

Tom Lane wrote:

We still need a code addition that uses an ExprState shutdown callback
to reset the state of an SRF that uses the SRF_XXX macros. Joe, have
you had time to give that any thought?

Yeah, I've gotten something started, but couldn't do much without the
shutdown getting called. I hadn't yet figured out the best way to make
that happen, so I'm glad you did ;-). I'll update to cvs tip and try
Jeff's function with my changes and your committed changes. Will get
back shortly with the results.

OK, updated to cvs tip, and with the attached patch applied, all seems well:

regression=# select id from foo where 'a' in (select
pgxml_xpath(doc,'/top/node'));
id
----
1
2
3
(3 rows)

Any comment on the patch?

BTW, I am seeing:
[...]
test portals_p2 ... ok
test rules ... FAILED
test foreign_key ... ok
[...]
but it seems unrelated to this change -- caused by a redefinition of the
pg_stats view. I guess I need to initdb.

Thanks,

Joe

Attachments:

current.difftext/plain; name=current.diffDownload+34-9
#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#13)
Re: functions returning sets

Joe Conway <mail@joeconway.com> writes:

Any comment on the patch?

It seems like a bad idea to change the contents of the SRF_XXX() macros
for 7.4.1; if we do that, any existing already-compiled-for-7.4 user
SRFs will be broken, and there's no easy way to catch the problem.
We don't normally require people to recompile user-defined functions for
dot releases anyway.

I would suggest leaving end_MultiFuncCall() with its existing API,
and adding a separate shutdown callback function that is registered
during init_MultiFuncCall and deregistered by end_MultiFuncCall.
(This should be workable without API change since init_MultiFuncCall
can get to the econtext via fcinfo->resultinfo.)

You may not even need to add any fields to FuncCallContext --- consider
passing the fcinfo pointer to the callback, rather than passing the
FuncCallContext pointer.

regards, tom lane

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#14)
Re: functions returning sets

You may not even need to add any fields to FuncCallContext --- consider
passing the fcinfo pointer to the callback, rather than passing the
FuncCallContext pointer.

Dept. of second thoughts: better pass the flinfo pointer, instead.
fcinfo might point to temporary space on the stack.

regards, tom lane

#16Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#15)
Re: functions returning sets

Tom Lane wrote:

You may not even need to add any fields to FuncCallContext --- consider
passing the fcinfo pointer to the callback, rather than passing the
FuncCallContext pointer.

Dept. of second thoughts: better pass the flinfo pointer, instead.
fcinfo might point to temporary space on the stack.

OK -- this one is a good bit simpler. Any more comments?

Joe

Attachments:

funcapi-fix.02.patchtext/plain; name=funcapi-fix.02.patchDownload+39-30
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#16)
Re: functions returning sets

Joe Conway <mail@joeconway.com> writes:

OK -- this one is a good bit simpler. Any more comments?

Looks good to me. Please apply to 7.4 and HEAD.

regards, tom lane