Status of plperl inter-sp calling
While waiting for feedback on my earlier plperl refactor and feature
patches I'm working on a further patch that adds, among other things,
fast inter-plperl-sp calling.
I want to outline what I've got and get some feedback on open issues.
To make a call to a stored procedure from plperl you just call the
function name prefixed by SP::. For example:
create function poly() returns text language plperl
as $$ return "poly0" $$;
create function poly(text) returns text language plperl
as $$ return "poly1" $$;
create function poly(text, text) returns text language plperl
as $$ return "poly2" $$;
create function foo() returns text language plperl as $$
SP::poly();
SP::poly(1);
SP::poly(1,2);
return undef;
$$;
That handles the arity of the calls and invokes the right SP, bypassing
SQL if the SP is already loaded.
That much works currently. Behind the scenes, when a stored procedure is
loaded into plperl the code ref for the perl sub is stored in a cache.
Effectively just
$cache{$name}[$nargs] = $coderef;
An SP::AUTOLOAD sub intercepts any SP::* call and effectively does
lookup_sp($name, \@_)->(@_);
For SPs that are already loaded lookup_sp returns $cache{$name}[$nargs]
so the overhead of the call is very small.
For SPs that are not cached, lookup_sp returns a code ref of a closure
that will invoke $name with the args in @_ via
spi_exec_query("select * from $name($encoded_args)");
The fallback-to-SQL behaviour neatly handles non-cached SPs (forcing
them to be loaded and thus cached), and inter-language calling (both
plperl<->plperl and other PLs).
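A minimal sketch of the dispatch described above, runnable outside PostgreSQL. The spi_exec_query() here is a hypothetical stand-in that only records the SQL, register_sp() and quote_arg() are illustrative names that do not come from the patch, and the quoting is deliberately simplified.

```perl
use strict;
use warnings;

# Hypothetical stand-in for plperl's spi_exec_query(), so the sketch runs
# outside PostgreSQL. It only records the SQL it was asked to run.
our @sql_log;
sub spi_exec_query {
    my ($sql) = @_;
    push @sql_log, $sql;
    return { rows => [] };
}

my %cache;    # $cache{$name}[$nargs] = $coderef

# Assumed hook: called once for each plperl function as it is loaded.
sub register_sp {
    my ($name, $nargs, $coderef) = @_;
    $cache{$name}[$nargs] = $coderef;
    return;
}

# Simplified literal quoting, for illustration only.
sub quote_arg {
    my ($value) = @_;
    return 'NULL' unless defined $value;
    (my $escaped = $value) =~ s/'/''/g;
    return "'$escaped'";
}

# Return the cached code ref, or a closure that goes through SQL instead.
sub lookup_sp {
    my ($name, $args) = @_;
    my $cached = $cache{$name}[ scalar @$args ];
    return $cached if $cached;
    return sub {
        my $encoded_args = join ', ', map { quote_arg($_) } @_;
        return spi_exec_query("select * from $name($encoded_args)");
    };
}

package SP;

our $AUTOLOAD;

# Intercepts every SP::* call that has no real sub behind it.
sub AUTOLOAD {
    (my $name = $AUTOLOAD) =~ s/^SP:://;
    return main::lookup_sp($name, \@_)->(@_);
}

package main;

register_sp('poly', 0, sub { return 'poly0' });
register_sp('poly', 1, sub { return 'poly1' });

my $r0 = SP::poly();     # served from the cache
my $r1 = SP::poly(1);    # served from the cache
SP::poly(1, 2);          # not cached: falls back to SQL
print "$r0 $r1\n";
print "$sql_log[-1]\n";
```

In this sketch the two-argument call is the only one that reaches SQL. In the real setting that query would load the function, so later calls would presumably hit the cache.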
Limitations:
* It's not meant to handle type polymorphism, only the number of args.
* When invoked via SQL, because the SP isn't cached, all non-ref args
are expressed as strings via quote_nullable(). Any array refs
are encoded as ARRAY[...] via encode_array_constructor().
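To illustrate the encoding in the second limitation, here is a rough pure-Perl approximation. The sketch_* functions and encode_args() are hypothetical stand-ins, not the real quote_nullable() and encode_array_constructor(), and they ignore details such as backslash handling and empty arrays.

```perl
use strict;
use warnings;

# Approximates quote_nullable(): undef becomes NULL, anything else a
# single-quoted literal with embedded quotes doubled.
sub sketch_quote_nullable {
    my ($value) = @_;
    return 'NULL' unless defined $value;
    (my $escaped = $value) =~ s/'/''/g;
    return "'$escaped'";
}

# Approximates encode_array_constructor(): array refs become ARRAY[...],
# recursively; everything else is quoted as a plain value.
sub sketch_encode_array {
    my ($value) = @_;
    return sketch_quote_nullable($value) unless ref $value eq 'ARRAY';
    return 'ARRAY[' . join(', ', map { sketch_encode_array($_) } @$value) . ']';
}

# Build the argument list that would be spliced into the SELECT.
sub encode_args {
    return join ', ', map { sketch_encode_array($_) } @_;
}

print encode_args(42, undef, "O'Reilly", [1, [2, 3]]), "\n";
```

Note that the number 42 ends up as the string literal '42', which is why the server, not Perl, decides which overload matches when the call goes through SQL.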
I don't see either of those as significant issues: "If you need more
control for a particular SP then don't use SP::* to call that SP."
Open issues:
* What should SP::foo(...) return? The plain as-if-called-by-perl
return value, or something closer to what spi_exec_query() returns?
* If the called SP::foo(...) calls return_next those rows are returned
directly to the client. That can be construed as a feature.
* Cache invalidation. How can I hook into an SP being dropped so I can
pro-actively invalidate the cache?
* Probably many other things I've not thought of.
This is all a little rough and exploratory at the moment.
I'm very keen to get any feedback you might have.
Tim.
p.s. Happy New Year! (I may be off-line for a few days.)
On Wed, Dec 30, 2009 at 5:54 PM, Tim Bunce <Tim.Bunce@pobox.com> wrote:
That much works currently. Behind the scenes, when a stored procedure is
loaded into plperl the code ref for the perl sub is stored in a cache.
Effectively just
$cache{$name}[$nargs] = $coderef;
That doesn't seem like enough to guarantee that you've got the right
function. What if you have two functions with the same number of
arguments but different argument types? And what about optional
arguments, variable arguments, etc.?
...Robert
On Dec 30, 2009, at 4:17 PM, Robert Haas wrote:
That much works currently. Behind the scenes, when a stored procedure is
loaded into plperl the code ref for the perl sub is stored in a cache.
Effectively just
$cache{$name}[$nargs] = $coderef;
That doesn't seem like enough to guarantee that you've got the right
function. What if you have two functions with the same number of
arguments but different argument types? And what about optional
arguments, variable arguments, etc.?
As Tim said elsewhere:
I don't see either of those as significant issues: "If you need more
control for a particular SP then don't use SP::* to call that SP."
Best,
David
"David E. Wheeler" <david@kineticode.com> writes:
On Dec 30, 2009, at 4:17 PM, Robert Haas wrote:
That doesn't seem like enough to guarantee that you've got the right
function.
As Tim said elsewhere:
I don't see either of those as significant issues: "If you need more
control for a particular SP then don't use SP::* to call that SP."
If the thing actively fails when there's more than one possible match,
that might be ok. Randomly choosing a match, not so much.
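One way the "actively fail" behaviour could look, as a sketch only: the cache keeps every loaded candidate per name and argument count, and the lookup dies when more than one is present. The names register_sp() and lookup_unambiguous() are hypothetical. The check can only see functions that have already been loaded, so an overload that has not been loaded yet would go unnoticed.

```perl
use strict;
use warnings;

my %candidates;    # $candidates{$name}[$nargs] = [ [ $signature, $coderef ], ... ]

# Record a loaded function together with its full signature.
sub register_sp {
    my ($name, $signature, $nargs, $coderef) = @_;
    push @{ $candidates{$name}[$nargs] }, [ $signature, $coderef ];
    return;
}

# Return the single matching code ref, undef if nothing is cached,
# and die if more than one loaded function could match.
sub lookup_unambiguous {
    my ($name, $nargs) = @_;
    my $found = $candidates{$name}[$nargs] || [];
    return unless @$found;    # caller would fall back to SQL
    if (@$found > 1) {
        my $signatures = join ', ', map { $_->[0] } @$found;
        die "SP::$name with $nargs argument(s) is ambiguous: $signatures\n";
    }
    return $found->[0][1];
}

register_sp('foo', 'foo(int)',  1, sub { return 'int version' });
register_sp('foo', 'foo(text)', 1, sub { return 'text version' });

my $ok = eval { lookup_unambiguous('foo', 1); 1 };
print "refused: $@" unless $ok;
```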
regards, tom lane
On Dec 30, 2009, at 2:54 PM, Tim Bunce wrote:
That handles the arity of the calls and invokes the right SP, bypassing
SQL if the SP is already loaded.
Nice.
That much works currently. Behind the scenes, when a stored procedure is
loaded into plperl the code ref for the perl sub is stored in a cache.
Effectively just
$cache{$name}[$nargs] = $coderef;
An SP::AUTOLOAD sub intercepts any SP::* call and effectively does
lookup_sp($name, \@_)->(@_);
For SPs that are already loaded lookup_sp returns $cache{$name}[$nargs]
so the overhead of the call is very small.
Definite benefit, there. How does it handle the difference between IMMUTABLE | STABLE | VOLATILE, as well as STRICT functions? And what does it do if the function called is not actually a Perl function?
For SPs that are not cached, lookup_sp returns a code ref of a closure
that will invoke $name with the args in @_ via
spi_exec_query("select * from $name($encoded_args)");
The fallback-to-SQL behaviour neatly handles non-cached SPs (forcing
them to be loaded and thus cached), and inter-language calling (both
plperl<->plperl and other PLs).
Is there a way for such a function to be cached? If not, I'm not sure where cached functions come from.
Limitations:
* It's not meant to handle type polymorphism, only the number of args.
Well, spi_exec_query() handles the type polymorphism. So might it be possible to call SP::function() and have it not use a cached query? That way, one gets the benefit of polymorphism. Maybe there's a SP package that does caching, and an SPI package that does not? (Better named, though.)
* When invoked via SQL, because the SP isn't cached, all non-ref args
are expressed as strings via quote_nullable(). Any array refs
are encoded as ARRAY[...] via encode_array_constructor().
Hrm. Why not use spi_prepare() and let spi_exec_prepared() handle the quoting?
I don't see either of those as significant issues: "If you need more
control for a particular SP then don't use SP::* to call that SP."
If there was a non-cached version that was essentially just sugar for the SPI stuff, I think that would be more predictable, no? I'm not saying there shouldn't be a cached interface, just that it should not be the first choice when using polymorphic functions and non-PL/Perl functions.
Open issues:
* What should SP::foo(...) return? The plain as-if-called-by-perl
return value, or something closer to what spi_exec_query() returns?
The former.
* If the called SP::foo(...) calls return_next those rows are returned
directly to the client. That can be construed as a feature.
As a list?
Best,
David
On Wed, Dec 30, 2009 at 7:41 PM, David E. Wheeler <david@kineticode.com> wrote:
On Dec 30, 2009, at 4:17 PM, Robert Haas wrote:
That much works currently. Behind the scenes, when a stored procedure is
loaded into plperl the code ref for the perl sub is stored in a cache.
Effectively just
$cache{$name}[$nargs] = $coderef;
That doesn't seem like enough to guarantee that you've got the right
function. What if you have two functions with the same number of
arguments but different argument types? And what about optional
arguments, variable arguments, etc.?
As Tim said elsewhere:
I don't see either of those as significant issues: "If you need more
control for a particular SP then don't use SP::* to call that SP."
Sorry, I missed that. I guess it seems weird to me to handle
overloading, but only partially. If we're OK with punting, why not
punt the whole thing and just have $cache{$name} = $coderef?
...Robert
Ok, Plan B...
Consider this (hypothetical) example:
CREATE OR REPLACE FUNCTION test() ... LANGUAGE plperl AS $$
use SP foo_int => 'foo(int)';
use SP foo_text => 'foo(text)', -cached;
foo_int(42);
foo_text(42);
...
$$
Here the user is importing into their function, at load/compile-time,
aliases for specific stored procedures with specific type signatures.
The importer builds and imports a custom closure.
At its most basic it would be something like:
my $h = spi_prepare('select foo($1)', 'text');
return sub { spi_exec_prepared($h, @_)->{rows} }
or perhaps, with added lazy smartness:
my $mk = sub { spi_prepare('select foo($1)', 'text') };
my $h; # initialized on first use
record_handle_for_later_freeing_if_needed(\$h);
return sub { spi_exec_prepared($h ||= $mk->(), @_)->{rows} }
As much as possible has been pre-computed. All foo_text() does is
call spi_exec_prepared and do something (to be decided) with the results.
That's likely to be fast enough to negate much of the desire for
caching. It'll also work for all functions in all languages.
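A rough sketch of what such an importer might look like. spi_prepare() and spi_exec_prepared() are replaced by hypothetical stubs so it runs outside PostgreSQL, the signature parsing is deliberately naive, attributes such as -cached are not handled, and SP->import() is called directly where a real module would be loaded with "use SP ...".

```perl
use strict;
use warnings;

# Hypothetical stand-ins for plperl's SPI built-ins.
our $prepare_count = 0;
sub spi_prepare {
    my ($sql, @types) = @_;
    $prepare_count++;
    return { sql => $sql, types => [@types] };
}
sub spi_exec_prepared {
    my ($plan, @args) = @_;
    return { rows => [ { sql => $plan->{sql}, args => [@args] } ] };
}

package SP;

# import(alias => 'name(type, ...)', ...) installs one closure per alias
# into the calling package.
sub import {
    my ($class, @spec) = @_;
    my $caller = caller;
    while (@spec) {
        my ($alias, $signature) = splice @spec, 0, 2;
        my ($name, $type_list) = $signature =~ /^\s*([\w.]+)\s*\(\s*(.*?)\s*\)\s*$/
            or die "SP: cannot parse signature '$signature'\n";
        my @types = split /\s*,\s*/, $type_list;
        my $placeholders = join ', ', map { "\$$_" } 1 .. scalar @types;
        my $sql = "select $name($placeholders)";
        my $plan;    # prepared lazily, on first call
        my $sub = sub {
            $plan ||= main::spi_prepare($sql, @types);
            return main::spi_exec_prepared($plan, @_)->{rows};
        };
        no strict 'refs';
        *{"${caller}::$alias"} = $sub;
    }
    return;
}

package main;

SP->import(foo_int => 'foo(int)', foo_text => 'foo(text)');

my $rows = foo_text(42);
print $rows->[0]{sql}, "\n";
```

Each alias owns exactly one plan, so the number of plans is bounded by the number of imported signatures.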
I added an example with -cached above to indicate how extra attributes
could be specified to influence the behaviour of the import-time code
builder.
The code builder only needs to handle a few simple cases initially.
Enough to cover at least nargs and type polymorphism. I'd guess that
VARIADIC won't be too hard, but I'll probably skip OUT & INOUT.
There probably won't be explicit support for DEFAULT args - just
import another alias that has the default arg missing.
The only question I have at the moment, before I try implementing this,
is the need for freeing the plan. When would that be needed?
(Note that this scheme will only generate a fixed set of plans,
one per specific function name and type signature.)
Can someone give me some real-world examples? For example, does a plan
become 'broken' if an object it references gets dropped and recreated?
Assuming it does, or there's some other need to free/recreate plans,
then I can add a function call to do that (by recording a reference
to the $h's in the example above and using that to undef them).
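The bookkeeping described here could be sketched as follows. spi_prepare() and spi_freeplan() are hypothetical stubs that only log their calls, and make_caller() and release_all_plans() are illustrative names. The closure returns its plan handle where the real one would call spi_exec_prepared().

```perl
use strict;
use warnings;

# Hypothetical stand-ins for plperl's SPI built-ins; they only log calls.
our (@prepared, @freed);
sub spi_prepare {
    my ($sql, @types) = @_;
    push @prepared, $sql;
    return 'plan#' . scalar @prepared;
}
sub spi_freeplan {
    my ($plan) = @_;
    push @freed, $plan;
    return;
}

my @handle_refs;    # references to each closure's private $h

sub record_handle_for_later_freeing_if_needed {
    my ($handle_ref) = @_;
    push @handle_refs, $handle_ref;
    return;
}

# Build a closure whose plan is prepared on first use.
sub make_caller {
    my ($sql, @types) = @_;
    my $h;
    record_handle_for_later_freeing_if_needed(\$h);
    return sub {
        $h ||= spi_prepare($sql, @types);
        return $h;    # the real closure would call spi_exec_prepared($h, @_)
    };
}

# Free every live plan and reset the handles so they are re-prepared lazily.
sub release_all_plans {
    for my $handle_ref (@handle_refs) {
        next unless defined $$handle_ref;
        spi_freeplan($$handle_ref);
        undef $$handle_ref;
    }
    return;
}

my $foo = make_caller('select foo($1)', 'text');
$foo->();               # prepares the plan
release_all_plans();    # frees it and clears the handle
$foo->();               # prepares a fresh plan
print scalar(@prepared), " prepared, ", scalar(@freed), " freed\n";
```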
Does the above sound workable? Anything I've missed?
Tim.
p.s. My earlier plperl feature patch enabled the use of 'use' within
plperl stored procedures - but only for modules that have been explicitly
configured and pre-loaded.
Tim Bunce <Tim.Bunce@pobox.com> writes:
The only question I have at the moment, before I try implementing this,
is the need for freeing the plan. When would that be needed?
There's probably no strong need to do it at all, unless you are dropping
your last reference to the plan.
regards, tom lane
On Tue, Jan 05, 2010 at 07:06:35PM -0500, Tom Lane wrote:
Tim Bunce <Tim.Bunce@pobox.com> writes:
The only question I have at the moment, before I try implementing this,
is the need for freeing the plan. When would that be needed?
There's probably no strong need to do it at all,
That's good.
unless you are dropping your last reference to the plan.
Uh, now I'm confused again. The way I envisage it, each imported
function would contain a plan. So each would have the one and only
reference to that plan. So, if there was a need to drop them, I would be
dropping the last reference to the plan.
Let me ask the question another way... is there any reason to drop plans
other than limiting memory usage?
I couldn't find anything in the docs to suggest there was but want to be
sure.
Tim.
Tim Bunce <Tim.Bunce@pobox.com> writes:
Let me ask the question another way... is there any reason to drop plans
other than limiting memory usage?
No, that's about it. The only reason to care is if you'd otherwise have
a code path that would repetitively leak plans.
regards, tom lane