SQL-Invoked Procedures for 8.1
Hi all,
Following is a proposal to implement what SQL2003 calls 'SQL-Invoked
Procedures' and what most people refer to as stored procedures. Fujitsu
will be funding Neil Conway and I to work on this feature.
Procedures
----------
Procedures are nearly identical to functions. From my reading of
SQL2003 (see sections 4.27 and 11.50) procedures are different from
functions in three ways:
1) Unlike functions, procedures cannot return values.
2) Procedures support parameter modes which specify whether an argument is
IN, OUT or IN OUT
3) Procedures can be run in the same savepoint level as the caller when
OLD SAVEPOINT LEVEL is specified at creation time. According to SQL2003,
functions must be run on a new savepoint level. From my understanding, we
do not do this currently.
Work will focus on 1 and 2 until we have the concept of savepoint levels
with functions. Its possible that we will implement this too if there is
demand.
This makes the difference between procedures and functions quite
superficial: procedures are functions which return void and have parameter
modes. As such, I think we can largely wrap around the existing function
creation code using this rule. That is, CREATE PROCEDURE could and I think
should be syntactic sugar.
If we go down the route of saying that procedures are a type of function,
we have the option of allowing users access to OUT and INOUT in functions.
This would make procedures simply a subset of functions. What do people think?
Grammar modifications
---------------------
The grammar for creating a procedure, which is a subset of the SQL 2003
grammar, is as follows:
CREATE [ OR REPLACE ] PROCEDURE <name> ( [ [ IN | OUT | IN OUT ] [ name ] type, ... ] )
<routine characteristics>
<routine body>
<routine characteristics> are LANGUAGE, null-call and security clauses,
all of which are already implemented. <routine body> is the body, same as
with the existing functions implementation.
DROP PROCEDURE and CALL will also need to be added.
System catalog changes
-----------------------
There will be cases when we need to identify whether a routine is a
function or a procedure. This could be done two ways. We could say that
any proc in pg_proc which returns void is a procedure or we could store this
in some 'protype' column. Thoughts?
To handle parameter modes, we will also need to add an array of "char"s so
that we can determine the mode of an element in proargnames[].
Invocation
----------
Since procedures do not return a value they are invoked differently than
functions are (at least according to SQL2003). This is because if they were
used in a SELECT, UPDATE, DELETE or INSERT clause, their effect would be
ambiguous at best. ie,
SELECT * FROM foo WHERE bar();
Doesn't make much sense.
In SQL2003, procedures are invoked using:
CALL foo();
One of our original goals was to allow easier invocation of stored
procedures from PL/PgSQL, by allowing "foo()" rather than "CALL foo()"
or "PERFORM foo()". Neil submitted some preliminary patches
implementing this; following discussion with Tom, it was agreed to
make CALL optional at the SQL level, which provides the same effect:
http://archives.postgresql.org/pgsql-hackers/2004-09/msg00412.php
In this sense, invoking a procedure follows the same rules as invoking a
function which returns void.
Procedure arguments
-------------------
From what I can tell, there are only 3 ways to pass data to a
procedure:
1) As a literal value:
CALL foo('Then one day....');
CALL bar(1, 2, 3);
CALL baz(current_timestamp)
Note that this only works for parameters which are of mode IN (see
more below).
2) As a variable
Other databases, and SQL2003, support a few different implementations what
could be called variables. In SQL2003, there are a few types of
'variables':
i) Host parameters
ii) Column references to transition variables (OLD and NEW in the case of
triggers and rules)
iii) Dynamic parameters (ie, ? in the case of prepare)
iv) Embedded variables
(i) and (iv) are basically bindings to the client side variables. They
provide a mechanism for interface designers to allow users to associate
variables with a procedure, execute that procedure, and have the new value
of the variables available to the language once the execution call
returns. JDBC 3.0 focuses fairly heavily on this kind of thing, for
example.
I think we only need implement one of these and I discuss it below under
'Host Variables'.
3) A transition variable
These are the NEW and OLD variables in trigger and rule specifications.
If we are calling a procedure, we will need to make sure that every
argument is one of these.
Only (2) and (3) can be passed as parameters which are marked OUT or
INOUT. SQL2003 makes the following distinction between parameter modes:
IN - A 'read only' argument. It must be initialised.
OUT - An uninitialised argument which, from memory, SQL2003 says *must* be
initialised by the function itself.
INOUT - A combination of the two above. That is, an initialised argument
which can be written to.
Host Variables
--------------
Host variables are required to make INOUT and OUT parameters useful,
so this is an important part of the implementation of procedures.
The parser will need to be modified to recognise the host parameter syntax,
which is ':<varname>'. We could restrict this to something like CALL or
expand it so that normal queries could use variables too. I cannot
think of any reason to restrict use but I'm open to what people think.
We'll need a host variable node just like we have Const, etc, so that
the variable can be propagated through the system. The variable should be
substituted within the executor. This ensures that EXPLAIN wont break.
Then again, if we don't allow host parameters in normal queries, its not
a problem.
If we do, I think we could get away with piggy backing on the PREPARE/EXECUTE
code with the possibility of allowing the planner to take a look at the current
value of a given host variable so that it can generate a better plan.
Then again, using variables in standard queries may just be a very bad
idea.
As for the creation and modification of host parameters, we have two options:
an SQL language construct, which MySQL uses for example, or a protocol level
mechanism which Oracle appears to use.
As far as I can understand from SQL2003, host parameters are predefine there
at procedure creation time. I don't really get this and I haven't seen it in
another database. If someone could explain it and thinks it would serve us
better than that covered below, please let me know :-).
If we go down the grammar route, I'd imagine that we do something like:
SET VARIABLE <varname> = <val>
SHOW VARIABLE <varname>
I've used SET and SHOW because they're intuitive. Obviously these are used
by GUC and I'm not proposing we use GUC for host parameters as we want a
session-local implementation. It would be useful if we could also add some
type information to this so make procedure candidate selection easier. Ideas?
The other option is that we do it at the protocol level and modify libpq
(and psql) to add support. This would allow us something like:
template1=# \vc x int -- create a variable 'x' of type int
template1=# \vs x 2 -- set it to '2'
template1=# CALL proc_with_inout_param(:x);
CALL
template1=# \vp x -- print 'x'
120
That might need some massaging but you get the idea. The commands to psql
translate to lower level protocol level commands which are: create
variable, set (ie, modify the value of) variable and describe variable. Neil
thinks we should have create and set in one step. This is great, since it
covers most cases, but I'm not sure how we have uninitialised values. Perhaps
we set to NULL?
I think interacting with variables as above is actually quite ugly as well
but that has nothing to do with implementation at the protocol level rather
my own lack of UI skills :-).
So, what does doing this at the protocol level give us? Well, it keeps it out
of the grammar and decreases overhead for interfaces. Still, those reasons
aren't great when we're look at a protocol modification. I'd love to hear
other people's ideas here: maybe I've overlooked a much more attractive
alternative.
Getting back to the host variables themselves.
Most implementations I've seen only support a very limited range of types,
such as varchar, numeric and int. I don't see any reason why we can't
support the full range of types available in the system.
The visibility of variables is restricted to the session which created
them. That is, they're just like our implementation of temporary tables.
The only other question (that I can think of now) with respect to
variables is how they are affected by transactions. My gut feel is
that they should have transactional semantics. For example:
template1=# \vc x int
template1=# \vs x 2
template1=# BEGIN;
BEGIN
template1=# CALL proc_which_sets_arg_to_100(:x);
CALL
template1=# ABORT;
template1=# \vp x
What should 'x' be? Following transactional semantics, it should be 2.
The only problem is, implementing the transaction semantics might be
painful. Bruce made a suggestion to me a while ago that we could just put
the variable in a temp table. But that limits the scope for types and it
also may slow performance in a long running transaction which modifies a
variable a lot. That is, its a lot of overhead.
The other option is a hash (or some dynamic structure) in which we
attach some transaction information to each variable. We will need to
process this from CommitTransaction()/AbortTransaction(). We'll also
need to be subtransaction aware. Because we don't need to manage
concurrent access to variables, the logic needed to implement
transactional semantics by hand shouldn't be too difficult.
PL/PgSQL
--------
Its in PL/PgSQL that procedures will be most intensively used, I
believe. Since variables are already supported in the language we
will need to modify that system to make OUT and INOUT parameters
work. Currently, we copy each the argument before we call a function
so that any (potential) write to it will not be reflected in the
caller. We're also protected here by the executor which doesn't allow
us to update arguments since they are marked as constants. We'll also
need to tweak this.
The other thing we will need to look out for if we're executing a
procedure is if the user attempts to RETURN a non-void value. This would be an
error, of course. Simply using RETURN to return control to the caller shouldn't
be a problem in my opinion, however. Neil has also pointed out to me that
this is also the exact behaviour required of functions returning void.
Permissions
-----------
We can use the same permissions as for functions.
----
I will be away for the next few days (until Monday). I hope to have Internet
access but I'm not sure. Neil and I have worked through this together so
he can also answer any questions too.
Gavin
Hi Gavin,
Although I have not read the SQL 2003 spec, my recollection of other database
products' stored procs differed from your description in one significant way,
namely that they could return multiple (and varied) sets of rows.
For example, a stored proc could do a SELECT over foo and then a SELECT over
bar and return the tuples of both foo and bar. (each having different column
counts, types, etc)
The JDBC interfaces would appear to illustrate this point.
(In CallableStatement)
"A CallableStatement can return one ResultSet object or multiple ResultSet
objects. Multiple ResultSet objects are handled using operations inherited
from Statement."
Regards,
Grant
Gavin Sherry wrote:
Hi all,
Following is a proposal to implement what SQL2003 calls 'SQL-Invoked
Procedures' and what most people refer to as stored procedures. Fujitsu
will be funding Neil Conway and I to work on this feature.
[lots of interesting detail]
On Thu, 23 Sep 2004, Grant Finnemore wrote:
Hi Gavin,
Although I have not read the SQL 2003 spec, my recollection of other database
products' stored procs differed from your description in one significant way,
namely that they could return multiple (and varied) sets of rows.For example, a stored proc could do a SELECT over foo and then a SELECT over
bar and return the tuples of both foo and bar. (each having different column
counts, types, etc)The JDBC interfaces would appear to illustrate this point.
(In CallableStatement)"A CallableStatement can return one ResultSet object or multiple ResultSet
objects. Multiple ResultSet objects are handled using operations inherited
from Statement."
I read the JDBC 3.0 spec and didn't see this. Like with all specs, some
details are hard to find. However, from what I've seen in the spec, I
think they have functions in mind here. That being said, I can't think how
SQL2003 would allow such behaviour. If you could show us an example,
that'd be great.
Thanks,
Gavin
Quoth the JDBC spec:
public interface CallableStatement
extends PreparedStatement
The interface used to execute SQL stored procedures. The JDBC API provides a
stored procedure SQL escape syntax that allows stored procedures to be called
in a standard way for all RDBMSs. This escape syntax has one form that includes
a result parameter and one that does not. If used, the result parameter must be
registered as an OUT parameter. The other parameters can be used for input,
output or both. Parameters are referred to sequentially, by number, with the
first parameter being 1.
{?= call <procedure-name>[<arg1>,<arg2>, ...]}
{call <procedure-name>[<arg1>,<arg2>, ...]}
IN parameter values are set using the set methods inherited from
PreparedStatement. The type of all OUT parameters must be registered prior to
executing the stored procedure; their values are retrieved after execution via
the get methods provided here.
A CallableStatement can return one ResultSet object or multiple ResultSet
objects. Multiple ResultSet objects are handled using operations inherited from
Statement.
For maximum portability, a call's ResultSet objects and update counts should be
processed prior to getting the values of output parameters.
Regards,
Grant
Gavin Sherry wrote:
Show quoted text
On Thu, 23 Sep 2004, Grant Finnemore wrote:
Hi Gavin,
Although I have not read the SQL 2003 spec, my recollection of other database
products' stored procs differed from your description in one significant way,
namely that they could return multiple (and varied) sets of rows.For example, a stored proc could do a SELECT over foo and then a SELECT over
bar and return the tuples of both foo and bar. (each having different column
counts, types, etc)The JDBC interfaces would appear to illustrate this point.
(In CallableStatement)"A CallableStatement can return one ResultSet object or multiple ResultSet
objects. Multiple ResultSet objects are handled using operations inherited
from Statement."I read the JDBC 3.0 spec and didn't see this. Like with all specs, some
details are hard to find. However, from what I've seen in the spec, I
think they have functions in mind here. That being said, I can't think how
SQL2003 would allow such behaviour. If you could show us an example,
that'd be great.Thanks,
Gavin
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
On Thu, 23 Sep 2004, Grant Finnemore wrote:
Quoth the JDBC spec:
public interface CallableStatement
extends PreparedStatementThe interface used to execute SQL stored procedures. The JDBC API provides a
stored procedure SQL escape syntax that allows stored procedures to be called
in a standard way for all RDBMSs. This escape syntax has one form that includes
a result parameter and one that does not. If used, the result parameter must be
registered as an OUT parameter. The other parameters can be used for input,
output or both. Parameters are referred to sequentially, by number, with the
first parameter being 1.{?= call <procedure-name>[<arg1>,<arg2>, ...]}
{call <procedure-name>[<arg1>,<arg2>, ...]}
I didn't see this in my copy of the spec, which is version 3.0 FR (final
release). Still, I think we're fine. As I said before, what I think the
spec had in mind was allowing functions to be called from the
callablestatement stuff and have their output put in the first OUT
variable.
This is... reasonable. Compare the stuff Neil's been working on with
"bare" function calls in PL/PgSQL and Tom (I think) saying that it might
be reasonable to just issue func(); as an SQL query: no CALL, no SELECT.
IN parameter values are set using the set methods inherited from
PreparedStatement. The type of all OUT parameters must be registered prior to
executing the stored procedure; their values are retrieved after execution via
the get methods provided here.A CallableStatement can return one ResultSet object or multiple ResultSet
objects. Multiple ResultSet objects are handled using operations inherited from
Statement.
I don't get this multiple ResultSet stuff. All I can think of is that the
spec has this in mind:
CallableStatement cstmt = conn.prepareCall("{call foo(); call bar();}");
or
CallableStatement cstmt = conn.prepareCall("{call foo()} {call bar();}");
or some other permutation.
I see plenty of references to multiple ResultSets but I cannot find an
example or information on how to generate one.
Gavin
IN parameter values are set using the set methods inherited from
PreparedStatement. The type of all OUT parameters must beregistered
prior to executing the stored procedure; their values are retrieved
after execution via the get methods provided here.A CallableStatement can return one ResultSet object or multiple
ResultSet objects. Multiple ResultSet objects are handled using
operations inherited from Statement.I don't get this multiple ResultSet stuff. All I can think of
is that the spec has this in mind:CallableStatement cstmt = conn.prepareCall("{call foo(); call
bar();}");or
CallableStatement cstmt = conn.prepareCall("{call foo()}
{call bar();}");or some other permutation.
I see plenty of references to multiple ResultSets but I
cannot find an example or information on how to generate one.
Not a user of JDBC, but this is fairly common in the ADO/ADO.NET world
with MS SQL Server as well (not sure about other dbs and .NET - I'ev
only used it with mssql and pgsql)... As for an example, something along
the line of (though in my cases usually with a *lot* more parameters):
--
CREATE PROCEDURE get_info_for_user(@userid varchar(16)) AS
SELECT something FROM contentstable WHERE userid=@userid
SELECT whatever,somethingelse FROM anothertable WHERE
something=anything
--
You get the point :-)
Then in my .net code I'd do a simple:
SqlDataReader rdr = cmd.ExecuteReader();
... process first result ...
rdr.NextResult();
... process second result...
This is the very simple case. In this case, the only thing you gain is
less server roundtrips and less parsing steps.
In an example of a more complex case, the first part of the stored
procedure will do some complex (and expensive) work to get to a
resulting variable. This variable is then applied to several different
queries after each other, and their respective resultsets are returned
to the client. In this case, you save having to run that complex logic
more than once. (You could cache the result at the client, but if you're
going down that path then you don't need stored procs at all.. It is
usually necessary to keep it in the db to maintain abstraction)
//Magnus
Import Notes
Resolved by subject fallback
Gavin Sherry wrote:
I don't get this multiple ResultSet stuff. All I can think of is that the
spec has this in mind:CallableStatement cstmt = conn.prepareCall("{call foo(); call bar();}");
or
CallableStatement cstmt = conn.prepareCall("{call foo()} {call bar();}");
or some other permutation.
It's not specific to CallableStatement; you can generate multiple
resultsets from a plain Statement, and CallableStatement is just
inheriting that functionality.
The common way of generating multiple resultsets is, indeed, a
multi-statement query. For example:
Statement stmt = conn.createStatement();
stmt.execute("SELECT * FROM foo; SELECT * FROM bar");ResultSet rs1 = stmt.getResultSet();
// process rs1
rs1.close();boolean moreResults = stmt.getMoreResults();
assert moreResults;ResultSet rs2 = stmt.getResultSet();
// process rs2
rs2.close();stmt.close();
AFAIK the multiple-resultset stuff is not *required* functionality in
JDBC, it's just there to support it if it does happen. The postgresql
JDBC driver didn't actually support multiple resultsets at all until
recently.
For function/procedure calls, I'd expect it to look like:
CallableStatement cstmt = conn.prepareCall("{call foo()}; {call bar()}");
and for the driver to turn that into two separate SELECT/CALL/whatever
queries at the protocol level, and manage the multiple resultsets
itself. The current driver doesn't handle multiple call escapes in one
query at all, but that's really just a limitation of the reasonably dumb
call-escape parser it currently has.
I wouldn't worry about this case unless there's some other reason that a
*single* function/procedure call needs to return more than one set of
results.
I see plenty of references to multiple ResultSets but I cannot find an
example or information on how to generate one.
That's because there's no standard way to generate them :)
-O
Hi Magnus,
Yes, this is the situation that I have been thinking about. Specifically
when a single stored procedure returns many recordsets.
Perhaps I should also clarify that the "spec" I have been using is
the JDK javadoc documentation.
Using java with Magnus' procedure:
CallableStatement cs = connection.prepareCall("call get_info_for_user ?");
cs.setString(1, "test");
if(cs.execute()) {
ResultSet rs = cs.getResultSet();
while(rs != null) {
// Process rs
}
}
Regards,
Grant
Magnus Hagander wrote:
[snip]
Show quoted text
Not a user of JDBC, but this is fairly common in the ADO/ADO.NET world
with MS SQL Server as well (not sure about other dbs and .NET - I'ev
only used it with mssql and pgsql)... As for an example, something along
the line of (though in my cases usually with a *lot* more parameters):--
CREATE PROCEDURE get_info_for_user(@userid varchar(16)) AS
SELECT something FROM contentstable WHERE userid=@useridSELECT whatever,somethingelse FROM anothertable WHERE
something=anything
--You get the point :-)
Then in my .net code I'd do a simple:
SqlDataReader rdr = cmd.ExecuteReader();
... process first result ...
rdr.NextResult();
... process second result...
On Thu, 23 Sep 2004, Oliver Jowett wrote:
Gavin Sherry wrote:
I don't get this multiple ResultSet stuff. All I can think of is that the
spec has this in mind:CallableStatement cstmt = conn.prepareCall("{call foo(); call bar();}");
or
CallableStatement cstmt = conn.prepareCall("{call foo()} {call bar();}");
or some other permutation.
It's not specific to CallableStatement; you can generate multiple
resultsets from a plain Statement, and CallableStatement is just
inheriting that functionality.The common way of generating multiple resultsets is, indeed, a
multi-statement query. For example:Statement stmt = conn.createStatement();
stmt.execute("SELECT * FROM foo; SELECT * FROM bar");ResultSet rs1 = stmt.getResultSet();
// process rs1
rs1.close();boolean moreResults = stmt.getMoreResults();
assert moreResults;ResultSet rs2 = stmt.getResultSet();
// process rs2
rs2.close();stmt.close();
AFAIK the multiple-resultset stuff is not *required* functionality in
JDBC, it's just there to support it if it does happen. The postgresql
JDBC driver didn't actually support multiple resultsets at all until
recently.For function/procedure calls, I'd expect it to look like:
CallableStatement cstmt = conn.prepareCall("{call foo()}; {call bar()}");
and for the driver to turn that into two separate SELECT/CALL/whatever
queries at the protocol level, and manage the multiple resultsets
itself. The current driver doesn't handle multiple call escapes in one
query at all, but that's really just a limitation of the reasonably dumb
call-escape parser it currently has.I wouldn't worry about this case unless there's some other reason that a
*single* function/procedure call needs to return more than one set of
results.I see plenty of references to multiple ResultSets but I cannot find an
example or information on how to generate one.That's because there's no standard way to generate them :)
Okay. So, its something that can be handled in the driver. That's what I
thought.
Do you have any idea about databases returning result sets from SQL
procedures (ie, not functions).
Thanks,
Gavin
On N, 2004-09-23 at 11:34, Grant Finnemore wrote:
Hi Magnus,
Yes, this is the situation that I have been thinking about. Specifically
when a single stored procedure returns many recordsets.
IIRC support for returning multiple recordsets was removed from
postgresql fe-be protocol years ago as "nobody ever needs it" ;)
---------------
Hannu
Hannu Krosing wrote:
On N, 2004-09-23 at 11:34, Grant Finnemore wrote:
Hi Magnus,
Yes, this is the situation that I have been thinking about. Specifically
when a single stored procedure returns many recordsets.IIRC support for returning multiple recordsets was removed from
postgresql fe-be protocol years ago as "nobody ever needs it" ;)
Until recently I would have said the same thing.
A couple of months ago I started a new job where they are MS-SQL based,
and they do have several procedures that actually do return multiple
recordsets from a single procedure. However this is the only time that
I've ever seen any procedure actually require it.
Peter
On N, 2004-09-23 at 15:22, Peter Mount wrote:
Hannu Krosing wrote:
On N, 2004-09-23 at 11:34, Grant Finnemore wrote:
Hi Magnus,
Yes, this is the situation that I have been thinking about. Specifically
when a single stored procedure returns many recordsets.IIRC support for returning multiple recordsets was removed from
postgresql fe-be protocol years ago as "nobody ever needs it" ;)Until recently I would have said the same thing.
A couple of months ago I started a new job where they are MS-SQL based,
and they do have several procedures that actually do return multiple
recordsets from a single procedure. However this is the only time that
I've ever seen any procedure actually require it.
Actually the original support was inhetited from Postgres4.2 where
PostQuel had native support for it.
It was probably decided that SQL (which replaced PostQuel) would not
generate such things.
------------
Hannu
Hannu Krosing wrote:
On N, 2004-09-23 at 15:22, Peter Mount wrote:
Hannu Krosing wrote:
On N, 2004-09-23 at 11:34, Grant Finnemore wrote:
Hi Magnus,
Yes, this is the situation that I have been thinking about. Specifically
when a single stored procedure returns many recordsets.IIRC support for returning multiple recordsets was removed from
postgresql fe-be protocol years ago as "nobody ever needs it" ;)Until recently I would have said the same thing.
A couple of months ago I started a new job where they are MS-SQL based,
and they do have several procedures that actually do return multiple
recordsets from a single procedure. However this is the only time that
I've ever seen any procedure actually require it.Actually the original support was inhetited from Postgres4.2 where
PostQuel had native support for it.It was probably decided that SQL (which replaced PostQuel) would not
generate such things.
I'm not so sure as both JDBC & ODBC have implicit support for it and
they are younger than SQL.
ie: with Prepared/Callable statements in JDBC you are supposed to check
for the existence of any other ResultSets when reading the results...
this was what I had to do with CallableStatement last week with MSSQL.
Peter
Hannu Krosing <hannu@tm.ee> writes:
IIRC support for returning multiple recordsets was removed from
postgresql fe-be protocol years ago as "nobody ever needs it" ;)
The protocol can still do it, and so can the backend, but it will
certainly break most if not all clients. Here's an example:
regression=# create table surprise(f1 text);
CREATE TABLE
regression=# create rule r1 as on insert to surprise do
regression-# ( select 'hello' ; select 'how are you' );
CREATE RULE
regression=# insert into surprise values ('boo');
?column?
-------------
how are you
(1 row)
regression=#
The 'hello' result was in fact computed and sent by the backend, but it
was discarded in libpq (see the documentation about PQexec: only the
last resultset returned by a querystring is returned to the caller).
psql could have printed both results, but it would need to use
PQsendQuery/PQgetResult instead of PQexec.
regards, tom lane
Gavin Sherry wrote:
Do you have any idea about databases returning result sets from SQL
procedures (ie, not functions).
As other's have pointed out, this is very common in the MS SQL Server
world (and I believe Sysbase also supports it). It works like:
<begin proc def>
select * from something
...
select * from somethingelse
...
<end proc def>
We get requests for this kind of functionality at least a couple of
times a month, and although it's been a few years since I mucked with
MSSQL, I found it to be very useful in a number of different circumstances.
It is only workable because stored procedures cannot participate in
normal SELECT statements. In MSSQL you would do something like:
exec sp_my_multiresultset_proc
GO
-- or --
sp_my_multiresultset_proc
GO
so the analogy to your stored procedure proposal holds:
call sp_my_multiresultset_proc();
-- or --
sp_my_multiresultset_proc();
I had always envisioned implementing this by projecting tuples directly
the way that SHOW ALL or EXPLAIN ANALYZE do. See, e.g.
ShowAllGUCConfig() in guc.c.
Joe
Gavin Sherry <swm@linuxworld.com.au> writes:
3) Procedures can be run in the same savepoint level as the caller when
OLD SAVEPOINT LEVEL is specified at creation time. According to SQL2003,
functions must be run on a new savepoint level. From my understanding, we
do not do this currently.
It's irrelevant since we don't allow functions to call SAVEPOINT/RELEASE/
ROLLBACK TO explicitly, and probably won't do so anytime soon. The only
thing we can really manage for a function is constrained use of
subtransactions a la plpgsql exceptions. This doesn't require the
savepoints to be named at all, so savepoint levels need not enter into it.
This makes the difference between procedures and functions quite
superficial: procedures are functions which return void and have parameter
modes.
If you implement it that way I think it'll be very largely a waste of
effort :-(. What you're talking about seems mere syntactic sugar and
not a fundamental advance in capability.
What I'd like to see is a "procedure" capability which is somehow
outside the transaction system and can therefore invoke BEGIN, COMMIT,
SAVEPOINT, etc. I have no immediate ideas about how to do this, but
I think that's what people are really after when they ask for
server-side procedures. They want to be able, for example, to have
a procedure encapsulate an abort-and-retry loop around a serializable
transaction. (It'd be great if we could do that in a function, but
I haven't thought of a way to make it work.)
I concur with Grant Finnemore's objection as well: people expect
procedures to be able to return resultsets, ie SETOF something,
not only scalar values. Whether this is what SQL2003 says is not
really the issue -- we have to look at what's out there in competing
products.
regards, tom lane
Gavin, Neil,
Following is a proposal to implement what SQL2003 calls 'SQL-Invoked
Procedures' and what most people refer to as stored procedures. Fujitsu
will be funding Neil Conway and I to work on this feature.
Which, by the way, is way keen.
My comments are based on having professionally written several hundred
thousand lines of procedural code for PostgreSQL, SQL Server, and Oracle. I
believe that your interpretation of the spec is correct but that there are
several things not covered by the spec, but implemented by other RDBMSes,
which make stored procedures *useful* which have been omitted. I feel
strongly that these things will make a large difference to people thinking of
migrating to PostgreSQL from other DBMSes, and want to make sure that Neil's
implementation does not make them harder, instead of easier, to to implement
later.
Procedures are nearly identical to functions.
IMHO, this is largely because the spec regards a great deal of SP
functionality to be "implementation-defined", and is thus kept as vague as
possible. In practice, other DBMSes which have both SPs and Functions treat
them *very* differently.
3) Procedures can be run in the same savepoint level as the caller when
OLD SAVEPOINT LEVEL is specified at creation time. According to SQL2003,
functions must be run on a new savepoint level. From my understanding, we
do not do this currently.Work will focus on 1 and 2 until we have the concept of savepoint levels
with functions. Its possible that we will implement this too if there is
demand.
One of the things which differentiates SPs on other DBs from PostgreSQL
Functions is transactionality. In SQL Server and Oracle, SPs are not
automatically a transaction; instead, they contain transactions within them.
This is vitally important to DBAs who want to use SPs to automate database
maintenance, loads, transformations, and other activities which require
checkpointing within the course of a program.
For example, I run a nightly data transformation for one client which requires
16 steps with VACUUMs, ANALYZEs and error-handling between them.
Currently, the only way I can implement this for PostgreSQL is to have an
external program (Perl, in my case) manage this and call each step as a
separate function. It would be far easier to manage if I could put all of
the steps, including the vaccums inside one long-running SP, but the required
transaction container prevents this.
If we go down the route of saying that procedures are a type of function,
we have the option of allowing users access to OUT and INOUT in functions.
This would make procedures simply a subset of functions. What do people
think?
Well, to be frank, my first thought is, why bother? If you're just going to
implement some syntatic sugar on top of the current Function feature, why
bother at all?
Given the opportunity, I would far prefer to set us on a road that would allow
us to replicate -- and exceed -- the functionality of Oracle's PL/SQL. This
proposal does not do that; in fact, if someone were to start implementing
such functionality later they might find this code a stumbling block.
There will be cases when we need to identify whether a routine is a
function or a procedure. This could be done two ways. We could say that
any proc in pg_proc which returns void is a procedure or we could store
this in some 'protype' column. Thoughts?
Well, see my thoughts above on differentiating SPs from Functions. I
certainly don't think we should be using the same table.
A second point, which I brought up with you on IRC, is to eliminate
overloading and allow named parameter calls on SPs. This is extremely
useful functionality in T-SQL and PL/SQL; in fact, I'd say that it's
essential for any operation that wants to create an SP-centric middleware as
only named parameter calls allow developers to add parameters to existing
procedures without breaking existing calls.
For anyone who doesn't know what I'm talking about, it's this form:
CREATE PROCEDURE do_some_work ( alpha INT, beta INT, gamma TEXT )
etc.
Where you can:
CALL do_some_work( alpha = 5, beta = 7 )
.. and then gamma picks up its default, if any, or even:
CALL do_some_work( gamma = 'Lord Fontleroy', beta = 7, alpha = 1 )
The idea being that for SPs, schema.name is unique regardless of the
parameters. Even if implementing named parameter calls is beyond the
current spec, I will argue strongly in favor of eliminating overloading for
SPs. Overloading serves no purpose for them and prohibits the whole concept
of default values.
Other databases, and SQL2003, support a few different implementations what
could be called variables. In SQL2003, there are a few types of
'variables':
I completely follow your thinking about variables, and find it makes sense.
SET VARIABLE <varname> = <val>
SHOW VARIABLE <varname>
The problem with this is name collisions with the GUC -- especially as it now
allows add-ins to create their own GUC variables. However intuitive SET and
SHOW are, they will lead to problems. Maybe SETVAR and SHOWVAR? Or using
the PL/pgsql assignment operator for the first, and DISPLAY for the second,
e.g.:
num_logos := 917;
DISPLAY num_logos;
Also, you do realize that this is going to lead to requests for SELECT ....
INTO on the SQL command line, don't you?
The other option is that we do it at the protocol level and modify libpq
(and psql) to add support. This would allow us something like:
This may be a better approach. I've personally never been comfortable with
the use of variables outside of SPs and packages; it seems orthagonal to the
declaritive nature of SQL. However, this is a aesthic thing and not really
based on practical considerations.
The more practical consideration is, where will OUT and INOUT parameters be
used? Do we want them returned to the SQL session or directly to the
calling client? I would think that practicality would argue in favor of the
latter; I can't see needing variables in SQL except for testing, and having
them in psql will allow me that.
That might need some massaging but you get the idea. The commands to psql
translate to lower level protocol level commands which are: create
variable, set (ie, modify the value of) variable and describe variable.
Neil thinks we should have create and set in one step. This is great, since
it covers most cases, but I'm not sure how we have uninitialised values.
Perhaps we set to NULL?
Yes.
The only other question (that I can think of now) with respect to
variables is how they are affected by transactions. My gut feel is
that they should have transactional semantics. For example:
I agree strongly with this, especially since we'll be using Savepoints inside
the SPs. Having one's data mods roll back but not the variable values would
be confusing and lead to a *lot* of debugging.
We can use the same permissions as for functions.
Agreed.
--
Josh Berkus
Aglio Database Solutions
San Francisco
Import Notes
Reply to msg id not found: 20040923111312.7725F329E3B@svr1.postgresql.orgReference msg id not found: 20040923111312.7725F329E3B@svr1.postgresql.org | Resolved by subject fallback
On Thu, Sep 23, 2004 at 12:40:41PM -0400, Tom Lane wrote:
What I'd like to see is a "procedure" capability which is somehow
outside the transaction system and can therefore invoke BEGIN, COMMIT,
SAVEPOINT, etc. I have no immediate ideas about how to do this, but
I think that's what people are really after when they ask for
server-side procedures. They want to be able, for example, to have
a procedure encapsulate an abort-and-retry loop around a serializable
transaction. (It'd be great if we could do that in a function, but
I haven't thought of a way to make it work.)
I don't think we can do that in a standard function, at least not
without a lot of work. If we think of procedures as different from
functions, however, it seems doable.
What's needed for this is to isolate the transaction-initiating code
from the main query-processing loop. So for CALL statements it wouldn't
be invoked, and the procedure would be able to use its own explicit
transaction blocks and savepoints.
This part is not hard to do at all. It can be handled from the parser,
I think.
What's harder is handling the execution code. If the procedure uses
SPI, we need a mechanism to keep its SPI state, outside the normal
transaction-bound SPI stack.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"And as an added bonus, now my computer goes to the toilet for me, leaving me
free to spend time on more useful activities! yay slug codefests!" (C. Parker)
Tom Lane wrote:
regression=# create rule r1 as on insert to surprise do
regression-# ( select 'hello' ; select 'how are you' );
CREATE RULE
The 'hello' result was in fact computed and sent by the backend, but it
was discarded in libpq (see the documentation about PQexec: only the
last resultset returned by a querystring is returned to the caller).
psql could have printed both results, but it would need to use
PQsendQuery/PQgetResult instead of PQexec.
Yikes. I thought this couldn't happen under the V3 extended query protocol.
The JDBC driver currently sends Describe/Execute and expects exactly one
of RowDescription/NoData followed by zero or more DataRows followed by
one of CommandComplete/EmptyQueryResponse/PortalSuspended. This seems
wrong if there could be multiple resultsets from a single Execute.
How can clients distinguish multiple resultsets if they're using the
extended query protocol?
-O
Oliver Jowett <oliver@opencloud.com> writes:
The JDBC driver currently sends Describe/Execute and expects exactly one
of RowDescription/NoData followed by zero or more DataRows followed by
one of CommandComplete/EmptyQueryResponse/PortalSuspended. This seems
wrong if there could be multiple resultsets from a single Execute.
This is okay if you know the query was a SELECT, since we don't allow
ON SELECT rules that aren't view-like (ie, DO INSTEAD SELECT something-else).
Non-SELECT queries can return multiple result sets, though, as I
illustrated. It's probably reasonable for the driver to throw these
away if it's not looking for a resultset at all. Or you could follow
PQexec's lead and discard all but the last.
How can clients distinguish multiple resultsets if they're using the
extended query protocol?
You'll get multiple repetitions of RowDescription/DataRows.
regards, tom lane