SQL-standard function body
This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.
Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:
CREATE FUNCTION add(a integer, b integer) RETURNS integer
LANGUAGE SQL
RETURN a + b;
or as a block
CREATE PROCEDURE insert_data(a integer, b integer)
LANGUAGE SQL
BEGIN ATOMIC
INSERT INTO tbl VALUES (a);
INSERT INTO tbl VALUES (b);
END;
The function body is parsed at function definition time and stored as
expression nodes in probin. So at run time, no further parsing is
required.
However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.
Dependencies between the function and the objects it uses are fully
tracked.
A new RETURN statement is introduced. This can only be used inside
function bodies. Internally, it is treated much like a SELECT
statement.
psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.
Also, per SQL standard, LANGUAGE SQL is the default, so it does not
need to be specified anymore.
Note: Some parts of the patch look better under git diff -w (ignoring
whitespace changes) because if/else blocks were introduced around
existing code.
TODOs and discussion points:
- pg_dump is not yet supported. As a consequence, the pg_upgrade
tests don't pass yet. I'm thinking about changing pg_dump to use
pg_get_functiondef here instead of coding everything by hand. Some
initial experimenting showed that this would be possible with minimal
tweaking and it would surely be beneficial in the long run.
- The compiled function body is stored in the probin field of pg_proc.
This matches the historical split similar to adsrc/adbin, consrc/conbin,
but this has now been abandoned. Also, this field should ideally be of
type pg_node_tree, so reusing probin for that is probably not good.
Seems like a new field might be best.
- More test coverage is needed. Surprisingly, there wasn't actually any
test AFAICT that just creates and SQL function and runs it. Most of
that code is tested incidentally, but there is very little or no
targeted testing of this functionality.
- Some of the changes in pg_proc.c, functioncmds.c, and functions.c in
particular were jammed in and could use some reorganization after the
basic ideas are solidified.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v1-0001-SQL-standard-function-body.patchtext/plain; charset=UTF-8; name=v1-0001-SQL-standard-function-body.patch; x-mac-creator=0; x-mac-type=0Download+1041-179
On Tue, Jun 30, 2020 at 1:49 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.
With what other implementations is it compatible?
The function body is parsed at function definition time and stored as
expression nodes in probin. So at run time, no further parsing is
required.However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.Dependencies between the function and the objects it uses are fully
tracked.A new RETURN statement is introduced. This can only be used inside
function bodies. Internally, it is treated much like a SELECT
statement.psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.Also, per SQL standard, LANGUAGE SQL is the default, so it does not
need to be specified anymore.
Hmm, this all seems like a pretty big semantic change. IIUC, right
now, a SQL function can only contain one statement, but it seems like
with this patch you can have a block in there with a bunch of
statements, sorta like plpgsql. But probably you don't have all of the
functionality of plpgsql available. Also, the fact that you're doing
parsing earlier means that e.g. creating a table and inserting into it
won't work. Maybe that's fine. But it almost seems like you are
inventing a whole new PL....
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,
* Robert Haas (robertmhaas@gmail.com) wrote:
Hmm, this all seems like a pretty big semantic change. IIUC, right
now, a SQL function can only contain one statement, but it seems like
with this patch you can have a block in there with a bunch of
statements, sorta like plpgsql.
From our docs:
CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$
UPDATE bank
SET balance = balance - debit
WHERE accountno = tf1.accountno;
SELECT 1;
$$ LANGUAGE SQL;
https://www.postgresql.org/docs/current/xfunc-sql.html
Haven't looked at the patch, tho if it adds support for something the
SQL standard defines, that generally seems like a positive to me.
Thanks,
Stephen
út 30. 6. 2020 v 19:58 odesílatel Robert Haas <robertmhaas@gmail.com>
napsal:
On Tue, Jun 30, 2020 at 1:49 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.With what other implementations is it compatible?
The function body is parsed at function definition time and stored as
expression nodes in probin. So at run time, no further parsing is
required.However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.Dependencies between the function and the objects it uses are fully
tracked.A new RETURN statement is introduced. This can only be used inside
function bodies. Internally, it is treated much like a SELECT
statement.psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.Also, per SQL standard, LANGUAGE SQL is the default, so it does not
need to be specified anymore.Hmm, this all seems like a pretty big semantic change. IIUC, right
now, a SQL function can only contain one statement, but it seems like
with this patch you can have a block in there with a bunch of
statements, sorta like plpgsql. But probably you don't have all of the
functionality of plpgsql available. Also, the fact that you're doing
parsing earlier means that e.g. creating a table and inserting into it
won't work. Maybe that's fine. But it almost seems like you are
inventing a whole new PL....
It is SQL/PSM and can be nice to have it.
I am a little bit afraid about performance - SQL functions doesn't use plan
cache and simple expressions. Without inlining it can be too slow.
Regards
Pavel
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Jun 30, 2020 at 1:49 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.
With what other implementations is it compatible?
Yeah ... I'm sort of wondering exactly what this really accomplishes.
I think "portability" is a red herring unfortunately.
Tracking the dependencies of the function body sounds nice at first
glance, so it might be a feature. But given our experiences with having
to use check_function_bodies = off to not have impossible dependency loops
in dump/restore, I rather wonder whether it'll be a net loss in practice.
IIUC, this implementation is flat out incapable of doing the equivalent of
check_function_bodies = off, and that sounds like trouble.
Hmm, this all seems like a pretty big semantic change. IIUC, right
now, a SQL function can only contain one statement,
Not true, you can have more. However, it's nonetheless an enormous
semantic change, if only because the CREATE FUNCTION-time search_path
is now relevant instead of the execution-time path. That *will*
break use-cases I've heard of, where the same function is applied
to different tables by adjusting the path. It'd certainly be useful
from some perspectives (eg better security), but it's ... different.
Replicating the creation-time search path will be a big headache for
pg_dump, I bet.
But it almost seems like you are
inventing a whole new PL....
Yes. Having this replace the existing SQL PL would be a disaster,
because there are use-cases this simply can't meet (even assuming
that we can fix the polymorphism problem, which seems a bit unlikely).
We'd need to treat it as a new PL type.
Perhaps this is useful enough to justify all the work involved,
but I'm not sure.
regards, tom lane
I wrote:
Replicating the creation-time search path will be a big headache for
pg_dump, I bet.
On further thought, we probably don't have to. Re-parsing the function
body the same way is exactly the same problem as re-parsing a view or
matview body the same way. I don't want to claim that that's a 100%
solved problem, but I've heard few complaints in that area lately.
The point remains that exposing the function body's dependencies will
constrain restore order far more than we are accustomed to see. It
might be possible to build examples that flat out can't be restored,
even granting that we teach pg_dump how to break dependency loops
by first creating the function with empty body and later redefining
it with the real body. (Admittedly, if that's possible then you
likely could make it happen with views too. But somehow it seems
more likely that people would create spaghetti dependencies for
functions than views.)
regards, tom lane
Hi,
On 2020-06-30 19:49:04 +0200, Peter Eisentraut wrote:
The function body is parsed at function definition time and stored as
expression nodes in probin. So at run time, no further parsing is
required.
As raw parse tree or as a parse-analysed tree? I assume the latter?
Isn't a consequence of that that we'd get a lot more errors if any DDL
is done to tables involved in the query? In contrast to other languages
we'd not be able to handle column type changes etc, right?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2020-06-30 19:49:04 +0200, Peter Eisentraut wrote:
The function body is parsed at function definition time and stored as
expression nodes in probin. So at run time, no further parsing is
required.
Isn't a consequence of that that we'd get a lot more errors if any DDL
is done to tables involved in the query? In contrast to other languages
we'd not be able to handle column type changes etc, right?
I suppose it'd act like column references in a view, ie the dependency
mechanisms would forbid you from changing/dropping any column mentioned
in one of these functions.
regards, tom lane
On Tue, Jun 30, 2020 at 2:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
On further thought, we probably don't have to. Re-parsing the function
body the same way is exactly the same problem as re-parsing a view or
matview body the same way. I don't want to claim that that's a 100%
solved problem, but I've heard few complaints in that area lately.The point remains that exposing the function body's dependencies will
constrain restore order far more than we are accustomed to see. It
might be possible to build examples that flat out can't be restored,
even granting that we teach pg_dump how to break dependency loops
by first creating the function with empty body and later redefining
it with the real body. (Admittedly, if that's possible then you
likely could make it happen with views too. But somehow it seems
more likely that people would create spaghetti dependencies for
functions than views.)
In my experience, there's certainly demand for some kind of mode where
plpgsql functions get checked at function definition time, rather than
at execution time. The model we have is advantageous not only because
it simplifies dump and reload, but also because it handles cases where
the table is created on the fly properly. However, it also means that
you can have silly mistakes in your function definitions that you
don't find out about until runtime, and in my experience, people don't
like that behavior much at all. So I don't think that it's a bad idea
on principle, or anything like that, but the details seem like they
need a lot of thought. The dump and restore issues need to be
considered, but also, what about things like IF and WHILE? People are
going to want those constructs with these new semantics, too.
I actually don't have a very clear idea of what the standard has to
say about SQL-language functions. Does it just say it's a list of
statements, or does it involve variables and control-flow constructs
and stuff like that, too? If we go that direction with this, then
we're actually going to end up with two different implementations of
what's now plpgsql, or something. But if we don't, then I'm not sure
how far this takes us. I'm not saying it's bad, but the comment "I
love the early binding but where's my IF statement" seems like an
inevitable one.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
st 1. 7. 2020 v 15:37 odesílatel Robert Haas <robertmhaas@gmail.com> napsal:
On Tue, Jun 30, 2020 at 2:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
On further thought, we probably don't have to. Re-parsing the function
body the same way is exactly the same problem as re-parsing a view or
matview body the same way. I don't want to claim that that's a 100%
solved problem, but I've heard few complaints in that area lately.The point remains that exposing the function body's dependencies will
constrain restore order far more than we are accustomed to see. It
might be possible to build examples that flat out can't be restored,
even granting that we teach pg_dump how to break dependency loops
by first creating the function with empty body and later redefining
it with the real body. (Admittedly, if that's possible then you
likely could make it happen with views too. But somehow it seems
more likely that people would create spaghetti dependencies for
functions than views.)In my experience, there's certainly demand for some kind of mode where
plpgsql functions get checked at function definition time, rather than
at execution time. The model we have is advantageous not only because
it simplifies dump and reload, but also because it handles cases where
the table is created on the fly properly. However, it also means that
you can have silly mistakes in your function definitions that you
don't find out about until runtime, and in my experience, people don't
like that behavior much at all. So I don't think that it's a bad idea
on principle, or anything like that, but the details seem like they
need a lot of thought. The dump and restore issues need to be
considered, but also, what about things like IF and WHILE? People are
going to want those constructs with these new semantics, too.
plpgsql_check can be integrated to upstream.
https://github.com/okbob/plpgsql_check
I actually don't have a very clear idea of what the standard has to
say about SQL-language functions. Does it just say it's a list of
statements, or does it involve variables and control-flow constructs
and stuff like that, too? If we go that direction with this, then
we're actually going to end up with two different implementations of
what's now plpgsql, or something. But if we don't, then I'm not sure
how far this takes us. I'm not saying it's bad, but the comment "I
love the early binding but where's my IF statement" seems like an
inevitable one.
The standard SQL/PSM is a full functionality language with variables,
conditional statements, exception handlings, ..
https://postgres.cz/wiki/SQL/PSM_Manual
Unfortunately a basic implementation integrated into the main SQL parser
can be pretty hard work. First issue can be SET statement implementation.
Regards
Pavel
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
In my experience, there's certainly demand for some kind of mode where
plpgsql functions get checked at function definition time, rather than
at execution time.
Yeah, absolutely agreed. But I'm afraid this proposal takes us too
far in the other direction: with this, you *must* have a 100% parseable
and semantically valid function body, every time all the time.
So far as plpgsql is concerned, I could see extending the validator
to run parse analysis (not just raw parsing) on all SQL statements in
the body. This wouldn't happen of course with check_function_bodies off,
so it wouldn't affect dump/reload. But likely there would still be
demand for more fine-grained control over it ... or maybe it could
stop doing analysis as soon as it finds a DDL command?
regards, tom lane
st 1. 7. 2020 v 16:14 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Robert Haas <robertmhaas@gmail.com> writes:
In my experience, there's certainly demand for some kind of mode where
plpgsql functions get checked at function definition time, rather than
at execution time.Yeah, absolutely agreed. But I'm afraid this proposal takes us too
far in the other direction: with this, you *must* have a 100% parseable
and semantically valid function body, every time all the time.So far as plpgsql is concerned, I could see extending the validator
to run parse analysis (not just raw parsing) on all SQL statements in
the body. This wouldn't happen of course with check_function_bodies off,
so it wouldn't affect dump/reload. But likely there would still be
demand for more fine-grained control over it ... or maybe it could
stop doing analysis as soon as it finds a DDL command?
This simple analysis stops on first record type usage. PLpgSQL allows some
dynamic work that increases the complexity of static analysis.
Regards
Pavel
Show quoted text
regards, tom lane
On Wed, Jul 1, 2020 at 10:14:10AM -0400, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
In my experience, there's certainly demand for some kind of mode where
plpgsql functions get checked at function definition time, rather than
at execution time.Yeah, absolutely agreed. But I'm afraid this proposal takes us too
far in the other direction: with this, you *must* have a 100% parseable
and semantically valid function body, every time all the time.So far as plpgsql is concerned, I could see extending the validator
to run parse analysis (not just raw parsing) on all SQL statements in
the body. This wouldn't happen of course with check_function_bodies off,
so it wouldn't affect dump/reload. But likely there would still be
demand for more fine-grained control over it ... or maybe it could
stop doing analysis as soon as it finds a DDL command?
Is the SQL-standard function body verified as preventing function
inlining? That seems to be a major downside.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
Bruce Momjian <bruce@momjian.us> writes:
Is the SQL-standard function body verified as preventing function
inlining? That seems to be a major downside.
I see no reason why that would make any difference. There might
be more code to be written than is in the patch, but in principle
inlining should not care whether the function is pre-parsed or not.
regards, tom lane
On 7/1/20 3:36 PM, Robert Haas wrote:
I actually don't have a very clear idea of what the standard has to
say about SQL-language functions. Does it just say it's a list of
statements, or does it involve variables and control-flow constructs
and stuff like that, too?
It's either a single sql statement, or a collection of them between
"begin atomic" and "end". There are no variables or flow control
constructs or anything like that, just as there are no such things
outside of a function.
(There are a few statements that are not allowed, such as COMMIT.)
--
Vik Fearing
st 1. 7. 2020 v 20:19 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:
On 7/1/20 3:36 PM, Robert Haas wrote:
I actually don't have a very clear idea of what the standard has to
say about SQL-language functions. Does it just say it's a list of
statements, or does it involve variables and control-flow constructs
and stuff like that, too?It's either a single sql statement, or a collection of them between
"begin atomic" and "end". There are no variables or flow control
constructs or anything like that, just as there are no such things
outside of a function.
What is the source of this comment? Maybe we are speaking (and thinking)
about different languages.
I thought the language of SQL functions (ANSI/SQL) is SQL/PSM.
Regards
Pavel
Show quoted text
(There are a few statements that are not allowed, such as COMMIT.)
--
Vik Fearing
On 7/1/20 9:32 PM, Pavel Stehule wrote:
st 1. 7. 2020 v 20:19 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 7/1/20 3:36 PM, Robert Haas wrote:
I actually don't have a very clear idea of what the standard has to
say about SQL-language functions. Does it just say it's a list of
statements, or does it involve variables and control-flow constructs
and stuff like that, too?It's either a single sql statement, or a collection of them between
"begin atomic" and "end". There are no variables or flow control
constructs or anything like that, just as there are no such things
outside of a function.What is the source of this comment?
The SQL Standard.
Maybe we are speaking (and thinking)
about different languages.
I think so, yes.
I thought the language of SQL functions (ANSI/SQL) is SQL/PSM.
That is something else entirely, and not at all what Peter's patch is about.
--
Vik Fearing
st 1. 7. 2020 v 22:31 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:
On 7/1/20 9:32 PM, Pavel Stehule wrote:
st 1. 7. 2020 v 20:19 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 7/1/20 3:36 PM, Robert Haas wrote:
I actually don't have a very clear idea of what the standard has to
say about SQL-language functions. Does it just say it's a list of
statements, or does it involve variables and control-flow constructs
and stuff like that, too?It's either a single sql statement, or a collection of them between
"begin atomic" and "end". There are no variables or flow control
constructs or anything like that, just as there are no such things
outside of a function.What is the source of this comment?
The SQL Standard.
The SQL Standard is really big, and is very possible so I miss this part.
Can you send me a link?
Regards
Pavel
Maybe we are speaking (and thinking)
about different languages.I think so, yes.
I thought the language of SQL functions (ANSI/SQL) is SQL/PSM.
That is something else entirely, and not at all what Peter's patch is
about.
--
Show quoted text
Vik Fearing
On 7/1/20 10:34 PM, Pavel Stehule wrote:
st 1. 7. 2020 v 22:31 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 7/1/20 9:32 PM, Pavel Stehule wrote:
st 1. 7. 2020 v 20:19 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 7/1/20 3:36 PM, Robert Haas wrote:
I actually don't have a very clear idea of what the standard has to
say about SQL-language functions. Does it just say it's a list of
statements, or does it involve variables and control-flow constructs
and stuff like that, too?It's either a single sql statement, or a collection of them between
"begin atomic" and "end". There are no variables or flow control
constructs or anything like that, just as there are no such things
outside of a function.What is the source of this comment?
The SQL Standard.
The SQL Standard is really big, and is very possible so I miss this part.
Can you send me a link?
ISO/IEC 9075-2:2016 Section 11.60 <SQL-invoked routine>
--
Vik Fearing
st 1. 7. 2020 v 22:54 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:
On 7/1/20 10:34 PM, Pavel Stehule wrote:
st 1. 7. 2020 v 22:31 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 7/1/20 9:32 PM, Pavel Stehule wrote:
st 1. 7. 2020 v 20:19 odesílatel Vik Fearing <vik@postgresfriends.org>
napsal:On 7/1/20 3:36 PM, Robert Haas wrote:
I actually don't have a very clear idea of what the standard has to
say about SQL-language functions. Does it just say it's a list of
statements, or does it involve variables and control-flow constructs
and stuff like that, too?It's either a single sql statement, or a collection of them between
"begin atomic" and "end". There are no variables or flow control
constructs or anything like that, just as there are no such things
outside of a function.What is the source of this comment?
The SQL Standard.
The SQL Standard is really big, and is very possible so I miss this part.
Can you send me a link?ISO/IEC 9075-2:2016 Section 11.60 <SQL-invoked routine>
Thank you
Pavel
--
Show quoted text
Vik Fearing