CREATE ROUTINE MAPPING
A few months ago, I was researching ways for formalizing calling functions
on one postgres instance from another. RPC, basically. In doing so, I
stumbled across an obscure part of the the SQL Standard called ROUTINE
MAPPING, which is exactly what I'm looking for.
The syntax specified is, roughly:
CREATE ROUTINE MAPPING local_routine_name FOR remote_routine_spec
SERVER my_server [ OPTIONS( ... ) ]
Which isn't too different from CREATE USER MAPPING.
The idea here is that if I had a local query:
SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;
that would become this query on the remote side:
SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;
That was probably the main intention of this feature, but I see a different
possibility there. Consider the cases:
SELECT remote_func(1,'a');
and
SELECT * FROM remote_srf(10, true);
Now we could have written remote_func() and remote_srf() in plpythonu, and
it could access whatever remote data that we wanted to see, but that
exposes our local server to the untrusted pl/python module as well as
python process overhead.
We could create a specialized foreign data wrapper that requires a WHERE
clause to include all the require parameters as predicates, essentially
making every function a table, but that's awkward and unclear to an end
user.
Having the ability to import functions from other servers allows us to
write foreign servers that expose functions to the local database, and
those foreign servers handle the bloat and risks associated with accessing
that remote data.
Moreover, it would allow hosted environments (AWS, etc) that restrict the
extensions that can be added to the database to still connect to those
foreign data sources.
I'm hoping to submit a patch for this someday, but it touches on several
areas of the codebase where I have no familiarity, so I've put forth to
spark interest in the feature, to see if any similar work is underway, or
if anyone can offer guidance.
Thanks in advance.
On Thu, Jan 11, 2018 at 09:37:43PM -0500, Corey Huinker wrote:
A few months ago, I was researching ways for formalizing calling functions
on one postgres instance from another. RPC, basically. In doing so, I
stumbled across an obscure part of the the SQL Standard called ROUTINE
MAPPING, which is exactly what I'm looking for.The syntax specified is, roughly:
CREATE ROUTINE MAPPING local_routine_name FOR remote_routine_spec
SERVER my_server [ OPTIONS( ... ) ]
[neat use cases elided]
For what it's worth, the now-defunct DBI-Link I wrote had
remote_execute(), which did many of the things you describe here, only
with no help from the rest of PostgreSQL, as it was implemented
strictly in userland.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, Jan 12, 2018 at 8:07 AM, Corey Huinker <corey.huinker@gmail.com> wrote:
A few months ago, I was researching ways for formalizing calling functions
on one postgres instance from another. RPC, basically. In doing so, I
stumbled across an obscure part of the the SQL Standard called ROUTINE
MAPPING, which is exactly what I'm looking for.The syntax specified is, roughly:
CREATE ROUTINE MAPPING local_routine_name FOR remote_routine_spec
SERVER my_server [ OPTIONS( ... ) ]Which isn't too different from CREATE USER MAPPING.
The idea here is that if I had a local query:
SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;
that would become this query on the remote side:
SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;
I think this is a desired feature. Being able to call a function on
remote server through local server is often useful.
PostgreSQL allows function overloading, which means that there can be
multiple functions with same name differing in argument types. So, the
syntax has to include the input parameters or their types at least.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
2018-01-12 10:02 GMT+01:00 Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>:
On Fri, Jan 12, 2018 at 8:07 AM, Corey Huinker <corey.huinker@gmail.com>
wrote:A few months ago, I was researching ways for formalizing calling
functions
on one postgres instance from another. RPC, basically. In doing so, I
stumbled across an obscure part of the the SQL Standard called ROUTINE
MAPPING, which is exactly what I'm looking for.The syntax specified is, roughly:
CREATE ROUTINE MAPPING local_routine_name FOR remote_routine_spec
SERVER my_server [ OPTIONS( ... ) ]Which isn't too different from CREATE USER MAPPING.
The idea here is that if I had a local query:
SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;
that would become this query on the remote side:
SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;
I think this is a desired feature. Being able to call a function on
remote server through local server is often useful.PostgreSQL allows function overloading, which means that there can be
multiple functions with same name differing in argument types. So, the
syntax has to include the input parameters or their types at least.
+1
Pavel
Show quoted text
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
PostgreSQL allows function overloading, which means that there can be
multiple functions with same name differing in argument types. So, the
syntax has to include the input parameters or their types at least.
"local_routine_name" and "remote_routine_spec" were my own paraphrasings of
what the spec implies. I'm nearly certain that the local routine name,
which the spec says is just an identifier, cannot have a parameter spec on
it, which leaves only one other place to define it, remote_routine_spec,
which wasn't defined at all. I _suppose_ parameter definitions could be
pushed into options, but that'd be ugly.
On Fri, Jan 12, 2018 at 11:11:26AM -0500, Corey Huinker wrote:
PostgreSQL allows function overloading, which means that there can
be multiple functions with same name differing in argument types.
So, the syntax has to include the input parameters or their types
at least."local_routine_name" and "remote_routine_spec" were my own
paraphrasings of what the spec implies. I'm nearly certain that the
local routine name, which the spec says is just an identifier,
cannot have a parameter spec on it, which leaves only one other
place to define it, remote_routine_spec, which wasn't defined at
all. I _suppose_ parameter definitions could be pushed into options,
but that'd be ugly.
In my draft of SQL:2011, which I don't think has substantive changes
to what's either in the official SQL:2011 or SQL:2016, it says:
<routine mapping definition> ::=
CREATE ROUTINE MAPPING <routine mapping name> FOR <specific routine designator>
SERVER <foreign server name> [ <generic options> ]
Syntax Rules
1) Let FSN be the <foreign server name>. Let RMN be the <routine mapping name>.
2) The catalog identified by the explicit or implicit catalog name of FSN shall include a foreign server
descriptor whose foreign server name is equivalent to FSN.
3) The SQL-environment shall not include a routine mapping descriptor whose routine mapping name is
RMN.
4) Let R be the SQL-invoked routine identified by the <specific
routine designator>. R shall identify an SQL-invoked regular function.
It goes on from there, but I think there's a reasonable interpretation
of this which allows us to use the same syntax as CREATE
(FUNCTION|PROCEDURE), apart from the body, e.g.:
CREATE ROUTINE MAPPING local_routine_name
FOR (FUNCTION | PROCEDURE) remote_routine_name ( [ [ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
[ RETURNS rettype
| RETURNS TABLE ( column_name column_type [, ...] ) ]
SERVER foreign_server_name
[ (option [, ...]) ]
Does that seem like too broad an interpretation?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
It goes on from there, but I think there's a reasonable interpretation
of this which allows us to use the same syntax as CREATE
(FUNCTION|PROCEDURE), apart from the body, e.g.:CREATE ROUTINE MAPPING local_routine_name
FOR (FUNCTION | PROCEDURE) remote_routine_name ( [ [ argmode ] [ argname ]
argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
[ RETURNS rettype
| RETURNS TABLE ( column_name column_type [, ...] ) ]
SERVER foreign_server_name
[ (option [, ...]) ]Does that seem like too broad an interpretation?
That's really interesting. I didn't think to look in the definition of
CREATE FUNCTION to see if a SERVER option popped in there, but seems like a
more accessible way to introduce the notion of remote functions, because I
talked to a few developers about this before posting to the list, and only
one had ever heard of ROUTINE MAPPING and had no clear recollection of it.
An option on CREATE FUNCTION is going to get noticed (and used!) a lot
sooner.
Having said that, I think syntactically we have to implement CREATE ROUTINE
MAPPING, even if it is just translated to a CREATE FUNCTION call.
In either case, I suspected that pg_proc would need a nullable srvid column
pointing to pg_foreign_server, and possibly a new row in pg_language for
'external'. I had entertained having a pg_routine_mappings table like
pg_user_mappings, and we still could, if the proc's language of 'external'
clued the planner to look for the mapping. I can see arguments for either
approach.
Before anyone asks, I looked for, and did not find, any suggestion of
IMPORT FOREIGN ROUTINE a la IMPORT FOREIGN SCHEMA, so as of yet there
wouldn't be any way to grab all the functions that .a foreign server is
offering up.
On Fri, Jan 12, 2018 at 02:29:53PM -0500, Corey Huinker wrote:
It goes on from there, but I think there's a reasonable interpretation
of this which allows us to use the same syntax as CREATE
(FUNCTION|PROCEDURE), apart from the body, e.g.:CREATE ROUTINE MAPPING local_routine_name
FOR (FUNCTION | PROCEDURE) remote_routine_name ( [ [ argmode ] [ argname ]
argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
[ RETURNS rettype
| RETURNS TABLE ( column_name column_type [, ...] ) ]
SERVER foreign_server_name
[ (option [, ...]) ]Does that seem like too broad an interpretation?
That's really interesting. I didn't think to look in the definition of
CREATE FUNCTION to see if a SERVER option popped in there, but seems like a
more accessible way to introduce the notion of remote functions,
It does indeed. Adding the functionality to CREATE
(FUNCTION|PROCEDURE) seems like a *much* better idea than trying to
wedge it into the CREATE ROUTINE MAPPING syntax.
because I talked to a few developers about this before posting to
the list, and only one had ever heard of ROUTINE MAPPING and had no
clear recollection of it. An option on CREATE FUNCTION is going to
get noticed (and used!) a lot sooner.
+1
Having said that, I think syntactically we have to implement CREATE ROUTINE
MAPPING, even if it is just translated to a CREATE FUNCTION call.In either case, I suspected that pg_proc would need a nullable srvid column
pointing to pg_foreign_server, and possibly a new row in pg_language for
'external'.
Makes a lot of sense.
I had entertained having a pg_routine_mappings table like
pg_user_mappings, and we still could, if the proc's language of
'external' clued the planner to look for the mapping. I can see
arguments for either approach.
It would be good to have them in the catalog somehow if we make CREATE
ROUTINE MAPPING a DDL. If I've read the standard correctly, there are
parts of information_schema which come into play for those routine
mappings.
Before anyone asks, I looked for, and did not find, any suggestion of
IMPORT FOREIGN ROUTINE a la IMPORT FOREIGN SCHEMA, so as of yet there
wouldn't be any way to grab all the functions that .a foreign server is
offering up.
How about making an option to IMPORT FOREIGN SCHEMA do it?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
CREATE ROUTINE MAPPING local_routine_name
FOR (FUNCTION | PROCEDURE) remote_routine_name ( [ [ argmode ] [
argname ]
argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
[ RETURNS rettype
| RETURNS TABLE ( column_name column_type [, ...] ) ]
SERVER foreign_server_name
[ (option [, ...]) ]Does that seem like too broad an interpretation?
I had entertained having a pg_routine_mappings table like
pg_user_mappings, and we still could, if the proc's language of
'external' clued the planner to look for the mapping. I can see
arguments for either approach.It would be good to have them in the catalog somehow if we make CREATE
ROUTINE MAPPING a DDL. If I've read the standard correctly, there are
parts of information_schema which come into play for those routine
mappings.Before anyone asks, I looked for, and did not find, any suggestion of
IMPORT FOREIGN ROUTINE a la IMPORT FOREIGN SCHEMA, so as of yet there
wouldn't be any way to grab all the functions that .a foreign server is
offering up.How about making an option to IMPORT FOREIGN SCHEMA do it?
Ok, so the steps seem to be:
1. settle on syntax.
2. determine data dictionary structures
3. parse and create those structures
4. "handle" external functions locally
5. provide structures passed to FDW handlers so that they can handle
external functions
6. implement those handlers in postgres_fdw
#1 is largely prescribed for us, though I'm curious as to how the CRM
statements I've made up in examples above would look like as CREATE
FUNCTION ... SERVER ...
#2 deserves a lot of debate, but probably mostly hinges on the new
"language" and how to associate a pg_proc entry with a pg_foreign_server
#3 i'm guessing this is a lot of borrowing code from CREATE ROUTINE MAPPING
but is otherwise pretty straightforward.
#4 an external function obviously cannot be executed locally, doing so
means that the planner failed to push it down, so this is probably
stub-error functions
#5 These functions would essentially be passed in the same as foreign
columns with the "name" as "f(a,b,4)", and the burden of forming the remote
query is on the FDW
Which gets tricky. What should happen in simple situations is obvious:
SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;
that would become this query on the remote side:
SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;
And it's still simple when local functions consume remote input
SELECT local_func1(remote_func1(r.x)) FROM remote_table r WHERE r.active =
true;
But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
In those cases, at least initially, I think the FDW handler is right to
raise an error, because the function inputs are unknowable at query time,
and the inputs cannot also be pushed down to the remote server. That might
not be common, but I can see situations like this:
SELECT r.*
FROM remote_srf( ( SELECT remote_code_value FROM
local_table_of_remote_codes WHERE local_code_value = 'xyz' ) ) r;
and we would want things like that to work. Currently is similar
table-situations the FDW has no choice but to fetch the entire table and
filter locally. That's good for tables, whose contents are knowable, but
the set of possible function inputs is unreasonably large. The current
workaround in table-land is to run the inner query locally, and present the
result at a constant to a follow-up query, so maybe that's what we have to
do here, at least initially.
#6 is where the FDW either does the translation or rejects the notion that
functions can be pushed down, either outright or based on the usage of the
function in the query.
I'm doing this thinking on the mailing list in the hopes that it evokes
suggestions, warnings, suggested code samples, and of course, help.
On Wed, Jan 17, 2018 at 11:09:19AM -0500, Corey Huinker wrote:
CREATE ROUTINE MAPPING local_routine_name
FOR (FUNCTION | PROCEDURE) remote_routine_name ( [ [ argmode ] [
argname ]
argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
[ RETURNS rettype
| RETURNS TABLE ( column_name column_type [, ...] ) ]
SERVER foreign_server_name
[ (option [, ...]) ]Does that seem like too broad an interpretation?
I had entertained having a pg_routine_mappings table like
pg_user_mappings, and we still could, if the proc's language of
'external' clued the planner to look for the mapping. I can see
arguments for either approach.It would be good to have them in the catalog somehow if we make CREATE
ROUTINE MAPPING a DDL. If I've read the standard correctly, there are
parts of information_schema which come into play for those routine
mappings.Before anyone asks, I looked for, and did not find, any suggestion of
IMPORT FOREIGN ROUTINE a la IMPORT FOREIGN SCHEMA, so as of yet there
wouldn't be any way to grab all the functions that .a foreign server is
offering up.How about making an option to IMPORT FOREIGN SCHEMA do it?
Ok, so the steps seem to be:
1. settle on syntax.
2. determine data dictionary structures
3. parse and create those structures
4. "handle" external functions locally
5. provide structures passed to FDW handlers so that they can handle
external functions
6. implement those handlers in postgres_fdw#1 is largely prescribed for us, though I'm curious as to how the CRM
statements I've made up in examples above would look like as CREATE
FUNCTION ... SERVER ...#2 deserves a lot of debate, but probably mostly hinges on the new
"language" and how to associate a pg_proc entry with a pg_foreign_server#3 i'm guessing this is a lot of borrowing code from CREATE ROUTINE MAPPING
but is otherwise pretty straightforward.#4 an external function obviously cannot be executed locally, doing so
means that the planner failed to push it down, so this is probably
stub-error functions#5 These functions would essentially be passed in the same as foreign
columns with the "name" as "f(a,b,4)", and the burden of forming the remote
query is on the FDWWhich gets tricky. What should happen in simple situations is obvious:
SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;
that would become this query on the remote side:
SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;
And it's still simple when local functions consume remote input
SELECT local_func1(remote_func1(r.x)) FROM remote_table r WHERE r.active =
true;But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
Do we have any way, or any plan to make a way, to push the set (SELECT
x FROM local_table WHERE active = true) to the remote side for
execution there? Obviously, there are foreign DBs that couldn't
support this, but I'm guessing they wouldn't have much by way of UDFs
either.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
Do we have any way, or any plan to make a way, to push the set (SELECT
x FROM local_table WHERE active = true) to the remote side for
execution there? Obviously, there are foreign DBs that couldn't
support this, but I'm guessing they wouldn't have much by way of UDFs
either.
No. The remote query has to be generated at planning time, so it can't make
predicates out of anything that can't be resolved into constants by the
planner itself. The complexities of doing so would be excessive, far better
to let the application developer split the queries up because they know
better which parts have to resolve first.
On Thu, Jan 18, 2018 at 04:09:13PM -0500, Corey Huinker wrote:
But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
Do we have any way, or any plan to make a way, to push the set (SELECT
x FROM local_table WHERE active = true) to the remote side for
execution there? Obviously, there are foreign DBs that couldn't
support this, but I'm guessing they wouldn't have much by way of UDFs
either.No. The remote query has to be generated at planning time, so it can't make
predicates out of anything that can't be resolved into constants by the
planner itself. The complexities of doing so would be excessive, far better
to let the application developer split the queries up because they know
better which parts have to resolve first.
So Corey and I, with lots of inputs from Andrew Gierth and Matheus
Oliveira, have come up with a sketch of how to do this, to wit:
- Extend CREATE FUNCTION to take either FOREIGN and SERVER or AS and
LANGUAGE as parameters, but not both. This seems simpler, at least
in a proof of concept, than creating SQL standard compliant grammar
out of whole cloth. The SQL standard grammar could be layered in
later via the rewriter if this turns out to work.
- In pg_proc, store foreign functions as having a new language,
sql_med, which doesn't actually exist. This "language" would
function as a hint to the planner.
- Add a new system catalog for foreign functions that references
pg_proc and pg_foreign_server. Writing to it would also do the usual
stuff with pg_depend.
- During planning, at least to start, we'd ensure that foreign
functions can only take arguments on the same server.
- Once it's established that the combinations could actually work,
execution gets pushed to the foreign server(s)
What say?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, Jan 25, 2018 at 10:43 AM, David Fetter <david@fetter.org> wrote:
On Thu, Jan 18, 2018 at 04:09:13PM -0500, Corey Huinker wrote:
But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
Do we have any way, or any plan to make a way, to push the set (SELECT
x FROM local_table WHERE active = true) to the remote side for
execution there? Obviously, there are foreign DBs that couldn't
support this, but I'm guessing they wouldn't have much by way of UDFs
either.No. The remote query has to be generated at planning time, so it can't make
predicates out of anything that can't be resolved into constants by the
planner itself. The complexities of doing so would be excessive, far better
to let the application developer split the queries up because they know
better which parts have to resolve first.So Corey and I, with lots of inputs from Andrew Gierth and Matheus
Oliveira, have come up with a sketch of how to do this, to wit:- Extend CREATE FUNCTION to take either FOREIGN and SERVER or AS and
LANGUAGE as parameters, but not both. This seems simpler, at least
in a proof of concept, than creating SQL standard compliant grammar
out of whole cloth. The SQL standard grammar could be layered in
later via the rewriter if this turns out to work.- In pg_proc, store foreign functions as having a new language,
sql_med, which doesn't actually exist. This "language" would
function as a hint to the planner.- Add a new system catalog for foreign functions that references
pg_proc and pg_foreign_server. Writing to it would also do the usual
stuff with pg_depend.- During planning, at least to start, we'd ensure that foreign
functions can only take arguments on the same server.
May be I am going in details, not expected at this stage. Right now
FDWs have a notion of shippability - i.e. certain expressions can be
evaluated on the remote server. Shippable expressions are pushed down
to the foreign server, but that's optional. Unshippable expressions
however can not be pushed down to the foreign server. With this
change, we will have a new notion of shippability where a
function/expression must be shipped to the foreign server. As long as
these strict-shippable expressions are part of shippable expressions,
things work as they are today, but as an earlier mail by Corey shows,
if those are expressions are not part of shippable expressions, they
need to be evaluated on foreign server apart from the query that gets
pushed down. You seem to be suggesting that we do not implement it
right now, which is fine. But whatever design we chose should be
extensible to do that.
A possible way to implement this may be to implement sql-med language
handler which takes the responsibility to interact with FDW and
evaluate the function. That way we can use existing function
evaluation infrastructure.
- Once it's established that the combinations could actually work,
execution gets pushed to the foreign server(s)
Overall this structure looks ok to me.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Thu, Jan 25, 2018 at 2:13 PM, David Fetter <david@fetter.org> wrote:
On Thu, Jan 18, 2018 at 04:09:13PM -0500, Corey Huinker wrote:
But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
Do we have any way, or any plan to make a way, to push the set (SELECT
x FROM local_table WHERE active = true) to the remote side for
execution there? Obviously, there are foreign DBs that couldn't
support this, but I'm guessing they wouldn't have much by way of UDFs
either.No. The remote query has to be generated at planning time, so it can't make
predicates out of anything that can't be resolved into constants by the
planner itself. The complexities of doing so would be excessive, far better
to let the application developer split the queries up because they know
better which parts have to resolve first.So Corey and I, with lots of inputs from Andrew Gierth and Matheus
Oliveira, have come up with a sketch of how to do this, to wit:- Extend CREATE FUNCTION to take either FOREIGN and SERVER or AS and
LANGUAGE as parameters, but not both. This seems simpler, at least
in a proof of concept, than creating SQL standard compliant grammar
out of whole cloth. The SQL standard grammar could be layered in
later via the rewriter if this turns out to work.
I'm also interested in this feature. While studying this feature, I
understood that this feature just pair a local function with a remote
function, not means that creates a kind of virtual function that can
be invoked on only foreign servers. For example, if we execute the
following SQL the local_func() is invoked in local because the col1
column of local_table is referenced by it.
SELECT * FROM local_table l WHERE local_func(l.col1) = 1;
On the other hand, suppose we have the following routine mapping,
CREATE ROUTINE MAPPING rmap FOR local_func(integer) OPTIONS
(remote_func_schema = 'myschema', remote_func_name = 'remote_func');
and execute the similar SQL for a foreign table. We will get the
following remote SQL.
- Local SQL
SELECT * FROM foreign_table f WHERE local_func(f.col1) = 1;
- Remote SQL
SELECT * FROM foreign_table f WHERE my_schema.remote_func(f.col1) = 1;
In this concept, the CREATE ROUTINE MAPPING doesn't need to specify
the return type of function but must specify the existing function in
the local PostgreSQL. The mapped remote function is expected to have
the same properly(arguments, return type etc) as the local function. I
might be missing something, please give me feedback.
Please find a attached PoC patch of ROUTINE MAPPING feature. This
patch is missing many things such as the doc and the shippability
supports but this patch adds the new system catalog pg_routine_mapping
with three attributes: name, procid, serverid and enables FDWs to
refer this mapping and and to replace the function.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Aug 31, 2018 at 05:18:26PM +0900, Masahiko Sawada wrote:
On Thu, Jan 25, 2018 at 2:13 PM, David Fetter <david@fetter.org> wrote:
On Thu, Jan 18, 2018 at 04:09:13PM -0500, Corey Huinker wrote:
But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
Do we have any way, or any plan to make a way, to push the set (SELECT
x FROM local_table WHERE active = true) to the remote side for
execution there? Obviously, there are foreign DBs that couldn't
support this, but I'm guessing they wouldn't have much by way of UDFs
either.No. The remote query has to be generated at planning time, so it can't make
predicates out of anything that can't be resolved into constants by the
planner itself. The complexities of doing so would be excessive, far better
to let the application developer split the queries up because they know
better which parts have to resolve first.So Corey and I, with lots of inputs from Andrew Gierth and Matheus
Oliveira, have come up with a sketch of how to do this, to wit:- Extend CREATE FUNCTION to take either FOREIGN and SERVER or AS and
LANGUAGE as parameters, but not both. This seems simpler, at least
in a proof of concept, than creating SQL standard compliant grammar
out of whole cloth. The SQL standard grammar could be layered in
later via the rewriter if this turns out to work.I'm also interested in this feature. While studying this feature, I
understood that this feature just pair a local function with a remote
function, not means that creates a kind of virtual function that can
be invoked on only foreign servers. For example, if we execute the
following SQL the local_func() is invoked in local because the col1
column of local_table is referenced by it.SELECT * FROM local_table l WHERE local_func(l.col1) = 1;
On the other hand, suppose we have the following routine mapping,
CREATE ROUTINE MAPPING rmap FOR local_func(integer) OPTIONS
(remote_func_schema = 'myschema', remote_func_name = 'remote_func');and execute the similar SQL for a foreign table. We will get the
following remote SQL.- Local SQL
SELECT * FROM foreign_table f WHERE local_func(f.col1) = 1;- Remote SQL
SELECT * FROM foreign_table f WHERE my_schema.remote_func(f.col1) = 1;In this concept, the CREATE ROUTINE MAPPING doesn't need to specify
the return type of function but must specify the existing function in
the local PostgreSQL. The mapped remote function is expected to have
the same properly(arguments, return type etc) as the local function. I
might be missing something, please give me feedback.Please find a attached PoC patch of ROUTINE MAPPING feature. This
patch is missing many things such as the doc and the shippability
supports but this patch adds the new system catalog pg_routine_mapping
with three attributes: name, procid, serverid and enables FDWs to
refer this mapping and and to replace the function.
Sawada-san,
Thanks very much for sending this. I just tried to compile it, and
got:
make
make -C ./src/backend generated-headers
make[1]: Entering directory '/home/shackle/pggit/postgresql/src/backend'
make -C catalog distprep generated-header-symlinks
make[2]: Entering directory '/home/shackle/pggit/postgresql/src/backend/catalog'
make[2]: *** No rule to make target '../../../src/include/catalog/pg_routine_mapping.h', needed by 'bki-stamp'. Stop.
make[2]: Leaving directory '/home/shackle/pggit/postgresql/src/backend/catalog'
make[1]: *** [Makefile:141: submake-catalog-headers] Error 2
make[1]: Leaving directory '/home/shackle/pggit/postgresql/src/backend'
make: *** [src/Makefile.global:369: submake-generated-headers] Error 2
Was there a file missing from the patch?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Sep 4, 2018 at 5:48 AM, David Fetter <david@fetter.org> wrote:
On Fri, Aug 31, 2018 at 05:18:26PM +0900, Masahiko Sawada wrote:
On Thu, Jan 25, 2018 at 2:13 PM, David Fetter <david@fetter.org> wrote:
On Thu, Jan 18, 2018 at 04:09:13PM -0500, Corey Huinker wrote:
But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
Do we have any way, or any plan to make a way, to push the set (SELECT
x FROM local_table WHERE active = true) to the remote side for
execution there? Obviously, there are foreign DBs that couldn't
support this, but I'm guessing they wouldn't have much by way of UDFs
either.No. The remote query has to be generated at planning time, so it can't make
predicates out of anything that can't be resolved into constants by the
planner itself. The complexities of doing so would be excessive, far better
to let the application developer split the queries up because they know
better which parts have to resolve first.So Corey and I, with lots of inputs from Andrew Gierth and Matheus
Oliveira, have come up with a sketch of how to do this, to wit:- Extend CREATE FUNCTION to take either FOREIGN and SERVER or AS and
LANGUAGE as parameters, but not both. This seems simpler, at least
in a proof of concept, than creating SQL standard compliant grammar
out of whole cloth. The SQL standard grammar could be layered in
later via the rewriter if this turns out to work.I'm also interested in this feature. While studying this feature, I
understood that this feature just pair a local function with a remote
function, not means that creates a kind of virtual function that can
be invoked on only foreign servers. For example, if we execute the
following SQL the local_func() is invoked in local because the col1
column of local_table is referenced by it.SELECT * FROM local_table l WHERE local_func(l.col1) = 1;
On the other hand, suppose we have the following routine mapping,
CREATE ROUTINE MAPPING rmap FOR local_func(integer) OPTIONS
(remote_func_schema = 'myschema', remote_func_name = 'remote_func');and execute the similar SQL for a foreign table. We will get the
following remote SQL.- Local SQL
SELECT * FROM foreign_table f WHERE local_func(f.col1) = 1;- Remote SQL
SELECT * FROM foreign_table f WHERE my_schema.remote_func(f.col1) = 1;In this concept, the CREATE ROUTINE MAPPING doesn't need to specify
the return type of function but must specify the existing function in
the local PostgreSQL. The mapped remote function is expected to have
the same properly(arguments, return type etc) as the local function. I
might be missing something, please give me feedback.Please find a attached PoC patch of ROUTINE MAPPING feature. This
patch is missing many things such as the doc and the shippability
supports but this patch adds the new system catalog pg_routine_mapping
with three attributes: name, procid, serverid and enables FDWs to
refer this mapping and and to replace the function.Sawada-san,
Thanks very much for sending this. I just tried to compile it, and
got:make
make -C ./src/backend generated-headers
make[1]: Entering directory '/home/shackle/pggit/postgresql/src/backend'
make -C catalog distprep generated-header-symlinks
make[2]: Entering directory '/home/shackle/pggit/postgresql/src/backend/catalog'
make[2]: *** No rule to make target '../../../src/include/catalog/pg_routine_mapping.h', needed by 'bki-stamp'. Stop.
make[2]: Leaving directory '/home/shackle/pggit/postgresql/src/backend/catalog'
make[1]: *** [Makefile:141: submake-catalog-headers] Error 2
make[1]: Leaving directory '/home/shackle/pggit/postgresql/src/backend'
make: *** [src/Makefile.global:369: submake-generated-headers] Error 2Was there a file missing from the patch?
Oops, sorry I missed the header file. Attached the PoC patches again.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hello.
At Tue, 4 Sep 2018 09:34:21 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBfTKRFPwboss4xVEoVwUmi0gKBgwsWQijviJP3hScwQ@mail.gmail.com>
On Tue, Sep 4, 2018 at 5:48 AM, David Fetter <david@fetter.org> wrote:
On Fri, Aug 31, 2018 at 05:18:26PM +0900, Masahiko Sawada wrote:
On Thu, Jan 25, 2018 at 2:13 PM, David Fetter <david@fetter.org> wrote:
On Thu, Jan 18, 2018 at 04:09:13PM -0500, Corey Huinker wrote:
But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
Do we have any way, or any plan to make a way, to push the set (SELECT
x FROM local_table WHERE active = true) to the remote side for
execution there? Obviously, there are foreign DBs that couldn't
support this, but I'm guessing they wouldn't have much by way of UDFs
either.No. The remote query has to be generated at planning time, so it can't make
predicates out of anything that can't be resolved into constants by the
planner itself. The complexities of doing so would be excessive, far better
to let the application developer split the queries up because they know
better which parts have to resolve first.So Corey and I, with lots of inputs from Andrew Gierth and Matheus
Oliveira, have come up with a sketch of how to do this, to wit:- Extend CREATE FUNCTION to take either FOREIGN and SERVER or AS and
LANGUAGE as parameters, but not both. This seems simpler, at least
in a proof of concept, than creating SQL standard compliant grammar
out of whole cloth. The SQL standard grammar could be layered in
later via the rewriter if this turns out to work.I'm also interested in this feature. While studying this feature, I
understood that this feature just pair a local function with a remote
function, not means that creates a kind of virtual function that can
be invoked on only foreign servers. For example, if we execute the
following SQL the local_func() is invoked in local because the col1
column of local_table is referenced by it.
Do you mean that ISO/IEC 9075-9:2016 (right?) is defining that
(and we must follow it)? Or does it comes by referring to
something like [1]http://cs.unibo.it/~montesi/CBD/Articoli/SQL-MED.pdf? As far as I see David's mail upthread,
OPTIONS is not precisely defined.
[1]: http://cs.unibo.it/~montesi/CBD/Articoli/SQL-MED.pdf
Unfortunately I don't have access to the document nor concrete
use cases. With a rough idea of "remote mapping", I can guess the
followng four use cases. Each example syntax is just a guess
without any consideration on implementability or other
restrictions. The patch looks currently covering B.
A. Just notify a function can be just pushed down.
ex. SELECT foo(1, 'bar'); Remote: SELECT foo(1, 'bar');
CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem;
(or same as B)
B. Replace function name with the remote equivalent.
ex. SELECT foo(1, 'bar'); Remote: SELECT hoge(1, 'bar');
CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem
OPTIONS (remote_func_name 'hoge'));
C. Adjust function specification with remote.
ex. SELECT foo(1, 'bar'); Remote: SELECT hoge('bar', 1, true);
CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem
OPTIONS (remote_expression 'hoge($2,$1,true)');
D. Replace with an equivalent remote expression.
ex. SELECT foo(1, 'bar'); Remote: SELECT ('bar' || to_char(1 % 10));
CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem
OPTIONS (remote_expression '$2 || to_char($1 % 10)');
I haven't looked the patch in depth, but the core side looks
generic and the FDW side is extensible to A, C and D. I think B
is enough as a starter. I don't mean that we should implement all
of them. They are just possibilities.
I have some comments on the patch.
It doesn't seem working. Am I missing something?
====
create server sv1 foreign data wrapper postgres_fdw options (host '/tmp', port '5432', dbname 'postgres');
create table lt (a int);
create foreign table ft (a int) server sv1 options (table_name 'lt');
create function lhoge(int) returns int as 'begin return $1 * 2; end;' language plpgsql;
create routine mapping rm1 for function lhoge(int) server sv1 options (remote_func_name 'rhoge');
explain verbose select * from ft where a = lhoge(3);
QUERY PLAN
------------------------------------------------------------------
Foreign Scan on public.ft (cost=100.00..936.31 rows=15 width=4)
Output: a
Filter: (ft.a = lhoge(3))
Remote SQL: SELECT a FROM public.lt
(4 rows)
====
Perhaps it cannot push down simple SQL local functions. (I'm not
sure we should do that.)
Can't we specify remote schema in remote_func_name just as
(remote_func_name "fooschema.funcname")?
Can't we provide the syntax without making MAPPING reserved?
Chainging the syntax for ALTER/DROP ROUTINE MAPPING like USER
MAPPING, specifically to ALTER/DROP ROUTINE MAPPING FOR FUNCTION
fname SERVER sname would evade the conflict.
Can't we just push down the function itself with no option defined?
CREATE ROUTINE MAPPING rm1 FOR FUNCTION pg_backend_pid() SERVER sv1;
funcid is not used in objectaddress.c.
(The patch is missing a LF in a comment in lsyscache.c)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi Corey
Have you looked at pl/proxy ?
It does this and then some (sharding)
It actually started out as a set of pl/pythonu functions, but then got
formalized into a full extension language for defining remote (potentially
sharded) function calls
Best Regards
Hannu Krosng
On Fri, 12 Jan 2018 at 03:38, Corey Huinker <corey.huinker@gmail.com> wrote:
Show quoted text
A few months ago, I was researching ways for formalizing calling functions
on one postgres instance from another. RPC, basically. In doing so, I
stumbled across an obscure part of the the SQL Standard called ROUTINE
MAPPING, which is exactly what I'm looking for.The syntax specified is, roughly:
CREATE ROUTINE MAPPING local_routine_name FOR remote_routine_spec
SERVER my_server [ OPTIONS( ... ) ]Which isn't too different from CREATE USER MAPPING.
The idea here is that if I had a local query:
SELECT t.x, remote_func1(), remote_func2(t.y)
FROM remote_table t
WHERE t.active = true;
that would become this query on the remote side:
SELECT t.x, local_func1(), local_func2(t.y)
FROM local_table t
WHERE t.active = true;
That was probably the main intention of this feature, but I see a
different possibility there. Consider the cases:SELECT remote_func(1,'a');
and
SELECT * FROM remote_srf(10, true);
Now we could have written remote_func() and remote_srf() in plpythonu, and
it could access whatever remote data that we wanted to see, but that
exposes our local server to the untrusted pl/python module as well as
python process overhead.We could create a specialized foreign data wrapper that requires a WHERE
clause to include all the require parameters as predicates, essentially
making every function a table, but that's awkward and unclear to an end
user.Having the ability to import functions from other servers allows us to
write foreign servers that expose functions to the local database, and
those foreign servers handle the bloat and risks associated with accessing
that remote data.Moreover, it would allow hosted environments (AWS, etc) that restrict the
extensions that can be added to the database to still connect to those
foreign data sources.I'm hoping to submit a patch for this someday, but it touches on several
areas of the codebase where I have no familiarity, so I've put forth to
spark interest in the feature, to see if any similar work is underway, or
if anyone can offer guidance.Thanks in advance.
On Mon, Sep 10, 2018 at 3:28 AM Hannu Krosing <hannu.krosing@2ndquadrant.com>
wrote:
Hi Corey
Have you looked at pl/proxy ?
I have, a long while ago.
It does this and then some (sharding)
PL/proxy isn't a part of the SQL Standard.
PL/proxy only connects to other libpq-speaking databases.
The hope with routine mapping is that other data sources that do not easily
conform to a rows-and-columns metaphor can still expose their data to
postgresql.
Thank you for the comment.
On Mon, Sep 10, 2018 at 4:16 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello.
At Tue, 4 Sep 2018 09:34:21 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCBfTKRFPwboss4xVEoVwUmi0gKBgwsWQijviJP3hScwQ@mail.gmail.com>
On Tue, Sep 4, 2018 at 5:48 AM, David Fetter <david@fetter.org> wrote:
On Fri, Aug 31, 2018 at 05:18:26PM +0900, Masahiko Sawada wrote:
On Thu, Jan 25, 2018 at 2:13 PM, David Fetter <david@fetter.org> wrote:
On Thu, Jan 18, 2018 at 04:09:13PM -0500, Corey Huinker wrote:
But other situations seem un-handle-able to me:
SELECT remote_func1(l.x) FROM local_table l WHERE l.active = true;
Do we have any way, or any plan to make a way, to push the set (SELECT
x FROM local_table WHERE active = true) to the remote side for
execution there? Obviously, there are foreign DBs that couldn't
support this, but I'm guessing they wouldn't have much by way of UDFs
either.No. The remote query has to be generated at planning time, so it can't make
predicates out of anything that can't be resolved into constants by the
planner itself. The complexities of doing so would be excessive, far better
to let the application developer split the queries up because they know
better which parts have to resolve first.So Corey and I, with lots of inputs from Andrew Gierth and Matheus
Oliveira, have come up with a sketch of how to do this, to wit:- Extend CREATE FUNCTION to take either FOREIGN and SERVER or AS and
LANGUAGE as parameters, but not both. This seems simpler, at least
in a proof of concept, than creating SQL standard compliant grammar
out of whole cloth. The SQL standard grammar could be layered in
later via the rewriter if this turns out to work.I'm also interested in this feature. While studying this feature, I
understood that this feature just pair a local function with a remote
function, not means that creates a kind of virtual function that can
be invoked on only foreign servers. For example, if we execute the
following SQL the local_func() is invoked in local because the col1
column of local_table is referenced by it.Do you mean that ISO/IEC 9075-9:2016 (right?) is defining that
(and we must follow it)? Or does it comes by referring to
something like [1]? As far as I see David's mail upthread,
OPTIONS is not precisely defined.
Yeah, I read [1] and the final committee draft ISO/IEC 9075-9:2006, it
might be old though.
[1] http://cs.unibo.it/~montesi/CBD/Articoli/SQL-MED.pdf
Unfortunately I don't have access to the document nor concrete
use cases. With a rough idea of "remote mapping", I can guess the
followng four use cases. Each example syntax is just a guess
without any consideration on implementability or other
restrictions. The patch looks currently covering B.
Thank you for summarizing.
A. Just notify a function can be just pushed down.
ex. SELECT foo(1, 'bar'); Remote: SELECT foo(1, 'bar');
CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem;
(or same as B)B. Replace function name with the remote equivalent.
ex. SELECT foo(1, 'bar'); Remote: SELECT hoge(1, 'bar');
CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem
OPTIONS (remote_func_name 'hoge'));C. Adjust function specification with remote.
ex. SELECT foo(1, 'bar'); Remote: SELECT hoge('bar', 1, true);
CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem
OPTIONS (remote_expression 'hoge($2,$1,true)');D. Replace with an equivalent remote expression.
ex. SELECT foo(1, 'bar'); Remote: SELECT ('bar' || to_char(1 % 10));
CREATE ROUTINE MAPPING foomap FOR foo(int, text) SERVER rem
OPTIONS (remote_expression '$2 || to_char($1 % 10)');I haven't looked the patch in depth, but the core side looks
generic and the FDW side is extensible to A, C and D. I think B
is enough as a starter.
I don't mean that we should implement all of them. They are just possibilities.
I agree that this feature covers A and B as the first step. But I'm
concerned that for D (and maybe for C?) the volatility of mapped
function could be changed. That is, currently we allow to push down
only immutable functions but they might break it. Also, can the
replacing a function with any expression be a risk of sql injections?
Also, according to the standard the routine mapping seems to work when
columns of referenced foreign table are passed to the function that is
mapped to the remote function. The functions in WHERE clause will
obviously be mapped but I'm not sure for function in target lists.
I have some comments on the patch.
It doesn't seem working. Am I missing something?
====
create server sv1 foreign data wrapper postgres_fdw options (host '/tmp', port '5432', dbname 'postgres');
create table lt (a int);
create foreign table ft (a int) server sv1 options (table_name 'lt');
create function lhoge(int) returns int as 'begin return $1 * 2; end;' language plpgsql;
create routine mapping rm1 for function lhoge(int) server sv1 options (remote_func_name 'rhoge');
explain verbose select * from ft where a = lhoge(3);
QUERY PLAN
------------------------------------------------------------------
Foreign Scan on public.ft (cost=100.00..936.31 rows=15 width=4)
Output: a
Filter: (ft.a = lhoge(3))
Remote SQL: SELECT a FROM public.lt
(4 rows)
====Perhaps it cannot push down simple SQL local functions. (I'm not
sure we should do that.)
Yeah, the current patch maps only when the function can be pushed down
to the foreign server. So maybe you can use either the built-in
functions or functions that are provided an extension that is listed
in 'extension' option.
Can't we specify remote schema in remote_func_name just as
(remote_func_name "fooschema.funcname")?
That's a possible option.
Can't we provide the syntax without making MAPPING reserved?
I think it's possible but I haven't tried it hard yet. The problem I
faced is that, for example, without that the "DROP ROUTINE MAPPING
name" where the name can be the ColId conflicts with the "DROP ROUTINE
function_with_argtype_list [CASCADE | RESTRICT]" where
function_with_argtype_list can be the ColId as well.
Chainging the syntax for ALTER/DROP ROUTINE MAPPING like USER
MAPPING, specifically to ALTER/DROP ROUTINE MAPPING FOR FUNCTION
fname SERVER sname would evade the conflict.
Hmm, the syntax of ALTER/DROP ROUTINE MAPPING follow the standard.
Maybe we can discuss later.
Can't we just push down the function itself with no option defined?
CREATE ROUTINE MAPPING rm1 FOR FUNCTION pg_backend_pid() SERVER sv1;
Yeah, it would be an good idea.
funcid is not used in objectaddress.c.
(The patch is missing a LF in a comment in lsyscache.c)
Thanks, I'll fix them in the next patch.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center