review: FDW API
Hi,
what follows is a review of the FDW API patch from
http://archives.postgresql.org/message-id/20110114212358.82C7.6989961C@metrosystems.co.jp
All three patches apply cleanly and compile without warnings. Regression
tests pass.
Let me go patch by patch, starting with the first one that adds the
HANDLER option.
It adds one useless hunk in src/backend/commands/foreigncmds.c (changes
order if #includes).
There's a typo in a C commnent ("determin which validator to be used").
Other than that, it looks OK.
The third patch just adds a GetForeignTable helper function and it looks OK.
The second patch actually adds the API. First of all, I'd like say that
it's a really cool piece of code, allowing all kinds of awesome
funcionality to be added. I'm already excited by the things that this
will make possible. Congratulations!
To get a feel of the API I wrote a simple FDW wrapper that presents data
from the commitfest RSS feed, based heavily on twitter_fdw by Hitoshi
Harada. Please treat my thoughts as someone's who doesn't really know
*why* the API looks like it does, but has some observations about what
was missing or what felt strange when using it. I guess that's the
position a typical FDW wrapper writer will be in.
First of all, the C comments mention that PlanRelScan should put a tuple
descriptor in FdwPlan, but there's no such field in it. Also, comments
for PlanRelScan talk about the 'attnos' argument, which is not in the
function's signature. I guess the comments are just obsolete and should
be updated. I think it's actually a good thing you don't have to put a
TupleDesc in FdwPlan.
There are two API methods, PlanNative and PlanQuery that are ifdef'd out
using IN_THE_FUTURE. Isn't the standard symbol we use NOT_USED? Also,
the comments say you can implement either PlanRelScan or PlanNative, and
only the former is available for now. If we keep these methods
commented, they should be moved to the end of the struct, so that
uncommenting them will not break compatibility with existing FDWs.
The only documentation a FDW writer has is fdwapi.h, so comments there
need to be top-notch. We might contemplate writing a documentation
chapter explaining how FDW handlers should be written, like we do for C
SRFs and libpq programs, but I guess for now just the headers file would
be enough.
FdwExecutionState is just a struct around a void pointer, can we imagine
adding more fields there? If not, maybe we could just remove the
structure and pass void pointers around? OTOH that gives us some
compiler checking and possibility of extending the struct, so I guess we
could also just leave it like that.
The Iterate method gets passed a TupleTableSlot. Do we really need such
a low-level structure? It makes returning the result easy, because you
just form your tuple and call ExecStoreTuple, but perhaps we could
abstract that away a bit and add a helper method that will take a tuple
and call ExecStoreTuple for us, passing InvalidBuffer and false as the
remaining arguments. Or maybe make Iterate return the tuple and call
ExecStoreTuple internally? I don't have strong opinions, but
TupleTableSlot feels a bit too gutty - I'm having a hard time imagining
what fields from it would be useful for a FDW writer, and so perhaps you
don't need to expose it.
Why does BeginScan accept a ParamListInfo argument? First of all, it
feels like a parsing thing, not a relation scan thing, so perhaps it
should be available at some other, earlier stage. Second of all, what
would it be useful for anyway? Neither file_fdw nor my commitfest_fdw
does anything with it.
We could use comments about how to return tuples from Iterate and how to
finish returning them. I had to look at the example to figure out that
you need ExecClearTuple(slot) in your last Iterate. If Iterate's
interface were to change, we could just return NULL instead of a tuple
to say that we're done.
We could be a bit more explicit about how to allocate objects, for
instance if I'm allocating a FdwPlan in my PlanRelScan with a palloc,
will it not go away too soon, or too late (IOW leak)?
I ran into a problem when doing:
select i from generate_series(1, 100000) as g(i), pgcommitfest;
when I was trying to check for leaks. It returned four rows
(pgcommitfest is the foreign table that returns four rows). Explain
analyze shows a nested loop with a foreign scan as inner loop. Maybe
it's because I didn't implement ReScan, but the API says I don't have to.
If you don't implement Iterate you get a elog(ERROR). But if you don't
implement one of the other required methods, you segfault. Feels
inconsistent.
PlanRelScan looks like something that could use all kinds of information
to come up with a good plan. Maybe we could change its input argument to
a single struct that would contain all the current arguments, so it'll
be easier to extend when people will start writing FDWs and will find
out that they'd like more information available. Doing that would mean
that adding a field in a release would mean FDWs just need to be
recompiled and they keep working. Or do we opt for the "better an
explicitly compile error than a silent change" approach? If we'd be just
passing additional info to PlanRelScan I'd say keeping old source
compilable would not be a problem.
Storing private info in FdwPlan's private list is really awkward. I know
it's because you need copyObject support, but it's just a big pain if
you want to store several different things. Passing them around as a big
list and doing list_nth(private, MY_WIDGET_OFFSET) feels like writing
Lisp before you learn it has structures :) Is there really no other way?
Maybe PlanNative should get the foreign table OID, not the server OID,
to resemble PlanRelScan more. The server OID is just a syscache lookup
away, anyway.
If you do EXPLAIN SELECT * FROM foregintable, BeginScan is not called,
but EndScan is. That's weird, and I noticed because I got a segfault
when EndScan tried to free things that BeginScan allocated. Maybe just
don't call EndScan for EXPLAIN?
That's all as far as the API goes. Feel free to ignore most of these
remarks if you see a reason why your choices are better (or necessary).
I just thought I'd try to take a look at it as a user would (which is
what I am, as I don't fully understand the internals) and offer my
impressions.
In general, the feature looks great and I hope it'll make it into 9.1.
And it we'd get the possibility to write FDW handlers in other PLs than
C, it would rock so hard...
I'm going to mark this a Waiting for Author because of the typos, the
BeginScan/EndScan issue, and the nested loop stopping halfway issue. The
rest are suggestions or just thoughts, and if you don't see them as
justified, I'll mark the next patch Ready for Committer.
Cheers,
Jan
On Sun, 16 Jan 2011 01:55:19 +0100
Jan Urbański <wulczer@wulczer.org> wrote:
what follows is a review of the FDW API patch from
http://archives.postgresql.org/message-id/20110114212358.82C7.6989961C@metrosystems.co.jp
Thanks for the comments!
For now, I answer to the first half of your comments. I'll answer to
the rest soon.
All three patches apply cleanly and compile without warnings. Regression
tests pass.Let me go patch by patch, starting with the first one that adds the
HANDLER option.
Sure.
It adds one useless hunk in src/backend/commands/foreigncmds.c (changes
order if #includes).There's a typo in a C commnent ("determin which validator to be used").
Other than that, it looks OK.
Fixed in attached patch.
The third patch just adds a GetForeignTable helper function and it looks OK.
Thanks. This patch might be able to committed separately because it
would be small enough, and similar to existing lookup functions such
as GetForeignDataWrapper() and GetForeignServer().
The second patch actually adds the API. First of all, I'd like say that
it's a really cool piece of code, allowing all kinds of awesome
funcionality to be added. I'm already excited by the things that this
will make possible. Congratulations!To get a feel of the API I wrote a simple FDW wrapper that presents data
from the commitfest RSS feed, based heavily on twitter_fdw by Hitoshi
Harada. Please treat my thoughts as someone's who doesn't really know
*why* the API looks like it does, but has some observations about what
was missing or what felt strange when using it. I guess that's the
position a typical FDW wrapper writer will be in.
Sure, I think your point of view is very important.
First of all, the C comments mention that PlanRelScan should put a tuple
descriptor in FdwPlan, but there's no such field in it. Also, comments
for PlanRelScan talk about the 'attnos' argument, which is not in the
function's signature. I guess the comments are just obsolete and should
be updated. I think it's actually a good thing you don't have to put a
TupleDesc in FdwPlan.
Removed comments about 'attnos' and tuple descriptor.
There are two API methods, PlanNative and PlanQuery that are ifdef'd out
using IN_THE_FUTURE. Isn't the standard symbol we use NOT_USED? Also,
the comments say you can implement either PlanRelScan or PlanNative, and
only the former is available for now. If we keep these methods
commented, they should be moved to the end of the struct, so that
uncommenting them will not break compatibility with existing FDWs.
Agreed. Moved ifdef'd part to the end of struct.
The only documentation a FDW writer has is fdwapi.h, so comments there
need to be top-notch. We might contemplate writing a documentation
chapter explaining how FDW handlers should be written, like we do for C
SRFs and libpq programs, but I guess for now just the headers file would
be enough.
file_fdw and postgresql_fdw would be good samples for wrapper
developer if we could ship them as contrib modules.
FdwExecutionState is just a struct around a void pointer, can we imagine
adding more fields there? If not, maybe we could just remove the
structure and pass void pointers around? OTOH that gives us some
compiler checking and possibility of extending the struct, so I guess we
could also just leave it like that.
ISTM that using a struct as a interface is better than void*, as you
mentioned.
The Iterate method gets passed a TupleTableSlot. Do we really need such
a low-level structure? It makes returning the result easy, because you
just form your tuple and call ExecStoreTuple, but perhaps we could
abstract that away a bit and add a helper method that will take a tuple
and call ExecStoreTuple for us, passing InvalidBuffer and false as the
remaining arguments. Or maybe make Iterate return the tuple and call
ExecStoreTuple internally? I don't have strong opinions, but
TupleTableSlot feels a bit too gutty - I'm having a hard time imagining
what fields from it would be useful for a FDW writer, and so perhaps you
don't need to expose it.
This would be debatable issue. Currently Iterate() is expected to
return materialized HeapTuple through TupleTableSlot.
I think an advantage to use TupleTableSlot is that FDW can set shoudFree
flag for the tuple.
Why does BeginScan accept a ParamListInfo argument? First of all, it
feels like a parsing thing, not a relation scan thing, so perhaps it
should be available at some other, earlier stage. Second of all, what
would it be useful for anyway? Neither file_fdw nor my commitfest_fdw
does anything with it.
ParamListInfo is added to pass parameters of PREPARE/EXECUTE statement
to FDWs.
Plan for a prepared query is generated at PREPARE with placeholders,
and executed at EXECUTE with actual values. With ParamListInfo
parameter for BeginScan(), FDWs would be able to optimize their remote
query with actual parameter values.
Regard,
--
Shigeru Hanada
On Mon, 17 Jan 2011 22:13:19 +0900
Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
Fixed in attached patch.
Sorry, I have not attached patch to last message.
I'll post revised patch in next message soon.
Regards,
--
Shigeru Hanada
On Sun, 16 Jan 2011 01:55:19 +0100
Jan Urbański <wulczer@wulczer.org> wrote:
<snip>
In general, the feature looks great and I hope it'll make it into 9.1.
And it we'd get the possibility to write FDW handlers in other PLs than
C, it would rock so hard...I'm going to mark this a Waiting for Author because of the typos, the
BeginScan/EndScan issue, and the nested loop stopping halfway issue. The
rest are suggestions or just thoughts, and if you don't see them as
justified, I'll mark the next patch Ready for Committer.
Thanks a lot for the comments. I've (hopefully) fixed issues above.
Please find attached patches.
== patch list ==
1) 20110118-no_fdw_perm_check.patch - This patch is not included in
last post. This had been proposed on 2011-01-05 first, but maybe has
not been reviewd yet. I re-propose this patch for SQL standard
conformance. This patch removes permission check that requires USAGE
on the foreign-data wrapper at CREATE FOREIGN TABLE.
Please see original post for details.
http://archives.postgresql.org/message-id/20110105145206.30FD.6989961C@metrosystems.co.jp
2) 20110118-fdw_catalog_lookup.patch - This patch adds GetForeignTables.
Fixed lack of pg_foreign_table.h inclusion.
3) 20110118-fdw_handler.patch - This patch adds support for HANDLER
option to FOREIGN DATA WRAPPER object.
4) 20110118-foreign_scan.patch - This patch adds ForeignScan executor
node and FDW API hooks based on Heikki's proposal. As Itagaki-san
suggested on 2010-12-21, FDW must generate information for EXPLAIN
VERBOSE every PlanRelScan() call. It's better to avoid such overhead,
so new EXPLAIN hook would be needed. I'll try to make it cleaner.
================
And I'll reply to the rest of your comments.
We could use comments about how to return tuples from Iterate and how to
finish returning them. I had to look at the example to figure out that
you need ExecClearTuple(slot) in your last Iterate. If Iterate's
interface were to change, we could just return NULL instead of a tuple
to say that we're done.
I've added some comments for FDW-developer to fdwapi.h, though they
wouldn't be enough.
We could be a bit more explicit about how to allocate objects, for
instance if I'm allocating a FdwPlan in my PlanRelScan with a palloc,
will it not go away too soon, or too late (IOW leak)?
For that example, the answer is no. Objects are allocated in
MessageContext if you don't switch context and released when the query
has been finished. I agree that more documentation or comments for
FDW-developers should be added.
Maybe PlanNative should get the foreign table OID, not the server OID,
to resemble PlanRelScan more. The server OID is just a syscache lookup
away, anyway.
You would missed a case that multiple foreign tables are used in a
query. Main purpose of PlanNative is to support pass-through
execution of remote query. In pass-through mode, you can use syntax
as if you have directly connected to external server, so can't use
PostgreSQL's parser.
If you do EXPLAIN SELECT * FROM foregintable, BeginScan is not called,
but EndScan is. That's weird, and I noticed because I got a segfault
when EndScan tried to free things that BeginScan allocated. Maybe just
don't call EndScan for EXPLAIN?
Fixed to not call EndScan if it was EXPLAIN execution.
Regards,
--
Shigeru Hanada
On 18.01.2011 17:26, Shigeru HANADA wrote:
1) 20110118-no_fdw_perm_check.patch - This patch is not included in
last post. This had been proposed on 2011-01-05 first, but maybe has
not been reviewd yet. I re-propose this patch for SQL standard
conformance. This patch removes permission check that requires USAGE
on the foreign-data wrapper at CREATE FOREIGN TABLE.
Please see original post for details.
http://archives.postgresql.org/message-id/20110105145206.30FD.6989961C@metrosystems.co.jp
Committed this part.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, Jan 21, 2011 at 10:17 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 18.01.2011 17:26, Shigeru HANADA wrote:
1) 20110118-no_fdw_perm_check.patch - This patch is not included in
last post. This had been proposed on 2011-01-05 first, but maybe has
not been reviewd yet. I re-propose this patch for SQL standard
conformance. This patch removes permission check that requires USAGE
on the foreign-data wrapper at CREATE FOREIGN TABLE.
Please see original post for details.http://archives.postgresql.org/message-id/20110105145206.30FD.6989961C@metrosystems.co.jp
Committed this part.
How much review have you done of parts (3) and (4)? The key issue for
all of the FDW work in progress seems to be what the handler API is
going to look like, and so once we get that committed it will unblock
a lot of other things.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 21.01.2011 17:57, Robert Haas wrote:
How much review have you done of parts (3) and (4)?
Not much. I'm getting there..
The key issue for
all of the FDW work in progress seems to be what the handler API is
going to look like, and so once we get that committed it will unblock
a lot of other things.
Yep. The API that's there now was originally suggested by me, so I
probably won't have big complaints about it. I'll have to also look at
the PostgreSQL and file implementations of it to see that it really fits
the bill.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 18.01.2011 17:26, Shigeru HANADA wrote:
3) 20110118-fdw_handler.patch - This patch adds support for HANDLER
option to FOREIGN DATA WRAPPER object.
Some quick comments on that:
* I wonder if CREATE FOREIGN DATA WRAPPER should automatically create
the handler function, if it doesn't exist yet. That's what CREATE
LANGUAGE does, which is similar. Although it doesn't seem to be
documented for CREATE LANGUAGE either, is it deprecated?
* The elogs in parse_func_options() should be ereports.
* pg_dump should check the version number and only try to select
fdwhandler column if >= 9.1. See the other functions there for example
of that.
* dumpForeignDataWrapper() in pg_dump checks if fdwhandler field is "-".
I don't think we use that as magic value there, do we? Same with validator.
* Please check that the HANDLER and VALIDATOR options that pg_dump
creates properly quoted.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
Some quick comments on that:
* I wonder if CREATE FOREIGN DATA WRAPPER should automatically create
the handler function, if it doesn't exist yet. That's what CREATE
LANGUAGE does, which is similar. Although it doesn't seem to be
documented for CREATE LANGUAGE either, is it deprecated?
Doing that would require the equivalent of pg_pltemplate for FDWs, no?
I think we're a long way from wanting to do that. Also, it seems to me
that add-on FDWs are likely to end up getting packaged as extensions,
so the extension machinery will probably render the question moot pretty
soon.
regards, tom lane
On Sat, Jan 22, 2011 at 07:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
* I wonder if CREATE FOREIGN DATA WRAPPER should automatically create
the handler function, if it doesn't exist yet.Doing that would require the equivalent of pg_pltemplate for FDWs, no?
I think we're a long way from wanting to do that. Also, it seems to me
that add-on FDWs are likely to end up getting packaged as extensions,
The proposed file_fdw.sql actually creates a default FDW on installation.
So I think the installation scripts work as a template even if we don't
have FDW template catalogs.
+ /* contrib/file_fdw/file_fdw.sql.in */
+ -- create wrapper with validator and handler
+ CREATE OR REPLACE FUNCTION file_fdw_validator (text[], oid)
+ CREATE OR REPLACE FUNCTION file_fdw_handler ()
+ CREATE FOREIGN DATA WRAPPER file_fdw
+ VALIDATOR file_fdw_validator HANDLER file_fdw_handler;
--
Itagaki Takahiro
On 21.01.2011 17:57, Robert Haas wrote:
On Fri, Jan 21, 2011 at 10:17 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:On 18.01.2011 17:26, Shigeru HANADA wrote:
1) 20110118-no_fdw_perm_check.patch - This patch is not included in
last post. This had been proposed on 2011-01-05 first, but maybe has
not been reviewd yet. I re-propose this patch for SQL standard
conformance. This patch removes permission check that requires USAGE
on the foreign-data wrapper at CREATE FOREIGN TABLE.
Please see original post for details.http://archives.postgresql.org/message-id/20110105145206.30FD.6989961C@metrosystems.co.jp
Committed this part.
How much review have you done of parts (3) and (4)? The key issue for
all of the FDW work in progress seems to be what the handler API is
going to look like, and so once we get that committed it will unblock
a lot of other things.
I've gone through the code in a bit more detail now. I did a bunch of
cosmetic changes along the way, patch attached. I also added a few
paragraphs in the docs. We need more extensive documentation, but this
at least marks the places where I think the docs need to go.
Comments:
* How can a FDW fetch only the columns required by the scan? The file
FDW has to read the whole file anyhow, but it could perhaps skip calling
the input function for unnecessary columns. But more importantly, with
something like the postgresql_fdw you don't want to fetch any extra
columns across the wire. I gather the way to do it is to copy
RelOptInfo->attr_needed to private storage at plan stage, and fill the
not-needed attributes with NULLs in Iterate. That gets a bit awkward,
you need to transform attr_needed to something that can be copied for
starters. Or is that information somehow available at execution phase
otherwise?
* I think we need something in RelOptInfo to mark foreign tables. At the
moment, you need to call IsForeignTable() which does a catalog lookup.
Maybe a new RTEKind, or a boolean flag.
* Can/should we make ReScan optional? Could the executor just do
EndScan+BeginScan if there's no ReScan function?
* Is there any point in allowing a FDW without a handler? It's totally
useless, isn't it? We had the CREATE FOREIGN DATA WRAPPER syntax in
previous versions, and it allowed it, but it has always been totally
useless so I don't think we need to worry much about
backwards-compatibility here.
* Is there any use case for changing the handler or validator function
of an existign FDW with ALTER? To me it just seems like an unnecessary
complication.
* IMHO the "FDW-info" should always be displayed, without VERBOSE. In my
experience with another DBMS that had this feature, the SQL being sent
to the remote server was almost always the key piece of information that
I was looking for in the query plans.
* this check in expand_inherited_rtentry seems misplaced:
/*
* SELECT FOR UPDATE/SHARE is not allowed on foreign tables because
* they are read-only.
*/
if (newrelation->rd_rel->relkind == RELKIND_FOREIGN_TABLE &&
lockmode != AccessShareLock)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("SELECT FOR UPDATE/SHARE is not allowed with foreign tables")));
I don't understand why we'd need to do that for inherited tables in
particular. And it's not working for regular non-inherited foreign tables:
postgres=# SELECT * FROM filetbl2 FOR UPDATE;
ERROR: could not open file "base/11933/16397": No such file or directory
* Need to document how the FDW interacts with transaction
commit/rollback. In particular, I believe EndScan is never called if the
transaction is aborted. That needs to be noted explicitly, and need to
suggest how to clean up any external resources in that case. For
example, postgresql_fdw will need to close any open cursors or result sets.
In general, I think the design is sound. What we need is more
documentation. It'd also be nice to see the postgresql_fdw brought back
to shape so that it works against this latest version of the api patch.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
small-fdw-changes.patchtext/x-diff; name=small-fdw-changes.patchDownload+236-60
On Mon, Jan 24, 2011 at 8:08 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
* Is there any point in allowing a FDW without a handler? It's totally
useless, isn't it? We had the CREATE FOREIGN DATA WRAPPER syntax in previous
versions, and it allowed it, but it has always been totally useless so I
don't think we need to worry much about backwards-compatibility here.
Aren't things like dblink using this in its existing form?
* Is there any use case for changing the handler or validator function of an
existign FDW with ALTER? To me it just seems like an unnecessary
complication.
+1.
* IMHO the "FDW-info" should always be displayed, without VERBOSE. In my
experience with another DBMS that had this feature, the SQL being sent to
the remote server was almost always the key piece of information that I was
looking for in the query plans.
+1.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jan 24, 2011 at 8:08 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
How much review have you done of parts (3) and (4)? The key issue for
all of the FDW work in progress seems to be what the handler API is
going to look like, and so once we get that committed it will unblock
a lot of other things.I've gone through the code in a bit more detail now. I did a bunch of
cosmetic changes along the way, patch attached. I also added a few
paragraphs in the docs. We need more extensive documentation, but this at
least marks the places where I think the docs need to go.Comments:
I haven't seen any responses to these comments. Time grows short to
get this committed to PostgreSQL 9.1. We need responses to these
comments and an updated patch ASAP.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, 21 Jan 2011 18:28:19 +0200
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 18.01.2011 17:26, Shigeru HANADA wrote:
3) 20110118-fdw_handler.patch - This patch adds support for HANDLER
option to FOREIGN DATA WRAPPER object.Some quick comments on that:
Thanks for the comments.
I'll post revised version of patches soon.
* The elogs in parse_func_options() should be ereports.
* pg_dump should check the version number and only try to select
fdwhandler column if >= 9.1. See the other functions there for example
of that.
Fixed.
* dumpForeignDataWrapper() in pg_dump checks if fdwhandler field is "-".
I don't think we use that as magic value there, do we? Same with validator.
That magic value, "-", is used as "no-function-was-set" in
dumpForeignDataWrapper() and dumpForeignServer(), and I followed them.
Agreed that magic value should be removed, but it would be a refactoring
issue about pg_dump.
* Please check that the HANDLER and VALIDATOR options that pg_dump
creates properly quoted.
I checked quoting for HANDLER and VALIDATOR with file_fdw functions,
and it seems works fine. The pg_dump generats:
------------
CREATE FOREIGN DATA WRAPPER dummy_fdw VALIDATOR public."File_Fdw_Validator"
HANDLER public."FILE_FDW_HANDLER";
------------
from these DDLs:
------------
CREATE OR REPLACE FUNCTION "File_Fdw_Validator" (text[], oid)
RETURNS bool
AS '$libdir/file_fdw','file_fdw_validator'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION "FILE_FDW_HANDLER" ()
RETURNS fdw_handler
AS '$libdir/file_fdw','file_fdw_handler'
LANGUAGE C STRICT;
CREATE FOREIGN DATA WRAPPER dummy_fdw
VALIDATOR "File_Fdw_Validator" HANDLER "FILE_FDW_HANDLER";
------------
Regard,
--
Shigeru Hanada
On Mon, 24 Jan 2011 15:08:11 +0200
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
I've gone through the code in a bit more detail now. I did a bunch of
cosmetic changes along the way, patch attached. I also added a few
paragraphs in the docs. We need more extensive documentation, but this
at least marks the places where I think the docs need to go.Comments:
Thanks for the comments!
* How can a FDW fetch only the columns required by the scan? The file
FDW has to read the whole file anyhow, but it could perhaps skip calling
the input function for unnecessary columns. But more importantly, with
something like the postgresql_fdw you don't want to fetch any extra
columns across the wire. I gather the way to do it is to copy
RelOptInfo->attr_needed to private storage at plan stage, and fill the
not-needed attributes with NULLs in Iterate. That gets a bit awkward,
you need to transform attr_needed to something that can be copied for
starters. Or is that information somehow available at execution phase
otherwise?
I thought that RelOptInfo->reltargetlist, a list of Var, can be used
for that purpose. FdwPlan can copy it with copyObject(), and pass
it to Iterate through FdwPlan->fdw_private.
Then, postgresql_fdw would be able to retrieve only necessary columns,
or just use "NULL" for unnecessary columns in the SELECT clause to
avoid mapping values to columns. Each way would be decrease amount of
data transfer.
* I think we need something in RelOptInfo to mark foreign tables. At the
moment, you need to call IsForeignTable() which does a catalog lookup.
Maybe a new RTEKind, or a boolean flag.
We can avoid catalog lookup with checking table type in get_relation_info()
and updating RelOptInfo->is_foreign_table if the target was a foreign
table.
* Can/should we make ReScan optional? Could the executor just do
EndScan+BeginScan if there's no ReScan function?
Right, we have enough information to call BeginScan again. Will fix.
* Is there any point in allowing a FDW without a handler? It's totally
useless, isn't it? We had the CREATE FOREIGN DATA WRAPPER syntax in
previous versions, and it allowed it, but it has always been totally
useless so I don't think we need to worry much about
backwards-compatibility here.
dblink (and possibly other external modules) uses FDW without a
handler.
* Is there any use case for changing the handler or validator function
of an existign FDW with ALTER? To me it just seems like an unnecessary
complication.
AFAICS, the only case for that is upgrading FDW to new one without
re-creating foreign tables. I don't have strong opinion for this
issue, and it seems reasonable to remove ALTER feature in first
version.
* IMHO the "FDW-info" should always be displayed, without VERBOSE. In my
experience with another DBMS that had this feature, the SQL being sent
to the remote server was almost always the key piece of information that
I was looking for in the query plans.
Agreed, will fix to show FDW-info always. Is it reasonable to show
"FDW-info" row even if a FDW set explainInfo to NULL?
* this check in expand_inherited_rtentry seems misplaced:
/*
* SELECT FOR UPDATE/SHARE is not allowed on foreign tables because
* they are read-only.
*/
if (newrelation->rd_rel->relkind == RELKIND_FOREIGN_TABLE &&
lockmode != AccessShareLock)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("SELECT FOR UPDATE/SHARE is not allowed with foreign tables")));I don't understand why we'd need to do that for inherited tables in
particular. And it's not working for regular non-inherited foreign tables:postgres=# SELECT * FROM filetbl2 FOR UPDATE;
ERROR: could not open file "base/11933/16397": No such file or directory
It's a remnants of table inheritance support for foreign tables. This
check should be removed from here, and another check should be added
to avoid above error.
* Need to document how the FDW interacts with transaction
commit/rollback. In particular, I believe EndScan is never called if the
transaction is aborted. That needs to be noted explicitly, and need to
suggest how to clean up any external resources in that case. For
example, postgresql_fdw will need to close any open cursors or result sets.
I agree that resource cleanup is an important issue when writing FDW.
FDW should use transaction-safe resources like VirtualFile, or use
ResourceOwner callback mechanism. Is it reasonable to add new page
under "Chapter 35. Extending SQL"?
In general, I think the design is sound. What we need is more
documentation. It'd also be nice to see the postgresql_fdw brought back
to shape so that it works against this latest version of the api patch.
I'll post FDW API patches which reflect comments first, and then I'll
rebase postgresql_fdw against them.
Regards,
--
Shigeru Hanada
On Mon, Jan 31, 2011 at 8:00 AM, Shigeru HANADA
<hanada@metrosystems.co.jp> wrote:
* Is there any use case for changing the handler or validator function
of an existign FDW with ALTER? To me it just seems like an unnecessary
complication.AFAICS, the only case for that is upgrading FDW to new one without
re-creating foreign tables. I don't have strong opinion for this
issue, and it seems reasonable to remove ALTER feature in first
version.
-1. I don't think that removing the ability to change this is going
to save a measurable amount of complexity, and it certainly will suck
if you need it and don't have it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, 31 Jan 2011 22:00:55 +0900
Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
I'll post FDW API patches which reflect comments first, and then I'll
rebase postgresql_fdw against them.
Sorry for late, attached are revised version of FDW API patches which
reflect Heikki's comments except removing catalog lookup via
IsForeignTable(). ISTM that the point is avoiding catalog lookup
during planning, but I have not found when we can set "foreign table
flag" without catalog lookup during RelOptInfo generation.
Please apply attached patches in this order.
1) fdw_catalog_lookup.patch
2) fdw_handler.patch
3) foreign_scan.patch
To execute SELECT quereis for foreign tables, you need a FDW which has
valid fdwhandler function. The file_fdw which is posted in another
thread "SQL/MED file_fdw" would help.
Changes from last patches are:
1) Now SELECT FOR UPDATE check for foreign tables are done properly in
executor phase, in ExecLockTuple(). Or such check should be done in
parser or planner?
2) Server version is checked in pg_dump (>= 90100).
3) ReScan is not required now. If ReScan is not supplied, ForeignScan
uses EndScan + BeginSacn instead.
4) FDW-Info in EXPLAIN is shown always, except FDW set NULL to
explainInfo.
Regards,
--
Shigeru Hanada
On 07.02.2011 08:00, Shigeru HANADA wrote:
Sorry for late, attached are revised version of FDW API patches which
reflect Heikki's comments except removing catalog lookup via
IsForeignTable(). ISTM that the point is avoiding catalog lookup
during planning, but I have not found when we can set "foreign table
flag" without catalog lookup during RelOptInfo generation.
In get_relation_info(), you do the catalog lookup anyway and have the
Relation object at hand. Add a flag to RelOptInfo indicating if it's a
foreign table or not, and set that in get_relation_info().
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, 07 Feb 2011 09:37:37 +0100
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
On 07.02.2011 08:00, Shigeru HANADA wrote:
Sorry for late, attached are revised version of FDW API patches which
reflect Heikki's comments except removing catalog lookup via
IsForeignTable(). ISTM that the point is avoiding catalog lookup
during planning, but I have not found when we can set "foreign table
flag" without catalog lookup during RelOptInfo generation.In get_relation_info(), you do the catalog lookup anyway and have the
Relation object at hand. Add a flag to RelOptInfo indicating if it's a
foreign table or not, and set that in get_relation_info().
Thanks a lot.
Attached is a revised version of foreign_scan patch. This still
requires fdw_handler patch which was attached to the orginal post.
Avoid_catalog_lookup.patch is attached for review purpose.
This patch includes changes for this fix.
Regards,
--
Shigeru Hanada
On 08.02.2011 13:07, Shigeru HANADA wrote:
On Mon, 07 Feb 2011 09:37:37 +0100
Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:In get_relation_info(), you do the catalog lookup anyway and have the
Relation object at hand. Add a flag to RelOptInfo indicating if it's a
foreign table or not, and set that in get_relation_info().Thanks a lot.
Attached is a revised version of foreign_scan patch. This still
requires fdw_handler patch which was attached to the orginal post.Avoid_catalog_lookup.patch is attached for review purpose.
This patch includes changes for this fix.
Thanks.
I spent some more time reviewing this, and working on the PostgreSQL FDW
in tandem. Here's an updated API patch, with a bunch of cosmetic
changes, and a bug fix for FOR SHARE/UPDATE. FOR SHARE/UPDATE on a
foreign table should behave the same as on other relations that can't be
locked, like a function scan. If you explicitly specify such a relation
with "FOR UPDATE OF foo", you should get an error, and if you just have
an unspecified "FOR UPDATE", it should be silently ignored for foreign
tables.
Doing that requires that we can distinguish foreign tables from other
relations in transformLockingClause(), but we don't have a RelOptInfo at
that stage yet. I just used get_rel_relkind() there (and elsewhere
instead of the IsForeignTable() function), but I've got a nagging
feeling that sooner or later we'll have to bite the bullet and add that
field to RangeTblEntry, or introduce a whole new rtekind for foreign tables.
As for the PostgreSQL FDW, I'm trying to make it do two basic tricks, to
validate the API:
1. Only fetch those columns that are actually needed by the query. This
involves examining the baserel->reltargetlist, also paying attention to
whole-row Vars.
2. Push down simple quals, like "column = 'foo'". To do that, I'm trying
to use the deparsing code from ruleutils.c.
That's pretty much what Shigeru's original postgresql_fdw patch also
did, but I've changed the implementation quite heavily to make it work
with the new API. That said, it's still a mess, but I think it validates
that the API is usable. We could offer a lot more help for FDW authors
to make those things easier, but I think this is acceptable for now.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com