pgsql_fdw, FDW for PostgreSQL server
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module. I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.
I attached three patches for this new FDW. They should be applied in
the order below. I separated these patches so that first (or first two)
can be committed separately.
* fdw_helper_doc.patch provides documents for FDW developers about
helper functions existing in 9.1, so this can be back-patched.
* fdw_helper_funcs.patch provides additional helper functions which
would make manipulation of FDW options easier.
* pgsql_fdw.patch provides new FDW for external PG server.
Here are details of pgsql_fdw.
Name of the wrapper
===================
I used the name "pgsql_fdw" for the wrapper and its derivatives. I
think it would be better to leave contrib/dblink and built-in
postgresql_fdw_validator for backward compatibility, and use new name
for new wrapper.
Or, it might be OK to rename postgresql_fdw_validator to
dblink_validator or something, or fix dblink to use validator of new
wrapper. I'm not sure that dblink should be alone or integrated with
pgsql_fdw...
Connection management
=====================
The pgsql_fdw establishes a new connection when a foreign server is
accessed first for the local session. Established connection is shared
between all foreign scans in the local query, and shared between even
scans in following queries. Connections are discarded when the current
transaction aborts so that unexpected failure won't cause connection
leak. This is implemented with resource owner mechanism.
User can see active connections via pgsql_fdw_connections view, and
discard arbitrary connection via pgsql_fdw_disconnect() function. These
can be done from only same local session.
If local role has changed via SET ROLE or SET SESSION AUTHENTICATION,
pgsql_fdw ignores old role's connections and looks up appropriate
connection for the new role from the pool. If there wasn't suitable
one, pgsql_fdw establishes new connection. When local role has changed
to old role again, pooled connection will be used again.
Unlike contrib/dblink, one foreign server can have only one connection
at a time for one local role. This is because pgsql_fdw doesn't support
named connections.
Cost estimation
===============
The pgsql_fdw executes an EXPLAIN command on remote side for each
PlanForeignScan call. Returned costs and rows are used as local
estimation for the Path with adding connection costs and data transfer
costs.
SELECT optimization
===================
To reduce amount of data transferred from remote server, references to
unnecessary columns are replaced with NULL literal in remote query.
WHERE clause push-down
======================
Some kind of qualifiers in WHERE clause are pushed down to remote server
so that the query result can be reduced. Currently qualifiers which
include any volatile or stable element can't be pushed down. Even with
these limitations, most qualifiers would be pushed down in usual cases.
Cursor mode
===========
The pgsql_fdw switches the way to retrieve result records according to
estimated result rows; use simple SELECT for small result, and use
cursor with DECLARE/FETCH statements for large result. The threshold
is default to 1000, and configurable with FDW option "min_cursor_rows".
In cursor mode, number of rows fetched at once can be controlled by FDW
option "fetch_count".
EXPLAIN output
==============
The pgsql_fdw shows a remote query used for each foreign scan node in
the output of EXPLAIN command with title "Remote SQL". If pgsql_fdw
decided to use cursor for the scan, DECLARE statement is shown.
Regards,
--
Shigeru Hanada
2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module. I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.
I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module. �I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.
I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...
We've just spent a whole lot of blood and sweat on making the extension
mechanism work nicely. I don't understand this urge to not use it.
ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib.
Once we do that its release schedule will get locked to core's ---
wouldn't it be better to keep flexibility for now, while it's in such
active development?
regards, tom lane
On Tue, Oct 25, 2011 at 14:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module. I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...We've just spent a whole lot of blood and sweat on making the extension
mechanism work nicely. I don't understand this urge to not use it.
We're back to the old discussion, I guess.. I'm happy to see it as an
extension, but I think it should be included with the standard
installation. Like we do with for example pl/pgsql (which I realize
has a dependency on the backend anyway, so it can't be done another
way easily) and pl/perl (which doesn't, AFAIK, so it's a better
example)
ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib.
Once we do that its release schedule will get locked to core's ---
wouldn't it be better to keep flexibility for now, while it's in such
active development?
I would be happy to keep it outside, and integrate it in the final CF
for example :)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib.
Once we do that its release schedule will get locked to core's ---
wouldn't it be better to keep flexibility for now, while it's in such
active development?I would be happy to keep it outside, and integrate it in the final CF
for example :)
Right now, file_fdw is the only FDW module that we have in the core,
however, it is inadequacy to proof the new concept of FDW feature
to utilize external RDBMS, such as join push-down of foreign tables.
I think the pgsql-fdw module also should be included in the core
distribution as a basis of future enhancement, unless we don't
need any working modules when an enhancement of FDW is
proposed.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote:
Right now, file_fdw is the only FDW module that we have in the core,
Erm, guess I'm a bit confused why we've got that in core while not
putting pgsql_fdw in core. This all gets back to previous discussions
around 'recommended' contrib modules (which should really be installed
by default on the filesystem through the distros, ala Debian's
"recommends:" approach) and 'other' contrib modules.
I'm in favor of making that distinction. I would still have pgsql_fdw,
file_fdw, etc, be packaged more-or-less the same way and still use the
CREATE EXTENTION framework, of course.
It would be nice if we didn't have to lock the release schedule of those
recommended modules to the core release schedule, or even to each other,
but that's a separate issue, imv.
Thanks,
Stephen
On Tue, Oct 25, 2011 at 3:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module. I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...We've just spent a whole lot of blood and sweat on making the extension
mechanism work nicely. I don't understand this urge to not use it.ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib.
Once we do that its release schedule will get locked to core's ---
wouldn't it be better to keep flexibility for now, while it's in such
active development?
Simple question - do FDW internals need work?
--
marko
(2011/10/25 19:15), Magnus Hagander wrote:
2011/10/25 Shigeru Hanada<shigeru.hanada@gmail.com>:
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module. I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...
Do you mean that pgsql_fdw should be a built-in extension like plpgsql
so that it's available just after initdb? It would be accomplished with
some more changes:
* Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and
install dynamically loadable module during "make install" for core. The
pgsql_fdw_handler function can't be included into core binary because we
must avoid liking libpq with server binary directly. This method is
also used for libwalreceiver of replication module.
* Create pgsql_fdw extension during initdb invocation, like plpgsql.
These are not trivial, but not difficult so much. However, I think
contrib would be the appropriate place for pgsql_fdw because it's
(relatively) special feature.
--
Shigeru Hanada
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
(2011/10/25 19:15), Magnus Hagander wrote:
I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...
Do you mean that pgsql_fdw should be a built-in extension like plpgsql
so that it's available just after initdb?
If that was what he meant, I'd vote against it. There are way too many
people who will *not* want their databases configured to be able to
reach out onto the net. This feature should be something that has to be
installed by explicit user action.
regards, tom lane
2011/10/26 Shigeru Hanada <shigeru.hanada@gmail.com>:
(2011/10/25 19:15), Magnus Hagander wrote:
2011/10/25 Shigeru Hanada<shigeru.hanada@gmail.com>:
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module. I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...Do you mean that pgsql_fdw should be a built-in extension like plpgsql
so that it's available just after initdb? It would be accomplished with
some more changes:* Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and
install dynamically loadable module during "make install" for core. The
pgsql_fdw_handler function can't be included into core binary because we
must avoid liking libpq with server binary directly. This method is
also used for libwalreceiver of replication module.
* Create pgsql_fdw extension during initdb invocation, like plpgsql.These are not trivial, but not difficult so much. However, I think
contrib would be the appropriate place for pgsql_fdw because it's
(relatively) special feature.
I agree. pgsql_fdw will be a nice feature, but there's no reason to
think that everyone will want it installed by default, and there are
some security reasons to think that they might not. On the flip side,
pushing it out of contrib and onto pgfoundry or whatever makes it
unnecessarily difficult to install, and not as many people will
benefit from it. So contrib seems exactly right to me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Stephen Frost <sfrost@snowman.net> writes:
I'm in favor of making that distinction. I would still have pgsql_fdw,
file_fdw, etc, be packaged more-or-less the same way and still use the
CREATE EXTENTION framework, of course.
We called that idea “core extension” at the latest hackers meeting, and
Greg Smith had a patch with a first selections of extensions to package
this way.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
2011/10/26 Robert Haas <robertmhaas@gmail.com>:
2011/10/26 Shigeru Hanada <shigeru.hanada@gmail.com>:
(2011/10/25 19:15), Magnus Hagander wrote:
2011/10/25 Shigeru Hanada<shigeru.hanada@gmail.com>:
I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
contrib module. I think that this module would be the basis of further
SQL/MED development for core, e.g. join-push-down and ANALYZE support.I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...Do you mean that pgsql_fdw should be a built-in extension like plpgsql
so that it's available just after initdb? It would be accomplished with
some more changes:* Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and
install dynamically loadable module during "make install" for core. The
pgsql_fdw_handler function can't be included into core binary because we
must avoid liking libpq with server binary directly. This method is
also used for libwalreceiver of replication module.
* Create pgsql_fdw extension during initdb invocation, like plpgsql.These are not trivial, but not difficult so much. However, I think
contrib would be the appropriate place for pgsql_fdw because it's
(relatively) special feature.I agree. pgsql_fdw will be a nice feature, but there's no reason to
think that everyone will want it installed by default, and there are
some security reasons to think that they might not. On the flip side,
pushing it out of contrib and onto pgfoundry or whatever makes it
unnecessarily difficult to install, and not as many people will
benefit from it. So contrib seems exactly right to me.
I also agree. The pgsql_fdw will be worthful to locate in the main tree
as a contrib module. It will give us clear opportunity to test new
features of FDW using RDBMS characteristics; such as join-push-down.
However, it should be a separated discussion whether it shall be installed
by the default.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Wed, Oct 26, 2011 at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
(2011/10/25 19:15), Magnus Hagander wrote:
I have not looked at the code itself, but I wonder if we shouldn't
consider making this a part of core-proper, not just a contrib module.
The fact that it isn't *already* available in core surprises a lot of
people...Do you mean that pgsql_fdw should be a built-in extension like plpgsql
so that it's available just after initdb?If that was what he meant, I'd vote against it. There are way too many
people who will *not* want their databases configured to be able to
reach out onto the net. This feature should be something that has to be
installed by explicit user action.
That is not what I meant.
I meant installed the shared library by defualt, but still require
CREATE EXTENSION.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 10/26/2011 12:47 PM, Magnus Hagander wrote:
If that was what he meant, I'd vote against it. There are way too many
people who will *not* want their databases configured to be able to
reach out onto the net. This feature should be something that has to be
installed by explicit user action.That is not what I meant.
I meant installed the shared library by defualt, but still require
CREATE EXTENSION.
I don't see why it should be different from other standard modules, such
as citext or hstore, both of which have pretty wide use, and less
possible security implications than this.
cheers
andrew
On Wed, Oct 26, 2011 at 19:25, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/26/2011 12:47 PM, Magnus Hagander wrote:
If that was what he meant, I'd vote against it. There are way too many
people who will *not* want their databases configured to be able to
reach out onto the net. This feature should be something that has to be
installed by explicit user action.That is not what I meant.
I meant installed the shared library by defualt, but still require
CREATE EXTENSION.I don't see why it should be different from other standard modules, such as
citext or hstore, both of which have pretty wide use, and less possible
security implications than this.
As I stated earlier, it's really back to the old discussion of
splitting up contrib. This would be the "additional module" part, but
not the "example of how to do things" part of that...
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
On Wed, Oct 26, 2011 at 16:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
If that was what he meant, I'd vote against it. �There are way too many
people who will *not* want their databases configured to be able to
reach out onto the net. �This feature should be something that has to be
installed by explicit user action.
That is not what I meant.
I meant installed the shared library by defualt, but still require
CREATE EXTENSION.
Whether the shlib is installed by default is a decision for packagers to
make, not us. At best we could make a recommendation.
regards, tom lane
(2011/10/26 23:57), Kohei KaiGai wrote:
2011/10/26 Robert Haas<robertmhaas@gmail.com>:
I agree. pgsql_fdw will be a nice feature, but there's no reason to
think that everyone will want it installed by default, and there are
some security reasons to think that they might not. On the flip side,
pushing it out of contrib and onto pgfoundry or whatever makes it
unnecessarily difficult to install, and not as many people will
benefit from it. So contrib seems exactly right to me.I also agree. The pgsql_fdw will be worthful to locate in the main tree
as a contrib module. It will give us clear opportunity to test new
features of FDW using RDBMS characteristics; such as join-push-down.
However, it should be a separated discussion whether it shall be installed
by the default.
There seems to be some approvals on pushing pgsql_fdw into main tree
(contrib or core extension, or something else), but not an external
module. There are still some debatable issues, but they would be
meaningless unless pgsql_fdw is qualified for a contrib module. So I'd
like to continue the development of pgsql_fdw as contrib module, at
least for a while.
Please find attached a patch for pgsql_fdw. This patch needs first two
patches attached to OP[1]http://archives.postgresql.org/pgsql-hackers/2011-10/msg01329.php to be applied. (Sorry. gathering patches from
another post must be bothersome work. Should I create new CF items for
fundamental patches?)
[1]: http://archives.postgresql.org/pgsql-hackers/2011-10/msg01329.php
Changes done since last post are:
* add colname FDW option support
* allow some libpq options (authtype and tty) to be specified as server
FDW options
--
Shigeru Hanada
* ポルトガル語 - 自動検出
* 英語
* 日本語
* 英語
* 日本語
<javascript:void(0);>
Attachments:
pgsql_fdw_v2.patchtext/plain; name=pgsql_fdw_v2.patchDownload+3040-0
2011/10/25 Shigeru Hanada <shigeru.hanada@gmail.com>:
Connection management
=====================
The pgsql_fdw establishes a new connection when a foreign server is
accessed first for the local session. Established connection is shared
between all foreign scans in the local query, and shared between even
scans in following queries. Connections are discarded when the current
transaction aborts so that unexpected failure won't cause connection
leak. This is implemented with resource owner mechanism.
I have a doubt here, on sharing connection for each server. What if
there are simultaneous scan on the same plan? Say,
-> Nested Loop
-> Foreign Scan to table T1 on server A
-> Foreign Scan to table T2 on server A
Okay, you are thinking about Foreign Join, so example above is too
simple. But it is always possible to execute such a query if foreign
scan nodes are separated far, isn't it? As far as I see from your
explanation, scan T1 and scan T2 share the same connection. Now join
node scans one row from left (T1) while asking rows from right (T2)
without fetching all the rows from left. If T2 requests to server A,
the connection's result (of T1) is discarded. Am I understand
correctly?
Regards,
--
Hitoshi Harada
On Sat, Oct 29, 2011 at 12:25:46AM -0700, Hitoshi Harada wrote:
I have a doubt here, on sharing connection for each server. What if
there are simultaneous scan on the same plan? Say,-> Nested Loop
-> Foreign Scan to table T1 on server A
-> Foreign Scan to table T2 on server AOkay, you are thinking about Foreign Join, so example above is too
simple. But it is always possible to execute such a query if foreign
scan nodes are separated far, isn't it? As far as I see from your
explanation, scan T1 and scan T2 share the same connection. Now join
node scans one row from left (T1) while asking rows from right (T2)
without fetching all the rows from left. If T2 requests to server A,
the connection's result (of T1) is discarded. Am I understand
correctly?
This would need to be factored in in the cost calculations. For remote
servers there is an overhead per tuple transmitted. So in the above
case it might actually be quicker to do the nested loop locally.
To handle the parallel case you might need to materialise in the inner
loop, that would avoid the double scan. Or we could fix the protocol so
you can stream multiple queries at once.
Actually, you can already do this is you use DECLARE CURSOR for all the
queries upfront and then FETCH as needed. That way you can do it all
over one connection.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.
-- Arthur Schopenhauer
Hitoshi Harada <umi.tanuki@gmail.com> writes:
I have a doubt here, on sharing connection for each server. What if
there are simultaneous scan on the same plan? Say,
-> Nested Loop
-> Foreign Scan to table T1 on server A
-> Foreign Scan to table T2 on server A
Okay, you are thinking about Foreign Join, so example above is too
simple. But it is always possible to execute such a query if foreign
scan nodes are separated far, isn't it? As far as I see from your
explanation, scan T1 and scan T2 share the same connection. Now join
node scans one row from left (T1) while asking rows from right (T2)
without fetching all the rows from left. If T2 requests to server A,
the connection's result (of T1) is discarded. Am I understand
correctly?
I have not looked at the code, but ISTM the way that this has to work is
that you set up a portal for each active scan. Then you can fetch a few
rows at a time from any one of them.
If you're doing this through libpq, it'd be necessary to implement each
scan using a cursor. I'm not sure whether it'd be worth our time to
add more functions to libpq to allow more-direct access to the protocol
portal feature.
regards, tom lane