Join push-down support for foreign tables
Hi all,
In 2011 I proposed join push-down support for foreign tables, which
would improve the performance of queries that contain joins between
foreign tables on the same server, but the work was not finished in time.
This performance improvement would widen the range of applications for
foreign tables, so I'd like to tackle the work again.
The descriptions below are based on previous discussions and additional studies.
Background
==========
At the moment FDWs can't handle joins, so every join is processed on the
local side even if the source relations are on the same server. It is
clearly inefficient to fetch all candidate rows from the remote server,
join them locally, and then discard the ones that don't satisfy the join
condition. If an FDW (typically an SQL-based FDW like postgres_fdw) could
take control of the JOIN operation, it could combine the queries for the
source tables into a single join query and avoid transferring unmatched rows.
With this improvement, most joins in common use, especially joins
between large foreign tables with few matching rows, would become
remarkably fast, for the reasons below.
a) less data transfer
Especially for inner joins, the result of a join is usually much smaller
than the source tables. If the original target list doesn't contain the
join keys, the FDW might be able to omit them from the SELECT list of the
remote query, because they are only needed on the remote side.
b) more optimization on remote side
A join query would give the remote data source more optimization
opportunities, such as using indexes.
Changes expected
================
Based on the previous development attempt, at least the following changes seem necessary.
(1) Add server oid field to RelOptInfo
This attribute is set only when the RelOptInfo is a joinrel and all
underlying base relations are foreign tables with the same server
OID. The field is set while joins are considered from the lowest join
level up to the highest (most tables) level, i.e. from the bottom to the
top. If all base relations joined in a query are on the same server, the
top RelOptInfo, which represents the final output, has a valid server OID.
In that case the whole query could be pushed down to the server, giving
the user the most efficient result.
A new helper function GetFdwRoutineByServerId(Oid serverid), which
returns the FdwRoutine for a given server OID, would be handy.
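A minimal sketch of the bottom-up rule, assuming a stripped-down stand-in for RelOptInfo: `RelInfo`, `join_serverid()`, `make_joinrel()` and `is_pushdown_candidate()` are hypothetical names, and for simplicity the base relations carry their foreign table's server OID directly. A join relation gets a valid server OID only when both inputs already share the same valid OID, so a valid OID at the top implies every base relation is on that server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, simplified stand-in for RelOptInfo: only the proposed field. */
typedef struct RelInfo
{
    Oid serverid;   /* valid only if every underlying base rel is a foreign
                     * table on this server; InvalidOid otherwise */
} RelInfo;

/*
 * Server OID of a join relation, derived from its two inputs.  Each input
 * was computed the same way at a lower join level, so a valid result means
 * all base relations below this join share one server.
 */
static Oid
join_serverid(const RelInfo *outer, const RelInfo *inner)
{
    if (outer->serverid != InvalidOid && outer->serverid == inner->serverid)
        return outer->serverid;
    return InvalidOid;
}

/* Build a (hypothetical) join relation, setting serverid bottom-up. */
static RelInfo
make_joinrel(const RelInfo *outer, const RelInfo *inner)
{
    RelInfo joinrel;

    assert(outer != NULL && inner != NULL);
    joinrel.serverid = join_serverid(outer, inner);
    return joinrel;
}

/* True if the whole subtree represented by rel could be pushed down. */
static bool
is_pushdown_candidate(const RelInfo *rel)
{
    return rel->serverid != InvalidOid;
}
```

For example, joining four tables on one server as (a JOIN b) JOIN (c JOIN d) keeps a valid OID at every level, while bringing in a local table or a table on another server resets it to InvalidOid for every join above that point.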
(2) Add new path node for foreign join
A new path node, ForeignJoinPath, which inherits JoinPath like the other
join path nodes, represents a join between ForeignPaths and/or
ForeignJoinPaths. ForeignJoinPath has an fdw_private list to hold
FDW-specific information through the path consideration phase, similar
to the fdw_private field of the ForeignPath node.
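To illustrate what "inherits JoinPath" means in PostgreSQL's C style, here is a sketch with heavily simplified, hypothetical copies of the node structs (the real Path and JoinPath have many more fields, and fdw_private would be a List *). Embedding the parent struct as the first member lets generic planner code treat a ForeignJoinPath as a JoinPath, and lets one ForeignJoinPath serve as an input to another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for planner node types. */
typedef enum NodeTag
{
    T_ForeignPath, T_NestPath, T_MergePath, T_HashPath, T_ForeignJoinPath
} NodeTag;

typedef enum JoinType { JOIN_INNER, JOIN_LEFT, JOIN_FULL, JOIN_RIGHT } JoinType;

typedef struct Path
{
    NodeTag type;
    double  rows;
    double  startup_cost;
    double  total_cost;
} Path;

typedef struct JoinPath
{
    Path     path;            /* "inherits" Path: must be the first member */
    JoinType jointype;
    Path    *outerjoinpath;
    Path    *innerjoinpath;
} JoinPath;

/* Proposed node: "inherits" JoinPath by embedding it as the first member. */
typedef struct ForeignJoinPath
{
    JoinPath jpath;           /* must be first so casts to JoinPath * work */
    void    *fdw_private;     /* FDW-specific data (a List * in PostgreSQL) */
} ForeignJoinPath;

static ForeignJoinPath *
init_foreign_join_path(ForeignJoinPath *fjp, JoinType jointype,
                       Path *outer, Path *inner, void *fdw_private)
{
    assert(fjp != NULL);
    fjp->jpath.path.type = T_ForeignJoinPath;
    fjp->jpath.path.rows = 0;           /* costs would be filled in by the FDW */
    fjp->jpath.path.startup_cost = 0;
    fjp->jpath.path.total_cost = 0;
    fjp->jpath.jointype = jointype;
    fjp->jpath.outerjoinpath = outer;   /* a ForeignPath or ForeignJoinPath */
    fjp->jpath.innerjoinpath = inner;
    fjp->fdw_private = fdw_private;
    return fjp;
}
```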
This node cares only about the type of join, such as INNER JOIN or LEFT
OUTER JOIN, not about how the join is performed. In other words, a
foreign join is not a special case of the existing join nodes (nested
loop, merge join and hash join). An FDW can implement a foreign join in
any way it likes: for instance, file_fdw could keep an already-joined
file for a particular combination of tables, and postgres_fdw can
generate a SELECT query containing a JOIN clause and avoid essentially
unnecessary data transfer.
At the moment I'm not sure whether we should support SEMI/ANTI joins in
the context of foreign joins. They would require postgres_fdw (or other
SQL-based FDWs) to generate queries with subqueries connected by IN/NOT
IN clauses, which seems too ambitious for the first version.
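A sketch of the first-version policy described above, assuming a simplified copy of the planner's join types; `foreign_join_type_supported()` is a hypothetical helper, not an existing function.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical copy of the planner's join kinds. */
typedef enum JoinType
{
    JOIN_INNER, JOIN_LEFT, JOIN_FULL, JOIN_RIGHT, JOIN_SEMI, JOIN_ANTI
} JoinType;

/*
 * Push down only joins that map directly onto a JOIN clause in remote SQL.
 * SEMI/ANTI joins would need IN / NOT IN subqueries, so in this sketch they
 * are left to the local executor.
 */
static bool
foreign_join_type_supported(JoinType jointype)
{
    switch (jointype)
    {
        case JOIN_INNER:
        case JOIN_LEFT:
        case JOIN_RIGHT:
        case JOIN_FULL:
            return true;
        case JOIN_SEMI:
        case JOIN_ANTI:
            return false;
    }
    assert(!"unrecognized join type");
    return false;
}
```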
We (and especially FDW authors) need to note that join push-down is
not the best approach in some situations. In most cases an OUTER JOIN
produces more data on the remote side than current FDWs transfer,
especially a FULL OUTER JOIN or a CROSS JOIN (Cartesian product).
(3) Add new plan node for foreign join
A new plan node, ForeignJoin, inherits Join like the other join plan
nodes. It is similar to other join plan nodes such as NestLoop,
MergeJoin and HashJoin, but it delegates the actual processing to the
FDW associated with the server.
This means that a new plan state node for ForeignJoin, say
ForeignJoinState, is also needed.
(4) Add new FDW API functions
Adding Join push-down support requires some functions to be added to
FdwRoutine to give control to FDWs.
a) GetForeignJoinPaths()
This allows FDWs to provide alternative join paths for a join
RelOptInfo. It is called from add_paths_to_joinrel() after the other
join possibilities have been considered, and the FDW should call
add_path() for each possible foreign join path. Foreign join paths are
built in the same bottom-up manner as existing join paths.
FDWs may add ordered or unordered paths here, but the number of
sort-key combinations can easily explode if the FDW has no information
about efficient patterns such as remote indexes. FDWs should not add too
many paths, to avoid exponential overhead when combining joins.
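A sketch of how an FDW might bound the number of paths it offers, assuming a hypothetical callback shape and a stand-in for add_path() that merely records paths (the real add_path() also discards dominated paths). The FDW offers one unordered path, then ordered paths only for sort keys it believes a remote index can supply, up to a cap; `MAX_FOREIGN_JOIN_PATHS` and the 5% ordering premium are illustrative values, not recommendations.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FOREIGN_JOIN_PATHS 4   /* hypothetical cap per join relation */

typedef struct SketchPath
{
    const char *sortkey;    /* NULL means unordered */
    double      total_cost;
} SketchPath;

typedef struct JoinRelSketch
{
    SketchPath paths[16];
    int        npaths;
} JoinRelSketch;

/* Stand-in for add_path(): unlike the real one, it keeps every path. */
static void
add_path_sketch(JoinRelSketch *joinrel, SketchPath path)
{
    assert(joinrel->npaths < 16);
    joinrel->paths[joinrel->npaths++] = path;
}

/*
 * Hypothetical GetForeignJoinPaths-style callback: one unordered path, then
 * ordered paths only for index-backed sort keys, never exceeding the cap.
 */
static void
get_foreign_join_paths_sketch(JoinRelSketch *joinrel, double base_cost,
                              const char *const *indexed_keys, int nkeys)
{
    int        added = 0;
    SketchPath unordered = {NULL, base_cost};

    add_path_sketch(joinrel, unordered);
    added++;

    for (int i = 0; i < nkeys && added < MAX_FOREIGN_JOIN_PATHS; i++)
    {
        /* assume, arbitrarily, that index-backed ordering adds 5% cost */
        SketchPath ordered = {indexed_keys[i], base_cost * 1.05};

        add_path_sketch(joinrel, ordered);
        added++;
    }
}
```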
b) GetForeignJoinPlan()
This creates a ForeignJoin plan node from a ForeignJoinPath and other
planner information.
c) Executor functions for ForeignJoin plan node
A set of functions for executing the ForeignJoin plan node is also needed.
Begin/ReScan/Iterate/End are the basic operations of a plan node, so we
need to provide them for the ForeignJoin node.
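A sketch of what such executor callbacks could look like, assuming hypothetical names (`ForeignJoinRoutineSketch`, `BeginForeignJoin` and so on) modeled on the existing foreign-scan callbacks; real signatures would take PostgreSQL executor state and return TupleTableSlots. The toy FDW serves rows from an already-joined array, much like the file_fdw example of a pre-joined file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, greatly simplified executor state for a ForeignJoin node. */
typedef struct ForeignJoinStateSketch
{
    void *fdw_state;            /* FDW-private execution state */
} ForeignJoinStateSketch;

typedef struct Row { int left_id; int right_id; } Row;

/* Hypothetical callbacks an FdwRoutine might gain for ForeignJoin nodes. */
typedef struct ForeignJoinRoutineSketch
{
    void (*BeginForeignJoin) (ForeignJoinStateSketch *node);
    bool (*IterateForeignJoin) (ForeignJoinStateSketch *node, Row *out);
    void (*ReScanForeignJoin) (ForeignJoinStateSketch *node);
    void (*EndForeignJoin) (ForeignJoinStateSketch *node);
} ForeignJoinRoutineSketch;

/* Toy FDW: the "remote" side already holds the joined rows. */
static const Row prejoined[] = {{1, 10}, {2, 20}, {3, 30}};
typedef struct ToyState { size_t next; bool open; } ToyState;
static ToyState toy_state;

static void
toy_begin(ForeignJoinStateSketch *node)
{
    toy_state.next = 0;
    toy_state.open = true;      /* e.g. open a connection or a file */
    node->fdw_state = &toy_state;
}

static bool
toy_iterate(ForeignJoinStateSketch *node, Row *out)
{
    ToyState *st = node->fdw_state;

    assert(st != NULL && st->open);
    if (st->next >= sizeof(prejoined) / sizeof(prejoined[0]))
        return false;           /* no more joined tuples */
    *out = prejoined[st->next++];
    return true;
}

static void
toy_rescan(ForeignJoinStateSketch *node)
{
    ToyState *st = node->fdw_state;

    st->next = 0;               /* restart from the first joined tuple */
}

static void
toy_end(ForeignJoinStateSketch *node)
{
    ToyState *st = node->fdw_state;

    st->open = false;           /* release remote resources */
    node->fdw_state = NULL;
}

static const ForeignJoinRoutineSketch toy_routine = {
    toy_begin, toy_iterate, toy_rescan, toy_end
};

/* Executor-side driver: count tuples produced by one full scan. */
static int
count_foreign_join_rows(const ForeignJoinRoutineSketch *r,
                        ForeignJoinStateSketch *node)
{
    Row row;
    int n = 0;

    while (r->IterateForeignJoin(node, &row))
        n++;
    return n;
}
```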
Issues
======
(1) Separate cost estimation phases?
For existing join paths, the planner estimates costs in two phases.
In the first phase, initial_cost_foo() (where foo is one of
nestloop/mergejoin/hashjoin) produces lower-bound estimates used to
eliminate paths early. The second phase, final_cost_foo(), computes cost
and result size only for promising paths that passed
add_path_precheck(). I'm not sure we need to follow this pattern, since
FDWs would be able to estimate the final cost and size with their own methods.
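If FDWs did follow the two-phase pattern, it might look like this sketch, which assumes hypothetical `initial_cost_sketch()` / `final_cost_sketch()` functions with made-up cost formulas. The point is structural: phase 1 uses only local information and returns a lower bound, a simplified precheck discards candidates whose lower bound already loses, and only the survivors pay for the (simulated) remote round-trip in phase 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts simulated round-trips to the remote server (e.g. remote EXPLAIN). */
static int remote_round_trips = 0;

typedef struct JoinCostSketch
{
    double outer_rows;
    double inner_rows;
    double local_selectivity;   /* estimate from local statistics */
} JoinCostSketch;

/* Phase 1: local-only lower bound, here one cost unit per estimated row. */
static double
initial_cost_sketch(const JoinCostSketch *j)
{
    return j->outer_rows * j->inner_rows * j->local_selectivity;
}

/* Phase 2: may ask the remote side; never below the phase-1 bound. */
static double
final_cost_sketch(const JoinCostSketch *j)
{
    remote_round_trips++;                        /* simulated remote call */
    return initial_cost_sketch(j) * 1.5 + 10.0;  /* pretend remote answer */
}

/* Simplified add_path_precheck(): continue only if not already beaten. */
static bool
precheck_sketch(double lower_bound, double cheapest_so_far)
{
    return lower_bound < cheapest_so_far;
}

/* Cost one candidate and return the new cheapest total cost. */
static double
consider_candidate(const JoinCostSketch *j, double cheapest_so_far)
{
    double lower = initial_cost_sketch(j);
    double total;

    assert(lower >= 0);
    if (!precheck_sketch(lower, cheapest_so_far))
        return cheapest_so_far;         /* rejected without a round-trip */
    total = final_cost_sketch(j);
    return total < cheapest_so_far ? total : cheapest_so_far;
}
```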
(2) How to reflect cost of transfer
Transfer cost dominates foreign table operations, including foreign
scans. It would be nice to have some mechanism to reflect the actual
transfer time in the cost estimation. One idea is an FDW option that
represents the cost factor of transfer, say
transfer_cost.
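A sketch of how a per-server transfer_cost option could enter the estimate, assuming a hypothetical cost-per-byte unit and an arbitrary default; neither is an existing postgres_fdw option or value. It compares fetching two large tables for a local join with fetching only the joined result (the cost of the local join itself is ignored here).

```c
#include <assert.h>

/* Hypothetical per-server option default; an arbitrary, uncalibrated value. */
#define DEFAULT_TRANSFER_COST 0.0001   /* cost units per byte transferred */

typedef struct RemoteRelEstimate
{
    double rows;         /* rows the remote side sends back */
    double width;        /* average row width in bytes */
    double remote_cost;  /* cost of executing the query remotely */
} RemoteRelEstimate;

/* Remote execution cost plus the cost of shipping the result back. */
static double
cost_with_transfer(const RemoteRelEstimate *est, double transfer_cost)
{
    assert(transfer_cost >= 0);
    return est->remote_cost + est->rows * est->width * transfer_cost;
}
```

With the illustrative numbers in the test, fetching both large tables costs more in total than running the join remotely and shipping only its small result.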
(3) SELECT-with-Join SQL generation in postgres_fdw
Postgres-XC's query-shipping code would probably help us implement
deparsing of JOIN SQL, but I haven't studied it fully yet; I'll continue
the study.
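To give a feel for the SQL generation, here is a sketch of a recursive deparser over a hypothetical minimal join tree; it is not postgres_fdw's or Postgres-XC's code. The target list contains only the columns the query needs, while join keys appear only in the ON clauses, which is what allows omitting them from the transferred rows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal join tree: either a base table or a join of two. */
typedef struct JoinTree
{
    const char            *relname;   /* non-NULL for a base relation */
    const char            *jointype;  /* e.g. "INNER JOIN", "LEFT JOIN" */
    const char            *onclause;  /* join condition as SQL text */
    const struct JoinTree *left;
    const struct JoinTree *right;
} JoinTree;

/* Fixed-size string buffer; the sketch asserts instead of growing. */
typedef struct StrBuf { char data[512]; size_t len; } StrBuf;

static void
append(StrBuf *sb, const char *s)
{
    size_t n = strlen(s);

    assert(sb->len + n < sizeof(sb->data));
    memcpy(sb->data + sb->len, s, n + 1);   /* copies the terminator too */
    sb->len += n;
}

/* Recursively emit the FROM-clause text for a join tree. */
static void
deparse_from(const JoinTree *t, StrBuf *sb)
{
    if (t->relname != NULL)
    {
        append(sb, t->relname);
        return;
    }
    append(sb, "(");
    deparse_from(t->left, sb);
    append(sb, " ");
    append(sb, t->jointype);
    append(sb, " ");
    deparse_from(t->right, sb);
    append(sb, " ON (");
    append(sb, t->onclause);
    append(sb, "))");
}

/* Build the whole remote query; targetlist holds only needed columns. */
static void
deparse_select(const char *targetlist, const JoinTree *t, StrBuf *sb)
{
    sb->len = 0;
    sb->data[0] = '\0';
    append(sb, "SELECT ");
    append(sb, targetlist);
    append(sb, " FROM ");
    deparse_from(t, sb);
}
```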
(4) criteria for push-down
It is assumed that FDWs can push joins down to the remote side when all
foreign tables are on the same server. IMO a SERVER object represents a
logical data source: for instance, a database for postgres_fdw and other
connection-based FDWs, and a disk volume (or directory?) for file_fdw.
Is this a reasonable assumption?
Perhaps more issues will come up later, but I'd like to get comments
on the design.
(5) Terminology
I used "foreign join" for a process that joins foreign tables on the
*remote* side, but is this intuitive enough? Another idea is "remote
join"; is that more appropriate for this kind of process? I hesitate to
use "remote join" because it implies client-server FDWs, but a foreign
join is not limited to such FDWs; e.g. file_fdw could have an extra file
that already holds the joined contents of the files accessed via foreign tables.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 3, 2014 at 5:16 AM, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
In 2011 I proposed join push-down support for foreign tables, which
would improve performance of queries which contain join between
foreign tables in one server, but it has not finished before time-up.
This performance improvement would widen application range of foreign
tables, so I'd like to tackle the work again.
The descriptions below are based on previous discussions and additional studies.
Hanada-san, it is fantastic to see you working on this again.
I think your proposal sounds promising and it is along the lines of
what I have considered in the past.
(1) Separate cost estimation phases?
For existing join paths, planner estimates their costs in two phases.
In the first phase initial_cost_foo(), here foo is one of
nestloop/mergejoin/hashjoin, produces lower-bound estimates for
elimination. The second phase is done for only promising paths which
passed add_path_precheck(), by final_cost_foo() for cost and result
size. I'm not sure that we need to follow this manner, since FDWs
would be able to estimate final cost/size with their own methods.
The main problem I see here is that accurate costing may require a
round-trip to the remote server. If there is only one path that is
probably OK; the cost of asking the question will usually be more than
paid for by hearing that the pushed-down join clobbers the other
possible methods of executing the query. But if there are many paths,
for example because there are multiple sets of useful pathkeys, it
might start to get a bit expensive.
Probably both the initial cost and final cost calculations should be
delegated to the FDW, but maybe within postgres_fdw, the initial cost
should do only the work that can be done without contacting the remote
server; then, let the final cost step do that if appropriate. But I'm
not entirely sure what is best here.
(2) How to reflect cost of transfer
Cost of transfer is dominant in foreign table operations, including
foreign scans. It would be nice to have some mechanism to reflect
actual time of transfer to the cost estimation. An idea is to have a
FDW option which represents cost factor of transfer, say
transfer_cost.
That would be reasonable. I assume users would normally wish to
specify this per-server, and the default should be something
reasonable for a LAN.
(4) criteria for push-down
It is assumed that FDWs can push joins down to remote when all foreign
tables are in same server. IMO a SERVER objects represents a logical
data source. For instance database for postgres_fdw and other
connection-based FDWs, and disk volumes (or directory?) for file_fdw.
Is this reasonable assumption?
I think it's probably good to give an FDW the option of producing a
ForeignJoinPath for any join against a ForeignPath *or
ForeignJoinPath* for the same FDW. It's perhaps unlikely that an FDW
can perform a join efficiently between two data sources with different
server definitions, but why not give it the option? It should be
pretty fast for the FDW to realize, oh, the server OIDs don't match -
and at that point it can exit without doing anything further if that
seems desirable. And there might be some kinds of data sources where
cross-server joins actually can be executed quickly (e.g. when the
underlying data is just in two files in different places on the local
machine).
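A sketch of the cheap early exit Robert describes, assuming hypothetical per-input FDW and server OIDs; `fdw_wants_join()` and the `supports_cross_server` flag are illustrative names, not an existing API. The FDW first checks that both inputs are its own, then accepts same-server joins and declines cross-server ones unless it knows how to handle them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Hypothetical description of one input of a candidate join. */
typedef struct JoinInputSketch
{
    Oid fdwid;      /* foreign-data wrapper that owns the input */
    Oid serverid;   /* foreign server of the input */
} JoinInputSketch;

/* Hypothetical first step of an FDW's join-path callback. */
static bool
fdw_wants_join(const JoinInputSketch *outer, const JoinInputSketch *inner,
               Oid my_fdwid, bool supports_cross_server)
{
    assert(outer != NULL && inner != NULL);
    if (outer->fdwid != my_fdwid || inner->fdwid != my_fdwid)
        return false;                /* not our tables at all */
    if (outer->serverid == inner->serverid)
        return true;                 /* the common, easy case */
    return supports_cross_server;    /* OIDs differ: exit early unless this
                                      * FDW can join across servers */
}
```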
(5) Terminology
I used "foreign join" as a process which joins foreign tables on
*remote* side, but is this enough intuitive? Another idea is using
"remote join", is this more appropriate for this kind of process? I
hesitate to use "remote join" because it implies client-server FDWs,
but foreign join is not limited to such FDWs, e.g. file_fdw can have
extra file which is already joined files accessed via foreign tables.
Foreign join is perfect.
As I alluded to above, it's pretty important to make sure that this
works with large join trees; that is, if I join four foreign tables, I
don't want it to push down a join between two of the tables and a join
between the other two tables and then join the results of those joins
locally. Instead, I want to push the entire join tree to the foreign
server and execute the whole thing there. Some care may be needed in
designing the hooks to make sure this works as desired.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
The main problem I see here is that accurate costing may require a
round-trip to the remote server. If there is only one path that is
probably OK; the cost of asking the question will usually be more than
paid for by hearing that the pushed-down join clobbers the other
possible methods of executing the query. But if there are many paths,
for example because there are multiple sets of useful pathkeys, it
might start to get a bit expensive.
Probably both the initial cost and final cost calculations should be
delegated to the FDW, but maybe within postgres_fdw, the initial cost
should do only the work that can be done without contacting the remote
server; then, let the final cost step do that if appropriate. But I'm
not entirely sure what is best here.
I am thinking eventually we will need to cache the foreign server
statistics on the local server.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On Thursday, September 4, 2014, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Sep 4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
The main problem I see here is that accurate costing may require a
round-trip to the remote server. If there is only one path that is
probably OK; the cost of asking the question will usually be more than
paid for by hearing that the pushed-down join clobbers the other
possible methods of executing the query. But if there are many paths,
for example because there are multiple sets of useful pathkeys, it
might start to get a bit expensive.
Probably both the initial cost and final cost calculations should be
delegated to the FDW, but maybe within postgres_fdw, the initial cost
should do only the work that can be done without contacting the remote
server; then, let the final cost step do that if appropriate. But I'm
not entirely sure what is best here.
I am thinking eventually we will need to cache the foreign server
statistics on the local server.
Wouldn't that lead to issues where the statistics get outdated and we have
to query the foreign server before planning any joins anyway? Or are you
thinking of dropping the foreign table statistics once the foreign join is
complete?
--
Regards,
Atri
*l'apprenant*
On Thu, Sep 4, 2014 at 08:41:43PM +0530, Atri Sharma wrote:
On Thursday, September 4, 2014, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Sep 4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
The main problem I see here is that accurate costing may require a
round-trip to the remote server. If there is only one path that is
probably OK; the cost of asking the question will usually be more than
paid for by hearing that the pushed-down join clobbers the other
possible methods of executing the query. But if there are many paths,
for example because there are multiple sets of useful pathkeys, it
might start to get a bit expensive.
Probably both the initial cost and final cost calculations should be
delegated to the FDW, but maybe within postgres_fdw, the initial cost
should do only the work that can be done without contacting the remote
server; then, let the final cost step do that if appropriate. But I'm
not entirely sure what is best here.
I am thinking eventually we will need to cache the foreign server
statistics on the local server.
Wouldn't that lead to issues where the statistics get outdated and we have to
anyways query the foreign server before planning any joins? Or are you thinking
of dropping the foreign table statistics once the foreign join is complete?
I am thinking we would eventually have to cache the statistics, then get
some kind of invalidation message from the foreign server. I am also
thinking that cache would have to be global across all backends, I guess
similar to our invalidation cache.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On Thu, Sep 4, 2014 at 9:26 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Sep 4, 2014 at 08:41:43PM +0530, Atri Sharma wrote:
On Thursday, September 4, 2014, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Sep 4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
The main problem I see here is that accurate costing may require a
round-trip to the remote server. If there is only one path that is
probably OK; the cost of asking the question will usually be more than
paid for by hearing that the pushed-down join clobbers the other
possible methods of executing the query. But if there are many paths,
for example because there are multiple sets of useful pathkeys, it
might start to get a bit expensive.
Probably both the initial cost and final cost calculations should be
delegated to the FDW, but maybe within postgres_fdw, the initial cost
should do only the work that can be done without contacting the remote
server; then, let the final cost step do that if appropriate. But I'm
not entirely sure what is best here.
I am thinking eventually we will need to cache the foreign server
statistics on the local server.
Wouldn't that lead to issues where the statistics get outdated and we have to
anyways query the foreign server before planning any joins? Or are you thinking
of dropping the foreign table statistics once the foreign join is complete?
I am thinking we would eventually have to cache the statistics, then get
some kind of invalidation message from the foreign server. I am also
thinking that cache would have to be global across all backends, I guess
similar to our invalidation cache.
That could lead to some bloat in storing statistics, since we may have a lot
of tables on a lot of foreign servers. Also, will we have VACUUM take care
of ANALYZE-ing the foreign tables?
Also, how will we decide that the statistics are invalid? Will we have the
FDW query the foreign server and compare the statistics the foreign server
has with the statistics we have locally? I am trying to understand how the
idea of an invalidation message from the foreign server would work.
Regards,
Atri
On Thu, Sep 4, 2014 at 09:31:20PM +0530, Atri Sharma wrote:
I am thinking we would eventually have to cache the statistics, then get
some kind of invalidation message from the foreign server. I am also
thinking that cache would have to be global across all backends, I guess
similar to our invalidation cache.
That could lead to some bloat in storing statistics since we may have a lot of
tables for a lot of foreign servers. Also, will we have VACUUM look at
ANALYZING the foreign tables?
Also, how will we decide that the statistics are invalid? Will we have the FDW
query the foreign server and do some sort of comparison between the statistics
the foreign server has and the statistics we locally have? I am trying to
understand how the idea of invalidation message from foreign server will work.
Well, ANALYZE runs on the foreign server, and it would be nice if it could
somehow send us a message about its new statistics; or we could do what
HTTP does and have it give us a last-refresh date for its statistics when
we connect.
I am not sure how it will work --- I just suspect that we might
get to a point where the statistics lookup overhead on the foreign
server becomes a bottleneck.
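A sketch of the HTTP-style revalidation idea, in the spirit of If-Modified-Since, assuming the remote server could report the time of its last ANALYZE for a table when we connect (no such protocol exists today). `CachedRemoteStats` and its functions are hypothetical, and the shared, cross-backend part of the cache is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef unsigned int Oid;

/* Hypothetical cached copy of a remote table's statistics. */
typedef struct CachedRemoteStats
{
    Oid    foreigntableid;
    time_t remote_analyzed_at;  /* remote "last ANALYZE" time when fetched */
    double reltuples;           /* one example statistic */
    bool   valid;
} CachedRemoteStats;

/* Fresh only if the remote's reported last-ANALYZE time is unchanged. */
static bool
stats_cache_is_fresh(const CachedRemoteStats *cache, time_t remote_last_analyze)
{
    return cache->valid && cache->remote_analyzed_at == remote_last_analyze;
}

/* Store newly fetched statistics along with their remote timestamp. */
static void
stats_cache_store(CachedRemoteStats *cache, Oid relid, time_t analyzed_at,
                  double reltuples)
{
    assert(cache != NULL);
    cache->foreigntableid = relid;
    cache->remote_analyzed_at = analyzed_at;
    cache->reltuples = reltuples;
    cache->valid = true;
}

/* Invalidate, e.g. on a message saying the remote statistics changed. */
static void
stats_cache_invalidate(CachedRemoteStats *cache)
{
    cache->valid = false;
}
```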
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On Thu, Sep 4, 2014 at 9:33 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Sep 4, 2014 at 09:31:20PM +0530, Atri Sharma wrote:
I am thinking we would eventually have to cache the statistics, then get
some kind of invalidation message from the foreign server. I am also
thinking that cache would have to be global across all backends, I guess
similar to our invalidation cache.
That could lead to some bloat in storing statistics since we may have a lot of
tables for a lot of foreign servers. Also, will we have VACUUM look at
ANALYZING the foreign tables?
Also, how will we decide that the statistics are invalid? Will we have the FDW
query the foreign server and do some sort of comparison between the statistics
the foreign server has and the statistics we locally have? I am trying to
understand how the idea of invalidation message from foreign server will work.
Well, ANALYZING is running on the foreign server, and somehow it would
be nice if it would send a message to us about its new statistics, or we
can do it like http does and it gives us a last-refresh statistics date
when we connect.
Not sure how that would work without changing the way ANALYZE works on the
foreign server. The HTTP idea could work, though.
I am not sure how it will work --- I am just suspecting that we might
get to a point where the statistics lookup overhead on the foreign
server might become a bottleneck.
Totally agree, but doing the planning only locally raises the questions I
mentioned above, and also deprives the foreign database of the chance to do
any optimizations it may want to do (assuming that the foreign database and
the PostgreSQL query planner do not generate identical plans). This is only
my thought, though; we could also be planning better than the foreign
database, so the optimization point I raised is debatable.
--
Regards,
Atri
*l'apprenant*
On Thu, Sep 4, 2014 at 11:56 AM, Bruce Momjian <bruce@momjian.us> wrote:
I am thinking eventually we will need to cache the foreign server
statistics on the local server.
Wouldn't that lead to issues where the statistics get outdated and we have to
anyways query the foreign server before planning any joins? Or are you thinking
of dropping the foreign table statistics once the foreign join is complete?
I am thinking we would eventually have to cache the statistics, then get
some kind of invalidation message from the foreign server. I am also
thinking that cache would have to be global across all backends, I guess
similar to our invalidation cache.
Maybe ... but I think this isn't really related to the ostensible
topic of this thread. We can do join pushdown just fine without the
ability to do anything like this.
I'm in full agreement that we should probably have a way to cache some
kind of statistics locally, but making that work figures to be tricky,
because (as I'm pretty sure Tom has pointed out before) there's no
guarantee that the remote side's statistics look anything like
PostgreSQL statistics, and so we might not be able to easily store
them or make sense of them. But it would be nice to at least have the
option to store such statistics if they do happen to be something we
can store and interpret.
It's also coming to seem to me more and more that we need a way to
designate several PostgreSQL machines as a cooperating cluster. This
would mean they'd keep connections to each other open and notify each
other about significant events, which could include "hey, I updated
the statistics on this table, you might want to get the new ones" or
"hey, I've replicated your definition for function X so it's safe to
push it down now" as well as "hey, I have just been promoted to be the
new master" or even automatic negotiation of which of a group of
machines should become the master after a server failure. So far,
we've taken the approach that postgres_fdw is just another FDW which
enjoys no special privileges, and I think that's a good approach on
the whole, but I think if we want to create a relatively seamless
multi-node experience as some of the NoSQL databases do, we're going
to need something more than that.
But all of that is a bit pie in the sky, and the join pushdown
improvements we're talking about here don't necessitate any of it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Sep 5, 2014 at 2:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 4, 2014 at 11:56 AM, Bruce Momjian <bruce@momjian.us> wrote:
I am thinking eventually we will need to cache the foreign server
statistics on the local server.
Wouldn't that lead to issues where the statistics get outdated and we have to
anyways query the foreign server before planning any joins? Or are you thinking
of dropping the foreign table statistics once the foreign join is complete?
I am thinking we would eventually have to cache the statistics, then get
some kind of invalidation message from the foreign server. I am also
thinking that cache would have to be global across all backends, I guess
similar to our invalidation cache.
Maybe ... but I think this isn't really related to the ostensible
topic of this thread. We can do join pushdown just fine without the
ability to do anything like this.
I'm in full agreement that we should probably have a way to cache some
kind of statistics locally, but making that work figures to be tricky,
because (as I'm pretty sure Tom has pointed out before) there's no
guarantee that the remote side's statistics look anything like
PostgreSQL statistics, and so we might not be able to easily store
them or make sense of them. But it would be nice to at least have the
option to store such statistics if they do happen to be something we
can store and interpret.
I agree that we need local statistics too (full agreement with Bruce's
proposal), but I'm playing devil's advocate here, trying to figure out how
things like invalidation and, as you mentioned, cross-compatibility will
work.
It's also coming to seem to me more and more that we need a way to
designate several PostgreSQL machines as a cooperating cluster. This
would mean they'd keep connections to each other open and notify each
other about significant events, which could include "hey, I updated
the statistics on this table, you might want to get the new ones" or
"hey, I've replicated your definition for function X so it's safe to
push it down now" as well as "hey, I have just been promoted to be the
new master" or even automatic negotiation of which of a group of
machines should become the master after a server failure.
That's a brilliant idea, and it shouldn't be too much of a problem. One
possible race condition is that multiple backends may try to globally
propagate different statistics for the same table, but I think any
standard logical ordering algorithm should handle that. The automatic
master promotion also seems like a brilliant idea, and it helps that we
have time-tested standard algorithms for it.
One thing I would like to see, assuming the interacting nodes do not have
identical schemas, is some way to maintain cross-node statistics and use
them for planning cross-node joins. That would lead to problems similar to
the ones already noted for keeping local statistics for foreign databases,
but if we solve those anyway for storing local statistics, we could
potentially look at keeping cross-node relation statistics as well.
--
Regards,
Atri
*l'apprenant*
(2014/09/04 21:37), Robert Haas wrote:
On Wed, Sep 3, 2014 at 5:16 AM, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
(1) Separate cost estimation phases?
For existing join paths, planner estimates their costs in two phases.
In the first phase initial_cost_foo(), here foo is one of
nestloop/mergejoin/hashjoin, produces lower-bound estimates for
elimination. The second phase is done for only promising paths which
passed add_path_precheck(), by final_cost_foo() for cost and result
size. I'm not sure that we need to follow this manner, since FDWs
would be able to estimate final cost/size with their own methods.
The main problem I see here is that accurate costing may require a
round-trip to the remote server. If there is only one path that is
probably OK; the cost of asking the question will usually be more than
paid for by hearing that the pushed-down join clobbers the other
possible methods of executing the query. But if there are many paths,
for example because there are multiple sets of useful pathkeys, it
might start to get a bit expensive.
I agree that requiring a round-trip per path is unacceptable, so the main
source for plan costing should be local statistics gathered by the ANALYZE
command. If an FDW needs extra information for planning, it should get it
from FDW options or its private catalogs to avoid unwanted round-trips.
FDWs like postgres_fdw would want to optimize plans by providing paths
with pathkeys (this not only makes use of remote indexes but also allows
a MergeJoin at an upper level).
I noticed that the order in which joins are considered is an issue too.
The planner compares the currently cheapest path with each newly
generated path, and a foreign join path would usually be the cheapest, so
considering the foreign join first would reduce planner overhead.
Probably both the initial cost and final cost calculations should be
delegated to the FDW, but maybe within postgres_fdw, the initial cost
should do only the work that can be done without contacting the remote
server; then, let the final cost step do that if appropriate. But I'm
not entirely sure what is best here.
Agreed. I'll design the planner API along those lines for now.
(2) How to reflect cost of transfer
Cost of transfer is dominant in foreign table operations, including
foreign scans. It would be nice to have some mechanism to reflect
actual time of transfer to the cost estimation. An idea is to have a
FDW option which represents cost factor of transfer, say
transfer_cost.
That would be reasonable. I assume users would normally wish to
specify this per-server, and the default should be something
reasonable for a LAN.
This enhancement could be applied separately from the foreign join patch.
(4) criteria for push-down
It is assumed that FDWs can push joins down to remote when all foreign
tables are in same server. IMO a SERVER objects represents a logical
data source. For instance database for postgres_fdw and other
connection-based FDWs, and disk volumes (or directory?) for file_fdw.
Is this reasonable assumption?
I think it's probably good to give an FDW the option of producing a
ForeignJoinPath for any join against a ForeignPath *or
ForeignJoinPath* for the same FDW. It's perhaps unlikely that an FDW
can perform a join efficiently between two data sources with different
server definitions, but why not give it the option? It should be
pretty fast for the FDW to realize, oh, the server OIDs don't match -
and at that point it can exit without doing anything further if that
seems desirable. And there might be some kinds of data sources where
cross-server joins actually can be executed quickly (e.g. when the
underlying data is just in two files in different places on the local
machine).
Indeed, how to divide data among servers is left to users or FDW authors,
though postgres_fdw and most other FDWs can join foreign tables within one
server. I think it would be good if we could tell from the FdwRoutine
whether two foreign tables are managed by the same FDW, perhaps by adding
a new API that returns an FDW identifier?
(5) Terminology
I used "foreign join" as a process which joins foreign tables on
*remote* side, but is this enough intuitive? Another idea is using
"remote join", is this more appropriate for this kind of process? I
hesitate to use "remote join" because it implies client-server FDWs,
but foreign join is not limited to such FDWs, e.g. file_fdw can have
extra file which is already joined files accessed via foreign tables.
Foreign join is perfect.
As I alluded to above, it's pretty important to make sure that this
works with large join trees; that is, if I join four foreign tables, I
don't want it to push down a join between two of the tables and a join
between the other two tables and then join the results of those joins
locally. Instead, I want to push the entire join tree to the foreign
server and execute the whole thing there. Some care may be needed in
designing the hooks to make sure this works as desired.
I think so too, so a ForeignJoinPath should be able to be an input of
another ForeignJoinPath at an upper join level. But I also think whether
to join on the remote side should be decided based on cost, since
existing joins are planned with a bottom-up approach.
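To illustrate how push-down eligibility can flow up a large join tree, here is a sketch that models only the server OID. `Rel`, `make_join_rel()` and `pushdown_candidate()` are hypothetical simplifications; the propagation rule mirrors the one in the attached WIP `build_join_rel()` change (a join rel keeps the server OID only if both inputs share it). It shows eligibility only: whether the pushed-down path actually wins is still left to ordinary cost comparison.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, simplified relation: a base rel or a join of two rels. */
typedef struct Rel
{
    Oid serverid;
} Rel;

/*
 * Mirrors the WIP build_join_rel() rule: the join rel inherits the server
 * OID only when both inputs carry the same one.
 */
static Rel
make_join_rel(Rel outer, Rel inner)
{
    Rel joinrel;

    joinrel.serverid = (outer.serverid == inner.serverid)
        ? outer.serverid : InvalidOid;
    return joinrel;
}

/* A join rel is a push-down candidate iff it kept a valid server OID. */
static int
pushdown_candidate(Rel rel)
{
    return rel.serverid != InvalidOid;
}
```

With four tables on one server, the {t1,t2} and {t3,t4} join rels both stay eligible, so a ForeignJoinPath for the whole tree can be built on top of them instead of joining two remote results locally; adding a table from another server makes the top join ineligible.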
Regards,
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(2014/09/05 0:56), Bruce Momjian wrote:
On Thu, Sep 4, 2014 at 08:41:43PM +0530, Atri Sharma wrote:
On Thursday, September 4, 2014, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Sep 4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
The main problem I see here is that accurate costing may require a
round-trip to the remote server. If there is only one path that is
probably OK; the cost of asking the question will usually be more than
paid for by hearing that the pushed-down join clobbers the other
possible methods of executing the query. But if there are many paths,
for example because there are multiple sets of useful pathkeys, it
might start to get a bit expensive.
Probably both the initial cost and final cost calculations should be
delegated to the FDW, but maybe within postgres_fdw, the initial cost
should do only the work that can be done without contacting the remote
server; then, let the final cost step do that if appropriate. But I'm
not entirely sure what is best here.
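The split Robert describes can be sketched in the spirit of the core planner's `initial_cost_*()` / `add_path_precheck()` / `final_cost_*()` division. Everything below is hypothetical: the remote round-trip is simulated by a counter, and pruning is only safe under the assumption that the locally computed initial cost is a lower bound on the final cost.

```c
#include <assert.h>

/* Hypothetical counter standing in for network round-trips. */
static int remote_roundtrips = 0;

typedef struct JoinCandidate
{
    double local_estimate;  /* lower bound computed from local stats only */
    double remote_cost;     /* what a remote EXPLAIN would report */
} JoinCandidate;

/* Phase 1: cheap, never contacts the remote server. */
static double
initial_cost(const JoinCandidate *c)
{
    return c->local_estimate;
}

/* Phase 2: may contact the remote server (simulated here). */
static double
final_cost(const JoinCandidate *c)
{
    remote_roundtrips++;
    return c->remote_cost;
}

/*
 * Return the cheapest total cost among n candidates, skipping the
 * expensive final step whenever the initial lower bound already loses.
 */
static double
cheapest_cost(const JoinCandidate *cands, int n)
{
    double best = -1.0;

    for (int i = 0; i < n; i++)
    {
        if (best >= 0.0 && initial_cost(&cands[i]) >= best)
            continue;           /* pruned without a round-trip */
        double cost = final_cost(&cands[i]);

        if (best < 0.0 || cost < best)
            best = cost;
    }
    return best;
}
```

Only candidates whose cheap estimate could still beat the best path found so far pay for a round-trip, which is what keeps many pathkey variants from each costing a remote query.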
I am thinking eventually we will need to cache the foreign server
statistics on the local server.
Wouldn't that lead to issues where the statistics get outdated and we
have to query the foreign server anyway before planning any joins? Or
are you thinking of dropping the foreign table statistics once the
foreign join is complete?
I am thinking we would eventually have to cache the statistics, then get
some kind of invalidation message from the foreign server. I am also
thinking that cache would have to be global across all backends, I guess
similar to our invalidation cache.
If an FDW needs more information than pg_statistic and pg_class
provide, yes, it should cache some statistics on the local side. But
such statistics would have an FDW-specific shape, so it would be hard
to provide an API to manage them. An FDW can have its own functions
and tables to manage its own statistics, and it can even have a
background worker for messaging. But that would be another story.
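A backend-local sketch of such an FDW-private statistics cache with invalidation, under clearly stated assumptions: the remote fetch is simulated, and `remote_generation` stands in for whatever invalidation message the remote side might send; this is not an existing postgres_fdw facility. Bruce's idea of a cache shared by all backends would need shared memory, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical FDW-private statistics entry for one remote table. */
typedef struct RemoteStatsEntry
{
    Oid      relid;
    double   reltuples;
    uint64_t generation;    /* remote generation when this was fetched */
    bool     valid;
} RemoteStatsEntry;

#define STATS_CACHE_SIZE 16

static RemoteStatsEntry stats_cache[STATS_CACHE_SIZE];

/* Bumped when an (assumed) invalidation message arrives from the remote. */
static uint64_t remote_generation = 1;

static int fetches = 0;     /* counts simulated remote fetches */

/* Simulated remote lookup; a real FDW would query the remote catalogs. */
static double
fetch_remote_reltuples(Oid relid)
{
    fetches++;
    return 1000.0 * relid;  /* placeholder value */
}

/* Return cached stats, refetching if missing, evicted or invalidated. */
static double
get_reltuples(Oid relid)
{
    RemoteStatsEntry *e = &stats_cache[relid % STATS_CACHE_SIZE];

    if (!e->valid || e->relid != relid || e->generation != remote_generation)
    {
        e->relid = relid;
        e->reltuples = fetch_remote_reltuples(relid);
        e->generation = remote_generation;
        e->valid = true;
    }
    return e->reltuples;
}

/* Called on invalidation: every cached entry becomes stale at once. */
static void
invalidate_remote_stats(void)
{
    remote_generation++;
}
```

A generation counter makes invalidation O(1): stale entries are not cleared eagerly but are simply refetched on their next use.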
Regards,
--
Shigeru HANADA
On Sun, Sep 7, 2014 at 7:07 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
I think it's probably good to give an FDW the option of producing a
ForeignJoinPath for any join against a ForeignPath *or
ForeignJoinPath* for the same FDW. It's perhaps unlikely that an FDW
can perform a join efficiently between two data sources with different
server definitions, but why not give it the option? It should be
pretty fast for the FDW to realize, oh, the server OIDs don't match -
and at that point it can exit without doing anything further if that
seems desirable. And there might be some kinds of data sources where
cross-server joins actually can be executed quickly (e.g. when the
underlying data is just in two files in different places on the local
machine).
Indeed, how to separate servers is left to users or FDW authors,
though postgres_fdw and most other FDWs can join foreign tables within
a single server. I think it would be good if we could tell from
FdwRoutine whether two foreign tables are managed by the same FDW,
perhaps by adding a new API which returns an FDW identifier?
Do we need this? I mean, if you get the FdwRoutine, don't you have
the OID of the FDW or the foreign table also?
I think so too, so a ForeignJoinPath should be able to be an input of
another ForeignJoinPath at an upper join level. But I also think whether
to join on the remote side should be decided based on cost, since
existing joins are planned with a bottom-up approach.
Definitely.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2014-09-08 8:07 GMT+09:00 Shigeru HANADA <shigeru.hanada@gmail.com>:
(2014/09/04 21:37), Robert Haas wrote:
On Wed, Sep 3, 2014 at 5:16 AM,
Probably both the initial cost and final cost calculations should be
delegated to the FDW, but maybe within postgres_fdw, the initial cost
should do only the work that can be done without contacting the remote
server; then, let the final cost step do that if appropriate. But I'm
not entirely sure what is best here.
Agreed. I'll design the planner API along those lines for now.
I tried several implementation patterns but haven't found a feasible
approach yet, so I'd like to hear hackers' ideas.
* Foreign join hook point
At first I thought that following the existing cost estimation approach
was the way to go, but I now tend to think it doesn't fit foreign joins:
local join methods are tightly coupled to sort order, whereas a foreign
join would not be.
In the current planner, add_paths_to_joinrel is aware of sort order,
and the functions it calls directly are each specific to one join
method. But this doesn't seem to fit the abstraction level of an FDW.
An FDW is highly abstracted, unlike custom plan providers, so most of
the work should be delegated to the FDW, including pathkeys
consideration, IMO.
Besides that, the order of join consideration is another issue. At
first I tried adding foreign join consideration last (after hash join
consideration), but after some thought I noticed that early elimination
would work better if we try the foreign join first, because in most
cases a foreign join is the cheapest way to accomplish a join between
two foreign relations.
So currently I'm thinking that delegating the whole join consideration
to the FDW, by calling a new FDW API in add_paths_to_joinrel before any
other join consideration, would be promising.
This means that FDWs can add multiple arbitrary paths to the RelOptInfo
in a single call. Of course this allows FDWs to make a round-trip per
path, but that would be an optimization issue, and they can first
compare among their own candidates using information they can obtain
without a round-trip.
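The early-elimination argument can be checked with a toy, one-dimensional model of `add_path_precheck()`/`add_path()`. The model is hypothetical: it compares total cost only, whereas the real functions also weigh startup cost, pathkeys and parameterization, and `PathList`, `precheck()` and `consider()` are invented names.

```c
#include <assert.h>

/* Hypothetical one-dimensional model of a joinrel's pathlist. */
typedef struct PathList
{
    double cheapest;   /* cheapest total cost kept so far, < 0 if none */
    int    built;      /* how many paths were fully constructed */
} PathList;

/* Precheck: is a path whose cost is at least lower_bound worth building? */
static int
precheck(const PathList *pl, double lower_bound)
{
    return pl->cheapest < 0.0 || lower_bound < pl->cheapest;
}

/* Consider one candidate: build it only if the precheck passes. */
static void
consider(PathList *pl, double lower_bound, double total)
{
    if (!precheck(pl, lower_bound))
        return;        /* rejected early, never built */
    pl->built++;
    if (pl->cheapest < 0.0 || total < pl->cheapest)
        pl->cheapest = total;
}
```

For example, with made-up costs, if a foreign join costing 10 is considered first, mergejoin and hashjoin candidates whose lower bounds are 40 and 25 are rejected before being built; if the foreign join is considered last, all three paths get built before the same winner emerges.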
* Supported join types
INNER and (LEFT|RIGHT|FULL) OUTER joins would be safe to push down,
even though some OUTER JOINs might not be much faster than a local
join. I'm not sure that SEMI and ANTI joins are safe to push down. Can
we leave the matter to FDWs, or should we forbid FDWs from pushing them
down by not calling the foreign join API? Anyway, SEMI/ANTI would not
be supported in the first version.
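If core, rather than each FDW, were to filter join types before calling the foreign join API, the first-version policy above might look like the following. `JoinKind` is a standalone copy of the relevant members of PostgreSQL's `JoinType` enum, and `join_type_pushable()` is a hypothetical helper that is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Standalone copy of the JoinType members discussed in the mail. */
typedef enum
{
    JOIN_INNER,
    JOIN_LEFT,
    JOIN_FULL,
    JOIN_RIGHT,
    JOIN_SEMI,
    JOIN_ANTI
} JoinKind;

/*
 * Hypothetical first-version policy: push down inner and outer joins,
 * and leave SEMI and ANTI joins to the local executor.
 */
static bool
join_type_pushable(JoinKind jt)
{
    switch (jt)
    {
        case JOIN_INNER:
        case JOIN_LEFT:
        case JOIN_RIGHT:
        case JOIN_FULL:
            return true;
        case JOIN_SEMI:
        case JOIN_ANTI:
        default:
            return false;
    }
}
```

Keeping the check in core would make the restriction uniform across FDWs; leaving it to each FDW would let a capable FDW opt in to SEMI/ANTI later without a core change.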
* Blockers of join push-down
Pushing down a join means that the foreign scans for the inner and
outer relations are skipped, so some elements block pushing down.
Basically the criteria are the same as for scan push-down and update
push-down.
After some thought, I think we need to check only for unsafe
expressions in the join qual and the restriction qual. This limitation
is necessary to avoid differences between results with and without
push-down. The target list seems to contain only Vars for the
necessary columns, but we should check that too.
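A sketch of the qual check described above, over a hypothetical miniature expression tree (`Expr`, `is_shippable()` and `quals_shippable()` are invented for illustration; postgres_fdw's existing scan-level check, `is_foreign_expr()` in deparse.c, walks real planner nodes). The rule assumed here: Vars and Consts are always safe, while operators and functions are safe only if built-in and immutable, so the remote server computes the same result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature expression tree. */
typedef enum { EXPR_VAR, EXPR_CONST, EXPR_OP, EXPR_FUNC } ExprKind;

typedef struct Expr
{
    ExprKind kind;
    bool builtin_immutable;     /* meaningful for EXPR_OP / EXPR_FUNC */
    const struct Expr *left;    /* operands; NULL when absent */
    const struct Expr *right;
} Expr;

/* Every node must be safe for the whole expression to be shippable. */
static bool
is_shippable(const Expr *e)
{
    if (e == NULL)
        return true;
    switch (e->kind)
    {
        case EXPR_VAR:
        case EXPR_CONST:
            return true;
        case EXPR_OP:
        case EXPR_FUNC:
            return e->builtin_immutable &&
                is_shippable(e->left) && is_shippable(e->right);
    }
    return false;
}

/* The join is blocked if any join or restriction qual is unsafe. */
static bool
quals_shippable(const Expr *const *quals, int n)
{
    for (int i = 0; i < n; i++)
        if (!is_shippable(quals[i]))
            return false;
    return true;
}
```

For instance, `a = b` with a built-in immutable `=` passes, while `a < random()` is rejected because the volatile function could yield different results on the remote side.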
* WIP patch
Attached is a WIP patch for reviewing the design. The remaining work is
1) deciding whether to push down or not, and 2) generating the join SQL.
For 2), I'm thinking about referring to Postgres-XC's join shipping
mechanism.
Any comments or questions are welcome.
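Independently of how Postgres-XC does it, the last step of join SQL generation can be sketched as string assembly from already-deparsed pieces. `deparse_join_sql()` is a hypothetical helper: it assumes the select list, relation names, aliases and join condition have already been deparsed and quoted (as postgres_fdw's deparse.c does for scans), and it handles only a single two-way join.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical deparser for one two-way join.  All inputs are assumed to
 * be already deparsed and safely quoted.  Returns 0 on success, -1 if the
 * buffer was too small (the output is then truncated).
 */
static int
deparse_join_sql(char *buf, size_t len,
                 const char *select_list,
                 const char *outer_rel, const char *outer_alias,
                 const char *join_kw,      /* "INNER", "LEFT", ... */
                 const char *inner_rel, const char *inner_alias,
                 const char *join_cond)
{
    int n = snprintf(buf, len,
                     "SELECT %s FROM %s %s %s JOIN %s %s ON (%s)",
                     select_list, outer_rel, outer_alias, join_kw,
                     inner_rel, inner_alias, join_cond);

    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

Called with `"r1.a, r2.b"`, `"public.t1" r1`, `"INNER"`, `"public.t2" r2` and `"r1.id = r2.id"`, it yields `SELECT r1.a, r2.b FROM public.t1 r1 INNER JOIN public.t2 r2 ON (r1.id = r2.id)`. For a nested foreign join, the outer or inner relation text would itself presumably be a parenthesized join, so a real generator would likely need to recurse over the join tree.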
--
Shigeru HANADA
Attachments:
join_pushdown_wip.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 4c49776..91a98fa 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -287,6 +287,15 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
+static void postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
/*
* Helper functions
@@ -367,6 +376,13 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
/* Support functions for IMPORT FOREIGN SCHEMA */
routine->ImportForeignSchema = postgresImportForeignSchema;
+ /* Support functions for join push-down */
+ routine->GetForeignJoinPath = postgresGetForeignJoinPath;
+ routine->BeginForeignJoin = NULL;
+ routine->ReScanForeignJoin = NULL;
+ routine->IterateForeignJoin = NULL;
+ routine->EndForeignJoin = NULL;
+
PG_RETURN_POINTER(routine);
}
@@ -2832,6 +2848,49 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
}
/*
+ * postgresGetForeignJoinPath
+ * Add possible ForeignJoinPath to joinrel.
+ *
+ */
+static void
+postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels)
+{
+ ListCell *lc;
+
+ foreach(lc, outerrel->pathlist)
+ {
+ Path *outerpath = (Path *) lfirst(lc);
+ Path *innerpath = innerrel->cheapest_total_path;
+ ForeignJoinPath *joinpath;
+ Relids required_outer;
+
+ required_outer = calc_non_nestloop_required_outer(outerpath, innerpath);
+ joinpath = create_foreignjoin_path(root,
+ joinrel,
+ jointype,
+ sjinfo,
+ semifactors,
+ outerpath,
+ innerpath,
+ restrictlist,
+ NIL,
+ required_outer);
+
+ /* TODO generate SQL for the join and store it into fdw_private. */
+ /* TODO determine cost and rows of the join. */
+ /* TODO add generated path into joinrel by add_path(). */
+ }
+}
+
+/*
* Create a tuple from the specified row of the PGresult.
*
* rel is the local representation of the foreign table, attinmeta is
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 781a736..8559e3f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -900,6 +900,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
pname = "Hash"; /* "Join" gets added by jointype switch */
sname = "Hash Join";
break;
+ case T_ForeignJoin:
+ pname = "Foreign"; /* "Join" gets added by jointype switch */
+ sname = "Foreign Join";
+ break;
case T_SeqScan:
pname = sname = "Seq Scan";
break;
@@ -1090,6 +1094,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_NestLoop:
case T_MergeJoin:
case T_HashJoin:
+ case T_ForeignJoin:
{
const char *jointype;
@@ -1390,6 +1395,18 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_instrumentation_count("Rows Removed by Filter", 2,
planstate, es);
break;
+ case T_ForeignJoin:
+ /* TODO: add FDW-specific EXPLAIN information */
+ show_upper_qual(((ForeignJoin *) plan)->join.joinqual,
+ "Join Filter", planstate, ancestors, es);
+ if (((ForeignJoin *) plan)->join.joinqual)
+ show_instrumentation_count("Rows Removed by Join Filter", 1,
+ planstate, es);
+ show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+ if (plan->qual)
+ show_instrumentation_count("Rows Removed by Filter", 2,
+ planstate, es);
+ break;
case T_Agg:
show_agg_keys((AggState *) planstate, ancestors, es);
show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 6081b56..308cc99 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -24,6 +24,7 @@ OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \
nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \
nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \
- nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o
+ nodeForeignscan.o nodeWindowAgg.o tstoreReceiver.o spi.o \
+ nodeForeignjoin.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 640964c..6b6d230 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -21,6 +21,7 @@
#include "executor/nodeBitmapIndexscan.h"
#include "executor/nodeBitmapOr.h"
#include "executor/nodeCtescan.h"
+#include "executor/nodeForeignjoin.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeFunctionscan.h"
#include "executor/nodeGroup.h"
@@ -209,6 +210,10 @@ ExecReScan(PlanState *node)
ExecReScanHashJoin((HashJoinState *) node);
break;
+ case T_ForeignJoinState:
+ ExecReScanForeignJoin((ForeignJoinState *) node);
+ break;
+
case T_MaterialState:
ExecReScanMaterial((MaterialState *) node);
break;
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index c0189eb..e7d9a6e 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -85,6 +85,7 @@
#include "executor/nodeBitmapIndexscan.h"
#include "executor/nodeBitmapOr.h"
#include "executor/nodeCtescan.h"
+#include "executor/nodeForeignjoin.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeFunctionscan.h"
#include "executor/nodeGroup.h"
@@ -262,6 +263,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
estate, eflags);
break;
+ case T_ForeignJoin:
+ result = (PlanState *) ExecInitForeignJoin((ForeignJoin *) node,
+ estate, eflags);
+ break;
+
/*
* materialization nodes
*/
@@ -457,6 +463,10 @@ ExecProcNode(PlanState *node)
result = ExecHashJoin((HashJoinState *) node);
break;
+ case T_ForeignJoinState:
+ result = ExecForeignJoin((ForeignJoinState *) node);
+ break;
+
/*
* materialization nodes
*/
@@ -693,6 +703,10 @@ ExecEndNode(PlanState *node)
ExecEndHashJoin((HashJoinState *) node);
break;
+ case T_ForeignJoinState:
+ ExecEndForeignJoin((ForeignJoinState *) node);
+ break;
+
/*
* materialization nodes
*/
diff --git a/src/backend/executor/nodeForeignjoin.c b/src/backend/executor/nodeForeignjoin.c
new file mode 100644
index 0000000..fbcdbc3
--- /dev/null
+++ b/src/backend/executor/nodeForeignjoin.c
@@ -0,0 +1,176 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeForeignjoin.c
+ * routines to support foreign joins
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/nodeForeignjoin.c
+ *
+ *-------------------------------------------------------------------------
+ */
+/*
+ * INTERFACE ROUTINES
+ * ExecForeignJoin - process a foreign join of two foreign plans
+ * ExecInitForeignJoin - initialize the join
+ * ExecEndForeignJoin - shut down the join
+ */
+
+#include "postgres.h"
+
+#include "executor/executor.h"
+#include "executor/nodeForeignjoin.h"
+#include "foreign/fdwapi.h"
+#include "utils/memutils.h"
+
+
+/* ----------------------------------------------------------------
+ * ExecForeignJoin(node)
+ *
+ * Returns the tuple joined from inner and outer tuples which
+ * satisfies the qualification clause.
+ *
+ * It delegates actual join processing to the foreign data wrapper
+ * associated with the foreign tables used in the join subtree.
+ * Basically this node is very similar to ForeignScan except ForeignJoin
+ * holds subplans of inner and outer relations.
+ *
+ * NULL is returned if all the remaining tuples are consumed.
+ *
+ * Basically this function is very similar to ForeignNext called from
+ * ExecForeignScan.
+ *
+ * ----------------------------------------------------------------
+ */
+TupleTableSlot *
+ExecForeignJoin(ForeignJoinState *node)
+{
+ TupleTableSlot *slot;
+ ExprContext *econtext = node->js.ps.ps_ExprContext;
+ MemoryContext oldcontext;
+
+ /* Call the Iterate function in short-lived context */
+ oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+ slot = node->fdwroutine->IterateForeignJoin(node);
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * We don't force the slot into the "materialized" state, unlike
+ * ExecForeignScan, because no system attribute is valid in joined result
+ * tuple.
+ */
+ return slot;
+}
+
+/* ----------------------------------------------------------------
+ * ExecInitForeignJoin
+ * ----------------------------------------------------------------
+ */
+ForeignJoinState *
+ExecInitForeignJoin(ForeignJoin *node, EState *estate, int eflags)
+{
+ ForeignJoinState *fjstate;
+ FdwRoutine *fdwroutine;
+
+ /* check for unsupported flags */
+ Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
+
+ /*
+ * create state structure
+ */
+ fjstate = makeNode(ForeignJoinState);
+ fjstate->js.ps.plan = (Plan *) node;
+ fjstate->js.ps.state = estate;
+
+ /*
+ * Miscellaneous initialization
+ *
+ * create expression context for node
+ */
+ ExecAssignExprContext(estate, &fjstate->js.ps);
+
+ /*
+ * initialize child expressions
+ */
+ fjstate->js.ps.targetlist = (List *)
+ ExecInitExpr((Expr *) node->join.plan.targetlist,
+ (PlanState *) fjstate);
+ fjstate->js.ps.qual = (List *)
+ ExecInitExpr((Expr *) node->join.plan.qual,
+ (PlanState *) fjstate);
+ fjstate->js.jointype = node->join.jointype;
+ fjstate->js.joinqual = (List *)
+ ExecInitExpr((Expr *) node->join.joinqual,
+ (PlanState *) fjstate);
+
+ /*
+ * tuple table initialization
+ */
+ ExecInitResultTupleSlot(estate, &fjstate->js.ps);
+
+ /*
+ * initialize tuple type and projection info
+ */
+ ExecAssignResultTypeFromTL(&fjstate->js.ps);
+ ExecAssignProjectionInfo(&fjstate->js.ps, NULL);
+
+ /*
+ * Acquire function pointers from the FDW's handler, and init fdw_state.
+ */
+ fdwroutine = GetFdwRoutineByServerId(node->serverid);
+ fjstate->fdwroutine = fdwroutine;
+ fjstate->fdw_state = NULL;
+
+ /*
+ * Tell the FDW to initiate the join.
+ */
+ fdwroutine->BeginForeignJoin(fjstate, eflags);
+
+ /*
+ * finally, wipe the current outer tuple clean.
+ */
+ fjstate->js.ps.ps_TupFromTlist = false;
+
+ return fjstate;
+}
+
+/* ----------------------------------------------------------------
+ * ExecEndForeignJoin
+ *
+ * closes down scans and frees allocated storage
+ * ----------------------------------------------------------------
+ */
+void
+ExecEndForeignJoin(ForeignJoinState *node)
+{
+ /*
+ * Free the exprcontext
+ */
+ ExecFreeExprContext(&node->js.ps);
+
+ /*
+ * clean out the tuple table
+ */
+ ExecClearTuple(node->js.ps.ps_ResultTupleSlot);
+
+ /*
+ * Tell the FDW that the join is done.
+ */
+ node->fdwroutine->EndForeignJoin(node);
+}
+
+/* ----------------------------------------------------------------
+ * ExecReScanForeignJoin
+ * ----------------------------------------------------------------
+ */
+void
+ExecReScanForeignJoin(ForeignJoinState *node)
+{
+ /*
+ * Tell the FDW to rewind the join.
+ */
+ node->fdwroutine->ReScanForeignJoin(node);
+}
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 4f5f6ae..9fd1069 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -250,6 +250,29 @@ GetForeignTable(Oid relid)
/*
+ * GetForeignTableServerOid - Get OID of the server related to the given
+ * foreign table.
+ */
+Oid
+GetForeignTableServerOid(Oid relid)
+{
+ Form_pg_foreign_table tableform;
+ HeapTuple tp;
+ Oid serverid;
+
+ tp = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for foreign table %u", relid);
+ tableform = (Form_pg_foreign_table) GETSTRUCT(tp);
+ serverid = tableform->ftserver;
+
+ ReleaseSysCache(tp);
+
+ return serverid;
+}
+
+
+/*
* GetForeignColumnOptions - Get attfdwoptions of given relation/attnum
* as list of DefElem.
*/
@@ -311,12 +334,8 @@ FdwRoutine *
GetFdwRoutineByRelId(Oid relid)
{
HeapTuple tp;
- Form_pg_foreign_data_wrapper fdwform;
- Form_pg_foreign_server serverform;
Form_pg_foreign_table tableform;
Oid serverid;
- Oid fdwid;
- Oid fdwhandler;
/* Get server OID for the foreign table. */
tp = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(relid));
@@ -326,6 +345,18 @@ GetFdwRoutineByRelId(Oid relid)
serverid = tableform->ftserver;
ReleaseSysCache(tp);
+ return GetFdwRoutineByServerId(serverid);
+}
+
+FdwRoutine *
+GetFdwRoutineByServerId(Oid serverid)
+{
+ HeapTuple tp;
+ Form_pg_foreign_data_wrapper fdwform;
+ Form_pg_foreign_server serverform;
+ Oid fdwid;
+ Oid fdwhandler;
+
/* Get foreign-data wrapper OID for the server. */
tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(serverid));
if (!HeapTupleIsValid(tp))
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 225756c..c8113ab 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -703,6 +703,27 @@ _copyHashJoin(const HashJoin *from)
return newnode;
}
+/*
+ * _copyForeignJoin
+ */
+static ForeignJoin *
+_copyForeignJoin(const ForeignJoin *from)
+{
+ ForeignJoin *newnode = makeNode(ForeignJoin);
+
+ /*
+ * copy node superclass fields
+ */
+ CopyJoinFields((const Join *) from, (Join *) newnode);
+
+ /* copy remainder of node */
+ COPY_SCALAR_FIELD(serverid);
+ COPY_NODE_FIELD(fdwclauses);
+ COPY_NODE_FIELD(fdw_private);
+
+ return newnode;
+}
+
/*
* _copyMaterial
@@ -4054,6 +4075,9 @@ copyObject(const void *from)
case T_HashJoin:
retval = _copyHashJoin(from);
break;
+ case T_ForeignJoin:
+ retval = _copyForeignJoin(from);
+ break;
case T_Material:
retval = _copyMaterial(from);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1ff78eb..82c8624 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -623,6 +623,16 @@ _outHashJoin(StringInfo str, const HashJoin *node)
}
static void
+_outForeignJoin(StringInfo str, const ForeignJoin *node)
+{
+ WRITE_NODE_TYPE("FOREIGNJOIN");
+ _outJoinPlanInfo(str, (const Join *) node);
+ WRITE_OID_FIELD(serverid);
+ WRITE_NODE_FIELD(fdwclauses);
+ WRITE_NODE_FIELD(fdw_private);
+}
+
+static void
_outAgg(StringInfo str, const Agg *node)
{
int i;
@@ -1668,6 +1678,16 @@ _outHashPath(StringInfo str, const HashPath *node)
}
static void
+_outForeignJoinPath(StringInfo str, const ForeignJoinPath *node)
+{
+ WRITE_NODE_TYPE("FOREIGNJOINPATH");
+
+ _outJoinPathInfo(str, (const JoinPath *) node);
+
+ WRITE_NODE_FIELD(fdw_private);
+}
+
+static void
_outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
{
WRITE_NODE_TYPE("PLANNERGLOBAL");
@@ -1766,6 +1786,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(subplan);
WRITE_NODE_FIELD(subroot);
WRITE_NODE_FIELD(subplan_params);
+ WRITE_OID_FIELD(serverid);
/* we don't try to print fdwroutine or fdw_private */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_NODE_FIELD(joininfo);
@@ -2864,6 +2885,9 @@ _outNode(StringInfo str, const void *obj)
case T_HashJoin:
_outHashJoin(str, obj);
break;
+ case T_ForeignJoin:
+ _outForeignJoin(str, obj);
+ break;
case T_Agg:
_outAgg(str, obj);
break;
@@ -3084,6 +3108,9 @@ _outNode(StringInfo str, const void *obj)
case T_HashPath:
_outHashPath(str, obj);
break;
+ case T_ForeignJoinPath:
+ _outForeignJoinPath(str, obj);
+ break;
case T_PlannerGlobal:
_outPlannerGlobal(str, obj);
break;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index be54f3d..faadefc 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,6 +17,7 @@
#include <math.h>
#include "executor/executor.h"
+#include "foreign/fdwapi.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -50,7 +51,6 @@ static List *select_mergejoin_clauses(PlannerInfo *root,
JoinType jointype,
bool *mergejoin_allowed);
-
/*
* add_paths_to_joinrel
* Given a join relation and two component rels from which it can be made,
@@ -207,7 +207,26 @@ add_paths_to_joinrel(PlannerInfo *root,
extra_lateral_rels = NULL;
/*
- * 1. Consider mergejoin paths where both relations must be explicitly
+ * 1. Consider foreignjoin paths when both outer and inner relations are
+ * managed by the same foreign server. This is done before any local join
+ * consideration because a foreign join is likely to be the cheapest in
+ * most cases when joining on the remote side is possible.
+ */
+ if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPath)
+ {
+ joinrel->fdwroutine->GetForeignJoinPath(root,
+ joinrel,
+ outerrel,
+ innerrel,
+ jointype,
+ sjinfo,
+ &semifactors,
+ restrictlist,
+ extra_lateral_rels);
+ }
+
+ /*
+ * 2. Consider mergejoin paths where both relations must be explicitly
* sorted. Skip this if we can't mergejoin.
*/
if (mergejoin_allowed)
@@ -217,7 +236,7 @@ add_paths_to_joinrel(PlannerInfo *root,
param_source_rels, extra_lateral_rels);
/*
- * 2. Consider paths where the outer relation need not be explicitly
+ * 3. Consider paths where the outer relation need not be explicitly
* sorted. This includes both nestloops and mergejoins where the outer
* path is already ordered. Again, skip this if we can't mergejoin.
* (That's okay because we know that nestloop can't handle right/full
@@ -232,7 +251,7 @@ add_paths_to_joinrel(PlannerInfo *root,
#ifdef NOT_USED
/*
- * 3. Consider paths where the inner relation need not be explicitly
+ * 4. Consider paths where the inner relation need not be explicitly
* sorted. This includes mergejoins only (nestloops were already built in
* match_unsorted_outer).
*
@@ -250,7 +269,7 @@ add_paths_to_joinrel(PlannerInfo *root,
#endif
/*
- * 4. Consider paths where both outer and inner relations must be hashed
+ * 5. Consider paths where both outer and inner relations must be hashed
* before being joined. As above, disregard enable_hashjoin for full
* joins, because there may be no other alternative.
*/
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 4b641a2..4c28dcb 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -83,6 +83,9 @@ static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
Plan *outer_plan, Plan *inner_plan);
static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path,
Plan *outer_plan, Plan *inner_plan);
+static ForeignJoin *create_foreignjoin_plan(PlannerInfo *root,
+ ForeignJoinPath *best_path, Plan *outer_plan,
+ Plan *inner_plan);
static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
static void process_subquery_nestloop_params(PlannerInfo *root,
@@ -235,6 +238,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
case T_ForeignScan:
plan = create_scan_plan(root, best_path);
break;
+ case T_ForeignJoin:
case T_HashJoin:
case T_MergeJoin:
case T_NestLoop:
@@ -625,6 +629,13 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
outer_plan,
inner_plan);
break;
+ case T_ForeignJoin:
+ /* Create ForeignJoin plan node for ForeignJoin path */
+ plan = (Plan *) create_foreignjoin_plan(root,
+ (ForeignJoinPath *) best_path,
+ outer_plan,
+ inner_plan);
+ break;
case T_NestLoop:
/* Restore curOuterRels */
bms_free(root->curOuterRels);
@@ -2524,6 +2535,28 @@ create_hashjoin_plan(PlannerInfo *root,
return join_plan;
}
+static ForeignJoin *
+create_foreignjoin_plan(PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ Plan *outer_plan,
+ Plan *inner_plan)
+{
+ ForeignJoin *join_plan;
+ RelOptInfo *rel = best_path->jpath.path.parent;
+
+ Assert(rel->fdwroutine);
+ Assert(rel->fdwroutine->GetForeignJoinPlan);
+
+ join_plan = rel->fdwroutine->GetForeignJoinPlan(root,
+ best_path,
+ outer_plan,
+ inner_plan);
+
+ copy_path_costsize(&join_plan->join.plan, &best_path->jpath.path);
+
+ return join_plan;
+}
+
/*****************************************************************************
*
@@ -3727,6 +3760,34 @@ make_mergejoin(List *tlist,
return node;
}
+ForeignJoin *
+make_foreignjoin(List *tlist,
+ List *joinclauses,
+ List *otherclauses,
+ Oid serverid,
+ List *fdwclauses,
+ List *fdw_private,
+ Plan *lefttree,
+ Plan *righttree,
+ JoinType jointype)
+{
+ ForeignJoin *node = makeNode(ForeignJoin);
+ Plan *plan = &node->join.plan;
+
+ /* cost will be filled in by create_foreignjoin_plan */
+ plan->targetlist = tlist;
+ plan->qual = otherclauses;
+ plan->lefttree = lefttree;
+ plan->righttree = righttree;
+ node->join.jointype = jointype;
+ node->join.joinqual = joinclauses;
+ node->serverid = serverid;
+ node->fdwclauses = fdwclauses;
+ node->fdw_private = fdw_private;
+
+ return node;
+}
+
/*
* make_sort --- basic routine to build a Sort plan node
*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 9ddc8ad..a3501706 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -582,6 +582,7 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
case T_NestLoop:
case T_MergeJoin:
case T_HashJoin:
+ case T_ForeignJoin:
set_join_references(root, (Join *) plan, rtoffset);
break;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 3e7dc85..1f4e361 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2403,6 +2403,15 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
&context);
break;
+ case T_ForeignJoin:
+ /*
+ * TODO: consider ForeignJoin-specific information, see
+ * T_ForeignScan section above.
+ */
+ finalize_primnode((Node *) ((Join *) plan)->joinqual,
+ &context);
+ break;
+
case T_Limit:
finalize_primnode(((Limit *) plan)->limitOffset,
&context);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 319e8b2..4e9fb44 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1859,6 +1859,58 @@ create_hashjoin_path(PlannerInfo *root,
}
/*
+ * create_foreignjoin_path
+ * Creates a pathnode corresponding to a foreign join between two relations.
+ * Unlike similar functions for other join types, no final cost function is
+ * called, so the FDW has to take care of cost information.
+ *
+ * 'joinrel' is the join relation
+ * 'jointype' is the type of join required
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'semifactors' contains valid data if jointype is SEMI or ANTI
+ * 'outer_path' is the cheapest outer path
+ * 'inner_path' is the cheapest inner path
+ * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
+ * 'pathkeys' are the pathkeys of the new join path (NIL if unordered)
+ * 'required_outer' is the set of required outer rels
+ *
+ */
+ForeignJoinPath *
+create_foreignjoin_path(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Path *outer_path,
+ Path *inner_path,
+ List *restrict_clauses,
+ List *pathkeys,
+ Relids required_outer)
+{
+ ForeignJoinPath *pathnode = makeNode(ForeignJoinPath);
+
+ pathnode->jpath.path.pathtype = T_ForeignJoin;
+ pathnode->jpath.path.parent = joinrel;
+ pathnode->jpath.path.param_info =
+ get_joinrel_parampathinfo(root,
+ joinrel,
+ outer_path,
+ inner_path,
+ sjinfo,
+ required_outer,
+ &restrict_clauses);
+ pathnode->jpath.path.pathkeys = pathkeys;
+ pathnode->jpath.jointype = jointype;
+ pathnode->jpath.outerjoinpath = outer_path;
+ pathnode->jpath.innerjoinpath = inner_path;
+ pathnode->jpath.joinrestrictinfo = restrict_clauses;
+
+ pathnode->fdw_private = NIL;
+
+ return pathnode;
+}
+
+/*
* reparameterize_path
* Attempt to modify a Path to have greater parameterization
*
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b2becfa..22b7523 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -27,6 +27,7 @@
#include "catalog/catalog.h"
#include "catalog/heap.h"
#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -378,9 +379,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* Grab the fdwroutine info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ rel->serverid = GetForeignTableServerOid(relation->rd_id);
rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+ }
else
+ {
+ rel->serverid = InvalidOid;
rel->fdwroutine = NULL;
+ }
heap_close(relation, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4c76f54..278ca2e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -121,6 +121,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
rel->subplan = NULL;
rel->subroot = NULL;
rel->subplan_params = NIL;
+ rel->serverid = InvalidOid;
rel->fdwroutine = NULL;
rel->fdw_private = NULL;
rel->baserestrictinfo = NIL;
@@ -383,7 +384,17 @@ build_join_rel(PlannerInfo *root,
joinrel->subplan = NULL;
joinrel->subroot = NULL;
joinrel->subplan_params = NIL;
- joinrel->fdwroutine = NULL;
+ /* propagate common server information up to join relation */
+ if (inner_rel->serverid == outer_rel->serverid)
+ {
+ joinrel->fdwroutine = inner_rel->fdwroutine;
+ joinrel->serverid = inner_rel->serverid;
+ }
+ else
+ {
+ joinrel->serverid = InvalidOid;
+ joinrel->fdwroutine = NULL;
+ }
joinrel->fdw_private = NULL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
diff --git a/src/include/executor/nodeForeignjoin.h b/src/include/executor/nodeForeignjoin.h
new file mode 100644
index 0000000..802498c
--- /dev/null
+++ b/src/include/executor/nodeForeignjoin.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeForeignjoin.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeForeignjoin.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODEFOREIGNJOIN_H
+#define NODEFOREIGNJOIN_H
+
+#include "nodes/execnodes.h"
+
+extern ForeignJoinState *ExecInitForeignJoin(ForeignJoin *node, EState *estate, int eflags);
+extern TupleTableSlot *ExecForeignJoin(ForeignJoinState *node);
+extern void ExecEndForeignJoin(ForeignJoinState *node);
+extern void ExecReScanForeignJoin(ForeignJoinState *node);
+
+#endif /* NODEFOREIGNJOIN_H */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index dc0a7fc7..56fc8e0 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,29 @@ typedef void (*EndForeignModify_function) (EState *estate,
typedef int (*IsForeignRelUpdatable_function) (Relation rel);
+typedef void (*GetForeignJoinPath_function) (PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
+
+typedef ForeignJoin *(*GetForeignJoinPlan_function) (PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ Plan *outerplan,
+ Plan *innerplan);
+
+typedef void (*BeginForeignJoin_function) (ForeignJoinState *node,
+ int eflags);
+typedef TupleTableSlot *(*IterateForeignJoin_function) (ForeignJoinState *node);
+
+typedef void (*ReScanForeignJoin_function) (ForeignJoinState *node);
+
+typedef void (*EndForeignJoin_function) (ForeignJoinState *node);
+
typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
struct ExplainState *es);
@@ -150,12 +173,22 @@ typedef struct FdwRoutine
/* Support functions for IMPORT FOREIGN SCHEMA */
ImportForeignSchema_function ImportForeignSchema;
+
+ /* Support functions for join push-down */
+ GetForeignJoinPath_function GetForeignJoinPath;
+ GetForeignJoinPlan_function GetForeignJoinPlan;
+ BeginForeignJoin_function BeginForeignJoin;
+ IterateForeignJoin_function IterateForeignJoin;
+ ReScanForeignJoin_function ReScanForeignJoin;
+ EndForeignJoin_function EndForeignJoin;
+
} FdwRoutine;
/* Functions in foreign/foreign.c */
extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
+extern FdwRoutine *GetFdwRoutineByServerId(Oid serverid);
extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
extern bool IsImportableForeignTable(const char *tablename,
ImportForeignSchemaStmt *stmt);
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index ac080d7..b9e120a 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -75,6 +75,7 @@ extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
bool missing_ok);
extern ForeignTable *GetForeignTable(Oid relid);
+extern Oid GetForeignTableServerOid(Oid relid);
extern List *GetForeignColumnOptions(Oid relid, AttrNumber attnum);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index b271f21..dd2f69c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1636,6 +1636,21 @@ typedef struct HashJoinState
bool hj_OuterNotEmpty;
} HashJoinState;
+/* ----------------
+ * ForeignJoinState information
+ *
+ * fdwroutine handler functions used to process the join
+ * fdw_state FDW-private state information
+ * ----------------
+ */
+typedef struct ForeignJoinState
+{
+ JoinState js; /* its first field is NodeTag */
+ /* use struct pointer to avoid including fdwapi.h here */
+ struct FdwRoutine *fdwroutine;
+ void *fdw_state; /* foreign-data wrapper can keep state here */
+} ForeignJoinState;
+
/* ----------------------------------------------------------------
* Materialization State Information
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 154d943..c81eff3 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -66,6 +66,7 @@ typedef enum NodeTag
T_NestLoop,
T_MergeJoin,
T_HashJoin,
+ T_ForeignJoin,
T_Material,
T_Sort,
T_Group,
@@ -111,6 +112,7 @@ typedef enum NodeTag
T_NestLoopState,
T_MergeJoinState,
T_HashJoinState,
+ T_ForeignJoinState,
T_MaterialState,
T_SortState,
T_GroupState,
@@ -222,6 +224,7 @@ typedef enum NodeTag
T_NestPath,
T_MergePath,
T_HashPath,
+ T_ForeignJoinPath,
T_TidPath,
T_ForeignPath,
T_AppendPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 1839494..bb741ae 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -569,6 +569,18 @@ typedef struct HashJoin
} HashJoin;
/* ----------------
+ * foreign join node
+ * ----------------
+ */
+typedef struct ForeignJoin
+{
+ Join join;
+ Oid serverid;
+ List *fdwclauses; /* expressions that FDW may evaluate */
+ List *fdw_private;
+} ForeignJoin;
+
+/* ----------------
* materialization node
* ----------------
*/
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f1a0504..71ac248 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -364,13 +364,14 @@ typedef struct PlannerInfo
* subplan - plan for subquery (NULL if it's not a subquery)
* subroot - PlannerInfo for subquery (NULL if it's not a subquery)
* subplan_params - list of PlannerParamItems to be passed to subquery
+ * serverid - OID of server, if foreign table (else InvalidOid)
* fdwroutine - function hooks for FDW, if foreign table (else NULL)
* fdw_private - private state for FDW, if foreign table (else NULL)
*
* Note: for a subquery, tuples, subplan, subroot are not set immediately
* upon creation of the RelOptInfo object; they are filled in when
- * set_subquery_pathlist processes the object. Likewise, fdwroutine
- * and fdw_private are filled during initial path creation.
+ * set_subquery_pathlist processes the object. Likewise, serverid,
+ * fdwroutine, and fdw_private are filled during initial path creation.
*
* For otherrels that are appendrel members, these fields are filled
* in just as for a baserel.
@@ -459,6 +460,7 @@ typedef struct RelOptInfo
PlannerInfo *subroot; /* if subquery */
List *subplan_params; /* if subquery */
/* use "struct FdwRoutine" to avoid including fdwapi.h here */
+ Oid serverid; /* if foreign table */
struct FdwRoutine *fdwroutine; /* if foreign table */
void *fdw_private; /* if foreign table */
@@ -1053,6 +1055,22 @@ typedef struct HashPath
} HashPath;
/*
+ * ForeignJoinPath represents a join between two relations consisting of
+ * foreign tables.
+ *
+ * fdw_private stores FDW private data about the join. While fdw_private is
+ * not actually touched by the core code during normal operations, it's
+ * generally a good idea to use a representation that can be dumped by
+ * nodeToString(), so that you can examine the structure during debugging
+ * with tools like pprint().
+ */
+typedef struct ForeignJoinPath
+{
+ JoinPath jpath;
+ List *fdw_private;
+} ForeignJoinPath;
+
+/*
* Restriction clause info.
*
* We create one of these for each AND sub-clause of a restriction condition
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 26b17f5..7a1f236 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -124,6 +124,17 @@ extern HashPath *create_hashjoin_path(PlannerInfo *root,
Relids required_outer,
List *hashclauses);
+extern ForeignJoinPath *create_foreignjoin_path(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Path *outer_path,
+ Path *inner_path,
+ List *restrict_clauses,
+ List *pathkeys,
+ Relids required_outer);
+
extern Path *reparameterize_path(PlannerInfo *root, Path *path,
Relids required_outer,
double loop_count);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 3fdc2cb..a0e5788 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -45,6 +45,10 @@ extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
Index scanrelid, Plan *subplan);
extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
Index scanrelid, List *fdw_exprs, List *fdw_private);
+extern ForeignJoin *make_foreignjoin(List *tlist, List *joinclauses,
+ List *otherclauses, Oid serverid, List *fdwclauses,
+ List *fdw_private, Plan *lefttree, Plan *righttree,
+ JoinType jointype);
extern Append *make_append(List *appendplans, List *tlist);
extern RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree, Plan *righttree, int wtParam,
Hi hackers,
I'm working on $SUBJECT and would like to get comments about the
design. The attached patch implements the design below. Note that the
patch requires Kaigai-san's custom foreign join patch [1].
[1]: /messages/by-id/9A28C8860F777E439AA12E8AEA7694F80108C355@BPXM15GP.gisp.nec.co.jp
Joins to be pushed down
=======================
We have two levels of decision about join push-down: core and FDW. I
think we should allow joins to be pushed down as much as possible, as
long as doing so doesn't break join semantics. In any case, the FDW
should decide whether a join can be pushed down, based on its own
capabilities.
Here is the list of checks which should be done in core:
1. Join source relations
All foreign tables used in a join should be managed by one foreign
data wrapper. I once proposed that all source tables should belong to
one server, because at that time I assumed that FDWs use SERVER to
express the physical location of the data source. But Robert's comment
made me realize that SERVER is not important for some FDWs, so now I
think the server-matching check should be done by FDWs.
USER MAPPING is another important attribute of a foreign scan/join,
and IMO it should be checked by the FDW, because some FDWs don't
require a USER MAPPING. If an FDW wants to check user mappings, all
tables in the join should belong to the same server and have the same
RangeTblEntry#checkAsUser, to ensure that exactly one user mapping is
derived.
2. Join type
Any join type can be pushed down except JOIN_UNIQUE_OUTER and
JOIN_UNIQUE_INNER, though most FDWs would support only INNER and OUTER
joins.
Pushing down CROSS joins might seem inefficient, because a CROSS JOIN
usually produces far more rows (N x M) than retrieving all rows from
each foreign table separately (N + M). However, some FDWs might be able
to accelerate such joins with caching or similar techniques, so I think
we should leave that decision to FDWs.
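To make the division of labour concrete, the core-side checks could be sketched roughly as below. This is a simplified standalone model, not planner code: each table is reduced to the OID of the FDW that manages it (0 standing in for InvalidOid, i.e. a local table), and `SketchOid`, `SketchJoinType` and `core_may_offer_join_to_fdw()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for Oid; 0 plays the role of InvalidOid (a local table). */
typedef unsigned int SketchOid;

/*
 * Illustrative copy of the JoinType members discussed above.  The names
 * follow PostgreSQL's JoinType, but this enum exists only for the sketch.
 */
typedef enum SketchJoinType
{
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_FULL,
    SK_JOIN_RIGHT,
    SK_JOIN_SEMI,
    SK_JOIN_ANTI,
    SK_JOIN_UNIQUE_OUTER,
    SK_JOIN_UNIQUE_INNER
} SketchJoinType;

/*
 * Core-side filter: offer the join to the FDW only if every source table
 * is a foreign table managed by the same FDW and the join type is not one
 * of the JOIN_UNIQUE_* variants.  Server and user-mapping checks are left
 * to the FDW, as proposed above.
 */
static bool
core_may_offer_join_to_fdw(const SketchOid *fdw_ids, size_t ntables,
                           SketchJoinType jointype)
{
    assert(fdw_ids != NULL && ntables >= 2);

    if (jointype == SK_JOIN_UNIQUE_OUTER || jointype == SK_JOIN_UNIQUE_INNER)
        return false;

    /* A local (non-foreign) table has no FDW at all. */
    if (fdw_ids[0] == 0)
        return false;

    /* Every other table must be handled by the same FDW as the first. */
    for (size_t i = 1; i < ntables; i++)
    {
        if (fdw_ids[i] != fdw_ids[0])
            return false;
    }

    /* CROSS joins (INNER with no clauses) are still offered; the FDW decides. */
    return true;
}
```

Keeping the core check this coarse is deliberate in the sketch: anything that depends on what a SERVER or USER MAPPING means for a particular FDW stays on the FDW side.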
Here is the list of checks which should be done in postgres_fdw:
1. Join source relations
As described above, postgres_fdw (like most SQL-based FDWs) needs to
check that 1) all foreign tables in the join belong to the same
server, and 2) all foreign tables have the same checkAsUser.
In addition, I add an extra limitation that both the inner and outer
relations should be plain foreign tables, not the result of a foreign
join. This limitation keeps the SQL generator simple. In principle,
join relations can themselves be joined, so N-way join support is
listed as an enhancement item below.
2. Join type
In the first proposal, postgres_fdw allows INNER and OUTER joins to be
pushed down. CROSS, SEMI and ANTI joins would have far fewer use cases.
3. Join conditions and WHERE clauses
Join conditions should consist of semantically safe expressions, where
"semantically safe" means the same as it does for WHERE clause push-down.
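The three postgres_fdw checks could be modelled roughly as follows. This is a standalone sketch, not the patch's `postgresGetForeignJoinPath()`: `SketchRel`, `SketchJoinType` and `pgfdw_can_push_down_join()` are hypothetical names, and the safety of the join clauses (check 3) is assumed to be computed by the caller, for example by applying postgres_fdw's `is_foreign_expr()` to each clause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int SketchOid;      /* stand-in for Oid */

/* Illustrative subset of JoinType; not the real PostgreSQL header. */
typedef enum SketchJoinType
{
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_FULL,
    SK_JOIN_RIGHT,
    SK_JOIN_SEMI,
    SK_JOIN_ANTI
} SketchJoinType;

/* Hypothetical summary of what is known about one side of the join. */
typedef struct SketchRel
{
    bool      is_plain_foreign_table; /* a base rel, not a join rel */
    SketchOid serverid;               /* server of the foreign table */
    SketchOid check_as_user;          /* RangeTblEntry.checkAsUser */
} SketchRel;

static bool
pgfdw_can_push_down_join(const SketchRel *outer, const SketchRel *inner,
                         SketchJoinType jointype, size_t njoinclauses,
                         bool all_clauses_safe)
{
    assert(outer != NULL && inner != NULL);

    /*
     * 1. Source relations: plain foreign tables only (no N-way joins yet),
     * on the same server and with the same checkAsUser, so that a single
     * connection and user mapping can run the whole remote query.
     */
    if (!outer->is_plain_foreign_table || !inner->is_plain_foreign_table)
        return false;
    if (outer->serverid != inner->serverid ||
        outer->check_as_user != inner->check_as_user)
        return false;

    /* 2. Join type: INNER and the three OUTER joins only. */
    switch (jointype)
    {
        case SK_JOIN_INNER:
        case SK_JOIN_LEFT:
        case SK_JOIN_RIGHT:
        case SK_JOIN_FULL:
            break;
        default:
            return false;
    }

    /* A CROSS JOIN reaches the FDW as INNER with no join clauses. */
    if (jointype == SK_JOIN_INNER && njoinclauses == 0)
        return false;

    /* 3. Every join condition must be safe to evaluate remotely. */
    return all_clauses_safe;
}
```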
Planned enhancements for 9.5
============================
These features will be proposed as enhancements, hopefully in the 9.5
development cycle, but probably in 9.6.
1. Remove unnecessary columns from the SELECT clause
Columns which are used only in join conditions can be removed from
the target list, as postgres_fdw does in simple foreign scans.
2. Support N-way joins
Mainly because of the difficulty of SQL generation, I didn't add support for N-way joins.
3. Proper cost estimation
Currently postgres_fdw always gives 0 as the cost of a foreign join,
as a compromise, because estimating the cost of each join without a
round trip (EXPLAIN) is not easy. My current rough idea is to use
local statistics, but determining the join method used on the remote
side might require running the whole planner for the join subtree.
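The first enhancement amounts to a set operation on each relation's column set. A minimal sketch, assuming columns are tracked as a 32-bit mask indexed by attribute number (a simplification of postgres_fdw's `attrs_used` Bitmapset that ignores system columns and whole-row references); `ColumnSet`, `column()` and `prune_join_only_columns()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n set means attribute number n is referenced (n = 1..31). */
typedef uint32_t ColumnSet;

static ColumnSet
column(int attnum)
{
    assert(attnum >= 1 && attnum <= 31);
    return (ColumnSet) 1u << attnum;
}

/*
 * attrs_used:     columns fetched today for one relation (everything
 *                 needed above the scan, including join-clause columns).
 * needed_locally: columns referenced by the local target list or by
 *                 quals that are still evaluated locally.
 * Once the join clauses run on the remote server, any other column in
 * attrs_used was needed only for those clauses and can be dropped from
 * the remote SELECT list.
 */
static ColumnSet
prune_join_only_columns(ColumnSet attrs_used, ColumnSet needed_locally)
{
    return attrs_used & needed_locally;
}
```

For `SELECT t1.val, t2.val FROM t1 JOIN t2 ON t1.id = t2.id` (with `id` as attribute 1 and `val` as attribute 2), the remote SELECT list would then need only `t1.val` and `t2.val`, since `id` is compared on the remote side.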
Regards,
--
Shigeru HANADA
Attachments:
join_pushdown.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index a75462b..f6d6936 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -86,7 +86,7 @@ typedef struct foreign_loc_cxt
typedef struct deparse_expr_cxt
{
PlannerInfo *root; /* global planner state */
- RelOptInfo *foreignrel; /* the foreign relation we are planning for */
+ Relids rels; /* list of foreign tables to be deparsed */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params */
} deparse_expr_cxt;
@@ -108,6 +108,7 @@ static void deparseTargetList(StringInfo buf,
Index rtindex,
Relation rel,
Bitmapset *attrs_used,
+ const char *alias,
List **retrieved_attrs);
static void deparseReturningList(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
@@ -115,7 +116,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
List *returningList,
List **retrieved_attrs);
static void deparseColumnRef(StringInfo buf, int varno, int varattno,
- PlannerInfo *root);
+ PlannerInfo *root, const char *alias);
static void deparseRelation(StringInfo buf, Relation rel);
static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
static void deparseVar(Var *node, deparse_expr_cxt *context);
@@ -679,33 +680,119 @@ is_builtin(Oid oid)
void
deparseSelectSql(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
- Bitmapset *attrs_used,
- List **retrieved_attrs)
+ List *rels)
{
- RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
- Relation rel;
+ StringInfoData frombuf;
+ ListCell *lc;
+ bool first_rel = true;
+ Relids relids = NULL;
- /*
- * Core code already has some lock on each rel being planned, so we can
- * use NoLock here.
- */
- rel = heap_open(rte->relid, NoLock);
+ initStringInfo(&frombuf);
- /*
- * Construct SELECT list
- */
- appendStringInfoString(buf, "SELECT ");
- deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
- retrieved_attrs);
+ /* Collect the relids of all tables in the query to be deparsed. */
+ foreach(lc, rels)
+ {
+ PgFdwDeparseRel *dr = (PgFdwDeparseRel *) lfirst(lc);
+ relids = bms_add_member(relids, dr->baserel->relid);
+ }
- /*
- * Construct FROM clause
- */
- appendStringInfoString(buf, " FROM ");
- deparseRelation(buf, rel);
+ /* Loop through relation list and deparse SELECT query. */
+ foreach(lc, rels)
+ {
+ PgFdwDeparseRel *dr = (PgFdwDeparseRel *) lfirst(lc);
+ RangeTblEntry *rte = planner_rt_fetch(dr->baserel->relid, root);
+ Relation rel;
+ const char *alias;
- heap_close(rel, NoLock);
+ /*
+ * Core code already has some lock on each rel being planned, so we can
+ * use NoLock here.
+ */
+ rel = heap_open(rte->relid, NoLock);
+
+ /*
+ * Add alias only when we have multiple relations.
+ */
+ if (list_length(rels) > 1 && rte->alias)
+ alias = rte->alias->aliasname;
+ else
+ alias = NULL;
+
+ /*
+ * Construct SELECT list
+ */
+ if (first_rel)
+ appendStringInfoString(buf, "SELECT ");
+ else
+ appendStringInfoString(buf, ", ");
+ deparseTargetList(buf, root, dr->baserel->relid, rel, dr->attrs_used,
+ alias, dr->retrieved_attrs);
+
+ /*
+ * Construct FROM clause
+ */
+ if (first_rel)
+ appendStringInfoString(&frombuf, " FROM ");
+ else
+ {
+ switch (dr->jointype)
+ {
+ case JOIN_INNER:
+ if (dr->joinclauses)
+ appendStringInfoString(&frombuf, " INNER JOIN ");
+ else
+ /* Currently cross join is not pushed down, though. */
+ appendStringInfoString(&frombuf, " CROSS JOIN ");
+ break;
+ case JOIN_LEFT:
+ appendStringInfoString(&frombuf, " LEFT JOIN ");
+ break;
+ case JOIN_FULL:
+ appendStringInfoString(&frombuf, " FULL JOIN ");
+ break;
+ case JOIN_RIGHT:
+ appendStringInfoString(&frombuf, " RIGHT JOIN ");
+ break;
+ default:
+ elog(ERROR, "unsupported join type for deparse: %d",
+ dr->jointype);
+ break;
+ }
+ }
+ deparseRelation(&frombuf, rel);
+ if (alias)
+ appendStringInfo(&frombuf, " %s", alias);
+
+ if (!first_rel && dr->joinclauses)
+ {
+ ListCell *lc;
+ bool first = true;
+
+ appendStringInfoString(&frombuf, " ON ");
+
+ foreach(lc, dr->joinclauses)
+ {
+ deparse_expr_cxt context;
+ Expr *expr = (Expr *) lfirst(lc);
+
+ context.root = root;
+ context.rels = relids;
+ context.buf = &frombuf;
+ context.params_list = NULL;
+
+ if (!first)
+ appendStringInfoString(&frombuf, " AND ");
+ deparseExpr(expr, &context);
+ first = false;
+ }
+ }
+
+ heap_close(rel, NoLock);
+ first_rel = false;
+ }
+
+ appendStringInfoString(buf, frombuf.data);
+ pfree(frombuf.data);
}
/*
@@ -721,6 +808,7 @@ deparseTargetList(StringInfo buf,
Index rtindex,
Relation rel,
Bitmapset *attrs_used,
+ const char *alias,
List **retrieved_attrs)
{
TupleDesc tupdesc = RelationGetDescr(rel);
@@ -751,7 +839,7 @@ deparseTargetList(StringInfo buf,
appendStringInfoString(buf, ", ");
first = false;
- deparseColumnRef(buf, rtindex, i, root);
+ deparseColumnRef(buf, rtindex, i, root, alias);
*retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
@@ -768,6 +856,8 @@ deparseTargetList(StringInfo buf,
appendStringInfoString(buf, ", ");
first = false;
+ if (alias)
+ appendStringInfo(buf, "%s.", alias);
appendStringInfoString(buf, "ctid");
*retrieved_attrs = lappend_int(*retrieved_attrs,
@@ -796,7 +886,7 @@ deparseTargetList(StringInfo buf,
void
appendWhereClause(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
+ Relids relids,
List *exprs,
bool is_first,
List **params)
@@ -810,7 +900,7 @@ appendWhereClause(StringInfo buf,
/* Set up context struct for recursion */
context.root = root;
- context.foreignrel = baserel;
+ context.rels = relids;
context.buf = buf;
context.params_list = params;
@@ -870,7 +960,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
appendStringInfoString(buf, ", ");
first = false;
- deparseColumnRef(buf, rtindex, attnum, root);
+ deparseColumnRef(buf, rtindex, attnum, root, NULL);
}
appendStringInfoString(buf, ") VALUES (");
@@ -928,7 +1018,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
appendStringInfoString(buf, ", ");
first = false;
- deparseColumnRef(buf, rtindex, attnum, root);
+ deparseColumnRef(buf, rtindex, attnum, root, NULL);
appendStringInfo(buf, " = $%d", pindex);
pindex++;
}
@@ -993,7 +1083,7 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
if (attrs_used != NULL)
{
appendStringInfoString(buf, " RETURNING ");
- deparseTargetList(buf, root, rtindex, rel, attrs_used,
+ deparseTargetList(buf, root, rtindex, rel, attrs_used, NULL,
retrieved_attrs);
}
else
@@ -1088,7 +1178,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
* If it has a column_name FDW option, use that instead of attribute name.
*/
static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root,
+ const char *alias)
{
RangeTblEntry *rte;
char *colname = NULL;
@@ -1124,6 +1215,8 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
if (colname == NULL)
colname = get_relid_attribute_name(rte->relid, varattno);
+ if (alias)
+ appendStringInfo(buf, "%s.", alias);
appendStringInfoString(buf, quote_identifier(colname));
}
@@ -1270,12 +1363,46 @@ static void
deparseVar(Var *node, deparse_expr_cxt *context)
{
StringInfo buf = context->buf;
+ int i;
+ RelOptInfo *rel = NULL;
+ RangeTblEntry *rte = NULL;
- if (node->varno == context->foreignrel->relid &&
- node->varlevelsup == 0)
+ /* Find the RangeTblEntry containing the given Var to determine its alias. */
+ if (bms_is_member(node->varno, context->rels) && node->varlevelsup == 0)
{
- /* Var belongs to foreign table */
- deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ for (i = 1; i < context->root->simple_rel_array_size; i++)
+ {
+ /* Skip empty slot */
+ if (context->root->simple_rel_array[i] == NULL)
+ continue;
+
+ if (context->root->simple_rel_array[i]->relid == node->varno)
+ {
+ rel = context->root->simple_rel_array[i];
+ rte = context->root->simple_rte_array[i];
+ break;
+ }
+ }
+ }
+
+ /*
+ * If the Var is in current level (not in outer subquery), simply deparse
+ * it.
+ */
+ if (rel)
+ {
+ const char *alias;
+
+ /*
+ * Deparse a Var belonging to one of the foreign tables in context->rels,
+ * qualified with its alias if we are deparsing multiple foreign tables.
+ */
+ if (bms_num_members(context->rels) > 1 && rte->alias)
+ alias = rte->alias->aliasname;
+ else
+ alias = NULL;
+ deparseColumnRef(buf, node->varno, node->varattno, context->root,
+ alias);
}
else
{
@@ -1849,3 +1976,4 @@ printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
appendStringInfo(buf, "((SELECT null::%s)::%s)", ptypename, ptypename);
}
+
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c3039a6..aaae2bb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -288,6 +288,16 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
+static void postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ List *restrictlist,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Relids param_source_rels,
+ Relids extra_lateral_rels);
/*
* Helper functions
@@ -329,6 +339,17 @@ static HeapTuple make_tuple_from_result_row(PGresult *res,
MemoryContext temp_context);
static void conversion_error_callback(void *arg);
+extern void _PG_init(void);
+
+static set_join_pathlist_hook_type prev_set_join_pathlist_hook;
+
+void
+_PG_init(void)
+{
+ prev_set_join_pathlist_hook = set_join_pathlist_hook;
+ set_join_pathlist_hook = postgresGetForeignJoinPath;
+}
+
/*
* Foreign-data wrapper handler function: return a struct with pointers
@@ -752,6 +773,7 @@ postgresGetForeignPlan(PlannerInfo *root,
List *retrieved_attrs;
StringInfoData sql;
ListCell *lc;
+ PgFdwDeparseRel dr;
/*
* Separate the scan_clauses into those that can be executed remotely and
@@ -797,11 +819,15 @@ postgresGetForeignPlan(PlannerInfo *root,
* expressions to be sent as parameters.
*/
initStringInfo(&sql);
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
+ dr.baserel = baserel;
+ dr.jointype = JOIN_INNER;
+ dr.joinclauses = NIL;
+ dr.attrs_used = fpinfo->attrs_used;
+ dr.retrieved_attrs = &retrieved_attrs;
+ deparseSelectSql(&sql, root, list_make1(&dr));
if (remote_conds)
- appendWhereClause(&sql, root, baserel, remote_conds,
- true, &params_list);
+ appendWhereClause(&sql, root, bms_add_member(NULL, baserel->relid),
+ remote_conds, true, &params_list);
/*
* Add FOR UPDATE/SHARE if appropriate. We apply locking during the
@@ -1725,10 +1751,12 @@ estimate_path_cost_size(PlannerInfo *root,
List *remote_join_conds;
List *local_join_conds;
StringInfoData sql;
+ Relids relids;
List *retrieved_attrs;
PGconn *conn;
Selectivity local_sel;
QualCost local_cost;
+ PgFdwDeparseRel dr;
/*
* join_conds might contain both clauses that are safe to send across,
@@ -1743,14 +1771,19 @@ estimate_path_cost_size(PlannerInfo *root,
* dummy values.
*/
initStringInfo(&sql);
+ dr.baserel = baserel;
+ dr.jointype = JOIN_INNER;
+ dr.joinclauses = NIL;
+ dr.attrs_used = fpinfo->attrs_used;
+ dr.retrieved_attrs = &retrieved_attrs;
appendStringInfoString(&sql, "EXPLAIN ");
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
+ deparseSelectSql(&sql, root, list_make1(&dr));
+ relids = bms_add_member(NULL, baserel->relid);
if (fpinfo->remote_conds)
- appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
+ appendWhereClause(&sql, root, relids, fpinfo->remote_conds,
true, NULL);
if (remote_join_conds)
- appendWhereClause(&sql, root, baserel, remote_join_conds,
+ appendWhereClause(&sql, root, relids, remote_join_conds,
(fpinfo->remote_conds == NIL), NULL);
/* Get the remote estimate */
@@ -2835,6 +2868,118 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
}
/*
+ * postgresGetForeignJoinPath
+ * Add possible ForeignJoinPath to joinrel.
+ *
+ */
+static void
+postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ List *restrictlist,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Relids param_source_rels,
+ Relids extra_lateral_rels)
+{
+ int i;
+ bool rte_found;
+ Oid serverid = InvalidOid;
+ Oid checkAsUser = InvalidOid;
+ Path *outerpath = outerrel->cheapest_total_path;
+ Path *innerpath = innerrel->cheapest_total_path;
+ ForeignJoinPath *joinpath;
+ Relids required_outer;
+ List *fdw_private = NIL;
+
+ /* Skip considering reversed join combination */
+ elog(DEBUG1, "%s() outer: %d, inner: %d",
+ __func__, outerrel->relid, innerrel->relid);
+ if (outerrel->relid < innerrel->relid)
+ return;
+
+ /* At the moment we support only joins between foreign tables. */
+ if (outerrel->reloptkind != RELOPT_BASEREL ||
+ innerrel->reloptkind != RELOPT_BASEREL)
+ return;
+
+ /*
+ * We support all outer joins in addition to inner join.
+ */
+ if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+ jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+ return;
+
+ /*
+ * Note that CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+ * with empty restrictlist. Pushing down CROSS JOIN usually produces more
+ * rows than retrieving each table separately, so we don't push down such joins.
+ */
+ if (jointype == JOIN_INNER && !restrictlist)
+ return;
+
+ /*
+ * All relations in the join must belong to same server, and have same
+ * checkAsUser to use one connection to execute SQL for the join.
+ */
+ rte_found = false;
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RangeTblEntry *rte;
+ ForeignTable *ft;
+
+ if (!bms_is_member(i, joinrel->relids))
+ continue;
+
+ rte = root->simple_rte_array[i];
+ if (rte == NULL)
+ continue;
+
+ ft = GetForeignTable(rte->relid);
+ if (rte_found)
+ {
+ if (serverid != ft->serverid)
+ return;
+ if (checkAsUser != rte->checkAsUser)
+ return;
+ }
+ else
+ {
+ checkAsUser = rte->checkAsUser;
+ serverid = ft->serverid;
+ rte_found = true;
+ }
+ }
+ fdw_private = lappend_oid(fdw_private, checkAsUser);
+
+ /*
+ * Create a new join path and add it to the joinrel which represents a join
+ * between foreign tables.
+ */
+ required_outer = calc_non_nestloop_required_outer(outerpath, innerpath);
+ joinpath = create_foreignjoin_path(root,
+ joinrel,
+ jointype,
+ sjinfo,
+ semifactors,
+ outerpath,
+ innerpath,
+ restrictlist,
+ NIL,
+ required_outer);
+
+ /* TODO determine cost and rows of the join. */
+
+ /* Add generated path into joinrel by add_path(). */
+ add_path(joinrel, (Path *) joinpath);
+
+ /* TODO consider parameterized paths */
+}
+
+
+/*
* Create a tuple from the specified row of the PGresult.
*
* rel is the local representation of the foreign table, attinmeta is
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 0382c55..dcfcb77 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -39,6 +39,13 @@ extern int ExtractConnectionOptions(List *defelems,
const char **values);
/* in deparse.c */
+typedef struct PgFdwDeparseRel {
+ RelOptInfo *baserel;
+ JoinType jointype;
+ List *joinclauses;
+ Bitmapset *attrs_used;
+ List **retrieved_attrs;
+} PgFdwDeparseRel;
extern void classifyConditions(PlannerInfo *root,
RelOptInfo *baserel,
List *input_conds,
@@ -49,12 +56,10 @@ extern bool is_foreign_expr(PlannerInfo *root,
Expr *expr);
extern void deparseSelectSql(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
- Bitmapset *attrs_used,
- List **retrieved_attrs);
+ List *rels);
extern void appendWhereClause(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
+ Relids relids,
List *exprs,
bool is_first,
List **params);
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 457cbab..7023a40 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -250,6 +250,29 @@ GetForeignTable(Oid relid)
/*
+ * GetForeignTableServerOid - Get OID of the server related to the given
+ * foreign table.
+ */
+Oid
+GetForeignTableServerOid(Oid relid)
+{
+ Form_pg_foreign_table tableform;
+ HeapTuple tp;
+ Oid serverid;
+
+ tp = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for foreign table %u", relid);
+ tableform = (Form_pg_foreign_table) GETSTRUCT(tp);
+ serverid = tableform->ftserver;
+
+ ReleaseSysCache(tp);
+
+ return serverid;
+}
+
+
+/*
* GetForeignColumnOptions - Get attfdwoptions of given relation/attnum
* as list of DefElem.
*/
@@ -309,11 +332,15 @@ GetFdwRoutine(Oid fdwhandler)
FdwRoutine *
GetFdwRoutineByServer(Oid serverid)
{
+ return GetFdwRoutineByServerId(serverid);
+}
+
+FdwRoutine *
+GetFdwRoutineByServerId(Oid serverid)
+{
HeapTuple tp;
- Form_pg_foreign_data_wrapper fdwform;
Form_pg_foreign_server serverform;
Oid fdwid;
- Oid fdwhandler;
/* Get foreign-data wrapper OID for the server. */
tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(serverid));
@@ -323,6 +350,16 @@ GetFdwRoutineByServer(Oid serverid)
fdwid = serverform->srvfdw;
ReleaseSysCache(tp);
+ return GetFdwRoutineByFdwId(fdwid);
+}
+
+FdwRoutine *
+GetFdwRoutineByFdwId(Oid fdwid)
+{
+ HeapTuple tp;
+ Form_pg_foreign_data_wrapper fdwform;
+ Oid fdwhandler;
+
/* Get handler function OID for the FDW. */
tp = SearchSysCache1(FOREIGNDATAWRAPPEROID, ObjectIdGetDatum(fdwid));
if (!HeapTupleIsValid(tp))
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d799eb8..b9ef533 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1702,6 +1702,16 @@ _outHashPath(StringInfo str, const HashPath *node)
}
static void
+_outForeignJoinPath(StringInfo str, const ForeignJoinPath *node)
+{
+ WRITE_NODE_TYPE("FOREIGNJOINPATH");
+
+ _outJoinPathInfo(str, (const JoinPath *) node);
+
+ WRITE_NODE_FIELD(fdw_private);
+}
+
+static void
_outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
{
WRITE_NODE_TYPE("PLANNERGLOBAL");
@@ -1800,6 +1810,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(subplan);
WRITE_NODE_FIELD(subroot);
WRITE_NODE_FIELD(subplan_params);
+ WRITE_OID_FIELD(fdwid);
/* we don't try to print fdwroutine or fdw_private */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_NODE_FIELD(joininfo);
@@ -3124,6 +3135,9 @@ _outNode(StringInfo str, const void *obj)
case T_HashPath:
_outHashPath(str, obj);
break;
+ case T_ForeignJoinPath:
+ _outForeignJoinPath(str, obj);
+ break;
case T_PlannerGlobal:
_outPlannerGlobal(str, obj);
break;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 659daa2..637e2d8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1782,8 +1782,8 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors)
{
- Path *outer_path = path->outerjoinpath;
- Path *inner_path = path->innerjoinpath;
+ Path *outer_path = path->jpath.outerjoinpath;
+ Path *inner_path = path->jpath.innerjoinpath;
double outer_path_rows = outer_path->rows;
double inner_path_rows = inner_path->rows;
Cost startup_cost = workspace->startup_cost;
@@ -1794,10 +1794,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
double ntuples;
/* Mark the path with the correct row estimate */
- if (path->path.param_info)
- path->path.rows = path->path.param_info->ppi_rows;
+ if (path->jpath.path.param_info)
+ path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
else
- path->path.rows = path->path.parent->rows;
+ path->jpath.path.rows = path->jpath.path.parent->rows;
/*
* We could include disable_cost in the preliminary estimate, but that
@@ -1809,7 +1809,7 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
/* cost of source data */
- if (path->jointype == JOIN_SEMI || path->jointype == JOIN_ANTI)
+ if (path->jpath.jointype == JOIN_SEMI || path->jpath.jointype == JOIN_ANTI)
{
double outer_matched_rows = workspace->outer_matched_rows;
Selectivity inner_scan_frac = workspace->inner_scan_frac;
@@ -1856,13 +1856,13 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
}
/* CPU costs */
- cost_qual_eval(&restrict_qual_cost, path->joinrestrictinfo, root);
+ cost_qual_eval(&restrict_qual_cost, path->jpath.joinrestrictinfo, root);
startup_cost += restrict_qual_cost.startup;
cpu_per_tuple = cpu_tuple_cost + restrict_qual_cost.per_tuple;
run_cost += cpu_per_tuple * ntuples;
- path->path.startup_cost = startup_cost;
- path->path.total_cost = startup_cost + run_cost;
+ path->jpath.path.startup_cost = startup_cost;
+ path->jpath.path.total_cost = startup_cost + run_cost;
}
/*
@@ -3306,14 +3306,14 @@ compute_semi_anti_join_factors(PlannerInfo *root,
static bool
has_indexed_join_quals(NestPath *joinpath)
{
- Relids joinrelids = joinpath->path.parent->relids;
- Path *innerpath = joinpath->innerjoinpath;
+ Relids joinrelids = joinpath->jpath.path.parent->relids;
+ Path *innerpath = joinpath->jpath.innerjoinpath;
List *indexclauses;
bool found_one;
ListCell *lc;
/* If join still has quals to evaluate, it's not fast */
- if (joinpath->joinrestrictinfo != NIL)
+ if (joinpath->jpath.joinrestrictinfo != NIL)
return false;
/* Nor if the inner path isn't parameterized at all */
if (innerpath->param_info == NULL)
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 030158d..29ee414 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,6 +17,7 @@
#include <math.h>
#include "executor/executor.h"
+#include "foreign/fdwapi.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -52,7 +53,6 @@ static List *select_mergejoin_clauses(PlannerInfo *root,
JoinType jointype,
bool *mergejoin_allowed);
-
/*
* add_paths_to_joinrel
* Given a join relation and two component rels from which it can be made,
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 9683560..fc3ef81 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -83,11 +83,11 @@ static CustomScan *create_customscan_plan(PlannerInfo *root,
CustomPath *best_path,
List *tlist, List *scan_clauses);
static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
- Plan *outer_plan, Plan *inner_plan);
+ List *tlist, Plan *outer_plan, Plan *inner_plan);
static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
- Plan *outer_plan, Plan *inner_plan);
+ List *tlist, Plan *outer_plan, Plan *inner_plan);
static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path,
- Plan *outer_plan, Plan *inner_plan);
+ List *tlist, Plan *outer_plan, Plan *inner_plan);
static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
static void process_subquery_nestloop_params(PlannerInfo *root,
@@ -238,6 +238,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
case T_CteScan:
case T_WorkTableScan:
case T_ForeignScan:
+ case T_ForeignJoinPath:
case T_CustomScan:
plan = create_scan_plan(root, best_path);
break;
@@ -409,6 +410,7 @@ create_scan_plan(PlannerInfo *root, Path *best_path)
break;
case T_ForeignScan:
+ case T_ForeignJoinPath:
plan = (Plan *) create_foreignscan_plan(root,
(ForeignPath *) best_path,
tlist,
@@ -611,6 +613,7 @@ create_gating_plan(PlannerInfo *root, Plan *plan, List *quals)
static Plan *
create_join_plan(PlannerInfo *root, JoinPath *best_path)
{
+ List *tlist;
Plan *outer_plan;
Plan *inner_plan;
Plan *plan;
@@ -625,27 +628,34 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
inner_plan = create_plan_recurse(root, best_path->innerjoinpath);
+ if (best_path->path.pathtype == T_NestLoop)
+ {
+ /* Restore curOuterRels */
+ bms_free(root->curOuterRels);
+ root->curOuterRels = saveOuterRels;
+ }
+ tlist = build_path_tlist(root, &best_path->path);
+
switch (best_path->path.pathtype)
{
case T_MergeJoin:
plan = (Plan *) create_mergejoin_plan(root,
(MergePath *) best_path,
+ tlist,
outer_plan,
inner_plan);
break;
case T_HashJoin:
plan = (Plan *) create_hashjoin_plan(root,
(HashPath *) best_path,
+ tlist,
outer_plan,
inner_plan);
break;
case T_NestLoop:
- /* Restore curOuterRels */
- bms_free(root->curOuterRels);
- root->curOuterRels = saveOuterRels;
-
plan = (Plan *) create_nestloop_plan(root,
(NestPath *) best_path,
+ tlist,
outer_plan,
inner_plan);
break;
@@ -2115,12 +2125,12 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
static NestLoop *
create_nestloop_plan(PlannerInfo *root,
NestPath *best_path,
+ List *tlist,
Plan *outer_plan,
Plan *inner_plan)
{
NestLoop *join_plan;
- List *tlist = build_path_tlist(root, &best_path->path);
- List *joinrestrictclauses = best_path->joinrestrictinfo;
+ List *joinrestrictclauses = best_path->jpath.joinrestrictinfo;
List *joinclauses;
List *otherclauses;
Relids outerrelids;
@@ -2134,7 +2144,7 @@ create_nestloop_plan(PlannerInfo *root,
/* Get the join qual clauses (in plain expression form) */
/* Any pseudoconstant clauses are ignored here */
- if (IS_OUTER_JOIN(best_path->jointype))
+ if (IS_OUTER_JOIN(best_path->jpath.jointype))
{
extract_actual_join_clauses(joinrestrictclauses,
&joinclauses, &otherclauses);
@@ -2147,7 +2157,7 @@ create_nestloop_plan(PlannerInfo *root,
}
/* Replace any outer-relation variables with nestloop params */
- if (best_path->path.param_info)
+ if (best_path->jpath.path.param_info)
{
joinclauses = (List *)
replace_nestloop_params(root, (Node *) joinclauses);
@@ -2159,7 +2169,7 @@ create_nestloop_plan(PlannerInfo *root,
* Identify any nestloop parameters that should be supplied by this join
* node, and move them from root->curOuterParams to the nestParams list.
*/
- outerrelids = best_path->outerjoinpath->parent->relids;
+ outerrelids = best_path->jpath.outerjoinpath->parent->relids;
nestParams = NIL;
prev = NULL;
for (cell = list_head(root->curOuterParams); cell; cell = next)
@@ -2196,9 +2206,9 @@ create_nestloop_plan(PlannerInfo *root,
nestParams,
outer_plan,
inner_plan,
- best_path->jointype);
+ best_path->jpath.jointype);
- copy_path_costsize(&join_plan->join.plan, &best_path->path);
+ copy_path_costsize(&join_plan->join.plan, &best_path->jpath.path);
return join_plan;
}
@@ -2206,10 +2216,10 @@ create_nestloop_plan(PlannerInfo *root,
static MergeJoin *
create_mergejoin_plan(PlannerInfo *root,
MergePath *best_path,
+ List *tlist,
Plan *outer_plan,
Plan *inner_plan)
{
- List *tlist = build_path_tlist(root, &best_path->jpath.path);
List *joinclauses;
List *otherclauses;
List *mergeclauses;
@@ -2501,10 +2511,10 @@ create_mergejoin_plan(PlannerInfo *root,
static HashJoin *
create_hashjoin_plan(PlannerInfo *root,
HashPath *best_path,
+ List *tlist,
Plan *outer_plan,
Plan *inner_plan)
{
- List *tlist = build_path_tlist(root, &best_path->jpath.path);
List *joinclauses;
List *otherclauses;
List *hashclauses;
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 319e8b2..dd81764 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1710,9 +1710,9 @@ create_nestloop_path(PlannerInfo *root,
restrict_clauses = jclauses;
}
- pathnode->path.pathtype = T_NestLoop;
- pathnode->path.parent = joinrel;
- pathnode->path.param_info =
+ pathnode->jpath.path.pathtype = T_NestLoop;
+ pathnode->jpath.path.parent = joinrel;
+ pathnode->jpath.path.param_info =
get_joinrel_parampathinfo(root,
joinrel,
outer_path,
@@ -1720,11 +1720,11 @@ create_nestloop_path(PlannerInfo *root,
sjinfo,
required_outer,
&restrict_clauses);
- pathnode->path.pathkeys = pathkeys;
- pathnode->jointype = jointype;
- pathnode->outerjoinpath = outer_path;
- pathnode->innerjoinpath = inner_path;
- pathnode->joinrestrictinfo = restrict_clauses;
+ pathnode->jpath.path.pathkeys = pathkeys;
+ pathnode->jpath.jointype = jointype;
+ pathnode->jpath.outerjoinpath = outer_path;
+ pathnode->jpath.innerjoinpath = inner_path;
+ pathnode->jpath.joinrestrictinfo = restrict_clauses;
final_cost_nestloop(root, pathnode, workspace, sjinfo, semifactors);
@@ -1859,6 +1859,58 @@ create_hashjoin_path(PlannerInfo *root,
}
/*
+ * create_foreignjoin_path
+ * Creates a pathnode corresponding to a foreign join between two relations.
+ * Unlike the similar functions for other join types, this does not call a
+ * final_cost_* function, so the FDW has to set the cost information itself.
+ *
+ * 'joinrel' is the join relation
+ * 'jointype' is the type of join required
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'semifactors' contains valid data if jointype is SEMI or ANTI
+ * 'outer_path' is the cheapest outer path
+ * 'inner_path' is the cheapest inner path
+ * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
+ * 'pathkeys' are the path keys of the new join path
+ * 'required_outer' is the set of required outer rels
+ *
+ */
+ForeignJoinPath *
+create_foreignjoin_path(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Path *outer_path,
+ Path *inner_path,
+ List *restrict_clauses,
+ List *pathkeys,
+ Relids required_outer)
+{
+ ForeignJoinPath *pathnode = makeNode(ForeignJoinPath);
+
+ pathnode->jpath.path.pathtype = T_ForeignJoinPath;
+ pathnode->jpath.path.parent = joinrel;
+ pathnode->jpath.path.param_info =
+ get_joinrel_parampathinfo(root,
+ joinrel,
+ outer_path,
+ inner_path,
+ sjinfo,
+ required_outer,
+ &restrict_clauses);
+ pathnode->jpath.path.pathkeys = pathkeys;
+ pathnode->jpath.jointype = jointype;
+ pathnode->jpath.outerjoinpath = outer_path;
+ pathnode->jpath.innerjoinpath = inner_path;
+ pathnode->jpath.joinrestrictinfo = restrict_clauses;
+
+ pathnode->fdw_private = NIL;
+
+ return pathnode;
+}
+
+/*
* reparameterize_path
* Attempt to modify a Path to have greater parameterization
*
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 627bc53..2e5d91d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -27,6 +27,7 @@
#include "catalog/catalog.h"
#include "catalog/heap.h"
#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0429c76..93d7e79 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -121,6 +121,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
rel->subplan = NULL;
rel->subroot = NULL;
rel->subplan_params = NIL;
+ rel->fdwid = InvalidOid;
rel->fdwroutine = NULL;
rel->fdw_private = NULL;
rel->baserestrictinfo = NIL;
@@ -383,7 +384,17 @@ build_join_rel(PlannerInfo *root,
joinrel->subplan = NULL;
joinrel->subroot = NULL;
joinrel->subplan_params = NIL;
- joinrel->fdwroutine = NULL;
+ /* propagate common server information up to join relation */
+ if (inner_rel->fdwid == outer_rel->fdwid)
+ {
+ joinrel->fdwroutine = inner_rel->fdwroutine;
+ joinrel->fdwid = inner_rel->fdwid;
+ }
+ else
+ {
+ joinrel->fdwid = InvalidOid;
+ joinrel->fdwroutine = NULL;
+ }
joinrel->fdw_private = NULL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 0faad55..0f713fc 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,24 @@ typedef void (*EndForeignModify_function) (EState *estate,
typedef int (*IsForeignRelUpdatable_function) (Relation rel);
+typedef void (*GetForeignJoinPath_function) (PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
+
+typedef ForeignScan *(*GetForeignJoinPlan_function) (PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ List *tlist,
+ List *joinclauses,
+ List *otherclauses,
+ Plan *outer_plan,
+ Plan *inner_plan);
+
typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
struct ExplainState *es);
@@ -157,6 +175,8 @@ typedef struct FdwRoutine
extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
extern FdwRoutine *GetFdwRoutineByServer(Oid server_id);
extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
+extern FdwRoutine *GetFdwRoutineByServerId(Oid serverid);
+extern FdwRoutine *GetFdwRoutineByFdwId(Oid fdwid);
extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
extern Oid GetForeignServerForRelation(Relation relation);
extern bool IsImportableForeignTable(const char *tablename,
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index ac080d7..b9e120a 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -75,6 +75,7 @@ extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
bool missing_ok);
extern ForeignTable *GetForeignTable(Oid relid);
+extern Oid GetForeignTableServerOid(Oid relid);
extern List *GetForeignColumnOptions(Oid relid, AttrNumber attnum);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index bc71fea..160c0f6 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -224,6 +224,7 @@ typedef enum NodeTag
T_NestPath,
T_MergePath,
T_HashPath,
+ T_ForeignJoinPath,
T_TidPath,
T_ForeignPath,
T_CustomPath,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f9db9ce..6a949fa 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -461,6 +461,7 @@ typedef struct RelOptInfo
PlannerInfo *subroot; /* if subquery */
List *subplan_params; /* if subquery */
/* use "struct FdwRoutine" to avoid including fdwapi.h here */
+ Oid fdwid; /* if foreign table */
struct FdwRoutine *fdwroutine; /* if foreign table */
Oid fdw_server; /* if foreign table */
void *fdw_private; /* if foreign table */
@@ -1046,7 +1047,10 @@ typedef struct JoinPath
* A nested-loop path needs no special fields.
*/
-typedef JoinPath NestPath;
+typedef struct NestPath
+{
+ JoinPath jpath;
+} NestPath;
/*
* A mergejoin path has these fields.
@@ -1102,6 +1106,22 @@ typedef struct HashPath
} HashPath;
/*
+ * ForeignJoinPath represents a join between two relations that consist of
+ * foreign tables.
+ *
+ * fdw_private stores FDW private data about the join. While fdw_private is
+ * not actually touched by the core code during normal operations, it's
+ * generally a good idea to use a representation that can be dumped by
+ * nodeToString(), so that you can examine the structure during debugging
+ * with tools like pprint().
+ */
+typedef struct ForeignJoinPath
+{
+ JoinPath jpath;
+ List *fdw_private;
+} ForeignJoinPath;
+
+/*
* Restriction clause info.
*
* We create one of these for each AND sub-clause of a restriction condition
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 26b17f5..7a1f236 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -124,6 +124,17 @@ extern HashPath *create_hashjoin_path(PlannerInfo *root,
Relids required_outer,
List *hashclauses);
+extern ForeignJoinPath *create_foreignjoin_path(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Path *outer_path,
+ Path *inner_path,
+ List *restrict_clauses,
+ List *pathkeys,
+ Relids required_outer);
+
extern Path *reparameterize_path(PlannerInfo *root, Path *path,
Relids required_outer,
double loop_count);
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
I'm working on $SUBJECT and would like to get comments about the
design. Attached patch is for the design below. Note that the patch
requires Kaigai-san's custom foreign join patch[1]
For the record, I'm not particularly on-board with custom scan, and
even less so with custom join. I don't want FDW features like this
depending on those kluges, especially not when you're still in need
of core-code changes (which really points up the inadequacy of those
concepts).
Also, please don't redefine struct NestPath like that. That adds a
whole bunch of notational churn (and backpatch risk) for no value
that I can see. It might've been better to do it like that in a
green field, but you're about twenty years too late to do it now.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada
<shigeru.hanada@gmail.com> wrote:
I'm working on $SUBJECT and would like to get comments about the
design. Attached patch is for the design below.
I'm glad you are working on this.
1. Join source relations
As described above, postgres_fdw (and most of SQL-based FDWs) needs to
check that 1) all foreign tables in the join belong to a server, and
2) all foreign tables have same checkAsUser.
In addition to that, I add an extra limitation that both inner/outer
should be plain foreign tables, not the result of a foreign join. This
limitation makes the SQL generator simple. Fundamentally it's possible to
join even join relations, so N-way join is listed as an enhancement item
below.
It seems pretty important to me that we have a way to push the entire
join nest down. Being able to push down a 2-way join but not more
seems like quite a severe limitation.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hanada-san,
Thanks for proposing this great functionality that people have been waiting for.
On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada <shigeru.hanada@gmail.com>
wrote:
I'm working on $SUBJECT and would like to get comments about the
design. Attached patch is for the design below.
I'm glad you are working on this.
1. Join source relations
As described above, postgres_fdw (and most of SQL-based FDWs) needs to
check that 1) all foreign tables in the join belong to a server, and
2) all foreign tables have same checkAsUser.
In addition to that, I add an extra limitation that both inner/outer
should be plain foreign tables, not the result of a foreign join. This
limitation makes the SQL generator simple. Fundamentally it's possible to
join even join relations, so N-way join is listed as an enhancement item
below.
It seems pretty important to me that we have a way to push the entire join
nest down. Being able to push down a 2-way join but not more seems like
quite a severe limitation.
As long as we don't care about the simplicity/elegance of the remote
query, what we need to do is not complicated. All the optimization work
is the responsibility of the remote system.
Let me explain my thought:
We have three cases to consider: (1) a join between foreign tables,
which is the supported case, (2) a join where either of the relations is a
foreign join, and (3) a join where both relations are foreign joins.
In case (1), the remote query would have the following form:
SELECT <tlist> FROM FT1 JOIN FT2 ON <cond> WHERE <qual>
In case (2) or (3), because we have already constructed the inner/outer join,
it is not difficult to replace FT1 or FT2 above with a sub-query, like:
SELECT <tlist> FROM FT3 JOIN
(SELECT <tlist> FROM FT1 JOIN FT2 ON <cond> WHERE <qual>) as FJ1
ON <joincond> WHERE <qual>
What do you think?
Let me comment on some other points at this moment:
* Enhancement in src/include/foreign/fdwapi.h
It seems to me that GetForeignJoinPath_function and GetForeignJoinPlan_function
are not used anywhere. Were these definitions left over from your previous
work by oversight?
Now ForeignJoinPath is added via set_join_pathlist_hook, not via a callback
in FdwRoutine.
* Is ForeignJoinPath really needed?
I guess the reason you added ForeignJoinPath is to have the
inner_path/outer_path fields. If we want to keep the paths of the underlying
relations, a straightforward way to express the concept (a join relation is
replaced by a foreign/custom scan on the result set of a remote join) is to
enhance the fields of ForeignPath.
How about adding "List *fdw_subpaths" to save the list of
underlying Path nodes? It would also allow more than two relations
to be joined.
(Probably, it should be a feature of the interface portion. I may have to
enhance my portion.)
* Why NestPath is re-defined?
-typedef JoinPath NestPath;
+typedef struct NestPath
+{
+ JoinPath jpath;
+} NestPath;
It looks to me like this change makes the patch larger...
Best regards,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hanada-san,
One other question from my side:
How does postgres_fdw solve the varno/varattno mapping when it
replaces a join of relations with a foreign scan?
Let me explain the issue using an example. If the SELECT has a target
list that references the 2nd column of the inner relation and the 2nd column
of the outer relation, how should the varno/varattno of the ForeignScan
be assigned?
If the FDW driver does not set fdw_ps_tlist, setrefs.c treats
this ForeignScan as an ordinary relation scan, so each Var keeps
a non-special varno (even though it may be shifted by rtoffset
in setrefs.c).
Then ExecEvalScalarVar() is invoked in the executor to evaluate the
value from the fetched tuple. At that point, which slot and attribute will
be referenced? setrefs.c leaves the varattno of each Var node unchanged, so
both the Var node that references the 2nd column of the inner relation
and the one that references the 2nd column of the outer relation will
reference the 2nd column of econtext->ecxt_scantuple. However, it is
uncertain which column of which foreign table is mapped to the 2nd column
of ecxt_scantuple. So the FDW needs to inform the planner which underlying
column is mapped to each entry of the pseudo scan tlist.
To put it another way:
SELECT
ft_1.second_col X, --> varno=1 / varattno=2
ft_2.second_col Y --> varno=2 / varattno=2
FROM
ft_1 NATURAL JOIN ft_2;
When the FROM clause is replaced by a ForeignScan, the relevant varattno
values also need to be updated according to the underlying remote query.
If postgres_fdw runs the following remote query, X should have varattno=1
and Y should have varattno=2 in the pseudo scan tlist.
remote: SELECT t_1.second_col, t_2.second_col
FROM t_1 NATURAL JOIN t_2;
The FDW driver can inform the planner of this mapping by putting a list
of TargetEntry into the fdw_ps_tlist field of ForeignScan.
In the above example, fdw_ps_tlist would have two elements, each of which
has a Var node referencing one of the underlying foreign tables.
The patch to replace joins with foreign/custom scans adds functionality
to fix up varno/varattno in these cases.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Tuesday, December 16, 2014 9:01 AM
To: Robert Haas; Shigeru Hanada
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tables
2014-12-16 0:45 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>:
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
I'm working on $SUBJECT and would like to get comments about the
design. Attached patch is for the design below. Note that the patch
requires Kaigai-san's custom foreign join patch[1]
For the record, I'm not particularly on-board with custom scan, and
even less so with custom join. I don't want FDW features like this
depending on those kluges, especially not when you're still in need
of core-code changes (which really points up the inadequacy of those
concepts).
This design derives from the discussion about custom scan/join. In that
discussion Robert suggested a common infrastructure for replacing a Join
path with a Scan node, and I agree to use such common infrastructure here.
One concern is that introducing a hook function interface seems to break
the FdwRoutine interface boundary...
Also, please don't redefine struct NestPath like that. That adds a
whole bunch of notational churn (and backpatch risk) for no value
that I can see. It might've been better to do it like that in a
green field, but you're about twenty years too late to do it now.
Ok, will revert it.
--
Shigeru HANADA
2014-12-16 1:22 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada
<shigeru.hanada@gmail.com> wrote:
I'm working on $SUBJECT and would like to get comments about the
design. Attached patch is for the design below.
I'm glad you are working on this.
1. Join source relations
As described above, postgres_fdw (and most of SQL-based FDWs) needs to
check that 1) all foreign tables in the join belong to a server, and
2) all foreign tables have same checkAsUser.
In addition to that, I add an extra limitation that both inner/outer
should be plain foreign tables, not the result of a foreign join. This
limitation makes the SQL generator simple. Fundamentally it's possible to
join even join relations, so N-way join is listed as an enhancement item
below.
It seems pretty important to me that we have a way to push the entire
join nest down. Being able to push down a 2-way join but not more
seems like quite a severe limitation.
Hmm, I agree that supporting N-way joins is very useful. Postgres-XC's SQL
generator seems to give us a hint for such cases; I'll check it out
again.
--
Shigeru HANADA
On Fri, Dec 26, 2014 at 1:48 PM, Shigeru Hanada
<shigeru.hanada@gmail.com> wrote:
Hmm, I agree that supporting N-way joins is very useful. Postgres-XC's SQL
generator seems to give us a hint for such cases; I'll check it out
again.
Switching to "Returned with Feedback", as this patch has been waiting
for feedback for a couple of weeks now.
--
Michael
Hi
I've revised the patch based on Kaigai-san's custom/foreign join patch
posted in the thread below.
/messages/by-id/9A28C8860F777E439AA12E8AEA7694F80108C355@BPXM15GP.gisp.nec.co.jp
It is basically unchanged from the version in the last CF. However, as Robert
commented before, N-way (not only 2-way) joins should be supported in
the first version, by constructing a SELECT statement that contains the source
queries in its FROM clause as inline views (a.k.a. FROM-clause subqueries).
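The inline-view idea can be sketched as a small recursive deparser, assuming a hypothetical binary join tree whose join conditions are already deparsed to strings: each nested join becomes a FROM-clause subquery with its own alias, so an N-way join collapses into a single remote statement. `JoinNode`, `SqlBuf` and `deparse_join_query` are illustrative names, not part of the patch, and `SELECT *` is used only for brevity; it assumes column names are unique across the joined tables, and a real deparser would presumably emit explicit column lists instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical join tree: a leaf is a remote table, an inner node a join. */
typedef enum { NODE_TABLE, NODE_JOIN } NodeKind;

typedef struct JoinNode
{
    NodeKind    kind;
    const char *relname;    /* NODE_TABLE: remote table name */
    const char *alias;      /* alias of the table or inline view */
    const char *jointype;   /* NODE_JOIN: "INNER JOIN", "LEFT JOIN", ... */
    const char *joinqual;   /* NODE_JOIN: ON condition, already deparsed */
    const struct JoinNode *outer;
    const struct JoinNode *inner;
} JoinNode;

/* Fixed-size output buffer; a real deparser would use a growable buffer. */
typedef struct SqlBuf
{
    char   data[1024];
    size_t len;
} SqlBuf;

static void
buf_append(SqlBuf *buf, const char *s)
{
    size_t n = strlen(s);

    assert(buf->len + n < sizeof(buf->data));   /* sketch: no reallocation */
    memcpy(buf->data + buf->len, s, n + 1);     /* copy including the NUL */
    buf->len += n;
}

static void deparse_from_item(SqlBuf *buf, const JoinNode *node);

/* Emit "<outer> <jointype> <inner> ON <qual>" for a join node. */
static void
deparse_join_body(SqlBuf *buf, const JoinNode *node)
{
    deparse_from_item(buf, node->outer);
    buf_append(buf, " ");
    buf_append(buf, node->jointype);
    buf_append(buf, " ");
    deparse_from_item(buf, node->inner);
    buf_append(buf, " ON ");
    buf_append(buf, node->joinqual);
}

/*
 * A table becomes "relname alias"; a nested join becomes an inline view,
 * "(SELECT * FROM ...) alias", so joins can nest to any depth.
 */
static void
deparse_from_item(SqlBuf *buf, const JoinNode *node)
{
    if (node->kind == NODE_TABLE)
        buf_append(buf, node->relname);
    else
    {
        buf_append(buf, "(SELECT * FROM ");
        deparse_join_body(buf, node);
        buf_append(buf, ")");
    }
    buf_append(buf, " ");
    buf_append(buf, node->alias);
}

/* Deparse the top-level join into one remote SELECT statement. */
static const char *
deparse_join_query(SqlBuf *buf, const JoinNode *top)
{
    assert(top->kind == NODE_JOIN);
    buf->len = 0;
    buf->data[0] = '\0';
    buf_append(buf, "SELECT * FROM ");
    deparse_join_body(buf, top);
    return buf->data;
}
```

For example, joining ft1 t1 to ft2 t2 (as inline view s1) and left-joining the result to ft3 t3 yields `SELECT * FROM (SELECT * FROM ft1 t1 INNER JOIN ft2 t2 ON t1.a = t2.b) s1 LEFT JOIN ft3 t3 ON s1.a = t3.c`.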
--
Shigeru HANADA
Attachments:
foreign_join.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 59cb053..6795e5f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -86,7 +86,7 @@ typedef struct foreign_loc_cxt
typedef struct deparse_expr_cxt
{
PlannerInfo *root; /* global planner state */
- RelOptInfo *foreignrel; /* the foreign relation we are planning for */
+ Relids rels; /* list of foreign tables to be deparsed */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params */
} deparse_expr_cxt;
@@ -108,6 +108,7 @@ static void deparseTargetList(StringInfo buf,
Index rtindex,
Relation rel,
Bitmapset *attrs_used,
+ const char *alias,
List **retrieved_attrs);
static void deparseReturningList(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
@@ -115,7 +116,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
List *returningList,
List **retrieved_attrs);
static void deparseColumnRef(StringInfo buf, int varno, int varattno,
- PlannerInfo *root);
+ PlannerInfo *root, const char *alias);
static void deparseRelation(StringInfo buf, Relation rel);
static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
static void deparseVar(Var *node, deparse_expr_cxt *context);
@@ -679,33 +680,119 @@ is_builtin(Oid oid)
void
deparseSelectSql(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
- Bitmapset *attrs_used,
- List **retrieved_attrs)
+ List *rels)
{
- RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
- Relation rel;
+ StringInfoData frombuf;
+ ListCell *lc;
+ bool first_rel = true;
+ Relids relids = NULL;
- /*
- * Core code already has some lock on each rel being planned, so we can
- * use NoLock here.
- */
- rel = heap_open(rte->relid, NoLock);
+ initStringInfo(&frombuf);
- /*
- * Construct SELECT list
- */
- appendStringInfoString(buf, "SELECT ");
- deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
- retrieved_attrs);
+ /* Construct the set of relids needed to deparse a query with multiple tables. */
+ foreach(lc, rels)
+ {
+ PgFdwDeparseRel *dr = (PgFdwDeparseRel *) lfirst(lc);
+ relids = bms_add_member(relids, dr->baserel->relid);
+ }
- /*
- * Construct FROM clause
- */
- appendStringInfoString(buf, " FROM ");
- deparseRelation(buf, rel);
+ /* Loop through relation list and deparse SELECT query. */
+ foreach(lc, rels)
+ {
+ PgFdwDeparseRel *dr = (PgFdwDeparseRel *) lfirst(lc);
+ RangeTblEntry *rte = planner_rt_fetch(dr->baserel->relid, root);
+ Relation rel;
+ const char *alias;
- heap_close(rel, NoLock);
+ /*
+ * Core code already has some lock on each rel being planned, so we can
+ * use NoLock here.
+ */
+ rel = heap_open(rte->relid, NoLock);
+
+ /*
+ * Add alias only when we have multiple relations.
+ */
+ if (list_length(rels) > 1 && rte->alias)
+ alias = rte->alias->aliasname;
+ else
+ alias = NULL;
+
+ /*
+ * Construct SELECT list
+ */
+ if (first_rel)
+ appendStringInfoString(buf, "SELECT ");
+ else
+ appendStringInfoString(buf, ", ");
+ deparseTargetList(buf, root, dr->baserel->relid, rel, dr->attrs_used,
+ alias, dr->retrieved_attrs);
+
+ /*
+ * Construct FROM clause
+ */
+ if (first_rel)
+ appendStringInfoString(&frombuf, " FROM ");
+ else
+ {
+ switch (dr->jointype)
+ {
+ case JOIN_INNER:
+ if (dr->joinclauses)
+ appendStringInfoString(&frombuf, " INNER JOIN ");
+ else
+ /* Currently cross join is not pushed down, though. */
+ appendStringInfoString(&frombuf, " CROSS JOIN ");
+ break;
+ case JOIN_LEFT:
+ appendStringInfoString(&frombuf, " LEFT JOIN ");
+ break;
+ case JOIN_FULL:
+ appendStringInfoString(&frombuf, " FULL JOIN ");
+ break;
+ case JOIN_RIGHT:
+ appendStringInfoString(&frombuf, " RIGHT JOIN ");
+ break;
+ default:
+ elog(ERROR, "unsupported join type for deparse: %d",
+ dr->jointype);
+ break;
+ }
+ }
+ deparseRelation(&frombuf, rel);
+ if (alias)
+ appendStringInfo(&frombuf, " %s", alias);
+
+ if (!first_rel && dr->joinclauses)
+ {
+ ListCell *lc;
+ bool first = true;
+
+ appendStringInfoString(&frombuf, " ON ");
+
+ foreach(lc, dr->joinclauses)
+ {
+ deparse_expr_cxt context;
+ Expr *expr = (Expr *) lfirst(lc);
+
+ context.root = root;
+ context.rels = relids;
+ context.buf = &frombuf;
+ context.params_list = NULL;
+
+ if (!first)
+ appendStringInfoString(&frombuf, " AND ");
+ deparseExpr(expr, &context);
+ first = false;
+ }
+ }
+
+ heap_close(rel, NoLock);
+ first_rel = false;
+ }
+
+ appendStringInfoString(buf, frombuf.data);
+ pfree(frombuf.data);
}
/*
@@ -721,6 +808,7 @@ deparseTargetList(StringInfo buf,
Index rtindex,
Relation rel,
Bitmapset *attrs_used,
+ const char *alias,
List **retrieved_attrs)
{
TupleDesc tupdesc = RelationGetDescr(rel);
@@ -751,7 +839,7 @@ deparseTargetList(StringInfo buf,
appendStringInfoString(buf, ", ");
first = false;
- deparseColumnRef(buf, rtindex, i, root);
+ deparseColumnRef(buf, rtindex, i, root, alias);
*retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
@@ -768,6 +856,8 @@ deparseTargetList(StringInfo buf,
appendStringInfoString(buf, ", ");
first = false;
+ if (alias)
+ appendStringInfo(buf, "%s.", alias);
appendStringInfoString(buf, "ctid");
*retrieved_attrs = lappend_int(*retrieved_attrs,
@@ -796,7 +886,7 @@ deparseTargetList(StringInfo buf,
void
appendWhereClause(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
+ Relids relids,
List *exprs,
bool is_first,
List **params)
@@ -810,7 +900,7 @@ appendWhereClause(StringInfo buf,
/* Set up context struct for recursion */
context.root = root;
- context.foreignrel = baserel;
+ context.rels = relids;
context.buf = buf;
context.params_list = params;
@@ -870,7 +960,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
appendStringInfoString(buf, ", ");
first = false;
- deparseColumnRef(buf, rtindex, attnum, root);
+ deparseColumnRef(buf, rtindex, attnum, root, NULL);
}
appendStringInfoString(buf, ") VALUES (");
@@ -928,7 +1018,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
appendStringInfoString(buf, ", ");
first = false;
- deparseColumnRef(buf, rtindex, attnum, root);
+ deparseColumnRef(buf, rtindex, attnum, root, NULL);
appendStringInfo(buf, " = $%d", pindex);
pindex++;
}
@@ -993,7 +1083,7 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
if (attrs_used != NULL)
{
appendStringInfoString(buf, " RETURNING ");
- deparseTargetList(buf, root, rtindex, rel, attrs_used,
+ deparseTargetList(buf, root, rtindex, rel, attrs_used, NULL,
retrieved_attrs);
}
else
@@ -1088,7 +1178,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
* If it has a column_name FDW option, use that instead of attribute name.
*/
static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root,
+ const char *alias)
{
RangeTblEntry *rte;
char *colname = NULL;
@@ -1124,6 +1215,8 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
if (colname == NULL)
colname = get_relid_attribute_name(rte->relid, varattno);
+ if (alias)
+ appendStringInfo(buf, "%s.", alias);
appendStringInfoString(buf, quote_identifier(colname));
}
@@ -1270,12 +1363,46 @@ static void
deparseVar(Var *node, deparse_expr_cxt *context)
{
StringInfo buf = context->buf;
+ int i;
+ RelOptInfo *rel = NULL;
+ RangeTblEntry *rte = NULL;
- if (node->varno == context->foreignrel->relid &&
- node->varlevelsup == 0)
+ /* Find the RangeTblEntry containing the given Var to determine its alias. */
+ if (bms_is_member(node->varno, context->rels) && node->varlevelsup == 0)
{
- /* Var belongs to foreign table */
- deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ for (i = 1; i < context->root->simple_rel_array_size; i++)
+ {
+ /* Skip empty slot */
+ if (context->root->simple_rel_array[i] == NULL)
+ continue;
+
+ if (context->root->simple_rel_array[i]->relid == node->varno)
+ {
+ rel = context->root->simple_rel_array[i];
+ rte = context->root->simple_rte_array[i];
+ break;
+ }
+ }
+ }
+
+ /*
+ * If the Var is in current level (not in outer subquery), simply deparse
+ * it.
+ */
+ if (rel)
+ {
+ const char *alias;
+
+ /*
+ * Deparse a Var belonging to a foreign table in context->rels, with an
+ * alias name if we are deparsing multiple foreign tables.
+ */
+ if (bms_num_members(context->rels) > 1 && rte->alias)
+ alias = rte->alias->aliasname;
+ else
+ alias = NULL;
+ deparseColumnRef(buf, node->varno, node->varattno, context->root,
+ alias);
}
else
{
@@ -1849,3 +1976,4 @@ printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
appendStringInfo(buf, "((SELECT null::%s)::%s)", ptypename, ptypename);
}
+
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d76e739..0a645b6 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -48,7 +48,8 @@ PG_MODULE_MAGIC;
/*
* FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table. This information is collected by postgresGetForeignRelSize.
+ * foreign table or foreign join. This information is collected by
+ * postgresGetForeignRelSize, or calculated from join source relations.
*/
typedef struct PgFdwRelationInfo
{
@@ -288,6 +289,22 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
+static void postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
+static ForeignScan *postgresGetForeignJoinPlan(PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ List *tlist,
+ List *joinclauses,
+ List *otherclauses,
+ Plan *outer_plan,
+ Plan *inner_plan);
/*
* Helper functions
@@ -368,6 +385,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
/* Support functions for IMPORT FOREIGN SCHEMA */
routine->ImportForeignSchema = postgresImportForeignSchema;
+ /* Support functions for join push-down */
+ routine->GetForeignJoinPath = postgresGetForeignJoinPath;
+ routine->GetForeignJoinPlan = postgresGetForeignJoinPlan;
+
PG_RETURN_POINTER(routine);
}
@@ -752,6 +773,7 @@ postgresGetForeignPlan(PlannerInfo *root,
List *retrieved_attrs;
StringInfoData sql;
ListCell *lc;
+ PgFdwDeparseRel dr;
/*
* Separate the scan_clauses into those that can be executed remotely and
@@ -797,11 +819,15 @@ postgresGetForeignPlan(PlannerInfo *root,
* expressions to be sent as parameters.
*/
initStringInfo(&sql);
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
+ dr.baserel = baserel;
+ dr.jointype = JOIN_INNER;
+ dr.joinclauses = NIL;
+ dr.attrs_used = fpinfo->attrs_used;
+ dr.retrieved_attrs = &retrieved_attrs;
+ deparseSelectSql(&sql, root, list_make1(&dr));
if (remote_conds)
- appendWhereClause(&sql, root, baserel, remote_conds,
- true, &params_list);
+ appendWhereClause(&sql, root, bms_add_member(NULL, baserel->relid),
+ remote_conds, true, &params_list);
/*
* Add FOR UPDATE/SHARE if appropriate. We apply locking during the
@@ -906,13 +932,23 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
* Identify which user to do the remote access as. This should match what
* ExecCheckRTEPerms() does.
*/
- rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
- userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+ if (fsplan->scan.scanrelid > 0)
+ {
+ rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+ userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+ fsstate->rel = node->ss.ss_currentRelation;
+ table = GetForeignTable(RelationGetRelid(fsstate->rel));
+ server = GetForeignServer(table->serverid);
+ }
+ else
+ {
+ /* XXX how can we determine userid to use for join cases? */
+ userid = GetCurrentRoleId();
+ server = GetForeignServer(16409);
+ }
/* Get info about foreign table. */
- fsstate->rel = node->ss.ss_currentRelation;
- table = GetForeignTable(RelationGetRelid(fsstate->rel));
- server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);
/*
@@ -944,7 +980,16 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ALLOCSET_SMALL_MAXSIZE);
/* Get info we'll need for input data conversion. */
- fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+ if (fsplan->scan.scanrelid > 0)
+ fsstate->attinmeta =
+ TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+ else
+ {
+ TupleDesc ps_tupdesc;
+
+ ps_tupdesc = ExecTypeFromTL(fsplan->fdw_ps_tlist, false);
+ fsstate->attinmeta = TupleDescGetAttInMetadata(ps_tupdesc);
+ }
/* Prepare for output conversion of parameters used in remote query. */
numParams = list_length(fsplan->fdw_exprs);
@@ -1725,10 +1770,12 @@ estimate_path_cost_size(PlannerInfo *root,
List *remote_join_conds;
List *local_join_conds;
StringInfoData sql;
+ Relids relids;
List *retrieved_attrs;
PGconn *conn;
Selectivity local_sel;
QualCost local_cost;
+ PgFdwDeparseRel dr;
/*
* join_conds might contain both clauses that are safe to send across,
@@ -1743,14 +1790,19 @@ estimate_path_cost_size(PlannerInfo *root,
* dummy values.
*/
initStringInfo(&sql);
+ dr.baserel = baserel;
+ dr.jointype = JOIN_INNER;
+ dr.joinclauses = NIL;
+ dr.attrs_used = fpinfo->attrs_used;
+ dr.retrieved_attrs = &retrieved_attrs;
appendStringInfoString(&sql, "EXPLAIN ");
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
+ deparseSelectSql(&sql, root, list_make1(&dr));
+ relids = bms_add_member(NULL, baserel->relid);
if (fpinfo->remote_conds)
- appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
+ appendWhereClause(&sql, root, relids, fpinfo->remote_conds,
true, NULL);
if (remote_join_conds)
- appendWhereClause(&sql, root, baserel, remote_join_conds,
+ appendWhereClause(&sql, root, relids, remote_join_conds,
(fpinfo->remote_conds == NIL), NULL);
/* Get the remote estimate */
@@ -2835,6 +2887,233 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
}
/*
+ * Construct PgFdwRelationInfo from two join sources
+ */
+static PgFdwRelationInfo *
+merge_fpinfo(PgFdwRelationInfo *fpinfo_o, PgFdwRelationInfo *fpinfo_i)
+{
+ PgFdwRelationInfo *fpinfo;
+
+ fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
+ fpinfo->remote_conds =
+ list_concat(fpinfo_o->remote_conds, fpinfo_i->remote_conds);
+ fpinfo->local_conds =
+ list_concat(fpinfo_o->local_conds, fpinfo_i->local_conds);
+
+ fpinfo->attrs_used = NULL; /* Use fdw_ps_tlist */
+ fpinfo->local_conds_cost.startup =
+ fpinfo_o->local_conds_cost.startup + fpinfo_i->local_conds_cost.startup;
+ fpinfo->local_conds_cost.per_tuple =
+ fpinfo_o->local_conds_cost.per_tuple + fpinfo_i->local_conds_cost.per_tuple;
+ fpinfo->local_conds_sel =
+ fpinfo_o->local_conds_sel * fpinfo_i->local_conds_sel;
+ /* XXX we should use join selectivity and join type */
+ fpinfo->rows = Min(fpinfo_o->rows, fpinfo_i->rows);
+ /* XXX we should consider only columns in fdw_ps_tlist */
+ fpinfo->width = fpinfo_o->width + fpinfo_i->width;
+ /* XXX we should estimate better costs */
+
+ fpinfo->use_remote_estimate = false; /* Never use in join case */
+ fpinfo->fdw_startup_cost = fpinfo_o->fdw_startup_cost;
+ fpinfo->fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
+
+ fpinfo->startup_cost = fpinfo->fdw_startup_cost;
+ fpinfo->total_cost =
+ fpinfo->startup_cost + fpinfo->fdw_tuple_cost * fpinfo->rows;
+
+ fpinfo->table = NULL; /* always NULL in join case */
+ fpinfo->server = fpinfo_o->server;
+ fpinfo->user = fpinfo_o->user ? fpinfo_o->user : fpinfo_i->user;
+
+ return fpinfo;
+}
+
+/*
+ * postgresGetForeignJoinPath
+ * Add possible ForeignJoinPath to joinrel.
+ *
+ */
+static void
+postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels)
+{
+ ForeignJoinPath *joinpath;
+ Path *path_o = outerrel->cheapest_total_path;
+ Path *path_i = innerrel->cheapest_total_path;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ Relids required_outer;
+
+ /* Skip considering reversed join combination */
+ elog(DEBUG1, "%s() outer: %d, inner: %d",
+ __func__, outerrel->relid, innerrel->relid);
+ if (outerrel->relid < innerrel->relid)
+ return;
+
+ /*
+ * We support all outer joins in addition to inner join.
+ */
+ if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+ jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+ return;
+
+ /*
+ * Note that CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+ * with empty restrictlist. Pushing down CROSS JOIN produces a larger result
+ * than retrieving each table separately, so we don't push down such joins.
+ */
+ if (jointype == JOIN_INNER && !restrictlist)
+ return;
+
+ /*
+ * Both relations in the join must belong to the same server, and have the
+ * same checkAsUser, so that one connection can execute the SQL for the join.
+ */
+ if (IsA(path_o, ForeignPath))
+ fpinfo_o = ((ForeignPath *) path_o)->path.parent->fdw_private;
+ else if (IsA(path_o, ForeignJoinPath))
+ fpinfo_o = ((ForeignJoinPath *) path_o)->jpath.path.parent->fdw_private;
+ else
+ fpinfo_o = NULL;
+ Assert(fpinfo_o);
+ if (IsA(path_i, ForeignPath))
+ fpinfo_i = ((ForeignPath *) path_i)->path.parent->fdw_private;
+ else if (IsA(path_i, ForeignJoinPath))
+ fpinfo_i = ((ForeignJoinPath *) path_i)->jpath.path.parent->fdw_private;
+ else
+ fpinfo_i = NULL;
+ Assert(fpinfo_i);
+
+ /* Servers should match */
+ if (fpinfo_o->server->serverid != fpinfo_i->server->serverid)
+ return;
+
+ /* Construct fpinfo for the join relation */
+ joinrel->fdw_private = merge_fpinfo(fpinfo_o, fpinfo_i);
+
+ /*
+ * Create a new join path and add it to the joinrel which represents a join
+ * between foreign tables.
+ */
+ required_outer = calc_non_nestloop_required_outer(path_o, path_i);
+ joinpath = create_foreignjoin_path(root,
+ joinrel,
+ jointype,
+ sjinfo,
+ semifactors,
+ path_o,
+ path_i,
+ restrictlist,
+ NIL,
+ required_outer);
+
+ /* TODO determine cost and rows of the join. */
+
+ /* Add generated path into joinrel by add_path(). */
+ add_path(joinrel, (Path *) joinpath);
+
+ /* TODO consider parameterized paths */
+}
+
+/*
+ * postgresGetForeignJoinPlan
+ * Create ForeignJoin plan node from given ForeignJoinPath.
+ *
+ */
+static ForeignScan *
+postgresGetForeignJoinPlan(PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ List *tlist,
+ List *joinclauses,
+ List *otherclauses,
+ Plan *outer_plan,
+ Plan *inner_plan)
+{
+ ForeignScan *join_plan;
+ List *params_list = NIL;
+ List *fdw_private = NIL;
+ List *retrieved_attrs = NIL;
+ Relids relids;
+ StringInfoData sql;
+ ForeignPath *path_o;
+ ForeignPath *path_i;
+ List *retrieved_attrs_o = NIL;
+ List *retrieved_attrs_i = NIL;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ PgFdwDeparseRel dr_o;
+ PgFdwDeparseRel dr_i;
+
+ /*
+ * At the moment we support only joins between foreign tables. This
+ * limitation will be relaxed in future releases.
+ */
+ Assert(IsA(outer_plan, ForeignScan));
+ Assert(IsA(inner_plan, ForeignScan));
+
+ /*
+ * Retrieve Path and PgFdwRelationInfo of underlying ForeignScan to reuse
+ * various information computed during ForeignScan planning.
+ */
+ path_o = (ForeignPath *) best_path->jpath.outerjoinpath;
+ fpinfo_o = path_o->path.parent->fdw_private;
+ path_i = (ForeignPath *) best_path->jpath.innerjoinpath;
+ fpinfo_i = path_i->path.parent->fdw_private;
+
+ /*
+ * Construct deparse information for the two relations.
+ */
+ dr_o.baserel = path_o->path.parent;
+ dr_o.jointype = JOIN_INNER;
+ dr_o.joinclauses = NIL;
+ dr_o.attrs_used = fpinfo_o->attrs_used;
+ dr_o.retrieved_attrs = &retrieved_attrs_o;
+ dr_i.baserel = path_i->path.parent;
+ dr_i.jointype = best_path->jpath.jointype;
+ dr_i.joinclauses = joinclauses;
+ dr_i.attrs_used = fpinfo_i->attrs_used;
+ dr_i.retrieved_attrs = &retrieved_attrs_i;
+
+ relids = NULL;
+ relids = bms_add_member(relids, dr_o.baserel->relid);
+ relids = bms_add_member(relids, dr_i.baserel->relid);
+
+ initStringInfo(&sql);
+ deparseSelectSql(&sql, root, list_make2(&dr_o, &dr_i));
+ if (fpinfo_o->remote_conds)
+ appendWhereClause(&sql, root, relids, fpinfo_o->remote_conds, true,
+ &params_list);
+ if (fpinfo_i->remote_conds)
+ appendWhereClause(&sql, root, relids, fpinfo_i->remote_conds,
+ (fpinfo_o->remote_conds == NULL), &params_list);
+
+ /*
+ * Different from ForeignScan, we store retrieved_attrs as a list of lists.
+ * This allows subsequent processing to distinguish which relation is the
+ * source.
+ */
+ retrieved_attrs = list_make2(list_copy(*dr_o.retrieved_attrs),
+ list_copy(*dr_i.retrieved_attrs));
+ fdw_private = list_make2(makeString(sql.data), retrieved_attrs);
+ elog(DEBUG1, "sql: %s", sql.data);
+ elog(DEBUG1, "retrieved_attrs: %s", nodeToString(retrieved_attrs));
+
+ join_plan = make_foreignscan(tlist,
+ NIL,
+ 0,
+ params_list,
+ fdw_private);
+ return join_plan;
+}
+
+/*
* Create a tuple from the specified row of the PGresult.
*
* rel is the local representation of the foreign table, attinmeta is
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 950c6f7..c71bf21 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -39,6 +39,13 @@ extern int ExtractConnectionOptions(List *defelems,
const char **values);
/* in deparse.c */
+typedef struct PgFdwDeparseRel {
+ RelOptInfo *baserel;
+ JoinType jointype;
+ List *joinclauses;
+ Bitmapset *attrs_used;
+ List **retrieved_attrs;
+} PgFdwDeparseRel;
extern void classifyConditions(PlannerInfo *root,
RelOptInfo *baserel,
List *input_conds,
@@ -49,12 +56,10 @@ extern bool is_foreign_expr(PlannerInfo *root,
Expr *expr);
extern void deparseSelectSql(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
- Bitmapset *attrs_used,
- List **retrieved_attrs);
+ List *rels);
extern void appendWhereClause(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
+ Relids relids,
List *exprs,
bool is_first,
List **params);
diff --git a/doc/src/sgml/custom-scan.sgml b/doc/src/sgml/custom-scan.sgml
new file mode 100644
index 0000000..1d103f5
--- /dev/null
+++ b/doc/src/sgml/custom-scan.sgml
@@ -0,0 +1,278 @@
+<!-- doc/src/sgml/custom-scan.sgml -->
+
+<chapter id="custom-scan">
+ <title>Writing A Custom Scan Provider</title>
+
+ <indexterm zone="custom-scan">
+ <primary>custom scan provider</primary>
+ <secondary>handler for</secondary>
+ </indexterm>
+
+ <para>
+ Prior to query execution, the PostgreSQL planner constructs a plan tree
+ that usually consists of built-in plan nodes (e.g., SeqScan, HashJoin).
+ The custom-scan interface allows extensions to provide a custom-scan plan
+ that implements its own logic, in addition to the built-in nodes, to scan
+ a relation or join relations. Once a custom-scan node is chosen by the planner,
+ the callback functions associated with this custom-scan node are invoked
+ during query execution. A custom-scan provider is responsible for returning
+ the same result set the built-in logic would, but it is free to scan or
+ join the target relations according to its own logic.
+ This chapter explains how to write a custom-scan provider.
+ </para>
+
+ <para>
+ The first thing a custom-scan provider does is add alternative paths
+ to scan a relation (on the <literal>set_rel_pathlist_hook</>) or
+ to join relations (on the <literal>set_join_pathlist_hook</>).
+ It is expected to add a <literal>CustomPath</> node with an estimated execution
+ cost and a set of callbacks defined in <literal>CustomPathMethods</>.
+ Both hooks also give extensions enough information to construct a
+ <literal>CustomPath</> node, such as the <literal>RelOptInfo</> of relations
+ to be scanned, joined, or read as the source of a join. The custom-scan provider
+ is responsible for computing a reasonable cost estimate that is
+ comparable to those of the built-in logic.
+ </para>
+
+ <para>
+ Once a custom path has been chosen by the planner, the custom-scan provider has to
+ populate a plan node according to the <literal>CustomPath</> node.
+ Currently, <literal>CustomScan</> is the only node type that allows
+ implementing custom logic for any <literal>CustomPath</> node.
+ The <literal>CustomScan</> structure has two special fields to keep
+ private information: <literal>custom_exprs</> and <literal>custom_private</>.
+ <literal>custom_exprs</> is intended to hold expression trees
+ that will be updated by <filename>setrefs.c</> and <filename>subselect.c</>.
+ On the other hand, <literal>custom_private</> is expected to hold truly
+ private information that nobody except the custom-scan provider
+ itself will touch. A plan tree containing a custom-scan node can be duplicated
+ using <literal>copyObject()</>, so all data structures stored within
+ these two fields must be safe to pass to <literal>copyObject()</>.
+ </para>
+
+ <para>
+ When an extension implements its own logic to join relations, from the
+ standpoint of the core executor the join looks like a simple relation scan
+ on a pseudo-materialized relation built from multiple source relations.
+ The custom-scan provider is expected to process the join with its own
+ logic internally, then return a set of records according to the tuple
+ descriptor of the scan node.
+ A <literal>CustomScan</> node that replaces a join is not
+ associated with a particular tangible relation, unlike the simple scan case,
+ so the extension needs to inform the core planner of the record type
+ expected from this node.
+ To do so, set <literal>scanrelid</> to zero and set
+ <literal>custom_ps_tlist</> to a valid list of <literal>TargetEntry</>
+ instead. This configuration tells the core planner that this custom-scan
+ node is not associated with a particular physical table, and what
+ record type it returns.
+ </para>
+
+ <para>
+ Once a plan tree is passed to the executor, the executor has to construct
+ plan-state objects according to the supplied plan nodes.
+ Custom scan is no exception: the executor invokes a callback to populate a
+ <literal>CustomScanState</> node when it finds a <literal>CustomScan</> node
+ in the supplied plan tree.
+ Unlike the <literal>CustomScan</> node, it has no fields to save private
+ information, because the custom-scan provider can allocate an object
+ larger than the bare <literal>CustomScanState</> to store various
+ private execution state.
+ This mirrors the relationship of the <literal>ScanState</> structure to
+ <literal>PlanState</>, which extends the generic plan state with
+ scan-specific fields. Likewise, a custom-scan provider can add fields on demand.
+ Once a CustomScanState is constructed, BeginCustomScan is invoked during
+ executor initialization; ExecCustomScan is repeatedly called during
+ execution (returning a TupleTableSlot with each fetched record), then
+ EndCustomScan is invoked on cleanup of the executor.
+ </para>
+
+ <sect1 id="custom-scan-reference">
+ <title>Custom Scan Hooks and Callbacks</title>
+
+ <sect2 id="custom-scan-hooks">
+ <title>Custom Scan Hooks</title>
+ <para>
+ This hook is invoked when the planner investigates the optimal way to
+ scan a particular relation. An extension can add alternative paths if it
+ can provide its own logic to scan the given relation with the given qualifiers.
+<programlisting>
+typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
+ RelOptInfo *rel,
+ Index rti,
+ RangeTblEntry *rte);
+extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+</programlisting>
+ </para>
+
+ <para>
+ This hook is invoked when the planner investigates the optimal way
+ to join relations. An extension can add alternative paths that replace the
+ join with its own logic.
+<programlisting>
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ List *restrictlist,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Relids param_source_rels,
+ Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+</programlisting>
+ </para>
+ </sect2>
+
+ <sect2 id="custom-path-callbacks">
+ <title>Custom Path Callbacks</title>
+ <para>
+ A <literal>CustomPathMethods</> table contains a set of callbacks related
+ to <literal>CustomPath</> node. The core backend invokes these callbacks
+ during query planning.
+ </para>
+ <para>
+ This callback is invoked when the core backend tries to populate a
+ <literal>CustomScan</> node according to the supplied
+ <literal>CustomPath</> node.
+ The custom-scan provider is responsible for allocating a <literal>CustomScan</>
+ node and initializing each of its fields.
+<programlisting>
+Plan *(*PlanCustomPath) (PlannerInfo *root,
+ RelOptInfo *rel,
+ CustomPath *best_path,
+ List *tlist,
+ List *clauses);
+</programlisting>
+ </para>
+ <para>
+ This optional callback is invoked when <literal>nodeToString()</>
+ tries to create a text representation of a <literal>CustomPath</> node.
+ A custom-scan provider can use this callback if it wants to output
+ something additional. Note that expression nodes linked to
+ <literal>custom_private</> are transformed to text representation
+ by the core, so the extension has nothing to do for them.
+<programlisting>
+void (*TextOutCustomPath) (StringInfo str,
+ const CustomPath *node);
+</programlisting>
+ </para>
+ </sect2>
+
+ <sect2 id="custom-scan-callbacks">
+ <title>Custom Scan Callbacks</title>
+ <para>
+ A <literal>CustomScanMethods</> table contains a set of callbacks related to
+ the <literal>CustomScan</> node, which the core backend invokes
+ during query planning and executor initialization.
+ </para>
+ <para>
+ This callback is invoked when the core backend tries to populate a
+ <literal>CustomScanState</> node according to the supplied
+ <literal>CustomScan</> node. The custom-scan provider is responsible for
+ allocating a <literal>CustomScanState</> (or its own data type extended
+ from it), but need not initialize the fields here, because
+ <literal>ExecInitCustomScan</> initializes the fields in
+ <literal>CustomScanState</>, and then <literal>BeginCustomScan</> is
+ called at the end of executor initialization.
+<programlisting>
+Node *(*CreateCustomScanState) (CustomScan *cscan);
+</programlisting>
+ </para>
+ <para>
+ This optional callback is invoked when <literal>nodeToString()</>
+ tries to create a text representation of a <literal>CustomScan</> node.
+ A custom-scan provider can use this callback if it wants to output
+ additional information. Note that the data structure of a
+ <literal>CustomScan</> node cannot be extended, so this callback
+ usually does not need to be implemented.
+<programlisting>
+void (*TextOutCustomScan) (StringInfo str,
+ const CustomScan *node);
+</programlisting>
+ </para>
+ </sect2>
+
+ <sect2 id="custom-exec-callbacks">
+ <title>Custom Exec Callbacks</title>
+ <para>
+ A <literal>CustomExecMethods</> table contains a set of callbacks related to
+ the <literal>CustomScanState</> node. The core backend invokes these
+ callbacks during query execution.
+ </para>
+ <para>
+ This callback allows a custom-scan provider to perform final initialization
+ of the <literal>CustomScanState</> node.
+ The supplied <literal>CustomScanState</> node is partially initialized
+ according to either <literal>scanrelid</> or <literal>custom_ps_tlist</>
+ of the <literal>CustomScan</> node. If the custom-scan provider wants to
+ apply additional initialization to its private fields, it can do so
+ in this callback.
+<programlisting>
+void (*BeginCustomScan) (CustomScanState *node,
+ EState *estate,
+ int eflags);
+</programlisting>
+ </para>
+ <para>
+ This callback requires the custom-scan provider to produce the next tuple
+ of the relation scan. If a tuple is available, it should store it in
+ <literal>ps_ResultTupleSlot</> and return that slot. Otherwise,
+ <literal>NULL</> or an empty slot is returned to signal the end of the
+ relation scan.
+<programlisting>
+TupleTableSlot *(*ExecCustomScan) (CustomScanState *node);
+</programlisting>
+ </para>
+ <para>
+ This callback allows a custom-scan provider to clean up the
+ <literal>CustomScanState</> node. If it holds any private resources on
+ the supplied node that are not released automatically, it can release
+ them here, before the common portion is cleaned up.
+<programlisting>
+void (*EndCustomScan) (CustomScanState *node);
+</programlisting>
+ </para>
+ <para>
+ This callback requires the custom-scan provider to rewind the current scan
+ position to the beginning of the relation. The custom-scan provider is
+ expected to reset its internal state so that the relation scan can restart.
+<programlisting>
+void (*ReScanCustomScan) (CustomScanState *node);
+</programlisting>
+ </para>
+ <para>
+ This optional callback requires the custom-scan provider to save the current
+ scan position in its internal state, so that the position can later be
+ restored by the <literal>RestrPosCustomScan</> callback. It is never
+ called unless the <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*MarkPosCustomScan) (CustomScanState *node);
+</programlisting>
+ </para>
+ <para>
+ This optional callback requires the custom-scan provider to restore the
+ previous scan position that was saved by the <literal>MarkPosCustomScan</>
+ callback. It is never called unless the
+ <literal>CUSTOMPATH_SUPPORT_MARK_RESTORE</> flag is set.
+<programlisting>
+void (*RestrPosCustomScan) (CustomScanState *node);
+</programlisting>
+ </para>
+ <para>
+ This optional callback allows a custom-scan provider to output additional
+ information in <command>EXPLAIN</> output for a custom-scan node.
+ Note that the core already outputs the common items: the target list,
+ qualifiers, and the relation to be scanned. So this callback is only needed
+ when the custom-scan provider wants to show something beyond those items.
+<programlisting>
+void (*ExplainCustomScan) (CustomScanState *node,
+ List *ancestors,
+ ExplainState *es);
+</programlisting>
+ </para>
+ </sect2>
+ </sect1>
+</chapter>
+
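The executor-side contract described in the documentation above can be illustrated with a small standalone model. ExecCustomScan stores a tuple in the result slot and returns that slot, or returns NULL at the end of the scan. ReScanCustomScan rewinds, and MarkPos/RestrPos save and restore the position. This is only a sketch, not PostgreSQL code. ToyScanState, ToyExecMethods and the toy_* functions are hypothetical stand-ins for CustomScanState, CustomExecMethods and a provider's callbacks, and an int field plays the role of the TupleTableSlot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a provider's CustomScanState subclass. */
typedef struct ToyScanState
{
    const int  *rows;      /* data being "scanned" */
    int         nrows;
    int         pos;       /* current scan position */
    int         mark;      /* position saved by mark_pos */
    int         slot;      /* stand-in for ps_ResultTupleSlot */
} ToyScanState;

/* Hypothetical mirror of the CustomExecMethods callback table. */
typedef struct ToyExecMethods
{
    void  (*begin) (ToyScanState *node, const int *rows, int nrows);
    int  *(*exec) (ToyScanState *node);       /* NULL signals end of scan */
    void  (*end) (ToyScanState *node);
    void  (*rescan) (ToyScanState *node);
    void  (*mark_pos) (ToyScanState *node);   /* optional */
    void  (*restr_pos) (ToyScanState *node);  /* optional */
} ToyExecMethods;

static void
toy_begin(ToyScanState *node, const int *rows, int nrows)
{
    /* final initialization of provider-private fields */
    assert(rows != NULL || nrows == 0);
    node->rows = rows;
    node->nrows = nrows;
    node->pos = 0;
    node->mark = 0;
}

static int *
toy_exec(ToyScanState *node)
{
    if (node->pos >= node->nrows)
        return NULL;                        /* end of relation scan */
    node->slot = node->rows[node->pos++];   /* store tuple in result slot */
    return &node->slot;                     /* ...and return that slot */
}

static void
toy_end(ToyScanState *node)
{
    /* release private resources; nothing is heap-allocated here */
    node->rows = NULL;
    node->nrows = 0;
}

static void toy_rescan(ToyScanState *node)    { node->pos = 0; }
static void toy_mark_pos(ToyScanState *node)  { node->mark = node->pos; }
static void toy_restr_pos(ToyScanState *node) { node->pos = node->mark; }

static const ToyExecMethods toy_methods = {
    toy_begin, toy_exec, toy_end, toy_rescan, toy_mark_pos, toy_restr_pos
};

/* Drain the scan and return the sum of all produced values. */
static int
toy_drain(ToyScanState *node)
{
    int  sum = 0;
    int *slot;

    while ((slot = toy_methods.exec(node)) != NULL)
        sum += *slot;
    return sum;
}
```

A real provider would fill a CustomExecMethods table with its own callbacks and reference it from its CustomScanState. The function-pointer table here only mimics that shape.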
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index f03b72a..89fff77 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -93,6 +93,7 @@
<!ENTITY nls SYSTEM "nls.sgml">
<!ENTITY plhandler SYSTEM "plhandler.sgml">
<!ENTITY fdwhandler SYSTEM "fdwhandler.sgml">
+<!ENTITY custom-scan SYSTEM "custom-scan.sgml">
<!ENTITY logicaldecoding SYSTEM "logicaldecoding.sgml">
<!ENTITY protocol SYSTEM "protocol.sgml">
<!ENTITY sources SYSTEM "sources.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index a648a4c..e378d69 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -242,6 +242,7 @@
&nls;
&plhandler;
&fdwhandler;
+ &custom-scan;
&geqo;
&indexam;
&gist;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7cfc9bb..0b8de3f 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1073,9 +1073,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
case T_ValuesScan:
case T_CteScan:
case T_WorkTableScan:
+ ExplainScanTarget((Scan *) plan, es);
+ break;
case T_ForeignScan:
case T_CustomScan:
- ExplainScanTarget((Scan *) plan, es);
+ if (((Scan *) plan)->scanrelid > 0)
+ ExplainScanTarget((Scan *) plan, es);
break;
case T_IndexScan:
{
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3f0d809..2f18a8a 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -251,6 +251,10 @@ ExecAssignScanProjectionInfo(ScanState *node)
/* Vars in an index-only scan's tlist should be INDEX_VAR */
if (IsA(scan, IndexOnlyScan))
varno = INDEX_VAR;
+ /* Likewise for foreign/custom scans on a pseudo relation (scanrelid == 0) */
+ else if (scan->scanrelid == 0 &&
+ (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+ varno = INDEX_VAR;
else
varno = scan->scanrelid;
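The execScan.c change comes down to a three-way choice of varno for the scan projection. A minimal standalone sketch of that decision follows. ToyNodeTag, TOY_INDEX_VAR and toy_scan_projection_varno are hypothetical names, and the numeric value of TOY_INDEX_VAR is assumed to mirror INDEX_VAR in primnodes.h.

```c
#include <assert.h>

/* Hypothetical node tags standing in for the planner's T_* values. */
typedef enum ToyNodeTag
{
    TOY_SEQSCAN,
    TOY_INDEXONLYSCAN,
    TOY_FOREIGNSCAN,
    TOY_CUSTOMSCAN
} ToyNodeTag;

/*
 * Special varno for Vars that reference a tuple laid out by a target list
 * rather than by a base relation.  Assumed to mirror INDEX_VAR.
 */
#define TOY_INDEX_VAR 65002u

/* Mirror of the varno choice made in ExecAssignScanProjectionInfo(). */
static unsigned int
toy_scan_projection_varno(ToyNodeTag tag, unsigned int scanrelid)
{
    /* Vars in an index-only scan's tlist reference the index tuple */
    if (tag == TOY_INDEXONLYSCAN)
        return TOY_INDEX_VAR;

    /*
     * Foreign/custom scans on a pseudo relation (scanrelid == 0) reference
     * the tuple described by fdw_ps_tlist / custom_ps_tlist.
     */
    if (scanrelid == 0 &&
        (tag == TOY_FOREIGNSCAN || tag == TOY_CUSTOMSCAN))
        return TOY_INDEX_VAR;

    /* ordinary scans reference their own range-table entry */
    return scanrelid;
}
```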
diff --git a/src/backend/executor/nodeCustom.c b/src/backend/executor/nodeCustom.c
index b07932b..ca51333 100644
--- a/src/backend/executor/nodeCustom.c
+++ b/src/backend/executor/nodeCustom.c
@@ -23,6 +23,7 @@ CustomScanState *
ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
{
CustomScanState *css;
+ Index scan_relid = cscan->scan.scanrelid;
Relation scan_rel;
/* populate a CustomScanState according to the CustomScan */
@@ -48,12 +49,31 @@ ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
ExecInitScanTupleSlot(estate, &css->ss);
ExecInitResultTupleSlot(estate, &css->ss.ps);
- /* initialize scan relation */
- scan_rel = ExecOpenScanRelation(estate, cscan->scan.scanrelid, eflags);
- css->ss.ss_currentRelation = scan_rel;
- css->ss.ss_currentScanDesc = NULL; /* set by provider */
- ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
-
+ /*
+ * Open the base relation and acquire an appropriate lock on it, then
+ * get the scan type from the relation descriptor, if this custom
+ * scan is on an actual relation.
+ *
+ * On the other hand, a custom scan may scan a pseudo relation, which
+ * is usually the result set of a join performed by an external
+ * computing resource, or something similar. In that case the scan
+ * type has to be taken from the pseudo-scan target list, which the
+ * custom-scan provider is expected to assign.
+ */
+ if (scan_relid > 0)
+ {
+ scan_rel = ExecOpenScanRelation(estate, scan_relid, eflags);
+ css->ss.ss_currentRelation = scan_rel;
+ css->ss.ss_currentScanDesc = NULL; /* set by provider */
+ ExecAssignScanType(&css->ss, RelationGetDescr(scan_rel));
+ }
+ else
+ {
+ TupleDesc ps_tupdesc;
+
+ ps_tupdesc = ExecTypeFromTL(cscan->custom_ps_tlist, false);
+ ExecAssignScanType(&css->ss, ps_tupdesc);
+ }
css->ss.ps.ps_TupFromTlist = false;
/*
@@ -89,11 +109,11 @@ ExecEndCustomScan(CustomScanState *node)
/* Clean out the tuple table */
ExecClearTuple(node->ss.ps.ps_ResultTupleSlot);
- if (node->ss.ss_ScanTupleSlot)
- ExecClearTuple(node->ss.ss_ScanTupleSlot);
+ ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* Close the heap relation */
- ExecCloseScanRelation(node->ss.ss_currentRelation);
+ if (node->ss.ss_currentRelation)
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
}
void
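ExecInitCustomScan and ExecEndCustomScan now follow a simple pattern. They open a relation only when scanrelid > 0, and otherwise derive the scan tuple type from the pseudo-scan target list. At the end, they close the relation only if one was opened. The sketch below models that pattern with hypothetical types (ToyRelation, ToyTupleDesc, ToyScanNode, ToyScanStateInit). malloc/free stand in for opening and closing a relation, and a column count stands in for a tuple descriptor.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Relation and TupleDesc. */
typedef struct ToyRelation  { int natts; } ToyRelation;
typedef struct ToyTupleDesc { int natts; } ToyTupleDesc;

/* Hypothetical trimmed plan node. */
typedef struct ToyScanNode
{
    unsigned int scanrelid;     /* 0 means "pseudo relation" */
    int          ps_tlist_len;  /* length of custom_ps_tlist */
} ToyScanNode;

/* Hypothetical trimmed scan state. */
typedef struct ToyScanStateInit
{
    ToyRelation  *current_rel;  /* NULL when scanning a pseudo relation */
    ToyTupleDesc  scan_desc;
} ToyScanStateInit;

/* Pretend to open (and lock) a relation; column count is arbitrary. */
static ToyRelation *
toy_open_relation(unsigned int relid)
{
    ToyRelation *rel = malloc(sizeof(ToyRelation));

    if (rel == NULL)
        abort();
    rel->natts = (int) relid + 1;   /* for the example only */
    return rel;
}

static void
toy_init_scan(ToyScanStateInit *ss, const ToyScanNode *plan)
{
    if (plan->scanrelid > 0)
    {
        /* real relation: scan type comes from its descriptor */
        ss->current_rel = toy_open_relation(plan->scanrelid);
        ss->scan_desc.natts = ss->current_rel->natts;
    }
    else
    {
        /* pseudo relation: scan type comes from the pseudo-scan tlist */
        ss->current_rel = NULL;
        ss->scan_desc.natts = plan->ps_tlist_len;
    }
}

static void
toy_end_scan(ToyScanStateInit *ss)
{
    /* only close what was actually opened */
    if (ss->current_rel != NULL)
    {
        free(ss->current_rel);
        ss->current_rel = NULL;
    }
}
```

The sketch sets current_rel to NULL explicitly in the pseudo-relation branch. The patch itself appears to rely on ss_currentRelation starting out NULL in the newly allocated state node.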
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 7399053..f25eb6f 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -102,6 +102,7 @@ ForeignScanState *
ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
{
ForeignScanState *scanstate;
+ Index scanrelid = node->scan.scanrelid;
Relation currentRelation;
FdwRoutine *fdwroutine;
@@ -141,16 +142,28 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
ExecInitScanTupleSlot(estate, &scanstate->ss);
/*
- * open the base relation and acquire appropriate lock on it.
+ * Open the base relation and acquire an appropriate lock on it, then
+ * get the scan type from the relation descriptor, if this foreign
+ * scan is on an actual foreign table.
+ *
+ * On the other hand, a foreign scan may scan a pseudo relation, which
+ * is usually the result set of a join of remote relations. In that
+ * case the scan type has to be taken from the pseudo-scan target list,
+ * which the FDW driver is expected to assign.
*/
- currentRelation = ExecOpenScanRelation(estate, node->scan.scanrelid, eflags);
- scanstate->ss.ss_currentRelation = currentRelation;
+ if (scanrelid > 0)
+ {
+ currentRelation = ExecOpenScanRelation(estate, scanrelid, eflags);
+ scanstate->ss.ss_currentRelation = currentRelation;
+ ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+ }
+ else
+ {
+ TupleDesc ps_tupdesc;
- /*
- * get the scan type from the relation descriptor. (XXX at some point we
- * might want to let the FDW editorialize on the scan tupdesc.)
- */
- ExecAssignScanType(&scanstate->ss, RelationGetDescr(currentRelation));
+ ps_tupdesc = ExecTypeFromTL(node->fdw_ps_tlist, false);
+ ExecAssignScanType(&scanstate->ss, ps_tupdesc);
+ }
/*
* Initialize result tuple type and projection info.
@@ -161,7 +174,7 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
/*
* Acquire function pointers from the FDW's handler, and init fdw_state.
*/
- fdwroutine = GetFdwRoutineForRelation(currentRelation, true);
+ fdwroutine = GetFdwRoutine(node->fdw_handler);
scanstate->fdwroutine = fdwroutine;
scanstate->fdw_state = NULL;
@@ -193,7 +206,8 @@ ExecEndForeignScan(ForeignScanState *node)
ExecClearTuple(node->ss.ss_ScanTupleSlot);
/* close the relation. */
- ExecCloseScanRelation(node->ss.ss_currentRelation);
+ if (node->ss.ss_currentRelation)
+ ExecCloseScanRelation(node->ss.ss_currentRelation);
}
/* ----------------------------------------------------------------
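A ForeignScan on a pseudo relation has no Relation to ask for its FdwRoutine. For that reason, the executor now resolves the routine from the fdw_handler OID recorded in the plan node. Below is a standalone sketch of that lookup, with a hypothetical in-memory registry in place of calling the handler function. ToyFdwRoutine, ToyHandlerEntry, toy_get_fdw_routine and the OID values are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;   /* same underlying type as PostgreSQL's Oid */

/* Hypothetical, heavily trimmed FdwRoutine. */
typedef struct ToyFdwRoutine
{
    const char *name;
} ToyFdwRoutine;

/* Hypothetical registry: handler OID -> routine. */
typedef struct ToyHandlerEntry
{
    Oid                  handler_oid;
    const ToyFdwRoutine *routine;
} ToyHandlerEntry;

static const ToyFdwRoutine toy_pg_routine = {"toy_postgres_fdw"};
static const ToyFdwRoutine toy_file_routine = {"toy_file_fdw"};

static const ToyHandlerEntry toy_handlers[] = {
    {16400, &toy_pg_routine},
    {16401, &toy_file_routine},
};

/*
 * Resolve the routine from the handler OID alone, so this works even when
 * scanrelid == 0 and there is no Relation to consult.
 */
static const ToyFdwRoutine *
toy_get_fdw_routine(Oid fdw_handler)
{
    size_t i;

    for (i = 0; i < sizeof(toy_handlers) / sizeof(toy_handlers[0]); i++)
        if (toy_handlers[i].handler_oid == fdw_handler)
            return toy_handlers[i].routine;
    return NULL;    /* the real lookup would fail with an error instead */
}
```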
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index cbe8b78..d77eeea 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -250,6 +250,29 @@ GetForeignTable(Oid relid)
/*
+ * GetForeignTableServerOid - Get OID of the server related to the given
+ * foreign table.
+ */
+Oid
+GetForeignTableServerOid(Oid relid)
+{
+ Form_pg_foreign_table tableform;
+ HeapTuple tp;
+ Oid serverid;
+
+ tp = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for foreign table %u", relid);
+ tableform = (Form_pg_foreign_table) GETSTRUCT(tp);
+ serverid = tableform->ftserver;
+
+ ReleaseSysCache(tp);
+
+ return serverid;
+}
+
+
+/*
* GetForeignColumnOptions - Get attfdwoptions of given relation/attnum
* as list of DefElem.
*/
@@ -302,21 +325,16 @@ GetFdwRoutine(Oid fdwhandler)
return routine;
}
-
/*
- * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
- * for the given foreign table, and retrieve its FdwRoutine struct.
+ * GetFdwHandlerByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table
*/
-FdwRoutine *
-GetFdwRoutineByRelId(Oid relid)
+static Oid
+GetFdwHandlerByRelId(Oid relid)
{
HeapTuple tp;
- Form_pg_foreign_data_wrapper fdwform;
- Form_pg_foreign_server serverform;
Form_pg_foreign_table tableform;
Oid serverid;
- Oid fdwid;
- Oid fdwhandler;
/* Get server OID for the foreign table. */
tp = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(relid));
@@ -326,6 +344,16 @@ GetFdwRoutineByRelId(Oid relid)
serverid = tableform->ftserver;
ReleaseSysCache(tp);
+ return GetFdwRoutineByServerId(serverid);
+}
+
+FdwRoutine *
+GetFdwRoutineByServerId(Oid serverid)
+{
+ HeapTuple tp;
+ Form_pg_foreign_server serverform;
+ Oid fdwid;
+
/* Get foreign-data wrapper OID for the server. */
tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(serverid));
if (!HeapTupleIsValid(tp))
@@ -334,6 +362,16 @@ GetFdwRoutineByRelId(Oid relid)
fdwid = serverform->srvfdw;
ReleaseSysCache(tp);
+ return GetFdwRoutineByFdwId(fdwid);
+}
+
+FdwRoutine *
+GetFdwRoutineByFdwId(Oid fdwid)
+{
+ HeapTuple tp;
+ Form_pg_foreign_data_wrapper fdwform;
+ Oid fdwhandler;
+
/* Get handler function OID for the FDW. */
tp = SearchSysCache1(FOREIGNDATAWRAPPEROID, ObjectIdGetDatum(fdwid));
if (!HeapTupleIsValid(tp))
@@ -350,7 +388,18 @@ GetFdwRoutineByRelId(Oid relid)
ReleaseSysCache(tp);
- /* And finally, call the handler function. */
+ return fdwhandler;
+}
+
+/*
+ * GetFdwRoutineByRelId - look up the handler of the foreign-data wrapper
+ * for the given foreign table, and retrieve its FdwRoutine struct.
+ */
+FdwRoutine *
+GetFdwRoutineByRelId(Oid relid)
+{
+ Oid fdwhandler = GetFdwHandlerByRelId(relid);
+
return GetFdwRoutine(fdwhandler);
}
@@ -398,6 +447,16 @@ GetFdwRoutineForRelation(Relation relation, bool makecopy)
return relation->rd_fdwroutine;
}
+/*
+ * GetFdwHandlerForRelation
+ *
+ * returns OID of FDW handler which is associated with the given relation.
+ */
+Oid
+GetFdwHandlerForRelation(Relation relation)
+{
+ return GetFdwHandlerByRelId(RelationGetRelid(relation));
+}
/*
* IsImportableForeignTable - filter table names for IMPORT FOREIGN SCHEMA
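The foreign.c helpers walk three catalogs. pg_foreign_table.ftserver gives the server, pg_foreign_server.srvfdw gives the wrapper, and pg_foreign_data_wrapper.fdwhandler gives the handler function. In a chain that is meant to end in a handler OID, every intermediate step passes an OID along. Only the final consumer turns the handler OID into an FdwRoutine through GetFdwRoutine(). The sketch below models that chain with hypothetical in-memory rows. The toy_* names and OID values are illustrative only, and InvalidOid is returned where the real code raises an error.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical catalog rows; only the columns used here. */
typedef struct { Oid ftrelid; Oid ftserver; }   ToyForeignTable;
typedef struct { Oid oid;     Oid srvfdw; }     ToyForeignServer;
typedef struct { Oid oid;     Oid fdwhandler; } ToyFdw;

static const ToyForeignTable  toy_tables[]  = {{20001, 30001}, {20002, 30001}};
static const ToyForeignServer toy_servers[] = {{30001, 40001}};
static const ToyFdw           toy_fdws[]    = {{40001, 50001}};

#define TOY_NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* pg_foreign_table: relid -> server OID */
static Oid
toy_server_of_table(Oid relid)
{
    size_t i;

    for (i = 0; i < TOY_NELEMS(toy_tables); i++)
        if (toy_tables[i].ftrelid == relid)
            return toy_tables[i].ftserver;
    return InvalidOid;
}

/* pg_foreign_server: server OID -> wrapper OID */
static Oid
toy_fdw_of_server(Oid serverid)
{
    size_t i;

    for (i = 0; i < TOY_NELEMS(toy_servers); i++)
        if (toy_servers[i].oid == serverid)
            return toy_servers[i].srvfdw;
    return InvalidOid;
}

/* pg_foreign_data_wrapper: wrapper OID -> handler function OID */
static Oid
toy_handler_of_fdw(Oid fdwid)
{
    size_t i;

    for (i = 0; i < TOY_NELEMS(toy_fdws); i++)
        if (toy_fdws[i].oid == fdwid)
            return toy_fdws[i].fdwhandler;
    return InvalidOid;
}

/* relid -> server -> wrapper -> handler: every link yields an OID */
static Oid
toy_handler_by_relid(Oid relid)
{
    return toy_handler_of_fdw(toy_fdw_of_server(toy_server_of_table(relid)));
}
```

Two tables on the same server yield the same server OID, which is the precondition for considering a foreign join between them.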
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1a24f5..cb85468 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -590,7 +590,9 @@ _copyForeignScan(const ForeignScan *from)
/*
* copy remainder of node
*/
+ COPY_SCALAR_FIELD(fdw_handler);
COPY_NODE_FIELD(fdw_exprs);
+ COPY_NODE_FIELD(fdw_ps_tlist);
COPY_NODE_FIELD(fdw_private);
COPY_SCALAR_FIELD(fsSystemCol);
@@ -615,6 +617,7 @@ _copyCustomScan(const CustomScan *from)
*/
COPY_SCALAR_FIELD(flags);
COPY_NODE_FIELD(custom_exprs);
+ COPY_NODE_FIELD(custom_ps_tlist);
COPY_NODE_FIELD(custom_private);
/*
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index dd1278b..048db39 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -556,7 +556,9 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
_outScanInfo(str, (const Scan *) node);
+ WRITE_OID_FIELD(fdw_handler);
WRITE_NODE_FIELD(fdw_exprs);
+ WRITE_NODE_FIELD(fdw_ps_tlist);
WRITE_NODE_FIELD(fdw_private);
WRITE_BOOL_FIELD(fsSystemCol);
}
@@ -570,6 +572,7 @@ _outCustomScan(StringInfo str, const CustomScan *node)
WRITE_UINT_FIELD(flags);
WRITE_NODE_FIELD(custom_exprs);
+ WRITE_NODE_FIELD(custom_ps_tlist);
WRITE_NODE_FIELD(custom_private);
appendStringInfoString(str, " :methods ");
_outToken(str, node->methods->CustomName);
@@ -1700,6 +1703,16 @@ _outHashPath(StringInfo str, const HashPath *node)
}
static void
+_outForeignJoinPath(StringInfo str, const ForeignJoinPath *node)
+{
+ WRITE_NODE_TYPE("FOREIGNJOINPATH");
+
+ _outJoinPathInfo(str, (const JoinPath *) node);
+
+ WRITE_NODE_FIELD(fdw_private);
+}
+
+static void
_outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
{
WRITE_NODE_TYPE("PLANNERGLOBAL");
@@ -1798,6 +1811,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(subplan);
WRITE_NODE_FIELD(subroot);
WRITE_NODE_FIELD(subplan_params);
+ WRITE_OID_FIELD(fdw_handler);
/* we don't try to print fdwroutine or fdw_private */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_NODE_FIELD(joininfo);
@@ -3122,6 +3136,9 @@ _outNode(StringInfo str, const void *obj)
case T_HashPath:
_outHashPath(str, obj);
break;
+ case T_ForeignJoinPath:
+ _outForeignJoinPath(str, obj);
+ break;
case T_PlannerGlobal:
_outPlannerGlobal(str, obj);
break;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 020558b..a8506fc 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1782,8 +1782,8 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors)
{
- Path *outer_path = path->outerjoinpath;
- Path *inner_path = path->innerjoinpath;
+ Path *outer_path = path->jpath.outerjoinpath;
+ Path *inner_path = path->jpath.innerjoinpath;
double outer_path_rows = outer_path->rows;
double inner_path_rows = inner_path->rows;
Cost startup_cost = workspace->startup_cost;
@@ -1794,10 +1794,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
double ntuples;
/* Mark the path with the correct row estimate */
- if (path->path.param_info)
- path->path.rows = path->path.param_info->ppi_rows;
+ if (path->jpath.path.param_info)
+ path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
else
- path->path.rows = path->path.parent->rows;
+ path->jpath.path.rows = path->jpath.path.parent->rows;
/*
* We could include disable_cost in the preliminary estimate, but that
@@ -1809,7 +1809,7 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
/* cost of source data */
- if (path->jointype == JOIN_SEMI || path->jointype == JOIN_ANTI)
+ if (path->jpath.jointype == JOIN_SEMI || path->jpath.jointype == JOIN_ANTI)
{
double outer_matched_rows = workspace->outer_matched_rows;
Selectivity inner_scan_frac = workspace->inner_scan_frac;
@@ -1856,13 +1856,13 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
}
/* CPU costs */
- cost_qual_eval(&restrict_qual_cost, path->joinrestrictinfo, root);
+ cost_qual_eval(&restrict_qual_cost, path->jpath.joinrestrictinfo, root);
startup_cost += restrict_qual_cost.startup;
cpu_per_tuple = cpu_tuple_cost + restrict_qual_cost.per_tuple;
run_cost += cpu_per_tuple * ntuples;
- path->path.startup_cost = startup_cost;
- path->path.total_cost = startup_cost + run_cost;
+ path->jpath.path.startup_cost = startup_cost;
+ path->jpath.path.total_cost = startup_cost + run_cost;
}
/*
@@ -3306,14 +3306,14 @@ compute_semi_anti_join_factors(PlannerInfo *root,
static bool
has_indexed_join_quals(NestPath *joinpath)
{
- Relids joinrelids = joinpath->path.parent->relids;
- Path *innerpath = joinpath->innerjoinpath;
+ Relids joinrelids = joinpath->jpath.path.parent->relids;
+ Path *innerpath = joinpath->jpath.innerjoinpath;
List *indexclauses;
bool found_one;
ListCell *lc;
/* If join still has quals to evaluate, it's not fast */
- if (joinpath->joinrestrictinfo != NIL)
+ if (joinpath->jpath.joinrestrictinfo != NIL)
return false;
/* Nor if the inner path isn't parameterized at all */
if (innerpath->param_info == NULL)
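The costsize.c, createplan.c and pathnode.c changes follow from NestPath now embedding JoinPath as its first member, named jpath. MergePath and HashPath already use this layout, and the new ForeignJoinPath adopts it too. Code written against JoinPath can then handle every join flavor through a pointer cast. Below is a standalone sketch with hypothetical trimmed structs (ToyPath, ToyJoinPath, ToyNestPath, ToyForeignJoinPath).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed versions of the planner path structs. */
typedef struct ToyPath
{
    int     pathtype;
    double  rows;
    double  total_cost;
} ToyPath;

typedef struct ToyJoinPath
{
    ToyPath  path;          /* must be first: a JoinPath "is a" Path */
    int      jointype;
    ToyPath *outerjoinpath;
    ToyPath *innerjoinpath;
} ToyJoinPath;

/* NestPath now wraps JoinPath the same way MergePath/HashPath do. */
typedef struct ToyNestPath
{
    ToyJoinPath jpath;      /* must be first */
} ToyNestPath;

typedef struct ToyForeignJoinPath
{
    ToyJoinPath jpath;      /* must be first */
    void       *fdw_private;
} ToyForeignJoinPath;

/* Code written against the generic JoinPath sees every join flavor. */
static double
toy_input_rows(const ToyJoinPath *jp)
{
    return jp->outerjoinpath->rows + jp->innerjoinpath->rows;
}
```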
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index e6aa21c..04e59e6 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,10 +17,13 @@
#include <math.h>
#include "executor/executor.h"
+#include "foreign/fdwapi.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
#define PATH_PARAM_BY_REL(path, rel) \
((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
@@ -50,7 +53,6 @@ static List *select_mergejoin_clauses(PlannerInfo *root,
JoinType jointype,
bool *mergejoin_allowed);
-
/*
* add_paths_to_joinrel
* Given a join relation and two component rels from which it can be made,
@@ -207,7 +209,29 @@ add_paths_to_joinrel(PlannerInfo *root,
extra_lateral_rels = NULL;
/*
- * 1. Consider mergejoin paths where both relations must be explicitly
+ * 1. Consider foreignjoin paths when both outer and inner relations are
+ * managed by the same foreign-data wrapper and share the same server.
+ * In addition, checkAsUser of all relations in the join must match.
+ * These limitations ensure that the whole join can be executed on the
+ * remote side. This is done before any local join consideration
+ * because a foreign join is likely to be the cheapest option in most
+ * cases where joining on the remote side is possible.
+ */
+ if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPath)
+ {
+ joinrel->fdwroutine->GetForeignJoinPath(root,
+ joinrel,
+ outerrel,
+ innerrel,
+ jointype,
+ sjinfo,
+ &semifactors,
+ restrictlist,
+ extra_lateral_rels);
+ }
+
+ /*
+ * 2. Consider mergejoin paths where both relations must be explicitly
* sorted. Skip this if we can't mergejoin.
*/
if (mergejoin_allowed)
@@ -217,7 +241,7 @@ add_paths_to_joinrel(PlannerInfo *root,
param_source_rels, extra_lateral_rels);
/*
- * 2. Consider paths where the outer relation need not be explicitly
+ * 3. Consider paths where the outer relation need not be explicitly
* sorted. This includes both nestloops and mergejoins where the outer
* path is already ordered. Again, skip this if we can't mergejoin.
* (That's okay because we know that nestloop can't handle right/full
@@ -232,7 +256,7 @@ add_paths_to_joinrel(PlannerInfo *root,
#ifdef NOT_USED
/*
- * 3. Consider paths where the inner relation need not be explicitly
+ * 4. Consider paths where the inner relation need not be explicitly
* sorted. This includes mergejoins only (nestloops were already built in
* match_unsorted_outer).
*
@@ -250,7 +274,7 @@ add_paths_to_joinrel(PlannerInfo *root,
#endif
/*
- * 4. Consider paths where both outer and inner relations must be hashed
+ * 5. Consider paths where both outer and inner relations must be hashed
* before being joined. As above, disregard enable_hashjoin for full
* joins, because there may be no other alternative.
*/
@@ -259,6 +283,19 @@ add_paths_to_joinrel(PlannerInfo *root,
restrictlist, jointype,
sjinfo, &semifactors,
param_source_rels, extra_lateral_rels);
+
+ /*
+ * 6. Consider paths added by FDW drivers or custom-scan providers, in
+ * addition to built-in paths.
+ *
+ * XXX - For FDWs, we may be able to skip this invocation when joinrel's
+ * fdw_handler is set (only if both relations are on the same FDW server).
+ */
+ if (set_join_pathlist_hook)
+ set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
+ restrictlist, jointype,
+ sjinfo, &semifactors,
+ param_source_rels, extra_lateral_rels);
}
/*
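set_join_pathlist_hook follows the usual PostgreSQL hook idiom. The core declares a NULL-initialized function pointer and calls it only when it is set. An extension saves the previous value before installing its own function, so that several extensions can chain. The sketch reduces the hook signature to a single counter argument. toy_join_pathlist_hook_type and the other toy_* names are hypothetical. The real hook receives the planner state, the join relation, both input relations and the other arguments shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced hook signature. */
typedef void (*toy_join_pathlist_hook_type) (int *npaths);

/* Core side: a NULL-initialized global, called only when set. */
static toy_join_pathlist_hook_type toy_set_join_pathlist_hook = NULL;

static void
toy_add_paths_to_joinrel(int *npaths)
{
    *npaths += 3;           /* built-in merge/nestloop/hash paths */
    if (toy_set_join_pathlist_hook)
        toy_set_join_pathlist_hook(npaths);
}

/* Extension side: remember the previous hook and chain to it. */
static toy_join_pathlist_hook_type toy_prev_hook = NULL;

static void
toy_my_join_hook(int *npaths)
{
    if (toy_prev_hook)
        toy_prev_hook(npaths);
    *npaths += 1;           /* e.g. one extra path for this join */
}

/* Would typically run from the extension's _PG_init(). */
static void
toy_install_hook(void)
{
    toy_prev_hook = toy_set_join_pathlist_hook;
    toy_set_join_pathlist_hook = toy_my_join_hook;
}
```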
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 655be81..d20fb50 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -83,11 +83,14 @@ static CustomScan *create_customscan_plan(PlannerInfo *root,
CustomPath *best_path,
List *tlist, List *scan_clauses);
static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
- Plan *outer_plan, Plan *inner_plan);
+ List *tlist, Plan *outer_plan, Plan *inner_plan);
static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
- Plan *outer_plan, Plan *inner_plan);
+ List *tlist, Plan *outer_plan, Plan *inner_plan);
static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path,
- Plan *outer_plan, Plan *inner_plan);
+ List *tlist, Plan *outer_plan, Plan *inner_plan);
+static ForeignScan *create_foreignjoin_plan(PlannerInfo *root,
+ ForeignJoinPath *best_path, List *tlist, Plan *outer_plan,
+ Plan *inner_plan);
static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
static void process_subquery_nestloop_params(PlannerInfo *root,
@@ -241,6 +244,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
case T_CustomScan:
plan = create_scan_plan(root, best_path);
break;
+ case T_ForeignJoinPath:
case T_HashJoin:
case T_MergeJoin:
case T_NestLoop:
@@ -611,6 +615,7 @@ create_gating_plan(PlannerInfo *root, Plan *plan, List *quals)
static Plan *
create_join_plan(PlannerInfo *root, JoinPath *best_path)
{
+ List *tlist;
Plan *outer_plan;
Plan *inner_plan;
Plan *plan;
@@ -625,27 +630,41 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
inner_plan = create_plan_recurse(root, best_path->innerjoinpath);
+ if (best_path->path.pathtype == T_NestLoop)
+ {
+ /* Restore curOuterRels */
+ bms_free(root->curOuterRels);
+ root->curOuterRels = saveOuterRels;
+ }
+ tlist = build_path_tlist(root, &best_path->path);
+
switch (best_path->path.pathtype)
{
+ case T_ForeignJoinPath:
+ plan = (Plan *) create_foreignjoin_plan(root,
+ (ForeignJoinPath *) best_path,
+ tlist,
+ outer_plan,
+ inner_plan);
+ break;
case T_MergeJoin:
plan = (Plan *) create_mergejoin_plan(root,
(MergePath *) best_path,
+ tlist,
outer_plan,
inner_plan);
break;
case T_HashJoin:
plan = (Plan *) create_hashjoin_plan(root,
(HashPath *) best_path,
+ tlist,
outer_plan,
inner_plan);
break;
case T_NestLoop:
- /* Restore curOuterRels */
- bms_free(root->curOuterRels);
- root->curOuterRels = saveOuterRels;
-
plan = (Plan *) create_nestloop_plan(root,
(NestPath *) best_path,
+ tlist,
outer_plan,
inner_plan);
break;
@@ -1958,16 +1977,26 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
ForeignScan *scan_plan;
RelOptInfo *rel = best_path->path.parent;
Index scan_relid = rel->relid;
- RangeTblEntry *rte;
+ Oid rel_oid = InvalidOid;
Bitmapset *attrs_used = NULL;
ListCell *lc;
int i;
- /* it should be a base rel... */
- Assert(scan_relid > 0);
- Assert(rel->rtekind == RTE_RELATION);
- rte = planner_rt_fetch(scan_relid, root);
- Assert(rte->rtekind == RTE_RELATION);
+ /*
+ * Fetch the relation OID if this foreign-scan node actually scans
+ * a particular real relation. Otherwise, InvalidOid is passed
+ * to the FDW driver.
+ */
+ if (scan_relid > 0)
+ {
+ RangeTblEntry *rte;
+
+ Assert(rel->rtekind == RTE_RELATION);
+ rte = planner_rt_fetch(scan_relid, root);
+ Assert(rte->rtekind == RTE_RELATION);
+ rel_oid = rte->relid;
+ }
+ Assert(rel->fdwroutine != NULL);
/*
* Sort clauses into best execution order. We do this first since the FDW
@@ -1982,13 +2011,16 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
* has selected some join clauses for remote use but also wants them
* rechecked locally).
*/
- scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rte->relid,
+ scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
best_path,
tlist, scan_clauses);
/* Copy cost data from Path to Plan; no need to make FDW do this */
copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
+ /* Track FDW handler OID; no need to make FDW do this */
+ scan_plan->fdw_handler = rel->fdw_handler;
+
/*
* Replace any outer-relation variables with nestloop params in the qual
* and fdw_exprs expressions. We do this last so that the FDW doesn't
@@ -2052,12 +2084,6 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
RelOptInfo *rel = best_path->path.parent;
/*
- * Right now, all we can support is CustomScan node which is associated
- * with a particular base relation to be scanned.
- */
- Assert(rel && rel->reloptkind == RELOPT_BASEREL);
-
- /*
* Sort clauses into the best execution order, although custom-scan
* provider can reorder them again.
*/
@@ -2108,12 +2134,12 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
static NestLoop *
create_nestloop_plan(PlannerInfo *root,
NestPath *best_path,
+ List *tlist,
Plan *outer_plan,
Plan *inner_plan)
{
NestLoop *join_plan;
- List *tlist = build_path_tlist(root, &best_path->path);
- List *joinrestrictclauses = best_path->joinrestrictinfo;
+ List *joinrestrictclauses = best_path->jpath.joinrestrictinfo;
List *joinclauses;
List *otherclauses;
Relids outerrelids;
@@ -2127,7 +2153,7 @@ create_nestloop_plan(PlannerInfo *root,
/* Get the join qual clauses (in plain expression form) */
/* Any pseudoconstant clauses are ignored here */
- if (IS_OUTER_JOIN(best_path->jointype))
+ if (IS_OUTER_JOIN(best_path->jpath.jointype))
{
extract_actual_join_clauses(joinrestrictclauses,
&joinclauses, &otherclauses);
@@ -2140,7 +2166,7 @@ create_nestloop_plan(PlannerInfo *root,
}
/* Replace any outer-relation variables with nestloop params */
- if (best_path->path.param_info)
+ if (best_path->jpath.path.param_info)
{
joinclauses = (List *)
replace_nestloop_params(root, (Node *) joinclauses);
@@ -2152,7 +2178,7 @@ create_nestloop_plan(PlannerInfo *root,
* Identify any nestloop parameters that should be supplied by this join
* node, and move them from root->curOuterParams to the nestParams list.
*/
- outerrelids = best_path->outerjoinpath->parent->relids;
+ outerrelids = best_path->jpath.outerjoinpath->parent->relids;
nestParams = NIL;
prev = NULL;
for (cell = list_head(root->curOuterParams); cell; cell = next)
@@ -2189,9 +2215,9 @@ create_nestloop_plan(PlannerInfo *root,
nestParams,
outer_plan,
inner_plan,
- best_path->jointype);
+ best_path->jpath.jointype);
- copy_path_costsize(&join_plan->join.plan, &best_path->path);
+ copy_path_costsize(&join_plan->join.plan, &best_path->jpath.path);
return join_plan;
}
@@ -2199,10 +2225,10 @@ create_nestloop_plan(PlannerInfo *root,
static MergeJoin *
create_mergejoin_plan(PlannerInfo *root,
MergePath *best_path,
+ List *tlist,
Plan *outer_plan,
Plan *inner_plan)
{
- List *tlist = build_path_tlist(root, &best_path->jpath.path);
List *joinclauses;
List *otherclauses;
List *mergeclauses;
@@ -2494,10 +2520,10 @@ create_mergejoin_plan(PlannerInfo *root,
static HashJoin *
create_hashjoin_plan(PlannerInfo *root,
HashPath *best_path,
+ List *tlist,
Plan *outer_plan,
Plan *inner_plan)
{
- List *tlist = build_path_tlist(root, &best_path->jpath.path);
List *joinclauses;
List *otherclauses;
List *hashclauses;
@@ -2616,6 +2642,53 @@ create_hashjoin_plan(PlannerInfo *root,
return join_plan;
}
+/*
+ * Unlike other join paths, ForeignJoinPath is transformed into a
+ * ForeignScan plan node.
+ */
+static ForeignScan *
+create_foreignjoin_plan(PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ List *tlist,
+ Plan *outer_plan,
+ Plan *inner_plan)
+{
+ ForeignScan *join_plan;
+ List *joinrestrictclauses = best_path->jpath.joinrestrictinfo;
+ List *joinclauses;
+ List *otherclauses;
+
+ /* Sort join qual clauses into best execution order */
+ joinrestrictclauses = order_qual_clauses(root, joinrestrictclauses);
+
+ /* Get the join qual clauses (in plain expression form) */
+ /* Any pseudoconstant clauses are ignored here */
+ if (IS_OUTER_JOIN(best_path->jpath.jointype))
+ {
+ extract_actual_join_clauses(joinrestrictclauses,
+ &joinclauses, &otherclauses);
+ }
+ else
+ {
+ /* We can treat all clauses alike for an inner join */
+ joinclauses = extract_actual_clauses(joinrestrictclauses, false);
+ otherclauses = NIL;
+ }
+
+ /* Call FDW handler */
+ {
+ RelOptInfo *rel = best_path->jpath.path.parent;
+
+ Assert(rel->fdwroutine);
+ join_plan = rel->fdwroutine->GetForeignJoinPlan(root, best_path,
+ tlist, joinclauses,
+ otherclauses,
+ outer_plan, inner_plan);
+ join_plan->fdw_handler = rel->fdw_handler;
+ }
+
+ return join_plan;
+}
/*****************************************************************************
*
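create_foreignjoin_plan splits the join's restriction clauses the same way create_nestloop_plan does. For an outer join, clauses pushed down from above become filter ("other") clauses, and the rest are join clauses. For an inner join, all clauses are treated alike. In both cases, pseudoconstant clauses are skipped. Below is a standalone sketch of that classification, using a hypothetical trimmed ToyRestrictInfo that carries only the is_pushed_down and pseudoconstant flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trimmed RestrictInfo: just the flags consulted here. */
typedef struct ToyRestrictInfo
{
    int  clause_id;
    bool is_pushed_down;    /* came from above the outer join */
    bool pseudoconstant;    /* handled by a gating plan instead */
} ToyRestrictInfo;

/*
 * Outer join: pushed-down clauses -> otherclauses, the rest -> joinclauses.
 * Inner join: every non-pseudoconstant clause -> joinclauses.
 * Returns the number of join clauses; *nother gets the other-clause count.
 */
static int
toy_split_join_clauses(const ToyRestrictInfo *rinfos, int n,
                       bool is_outer_join,
                       int *joinclauses, int *otherclauses, int *nother)
{
    int njoin = 0;
    int i;

    *nother = 0;
    for (i = 0; i < n; i++)
    {
        const ToyRestrictInfo *r = &rinfos[i];

        if (r->pseudoconstant)
            continue;       /* pseudoconstant clauses are ignored here */
        if (is_outer_join && r->is_pushed_down)
            otherclauses[(*nother)++] = r->clause_id;
        else
            joinclauses[njoin++] = r->clause_id;
    }
    return njoin;
}
```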
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7703946..d567c49 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -569,6 +569,36 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
{
ForeignScan *splan = (ForeignScan *) plan;
+ if (splan->fdw_ps_tlist != NIL)
+ {
+ indexed_tlist *pscan_itlist =
+ build_tlist_index(splan->fdw_ps_tlist);
+
+ Assert(splan->scan.scanrelid == 0);
+
+ splan->scan.plan.targetlist = (List *)
+ fix_upper_expr(root,
+ (Node *) splan->scan.plan.targetlist,
+ pscan_itlist,
+ INDEX_VAR,
+ rtoffset);
+ splan->scan.plan.qual = (List *)
+ fix_upper_expr(root,
+ (Node *) splan->scan.plan.qual,
+ pscan_itlist,
+ INDEX_VAR,
+ rtoffset);
+ splan->fdw_exprs = (List *)
+ fix_upper_expr(root,
+ (Node *) splan->fdw_exprs,
+ pscan_itlist,
+ INDEX_VAR,
+ rtoffset);
+ splan->fdw_ps_tlist =
+ fix_scan_list(root, splan->fdw_ps_tlist, rtoffset);
+ pfree(pscan_itlist);
+ break;
+ }
splan->scan.scanrelid += rtoffset;
splan->scan.plan.targetlist =
fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
@@ -583,6 +613,36 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
{
CustomScan *splan = (CustomScan *) plan;
+ if (splan->custom_ps_tlist != NIL)
+ {
+ indexed_tlist *pscan_itlist =
+ build_tlist_index(splan->custom_ps_tlist);
+
+ Assert(splan->scan.scanrelid == 0);
+
+ splan->scan.plan.targetlist = (List *)
+ fix_upper_expr(root,
+ (Node *) splan->scan.plan.targetlist,
+ pscan_itlist,
+ INDEX_VAR,
+ rtoffset);
+ splan->scan.plan.qual = (List *)
+ fix_upper_expr(root,
+ (Node *) splan->scan.plan.qual,
+ pscan_itlist,
+ INDEX_VAR,
+ rtoffset);
+ splan->custom_exprs = (List *)
+ fix_upper_expr(root,
+ (Node *) splan->custom_exprs,
+ pscan_itlist,
+ INDEX_VAR,
+ rtoffset);
+ splan->custom_ps_tlist =
+ fix_scan_list(root, splan->custom_ps_tlist, rtoffset);
+ pfree(pscan_itlist);
+ break;
+ }
splan->scan.scanrelid += rtoffset;
splan->scan.plan.targetlist =
fix_scan_list(root, splan->scan.plan.targetlist, rtoffset);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 1395a21..d6434bf 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1710,9 +1710,9 @@ create_nestloop_path(PlannerInfo *root,
restrict_clauses = jclauses;
}
- pathnode->path.pathtype = T_NestLoop;
- pathnode->path.parent = joinrel;
- pathnode->path.param_info =
+ pathnode->jpath.path.pathtype = T_NestLoop;
+ pathnode->jpath.path.parent = joinrel;
+ pathnode->jpath.path.param_info =
get_joinrel_parampathinfo(root,
joinrel,
outer_path,
@@ -1720,11 +1720,11 @@ create_nestloop_path(PlannerInfo *root,
sjinfo,
required_outer,
&restrict_clauses);
- pathnode->path.pathkeys = pathkeys;
- pathnode->jointype = jointype;
- pathnode->outerjoinpath = outer_path;
- pathnode->innerjoinpath = inner_path;
- pathnode->joinrestrictinfo = restrict_clauses;
+ pathnode->jpath.path.pathkeys = pathkeys;
+ pathnode->jpath.jointype = jointype;
+ pathnode->jpath.outerjoinpath = outer_path;
+ pathnode->jpath.innerjoinpath = inner_path;
+ pathnode->jpath.joinrestrictinfo = restrict_clauses;
final_cost_nestloop(root, pathnode, workspace, sjinfo, semifactors);
@@ -1859,6 +1859,58 @@ create_hashjoin_path(PlannerInfo *root,
}
/*
+ * create_foreignjoin_path
+ * Creates a pathnode corresponding to a foreign join between two relations.
+ * Unlike the similar functions for other join types, no final_cost_*
+ * function is called here, so the FDW has to take care of the cost
+ * information.
+ *
+ * 'joinrel' is the join relation
+ * 'jointype' is the type of join required
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'semifactors' contains valid data if jointype is SEMI or ANTI
+ * 'outer_path' is the cheapest outer path
+ * 'inner_path' is the cheapest inner path
+ * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
+ * 'pathkeys' are the path keys of the new join path
+ * 'required_outer' is the set of required outer rels
+ */
+ForeignJoinPath *
+create_foreignjoin_path(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Path *outer_path,
+ Path *inner_path,
+ List *restrict_clauses,
+ List *pathkeys,
+ Relids required_outer)
+{
+ ForeignJoinPath *pathnode = makeNode(ForeignJoinPath);
+
+ pathnode->jpath.path.pathtype = T_ForeignJoinPath;
+ pathnode->jpath.path.parent = joinrel;
+ pathnode->jpath.path.param_info =
+ get_joinrel_parampathinfo(root,
+ joinrel,
+ outer_path,
+ inner_path,
+ sjinfo,
+ required_outer,
+ &restrict_clauses);
+ pathnode->jpath.path.pathkeys = pathkeys;
+ pathnode->jpath.jointype = jointype;
+ pathnode->jpath.outerjoinpath = outer_path;
+ pathnode->jpath.innerjoinpath = inner_path;
+ pathnode->jpath.joinrestrictinfo = restrict_clauses;
+
+ pathnode->fdw_private = NIL;
+
+ return pathnode;
+}
+
+/*
* reparameterize_path
* Attempt to modify a Path to have greater parameterization
*
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index fb7db6d..57763d4 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -27,6 +27,7 @@
#include "catalog/catalog.h"
#include "catalog/heap.h"
#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
@@ -378,10 +379,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* Grab the fdwroutine info using the relcache, while we have it */
if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ {
+ rel->fdw_handler = GetFdwHandlerForRelation(relation);
rel->fdwroutine = GetFdwRoutineForRelation(relation, true);
+ }
else
+ {
+ rel->fdw_handler = InvalidOid;
rel->fdwroutine = NULL;
-
+ }
heap_close(relation, NoLock);
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..667ae1b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "foreign/fdwapi.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -121,6 +122,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
rel->subplan = NULL;
rel->subroot = NULL;
rel->subplan_params = NIL;
+ rel->fdw_handler = InvalidOid;
rel->fdwroutine = NULL;
rel->fdw_private = NULL;
rel->baserestrictinfo = NIL;
@@ -383,7 +385,17 @@ build_join_rel(PlannerInfo *root,
joinrel->subplan = NULL;
joinrel->subroot = NULL;
joinrel->subplan_params = NIL;
- joinrel->fdwroutine = NULL;
+ /* propagate common FDW information up to join relation */
+ if (inner_rel->fdw_handler == outer_rel->fdw_handler)
+ {
+ joinrel->fdwroutine = inner_rel->fdwroutine;
+ joinrel->fdw_handler = inner_rel->fdw_handler;
+ }
+ else
+ {
+ joinrel->fdw_handler = InvalidOid;
+ joinrel->fdwroutine = NULL;
+ }
joinrel->fdw_private = NULL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
@@ -427,6 +439,18 @@ build_join_rel(PlannerInfo *root,
sjinfo, restrictlist);
/*
+ * Set the FDW handler and routine if both the outer and inner relations
+ * are managed by the same FDW driver.
+ */
+ if (OidIsValid(outer_rel->fdw_handler) &&
+ OidIsValid(inner_rel->fdw_handler) &&
+ outer_rel->fdw_handler == inner_rel->fdw_handler)
+ {
+ joinrel->fdw_handler = outer_rel->fdw_handler;
+ joinrel->fdwroutine = GetFdwRoutine(joinrel->fdw_handler);
+ }
+
+ /*
* Add the joinrel to the query's joinrel list, and store it into the
* auxiliary hashtable if there is one. NB: GEQO requires us to append
* the new joinrel to the end of the list!
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index c1d860c..eb9eaf0 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -3842,6 +3842,10 @@ set_deparse_planstate(deparse_namespace *dpns, PlanState *ps)
/* index_tlist is set only if it's an IndexOnlyScan */
if (IsA(ps->plan, IndexOnlyScan))
dpns->index_tlist = ((IndexOnlyScan *) ps->plan)->indextlist;
+ else if (IsA(ps->plan, ForeignScan))
+ dpns->index_tlist = ((ForeignScan *) ps->plan)->fdw_ps_tlist;
+ else if (IsA(ps->plan, CustomScan))
+ dpns->index_tlist = ((CustomScan *) ps->plan)->custom_ps_tlist;
else
dpns->index_tlist = NIL;
}
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1d76841..b1f8532 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,24 @@ typedef void (*EndForeignModify_function) (EState *estate,
typedef int (*IsForeignRelUpdatable_function) (Relation rel);
+typedef void (*GetForeignJoinPath_function) (PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
+
+typedef ForeignScan *(*GetForeignJoinPlan_function) (PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ List *tlist,
+ List *joinclauses,
+ List *otherclauses,
+ Plan *outer_plan,
+ Plan *inner_plan);
+
typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
struct ExplainState *es);
@@ -150,13 +168,21 @@ typedef struct FdwRoutine
/* Support functions for IMPORT FOREIGN SCHEMA */
ImportForeignSchema_function ImportForeignSchema;
+
+ /* Support functions for join push-down */
+ GetForeignJoinPath_function GetForeignJoinPath;
+ GetForeignJoinPlan_function GetForeignJoinPlan;
+
} FdwRoutine;
/* Functions in foreign/foreign.c */
extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
+extern FdwRoutine *GetFdwRoutineByServerId(Oid serverid);
+extern FdwRoutine *GetFdwRoutineByFdwId(Oid fdwid);
extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
+extern Oid GetFdwHandlerForRelation(Relation relation);
extern bool IsImportableForeignTable(const char *tablename,
ImportForeignSchemaStmt *stmt);
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 9c737b4..35acae7 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -75,6 +75,7 @@ extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
bool missing_ok);
extern ForeignTable *GetForeignTable(Oid relid);
+extern Oid GetForeignTableServerOid(Oid relid);
extern List *GetForeignColumnOptions(Oid relid, AttrNumber attnum);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 97ef0fc..0f7a15d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -224,6 +224,7 @@ typedef enum NodeTag
T_NestPath,
T_MergePath,
T_HashPath,
+ T_ForeignJoinPath,
T_TidPath,
T_ForeignPath,
T_CustomPath,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 316c9ce..6717c6d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -470,7 +470,13 @@ typedef struct WorkTableScan
* fdw_exprs and fdw_private are both under the control of the foreign-data
* wrapper, but fdw_exprs is presumed to contain expression trees and will
* be post-processed accordingly by the planner; fdw_private won't be.
- * Note that everything in both lists must be copiable by copyObject().
+ * An optional fdw_ps_tlist maps references to attributes of the
+ * underlying relation(s) onto pairs of INDEX_VAR and an alternative varattno.
+ * The node then behaves like a scan on a pseudo relation, usually the result
+ * of a join of relations on the remote data source; the FDW driver is
+ * responsible for setting the expected target list for it. If the FDW
+ * returns records matching the foreign-table definition, just put NIL here.
+ * Note that everything in the above lists must be copiable by copyObject().
* One way to store an arbitrary blob of bytes is to represent it as a bytea
* Const. Usually, though, you'll be better off choosing a representation
* that can be dumped usefully by nodeToString().
@@ -479,7 +485,9 @@ typedef struct WorkTableScan
typedef struct ForeignScan
{
Scan scan;
+ Oid fdw_handler; /* OID of FDW handler */
List *fdw_exprs; /* expressions that FDW may evaluate */
+ List *fdw_ps_tlist; /* optional pseudo-scan tlist for FDW */
List *fdw_private; /* private data for FDW */
bool fsSystemCol; /* true if any "system column" is needed */
} ForeignScan;
@@ -487,10 +495,11 @@ typedef struct ForeignScan
/* ----------------
* CustomScan node
*
- * The comments for ForeignScan's fdw_exprs and fdw_private fields apply
- * equally to custom_exprs and custom_private. Note that since Plan trees
- * can be copied, custom scan providers *must* fit all plan data they need
- * into those fields; embedding CustomScan in a larger struct will not work.
+ * The comments for ForeignScan's fdw_exprs, fdw_ps_tlist and fdw_private
+ * fields apply equally to custom_exprs, custom_ps_tlist and custom_private.
+ * Note that since Plan trees can be copied, custom scan providers *must*
+ * fit all plan data they need into those fields; embedding CustomScan in
+ * a larger struct will not work.
* ----------------
*/
struct CustomScan;
@@ -511,6 +520,7 @@ typedef struct CustomScan
Scan scan;
uint32 flags; /* mask of CUSTOMPATH_* flags, see relation.h */
List *custom_exprs; /* expressions that custom code may evaluate */
+ List *custom_ps_tlist;/* optional pseudo-scan target list */
List *custom_private; /* private data for custom code */
const CustomScanMethods *methods;
} CustomScan;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..9914d1d 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -366,6 +366,7 @@ typedef struct PlannerInfo
* subroot - PlannerInfo for subquery (NULL if it's not a subquery)
* subplan_params - list of PlannerParamItems to be passed to subquery
* fdwroutine - function hooks for FDW, if foreign table (else NULL)
+ * fdw_handler - OID of FDW handler, if foreign table (else InvalidOid)
* fdw_private - private state for FDW, if foreign table (else NULL)
*
* Note: for a subquery, tuples, subplan, subroot are not set immediately
@@ -461,6 +462,7 @@ typedef struct RelOptInfo
List *subplan_params; /* if subquery */
/* use "struct FdwRoutine" to avoid including fdwapi.h here */
struct FdwRoutine *fdwroutine; /* if foreign table */
+ Oid fdw_handler; /* if foreign table */
void *fdw_private; /* if foreign table */
/* used by various scans and joins: */
@@ -1044,7 +1046,10 @@ typedef struct JoinPath
* A nested-loop path needs no special fields.
*/
-typedef JoinPath NestPath;
+typedef struct NestPath
+{
+ JoinPath jpath;
+} NestPath;
/*
* A mergejoin path has these fields.
@@ -1100,6 +1105,22 @@ typedef struct HashPath
} HashPath;
/*
+ * ForeignJoinPath represents a join between two relations consisting of
+ * foreign tables.
+ *
+ * fdw_private stores FDW private data about the join. While fdw_private is
+ * not actually touched by the core code during normal operations, it's
+ * generally a good idea to use a representation that can be dumped by
+ * nodeToString(), so that you can examine the structure during debugging
+ * with tools like pprint().
+ */
+typedef struct ForeignJoinPath
+{
+ JoinPath jpath;
+ List *fdw_private;
+} ForeignJoinPath;
+
+/*
* Restriction clause info.
*
* We create one of these for each AND sub-clause of a restriction condition
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..d4b6498 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -124,6 +124,17 @@ extern HashPath *create_hashjoin_path(PlannerInfo *root,
Relids required_outer,
List *hashclauses);
+extern ForeignJoinPath *create_foreignjoin_path(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Path *outer_path,
+ Path *inner_path,
+ List *restrict_clauses,
+ List *pathkeys,
+ Relids required_outer);
+
extern Path *reparameterize_path(PlannerInfo *root, Path *path,
Relids required_outer,
double loop_count);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..c42c69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -30,6 +30,19 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
RangeTblEntry *rte);
extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+/* Hook for plugins to get control in add_paths_to_joinrel() */
+typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ List *restrictlist,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Relids param_source_rels,
+ Relids extra_lateral_rels);
+extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
+
/* Hook for plugins to replace standard_join_search() */
typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
int levels_needed,
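The build_join_rel() change in relnode.c above marks a join relation as FDW-manageable only when both inputs carry the same valid fdw_handler, so the property is decided bottom-up, one join level at a time. A minimal sketch of just that rule, assuming a hypothetical `MockRel` struct in place of RelOptInfo and a plain `unsigned int` in place of Oid (0 standing in for InvalidOid); it is an illustration, not planner code. Matching handlers only means the same FDW manages both sides; whether they share a server is still left to the FDW's GetForeignJoinPath callback.

```c
#include <assert.h>

/* Hypothetical stand-in for Oid; 0 plays the role of InvalidOid. */
typedef unsigned int MockOid;
#define MOCK_INVALID_OID ((MockOid) 0)

/* Hypothetical, stripped-down RelOptInfo: only the field the rule needs. */
typedef struct MockRel
{
    MockOid     fdw_handler;    /* FDW handler OID, or MOCK_INVALID_OID */
} MockRel;

/*
 * Build a join relation from two inputs.  The FDW handler is propagated
 * only when both sides carry the same, valid handler, mirroring the
 * OidIsValid() checks added to build_join_rel(); otherwise no foreign
 * join path can be considered for the new join relation.
 */
static MockRel
mock_build_join_rel(const MockRel *outer, const MockRel *inner)
{
    MockRel     joinrel;

    if (outer->fdw_handler != MOCK_INVALID_OID &&
        outer->fdw_handler == inner->fdw_handler)
        joinrel.fdw_handler = outer->fdw_handler;
    else
        joinrel.fdw_handler = MOCK_INVALID_OID;
    return joinrel;
}
```

Because the rule is re-applied at every level, a three-way join (a JOIN b) JOIN c keeps the handler only if c is managed by the same FDW as a and b.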
Hanada-san,
Your patch mixes the enhancement of the custom-/foreign-scan interface with
the enhancement of contrib/postgres_fdw... Probably it is a careless
mistake.
Please make your patch as a diff against my infrastructure portion.
Also, I noticed that this "Join pushdown support for foreign tables" patch
was unintentionally rejected in the last commit fest.
https://commitfest.postgresql.org/3/20/
I couldn't register myself as a reviewer. How do I do that in the
new commitfest application?
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Shigeru Hanada
Sent: Monday, February 16, 2015 1:03 PM
To: Robert Haas
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tables
Hi
I've revised the patch based on Kaigai-san's custom/foreign join patch
posted in the thread below.
/messages/by-id/9A28C8860F777E439AA12E8AEA7694F80108C355@BPXM15GP.gisp.nec.co.jp
It is basically unchanged from the version in the last CF, but, as Robert
commented before, N-way (not only 2-way) joins should be supported in the
first version, by constructing the SELECT SQL with the source queries
contained in the FROM clause as inline views (a.k.a. FROM-clause subqueries).
2014-12-26 13:48 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
2014-12-16 1:22 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada
<shigeru.hanada@gmail.com> wrote:
I'm working on $SUBJECT and would like to get comments about the
design. The attached patch is for the design below.
I'm glad you are working on this.
1. Join source relations
As described above, postgres_fdw (and most SQL-based FDWs) needs
to check that 1) all foreign tables in the join belong to the same server,
and 2) all foreign tables have the same checkAsUser.
In addition, I add an extra limitation that both the inner and outer
relations should be plain foreign tables, not the result of a foreign join.
This limitation keeps the SQL generator simple. Fundamentally it's possible
to join join relations as well, so N-way join is listed as an enhancement
item below.
It seems pretty important to me that we have a way to push the entire
join nest down. Being able to push down a 2-way join but not more
seems like quite a severe limitation.
Hmm, I agree that supporting N-way joins is very useful. Postgres-XC's SQL
generator seems to give us a hint for such cases; I'll check it out
again.
--
Shigeru HANADA
--
Shigeru HANADA
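The two conditions in the quoted design (all foreign tables on the same server, all with the same checkAsUser) can be sketched as a standalone check. This is a minimal illustration under assumptions: `MockForeignTable` and `mock_join_is_pushable()` are hypothetical, and real code would take the server OID and checkAsUser from the foreign-table catalog and the RangeTblEntry rather than from a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-table summary of what the pushdown check looks at. */
typedef struct MockForeignTable
{
    unsigned int serverid;      /* OID of the foreign server */
    unsigned int check_as_user; /* RangeTblEntry.checkAsUser; 0 = current user */
} MockForeignTable;

/*
 * Return true when a join over 'tables' could be shipped as one remote
 * query: every table lives on the same server and is accessed as the
 * same user, so a single user mapping (user + server) serves them all.
 */
static bool
mock_join_is_pushable(const MockForeignTable *tables, size_t ntables)
{
    size_t      i;

    if (ntables < 2)
        return false;           /* nothing to join */
    for (i = 1; i < ntables; i++)
    {
        if (tables[i].serverid != tables[0].serverid)
            return false;
        if (tables[i].check_as_user != tables[0].check_as_user)
            return false;
    }
    return true;
}
```

The same-user condition matters because the remote connection is chosen through the user mapping for a (user, server) pair, as GetUserMapping() does in postgresBeginForeignScan().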
Kaigai-san,
Oops. I rebased the patch onto your v4 custom/foreign join patch.
But, as you mentioned off-list, an inappropriate change to NestPath
still remains in the patch... I might have left my dev branch in an
unexpected state. I'll check it soon.
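For context, the earlier patch replaces `typedef JoinPath NestPath;` with a struct whose only member is a JoinPath, which is why create_nestloop_path() had to change `pathnode->path.pathtype` into `pathnode->jpath.path.pathtype`. A minimal sketch of the two layouts, using hypothetical `Mock*` types rather than the real relation.h definitions: because C lets a pointer to a struct be converted to a pointer to its first member, generic JoinPath code keeps working, while every direct field access gains one level of nesting.

```c
#include <assert.h>

/* Hypothetical, stripped-down versions of Path and JoinPath. */
typedef struct MockPath
{
    int         pathtype;
} MockPath;

typedef struct MockJoinPath
{
    MockPath    path;           /* base "class" as the first member */
    int         jointype;
} MockJoinPath;

/* Before: NestPath is just another name for JoinPath. */
typedef MockJoinPath MockNestPathOld;

/* After: NestPath wraps a JoinPath as its first (and only) member. */
typedef struct MockNestPathNew
{
    MockJoinPath jpath;
} MockNestPathNew;

/* Generic code that only knows about join paths. */
static int
mock_get_jointype(const MockJoinPath *jp)
{
    return jp->jointype;
}
```

With the old layout a NestPath pointer is a JoinPath pointer; with the new one, callers either cast or pass `&nestpath->jpath`.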
2015-02-16 13:13 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
--
Shigeru HANADA
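The FROM-clause assembly done by deparseSelectSql() in the attached patch (first relation after FROM, each later relation preceded by a keyword chosen from its join type, then ON with its join clauses joined by AND, aliases only when several relations are involved) can be illustrated in isolation. A simplified sketch, assuming hypothetical `MockJoinType`/`MockDeparseRel` types and pre-rendered clause strings instead of PostgreSQL's StringInfo, Relation and Expr machinery:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of the join types the patch can deparse. */
typedef enum MockJoinType
{
    MOCK_JOIN_INNER,
    MOCK_JOIN_LEFT,
    MOCK_JOIN_FULL,
    MOCK_JOIN_RIGHT,
    MOCK_JOIN_SEMI              /* not deparsable */
} MockJoinType;

/* Hypothetical per-relation input, loosely modeled on PgFdwDeparseRel. */
typedef struct MockDeparseRel
{
    const char *relname;        /* already-quoted remote relation name */
    const char *alias;          /* alias, or NULL */
    MockJoinType jointype;      /* how this rel joins the preceding ones */
    const char *const *joinclauses;    /* pre-rendered SQL conditions */
    int         nclauses;
} MockDeparseRel;

/* Append 's' to the NUL-terminated 'buf', truncating at 'bufsize'. */
static void
mock_append(char *buf, size_t bufsize, const char *s)
{
    size_t      len = strlen(buf);

    if (len + 1 < bufsize)
        strncat(buf, s, bufsize - len - 1);
}

/* Join keyword for a non-first relation; NULL if unsupported. */
static const char *
mock_join_keyword(MockJoinType jt, int nclauses)
{
    switch (jt)
    {
        case MOCK_JOIN_INNER:
            return nclauses > 0 ? " INNER JOIN " : " CROSS JOIN ";
        case MOCK_JOIN_LEFT:
            return " LEFT JOIN ";
        case MOCK_JOIN_FULL:
            return " FULL JOIN ";
        case MOCK_JOIN_RIGHT:
            return " RIGHT JOIN ";
        default:
            return NULL;        /* the patch raises elog(ERROR) here */
    }
}

/* Build the FROM clause into 'buf'; returns 0, or -1 if unsupported. */
static int
mock_deparse_from(char *buf, size_t bufsize,
                  const MockDeparseRel *rels, int nrels)
{
    int         i,
                j;

    buf[0] = '\0';
    for (i = 0; i < nrels; i++)
    {
        const MockDeparseRel *r = &rels[i];

        if (i == 0)
            mock_append(buf, bufsize, " FROM ");
        else
        {
            const char *kw = mock_join_keyword(r->jointype, r->nclauses);

            if (kw == NULL)
                return -1;
            mock_append(buf, bufsize, kw);
        }
        mock_append(buf, bufsize, r->relname);
        /* aliases are emitted only when more than one relation is deparsed */
        if (nrels > 1 && r->alias != NULL)
        {
            mock_append(buf, bufsize, " ");
            mock_append(buf, bufsize, r->alias);
        }
        /* ON clause: this relation's join conditions combined with AND */
        if (i > 0 && r->nclauses > 0)
        {
            mock_append(buf, bufsize, " ON ");
            for (j = 0; j < r->nclauses; j++)
            {
                if (j > 0)
                    mock_append(buf, bufsize, " AND ");
                mock_append(buf, bufsize, r->joinclauses[j]);
            }
        }
    }
    return 0;
}
```

For two relations t1 a and t2 b inner-joined on (a.id = b.id), this produces ` FROM public.t1 a INNER JOIN public.t2 b ON (a.id = b.id)`; with a single relation the alias is dropped, as in the patch.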
Attachments:
foreign_join.patch (application/octet-stream; name=foreign_join.patch)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 59cb053..6795e5f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -86,7 +86,7 @@ typedef struct foreign_loc_cxt
typedef struct deparse_expr_cxt
{
PlannerInfo *root; /* global planner state */
- RelOptInfo *foreignrel; /* the foreign relation we are planning for */
+ Relids rels; /* set of foreign tables to be deparsed */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params */
} deparse_expr_cxt;
@@ -108,6 +108,7 @@ static void deparseTargetList(StringInfo buf,
Index rtindex,
Relation rel,
Bitmapset *attrs_used,
+ const char *alias,
List **retrieved_attrs);
static void deparseReturningList(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
@@ -115,7 +116,7 @@ static void deparseReturningList(StringInfo buf, PlannerInfo *root,
List *returningList,
List **retrieved_attrs);
static void deparseColumnRef(StringInfo buf, int varno, int varattno,
- PlannerInfo *root);
+ PlannerInfo *root, const char *alias);
static void deparseRelation(StringInfo buf, Relation rel);
static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
static void deparseVar(Var *node, deparse_expr_cxt *context);
@@ -679,33 +680,119 @@ is_builtin(Oid oid)
void
deparseSelectSql(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
- Bitmapset *attrs_used,
- List **retrieved_attrs)
+ List *rels)
{
- RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
- Relation rel;
+ StringInfoData frombuf;
+ ListCell *lc;
+ bool first_rel = true;
+ Relids relids = NULL;
- /*
- * Core code already has some lock on each rel being planned, so we can
- * use NoLock here.
- */
- rel = heap_open(rte->relid, NoLock);
+ initStringInfo(&frombuf);
- /*
- * Construct SELECT list
- */
- appendStringInfoString(buf, "SELECT ");
- deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
- retrieved_attrs);
+ /* Build the set of relids for deparsing a query over multiple tables. */
+ foreach(lc, rels)
+ {
+ PgFdwDeparseRel *dr = (PgFdwDeparseRel *) lfirst(lc);
+ relids = bms_add_member(relids, dr->baserel->relid);
+ }
- /*
- * Construct FROM clause
- */
- appendStringInfoString(buf, " FROM ");
- deparseRelation(buf, rel);
+ /* Loop through relation list and deparse SELECT query. */
+ foreach(lc, rels)
+ {
+ PgFdwDeparseRel *dr = (PgFdwDeparseRel *) lfirst(lc);
+ RangeTblEntry *rte = planner_rt_fetch(dr->baserel->relid, root);
+ Relation rel;
+ const char *alias;
- heap_close(rel, NoLock);
+ /*
+ * Core code already has some lock on each rel being planned, so we can
+ * use NoLock here.
+ */
+ rel = heap_open(rte->relid, NoLock);
+
+ /*
+ * Add alias only when we have multiple relations.
+ */
+ if (list_length(rels) > 1 && rte->alias)
+ alias = rte->alias->aliasname;
+ else
+ alias = NULL;
+
+ /*
+ * Construct SELECT list
+ */
+ if (first_rel)
+ appendStringInfoString(buf, "SELECT ");
+ else
+ appendStringInfoString(buf, ", ");
+ deparseTargetList(buf, root, dr->baserel->relid, rel, dr->attrs_used,
+ alias, dr->retrieved_attrs);
+
+ /*
+ * Construct FROM clause
+ */
+ if (first_rel)
+ appendStringInfoString(&frombuf, " FROM ");
+ else
+ {
+ switch (dr->jointype)
+ {
+ case JOIN_INNER:
+ if (dr->joinclauses)
+ appendStringInfoString(&frombuf, " INNER JOIN ");
+ else
+ /* Currently cross join is not pushed down, though. */
+ appendStringInfoString(&frombuf, " CROSS JOIN ");
+ break;
+ case JOIN_LEFT:
+ appendStringInfoString(&frombuf, " LEFT JOIN ");
+ break;
+ case JOIN_FULL:
+ appendStringInfoString(&frombuf, " FULL JOIN ");
+ break;
+ case JOIN_RIGHT:
+ appendStringInfoString(&frombuf, " RIGHT JOIN ");
+ break;
+ default:
+ elog(ERROR, "unsupported join type for deparse: %d",
+ dr->jointype);
+ break;
+ }
+ }
+ deparseRelation(&frombuf, rel);
+ if (alias)
+ appendStringInfo(&frombuf, " %s", alias);
+
+ if (!first_rel && dr->joinclauses)
+ {
+ ListCell *lc;
+ bool first = true;
+
+ appendStringInfoString(&frombuf, " ON ");
+
+ foreach(lc, dr->joinclauses)
+ {
+ deparse_expr_cxt context;
+ Expr *expr = (Expr *) lfirst(lc);
+
+ context.root = root;
+ context.rels = relids;
+ context.buf = &frombuf;
+ context.params_list = NULL;
+
+ if (!first)
+ appendStringInfoString(&frombuf, " AND ");
+ deparseExpr(expr, &context);
+ first = false;
+ }
+ }
+
+ heap_close(rel, NoLock);
+ first_rel = false;
+ }
+
+ appendStringInfoString(buf, frombuf.data);
+ pfree(frombuf.data);
}
/*
@@ -721,6 +808,7 @@ deparseTargetList(StringInfo buf,
Index rtindex,
Relation rel,
Bitmapset *attrs_used,
+ const char *alias,
List **retrieved_attrs)
{
TupleDesc tupdesc = RelationGetDescr(rel);
@@ -751,7 +839,7 @@ deparseTargetList(StringInfo buf,
appendStringInfoString(buf, ", ");
first = false;
- deparseColumnRef(buf, rtindex, i, root);
+ deparseColumnRef(buf, rtindex, i, root, alias);
*retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
@@ -768,6 +856,8 @@ deparseTargetList(StringInfo buf,
appendStringInfoString(buf, ", ");
first = false;
+ if (alias)
+ appendStringInfo(buf, "%s.", alias);
appendStringInfoString(buf, "ctid");
*retrieved_attrs = lappend_int(*retrieved_attrs,
@@ -796,7 +886,7 @@ deparseTargetList(StringInfo buf,
void
appendWhereClause(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
+ Relids relids,
List *exprs,
bool is_first,
List **params)
@@ -810,7 +900,7 @@ appendWhereClause(StringInfo buf,
/* Set up context struct for recursion */
context.root = root;
- context.foreignrel = baserel;
+ context.rels = relids;
context.buf = buf;
context.params_list = params;
@@ -870,7 +960,7 @@ deparseInsertSql(StringInfo buf, PlannerInfo *root,
appendStringInfoString(buf, ", ");
first = false;
- deparseColumnRef(buf, rtindex, attnum, root);
+ deparseColumnRef(buf, rtindex, attnum, root, NULL);
}
appendStringInfoString(buf, ") VALUES (");
@@ -928,7 +1018,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
appendStringInfoString(buf, ", ");
first = false;
- deparseColumnRef(buf, rtindex, attnum, root);
+ deparseColumnRef(buf, rtindex, attnum, root, NULL);
appendStringInfo(buf, " = $%d", pindex);
pindex++;
}
@@ -993,7 +1083,7 @@ deparseReturningList(StringInfo buf, PlannerInfo *root,
if (attrs_used != NULL)
{
appendStringInfoString(buf, " RETURNING ");
- deparseTargetList(buf, root, rtindex, rel, attrs_used,
+ deparseTargetList(buf, root, rtindex, rel, attrs_used, NULL,
retrieved_attrs);
}
else
@@ -1088,7 +1178,8 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs)
* If it has a column_name FDW option, use that instead of attribute name.
*/
static void
-deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
+deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root,
+ const char *alias)
{
RangeTblEntry *rte;
char *colname = NULL;
@@ -1124,6 +1215,8 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root)
if (colname == NULL)
colname = get_relid_attribute_name(rte->relid, varattno);
+ if (alias)
+ appendStringInfo(buf, "%s.", alias);
appendStringInfoString(buf, quote_identifier(colname));
}
@@ -1270,12 +1363,46 @@ static void
deparseVar(Var *node, deparse_expr_cxt *context)
{
StringInfo buf = context->buf;
+ int i;
+ RelOptInfo *rel = NULL;
+ RangeTblEntry *rte = NULL;
- if (node->varno == context->foreignrel->relid &&
- node->varlevelsup == 0)
+ /* Find the RangeTblEntry containing the given Var to determine its alias. */
+ if (bms_is_member(node->varno, context->rels) && node->varlevelsup == 0)
{
- /* Var belongs to foreign table */
- deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ for (i = 1; i < context->root->simple_rel_array_size; i++)
+ {
+ /* Skip empty slot */
+ if (context->root->simple_rel_array[i] == NULL)
+ continue;
+
+ if (context->root->simple_rel_array[i]->relid == node->varno)
+ {
+ rel = context->root->simple_rel_array[i];
+ rte = context->root->simple_rte_array[i];
+ break;
+ }
+ }
+ }
+
+ /*
+ * If the Var is in current level (not in outer subquery), simply deparse
+ * it.
+ */
+ if (rel)
+ {
+ const char *alias;
+
+ /*
+ * Deparse a Var belonging to one of the foreign tables in context->rels,
+ * qualified with its alias if we are deparsing multiple foreign tables.
+ */
+ if (bms_num_members(context->rels) > 1 && rte->alias)
+ alias = rte->alias->aliasname;
+ else
+ alias = NULL;
+ deparseColumnRef(buf, node->varno, node->varattno, context->root,
+ alias);
}
else
{
@@ -1849,3 +1976,4 @@ printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
appendStringInfo(buf, "((SELECT null::%s)::%s)", ptypename, ptypename);
}
+
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d76e739..0a645b6 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -48,7 +48,8 @@ PG_MODULE_MAGIC;
/*
* FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table. This information is collected by postgresGetForeignRelSize.
+ * foreign table or foreign join. This information is collected by
+ * postgresGetForeignRelSize, or calculated from join source relations.
*/
typedef struct PgFdwRelationInfo
{
@@ -288,6 +289,22 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
+static void postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
+static ForeignScan *postgresGetForeignJoinPlan(PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ List *tlist,
+ List *joinclauses,
+ List *otherclauses,
+ Plan *outer_plan,
+ Plan *inner_plan);
/*
* Helper functions
@@ -368,6 +385,10 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
/* Support functions for IMPORT FOREIGN SCHEMA */
routine->ImportForeignSchema = postgresImportForeignSchema;
+ /* Support functions for join push-down */
+ routine->GetForeignJoinPath = postgresGetForeignJoinPath;
+ routine->GetForeignJoinPlan = postgresGetForeignJoinPlan;
+
PG_RETURN_POINTER(routine);
}
@@ -752,6 +773,7 @@ postgresGetForeignPlan(PlannerInfo *root,
List *retrieved_attrs;
StringInfoData sql;
ListCell *lc;
+ PgFdwDeparseRel dr;
/*
* Separate the scan_clauses into those that can be executed remotely and
@@ -797,11 +819,15 @@ postgresGetForeignPlan(PlannerInfo *root,
* expressions to be sent as parameters.
*/
initStringInfo(&sql);
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
+ dr.baserel = baserel;
+ dr.jointype = JOIN_INNER;
+ dr.joinclauses = NIL;
+ dr.attrs_used = fpinfo->attrs_used;
+ dr.retrieved_attrs = &retrieved_attrs;
+ deparseSelectSql(&sql, root, list_make1(&dr));
if (remote_conds)
- appendWhereClause(&sql, root, baserel, remote_conds,
- true, &params_list);
+ appendWhereClause(&sql, root, bms_add_member(NULL, baserel->relid),
+ remote_conds, true, &params_list);
/*
* Add FOR UPDATE/SHARE if appropriate. We apply locking during the
@@ -906,13 +932,23 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
* Identify which user to do the remote access as. This should match what
* ExecCheckRTEPerms() does.
*/
- rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
- userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+ if (fsplan->scan.scanrelid > 0)
+ {
+ rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+ userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+ fsstate->rel = node->ss.ss_currentRelation;
+ table = GetForeignTable(RelationGetRelid(fsstate->rel));
+ server = GetForeignServer(table->serverid);
+ }
+ else
+ {
+ /* XXX how to determine userid for joins? Server OID is hard-coded. */
+ userid = GetCurrentRoleId();
+ server = GetForeignServer(16409);
+ }
/* Get info about foreign table. */
- fsstate->rel = node->ss.ss_currentRelation;
- table = GetForeignTable(RelationGetRelid(fsstate->rel));
- server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);
/*
@@ -944,7 +980,16 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ALLOCSET_SMALL_MAXSIZE);
/* Get info we'll need for input data conversion. */
- fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+ if (fsplan->scan.scanrelid > 0)
+ fsstate->attinmeta =
+ TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+ else
+ {
+ TupleDesc ps_tupdesc;
+
+ ps_tupdesc = ExecTypeFromTL(fsplan->fdw_ps_tlist, false);
+ fsstate->attinmeta = TupleDescGetAttInMetadata(ps_tupdesc);
+ }
/* Prepare for output conversion of parameters used in remote query. */
numParams = list_length(fsplan->fdw_exprs);
@@ -1725,10 +1770,12 @@ estimate_path_cost_size(PlannerInfo *root,
List *remote_join_conds;
List *local_join_conds;
StringInfoData sql;
+ Relids relids;
List *retrieved_attrs;
PGconn *conn;
Selectivity local_sel;
QualCost local_cost;
+ PgFdwDeparseRel dr;
/*
* join_conds might contain both clauses that are safe to send across,
@@ -1743,14 +1790,19 @@ estimate_path_cost_size(PlannerInfo *root,
* dummy values.
*/
initStringInfo(&sql);
+ dr.baserel = baserel;
+ dr.jointype = JOIN_INNER;
+ dr.joinclauses = NIL;
+ dr.attrs_used = fpinfo->attrs_used;
+ dr.retrieved_attrs = &retrieved_attrs;
appendStringInfoString(&sql, "EXPLAIN ");
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
+ deparseSelectSql(&sql, root, list_make1(&dr));
+ relids = bms_add_member(NULL, baserel->relid);
if (fpinfo->remote_conds)
- appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
+ appendWhereClause(&sql, root, relids, fpinfo->remote_conds,
true, NULL);
if (remote_join_conds)
- appendWhereClause(&sql, root, baserel, remote_join_conds,
+ appendWhereClause(&sql, root, relids, remote_join_conds,
(fpinfo->remote_conds == NIL), NULL);
/* Get the remote estimate */
@@ -2835,6 +2887,233 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
}
/*
+ * Construct PgFdwRelationInfo from two join sources
+ */
+static PgFdwRelationInfo *
+merge_fpinfo(PgFdwRelationInfo *fpinfo_o, PgFdwRelationInfo *fpinfo_i)
+{
+ PgFdwRelationInfo *fpinfo;
+
+ fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
+ fpinfo->remote_conds =
+ list_concat(fpinfo_o->remote_conds, fpinfo_i->remote_conds);
+ fpinfo->local_conds =
+ list_concat(fpinfo_o->local_conds, fpinfo_i->local_conds);
+
+ fpinfo->attrs_used = NULL; /* Use fdw_ps_tlist */
+ fpinfo->local_conds_cost.startup =
+ fpinfo_o->local_conds_cost.startup + fpinfo_i->local_conds_cost.startup;
+ fpinfo->local_conds_cost.per_tuple =
+ fpinfo_o->local_conds_cost.per_tuple + fpinfo_i->local_conds_cost.per_tuple;
+ fpinfo->local_conds_sel =
+ fpinfo_o->local_conds_sel * fpinfo_i->local_conds_sel;
+ /* XXX we should use join selectivity and join type */
+ fpinfo->rows = Min(fpinfo_o->rows, fpinfo_i->rows);
+ /* XXX we should consider only columns in fdw_ps_tlist */
+ fpinfo->width = fpinfo_o->width + fpinfo_i->width;
+ /* XXX we should estimate better costs */
+
+ fpinfo->use_remote_estimate = false; /* Never use in join case */
+ fpinfo->fdw_startup_cost = fpinfo_o->fdw_startup_cost;
+ fpinfo->fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
+
+ fpinfo->startup_cost = fpinfo->fdw_startup_cost;
+ fpinfo->total_cost =
+ fpinfo->startup_cost + fpinfo->fdw_tuple_cost * fpinfo->rows;
+
+ fpinfo->table = NULL; /* always NULL in join case */
+ fpinfo->server = fpinfo_o->server;
+ fpinfo->user = fpinfo_o->user ? fpinfo_o->user : fpinfo_i->user;
+
+ return fpinfo;
+}
+
+/*
+ * postgresGetForeignJoinPath
+ * Add a ForeignJoinPath to joinrel if the join can be pushed down.
+ *
+ */
+static void
+postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels)
+{
+ ForeignJoinPath *joinpath;
+ Path *path_o = outerrel->cheapest_total_path;
+ Path *path_i = innerrel->cheapest_total_path;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ Relids required_outer;
+
+ elog(DEBUG1, "%s() outer: %u, inner: %u",
+ __func__, outerrel->relid, innerrel->relid);
+ /* Skip considering reversed join combination */
+ if (outerrel->relid < innerrel->relid)
+ return;
+
+ /*
+ * We support LEFT, RIGHT and FULL outer joins in addition to inner joins.
+ */
+ if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+ jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+ return;
+
+ /*
+ * Note that CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+ * with an empty restrictlist. A pushed-down CROSS JOIN returns more rows
+ * than retrieving each table separately, so we don't push down such joins.
+ */
+ if (jointype == JOIN_INNER && !restrictlist)
+ return;
+
+ /*
+ * Both relations in the join must belong to the same server and have the
+ * same checkAsUser, so that one connection can execute the SQL for the join.
+ */
+ if (IsA(path_o, ForeignPath))
+ fpinfo_o = ((ForeignPath *) path_o)->path.parent->fdw_private;
+ else if (IsA(path_o, ForeignJoinPath))
+ fpinfo_o = ((ForeignJoinPath *) path_o)->jpath.path.parent->fdw_private;
+ else
+ fpinfo_o = NULL;
+ Assert(fpinfo_o);
+ if (IsA(path_i, ForeignPath))
+ fpinfo_i = ((ForeignPath *) path_i)->path.parent->fdw_private;
+ else if (IsA(path_i, ForeignJoinPath))
+ fpinfo_i = ((ForeignJoinPath *) path_i)->jpath.path.parent->fdw_private;
+ else
+ fpinfo_i = NULL;
+ Assert(fpinfo_i);
+
+ /* Servers should match */
+ if (fpinfo_o->server->serverid != fpinfo_i->server->serverid)
+ return;
+
+ /* Construct fpinfo for the join relation */
+ joinrel->fdw_private = merge_fpinfo(fpinfo_o, fpinfo_i);
+
+ /*
+ * Create a new join path and add it to the joinrel which represents a join
+ * between foreign tables.
+ */
+ required_outer = calc_non_nestloop_required_outer(path_o, path_i);
+ joinpath = create_foreignjoin_path(root,
+ joinrel,
+ jointype,
+ sjinfo,
+ semifactors,
+ path_o,
+ path_i,
+ restrictlist,
+ NIL,
+ required_outer);
+
+ /* TODO determine cost and rows of the join. */
+
+ /* Add generated path into joinrel by add_path(). */
+ add_path(joinrel, (Path *) joinpath);
+
+ /* TODO consider parameterized paths */
+}
+
+/*
+ * postgresGetForeignJoinPlan
+ * Create a ForeignScan plan node from the given ForeignJoinPath.
+ *
+ */
+static ForeignScan *
+postgresGetForeignJoinPlan(PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ List *tlist,
+ List *joinclauses,
+ List *otherclauses,
+ Plan *outer_plan,
+ Plan *inner_plan)
+{
+ ForeignScan *join_plan;
+ List *params_list = NIL;
+ List *fdw_private = NIL;
+ List *retrieved_attrs = NIL;
+ Relids relids;
+ StringInfoData sql;
+ ForeignPath *path_o;
+ ForeignPath *path_i;
+ List *retrieved_attrs_o = NIL;
+ List *retrieved_attrs_i = NIL;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ PgFdwDeparseRel dr_o;
+ PgFdwDeparseRel dr_i;
+
+ /*
+ * At the moment we support only joins between foreign tables. This
+ * limitation will be relaxed in future releases.
+ */
+ Assert(IsA(outer_plan, ForeignScan));
+ Assert(IsA(inner_plan, ForeignScan));
+
+ /*
+ * Retrieve Path and PgFdwRelationInfo of underlying ForeignScan to reuse
+ * various information computed during ForeignScan planning.
+ */
+ path_o = (ForeignPath *) best_path->jpath.outerjoinpath;
+ fpinfo_o = path_o->path.parent->fdw_private;
+ path_i = (ForeignPath *) best_path->jpath.innerjoinpath;
+ fpinfo_i = path_i->path.parent->fdw_private;
+
+ /*
+ * Construct deparse information for the two relations.
+ */
+ dr_o.baserel = path_o->path.parent;
+ dr_o.jointype = JOIN_INNER;
+ dr_o.joinclauses = NIL;
+ dr_o.attrs_used = fpinfo_o->attrs_used;
+ dr_o.retrieved_attrs = &retrieved_attrs_o;
+ dr_i.baserel = path_i->path.parent;
+ dr_i.jointype = best_path->jpath.jointype;
+ dr_i.joinclauses = joinclauses;
+ dr_i.attrs_used = fpinfo_i->attrs_used;
+ dr_i.retrieved_attrs = &retrieved_attrs_i;
+
+ relids = NULL;
+ relids = bms_add_member(relids, dr_o.baserel->relid);
+ relids = bms_add_member(relids, dr_i.baserel->relid);
+
+ initStringInfo(&sql);
+ deparseSelectSql(&sql, root, list_make2(&dr_o, &dr_i));
+ if (fpinfo_o->remote_conds)
+ appendWhereClause(&sql, root, relids, fpinfo_o->remote_conds, true,
+ &params_list);
+ if (fpinfo_i->remote_conds)
+ appendWhereClause(&sql, root, relids, fpinfo_i->remote_conds,
+ (fpinfo_o->remote_conds == NIL), &params_list);
+
+ /*
+ * Unlike a plain ForeignScan, we store retrieved_attrs as a list of lists,
+ * so that subsequent processing can tell which relation each retrieved
+ * column comes from.
+ */
+ retrieved_attrs = list_make2(list_copy(*dr_o.retrieved_attrs),
+ list_copy(*dr_i.retrieved_attrs));
+ fdw_private = list_make2(makeString(sql.data), retrieved_attrs);
+ elog(DEBUG1, "sql: %s", sql.data);
+ elog(DEBUG1, "retrieved_attrs: %s", nodeToString(retrieved_attrs));
+
+ join_plan = make_foreignscan(tlist,
+ NIL,
+ 0,
+ params_list,
+ fdw_private);
+ return join_plan;
+}
+
+/*
* Create a tuple from the specified row of the PGresult.
*
* rel is the local representation of the foreign table, attinmeta is
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 950c6f7..c71bf21 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -39,6 +39,13 @@ extern int ExtractConnectionOptions(List *defelems,
const char **values);
/* in deparse.c */
+typedef struct PgFdwDeparseRel {
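+ RelOptInfo *baserel; /* relation to deparse */
+ JoinType jointype; /* how it joins the preceding relations */
+ List *joinclauses; /* join (ON) clauses for that join */
+ Bitmapset *attrs_used; /* columns to fetch from this relation */
+ List **retrieved_attrs; /* output: attnums retrieved from remote */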
+} PgFdwDeparseRel;
extern void classifyConditions(PlannerInfo *root,
RelOptInfo *baserel,
List *input_conds,
@@ -49,12 +56,10 @@ extern bool is_foreign_expr(PlannerInfo *root,
Expr *expr);
extern void deparseSelectSql(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
- Bitmapset *attrs_used,
- List **retrieved_attrs);
+ List *rels);
extern void appendWhereClause(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
+ Relids relids,
List *exprs,
bool is_first,
List **params);
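The list-of-PgFdwDeparseRel interface suggests how deparseSelectSql() can build a FROM clause for a join: the first entry names the leftmost relation, and each later entry carries the join type and clauses linking it to what precedes it (the patch passes JOIN_INNER and NIL for the outer entry and the real join type and clauses for the inner one). The sketch below is hypothetical: the type and function names are invented, clauses are pre-rendered strings, and the target list, WHERE clause and identifier quoting are omitted. The SQL the patch actually generates may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum
{
    SK_JOIN_INNER, SK_JOIN_LEFT, SK_JOIN_RIGHT, SK_JOIN_FULL
} SketchJoinType;

/* Hypothetical, simplified analogue of PgFdwDeparseRel. */
typedef struct SketchDeparseRel
{
    const char    *relname;     /* remote table name */
    const char    *alias;       /* alias used to qualify columns */
    SketchJoinType jointype;    /* how this rel joins the preceding ones */
    const char    *joinclause;  /* ON condition; unused for the first rel */
} SketchDeparseRel;

static const char *
join_keyword(SketchJoinType jt)
{
    switch (jt)
    {
        case SK_JOIN_LEFT:  return "LEFT JOIN";
        case SK_JOIN_RIGHT: return "RIGHT JOIN";
        case SK_JOIN_FULL:  return "FULL JOIN";
        default:            return "INNER JOIN";
    }
}

/* Write the FROM clause for rels[0..nrels-1] into buf (size len). */
static void
deparse_from_clause_sketch(char *buf, size_t len,
                           const SketchDeparseRel *rels, int nrels)
{
    size_t used;

    assert(nrels >= 1);
    used = (size_t) snprintf(buf, len, " FROM %s %s",
                             rels[0].relname, rels[0].alias);
    /* Stop appending once the buffer is full (snprintf truncates). */
    for (int k = 1; k < nrels && used < len; k++)
        used += (size_t) snprintf(buf + used, len - used, " %s %s %s ON (%s)",
                                  join_keyword(rels[k].jointype),
                                  rels[k].relname, rels[k].alias,
                                  rels[k].joinclause);
}
```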
diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index df69a95..d77eeea 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -250,6 +250,29 @@ GetForeignTable(Oid relid)
/*
+ * GetForeignTableServerOid - Get OID of the server related to the given
+ * foreign table.
+ */
+Oid
+GetForeignTableServerOid(Oid relid)
+{
+ Form_pg_foreign_table tableform;
+ HeapTuple tp;
+ Oid serverid;
+
+ tp = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for foreign table %u", relid);
+ tableform = (Form_pg_foreign_table) GETSTRUCT(tp);
+ serverid = tableform->ftserver;
+
+ ReleaseSysCache(tp);
+
+ return serverid;
+}
+
+
+/*
* GetForeignColumnOptions - Get attfdwoptions of given relation/attnum
* as list of DefElem.
*/
@@ -310,12 +333,8 @@ static Oid
GetFdwHandlerByRelId(Oid relid)
{
HeapTuple tp;
- Form_pg_foreign_data_wrapper fdwform;
- Form_pg_foreign_server serverform;
Form_pg_foreign_table tableform;
Oid serverid;
- Oid fdwid;
- Oid fdwhandler;
/* Get server OID for the foreign table. */
tp = SearchSysCache1(FOREIGNTABLEREL, ObjectIdGetDatum(relid));
@@ -325,6 +344,16 @@ GetFdwHandlerByRelId(Oid relid)
serverid = tableform->ftserver;
ReleaseSysCache(tp);
+ return GetFdwRoutineByServerId(serverid);
+}
+
+FdwRoutine *
+GetFdwRoutineByServerId(Oid serverid)
+{
+ HeapTuple tp;
+ Form_pg_foreign_server serverform;
+ Oid fdwid;
+
/* Get foreign-data wrapper OID for the server. */
tp = SearchSysCache1(FOREIGNSERVEROID, ObjectIdGetDatum(serverid));
if (!HeapTupleIsValid(tp))
@@ -333,6 +362,16 @@ GetFdwHandlerByRelId(Oid relid)
fdwid = serverform->srvfdw;
ReleaseSysCache(tp);
+ return GetFdwRoutineByFdwId(fdwid);
+}
+
+FdwRoutine *
+GetFdwRoutineByFdwId(Oid fdwid)
+{
+ HeapTuple tp;
+ Form_pg_foreign_data_wrapper fdwform;
+ Oid fdwhandler;
+
/* Get handler function OID for the FDW. */
tp = SearchSysCache1(FOREIGNDATAWRAPPEROID, ObjectIdGetDatum(fdwid));
if (!HeapTupleIsValid(tp))
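The foreign.c refactoring splits the lookup chain foreign table, then server, then FDW, then handler into one step per catalog, so that a join relation, which has a server but no single table, can enter the chain at the server. A standalone sketch of that layering follows, with arrays standing in for the syscache lookups and made-up OIDs. It returns handler OIDs rather than FdwRoutine pointers and is not the backend code.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical in-memory stand-ins for the three catalogs. */
typedef struct { Oid relid;    Oid serverid; } FtEntry;  /* pg_foreign_table */
typedef struct { Oid serverid; Oid fdwid;    } SrvEntry; /* pg_foreign_server */
typedef struct { Oid fdwid;    Oid handler;  } FdwEntry; /* pg_foreign_data_wrapper */

/* Made-up OIDs: two foreign tables on one server, one FDW. */
static const FtEntry  ft_cat[]  = {{16400, 16390}, {16405, 16390}};
static const SrvEntry srv_cat[] = {{16390, 16385}};
static const FdwEntry fdw_cat[] = {{16385, 16384}};

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

static Oid
handler_by_fdw_id(Oid fdwid)
{
    for (size_t k = 0; k < NELEM(fdw_cat); k++)
        if (fdw_cat[k].fdwid == fdwid)
            return fdw_cat[k].handler;
    return InvalidOid;          /* the backend raises "cache lookup failed" */
}

static Oid
handler_by_server_id(Oid serverid)
{
    for (size_t k = 0; k < NELEM(srv_cat); k++)
        if (srv_cat[k].serverid == serverid)
            return handler_by_fdw_id(srv_cat[k].fdwid);
    return InvalidOid;
}

static Oid
handler_by_rel_id(Oid relid)
{
    for (size_t k = 0; k < NELEM(ft_cat); k++)
        if (ft_cat[k].relid == relid)
            return handler_by_server_id(ft_cat[k].serverid);
    return InvalidOid;
}
```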
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c4a06fc..048db39 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1703,6 +1703,16 @@ _outHashPath(StringInfo str, const HashPath *node)
}
static void
+_outForeignJoinPath(StringInfo str, const ForeignJoinPath *node)
+{
+ WRITE_NODE_TYPE("FOREIGNJOINPATH");
+
+ _outJoinPathInfo(str, (const JoinPath *) node);
+
+ WRITE_NODE_FIELD(fdw_private);
+}
+
+static void
_outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
{
WRITE_NODE_TYPE("PLANNERGLOBAL");
@@ -1801,6 +1811,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(subplan);
WRITE_NODE_FIELD(subroot);
WRITE_NODE_FIELD(subplan_params);
+ WRITE_OID_FIELD(fdw_handler);
/* we don't try to print fdwroutine or fdw_private */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_NODE_FIELD(joininfo);
@@ -3125,6 +3136,9 @@ _outNode(StringInfo str, const void *obj)
case T_HashPath:
_outHashPath(str, obj);
break;
+ case T_ForeignJoinPath:
+ _outForeignJoinPath(str, obj);
+ break;
case T_PlannerGlobal:
_outPlannerGlobal(str, obj);
break;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 020558b..a8506fc 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1782,8 +1782,8 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors)
{
- Path *outer_path = path->outerjoinpath;
- Path *inner_path = path->innerjoinpath;
+ Path *outer_path = path->jpath.outerjoinpath;
+ Path *inner_path = path->jpath.innerjoinpath;
double outer_path_rows = outer_path->rows;
double inner_path_rows = inner_path->rows;
Cost startup_cost = workspace->startup_cost;
@@ -1794,10 +1794,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
double ntuples;
/* Mark the path with the correct row estimate */
- if (path->path.param_info)
- path->path.rows = path->path.param_info->ppi_rows;
+ if (path->jpath.path.param_info)
+ path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
else
- path->path.rows = path->path.parent->rows;
+ path->jpath.path.rows = path->jpath.path.parent->rows;
/*
* We could include disable_cost in the preliminary estimate, but that
@@ -1809,7 +1809,7 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
/* cost of source data */
- if (path->jointype == JOIN_SEMI || path->jointype == JOIN_ANTI)
+ if (path->jpath.jointype == JOIN_SEMI || path->jpath.jointype == JOIN_ANTI)
{
double outer_matched_rows = workspace->outer_matched_rows;
Selectivity inner_scan_frac = workspace->inner_scan_frac;
@@ -1856,13 +1856,13 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
}
/* CPU costs */
- cost_qual_eval(&restrict_qual_cost, path->joinrestrictinfo, root);
+ cost_qual_eval(&restrict_qual_cost, path->jpath.joinrestrictinfo, root);
startup_cost += restrict_qual_cost.startup;
cpu_per_tuple = cpu_tuple_cost + restrict_qual_cost.per_tuple;
run_cost += cpu_per_tuple * ntuples;
- path->path.startup_cost = startup_cost;
- path->path.total_cost = startup_cost + run_cost;
+ path->jpath.path.startup_cost = startup_cost;
+ path->jpath.path.total_cost = startup_cost + run_cost;
}
/*
@@ -3306,14 +3306,14 @@ compute_semi_anti_join_factors(PlannerInfo *root,
static bool
has_indexed_join_quals(NestPath *joinpath)
{
- Relids joinrelids = joinpath->path.parent->relids;
- Path *innerpath = joinpath->innerjoinpath;
+ Relids joinrelids = joinpath->jpath.path.parent->relids;
+ Path *innerpath = joinpath->jpath.innerjoinpath;
List *indexclauses;
bool found_one;
ListCell *lc;
/* If join still has quals to evaluate, it's not fast */
- if (joinpath->joinrestrictinfo != NIL)
+ if (joinpath->jpath.joinrestrictinfo != NIL)
return false;
/* Nor if the inner path isn't parameterized at all */
if (innerpath->param_info == NULL)
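Most of the costsize.c churn comes from NestPath now wrapping its JoinPath fields in a `jpath` member, as MergePath and HashPath already do, which gives ForeignJoinPath the same layout. The sketch below shows the underlying C idiom (first-member embedding) with simplified, hypothetical struct definitions: because `jpath` is the first member, a pointer to any wrapper can be viewed as a pointer to the common JoinPath, which is what lets create_join_plan() dispatch on pathtype and then cast.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { JT_INNER, JT_LEFT } JoinKind;

/* Simplified, hypothetical versions of the planner structs. */
typedef struct Path { int pathtype; double total_cost; } Path;

typedef struct JoinPath
{
    Path     path;              /* must be first */
    JoinKind jointype;
    Path    *outerjoinpath;
    Path    *innerjoinpath;
} JoinPath;

typedef struct NestPath
{
    JoinPath jpath;             /* must be first */
} NestPath;

typedef struct ForeignJoinPath
{
    JoinPath jpath;             /* must be first */
    void    *fdw_private;
} ForeignJoinPath;

/* Generic code sees only the common JoinPath prefix. */
static JoinKind
join_kind_of(const Path *p)
{
    return ((const JoinPath *) p)->jointype;
}
```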
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5a24efa..04e59e6 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,6 +17,7 @@
#include <math.h>
#include "executor/executor.h"
+#include "foreign/fdwapi.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -52,7 +53,6 @@ static List *select_mergejoin_clauses(PlannerInfo *root,
JoinType jointype,
bool *mergejoin_allowed);
-
/*
* add_paths_to_joinrel
* Given a join relation and two component rels from which it can be made,
@@ -209,7 +209,29 @@ add_paths_to_joinrel(PlannerInfo *root,
extra_lateral_rels = NULL;
/*
- * 1. Consider mergejoin paths where both relations must be explicitly
+ * 1. Consider foreignjoin paths when both outer and inner relations are
+ * managed by the same foreign-data wrapper and share the same server. In
+ * addition, checkAsUser of all relations in the join must match. These
+ * limitations ensure that the whole join can be executed on the remote
+ * server through a single connection. This is done before any local join
+ * consideration because a foreign join would be the cheapest in most cases
+ * when joining on the remote side is possible.
+ */
+ if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPath)
+ {
+ joinrel->fdwroutine->GetForeignJoinPath(root,
+ joinrel,
+ outerrel,
+ innerrel,
+ jointype,
+ sjinfo,
+ &semifactors,
+ restrictlist,
+ extra_lateral_rels);
+ }
+
+ /*
+ * 2. Consider mergejoin paths where both relations must be explicitly
* sorted. Skip this if we can't mergejoin.
*/
if (mergejoin_allowed)
@@ -219,7 +241,7 @@ add_paths_to_joinrel(PlannerInfo *root,
param_source_rels, extra_lateral_rels);
/*
- * 2. Consider paths where the outer relation need not be explicitly
+ * 3. Consider paths where the outer relation need not be explicitly
* sorted. This includes both nestloops and mergejoins where the outer
* path is already ordered. Again, skip this if we can't mergejoin.
* (That's okay because we know that nestloop can't handle right/full
@@ -234,7 +256,7 @@ add_paths_to_joinrel(PlannerInfo *root,
#ifdef NOT_USED
/*
- * 3. Consider paths where the inner relation need not be explicitly
+ * 4. Consider paths where the inner relation need not be explicitly
* sorted. This includes mergejoins only (nestloops were already built in
* match_unsorted_outer).
*
@@ -252,7 +274,7 @@ add_paths_to_joinrel(PlannerInfo *root,
#endif
/*
- * 4. Consider paths where both outer and inner relations must be hashed
+ * 5. Consider paths where both outer and inner relations must be hashed
* before being joined. As above, disregard enable_hashjoin for full
* joins, because there may be no other alternative.
*/
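Since the FDW hook now runs before the local join strategies, it has to reject joins it cannot push down. The sketch below gathers the conditions postgresGetForeignJoinPath() checks into one hypothetical predicate; `foreign_join_is_pushable()` is not in the patch, `nclauses` stands for list_length(restrictlist), and the reversed-combination skip and the checkAsUser test (mentioned in the patch's comment but not yet implemented) are left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

/* Subset of PostgreSQL's JoinType, redeclared for a standalone build. */
typedef enum
{
    JOIN_INNER, JOIN_LEFT, JOIN_FULL, JOIN_RIGHT, JOIN_SEMI, JOIN_ANTI
} JoinType;

static bool
foreign_join_is_pushable(JoinType jointype, int nclauses,
                         Oid outer_server, Oid inner_server)
{
    /* Only inner joins and the three outer join types are supported. */
    if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
        jointype != JOIN_RIGHT && jointype != JOIN_FULL)
        return false;

    /* An inner join without clauses is a cross join: it only inflates transfer. */
    if (jointype == JOIN_INNER && nclauses == 0)
        return false;

    /* Both sides must live on the same foreign server. */
    return outer_server == inner_server;
}
```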
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 06bea4d..d20fb50 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -44,6 +44,7 @@
#include "utils/lsyscache.h"
+static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
static Plan *create_scan_plan(PlannerInfo *root, Path *best_path);
static List *build_path_tlist(PlannerInfo *root, Path *path);
static bool use_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
@@ -82,11 +83,14 @@ static CustomScan *create_customscan_plan(PlannerInfo *root,
CustomPath *best_path,
List *tlist, List *scan_clauses);
static NestLoop *create_nestloop_plan(PlannerInfo *root, NestPath *best_path,
- Plan *outer_plan, Plan *inner_plan);
+ List *tlist, Plan *outer_plan, Plan *inner_plan);
static MergeJoin *create_mergejoin_plan(PlannerInfo *root, MergePath *best_path,
- Plan *outer_plan, Plan *inner_plan);
+ List *tlist, Plan *outer_plan, Plan *inner_plan);
static HashJoin *create_hashjoin_plan(PlannerInfo *root, HashPath *best_path,
- Plan *outer_plan, Plan *inner_plan);
+ List *tlist, Plan *outer_plan, Plan *inner_plan);
+static ForeignScan *create_foreignjoin_plan(PlannerInfo *root,
+ ForeignJoinPath *best_path, List *tlist, Plan *outer_plan,
+ Plan *inner_plan);
static Node *replace_nestloop_params(PlannerInfo *root, Node *expr);
static Node *replace_nestloop_params_mutator(Node *node, PlannerInfo *root);
static void process_subquery_nestloop_params(PlannerInfo *root,
@@ -219,7 +223,7 @@ create_plan(PlannerInfo *root, Path *best_path)
* create_plan_recurse
* Recursive guts of create_plan().
*/
-Plan *
+static Plan *
create_plan_recurse(PlannerInfo *root, Path *best_path)
{
Plan *plan;
@@ -240,6 +244,7 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
case T_CustomScan:
plan = create_scan_plan(root, best_path);
break;
+ case T_ForeignJoinPath:
case T_HashJoin:
case T_MergeJoin:
case T_NestLoop:
@@ -610,6 +615,7 @@ create_gating_plan(PlannerInfo *root, Plan *plan, List *quals)
static Plan *
create_join_plan(PlannerInfo *root, JoinPath *best_path)
{
+ List *tlist;
Plan *outer_plan;
Plan *inner_plan;
Plan *plan;
@@ -624,27 +630,41 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
inner_plan = create_plan_recurse(root, best_path->innerjoinpath);
+ if (best_path->path.pathtype == T_NestLoop)
+ {
+ /* Restore curOuterRels */
+ bms_free(root->curOuterRels);
+ root->curOuterRels = saveOuterRels;
+ }
+ tlist = build_path_tlist(root, &best_path->path);
+
switch (best_path->path.pathtype)
{
+ case T_ForeignJoinPath:
+ plan = (Plan *) create_foreignjoin_plan(root,
+ (ForeignJoinPath *) best_path,
+ tlist,
+ outer_plan,
+ inner_plan);
+ break;
case T_MergeJoin:
plan = (Plan *) create_mergejoin_plan(root,
(MergePath *) best_path,
+ tlist,
outer_plan,
inner_plan);
break;
case T_HashJoin:
plan = (Plan *) create_hashjoin_plan(root,
(HashPath *) best_path,
+ tlist,
outer_plan,
inner_plan);
break;
case T_NestLoop:
- /* Restore curOuterRels */
- bms_free(root->curOuterRels);
- root->curOuterRels = saveOuterRels;
-
plan = (Plan *) create_nestloop_plan(root,
(NestPath *) best_path,
+ tlist,
outer_plan,
inner_plan);
break;
@@ -2114,12 +2134,12 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
static NestLoop *
create_nestloop_plan(PlannerInfo *root,
NestPath *best_path,
+ List *tlist,
Plan *outer_plan,
Plan *inner_plan)
{
NestLoop *join_plan;
- List *tlist = build_path_tlist(root, &best_path->path);
- List *joinrestrictclauses = best_path->joinrestrictinfo;
+ List *joinrestrictclauses = best_path->jpath.joinrestrictinfo;
List *joinclauses;
List *otherclauses;
Relids outerrelids;
@@ -2133,7 +2153,7 @@ create_nestloop_plan(PlannerInfo *root,
/* Get the join qual clauses (in plain expression form) */
/* Any pseudoconstant clauses are ignored here */
- if (IS_OUTER_JOIN(best_path->jointype))
+ if (IS_OUTER_JOIN(best_path->jpath.jointype))
{
extract_actual_join_clauses(joinrestrictclauses,
&joinclauses, &otherclauses);
@@ -2146,7 +2166,7 @@ create_nestloop_plan(PlannerInfo *root,
}
/* Replace any outer-relation variables with nestloop params */
- if (best_path->path.param_info)
+ if (best_path->jpath.path.param_info)
{
joinclauses = (List *)
replace_nestloop_params(root, (Node *) joinclauses);
@@ -2158,7 +2178,7 @@ create_nestloop_plan(PlannerInfo *root,
* Identify any nestloop parameters that should be supplied by this join
* node, and move them from root->curOuterParams to the nestParams list.
*/
- outerrelids = best_path->outerjoinpath->parent->relids;
+ outerrelids = best_path->jpath.outerjoinpath->parent->relids;
nestParams = NIL;
prev = NULL;
for (cell = list_head(root->curOuterParams); cell; cell = next)
@@ -2195,9 +2215,9 @@ create_nestloop_plan(PlannerInfo *root,
nestParams,
outer_plan,
inner_plan,
- best_path->jointype);
+ best_path->jpath.jointype);
- copy_path_costsize(&join_plan->join.plan, &best_path->path);
+ copy_path_costsize(&join_plan->join.plan, &best_path->jpath.path);
return join_plan;
}
@@ -2205,10 +2225,10 @@ create_nestloop_plan(PlannerInfo *root,
static MergeJoin *
create_mergejoin_plan(PlannerInfo *root,
MergePath *best_path,
+ List *tlist,
Plan *outer_plan,
Plan *inner_plan)
{
- List *tlist = build_path_tlist(root, &best_path->jpath.path);
List *joinclauses;
List *otherclauses;
List *mergeclauses;
@@ -2500,10 +2520,10 @@ create_mergejoin_plan(PlannerInfo *root,
static HashJoin *
create_hashjoin_plan(PlannerInfo *root,
HashPath *best_path,
+ List *tlist,
Plan *outer_plan,
Plan *inner_plan)
{
- List *tlist = build_path_tlist(root, &best_path->jpath.path);
List *joinclauses;
List *otherclauses;
List *hashclauses;
@@ -2622,6 +2642,53 @@ create_hashjoin_plan(PlannerInfo *root,
return join_plan;
}
+/*
+ * Unlike other join paths, ForeignJoinPath is transformed into a ForeignScan
+ * plan node.
+ */
+static ForeignScan *
+create_foreignjoin_plan(PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ List *tlist,
+ Plan *outer_plan,
+ Plan *inner_plan)
+{
+ ForeignScan *join_plan;
+ List *joinrestrictclauses = best_path->jpath.joinrestrictinfo;
+ List *joinclauses;
+ List *otherclauses;
+
+ /* Sort join qual clauses into best execution order */
+ joinrestrictclauses = order_qual_clauses(root, joinrestrictclauses);
+
+ /* Get the join qual clauses (in plain expression form) */
+ /* Any pseudoconstant clauses are ignored here */
+ if (IS_OUTER_JOIN(best_path->jpath.jointype))
+ {
+ extract_actual_join_clauses(joinrestrictclauses,
+ &joinclauses, &otherclauses);
+ }
+ else
+ {
+ /* We can treat all clauses alike for an inner join */
+ joinclauses = extract_actual_clauses(joinrestrictclauses, false);
+ otherclauses = NIL;
+ }
+
+ /* Call FDW handler */
+ {
+ RelOptInfo *rel = best_path->jpath.path.parent;
+
+ Assert(rel->fdwroutine);
+ join_plan = rel->fdwroutine->GetForeignJoinPlan(root, best_path,
+ tlist, joinclauses,
+ otherclauses,
+ outer_plan, inner_plan);
+ join_plan->fdw_handler = rel->fdw_handler;
+ }
+
+ return join_plan;
+}
/*****************************************************************************
*
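create_foreignjoin_plan() reuses the clause classification of the other join planners before handing the clauses to the FDW. Below is a self-contained sketch of that classification with a hypothetical `Clause` struct standing in for RestrictInfo. It is meant to mirror how extract_actual_join_clauses() and extract_actual_clauses(..., false) treat the is_pushed_down and pseudoconstant flags, and is not planner code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified RestrictInfo: only the flags the split reads. */
typedef struct Clause
{
    const char *text;
    bool        is_pushed_down;  /* came from WHERE, not from the ON clause */
    bool        pseudoconstant;  /* e.g. a constant qual; handled by gating */
} Clause;

/*
 * For outer joins, ON-clause conditions go to joinclauses and pushed-down
 * ones to otherclauses; for inner joins everything is a join clause.
 * Pseudoconstant clauses are dropped in both cases.
 * Returns the number of join clauses; *nother receives the other count.
 */
static int
split_join_clauses(const Clause *in, int n, bool outer_join,
                   const Clause **joinclauses, const Clause **otherclauses,
                   int *nother)
{
    int njoin = 0;

    *nother = 0;
    for (int k = 0; k < n; k++)
    {
        if (in[k].pseudoconstant)
            continue;
        if (outer_join && in[k].is_pushed_down)
            otherclauses[(*nother)++] = &in[k];
        else
            joinclauses[njoin++] = &in[k];
    }
    return njoin;
}
```

The split matters for outer joins: moving a WHERE condition on the nullable side of a LEFT JOIN into its ON clause would change the result, so the FDW receives the two lists separately.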
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 1395a21..d6434bf 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1710,9 +1710,9 @@ create_nestloop_path(PlannerInfo *root,
restrict_clauses = jclauses;
}
- pathnode->path.pathtype = T_NestLoop;
- pathnode->path.parent = joinrel;
- pathnode->path.param_info =
+ pathnode->jpath.path.pathtype = T_NestLoop;
+ pathnode->jpath.path.parent = joinrel;
+ pathnode->jpath.path.param_info =
get_joinrel_parampathinfo(root,
joinrel,
outer_path,
@@ -1720,11 +1720,11 @@ create_nestloop_path(PlannerInfo *root,
sjinfo,
required_outer,
&restrict_clauses);
- pathnode->path.pathkeys = pathkeys;
- pathnode->jointype = jointype;
- pathnode->outerjoinpath = outer_path;
- pathnode->innerjoinpath = inner_path;
- pathnode->joinrestrictinfo = restrict_clauses;
+ pathnode->jpath.path.pathkeys = pathkeys;
+ pathnode->jpath.jointype = jointype;
+ pathnode->jpath.outerjoinpath = outer_path;
+ pathnode->jpath.innerjoinpath = inner_path;
+ pathnode->jpath.joinrestrictinfo = restrict_clauses;
final_cost_nestloop(root, pathnode, workspace, sjinfo, semifactors);
@@ -1859,6 +1859,58 @@ create_hashjoin_path(PlannerInfo *root,
}
/*
+ * create_foreignjoin_path
+ * Creates a pathnode corresponding to a foreign join between two relations.
+ * Unlike similar functions for other join types, no final_cost_*() function
+ * is called here, so the FDW has to take care of the cost information.
+ *
+ * 'joinrel' is the join relation
+ * 'jointype' is the type of join required
+ * 'sjinfo' is extra info about the join for selectivity estimation
+ * 'semifactors' contains valid data if jointype is SEMI or ANTI
+ * 'outer_path' is the cheapest outer path
+ * 'inner_path' is the cheapest inner path
+ * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
+ * 'pathkeys' are the path keys of the new join path
+ * 'required_outer' is the set of required outer rels
+ */
+ForeignJoinPath *
+create_foreignjoin_path(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Path *outer_path,
+ Path *inner_path,
+ List *restrict_clauses,
+ List *pathkeys,
+ Relids required_outer)
+{
+ ForeignJoinPath *pathnode = makeNode(ForeignJoinPath);
+
+ pathnode->jpath.path.pathtype = T_ForeignJoinPath;
+ pathnode->jpath.path.parent = joinrel;
+ pathnode->jpath.path.param_info =
+ get_joinrel_parampathinfo(root,
+ joinrel,
+ outer_path,
+ inner_path,
+ sjinfo,
+ required_outer,
+ &restrict_clauses);
+ pathnode->jpath.path.pathkeys = pathkeys;
+ pathnode->jpath.jointype = jointype;
+ pathnode->jpath.outerjoinpath = outer_path;
+ pathnode->jpath.innerjoinpath = inner_path;
+ pathnode->jpath.joinrestrictinfo = restrict_clauses;
+
+ pathnode->fdw_private = NIL;
+
+ return pathnode;
+}
+
+/*
* reparameterize_path
* Attempt to modify a Path to have greater parameterization
*
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a4a35c3..57763d4 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -27,6 +27,7 @@
#include "catalog/catalog.h"
#include "catalog/heap.h"
#include "foreign/fdwapi.h"
+#include "foreign/foreign.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ca71093..667ae1b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -122,6 +122,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
rel->subplan = NULL;
rel->subroot = NULL;
rel->subplan_params = NIL;
+ rel->fdw_handler = InvalidOid;
rel->fdwroutine = NULL;
rel->fdw_private = NULL;
rel->baserestrictinfo = NIL;
@@ -384,7 +385,17 @@ build_join_rel(PlannerInfo *root,
joinrel->subplan = NULL;
joinrel->subroot = NULL;
joinrel->subplan_params = NIL;
- joinrel->fdwroutine = NULL;
+ /* propagate common FDW information up to join relation */
+ if (inner_rel->fdw_handler == outer_rel->fdw_handler)
+ {
+ joinrel->fdwroutine = inner_rel->fdwroutine;
+ joinrel->fdw_handler = inner_rel->fdw_handler;
+ }
+ else
+ {
+ joinrel->fdw_handler = InvalidOid;
+ joinrel->fdwroutine = NULL;
+ }
joinrel->fdw_private = NULL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index b494ff2..b1f8532 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,24 @@ typedef void (*EndForeignModify_function) (EState *estate,
typedef int (*IsForeignRelUpdatable_function) (Relation rel);
+typedef void (*GetForeignJoinPath_function ) (PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
+
+typedef ForeignScan *(*GetForeignJoinPlan_function) (PlannerInfo *root,
+ ForeignJoinPath *best_path,
+ List *tlist,
+ List *joinclauses,
+ List *otherclauses,
+ Plan *outer_plan,
+ Plan *inner_plan);
+
typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
struct ExplainState *es);
@@ -150,12 +168,19 @@ typedef struct FdwRoutine
/* Support functions for IMPORT FOREIGN SCHEMA */
ImportForeignSchema_function ImportForeignSchema;
+
+ /* Support functions for join push-down */
+ GetForeignJoinPath_function GetForeignJoinPath;
+ GetForeignJoinPlan_function GetForeignJoinPlan;
+
} FdwRoutine;
/* Functions in foreign/foreign.c */
extern FdwRoutine *GetFdwRoutine(Oid fdwhandler);
extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
+extern FdwRoutine * GetFdwRoutineByServerId(Oid serverid);
+extern FdwRoutine * GetFdwRoutineByFdwId(Oid fdwid);
extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
extern Oid GetFdwHandlerForRelation(Relation relation);
extern bool IsImportableForeignTable(const char *tablename,
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 9c737b4..35acae7 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -75,6 +75,7 @@ extern ForeignDataWrapper *GetForeignDataWrapper(Oid fdwid);
extern ForeignDataWrapper *GetForeignDataWrapperByName(const char *name,
bool missing_ok);
extern ForeignTable *GetForeignTable(Oid relid);
+extern Oid GetForeignTableServerOid(Oid relid);
extern List *GetForeignColumnOptions(Oid relid, AttrNumber attnum);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 97ef0fc..0f7a15d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -224,6 +224,7 @@ typedef enum NodeTag
T_NestPath,
T_MergePath,
T_HashPath,
+ T_ForeignJoinPath,
T_TidPath,
T_ForeignPath,
T_CustomPath,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9ef0b56..9914d1d 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1046,7 +1046,10 @@ typedef struct JoinPath
* A nested-loop path needs no special fields.
*/
-typedef JoinPath NestPath;
+typedef struct NestPath
+{
+ JoinPath jpath;
+} NestPath;
/*
* A mergejoin path has these fields.
@@ -1102,6 +1105,22 @@ typedef struct HashPath
} HashPath;
/*
+ * ForeignJoinPath represents a join between two relations consist of foreign
+ * table.
+ *
+ * fdw_private stores FDW private data about the join. While fdw_private is
+ * not actually touched by the core code during normal operations, it's
+ * generally a good idea to use a representation that can be dumped by
+ * nodeToString(), so that you can examine the structure during debugging
+ * with tools like pprint().
+ */
+typedef struct ForeignJoinPath
+{
+ JoinPath jpath;
+ List *fdw_private;
+} ForeignJoinPath;
+
+/*
* Restriction clause info.
*
* We create one of these for each AND sub-clause of a restriction condition
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9923f0e..d4b6498 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -124,6 +124,17 @@ extern HashPath *create_hashjoin_path(PlannerInfo *root,
Relids required_outer,
List *hashclauses);
+extern ForeignJoinPath *create_foreignjoin_path(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ Path *outer_path,
+ Path *inner_path,
+ List *restrict_clauses,
+ List *pathkeys,
+ Relids required_outer);
+
extern Path *reparameterize_path(PlannerInfo *root, Path *path,
Relids required_outer,
double loop_count);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index e66eaa5..082f7d7 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,7 +41,6 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
* prototypes for plan/createplan.c
*/
extern Plan *create_plan(PlannerInfo *root, Path *best_path);
-extern Plan *create_plan_recurse(PlannerInfo *root, Path *best_path);
extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
Index scanrelid, Plan *subplan);
extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
Let me add some comments on top of what you're checking now.
[design issues]
* Cost estimation
Estimating and evaluating the cost of a remote join query is not
straightforward. In principle, the local side cannot determine the cost
of running a remote join without a remote EXPLAIN, because it has no
information about the join logic applied on the remote side.
We probably have to make an assumption about the remote join algorithm,
because the local planner has no idea what the remote planner will
choose unless the foreign join uses "use_remote_estimate".
I think it is a reasonable assumption (even if it is incorrect) to
calculate the remote join cost based on the local hash-join algorithm.
If the user wants a more accurate estimate, remote EXPLAIN will give a
more reliable one.
We also need a consensus on whether the cost of remote CPU execution is
equivalent to that of local CPU. If we consider local CPU a scarcer
resource than remote CPU, a discount rate will make the planner more
inclined to choose a remote join over a local one.
Once we assume a join algorithm for the remote join and a unit cost for
remote CPU, we can calculate the cost of a foreign join from the
local join logic plus the cost of network transfer (maybe
fdw_tuple_cost?).
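A minimal C sketch of the discount-rate idea, assuming the remote join does the same CPU work as a local hash join and that remote CPU is charged at a ratio called cpu_cost_ratio here (a hypothetical name, not an existing option). The struct and functions are illustrative only; they are not PostgreSQL planner APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost inputs; none of these are PostgreSQL planner fields. */
typedef struct JoinCostInputs
{
    double join_cpu_cost;     /* join work, costed as a local hash join */
    double outer_fetch_cost;  /* network cost of fetching the whole outer input */
    double inner_fetch_cost;  /* network cost of fetching the whole inner input */
    double result_fetch_cost; /* network cost of fetching only the join result */
} JoinCostInputs;

/* Local join: fetch both inputs, then pay full price for the join work here. */
static double
local_join_cost(const JoinCostInputs *in)
{
    return in->outer_fetch_cost + in->inner_fetch_cost + in->join_cpu_cost;
}

/*
 * Remote join: the same join work scaled by the remote CPU discount rate,
 * plus fetching only the rows that survive the join.
 */
static double
remote_join_cost(const JoinCostInputs *in, double cpu_cost_ratio)
{
    return in->join_cpu_cost * cpu_cost_ratio + in->result_fetch_cost;
}

/* True when the discounted remote join is the cheaper choice. */
static bool
prefer_remote_join(const JoinCostInputs *in, double cpu_cost_ratio)
{
    return remote_join_cost(in, cpu_cost_ratio) < local_join_cost(in);
}
```

For example, inputs of {100, 5, 5, 20} cost 110 locally and 120 remotely at a ratio of 1.0, but only 70 remotely at 0.5, so the discount alone can tip the planner toward the remote join.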
* FDW options
Unlike a table scan, it is unclear which FDW options a foreign join
should refer to, because table-level FDW options are attached to each
foreign table individually. I think we have two options here:
1. Foreign join refers to the FDW options of the foreign server, and
those of the foreign tables are ignored.
2. Foreign join is prohibited unless both relations have identical
FDW options.
My preference is 2. Even for an N-way foreign join, it ensures that
all the tables involved in the (N-1)-way foreign join have identical
FDW options, so we can build the N-way foreign join with all FDW
options identical.
One exception is the "updatable" flag of postgres_fdw. It does not
make sense for a remote join, so I think a mixture of updatable and
non-updatable foreign tables should be allowed; however, that is
a decision for the FDW driver.
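A minimal sketch of option 2, assuming options are simplified to name/value string pairs (the FdwOption struct is hypothetical, standing in for the option lists PostgreSQL actually uses) and that "updatable" is the only option skipped, per the exception above. In practice, options that name the remote object, such as postgres_fdw's table_name, differ between tables by nature and would presumably need to be excluded as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair standing in for one FDW option. */
typedef struct FdwOption
{
    const char *name;
    const char *value;
} FdwOption;

/* "updatable" does not matter for a remote join, so it is not compared. */
static bool
is_ignored_option(const char *name)
{
    return strcmp(name, "updatable") == 0;
}

/* Number of options that take part in the comparison. */
static size_t
count_relevant(const FdwOption *opts, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (!is_ignored_option(opts[i].name))
            count++;
    return count;
}

/* Value of the named option, or NULL if it is not set. */
static const char *
find_option(const FdwOption *opts, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(opts[i].name, name) == 0)
            return opts[i].value;
    return NULL;
}

/*
 * Option 2: allow the foreign join only when both sides carry identical
 * options. Order-insensitive; assumes no duplicate names on either side.
 */
static bool
fdw_options_identical(const FdwOption *a, size_t na,
                      const FdwOption *b, size_t nb)
{
    if (count_relevant(a, na) != count_relevant(b, nb))
        return false;
    for (size_t i = 0; i < na; i++)
    {
        const char *other;

        if (is_ignored_option(a[i].name))
            continue;
        other = find_option(b, nb, a[i].name);
        if (other == NULL || strcmp(other, a[i].value) != 0)
            return false;
    }
    return true;
}
```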
The points above will probably take time to reach consensus on.
I'd like to hear your opinion before editing your patch.
[implementation issues]
The interface does not intend to add a new Path/Plan type for each scan
that replaces a foreign join. What postgres_fdw should do is add a
ForeignPath to a particular joinrel, and then populate a ForeignScan
with the remote join query once that path is chosen by the planner.
A few functions added in src/backend/foreign/foreign.c are not
called from anywhere at the moment.
create_plan_recurse() is reverted to static. It is needed for the
custom-join enhancement, if no other infrastructure can support it.
I'll check the code that constructs the remote query later.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: Shigeru Hanada [mailto:shigeru.hanada@gmail.com]
Sent: Monday, February 16, 2015 1:54 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Robert Haas; PostgreSQL-development
Subject: ##freemail## Re: [HACKERS] Join push-down support for foreign tables
Kaigai-san,
Oops. I rebased the patch onto your v4 custom/foreign join patch.
But as you mentioned off-list, I found that an inappropriate change
to NestPath still remains in the patch... I might have left my dev branch
in an unexpected state. I'll check it soon.
2015-02-16 13:13 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Hanada-san,
Your patch mixes the enhancement of the custom-/foreign-scan interface
with the enhancement of contrib/postgres_fdw... Probably it was a careless
mistake.
Please make your patch a diff against my infrastructure portion.
Also, I noticed that this "Join pushdown support for foreign tables" patch
was unintentionally rejected in the last commit fest.
https://commitfest.postgresql.org/3/20/
I couldn't register myself as a reviewer. How do I do that in the new
commitfest application?
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Shigeru
Hanada
Sent: Monday, February 16, 2015 1:03 PM
To: Robert Haas
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tables
Hi
I've revised the patch based on Kaigai-san's custom/foreign join
patch posted in the thread below.
/messages/by-id/9A28C8860F777E439AA12E8AEA7694F80108C355@BPXM15GP.gisp.nec.co.jp
It is basically unchanged from the version in the last CF, but as Robert
commented before, N-way (not only 2-way) joins should be supported in
the first version, so the patch constructs the SELECT SQL by placing the
source queries in the FROM clause as inline views (a.k.a. FROM-clause
subqueries).
2014-12-26 13:48 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
2014-12-16 1:22 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada
<shigeru.hanada@gmail.com> wrote:
I'm working on $SUBJECT and would like to get comments about the
design. The attached patch implements the design below.
I'm glad you are working on this.
1. Join source relations
As described above, postgres_fdw (and most SQL-based FDWs)
needs to check that 1) all foreign tables in the join belong to the same
server, and 2) all foreign tables have the same checkAsUser.
In addition to that, I add an extra limitation that both inner and outer
should be plain foreign tables, not the result of a foreign join.
This limitation keeps the SQL generator simple. Fundamentally it's
possible to join even join relations, so N-way join is listed as an
enhancement item below.
It seems pretty important to me that we have a way to push the
entire join nest down. Being able to push down a 2-way join but
not more seems like quite a severe limitation.
Hmm, I agree that supporting N-way joins is very useful. Postgres-XC's
SQL generator seems to give us a hint for such cases; I'll check it
out again.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2015-02-17 10:39 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Let me add some comments on top of what you're checking now.
[design issues]
* Cost estimation
Estimating and evaluating the cost of a remote join query is not
straightforward. In principle, the local side cannot determine the cost
of running a remote join without a remote EXPLAIN, because it has no
information about the join logic applied on the remote side.
We probably have to make an assumption about the remote join algorithm,
because the local planner has no idea what the remote planner will
choose unless the foreign join uses "use_remote_estimate".
I think it is a reasonable assumption (even if it is incorrect) to
calculate the remote join cost based on the local hash-join algorithm.
If the user wants a more accurate estimate, remote EXPLAIN will give a
more reliable one.
Hm, I guess you chose hash join as the "least-costed join". In the
pgbench model, most combinations of two tables produce a hash join
as the cheapest path. Remote EXPLAIN is very expensive in the context of
planning, so it could easily make plan optimization pointless.
But I agree that giving users an option is good.
We also need a consensus on whether the cost of remote CPU execution is
equivalent to that of local CPU. If we consider local CPU a scarcer
resource than remote CPU, a discount rate will make the planner more
inclined to choose a remote join over a local one.
Something like cpu_cost_ratio as a new server-level FDW option?
Once we assume a join algorithm for the remote join and a unit cost for
remote CPU, we can calculate the cost of a foreign join from the
local join logic plus the cost of network transfer (maybe
fdw_tuple_cost?).
Yes, the sum of these costs is the total cost of a remote join.
o fdw_startup_cost
o hash-join cost, estimated as a local join
o fdw_tuple_cost * rows * width
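The three terms above can be summed as in the following sketch. The struct is hypothetical: fdw_startup_cost and fdw_tuple_cost mirror the postgres_fdw option names, while hashjoin_cost is assumed to come from costing the join as if it ran locally. The transfer term follows the formula exactly as written above (rows times width).

```c
#include <assert.h>

/* Hypothetical estimate inputs; the cost fields mirror postgres_fdw option names. */
typedef struct RemoteJoinEstimate
{
    double fdw_startup_cost; /* fixed per-query overhead of talking to the server */
    double hashjoin_cost;    /* join cost, estimated as if it were a local hash join */
    double fdw_tuple_cost;   /* transfer cost unit */
    double rows;             /* estimated rows returned by the remote join */
    double width;            /* estimated average width of a returned row */
} RemoteJoinEstimate;

/* Total = startup + join work + transfer of the join result. */
static double
remote_join_total_cost(const RemoteJoinEstimate *e)
{
    return e->fdw_startup_cost
        + e->hashjoin_cost
        + e->fdw_tuple_cost * e->rows * e->width;
}
```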
* FDW options
Unlike a table scan, it is unclear which FDW options a foreign join
should refer to, because table-level FDW options are attached to each
foreign table individually. I think we have two options here:
1. Foreign join refers to the FDW options of the foreign server, and
those of the foreign tables are ignored.
2. Foreign join is prohibited unless both relations have identical
FDW options.
My preference is 2. Even for an N-way foreign join, it ensures that
all the tables involved in the (N-1)-way foreign join have identical
FDW options, so we can build the N-way foreign join with all FDW
options identical.
One exception is the "updatable" flag of postgres_fdw. It does not
make sense for a remote join, so I think a mixture of updatable and
non-updatable foreign tables should be allowed; however, that is
a decision for the FDW driver.
The points above will probably take time to reach consensus on.
I'd like to hear your opinion before editing your patch.
postgres_fdw can't push down a join that contains foreign tables on
multiple servers, so use_remote_estimate and fdw_startup_cost are the
only FDW options to consider. We have the following choices for each:
1-a. If all foreign tables in the join have identical
use_remote_estimate, allow pushing down.
1-b. If any foreign table in the join has use_remote_estimate set to
true, use remote estimate.
2-a. If all foreign tables in the join have identical fdw_startup_cost,
allow pushing down.
2-b. Always use the max value in the join. (cost would be more expensive)
2-c. Always use the min value in the join. (cost would be cheaper)
I prefer 1-a and 2-b, so that more joins avoid remote EXPLAIN but still
get a reasonable startup cost.
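A sketch of the preferred combination (1-a and 2-b), assuming each member table's effective settings have already been resolved into the hypothetical JoinMemberOptions struct below; how they are looked up from the catalogs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-table settings relevant to join push-down. */
typedef struct JoinMemberOptions
{
    bool   use_remote_estimate;
    double fdw_startup_cost;
} JoinMemberOptions;

/*
 * 1-a: push down only if every member has the same use_remote_estimate.
 * 2-b: use the largest fdw_startup_cost among the members.
 * Returns false (keep the join local) on a mismatch or an empty input.
 */
static bool
merge_join_options(const JoinMemberOptions *members, size_t n,
                   JoinMemberOptions *merged)
{
    if (n == 0)
        return false;
    *merged = members[0];
    for (size_t i = 1; i < n; i++)
    {
        if (members[i].use_remote_estimate != merged->use_remote_estimate)
            return false;       /* 1-a: settings differ, no push-down */
        if (members[i].fdw_startup_cost > merged->fdw_startup_cost)
            merged->fdw_startup_cost = members[i].fdw_startup_cost; /* 2-b */
    }
    return true;
}
```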
I agree about the "updatable" option.
[implementation issues]
The interface does not intend to add a new Path/Plan type for each scan
that replaces a foreign join. What postgres_fdw should do is add a
ForeignPath to a particular joinrel, and then populate a ForeignScan
with the remote join query once that path is chosen by the planner.
That idea is interesting and makes many things simpler. Let me think about it.
A few functions added in src/backend/foreign/foreign.c are not
called from anywhere at the moment.
create_plan_recurse() is reverted to static. It is needed for the
custom-join enhancement, if no other infrastructure can support it.
I made it static again because I thought create_plan_recurse
could be called by the core before giving control to FDWs. But I'm not sure
that applies to custom scans. I'll recheck that part.
--
Shigeru HANADA
Attached is the revised/rebased version of the $SUBJECT patch.
This patch is based on Kaigai-san's custom/foreign join patch, so
please apply that patch first. In this version I changed some
points from the original postgres_fdw.
1) Disabled SELECT clause optimization
Up to 9.4, postgres_fdw lists only the columns actually used in the
SELECT clause, but AFAIS that makes SQL generation complex. So I disabled
this optimization and put "NULL" for unnecessary columns in the SELECT
clause of the remote query.
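A minimal sketch of the resulting SELECT list, assuming column names and a per-column "used" flag are given as plain arrays (hypothetical stand-ins for the relation's attributes and the attrs_used bitmap that deparseTargetList() consults). Every column keeps its position and unused ones become NULL, matching the "SELECT NULL, NULL, c3, ..." lines in the updated regression output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a comma-separated SELECT list into buf (bufsize must be >= 1).
 * Column i is emitted by name if used[i] is true, otherwise as NULL, so
 * the position of every column in the remote result stays fixed.
 * Returns false if buf is too small.
 */
static bool
build_select_list(char *buf, size_t bufsize,
                  const char *const *colnames, const bool *used, int ncols)
{
    size_t len = 0;

    buf[0] = '\0';
    for (int i = 0; i < ncols; i++)
    {
        int n = snprintf(buf + len, bufsize - len, "%s%s",
                         i > 0 ? ", " : "",
                         used[i] ? colnames[i] : "NULL");

        if (n < 0 || (size_t) n >= bufsize - len)
            return false;       /* output would be truncated */
        len += (size_t) n;
    }
    return true;
}
```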
2) Extended deparse context
To allow deparsing based on multiple source relations, I added some
members to the context structure. They are unnecessary for a simple query
on a single foreign table, but IMO both cases should share one structure.
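To illustrate deparsing from two source relations, the following sketch builds roughly the query shape that deparseJoinSql() in the attached patch emits: each child query becomes an inline view (aliased l and r) whose columns are renamed a_0, a_1, ... by position. The function, enum, and buffer type are hypothetical simplifications; the real code works on StringInfo and planner nodes and spaces its output slightly differently.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum { J_INNER, J_LEFT, J_RIGHT, J_FULL } SketchJoinType;

/* Fixed-size output buffer; overflow is remembered instead of reallocating. */
typedef struct SqlBuf
{
    char   *data;
    size_t  size;
    size_t  len;
    bool    overflow;
} SqlBuf;

static void
sql_append(SqlBuf *b, const char *fmt, ...)
{
    va_list ap;
    int     n;

    if (b->overflow)
        return;
    va_start(ap, fmt);
    n = vsnprintf(b->data + b->len, b->size - b->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= b->size - b->len)
        b->overflow = true;
    else
        b->len += (size_t) n;
}

static const char *
join_keyword(SketchJoinType t)
{
    switch (t)
    {
        case J_INNER: return "INNER";
        case J_LEFT:  return "LEFT";
        case J_RIGHT: return "RIGHT";
        case J_FULL:  return "FULL";
    }
    return "";
}

/* Positional column aliases for an inline view: a_0, a_1, ... */
static void
append_alias_list(SqlBuf *b, int ncols)
{
    for (int i = 0; i < ncols; i++)
        sql_append(b, "%sa_%d", i > 0 ? ", " : "", i);
}

/* Build "SELECT ... FROM (outer) l (...) <TYPE> JOIN (inner) r (...) ON (...)". */
static bool
build_join_sql(char *out, size_t outsize, const char *select_list,
               const char *outer_sql, int outer_ncols,
               SketchJoinType jointype,
               const char *inner_sql, int inner_ncols,
               const char *join_cond)
{
    SqlBuf b = {out, outsize, 0, false};

    out[0] = '\0';
    sql_append(&b, "SELECT %s FROM (%s) l (", select_list, outer_sql);
    append_alias_list(&b, outer_ncols);
    sql_append(&b, ") %s JOIN (%s) r (", join_keyword(jointype), inner_sql);
    append_alias_list(&b, inner_ncols);
    sql_append(&b, ") ON (%s)", join_cond);
    return !b.overflow;
}
```

Because each input is just a SQL string, the output of one call could be passed as outer_sql of the next, which is how the inline-view approach extends to N-way joins.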
Following Kaigai-san's advice, the changes for supporting foreign joins in
postgres_fdw are confined to postgres_fdw itself. But I added a new
FDW API named GetForeignJoinPaths() to keep the policy that every
interface between the core and an FDW should live in FdwRoutine, instead of
using a hook function. I'm now writing documentation for it and will post
it within a day.
2015-02-19 16:19 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
--
Shigeru HANADA
Attachments:
foreign_join_v2.patch (application/octet-stream; name=foreign_join_v2.patch)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 59cb053..07cd629 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,7 +44,9 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "nodes/plannodes.h"
#include "optimizer/clauses.h"
#include "optimizer/var.h"
#include "parser/parsetree.h"
@@ -60,7 +62,8 @@
typedef struct foreign_glob_cxt
{
PlannerInfo *root; /* global planner state */
- RelOptInfo *foreignrel; /* the foreign relation we are planning for */
+ RelOptInfo *outerrel; /* the foreign relation, or outer child */
+ RelOptInfo *innerrel; /* inner child, only set for join */
} foreign_glob_cxt;
/*
@@ -86,9 +89,12 @@ typedef struct foreign_loc_cxt
typedef struct deparse_expr_cxt
{
PlannerInfo *root; /* global planner state */
- RelOptInfo *foreignrel; /* the foreign relation we are planning for */
+ RelOptInfo *outerrel; /* the foreign relation, or outer child */
+ RelOptInfo *innerrel; /* inner child, only set for join */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params */
+ ForeignScan *outerplan; /* outer child's ForeignScan node */
+ ForeignScan *innerplan; /* inner child's ForeignScan node */
} deparse_expr_cxt;
/*
@@ -160,7 +166,7 @@ classifyConditions(PlannerInfo *root,
{
RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
- if (is_foreign_expr(root, baserel, ri->clause))
+ if (is_foreign_expr(root, baserel, NULL, ri->clause))
*remote_conds = lappend(*remote_conds, ri);
else
*local_conds = lappend(*local_conds, ri);
@@ -172,7 +178,8 @@ classifyConditions(PlannerInfo *root,
*/
bool
is_foreign_expr(PlannerInfo *root,
- RelOptInfo *baserel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
Expr *expr)
{
foreign_glob_cxt glob_cxt;
@@ -183,7 +190,8 @@ is_foreign_expr(PlannerInfo *root,
* remotely.
*/
glob_cxt.root = root;
- glob_cxt.foreignrel = baserel;
+ glob_cxt.outerrel = outerrel;
+ glob_cxt.innerrel = innerrel;
loc_cxt.collation = InvalidOid;
loc_cxt.state = FDW_COLLATE_NONE;
if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
@@ -250,7 +258,7 @@ foreign_expr_walker(Node *node,
* Param's collation, ie it's not safe for it to have a
* non-default collation.
*/
- if (var->varno == glob_cxt->foreignrel->relid &&
+ if (var->varno == glob_cxt->outerrel->relid &&
var->varlevelsup == 0)
{
/* Var belongs to foreign table */
@@ -743,18 +751,22 @@ deparseTargetList(StringInfo buf,
if (attr->attisdropped)
continue;
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
if (have_wholerow ||
bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
attrs_used))
{
- if (!first)
- appendStringInfoString(buf, ", ");
- first = false;
deparseColumnRef(buf, rtindex, i, root);
- *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
+ else
+ appendStringInfoString(buf, "NULL");
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
/*
@@ -794,12 +806,15 @@ deparseTargetList(StringInfo buf,
* so Params and other-relation Vars should be replaced by dummy values.
*/
void
-appendWhereClause(StringInfo buf,
- PlannerInfo *root,
- RelOptInfo *baserel,
- List *exprs,
- bool is_first,
- List **params)
+appendConditions(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ ForeignScan *outerplan,
+ ForeignScan *innerplan,
+ List *exprs,
+ const char *prefix,
+ List **params)
{
deparse_expr_cxt context;
int nestlevel;
@@ -810,9 +825,12 @@ appendWhereClause(StringInfo buf,
/* Set up context struct for recursion */
context.root = root;
- context.foreignrel = baserel;
+ context.outerrel = outerrel;
+ context.innerrel = innerrel;
context.buf = buf;
context.params_list = params;
+ context.outerplan = outerplan;
+ context.innerplan = innerplan;
/* Make sure any constants in the exprs are printed portably */
nestlevel = set_transmission_modes();
@@ -822,22 +840,149 @@ appendWhereClause(StringInfo buf,
RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
/* Connect expressions with "AND" and parenthesize each condition. */
- if (is_first)
- appendStringInfoString(buf, " WHERE ");
- else
- appendStringInfoString(buf, " AND ");
+ if (prefix)
+ appendStringInfo(buf, "%s", prefix);
appendStringInfoChar(buf, '(');
deparseExpr(ri->clause, &context);
appendStringInfoChar(buf, ')');
- is_first = false;
+ prefix= " AND ";
}
reset_transmission_modes(nestlevel);
}
/*
+ * Deparse given Var into buf.
+ */
+static TargetEntry *
+deparseJoinVar(StringInfo buf, Var *var, Path *path_o, Path *path_i, List *tl_o, List *tl_i)
+{
+ List *targetlist;
+ const char *side;
+ ListCell *lc2;
+ TargetEntry *tle = NULL;
+ int j;
+
+ /* Find var from outer/inner subtree */
+ if (bms_is_member(var->varno, path_o->parent->relids))
+ {
+ targetlist = tl_o;
+ side = "l";
+ }
+ else if (bms_is_member(var->varno, path_i->parent->relids))
+ {
+ targetlist = tl_i;
+ side = "r";
+ }
+
+ j = 0;
+ foreach(lc2, targetlist)
+ {
+ TargetEntry *childtle = (TargetEntry *) lfirst(lc2);
+
+ if (equal(childtle->expr, var))
+ {
+ tle = copyObject(childtle);
+ break;
+ }
+ j++;
+ }
+ Assert(tle);
+
+ appendStringInfo(buf, "%s.a_%d", side, j);
+
+ return tle;
+}
+
+/*
+ * Construct a SELECT statement which contains join clause.
+ *
+ * We also create an TargetEntry List of the columns being retrieved, which is
+ * returned to *fdw_ps_tlist.
+ *
+ * path_o, tl_o, sql_o are respectively path, targetlist, and remote query
+ * statement of the outer child relation. postfix _i means those for the inner
+ * child relation. jointype and restrictlist are information of join method.
+ * fdw_ps_tlist is output parameter to pass target list of the pseudo scan to
+ * caller.
+ */
+void
+deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *restrictlist,
+ List **fdw_ps_tlist)
+{
+ StringInfoData selbuf; /* buffer for SELECT clause */
+ StringInfoData abuf_o; /* buffer for column alias list of outer */
+ StringInfoData abuf_i; /* buffer for column alias list of inner */
+ int i;
+ ListCell *lc;
+ const char *jointype_str;
+
+ jointype_str = jointype == JOIN_INNER ? "INNER" :
+ jointype == JOIN_LEFT ? "LEFT" :
+ jointype == JOIN_RIGHT ? "RIGHT" :
+ jointype == JOIN_FULL ? "FULL" : "";
+
+ /* print SELECT clause of the join scan */
+ /* XXX: should extend deparseTargetList()? */
+ initStringInfo(&selbuf);
+ i = 0;
+ foreach(lc, baserel->reltargetlist)
+ {
+ Var *var = (Var *) lfirst(lc);
+ TargetEntry *tle;
+
+ if (i > 0)
+ appendStringInfoString(&selbuf, ", ");
+ deparseJoinVar(&selbuf, var, path_o, path_i,
+ plan_o->scan.plan.targetlist,
+ plan_i->scan.plan.targetlist);
+
+ tle = makeTargetEntry((Expr *) copyObject(var),
+ i + 1, pstrdup(""), false);
+ if (fdw_ps_tlist)
+ *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle));
+
+ i++;
+ }
+
+ /* Deparse column alias portion of subquery in FROM clause. */
+ initStringInfo(&abuf_o);
+ initStringInfo(&abuf_i);
+ for (i = 0; i < list_length(plan_o->scan.plan.targetlist); i++)
+ {
+ if (i > 0)
+ appendStringInfoString(&abuf_o, ", ");
+ appendStringInfo(&abuf_o, "a_%d", i);
+ }
+ for (i = 0; i < list_length(plan_i->scan.plan.targetlist); i++)
+ {
+ if (i > 0)
+ appendStringInfoString(&abuf_i, ", ");
+ appendStringInfo(&abuf_i, "a_%d", i);
+ }
+
+ /* Construct SELECT statement */
+ appendStringInfo(sql, "SELECT %s FROM", selbuf.data);
+ appendStringInfo(sql, " (%s) l (%s) %s JOIN (%s) r (%s) ",
+ sql_o, abuf_o.data, jointype_str, sql_i, abuf_i.data);
+ /* Append ON clause */
+ appendConditions(sql, root, path_o->parent, path_i->parent, plan_o, plan_i,
+ restrictlist, " ON ", NULL);
+}
+
+/*
* deparse remote INSERT statement
*
* The statement text is appended to buf, and we also create an integer List
@@ -1261,6 +1406,8 @@ deparseExpr(Expr *node, deparse_expr_cxt *context)
/*
* Deparse given Var node into context->buf.
*
+ * If context has valid innerrel, this is invoked for a join conditions.
+ *
* If the Var belongs to the foreign relation, just print its remote name.
* Otherwise, it's effectively a Param (and will in fact be a Param at
* run time). Handle it the same way we handle plain Params --- see
@@ -1271,39 +1418,50 @@ deparseVar(Var *node, deparse_expr_cxt *context)
{
StringInfo buf = context->buf;
- if (node->varno == context->foreignrel->relid &&
- node->varlevelsup == 0)
+ if (context->innerrel != NULL)
{
- /* Var belongs to foreign table */
- deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ deparseJoinVar(context->buf, node,
+ context->outerrel->cheapest_total_path,
+ context->innerrel->cheapest_total_path,
+ context->outerplan->scan.plan.targetlist,
+ context->innerplan->scan.plan.targetlist);
}
else
{
- /* Treat like a Param */
- if (context->params_list)
+ if (node->varno == context->outerrel->relid &&
+ node->varlevelsup == 0)
{
- int pindex = 0;
- ListCell *lc;
-
- /* find its index in params_list */
- foreach(lc, *context->params_list)
+ /* Var belongs to foreign table */
+ deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ }
+ else
+ {
+ /* Treat like a Param */
+ if (context->params_list)
{
- pindex++;
- if (equal(node, (Node *) lfirst(lc)))
- break;
+ int pindex = 0;
+ ListCell *lc;
+
+ /* find its index in params_list */
+ foreach(lc, *context->params_list)
+ {
+ pindex++;
+ if (equal(node, (Node *) lfirst(lc)))
+ break;
+ }
+ if (lc == NULL)
+ {
+ /* not in list, so add it */
+ pindex++;
+ *context->params_list = lappend(*context->params_list, node);
+ }
+
+ printRemoteParam(pindex, node->vartype, node->vartypmod, context);
}
- if (lc == NULL)
+ else
{
- /* not in list, so add it */
- pindex++;
- *context->params_list = lappend(*context->params_list, node);
+ printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
-
- printRemoteParam(pindex, node->vartype, node->vartypmod, context);
- }
- else
- {
- printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
}
}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 583cce7..92d9b1f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -656,16 +656,16 @@ SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
-- simple join
PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Nested Loop
Output: t1.c3, t2.c3
-> Foreign Scan on public.ft1 t1
Output: t1.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 1))
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 2))
(8 rows)
EXECUTE st1(1, 1);
@@ -683,8 +683,8 @@ EXECUTE st1(101, 101);
-- subquery using stable function (can't be sent to remote)
PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -699,7 +699,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
-> Foreign Scan on public.ft2 t2
Output: t2.c3
Filter: (date(t2.c4) = '01-17-1970'::date)
- Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10))
(15 rows)
EXECUTE st2(10, 20);
@@ -717,8 +717,8 @@ EXECUTE st2(101, 121);
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
- QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -732,7 +732,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
Output: t2.c3
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
(14 rows)
EXECUTE st3(10, 20);
@@ -1085,7 +1085,7 @@ INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
Output: ((ft2_1.c1 + 1000)), ((ft2_1.c2 + 100)), ((ft2_1.c3 || ft2_1.c3))
-> Foreign Scan on public.ft2 ft2_1
Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
- Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1"
+ Remote SQL: SELECT "C 1", c2, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1"
(9 rows)
INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
@@ -1219,7 +1219,7 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, NULL, c8, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -1231,14 +1231,14 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
EXPLAIN (verbose, costs off)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
- QUERY PLAN
-----------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------
Delete on public.ft2
Output: c1, c4
- Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", c4
+ Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", NULL, NULL, c4, NULL, NULL, NULL, NULL
-> Foreign Scan on public.ft2
Output: ctid
- Remote SQL: SELECT ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
(6 rows)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
@@ -1360,7 +1360,7 @@ DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.ctid, ft2.c2
- Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT NULL, c2, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -2594,12 +2594,12 @@ select c2, count(*) from "S 1"."T 1" where c2 < 500 group by 1 order by 1;
-- Consistent check constraints provide consistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2positive CHECK (c2 >= 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 < 0;
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 < 0;
@@ -2638,12 +2638,12 @@ ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c2positive;
-- But inconsistent check constraints provide inconsistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2negative CHECK (c2 < 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 >= 0;
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 >= 0;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..16df515 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -48,7 +48,8 @@ PG_MODULE_MAGIC;
/*
* FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table. This information is collected by postgresGetForeignRelSize.
+ * foreign table or foreign join. This information is collected by
+ * postgresGetForeignRelSize, or calculated from join source relations.
*/
typedef struct PgFdwRelationInfo
{
@@ -78,10 +79,30 @@ typedef struct PgFdwRelationInfo
ForeignTable *table;
ForeignServer *server;
UserMapping *user; /* only set in use_remote_estimate mode */
+ Oid checkAsUser;
} PgFdwRelationInfo;
/*
- * Indexes of FDW-private information stored in fdw_private lists.
+ * Indexes of FDW-private information stored in fdw_private of ForeignPath.
+ * We use fdw_private of a ForeignPath when the path represents a join that
+ * can be pushed down to the remote side.
+ *
+ * 1) Outer child path node
+ * 2) Inner child path node
+ * 3) Join type (as an Integer node)
+ * 4) RestrictInfo list of join conditions
+ */
+enum FdwPathPrivateIndex
+{
+ FdwPathPrivateOuterPath,
+ FdwPathPrivateInnerPath,
+ FdwPathPrivateJoinType,
+	FdwPathPrivateRestrictList
+};
+
+/*
+ * Indexes of FDW-private information stored in fdw_private of ForeignScan of
+ * a simple foreign table scan for a SELECT statement.
*
* We store various information in ForeignScan.fdw_private to pass it from
* planner to executor. Currently we store:
@@ -98,7 +119,11 @@ enum FdwScanPrivateIndex
/* SQL statement to execute remotely (as a String node) */
FdwScanPrivateSelectSql,
/* Integer list of attribute numbers retrieved by the SELECT */
- FdwScanPrivateRetrievedAttrs
+ FdwScanPrivateRetrievedAttrs,
+	/* OID of the foreign server for the scan (as an Integer node) */
+	FdwScanPrivateServerOid,
+	/* checkAsUser for the scan (as an Integer node) */
+	FdwScanPrivatecheckAsUser
};
/*
@@ -129,6 +154,7 @@ enum FdwModifyPrivateIndex
typedef struct PgFdwScanState
{
Relation rel; /* relcache entry for the foreign table */
+ TupleDesc tupdesc; /* tuple descriptor of the scan */
AttInMetadata *attinmeta; /* attribute datatype conversion metadata */
/* extracted fdw_private data */
@@ -288,6 +314,15 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
+static void postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+							List *restrictlist,
+ Relids extra_lateral_rels);
/*
* Helper functions
@@ -324,6 +359,7 @@ static void analyze_row_processor(PGresult *res, int row,
static HeapTuple make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context);
@@ -368,6 +404,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
/* Support functions for IMPORT FOREIGN SCHEMA */
routine->ImportForeignSchema = postgresImportForeignSchema;
+ /* Support functions for join push-down */
+ routine->GetForeignJoinPath = postgresGetForeignJoinPath;
+
PG_RETURN_POINTER(routine);
}
@@ -385,6 +424,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
{
PgFdwRelationInfo *fpinfo;
ListCell *lc;
+ RangeTblEntry *rte;
/*
* We use PgFdwRelationInfo to pass various information to subsequent
@@ -428,6 +468,13 @@ postgresGetForeignRelSize(PlannerInfo *root,
}
/*
+	 * Retrieve the RTE to obtain checkAsUser, which determines the user whose
+	 * user mapping is used for remote access.
+ */
+ rte = planner_rt_fetch(baserel->relid, root);
+ fpinfo->checkAsUser = rte->checkAsUser;
+
+ /*
* If the table or the server is configured to use remote estimates,
* identify which user to do remote access as during planning. This
* should match what ExecCheckRTEPerms() does. If we fail due to lack of
@@ -435,7 +482,6 @@ postgresGetForeignRelSize(PlannerInfo *root,
*/
if (fpinfo->use_remote_estimate)
{
- RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
Oid userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
fpinfo->user = GetUserMapping(userid, fpinfo->server->serverid);
@@ -596,7 +642,7 @@ postgresGetForeignPaths(PlannerInfo *root,
continue;
/* See if it is safe to send to remote */
- if (!is_foreign_expr(root, baserel, rinfo->clause))
+ if (!is_foreign_expr(root, baserel, NULL, rinfo->clause))
continue;
/* Calculate required outer rels for the resulting path */
@@ -672,7 +718,8 @@ postgresGetForeignPaths(PlannerInfo *root,
continue;
/* See if it is safe to send to remote */
- if (!is_foreign_expr(root, baserel, rinfo->clause))
+ if (!is_foreign_expr(root, baserel, NULL,
+ rinfo->clause))
continue;
/* Calculate required outer rels for the resulting path */
@@ -752,6 +799,8 @@ postgresGetForeignPlan(PlannerInfo *root,
List *retrieved_attrs;
StringInfoData sql;
ListCell *lc;
+ List *fdw_ps_tlist = NIL;
+ ForeignScan *scan;
/*
* Separate the scan_clauses into those that can be executed remotely and
@@ -769,7 +818,7 @@ postgresGetForeignPlan(PlannerInfo *root,
* This code must match "extract_actual_clauses(scan_clauses, false)"
* except for the additional decision about remote versus local execution.
* Note however that we only strip the RestrictInfo nodes from the
- * local_exprs list, since appendWhereClause expects a list of
+ * local_exprs list, since appendConditions expects a list of
* RestrictInfos.
*/
foreach(lc, scan_clauses)
@@ -786,7 +835,7 @@ postgresGetForeignPlan(PlannerInfo *root,
remote_conds = lappend(remote_conds, rinfo);
else if (list_member_ptr(fpinfo->local_conds, rinfo))
local_exprs = lappend(local_exprs, rinfo->clause);
- else if (is_foreign_expr(root, baserel, rinfo->clause))
+ else if (is_foreign_expr(root, baserel, NULL, rinfo->clause))
remote_conds = lappend(remote_conds, rinfo);
else
local_exprs = lappend(local_exprs, rinfo->clause);
@@ -797,64 +846,123 @@ postgresGetForeignPlan(PlannerInfo *root,
* expressions to be sent as parameters.
*/
initStringInfo(&sql);
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
- if (remote_conds)
- appendWhereClause(&sql, root, baserel, remote_conds,
-			true, &params_list);
-
- /*
- * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
- * initial row fetch, rather than later on as is done for local tables.
- * The extra roundtrips involved in trying to duplicate the local
- * semantics exactly don't seem worthwhile (see also comments for
- * RowMarkType).
- *
- * Note: because we actually run the query as a cursor, this assumes that
- * DECLARE CURSOR ... FOR UPDATE is supported, which it isn't before 8.3.
- */
- if (baserel->relid == root->parse->resultRelation &&
- (root->parse->commandType == CMD_UPDATE ||
- root->parse->commandType == CMD_DELETE))
+ if (scan_relid > 0)
{
- /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
- appendStringInfoString(&sql, " FOR UPDATE");
- }
- else
- {
- RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+ deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
+ &retrieved_attrs);
+ if (remote_conds)
+ appendConditions(&sql, root, baserel, NULL, NULL, NULL,
+				remote_conds, " WHERE ", &params_list);
- if (rc)
+ /*
+ * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
+ * initial row fetch, rather than later on as is done for local tables.
+ * The extra roundtrips involved in trying to duplicate the local
+ * semantics exactly don't seem worthwhile (see also comments for
+ * RowMarkType).
+ *
+ * Note: because we actually run the query as a cursor, this assumes
+ * that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
+ * before 8.3.
+ */
+ if (baserel->relid == root->parse->resultRelation &&
+ (root->parse->commandType == CMD_UPDATE ||
+ root->parse->commandType == CMD_DELETE))
{
- /*
- * Relation is specified as a FOR UPDATE/SHARE target, so handle
- * that.
- *
- * For now, just ignore any [NO] KEY specification, since (a) it's
- * not clear what that means for a remote table that we don't have
- * complete information about, and (b) it wouldn't work anyway on
- * older remote servers. Likewise, we don't worry about NOWAIT.
- */
- switch (rc->strength)
+ /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
+ appendStringInfoString(&sql, " FOR UPDATE");
+ }
+ else
+ {
+ RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+
+ if (rc)
{
- case LCS_FORKEYSHARE:
- case LCS_FORSHARE:
- appendStringInfoString(&sql, " FOR SHARE");
- break;
- case LCS_FORNOKEYUPDATE:
- case LCS_FORUPDATE:
- appendStringInfoString(&sql, " FOR UPDATE");
- break;
+ /*
+ * Relation is specified as a FOR UPDATE/SHARE target, so handle
+ * that.
+ *
+ * For now, just ignore any [NO] KEY specification, since (a)
+ * it's not clear what that means for a remote table that we
+ * don't have complete information about, and (b) it wouldn't
+ * work anyway on older remote servers. Likewise, we don't
+ * worry about NOWAIT.
+ */
+ switch (rc->strength)
+ {
+ case LCS_FORKEYSHARE:
+ case LCS_FORSHARE:
+ appendStringInfoString(&sql, " FOR SHARE");
+ break;
+ case LCS_FORNOKEYUPDATE:
+ case LCS_FORUPDATE:
+ appendStringInfoString(&sql, " FOR UPDATE");
+ break;
+ }
}
}
}
+ else
+ {
+ /* Join case */
+ Path *path_o;
+ Path *path_i;
+ const char *sql_o;
+ const char *sql_i;
+ ForeignScan *plan_o;
+ ForeignScan *plan_i;
+ JoinType jointype;
+ List *restrictlist;
+ int i;
+
+ /*
+		 * Retrieve information from fdw_private.
+ */
+ path_o = list_nth(best_path->fdw_private, FdwPathPrivateOuterPath);
+ path_i = list_nth(best_path->fdw_private, FdwPathPrivateInnerPath);
+ jointype = intVal(list_nth(best_path->fdw_private,
+ FdwPathPrivateJoinType));
+ restrictlist = list_nth(best_path->fdw_private,
+ FdwPathPrivateRestrictList);
+
+ /*
+		 * Construct the remote query from the bottom up.  The ForeignScan
+		 * plan nodes of the underlying scans are not needed to execute the
+		 * plan tree, but they make it easy to build the query recursively.
+ */
+ plan_o = (ForeignScan *) create_plan_recurse(root, path_o);
+ Assert(IsA(plan_o, ForeignScan));
+ sql_o = strVal(list_nth(plan_o->fdw_private, FdwScanPrivateSelectSql));
+
+ plan_i = (ForeignScan *) create_plan_recurse(root, path_i);
+ Assert(IsA(plan_i, ForeignScan));
+ sql_i = strVal(list_nth(plan_i->fdw_private, FdwScanPrivateSelectSql));
+
+ deparseJoinSql(&sql, root, baserel, path_o, path_i, plan_o, plan_i,
+ sql_o, sql_i, jointype, restrictlist, &fdw_ps_tlist);
+ retrieved_attrs = NIL;
+ for (i = 0; i < list_length(fdw_ps_tlist); i++)
+ retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
+ }
/*
* Build the fdw_private list that will be available to the executor.
* Items in the list must match enum FdwScanPrivateIndex, above.
*/
- fdw_private = list_make2(makeString(sql.data),
- retrieved_attrs);
+ fdw_private = list_make2(makeString(sql.data), retrieved_attrs);
+
+ /*
+	 * In the pseudo-scan case, such as join push-down, add the OID of the
+	 * server and checkAsUser as extra information.
+	 * XXX: passing serverid and checkAsUser in all cases, simple scans and
+	 * join push-down alike, might simplify the code.
+ */
+ if (scan_relid == 0)
+ {
+ fdw_private = lappend(fdw_private,
+ makeInteger(fpinfo->server->serverid));
+ fdw_private = lappend(fdw_private, makeInteger(fpinfo->checkAsUser));
+ }
/*
* Create the ForeignScan node from target list, local filtering
@@ -864,11 +972,18 @@ postgresGetForeignPlan(PlannerInfo *root,
* field of the finished plan node; we can't keep them in private state
* because then they wouldn't be subject to later planner processing.
*/
- return make_foreignscan(tlist,
+ scan = make_foreignscan(tlist,
local_exprs,
scan_relid,
params_list,
fdw_private);
+
+ /*
+ * set fdw_ps_tlist to handle tuples generated by this scan.
+ */
+ scan->fdw_ps_tlist = fdw_ps_tlist;
+
+ return scan;
}
/*
@@ -881,9 +996,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
PgFdwScanState *fsstate;
- RangeTblEntry *rte;
+ Oid serverid;
Oid userid;
- ForeignTable *table;
ForeignServer *server;
UserMapping *user;
int numParams;
@@ -903,22 +1017,51 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
node->fdw_state = (void *) fsstate;
/*
- * Identify which user to do the remote access as. This should match what
- * ExecCheckRTEPerms() does.
+ * Initialize fsstate.
+ *
+	 * The following values must be determined:
+	 * - fsstate->rel, NULL if there is no actual relation
+	 * - serverid, OID of the foreign server to use for the scan
+	 * - userid, the user whose user mapping is used
*/
- rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
- userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+ if (fsplan->scan.scanrelid > 0)
+ {
+ /* Simple foreign table scan */
+ RangeTblEntry *rte;
+ ForeignTable *table;
- /* Get info about foreign table. */
- fsstate->rel = node->ss.ss_currentRelation;
- table = GetForeignTable(RelationGetRelid(fsstate->rel));
- server = GetForeignServer(table->serverid);
- user = GetUserMapping(userid, server->serverid);
+ /*
+ * Identify which user to do the remote access as. This should match
+ * what ExecCheckRTEPerms() does.
+ */
+ rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+ userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+ /* Get info about foreign table. */
+ fsstate->rel = node->ss.ss_currentRelation;
+ table = GetForeignTable(RelationGetRelid(fsstate->rel));
+ serverid = table->serverid;
+ }
+ else
+ {
+ Oid checkAsUser;
+
+ /* Join */
+ fsstate->rel = NULL; /* No actual relation to scan */
+
+ serverid = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivateServerOid));
+ checkAsUser = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivatecheckAsUser));
+ userid = checkAsUser ? checkAsUser : GetUserId();
+ }
/*
* Get connection to the foreign server. Connection manager will
* establish new connection if necessary.
*/
+ server = GetForeignServer(serverid);
+ user = GetUserMapping(userid, server->serverid);
fsstate->conn = GetConnection(server, user, false);
/* Assign a unique ID for my cursor */
@@ -929,7 +1072,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
fsstate->query = strVal(list_nth(fsplan->fdw_private,
FdwScanPrivateSelectSql));
fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
- FdwScanPrivateRetrievedAttrs);
+ FdwScanPrivateRetrievedAttrs);
/* Create contexts for batches of tuples and per-tuple temp workspace. */
fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -944,7 +1087,11 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ALLOCSET_SMALL_MAXSIZE);
/* Get info we'll need for input data conversion. */
- fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+ if (fsplan->scan.scanrelid > 0)
+ fsstate->tupdesc = RelationGetDescr(fsstate->rel);
+ else
+ fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+ fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc);
/* Prepare for output conversion of parameters used in remote query. */
numParams = list_length(fsplan->fdw_exprs);
@@ -1747,11 +1894,13 @@ estimate_path_cost_size(PlannerInfo *root,
deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
&retrieved_attrs);
if (fpinfo->remote_conds)
- appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
- true, NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL, NULL,
+ fpinfo->remote_conds, " WHERE ", NULL);
if (remote_join_conds)
- appendWhereClause(&sql, root, baserel, remote_join_conds,
- (fpinfo->remote_conds == NIL), NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL, NULL,
+ remote_join_conds,
+ fpinfo->remote_conds == NIL ? " WHERE " : " AND ",
+ NULL);
/* Get the remote estimate */
conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -2052,6 +2201,7 @@ fetch_more_data(ForeignScanState *node)
fsstate->tuples[i] =
make_tuple_from_result_row(res, i,
fsstate->rel,
+ fsstate->tupdesc,
fsstate->attinmeta,
fsstate->retrieved_attrs,
fsstate->temp_cxt);
@@ -2270,6 +2420,7 @@ store_returning_result(PgFdwModifyState *fmstate,
newtup = make_tuple_from_result_row(res, 0,
fmstate->rel,
+ NULL,
fmstate->attinmeta,
fmstate->retrieved_attrs,
fmstate->temp_cxt);
@@ -2562,6 +2713,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
astate->rows[pos] = make_tuple_from_result_row(res, row,
astate->rel,
+ NULL,
astate->attinmeta,
astate->retrieved_attrs,
astate->temp_cxt);
@@ -2835,6 +2987,181 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
}
/*
+ * Construct PgFdwRelationInfo from two join sources
+ */
+static PgFdwRelationInfo *
+merge_fpinfo(PgFdwRelationInfo *fpinfo_o,
+ PgFdwRelationInfo *fpinfo_i,
+ JoinType jointype)
+{
+ PgFdwRelationInfo *fpinfo;
+
+ fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
+ fpinfo->remote_conds = list_concat(copyObject(fpinfo_o->remote_conds),
+ copyObject(fpinfo_i->remote_conds));
+ fpinfo->local_conds = list_concat(copyObject(fpinfo_o->local_conds),
+ copyObject(fpinfo_i->local_conds));
+
+ fpinfo->attrs_used = NULL; /* Use fdw_ps_tlist */
+ fpinfo->local_conds_cost.startup = fpinfo_o->local_conds_cost.startup +
+ fpinfo_i->local_conds_cost.startup;
+ fpinfo->local_conds_cost.per_tuple = fpinfo_o->local_conds_cost.per_tuple +
+ fpinfo_i->local_conds_cost.per_tuple;
+ fpinfo->local_conds_sel = fpinfo_o->local_conds_sel *
+ fpinfo_i->local_conds_sel;
+	/* XXX crude row estimate */
+	if (jointype == JOIN_INNER)
+		fpinfo->rows = Min(fpinfo_o->rows, fpinfo_i->rows);
+	else
+		fpinfo->rows = Max(fpinfo_o->rows, fpinfo_i->rows);
+ /* XXX we should consider only columns in fdw_ps_tlist */
+ fpinfo->width = fpinfo_o->width + fpinfo_i->width;
+ /* XXX we should estimate better costs */
+
+ fpinfo->use_remote_estimate = false; /* Never use in join case */
+ fpinfo->fdw_startup_cost = fpinfo_o->fdw_startup_cost;
+ fpinfo->fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
+
+ fpinfo->startup_cost = fpinfo->fdw_startup_cost;
+ fpinfo->total_cost =
+ fpinfo->startup_cost + fpinfo->fdw_tuple_cost * fpinfo->rows;
+
+ fpinfo->table = NULL; /* always NULL in join case */
+ fpinfo->server = fpinfo_o->server;
+ fpinfo->user = fpinfo_o->user ? fpinfo_o->user : fpinfo_i->user;
+	/* checkAsUser must be identical */
+ fpinfo->checkAsUser = fpinfo_o->checkAsUser;
+
+ return fpinfo;
+}
+
+/*
+ * postgresGetForeignJoinPath
+ * Add possible ForeignPath to joinrel.
+ *
+ * Joins satisfying the conditions below can be pushed down to the remote server.
+ *
+ * 1) Join type is inner or outer
+ * 2) Join conditions consist of remote-safe expressions.
+ * 3) Join source relations don't have any local filter.
+ */
+static void
+postgresGetForeignJoinPath(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels)
+{
+ ForeignPath *joinpath;
+ ForeignPath *path_o = (ForeignPath *) outerrel->cheapest_total_path;
+ ForeignPath *path_i = (ForeignPath *) innerrel->cheapest_total_path;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ PgFdwRelationInfo *fpinfo;
+ double rows;
+ Cost startup_cost;
+ Cost total_cost;
+ ListCell *lc;
+ List *fdw_private;
+
+	/* Both source paths must be ForeignPaths. */
+ if (!IsA(path_o, ForeignPath) || !IsA(path_i, ForeignPath))
+ return;
+
+ /*
+ * Skip considering reversed join combination.
+ */
+ if (outerrel->relid < innerrel->relid)
+ return;
+
+ /*
+	 * Both relations in the join must belong to the same server.
+ */
+ fpinfo_o = path_o->path.parent->fdw_private;
+ fpinfo_i = path_i->path.parent->fdw_private;
+ if (fpinfo_o->server->serverid != fpinfo_i->server->serverid)
+ return;
+
+ /*
+ * We support all outer joins in addition to inner join.
+ */
+ if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+ jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+ return;
+
+ /*
+	 * Note that CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+	 * with an empty restrictlist.  Pushing down a CROSS JOIN produces more rows
+	 * than retrieving each table separately, so we don't push down such joins.
+ */
+ if (jointype == JOIN_INNER && restrictlist == NIL)
+ return;
+
+ /*
+	 * Neither source relation can have local conditions.  This could be
+	 * relaxed for inner joins whose local conditions contain no volatile
+	 * functions or operators; we leave that as a future enhancement.
+ */
+ if (fpinfo_o->local_conds != NULL || fpinfo_i->local_conds != NULL)
+ return;
+
+ /*
+ * Join condition must be safe to push down.
+ */
+ foreach(lc, restrictlist)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+ if (!is_foreign_expr(root, joinrel, NULL, rinfo->clause))
+ return;
+ }
+
+ /*
+	 * checkAsUser of the source paths must match.
+ */
+ if (fpinfo_o->checkAsUser != fpinfo_i->checkAsUser)
+ return;
+
+ /* Here we know that this join can be pushed-down to remote side. */
+
+ /* Construct fpinfo for the join relation */
+ fpinfo = merge_fpinfo(fpinfo_o, fpinfo_i, jointype);
+ joinrel->fdw_private = fpinfo;
+
+ /* TODO determine cost and rows of the join. */
+ rows = fpinfo->rows;
+ startup_cost = fpinfo->startup_cost;
+ total_cost = fpinfo->total_cost;
+
+ fdw_private = list_make4(path_o,
+ path_i,
+ makeInteger(jointype),
+ restrictlist);
+
+ /*
+ * Create a new join path and add it to the joinrel which represents a join
+ * between foreign tables.
+ */
+ joinpath = create_foreignscan_path(root,
+ joinrel,
+ rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no required_outer */
+ fdw_private);
+
+ /* Add generated path into joinrel by add_path(). */
+ add_path(joinrel, (Path *) joinpath);
+
+ /* TODO consider parameterized paths */
+}
+
+/*
* Create a tuple from the specified row of the PGresult.
*
* rel is the local representation of the foreign table, attinmeta is
@@ -2846,12 +3173,12 @@ static HeapTuple
make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context)
{
HeapTuple tuple;
- TupleDesc tupdesc = RelationGetDescr(rel);
Datum *values;
bool *nulls;
ItemPointer ctid = NULL;
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 950c6f7..23a6b24 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -16,6 +16,7 @@
#include "foreign/foreign.h"
#include "lib/stringinfo.h"
#include "nodes/relation.h"
+#include "nodes/plannodes.h"
#include "utils/relcache.h"
#include "libpq-fe.h"
@@ -45,19 +46,35 @@ extern void classifyConditions(PlannerInfo *root,
List **remote_conds,
List **local_conds);
extern bool is_foreign_expr(PlannerInfo *root,
- RelOptInfo *baserel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
Expr *expr);
extern void deparseSelectSql(StringInfo buf,
PlannerInfo *root,
RelOptInfo *baserel,
Bitmapset *attrs_used,
List **retrieved_attrs);
-extern void appendWhereClause(StringInfo buf,
+extern void appendConditions(StringInfo buf,
PlannerInfo *root,
- RelOptInfo *baserel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ ForeignScan *outerplan,
+ ForeignScan *innerplan,
List *exprs,
- bool is_first,
+ const char *prefix,
List **params);
+extern void deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *restrictlist,
+ List **retrieved_attrs);
extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
List *targetAttrs, List *returningList,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9261e7f..7abc8e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3071,6 +3071,20 @@ include_dir 'conf.d'
</listitem>
</varlistentry>
+     <varlistentry id="guc-enable-foreignjoin" xreflabel="enable_foreignjoin">
+      <term><varname>enable_foreignjoin</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>enable_foreignjoin</> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of foreign-scan plan
+ types for joining foreign tables. The default is <literal>on</>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-hashagg" xreflabel="enable_hashagg">
<term><varname>enable_hashagg</varname> (<type>boolean</type>)
<indexterm>
@@ -7687,6 +7701,7 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
</entry>
<entry>
<literal>enable_bitmapscan = off</>,
+ <literal>enable_foreignjoin = off</>,
<literal>enable_hashjoin = off</>,
<literal>enable_indexscan = off</>,
<literal>enable_mergejoin = off</>,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 78ef229..b1d2683 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -117,6 +117,7 @@ bool enable_nestloop = true;
bool enable_material = true;
bool enable_mergejoin = true;
bool enable_hashjoin = true;
+bool enable_foreignjoin = true;
typedef struct
{
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 1d7b9fa..cf45c55 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -17,6 +17,7 @@
#include <math.h>
#include "executor/executor.h"
+#include "foreign/fdwapi.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -264,17 +265,33 @@ add_paths_to_joinrel(PlannerInfo *root,
param_source_rels, extra_lateral_rels);
/*
- * 5. Consider paths added by FDW drivers or custom-scan providers, in
- * addition to built-in paths.
- *
- * XXX - In case of FDW, we may be able to omit invocation if joinrel's
- * fdwhandler (set only if both relations are managed by same FDW server).
+ * 5. Consider paths added by custom-scan providers, in addition to
+ * built-in paths.
*/
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
restrictlist, jointype,
sjinfo, &semifactors,
param_source_rels, extra_lateral_rels);
+
+ /*
+ * 6. Consider paths added by FDWs when both outer and inner relations are
+ * managed by the same foreign-data wrapper. The FDW must check in
+ * GetForeignJoinPath that the foreign server and/or checkAsUser match.
+ */
+ if (enable_foreignjoin &&
+ joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPath)
+ {
+ joinrel->fdwroutine->GetForeignJoinPath(root,
+ joinrel,
+ outerrel,
+ innerrel,
+ jointype,
+ sjinfo,
+ &semifactors,
+ restrictlist,
+ extra_lateral_rels);
+ }
}
/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 33720e8..9014a72 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3236,6 +3236,9 @@ set_plan_disabling_options(const char *arg, GucContext context, GucSource source
case 'h': /* hashjoin */
tmp = "enable_hashjoin";
break;
+ case 'f': /* foreignjoin */
+ tmp = "enable_foreignjoin";
+ break;
}
if (tmp)
{
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index d84dba7..7c2343f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -875,6 +875,15 @@ static struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
{
+ {"enable_foreignjoin", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of foreign scan plans for foreign joins."),
+ NULL
+ },
+ &enable_foreignjoin,
+ true,
+ NULL, NULL, NULL
+ },
+ {
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
gettext_noop("This algorithm attempts to do planning without "
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index f8f9ce1..4fc4455 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -272,6 +272,7 @@
# - Planner Method Configuration -
#enable_bitmapscan = on
+#enable_foreignjoin = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index b494ff2..d4ab71a 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -82,6 +82,16 @@ typedef void (*EndForeignModify_function) (EState *estate,
typedef int (*IsForeignRelUpdatable_function) (Relation rel);
+typedef void (*GetForeignJoinPath_function) (PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
+
typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
struct ExplainState *es);
@@ -150,6 +160,10 @@ typedef struct FdwRoutine
/* Support functions for IMPORT FOREIGN SCHEMA */
ImportForeignSchema_function ImportForeignSchema;
+
+ /* Support functions for join push-down */
+ GetForeignJoinPath_function GetForeignJoinPath;
+
} FdwRoutine;
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 9c2000b..e41c88a 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -61,6 +61,7 @@ extern bool enable_nestloop;
extern bool enable_material;
extern bool enable_mergejoin;
extern bool enable_hashjoin;
+extern bool enable_foreignjoin;
extern int constraint_exclusion;
extern double clamp_row_est(double nrows);
diff --git a/src/test/regress/expected/rangefuncs.out b/src/test/regress/expected/rangefuncs.out
index 7991e99..4e02062 100644
--- a/src/test/regress/expected/rangefuncs.out
+++ b/src/test/regress/expected/rangefuncs.out
@@ -2,6 +2,7 @@ SELECT name, setting FROM pg_settings WHERE name LIKE 'enable%';
name | setting
----------------------+---------
enable_bitmapscan | on
+ enable_foreignjoin | on
enable_hashagg | on
enable_hashjoin | on
enable_indexonlyscan | on
@@ -12,7 +13,7 @@ SELECT name, setting FROM pg_settings WHERE name LIKE 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(11 rows)
+(12 rows)
CREATE TABLE foo2(fooid int, f2 int);
INSERT INTO foo2 VALUES(1, 11);
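The core-side change in joinpath.c amounts to a guarded callback: the FDW is asked for join paths only when enable_foreignjoin is on, the joinrel carries an fdwroutine (per the patch comment, both sides are managed by the same foreign-data wrapper), and that routine implements GetForeignJoinPath. The standalone C model below is only a sketch of that control flow. `MockFdwRoutine`, `MockJoinRel`, `consider_foreign_join` and the two-integer callback signature are hypothetical simplifications, not PostgreSQL's real `FdwRoutine`, `RelOptInfo`, or the callback declared in fdwapi.h. As the patch comment says, matching the foreign server is left to the FDW, which the example callback imitates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified callback; the real one takes PlannerInfo, RelOptInfos, etc. */
typedef void (*MockGetForeignJoinPath) (int outer_server, int inner_server,
                                        int *npaths);

/* Hypothetical stand-in for FdwRoutine: only the join callback is modeled. */
typedef struct MockFdwRoutine
{
    MockGetForeignJoinPath GetForeignJoinPath;  /* may be NULL */
} MockFdwRoutine;

/* Hypothetical stand-in for a join RelOptInfo. */
typedef struct MockJoinRel
{
    const MockFdwRoutine *fdwroutine;   /* NULL unless both sides share one FDW */
    int         npaths;                 /* paths added to this joinrel so far */
} MockJoinRel;

/* Mirrors the enable_foreignjoin GUC added by the patch. */
static bool enable_foreignjoin = true;

/* Example FDW callback: the FDW itself verifies that both sides use one server. */
static void
example_get_foreign_join_path(int outer_server, int inner_server, int *npaths)
{
    if (outer_server == inner_server)
        (*npaths)++;            /* pretend a ForeignPath was added */
}

/* Step 6 of add_paths_to_joinrel(), modeled: ask the FDW only when allowed. */
static void
consider_foreign_join(MockJoinRel *joinrel, int outer_server, int inner_server)
{
    assert(joinrel != NULL);
    if (enable_foreignjoin &&
        joinrel->fdwroutine != NULL &&
        joinrel->fdwroutine->GetForeignJoinPath != NULL)
        joinrel->fdwroutine->GetForeignJoinPath(outer_server, inner_server,
                                                &joinrel->npaths);
}
```

Checking the GUC and both pointers before the call means FDWs that do not implement join support are never invoked, and the GUC can switch the whole feature off without touching any FDW.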
On 2 March 2015 at 12:48, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
Attached is the revised/rebased version of the $SUBJECT.
This patch is based on Kaigai-san's custom/foreign join patch, so
please apply that patch first. In this version I changed some
points from the original postgres_fdw.
1) Disabled SELECT clause optimization
Up to 9.4, postgres_fdw lists only the columns actually used in the SELECT clause,
but AFAICS it makes SQL generation complex. So I disabled this
optimization and put "NULL" for unnecessary columns in the SELECT clause
of the remote query.
2) Extended deparse context
To allow deparsing based on multiple source relations, I added some
members to the context structure. They are unnecessary for a simple query
with a single foreign table, but IMO they should be integrated.
With Kaigai-san's advice, the changes for supporting foreign joins in
postgres_fdw are confined to postgres_fdw itself. But I added a new
FDW API named GetForeignJoinPaths() to keep the policy that every
interface between core and FDW should be in FdwRoutine, instead of
using a hook function. Now I'm writing documentation about it, and will post
it in a day.
I seem to be getting a problem with whole-row references:
# SELECT p.name, c.country, e.pet_name, p FROM pets e INNER JOIN people p
on e.person_id = p.id inner join countries c on p.country_id = c.id;
ERROR: table "r" has 3 columns available but 4 columns specified
CONTEXT: Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM
(SELECT id, country FROM public.countries) l (a_0, a_1) INNER JOIN (SELECT
id, name, country_id FROM public.people) r (a_0, a_1, a_2, a_3) ON ((r.a_3
= l.a_0))
And the error message could be somewhat confusing. This mentions table
"r", but there's no such table or alias in my actual query.
Another issue:
# EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN
countries ON people.country_id = countries.id LIMIT 3) x;
ERROR: could not open relation with OID 0
--
Thom
I seem to be getting a problem with whole-row references:
# SELECT p.name, c.country, e.pet_name, p FROM pets e INNER JOIN people p on
e.person_id = p.id inner join countries c on p.country_id = c.id;
ERROR: table "r" has 3 columns available but 4 columns specified
CONTEXT: Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM (SELECT id,
country FROM public.countries) l (a_0, a_1) INNER JOIN (SELECT id, name,
country_id FROM public.people) r (a_0, a_1, a_2, a_3) ON ((r.a_3 = l.a_0))
In this case, the 4th target-entry should be "l", not l.a_1.
And the error message could be somewhat confusing. This mentions table "r", but
there's no such table or alias in my actual query.
However, do we have a mechanical/simple way to distinguish the cases where
we need a relation alias from the cases where we don't?
As in self-join cases, we have to construct a remote query even if the same
table is referenced multiple times in a query. Do you have a good idea?
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: thombrown@gmail.com [mailto:thombrown@gmail.com] On Behalf Of Thom Brown
Sent: Monday, March 02, 2015 10:51 PM
To: Shigeru Hanada
Cc: Kaigai Kouhei(海外 浩平); Robert Haas; PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tables
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2 March 2015 at 14:07, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
I seem to be getting a problem with whole-row references:
# SELECT p.name, c.country, e.pet_name, p FROM pets e INNER JOIN people p on
e.person_id = p.id inner join countries c on p.country_id = c.id;
ERROR: table "r" has 3 columns available but 4 columns specified
CONTEXT: Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM (SELECT id,
country FROM public.countries) l (a_0, a_1) INNER JOIN (SELECT id, name,
country_id FROM public.people) r (a_0, a_1, a_2, a_3) ON ((r.a_3 = l.a_0))
In this case, the 4th target-entry should be "l", not l.a_1.
This will no doubt be my naivety talking, but if we know we need the whole
row, can we not request the row without additionally requesting individual
columns?
And the error message could be somewhat confusing. This mentions table
"r", but there's no such table or alias in my actual query.
However, do we have a mechanical/simple way to distinguish the cases where
we need a relation alias from the cases where we don't?
As in self-join cases, we have to construct a remote query even if the same
table is referenced multiple times in a query. Do you have a good idea?
Then perhaps all that's really needed here is to clarify that the error
pertains to the remote execution plan rather than the query crafted by the
user. Or maybe I'm nitpicking.
--
Thom
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Thom Brown
Sent: Monday, March 02, 2015 11:36 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Shigeru Hanada; Robert Haas; PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tables
On 2 March 2015 at 14:07, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
I seem to be getting a problem with whole-row references:
# SELECT p.name, c.country, e.pet_name, p FROM pets e INNER JOIN people p on
e.person_id = p.id inner join countries c on p.country_id = c.id;
ERROR: table "r" has 3 columns available but 4 columns specified
CONTEXT: Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM (SELECT id,
country FROM public.countries) l (a_0, a_1) INNER JOIN (SELECT id, name,
country_id FROM public.people) r (a_0, a_1, a_2, a_3) ON ((r.a_3 = l.a_0))
In this case, the 4th target-entry should be "l", not l.a_1.
This will no doubt be my naivety talking, but if we know we need the whole row,
can we not request the row without additionally requesting individual columns?
I doubt that constructing the whole-row reference locally is more efficient
than a redundant copy. In my experience, the cost of tuple deforming/projection
is not negligible. Even if a redundant column is copied over the network,
the local CPU can skip reshaping the remote query results to fit local
expectations. (NOTE: the FDW driver is responsible for returning a tuple that
matches tts_tupleDescriptor, so we don't need to transform it if the target
list of the remote query is identical.)
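The condition in the NOTE above (no transformation when the remote target list already matches the slot's tuple descriptor) can be pictured as a plain comparison. This is a hypothetical sketch, not postgres_fdw code: `remote_row_needs_projection` is a made-up name, and the int arrays are assumed stand-ins for the column type OIDs of the remote result and of the slot's descriptor.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check: must a remote result row be reshaped before it is
 * stored in the scan slot?  It need not be if the remote query returns
 * exactly the columns the slot's descriptor expects, in the same order.
 * The int arrays stand in for column type OIDs.
 */
static bool
remote_row_needs_projection(const int *remote_types, int nremote,
                            const int *slot_types, int nslot)
{
    int         i;

    assert(nremote >= 0 && nslot >= 0);
    if (nremote != nslot)
        return true;            /* column count differs: must reshape */
    for (i = 0; i < nslot; i++)
        if (remote_types[i] != slot_types[i])
            return true;        /* type mismatch at position i */
    return false;               /* identical layout: store row as-is */
}
```

Under this reading, fetching a redundant column over the network is what keeps the layouts identical, so the per-row comparison above is done once at plan time in practice rather than per tuple.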
And the error message could be somewhat confusing. This mentions table
"r", but there's no such table or alias in my actual query.
However, do we have a mechanical/simple way to distinguish the cases where
we need a relation alias from the cases where we don't?
As in self-join cases, we have to construct a remote query even if the same
table is referenced multiple times in a query. Do you have a good idea?
Then perhaps all that's really needed here is to clarify that the error pertains
to the remote execution plan rather than the query crafted by the user. Or maybe
I'm nitpicking.
I understand your concern about this version. However, it is a cosmetic
issue that we can fix later; what we should focus on at this moment
is validating the design concept: running a foreign/custom scan instead of a
built-in join node, which then behaves like a usual scan on a materialized relation.
So it is not a good idea to have Hanada-san improve this _at this moment_.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Thanks for reviewing my patch.
2015-03-02 22:50 GMT+09:00 Thom Brown <thom@linux.com>:
I seem to be getting a problem with whole-row references:
# SELECT p.name, c.country, e.pet_name, p FROM pets e INNER JOIN people p on
e.person_id = p.id inner join countries c on p.country_id = c.id;
ERROR: table "r" has 3 columns available but 4 columns specified
CONTEXT: Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM (SELECT
id, country FROM public.countries) l (a_0, a_1) INNER JOIN (SELECT id, name,
country_id FROM public.people) r (a_0, a_1, a_2, a_3) ON ((r.a_3 = l.a_0))
And the error message could be somewhat confusing. This mentions table "r",
but there's no such table or alias in my actual query.
Your concern is not limited to the feature I'm proposing,
because the fdw_options for object names also introduce such a mismatch
between the query the user constructed and the one postgres_fdw
constructs for remote execution. Currently we emit a CONTEXT line for
that purpose, but it might be hard to understand what happened from the
error messages alone. So I'd like to add a section about query debugging
to the postgres_fdw documentation.
Another issue:
# EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN
countries ON people.country_id = countries.id LIMIT 3) x;
ERROR: could not open relation with OID 0
Good catch. In my quick trial, removing LIMIT 3 avoids this error.
I'll check it right now.
--
Shigeru HANADA
2015-03-02 23:07 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
I seem to be getting a problem with whole-row references:
# SELECT p.name, c.country, e.pet_name, p FROM pets e INNER JOIN people p on
e.person_id = p.id inner join countries c on p.country_id = c.id;
ERROR: table "r" has 3 columns available but 4 columns specified
CONTEXT: Remote SQL command: SELECT r.a_0, r.a_1, r.a_2, l.a_1 FROM (SELECT id,
country FROM public.countries) l (a_0, a_1) INNER JOIN (SELECT id, name,
country_id FROM public.people) r (a_0, a_1, a_2, a_3) ON ((r.a_3 = l.a_0))
In this case, the 4th target-entry should be "l", not l.a_1.
Actually, I have fixed that part.
And the error message could be somewhat confusing. This mentions table "r", but
there's no such table or alias in my actual query.
However, do we have a mechanical/simple way to distinguish the cases where
we need a relation alias from the cases where we don't?
As in self-join cases, we have to construct a remote query even if the same
table is referenced multiple times in a query. Do you have a good idea?
I'd vote for keeping the current aliasing style: "l" and "r"
for the join source relations, and a_0, a_1, ... for their
columns.
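The aliasing style voted for above ("l"/"r" for the join sources, a_0, a_1, ... for their columns) can be generated mechanically. The error Thom reported comes from the alias list naming four columns (a_0 to a_3) while the subquery selects only three. The sketch below derives both lists from the same column array, so their lengths always agree. It is a hypothetical plain-C illustration, not postgres_fdw's deparser; `appendf` and `deparse_join_source` are made-up names, and the deparse functions in the patch build SQL in a StringInfo instead.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append formatted text to a NUL-terminated buffer. */
static void
appendf(char *buf, size_t bufsize, const char *fmt, ...)
{
    size_t      used = strlen(buf);
    va_list     ap;

    if (used + 1 >= bufsize)
        return;                 /* buffer full: silently truncate */
    va_start(ap, fmt);
    vsnprintf(buf + used, bufsize - used, fmt, ap);
    va_end(ap);
}

/*
 * Hypothetical: append "(SELECT c0, c1, ... FROM relname) alias (a_0, a_1, ...)".
 * Both lists are driven by the same ncols, so the alias list can never
 * name more columns than the subquery returns.
 */
static void
deparse_join_source(char *buf, size_t bufsize, const char *relname,
                    const char *const *cols, int ncols, const char *alias)
{
    int         i;

    assert(ncols > 0);
    appendf(buf, bufsize, "(SELECT ");
    for (i = 0; i < ncols; i++)
        appendf(buf, bufsize, "%s%s", i > 0 ? ", " : "", cols[i]);
    appendf(buf, bufsize, " FROM %s) %s (", relname, alias);
    for (i = 0; i < ncols; i++)
        appendf(buf, bufsize, "%sa_%d", i > 0 ? ", " : "", i);
    appendf(buf, bufsize, ")");
}
```

For example, the people table with columns id, name, country_id and alias "r" yields `(SELECT id, name, country_id FROM public.people) r (a_0, a_1, a_2)`.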
--
Shigeru HANADA
Hanada-san,
I checked the patch; below are some random comments from my side.
* Context variables
-------------------
Sorry, I may have given you a wrong suggestion.
The foreign_glob_cxt and deparse_expr_cxt were re-defined as follows:
typedef struct foreign_glob_cxt
{
PlannerInfo *root; /* global planner state */
- RelOptInfo *foreignrel; /* the foreign relation we are planning
+ RelOptInfo *outerrel; /* the foreign relation, or outer child
+ RelOptInfo *innerrel; /* inner child, only set for join */
} foreign_glob_cxt;
/*
@@ -86,9 +89,12 @@ typedef struct foreign_loc_cxt
typedef struct deparse_expr_cxt
{
PlannerInfo *root; /* global planner state */
- RelOptInfo *foreignrel; /* the foreign relation we are planning
+ RelOptInfo *outerrel; /* the foreign relation, or outer child
+ RelOptInfo *innerrel; /* inner child, only set for join */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params
+ ForeignScan *outerplan; /* outer child's ForeignScan node */
+ ForeignScan *innerplan; /* inner child's ForeignScan node */
} deparse_expr_cxt;
However, outerrel does not need to have a double meaning:
RelOptInfo->reloptkind tells us whether the target
relation is a base relation or a join relation.
So, foreign_expr_walker() can be implemented as follows:
if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
var->varlevelsup == 0)
{
:
Also, deparseVar() can check the relation type using:
if (context->foreignrel->reloptkind == RELOPT_JOINREL)
{
deparseJoinVar(...);
In addition, what we need in deparse_expr_cxt are the target lists
of the outer and inner relations.
How about adding inner_tlist/outer_tlist instead of innerplan and
outerplan in deparse_expr_cxt?
deparseJoinVar() references these fields, but only the target lists.
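The suggestion above (resolve a join-level Var through the outer and inner target lists rather than the child ForeignScan nodes) can be sketched as a positional lookup. This assumes that "l" names the outer source and "r" the inner one, and that the position of an entry in each target list maps to a_0, a_1, .... `TlistEntry`, `tlist_position` and `deparse_join_var` are hypothetical simplifications for illustration, not the patch's deparseJoinVar().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified target-list entry: source relation and column. */
typedef struct TlistEntry
{
    int         varno;          /* range-table index of the base relation */
    int         varattno;       /* column number within that relation */
} TlistEntry;

/* Return the position of (varno, varattno) in tlist, or -1 if absent. */
static int
tlist_position(const TlistEntry *tlist, int len, int varno, int varattno)
{
    int         i;

    for (i = 0; i < len; i++)
        if (tlist[i].varno == varno && tlist[i].varattno == varattno)
            return i;
    return -1;
}

/*
 * Hypothetical: write "l.a_N" or "r.a_N" for a Var of the join relation.
 * Assumes "l" is the outer side and "r" the inner side.  Returns -1 if
 * neither side produces the Var.
 */
static int
deparse_join_var(char *out, size_t outsize, int varno, int varattno,
                 const TlistEntry *outer_tlist, int nouter,
                 const TlistEntry *inner_tlist, int ninner)
{
    int         pos;

    assert(out != NULL && outsize > 0);
    if ((pos = tlist_position(outer_tlist, nouter, varno, varattno)) >= 0)
        snprintf(out, outsize, "l.a_%d", pos);
    else if ((pos = tlist_position(inner_tlist, ninner, varno, varattno)) >= 0)
        snprintf(out, outsize, "r.a_%d", pos);
    else
        return -1;
    return 0;
}
```

Only the target lists are consulted, which matches the point that deparseJoinVar() needs nothing else from the child plans.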
* GetForeignJoinPath method of FDW
----------------------------------
It should be part of the interface patch, so I added this
enhancement of the FDW API there, with documentation updates.
Please see the newer version of foreign/custom-join interface patch.
* enable_foreignjoin parameter
------------------------------
I'm uncertain whether we should have a GUC parameter that affects
all FDW implementations. Rather than a built-in one, I'd prefer
an individual GUC variable defined by postgres_fdw with
DefineCustomBoolVariable().
Pros: It offers user more flexible configuration.
Cons: Each FDW has to implement this GUC by itself?
* syntax violated query if empty targetlist
-------------------------------------------
At least a NULL should be injected if no columns are referenced.
Also, a dummy entry should be added to fdw_ps_tlist to fit the slot's tuple descriptor.
postgres=# explain verbose select NULL from ft1,ft2 where aid=bid;
QUERY PLAN
---------------------------------------------------------------------------
Foreign Scan (cost=100.00..129.25 rows=2925 width=0)
Output: NULL::unknown
Remote SQL: SELECT FROM (SELECT bid, NULL FROM public.t2) l (a_0, a_1) INNER
JOIN (SELECT aid, NULL FROM public.t1) r (a_0, a_1) ON ((r.a_0 = l.a_0))
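Combining the NULL placeholders from the revised patch (unneeded columns are sent as NULL, as in `SELECT bid, NULL FROM public.t2` above) with the fix requested here (emit at least one NULL when nothing is referenced) gives a small rule. The hypothetical `deparse_select_list` below is a plain-C sketch of that rule under those assumptions, not the patch's code; the matching dummy entry in fdw_ps_tlist is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: build a SELECT list.  Needed columns are emitted by name,
 * unneeded ones as NULL so that column positions (and thus the a_N aliases)
 * stay stable.  An empty list becomes a single NULL, so the remote query
 * never reads "SELECT FROM ...".
 */
static void
deparse_select_list(char *buf, size_t bufsize, const char *const *cols,
                    const bool *needed, int ncols)
{
    size_t      used = 0;
    int         i;

    assert(bufsize > 0);
    buf[0] = '\0';
    if (ncols == 0)
    {
        snprintf(buf, bufsize, "NULL");     /* dummy column */
        return;
    }
    for (i = 0; i < ncols && used < bufsize; i++)
    {
        int         n = snprintf(buf + used, bufsize - used, "%s%s",
                                 i > 0 ? ", " : "",
                                 needed[i] ? cols[i] : "NULL");

        if (n < 0)
            break;              /* encoding error: stop appending */
        used += (size_t) n;
    }
}
```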
* Bug reported by Thom Brown
-----------------------------
# EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN countries ON people.country_id = countries.id LIMIT 3) x;
ERROR: could not open relation with OID 0
Sorry, this was a problem in my part of the patch. The patched setrefs.c
checks fdw_/custom_ps_tlist to determine whether a Foreign/CustomScan
node is associated with a particular base relation; if *_ps_tlist is
valid, it also expects scanrelid == 0.
However, that is the wrong thing to check: *_ps_tlist can be
empty when the remote join expects no columns.
So, I adjusted the condition to check scanrelid instead.
* make_tuple_from_result_row() call
------------------------------------
The (newly added) 4th argument is dereferenced without a NULL check;
however, store_returning_result() and analyze_row_processor() pass
NULL for this argument, which leads to a segmentation fault.
The callers should pass RelationGetDescr(fmstate->rel) or
RelationGetDescr(astate->rel), respectively.
* regression test crash
------------------------
The query below crashes:
UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
According to the crash dump, tidin() got a cstring input in an
unexpected format. I guess a column number is assigned incorrectly,
but I have no clear scenario at this moment.
#0 0x00007f9b45a11513 in conversion_error_callback (arg=0x7fffc257ecc0)
at postgres_fdw.c:3293
#1 0x00000000008e51d6 in errfinish (dummy=0) at elog.c:436
#2 0x00000000008935cd in tidin (fcinfo=0x7fffc257e8a0) at tid.c:69
/home/kaigai/repo/sepgsql/src/backend/utils/adt/tid.c:69:1734:beg:0x8935cd
(gdb) p str
$6 = 0x1d17cf7 "foo"
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: Shigeru Hanada [mailto:shigeru.hanada@gmail.com]
Sent: Monday, March 02, 2015 9:48 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Robert Haas; PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tables
2015-02-19 16:19 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
2015-02-17 10:39 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Let me put some comments in addition to where you're checking now.
[design issues]
* Cost estimation
Estimating and evaluating the cost of a remote join query is not
straightforward. In principle, the local side cannot determine the cost
of running a remote join without a remote EXPLAIN, because it has no
information about the join logic applied on the remote side.
Probably, we have to make an assumption about the remote join algorithm,
because the local planner has no idea about the remote planner's choice
unless the foreign join uses "use_remote_estimate".
I think it is a reasonable assumption (even if it is incorrect) to
calculate the remote join cost based on the local hash-join algorithm.
If the user wants a more accurate estimate, remote EXPLAIN will give
a more reliable cost estimation.
Hm, I guess that you chose hash-join as the "least-costed join". In the
pgbench model, most combinations of two tables produce a hash join
as the cheapest path. Remote EXPLAIN is very expensive in the context of
planning, so it would easily make the plan optimization meaningless.
But giving users an option is good, I agree.
It also needs a consensus on whether the cost of remote CPU execution is
equivalent to local CPU. If we consider local CPU a scarcer resource
than remote CPU, a discount rate would make the planner more likely
to choose a remote join than a local one.
Something like cpu_cost_ratio as a new server-level FDW option?
Once we assume a join algorithm for the remote join and a unit cost for
remote CPU, we can calculate the cost of a foreign join from the
local join logic plus the cost of network transfer (maybe
fdw_tuple_cost?).
Yes, the sum of these costs is the total cost of a remote join.
o fdw_startup_cost
o hash-join cost, estimated as a local join
o fdw_tuple_cost * rows * width
* FDW options
Unlike a table scan, it is unclear which FDW options we should refer to.
Table-level FDW options are tied to an individual foreign table.
I think we have two options here:
1. Foreign join refers to the FDW options of the foreign server, and
the ones for foreign tables are ignored.
2. Foreign join is prohibited unless both relations have
identical FDW options.
My preference is 2. Even for an N-way foreign join, it ensures that
all the tables involved in the (N-1)-way foreign join have identical
FDW options, so we can build the N-way foreign join with
all-identical FDW options.
One exception is the "updatable" flag of postgres_fdw. It does not
make sense for a remote join, so I think mixing updatable and
non-updatable foreign tables should be admitted; however, that is
a decision for the FDW driver.
Probably, the above points will take time to reach consensus.
I'd like to see your opinion before editing your patch.
postgres_fdw can't push down a join which contains foreign tables on
multiple servers, so use_remote_estimate and fdw_startup_cost are the
only FDW options to consider. So we have choices for each option.
1-a. If all foreign tables in the join have identical
use_remote_estimate, allow pushing down.
1-b. If any foreign table in the join has use_remote_estimate
set to true, use remote estimate.
2-a. If all foreign tables in the join have identical fdw_startup_cost,
allow pushing down.
2-b. Always use the max value in the join. (cost would be more expensive)
2-c. Always use the min value in the join. (cost would be cheaper)
I prefer 1-a and 2-b, so more joins avoid remote EXPLAIN but have
a reasonable startup cost.
I agree about the "updatable" option.
[implementation issues]
The interface does not intend to add a new Path/Plan type for each scan
that replaces foreign joins. What postgres_fdw should do is add a
ForeignPath to a particular joinrel, then populate the ForeignScan
with the remote join query once it gets chosen by the planner.
That idea is interesting, and makes many things simpler. Please let me consider it.
A few functions added in src/backend/foreign/foreign.c are not
called from anywhere at this moment.
create_plan_recurse() is reverted to static. It is needed for the custom-join
enhancement, unless some other infrastructure can support it.
I made it back to static because I thought that create_plan_recurse
can be called by core before giving control to FDWs. But I'm not sure
it can be applied to custom scans. I'll recheck that part.
--
Shigeru HANADA
--
Shigeru HANADA
Thanks for the detailed comments.
2015-03-03 18:01 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Hanada-san,
I checked the patch; below are some random comments from my side.
* Context variables
-------------------
Sorry, I may have given you a wrong suggestion.
The foreign_glob_cxt and deparse_expr_cxt were re-defined as follows:
typedef struct foreign_glob_cxt
{
PlannerInfo *root; /* global planner state */
- RelOptInfo *foreignrel; /* the foreign relation we are planning
+ RelOptInfo *outerrel; /* the foreign relation, or outer child
+ RelOptInfo *innerrel; /* inner child, only set for join */
} foreign_glob_cxt;
/*
@@ -86,9 +89,12 @@ typedef struct foreign_loc_cxt
typedef struct deparse_expr_cxt
{
PlannerInfo *root; /* global planner state */
- RelOptInfo *foreignrel; /* the foreign relation we are planning
+ RelOptInfo *outerrel; /* the foreign relation, or outer child
+ RelOptInfo *innerrel; /* inner child, only set for join */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params
+ ForeignScan *outerplan; /* outer child's ForeignScan node */
+ ForeignScan *innerplan; /* inner child's ForeignScan node */
} deparse_expr_cxt;
However, outerrel does not need to have a double meaning:
RelOptInfo->reloptkind tells us whether the target
relation is a base relation or a join relation.
So, foreign_expr_walker() can be implemented as follows:
if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
var->varlevelsup == 0)
{
:
Also, deparseVar() can check the relation type using:
if (context->foreignrel->reloptkind == RELOPT_JOINREL)
{
deparseJoinVar(...);
In addition, what we need in deparse_expr_cxt are the target lists
of the outer and inner relations.
How about adding inner_tlist/outer_tlist instead of innerplan and
outerplan in deparse_expr_cxt?
deparseJoinVar() references these fields, but only the target lists.
Ah, I had totally misunderstood your suggestion. I have now reverted my
changes and use the target lists to determine which of the relations
the var came from.
* GetForeignJoinPath method of FDW
----------------------------------
It should be part of the interface patch, so I added this
enhancement of the FDW API there, with documentation updates.
Please see the newer version of foreign/custom-join interface patch.
Agreed.
* enable_foreignjoin parameter
------------------------------
I'm uncertain whether we should have a GUC parameter that affects
all FDW implementations. Rather than a built-in one, I'd prefer
an individual GUC variable defined by postgres_fdw with
DefineCustomBoolVariable().
Pros: It offers user more flexible configuration.
Cons: Each FDW has to implement this GUC by itself?
Hum...
In a sense, I added this GUC parameter for debugging purposes. As you
pointed out, users might want to control the join push-down feature
per FDW. I'd like to hear others' opinions.
* syntax violated query if empty targetlist
-------------------------------------------
At least a NULL should be injected if no columns are referenced.
Also, a dummy entry should be added to fdw_ps_tlist to fit the slot's tuple descriptor.
postgres=# explain verbose select NULL from ft1,ft2 where aid=bid;
QUERY PLAN
---------------------------------------------------------------------------
Foreign Scan (cost=100.00..129.25 rows=2925 width=0)
Output: NULL::unknown
Remote SQL: SELECT FROM (SELECT bid, NULL FROM public.t2) l (a_0, a_1) INNER
JOIN (SELECT aid, NULL FROM public.t1) r (a_0, a_1) ON ((r.a_0 = l.a_0))
Fixed.
* Bug reported by Thom Brown
-----------------------------
# EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN countries ON people.country_id = countries.id LIMIT 3) x;
ERROR: could not open relation with OID 0
Sorry, it was a problem caused by my part of the patch. The patched setrefs.c
checks fdw_/custom_ps_tlist to determine whether a Foreign/CustomScan
node is associated with a certain base relation. If *_ps_tlist is
valid, it also expects scanrelid == 0.
However, that is the wrong thing to check: we may have an empty
*_ps_tlist if the remote join expects no columns.
So, I adjusted the condition to check scanrelid instead.
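A sketch of the adjusted condition, assuming a trimmed-down, hypothetical ScanSketch struct that carries only the two fields the check looks at. A remote join that needs no columns has an empty pseudo-scan target list, so testing the list misclassifies it as an ordinary base-relation scan; testing scanrelid == 0 does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down Foreign/CustomScan: only the two fields
 * the setrefs.c check looks at. */
typedef struct
{
    unsigned int scanrelid;     /* 0 means "no single base relation" */
    const void  *ps_tlist;      /* pseudo-scan target list, NULL if empty */
} ScanSketch;

/* Old test: wrong when a remote join needs no columns (empty tlist). */
bool is_pseudo_scan_old(const ScanSketch *scan)
{
    assert(scan != NULL);
    return scan->ps_tlist != NULL;
}

/* Adjusted test: a scan not tied to one base relation has scanrelid 0. */
bool is_pseudo_scan(const ScanSketch *scan)
{
    assert(scan != NULL);
    return scan->scanrelid == 0;
}
```

Under the old test, the zero-column join in Thom's query would presumably be resolved against range-table index 0, which fits the "could not open relation with OID 0" error.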
Is this issue fixed by the v5 custom/foreign join patch?
* make_tuple_from_result_row() call
------------------------------------
The newly added 4th argument is referenced without a NULL check;
however, store_returning_result() and analyze_row_processor() pass
NULL for this argument, which leads to a segmentation fault.
The callers should pass RelationGetDescr(fmstate->rel) and
RelationGetDescr(astate->rel), respectively.
Fixed.
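A sketch of the calling convention after the fix, using hypothetical stand-ins for Relation and TupleDesc. RELATION_GET_DESCR mirrors the shape of PostgreSQL's RelationGetDescr() macro, ((relation)->rd_att); result_row_natts() only plays the role of make_tuple_from_result_row(), which dereferences its descriptor argument without a NULL check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for TupleDesc and Relation. */
typedef struct
{
    int natts;                  /* number of attributes */
} TupleDescSketch;

typedef struct
{
    TupleDescSketch *rd_att;    /* tuple descriptor of the relation */
} RelationSketch;

/* Same shape as PostgreSQL's RelationGetDescr(): ((relation)->rd_att). */
#define RELATION_GET_DESCR(rel) ((rel)->rd_att)

/*
 * Stand-in for make_tuple_from_result_row(): it reads tupdesc
 * unconditionally, so every caller must supply one.
 */
int result_row_natts(const TupleDescSketch *tupdesc)
{
    /* Passing NULL here is what caused the reported segfault. */
    assert(tupdesc != NULL);
    return tupdesc->natts;
}

/* Caller side of the fix: derive the descriptor from the relation. */
int store_row_for_relation(const RelationSketch *rel)
{
    return result_row_natts(RELATION_GET_DESCR(rel));
}
```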
* regression test crash
------------------------
The query below crashes:
UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
According to the crash dump, tidin() received a cstring input in an
unexpected format. I guess a column number is assigned incorrectly,
but I have no clear scenario at this moment.
#0 0x00007f9b45a11513 in conversion_error_callback (arg=0x7fffc257ecc0)
    at postgres_fdw.c:3293
#1 0x00000000008e51d6 in errfinish (dummy=0) at elog.c:436
#2 0x00000000008935cd in tidin (fcinfo=0x7fffc257e8a0) at tid.c:69
/home/kaigai/repo/sepgsql/src/backend/utils/adt/tid.c:69:1734:beg:0x8935cd
(gdb) p str
$6 = 0x1d17cf7 "foo"
A join pushed down underneath an UPDATE or DELETE must return ctid in
its output, but that does not seem to be fully supported yet. I'm
fixing this issue now.
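A ctid is written in text as "(block,offset)", for example "(0,1)", so a value such as "foo" reaching tidin() suggests a non-ctid column was read in the ctid position, consistent with the column-numbering guess above. The hypothetical parse_ctid() below sketches that text form with sscanf(); it is not PostgreSQL's tidin() and skips details such as range checks on the offset.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Sketch of the "(block,offset)" text form of a ctid, e.g. "(0,1)".
 * Returns 0 and fills *block and *offset on success, -1 otherwise.
 * Not tidin() itself: it only shows why a value such as "foo" cannot
 * be read as a ctid.
 */
int parse_ctid(const char *str, unsigned long *block, unsigned int *offset)
{
    int consumed = -1;

    assert(str != NULL && block != NULL && offset != NULL);
    /* %n is only stored if the closing parenthesis was matched. */
    if (sscanf(str, "(%lu,%u)%n", block, offset, &consumed) != 2 ||
        consumed < 0 || str[consumed] != '\0')
        return -1;
    return 0;
}
```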
Regards,
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
* Bug reported by Thom Brown
-----------------------------
# EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN countries ON people.country_id = countries.id LIMIT 3) x;
ERROR: could not open relation with OID 0
Sorry, it was a problem caused by my part of the patch. The patched setrefs.c
checks fdw_/custom_ps_tlist to determine whether a Foreign/CustomScan
node is associated with a certain base relation. If *_ps_tlist is
valid, it also expects scanrelid == 0.
However, that is the wrong thing to check: we may have an empty
*_ps_tlist if the remote join expects no columns.
So, I adjusted the condition to check scanrelid instead.
Is this issue fixed by the v5 custom/foreign join patch?
Yes, please rebase it.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
I rebased the "join push-down" patch onto Kaigai-san's Custom/Foreign Join
v6 patch. I posted some comments on the v6 patch in this post:
/messages/by-id/CAEZqfEcNvjqq-P=jxnW1Pb4T9wvpcPoRCN7G6cc46JGuB7dY8w@mail.gmail.com
Before applying my v3 patch, please apply Kaigai-san's v6 patch and my
mod_cjv6.patch.
Sorry for the complex patch combination. Kaigai-san and I will
consolidate those patches soon.
I fixed the issues pointed out by Thom and Kohei, but the patch still
has an issue with joins underlying UPDATE or DELETE, which I'm working
on now. Apart from that, the existing regression tests pass.
2015-03-03 19:48 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Yes, please rebase it.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Shigeru HANADA
Attachments:
foreign_join_v3.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 59cb053..5c08baa 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,7 +44,9 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "nodes/plannodes.h"
#include "optimizer/clauses.h"
#include "optimizer/var.h"
#include "parser/parsetree.h"
@@ -89,6 +91,8 @@ typedef struct deparse_expr_cxt
RelOptInfo *foreignrel; /* the foreign relation we are planning for */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params */
+ List *outertlist; /* outer child's target list */
+ List *innertlist; /* inner child's target list */
} deparse_expr_cxt;
/*
@@ -250,7 +254,7 @@ foreign_expr_walker(Node *node,
* Param's collation, ie it's not safe for it to have a
* non-default collation.
*/
- if (var->varno == glob_cxt->foreignrel->relid &&
+ if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
var->varlevelsup == 0)
{
/* Var belongs to foreign table */
@@ -743,18 +747,22 @@ deparseTargetList(StringInfo buf,
if (attr->attisdropped)
continue;
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
if (have_wholerow ||
bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
attrs_used))
{
- if (!first)
- appendStringInfoString(buf, ", ");
- first = false;
deparseColumnRef(buf, rtindex, i, root);
- *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
+ else
+ appendStringInfoString(buf, "NULL");
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
/*
@@ -794,12 +802,14 @@ deparseTargetList(StringInfo buf,
* so Params and other-relation Vars should be replaced by dummy values.
*/
void
-appendWhereClause(StringInfo buf,
- PlannerInfo *root,
- RelOptInfo *baserel,
- List *exprs,
- bool is_first,
- List **params)
+appendConditions(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *outertlist,
+ List *innertlist,
+ List *exprs,
+ const char *prefix,
+ List **params)
{
deparse_expr_cxt context;
int nestlevel;
@@ -813,6 +823,8 @@ appendWhereClause(StringInfo buf,
context.foreignrel = baserel;
context.buf = buf;
context.params_list = params;
+ context.outertlist = outertlist;
+ context.innertlist = innertlist;
/* Make sure any constants in the exprs are printed portably */
nestlevel = set_transmission_modes();
@@ -822,22 +834,180 @@ appendWhereClause(StringInfo buf,
RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
/* Connect expressions with "AND" and parenthesize each condition. */
- if (is_first)
- appendStringInfoString(buf, " WHERE ");
- else
- appendStringInfoString(buf, " AND ");
+ if (prefix)
+ appendStringInfo(buf, "%s", prefix);
appendStringInfoChar(buf, '(');
deparseExpr(ri->clause, &context);
appendStringInfoChar(buf, ')');
- is_first = false;
+ prefix= " AND ";
}
reset_transmission_modes(nestlevel);
}
/*
+ * Deparse given Var into buf.
+ */
+static TargetEntry *
+deparseJoinVar(Var *node, deparse_expr_cxt *context)
+{
+ const char *side;
+ ListCell *lc2;
+ TargetEntry *tle = NULL;
+ int j;
+
+ j = 0;
+ foreach(lc2, context->outertlist)
+ {
+ TargetEntry *childtle = (TargetEntry *) lfirst(lc2);
+
+ if (equal(childtle->expr, node))
+ {
+ tle = copyObject(childtle);
+ side = "l";
+ break;
+ }
+ j++;
+ }
+ if (tle == NULL)
+ {
+ j = 0;
+ foreach(lc2, context->innertlist)
+ {
+ TargetEntry *childtle = (TargetEntry *) lfirst(lc2);
+
+ if (equal(childtle->expr, node))
+ {
+ tle = copyObject(childtle);
+ side = "r";
+ break;
+ }
+ j++;
+ }
+ }
+ Assert(tle);
+
+ if (node->varattno == 0)
+ appendStringInfo(context->buf, "%s", side);
+ else
+ appendStringInfo(context->buf, "%s.a_%d", side, j);
+
+ return tle;
+}
+
+static void
+deparseColumnAliases(StringInfo buf, List *targetlist)
+{
+ int i;
+ ListCell *lc;
+
+ i = 0;
+ foreach(lc, targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+ Var *var = (Var *) tle->expr;
+
+ Assert(IsA(var, Var));
+
+ /* Skip whole-row reference */
+ if (var->varattno == 0)
+ continue;
+
+ /* Deparse column alias for the subquery */
+ if (i > 0)
+ appendStringInfoString(buf, ", ");
+ appendStringInfo(buf, "a_%d", i);
+ i++;
+ }
+}
+
+/*
+ * Construct a SELECT statement which contains join clause.
+ *
+ * We also create a TargetEntry list of the columns being retrieved, which is
+ * returned to *fdw_ps_tlist.
+ *
+ * path_o, tl_o, sql_o are respectively path, targetlist, and remote query
+ * statement of the outer child relation. postfix _i means those for the inner
+ * child relation. jointype and restrictlist are information of join method.
+ * fdw_ps_tlist is output parameter to pass target list of the pseudo scan to
+ * caller.
+ */
+void
+deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *restrictlist,
+ List **fdw_ps_tlist)
+{
+ StringInfoData selbuf; /* buffer for SELECT clause */
+ StringInfoData abuf_o; /* buffer for column alias list of outer */
+ StringInfoData abuf_i; /* buffer for column alias list of inner */
+ int i;
+ ListCell *lc;
+ const char *jointype_str;
+ deparse_expr_cxt context;
+
+ context.root = root;
+ context.foreignrel = baserel;
+ context.buf = &selbuf;
+ context.params_list = NULL;
+ context.outertlist = plan_o->scan.plan.targetlist;
+ context.innertlist = plan_i->scan.plan.targetlist;
+
+ jointype_str = jointype == JOIN_INNER ? "INNER" :
+ jointype == JOIN_LEFT ? "LEFT" :
+ jointype == JOIN_RIGHT ? "RIGHT" :
+ jointype == JOIN_FULL ? "FULL" : "";
+
+ /* print SELECT clause of the join scan */
+ /* XXX: should extend deparseTargetList()? */
+ initStringInfo(&selbuf);
+ i = 0;
+ foreach(lc, baserel->reltargetlist)
+ {
+ Var *var = (Var *) lfirst(lc);
+ TargetEntry *tle;
+
+ if (i > 0)
+ appendStringInfoString(&selbuf, ", ");
+ deparseJoinVar(var, &context);
+
+ tle = makeTargetEntry((Expr *) copyObject(var),
+ i + 1, pstrdup(""), false);
+ if (fdw_ps_tlist)
+ *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle));
+
+ i++;
+ }
+
+ /* Deparse column alias portion of subquery in FROM clause. */
+ initStringInfo(&abuf_o);
+ deparseColumnAliases(&abuf_o, plan_o->scan.plan.targetlist);
+ initStringInfo(&abuf_i);
+ deparseColumnAliases(&abuf_i, plan_i->scan.plan.targetlist);
+
+ /* Construct SELECT statement */
+ appendStringInfo(sql, "SELECT %s FROM", selbuf.data);
+ appendStringInfo(sql, " (%s) l (%s) %s JOIN (%s) r (%s) ",
+ sql_o, abuf_o.data, jointype_str, sql_i, abuf_i.data);
+ /* Append ON clause */
+ appendConditions(sql, root, baserel,
+ plan_o->scan.plan.targetlist,
+ plan_i->scan.plan.targetlist,
+ restrictlist, " ON ", NULL);
+}
+
+/*
* deparse remote INSERT statement
*
* The statement text is appended to buf, and we also create an integer List
@@ -1261,6 +1431,8 @@ deparseExpr(Expr *node, deparse_expr_cxt *context)
/*
* Deparse given Var node into context->buf.
*
+ * If context has valid innerrel, this is invoked for a join conditions.
+ *
* If the Var belongs to the foreign relation, just print its remote name.
* Otherwise, it's effectively a Param (and will in fact be a Param at
* run time). Handle it the same way we handle plain Params --- see
@@ -1271,39 +1443,46 @@ deparseVar(Var *node, deparse_expr_cxt *context)
{
StringInfo buf = context->buf;
- if (node->varno == context->foreignrel->relid &&
- node->varlevelsup == 0)
+ if (context->foreignrel->reloptkind == RELOPT_JOINREL)
{
- /* Var belongs to foreign table */
- deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ deparseJoinVar(node, context);
}
else
{
- /* Treat like a Param */
- if (context->params_list)
+ if (node->varno == context->foreignrel->relid &&
+ node->varlevelsup == 0)
{
- int pindex = 0;
- ListCell *lc;
-
- /* find its index in params_list */
- foreach(lc, *context->params_list)
+ /* Var belongs to foreign table */
+ deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ }
+ else
+ {
+ /* Treat like a Param */
+ if (context->params_list)
{
- pindex++;
- if (equal(node, (Node *) lfirst(lc)))
- break;
+ int pindex = 0;
+ ListCell *lc;
+
+ /* find its index in params_list */
+ foreach(lc, *context->params_list)
+ {
+ pindex++;
+ if (equal(node, (Node *) lfirst(lc)))
+ break;
+ }
+ if (lc == NULL)
+ {
+ /* not in list, so add it */
+ pindex++;
+ *context->params_list = lappend(*context->params_list, node);
+ }
+
+ printRemoteParam(pindex, node->vartype, node->vartypmod, context);
}
- if (lc == NULL)
+ else
{
- /* not in list, so add it */
- pindex++;
- *context->params_list = lappend(*context->params_list, node);
+ printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
-
- printRemoteParam(pindex, node->vartype, node->vartypmod, context);
- }
- else
- {
- printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
}
}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 583cce7..7f96793 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -489,17 +489,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- can't
-- parameterized remote path
EXPLAIN (VERBOSE, COSTS false)
SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------
- Nested Loop
- Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
- -> Foreign Scan on public.ft2 a
- Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
- -> Foreign Scan on public.ft2 b
- Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan
+ Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT r.a_0, r.a_1, r.a_2, r.a_3, r.a_4, r.a_5, r.a_6, r.a_7, l.a_0, l.a_1, l.a_2, l.a_3, l.a_4, l.a_5, l.a_6, l.a_7 FROM (SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1") l (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7) INNER JOIN (SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))) r (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7) ON ((r.a_1 = l.a_0))
+(3 rows)
SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
@@ -656,16 +651,16 @@ SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
-- simple join
PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Nested Loop
Output: t1.c3, t2.c3
-> Foreign Scan on public.ft1 t1
Output: t1.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 1))
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 2))
(8 rows)
EXECUTE st1(1, 1);
@@ -683,8 +678,8 @@ EXECUTE st1(101, 101);
-- subquery using stable function (can't be sent to remote)
PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -699,7 +694,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
-> Foreign Scan on public.ft2 t2
Output: t2.c3
Filter: (date(t2.c4) = '01-17-1970'::date)
- Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10))
(15 rows)
EXECUTE st2(10, 20);
@@ -717,8 +712,8 @@ EXECUTE st2(101, 121);
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
- QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -732,7 +727,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
Output: t2.c3
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
(14 rows)
EXECUTE st3(10, 20);
@@ -1085,7 +1080,7 @@ INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
Output: ((ft2_1.c1 + 1000)), ((ft2_1.c2 + 100)), ((ft2_1.c3 || ft2_1.c3))
-> Foreign Scan on public.ft2 ft2_1
Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
- Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1"
+ Remote SQL: SELECT "C 1", c2, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1"
(9 rows)
INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
@@ -1219,7 +1214,7 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, NULL, c8, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -1231,14 +1226,14 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
EXPLAIN (verbose, costs off)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
- QUERY PLAN
-----------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------
Delete on public.ft2
Output: c1, c4
- Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", c4
+ Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", NULL, NULL, c4, NULL, NULL, NULL, NULL
-> Foreign Scan on public.ft2
Output: ctid
- Remote SQL: SELECT ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
(6 rows)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
@@ -1360,7 +1355,7 @@ DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.ctid, ft2.c2
- Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT NULL, c2, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -2594,12 +2589,12 @@ select c2, count(*) from "S 1"."T 1" where c2 < 500 group by 1 order by 1;
-- Consistent check constraints provide consistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2positive CHECK (c2 >= 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 < 0;
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 < 0;
@@ -2638,12 +2633,12 @@ ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c2positive;
-- But inconsistent check constraints provide inconsistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2negative CHECK (c2 < 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 >= 0;
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 >= 0;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..1791fca 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -48,7 +48,8 @@ PG_MODULE_MAGIC;
/*
* FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table. This information is collected by postgresGetForeignRelSize.
+ * foreign table or foreign join. This information is collected by
+ * postgresGetForeignRelSize, or calculated from join source relations.
*/
typedef struct PgFdwRelationInfo
{
@@ -78,10 +79,30 @@ typedef struct PgFdwRelationInfo
ForeignTable *table;
ForeignServer *server;
UserMapping *user; /* only set in use_remote_estimate mode */
+ Oid checkAsUser;
} PgFdwRelationInfo;
/*
- * Indexes of FDW-private information stored in fdw_private lists.
+ * Indexes of FDW-private information stored in fdw_private of ForeignPath.
+ * We use fdw_private of a ForeignPath when the path represents a join which
+ * can be pushed down to remote side.
+ *
+ * 1) Outer child path node
+ * 2) Inner child path node
+ * 3) Join type number(as an Integer node)
+ * 4) RestrictInfo list of join conditions
+ */
+enum FdwPathPrivateIndex
+{
+ FdwPathPrivateOuterPath,
+ FdwPathPrivateInnerPath,
+ FdwPathPrivateJoinType,
+ FdwPathPrivateRestrictList,
+};
+
+/*
+ * Indexes of FDW-private information stored in fdw_private of ForeignScan of
+ * a simple foreign table scan for a SELECT statement.
*
* We store various information in ForeignScan.fdw_private to pass it from
* planner to executor. Currently we store:
@@ -98,7 +119,11 @@ enum FdwScanPrivateIndex
/* SQL statement to execute remotely (as a String node) */
FdwScanPrivateSelectSql,
/* Integer list of attribute numbers retrieved by the SELECT */
- FdwScanPrivateRetrievedAttrs
+ FdwScanPrivateRetrievedAttrs,
+ /* Integer value of server for the scan */
+ FdwScanPrivateServerOid,
+ /* Integer value of checkAsUser for the scan */
+ FdwScanPrivatecheckAsUser,
};
/*
@@ -129,6 +154,7 @@ enum FdwModifyPrivateIndex
typedef struct PgFdwScanState
{
Relation rel; /* relcache entry for the foreign table */
+ TupleDesc tupdesc; /* tuple descriptor of the scan */
AttInMetadata *attinmeta; /* attribute datatype conversion metadata */
/* extracted fdw_private data */
@@ -288,6 +314,15 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
+static void postgresGetForeignJoinPaths(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlisti,
+ Relids extra_lateral_rels);
/*
* Helper functions
@@ -324,6 +359,7 @@ static void analyze_row_processor(PGresult *res, int row,
static HeapTuple make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context);
@@ -368,6 +404,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
/* Support functions for IMPORT FOREIGN SCHEMA */
routine->ImportForeignSchema = postgresImportForeignSchema;
+ /* Support functions for join push-down */
+ routine->GetForeignJoinPaths = postgresGetForeignJoinPaths;
+
PG_RETURN_POINTER(routine);
}
@@ -385,6 +424,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
{
PgFdwRelationInfo *fpinfo;
ListCell *lc;
+ RangeTblEntry *rte;
/*
* We use PgFdwRelationInfo to pass various information to subsequent
@@ -428,6 +468,13 @@ postgresGetForeignRelSize(PlannerInfo *root,
}
/*
+ * Retrieve RTE to obtain checkAsUser. checkAsUser is used to determine
+ * the user to use to obtain user mapping.
+ */
+ rte = planner_rt_fetch(baserel->relid, root);
+ fpinfo->checkAsUser = rte->checkAsUser;
+
+ /*
* If the table or the server is configured to use remote estimates,
* identify which user to do remote access as during planning. This
* should match what ExecCheckRTEPerms() does. If we fail due to lack of
@@ -435,7 +482,6 @@ postgresGetForeignRelSize(PlannerInfo *root,
*/
if (fpinfo->use_remote_estimate)
{
- RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
Oid userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
fpinfo->user = GetUserMapping(userid, fpinfo->server->serverid);
@@ -752,6 +798,8 @@ postgresGetForeignPlan(PlannerInfo *root,
List *retrieved_attrs;
StringInfoData sql;
ListCell *lc;
+ List *fdw_ps_tlist = NIL;
+ ForeignScan *scan;
/*
* Separate the scan_clauses into those that can be executed remotely and
@@ -769,7 +817,7 @@ postgresGetForeignPlan(PlannerInfo *root,
* This code must match "extract_actual_clauses(scan_clauses, false)"
* except for the additional decision about remote versus local execution.
* Note however that we only strip the RestrictInfo nodes from the
- * local_exprs list, since appendWhereClause expects a list of
+ * local_exprs list, since appendConditions expects a list of
* RestrictInfos.
*/
foreach(lc, scan_clauses)
@@ -797,64 +845,123 @@ postgresGetForeignPlan(PlannerInfo *root,
* expressions to be sent as parameters.
*/
initStringInfo(&sql);
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
- if (remote_conds)
- appendWhereClause(&sql, root, baserel, remote_conds,
- true, &params_list);
-
- /*
- * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
- * initial row fetch, rather than later on as is done for local tables.
- * The extra roundtrips involved in trying to duplicate the local
- * semantics exactly don't seem worthwhile (see also comments for
- * RowMarkType).
- *
- * Note: because we actually run the query as a cursor, this assumes that
- * DECLARE CURSOR ... FOR UPDATE is supported, which it isn't before 8.3.
- */
- if (baserel->relid == root->parse->resultRelation &&
- (root->parse->commandType == CMD_UPDATE ||
- root->parse->commandType == CMD_DELETE))
+ if (scan_relid > 0)
{
- /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
- appendStringInfoString(&sql, " FOR UPDATE");
- }
- else
- {
- RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+ deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
+ &retrieved_attrs);
+ if (remote_conds)
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ remote_conds, " WHERE ", &params_list);
- if (rc)
+ /*
+ * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
+ * initial row fetch, rather than later on as is done for local tables.
+ * The extra roundtrips involved in trying to duplicate the local
+ * semantics exactly don't seem worthwhile (see also comments for
+ * RowMarkType).
+ *
+ * Note: because we actually run the query as a cursor, this assumes
+ * that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
+ * before 8.3.
+ */
+ if (baserel->relid == root->parse->resultRelation &&
+ (root->parse->commandType == CMD_UPDATE ||
+ root->parse->commandType == CMD_DELETE))
{
- /*
- * Relation is specified as a FOR UPDATE/SHARE target, so handle
- * that.
- *
- * For now, just ignore any [NO] KEY specification, since (a) it's
- * not clear what that means for a remote table that we don't have
- * complete information about, and (b) it wouldn't work anyway on
- * older remote servers. Likewise, we don't worry about NOWAIT.
- */
- switch (rc->strength)
+ /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
+ appendStringInfoString(&sql, " FOR UPDATE");
+ }
+ else
+ {
+ RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+
+ if (rc)
{
- case LCS_FORKEYSHARE:
- case LCS_FORSHARE:
- appendStringInfoString(&sql, " FOR SHARE");
- break;
- case LCS_FORNOKEYUPDATE:
- case LCS_FORUPDATE:
- appendStringInfoString(&sql, " FOR UPDATE");
- break;
+ /*
+ * Relation is specified as a FOR UPDATE/SHARE target, so handle
+ * that.
+ *
+ * For now, just ignore any [NO] KEY specification, since (a)
+ * it's not clear what that means for a remote table that we
+ * don't have complete information about, and (b) it wouldn't
+ * work anyway on older remote servers. Likewise, we don't
+ * worry about NOWAIT.
+ */
+ switch (rc->strength)
+ {
+ case LCS_FORKEYSHARE:
+ case LCS_FORSHARE:
+ appendStringInfoString(&sql, " FOR SHARE");
+ break;
+ case LCS_FORNOKEYUPDATE:
+ case LCS_FORUPDATE:
+ appendStringInfoString(&sql, " FOR UPDATE");
+ break;
+ }
}
}
}
+ else
+ {
+ /* Join case */
+ Path *path_o;
+ Path *path_i;
+ const char *sql_o;
+ const char *sql_i;
+ ForeignScan *plan_o;
+ ForeignScan *plan_i;
+ JoinType jointype;
+ List *restrictlist;
+ int i;
+
+ /*
+ * Retrieve information from fdw_private.
+ */
+ path_o = list_nth(best_path->fdw_private, FdwPathPrivateOuterPath);
+ path_i = list_nth(best_path->fdw_private, FdwPathPrivateInnerPath);
+ jointype = intVal(list_nth(best_path->fdw_private,
+ FdwPathPrivateJoinType));
+ restrictlist = list_nth(best_path->fdw_private,
+ FdwPathPrivateRestrictList);
+
+ /*
+ * Construct the remote query from the bottom up. ForeignScan plan nodes
+ * of underlying scans are not needed to execute the plan tree, but they
+ * are handy for constructing the remote query recursively.
+ */
+ plan_o = (ForeignScan *) create_plan_recurse(root, path_o);
+ Assert(IsA(plan_o, ForeignScan));
+ sql_o = strVal(list_nth(plan_o->fdw_private, FdwScanPrivateSelectSql));
+
+ plan_i = (ForeignScan *) create_plan_recurse(root, path_i);
+ Assert(IsA(plan_i, ForeignScan));
+ sql_i = strVal(list_nth(plan_i->fdw_private, FdwScanPrivateSelectSql));
+
+ deparseJoinSql(&sql, root, baserel, path_o, path_i, plan_o, plan_i,
+ sql_o, sql_i, jointype, restrictlist, &fdw_ps_tlist);
+ retrieved_attrs = NIL;
+ for (i = 0; i < list_length(fdw_ps_tlist); i++)
+ retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
+ }
/*
* Build the fdw_private list that will be available to the executor.
* Items in the list must match enum FdwScanPrivateIndex, above.
*/
- fdw_private = list_make2(makeString(sql.data),
- retrieved_attrs);
+ fdw_private = list_make2(makeString(sql.data), retrieved_attrs);
+
+ /*
+ * In the pseudo-scan case, such as join push-down, add the OID of the
+ * server and checkAsUser as extra information.
+ * XXX: always passing serverid and checkAsUser might simplify the code
+ * for all cases, both simple scans and join push-down.
+ */
+ if (scan_relid == 0)
+ {
+ fdw_private = lappend(fdw_private,
+ makeInteger(fpinfo->server->serverid));
+ fdw_private = lappend(fdw_private, makeInteger(fpinfo->checkAsUser));
+ }
/*
* Create the ForeignScan node from target list, local filtering
@@ -864,11 +971,18 @@ postgresGetForeignPlan(PlannerInfo *root,
* field of the finished plan node; we can't keep them in private state
* because then they wouldn't be subject to later planner processing.
*/
- return make_foreignscan(tlist,
+ scan = make_foreignscan(tlist,
local_exprs,
scan_relid,
params_list,
fdw_private);
+
+ /*
+ * Set fdw_ps_tlist to handle tuples generated by this scan.
+ */
+ scan->fdw_ps_tlist = fdw_ps_tlist;
+
+ return scan;
}
/*
@@ -881,9 +995,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
PgFdwScanState *fsstate;
- RangeTblEntry *rte;
+ Oid serverid;
Oid userid;
- ForeignTable *table;
ForeignServer *server;
UserMapping *user;
int numParams;
@@ -903,22 +1016,51 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
node->fdw_state = (void *) fsstate;
/*
- * Identify which user to do the remote access as. This should match what
- * ExecCheckRTEPerms() does.
+ * Initialize fsstate.
+ *
+ * The following values must be determined:
+ * - fsstate->rel, NULL if no actual relation
+ * - serverid, OID of the foreign server to use for the scan
+ * - userid, used to look up the user mapping
*/
- rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
- userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+ if (fsplan->scan.scanrelid > 0)
+ {
+ /* Simple foreign table scan */
+ RangeTblEntry *rte;
+ ForeignTable *table;
- /* Get info about foreign table. */
- fsstate->rel = node->ss.ss_currentRelation;
- table = GetForeignTable(RelationGetRelid(fsstate->rel));
- server = GetForeignServer(table->serverid);
- user = GetUserMapping(userid, server->serverid);
+ /*
+ * Identify which user to do the remote access as. This should match
+ * what ExecCheckRTEPerms() does.
+ */
+ rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+ userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+ /* Get info about foreign table. */
+ fsstate->rel = node->ss.ss_currentRelation;
+ table = GetForeignTable(RelationGetRelid(fsstate->rel));
+ serverid = table->serverid;
+ }
+ else
+ {
+ Oid checkAsUser;
+
+ /* Join */
+ fsstate->rel = NULL; /* No actual relation to scan */
+
+ serverid = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivateServerOid));
+ checkAsUser = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivatecheckAsUser));
+ userid = checkAsUser ? checkAsUser : GetUserId();
+ }
/*
* Get connection to the foreign server. Connection manager will
* establish new connection if necessary.
*/
+ server = GetForeignServer(serverid);
+ user = GetUserMapping(userid, server->serverid);
fsstate->conn = GetConnection(server, user, false);
/* Assign a unique ID for my cursor */
@@ -929,7 +1071,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
fsstate->query = strVal(list_nth(fsplan->fdw_private,
FdwScanPrivateSelectSql));
fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
- FdwScanPrivateRetrievedAttrs);
+ FdwScanPrivateRetrievedAttrs);
/* Create contexts for batches of tuples and per-tuple temp workspace. */
fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -944,7 +1086,11 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ALLOCSET_SMALL_MAXSIZE);
/* Get info we'll need for input data conversion. */
- fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+ if (fsplan->scan.scanrelid > 0)
+ fsstate->tupdesc = RelationGetDescr(fsstate->rel);
+ else
+ fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+ fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc);
/* Prepare for output conversion of parameters used in remote query. */
numParams = list_length(fsplan->fdw_exprs);
@@ -1747,11 +1893,13 @@ estimate_path_cost_size(PlannerInfo *root,
deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
&retrieved_attrs);
if (fpinfo->remote_conds)
- appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
- true, NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ fpinfo->remote_conds, " WHERE ", NULL);
if (remote_join_conds)
- appendWhereClause(&sql, root, baserel, remote_join_conds,
- (fpinfo->remote_conds == NIL), NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ remote_join_conds,
+ fpinfo->remote_conds == NIL ? " WHERE " : " AND ",
+ NULL);
/* Get the remote estimate */
conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -2052,6 +2200,7 @@ fetch_more_data(ForeignScanState *node)
fsstate->tuples[i] =
make_tuple_from_result_row(res, i,
fsstate->rel,
+ fsstate->tupdesc,
fsstate->attinmeta,
fsstate->retrieved_attrs,
fsstate->temp_cxt);
@@ -2270,6 +2419,7 @@ store_returning_result(PgFdwModifyState *fmstate,
newtup = make_tuple_from_result_row(res, 0,
fmstate->rel,
+ RelationGetDescr(fmstate->rel),
fmstate->attinmeta,
fmstate->retrieved_attrs,
fmstate->temp_cxt);
@@ -2562,6 +2712,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
astate->rows[pos] = make_tuple_from_result_row(res, row,
astate->rel,
+ RelationGetDescr(astate->rel),
astate->attinmeta,
astate->retrieved_attrs,
astate->temp_cxt);
@@ -2835,6 +2986,181 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
}
/*
+ * Construct PgFdwRelationInfo from two join sources
+ */
+static PgFdwRelationInfo *
+merge_fpinfo(PgFdwRelationInfo *fpinfo_o,
+ PgFdwRelationInfo *fpinfo_i,
+ JoinType jointype)
+{
+ PgFdwRelationInfo *fpinfo;
+
+ fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
+ fpinfo->remote_conds = list_concat(copyObject(fpinfo_o->remote_conds),
+ copyObject(fpinfo_i->remote_conds));
+ fpinfo->local_conds = list_concat(copyObject(fpinfo_o->local_conds),
+ copyObject(fpinfo_i->local_conds));
+
+ fpinfo->attrs_used = NULL; /* Use fdw_ps_tlist */
+ fpinfo->local_conds_cost.startup = fpinfo_o->local_conds_cost.startup +
+ fpinfo_i->local_conds_cost.startup;
+ fpinfo->local_conds_cost.per_tuple = fpinfo_o->local_conds_cost.per_tuple +
+ fpinfo_i->local_conds_cost.per_tuple;
+ fpinfo->local_conds_sel = fpinfo_o->local_conds_sel *
+ fpinfo_i->local_conds_sel;
+ if (jointype == JOIN_INNER)
+ fpinfo->rows = Min(fpinfo_o->rows, fpinfo_i->rows);
+ else
+ fpinfo->rows = Max(fpinfo_o->rows, fpinfo_i->rows);
+ fpinfo->rows = Min(fpinfo_o->rows, fpinfo_i->rows);
+ /* XXX we should consider only columns in fdw_ps_tlist */
+ fpinfo->width = fpinfo_o->width + fpinfo_i->width;
+ /* XXX we should estimate better costs */
+
+ fpinfo->use_remote_estimate = false; /* Never use in join case */
+ fpinfo->fdw_startup_cost = fpinfo_o->fdw_startup_cost;
+ fpinfo->fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
+
+ fpinfo->startup_cost = fpinfo->fdw_startup_cost;
+ fpinfo->total_cost =
+ fpinfo->startup_cost + fpinfo->fdw_tuple_cost * fpinfo->rows;
+
+ fpinfo->table = NULL; /* always NULL in join case */
+ fpinfo->server = fpinfo_o->server;
+ fpinfo->user = fpinfo_o->user ? fpinfo_o->user : fpinfo_i->user;
+ /* checkAsUser must be identical */
+ fpinfo->checkAsUser = fpinfo_o->checkAsUser;
+
+ return fpinfo;
+}
+
+/*
+ * postgresGetForeignJoinPaths
+ * Add possible ForeignPath to joinrel.
+ *
+ * Joins satisfying the conditions below can be pushed down to the remote PostgreSQL server.
+ *
+ * 1) Join type is inner or outer
+ * 2) Join conditions consist of remote-safe expressions.
+ * 3) Join source relations don't have any local filter.
+ */
+static void
+postgresGetForeignJoinPaths(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels)
+{
+ ForeignPath *joinpath;
+ ForeignPath *path_o = (ForeignPath *) outerrel->cheapest_total_path;
+ ForeignPath *path_i = (ForeignPath *) innerrel->cheapest_total_path;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ PgFdwRelationInfo *fpinfo;
+ double rows;
+ Cost startup_cost;
+ Cost total_cost;
+ ListCell *lc;
+ List *fdw_private;
+
+ /* Source relations should be ForeignPath. */
+ if (!IsA(path_o, ForeignPath) || !IsA(path_i, ForeignPath))
+ return;
+
+ /*
+ * Skip considering reversed join combination.
+ */
+ if (outerrel->relid < innerrel->relid)
+ return;
+
+ /*
+ * Both relations in the join must belong to same server.
+ */
+ fpinfo_o = path_o->path.parent->fdw_private;
+ fpinfo_i = path_i->path.parent->fdw_private;
+ if (fpinfo_o->server->serverid != fpinfo_i->server->serverid)
+ return;
+
+ /*
+ * We support all outer joins in addition to inner join.
+ */
+ if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+ jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+ return;
+
+ /*
+ * Note that CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+ * with empty restrictlist. Pushing down CROSS JOIN produces more result
+ * than retrieving each table separately, so we don't push down such joins.
+ */
+ if (jointype == JOIN_INNER && restrictlist == NIL)
+ return;
+
+ /*
+ * Neither source relation can have local conditions. This can be relaxed
+ * if the join is an inner join and local conditions don't contain volatile
+ * function/operator, but as of now we leave it as a future enhancement.
+ */
+ if (fpinfo_o->local_conds != NULL || fpinfo_i->local_conds != NULL)
+ return;
+
+ /*
+ * Join condition must be safe to push down.
+ */
+ foreach(lc, restrictlist)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+ if (!is_foreign_expr(root, joinrel, rinfo->clause))
+ return;
+ }
+
+ /*
+ * checkAsUser of source paths should match.
+ */
+ if (fpinfo_o->checkAsUser != fpinfo_i->checkAsUser)
+ return;
+
+ /* Here we know that this join can be pushed-down to remote side. */
+
+ /* Construct fpinfo for the join relation */
+ fpinfo = merge_fpinfo(fpinfo_o, fpinfo_i, jointype);
+ joinrel->fdw_private = fpinfo;
+
+ /* TODO determine cost and rows of the join. */
+ rows = fpinfo->rows;
+ startup_cost = fpinfo->startup_cost;
+ total_cost = fpinfo->total_cost;
+
+ fdw_private = list_make4(path_o,
+ path_i,
+ makeInteger(jointype),
+ restrictlist);
+
+ /*
+ * Create a new join path and add it to the joinrel which represents a join
+ * between foreign tables.
+ */
+ joinpath = create_foreignscan_path(root,
+ joinrel,
+ rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no required_outer */
+ fdw_private);
+
+ /* Add generated path into joinrel by add_path(). */
+ add_path(joinrel, (Path *) joinpath);
+
+ /* TODO consider parameterized paths */
+}
+
+/*
* Create a tuple from the specified row of the PGresult.
*
* rel is the local representation of the foreign table, attinmeta is
@@ -2846,12 +3172,12 @@ static HeapTuple
make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context)
{
HeapTuple tuple;
- TupleDesc tupdesc = RelationGetDescr(rel);
Datum *values;
bool *nulls;
ItemPointer ctid = NULL;
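A standalone sketch of the row and cost arithmetic in merge_fpinfo() above, using hypothetical stand-in types (SideEstimate, SketchJoinType) rather than PgFdwRelationInfo and JoinType. It follows the if/else branch: an inner join takes the smaller input's row count and an outer join the larger. As posted, though, the patch assigns `fpinfo->rows = Min(...)` unconditionally right after that if/else, so outer joins currently get the inner-join estimate as well. The cost side mirrors the patch: the outer relation's fdw_startup_cost plus its fdw_tuple_cost times the estimated rows.

```c
#include <assert.h>

/* Hypothetical stand-in for the PgFdwRelationInfo fields used here. */
typedef struct
{
    double      rows;               /* estimated row count of one join side */
    double      fdw_startup_cost;   /* per-server startup cost option */
    double      fdw_tuple_cost;     /* per-server per-tuple cost option */
} SideEstimate;

/* Hypothetical subset of JoinType accepted by the patch. */
typedef enum
{
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_RIGHT,
    SK_JOIN_FULL
} SketchJoinType;

/* Rows: an inner join takes the smaller side, outer joins the larger side. */
static double
sketch_join_rows(const SideEstimate *outer, const SideEstimate *inner,
                 SketchJoinType jointype)
{
    if (jointype == SK_JOIN_INNER)
        return outer->rows < inner->rows ? outer->rows : inner->rows;
    return outer->rows > inner->rows ? outer->rows : inner->rows;
}

/* Total cost: startup cost plus per-tuple cost, using the outer side's options. */
static double
sketch_join_total_cost(const SideEstimate *outer, double rows)
{
    return outer->fdw_startup_cost + outer->fdw_tuple_cost * rows;
}
```

Taking the outer side's cost options is only safe because both sides are required to be on the same server.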
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 950c6f7..fd8b257 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -16,6 +16,7 @@
#include "foreign/foreign.h"
#include "lib/stringinfo.h"
#include "nodes/relation.h"
+#include "nodes/plannodes.h"
#include "utils/relcache.h"
#include "libpq-fe.h"
@@ -52,12 +53,26 @@ extern void deparseSelectSql(StringInfo buf,
RelOptInfo *baserel,
Bitmapset *attrs_used,
List **retrieved_attrs);
-extern void appendWhereClause(StringInfo buf,
+extern void appendConditions(StringInfo buf,
PlannerInfo *root,
RelOptInfo *baserel,
+ List *outertlist,
+ List *innertlist,
List *exprs,
- bool is_first,
+ const char *prefix,
List **params);
+extern void deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *restrictlist,
+ List **fdw_ps_tlist);
extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
List *targetAttrs, List *returningList,
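The push-down conditions checked by postgresGetForeignJoinPaths() above can be summarised as a single predicate. This is a simplified sketch with hypothetical types (SideSummary, SketchJoinType) in place of the planner structures; it leaves out the ForeignPath type test and the relid-ordering check, and takes the result of is_foreign_expr() on the join clauses as a precomputed flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of PostgreSQL's JoinType; only the names matter here. */
typedef enum
{
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_RIGHT,
    SK_JOIN_FULL,
    SK_JOIN_SEMI,
    SK_JOIN_ANTI
} SketchJoinType;

/* Hypothetical per-side summary of what the planner knows about a foreign rel. */
typedef struct
{
    unsigned    serverid;       /* OID of the foreign server */
    unsigned    check_as_user;  /* checkAsUser of the range table entry */
    int         n_local_conds;  /* conditions that must be evaluated locally */
} SideSummary;

/*
 * Same order of checks as postgresGetForeignJoinPaths(): same server,
 * inner/outer join type, no cross join, no local conditions on either side,
 * remote-safe join clauses, and the same checkAsUser.
 */
static bool
sketch_join_is_pushable(const SideSummary *outer, const SideSummary *inner,
                        SketchJoinType jointype, int n_join_clauses,
                        bool join_clauses_remote_safe)
{
    if (outer->serverid != inner->serverid)
        return false;
    if (jointype != SK_JOIN_INNER && jointype != SK_JOIN_LEFT &&
        jointype != SK_JOIN_RIGHT && jointype != SK_JOIN_FULL)
        return false;
    /* A cross join arrives as an inner join with no clauses. */
    if (jointype == SK_JOIN_INNER && n_join_clauses == 0)
        return false;
    if (outer->n_local_conds > 0 || inner->n_local_conds > 0)
        return false;
    if (!join_clauses_remote_safe)
        return false;
    return outer->check_as_user == inner->check_as_user;
}
```

As written, a FULL join with an empty restrictlist passes these checks; only the inner-join (cross join) case is rejected for having no clauses.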
On 3 March 2015 at 12:34, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
I rebased "join push-down" patch onto Kaigai-san's Custom/Foreign Join
v6 patch. I posted some comments on the v6 patch in this post: /messages/by-id/CAEZqfEcNvjqq-P=jxnW1Pb4T9wvpcPoRCN7G6cc46JGuB7dY8w@mail.gmail.com
Before applying my v3 patch, please apply Kaigai-san's v6 patch and my
mod_cjv6.patch.
Sorry for the complex patch combination. Those patches will be arranged
soon by Kaigai-san and me.
I fixed the issues pointed out by Thom and Kohei, but still the patch
has an issue about joins underlying UPDATE or DELETE. Now I'm working
on fixing this issue. Besides this issue, the existing regression tests
passed.
Re-tested the broken query and it works for me now.
--
Thom
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015/03/03 21:34, Shigeru Hanada wrote:
I rebased "join push-down" patch onto Kaigai-san's Custom/Foreign Join
v6 patch.
Thanks for the work, Hanada-san and KaiGai-san!
Maybe I'm missing something, but did we agree to take this approach, ie,
"join push-down" on top of custom join? There is a comment about that
[1]. I just thought it'd be better to achieve a consensus before
implementing the feature further.
but still the patch
has an issue about joins underlying UPDATE or DELETE. Now I'm working
on fixing this issue.
Is that something like "UPDATE foo ... FROM bar ..." where both foo and
bar are remote? If so, I think it'd be better to push such an update
down to the remote, as discussed in [2], and I'd like to work on that
together!
Sorry for having been late for the party.
Best regards,
Etsuro Fujita
[1]: /messages/by-id/23343.1418658355@sss.pgh.pa.us
[2]: /messages/by-id/31942.1410534785@sss.pgh.pa.us
On 2015/03/03 21:34, Shigeru Hanada wrote:
I rebased "join push-down" patch onto Kaigai-san's Custom/Foreign Join
v6 patch.
Thanks for the work, Hanada-san and KaiGai-san!
Maybe I'm missing something, but did we agree to take this approach, ie,
"join push-down" on top of custom join? There is a comment about that
[1]. I just thought it'd be better to achieve a consensus before
implementing the feature further.
That is not correct. The join push-down feature is not implemented
on top of the custom-join feature; however, the two are 99%
similar in both concept and implementation.
So we're working together to enhance the foreign/custom-join interface,
according to Robert's suggestion [3] (http://bit.ly/1w1PoDU), using the
postgres_fdw extension as a minimal worthwhile example for both foreign and custom scans.
but still the patch
has an issue about joins underlying UPDATE or DELETE. Now I'm working
on fixing this issue.
Is that something like "UPDATE foo ... FROM bar ..." where both foo and
bar are remote? If so, I think it'd be better to push such an update
down to the remote, as discussed in [2], and I'd like to work on that
together!
Hanada-san, could you give us a test query to reproduce the problem
above? Fujita-san and I can help investigate the problem from
different standpoints.
Sorry for having been late for the party.
We are still at the party.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/03/04 17:31, Kouhei Kaigai wrote:
On 2015/03/03 21:34, Shigeru Hanada wrote:
I rebased "join push-down" patch onto Kaigai-san's Custom/Foreign Join
v6 patch.
Maybe I'm missing something, but did we agree to take this approach, ie,
"join push-down" on top of custom join? There is a comment about that
[1]. I just thought it'd be better to achieve a consensus before
implementing the feature further.
That is not correct. The join push-down feature is not implemented
on top of the custom-join feature; however, the two are 99%
similar in both concept and implementation.
So we're working together to enhance the foreign/custom-join interface,
according to Robert's suggestion [3], using the postgres_fdw extension
as a minimal worthwhile example for both foreign and custom scans.
OK, thanks for the explanation!
but still the patch
has an issue about joins underlying UPDATE or DELETE. Now I'm working
on fixing this issue.
Is that something like "UPDATE foo ... FROM bar ..." where both foo and
bar are remote? If so, I think it'd be better to push such an update
down to the remote, as discussed in [2], and I'd like to work on that
together!
Hanada-san, could you give us a test query to reproduce the problem
above? Fujita-san and I can help investigate the problem from
different standpoints.
Yeah, will do.
Best regards,
Etsuro Fujita
2015-03-04 17:00 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
On 2015/03/03 21:34, Shigeru Hanada wrote:
I rebased "join push-down" patch onto Kaigai-san's Custom/Foreign Join
v6 patch.
Thanks for the work, Hanada-san and KaiGai-san!
Maybe I'm missing something, but did we agree to take this approach, ie,
"join push-down" on top of custom join? There is a comment about that [1].
I just thought it'd be better to achieve a consensus before implementing the
feature further.
As Kaigai-san says, foreign join push-down is separate from custom scan,
and both are built on the custom/foreign join API patch.
but still the patch
has an issue about joins underlying UPDATE or DELETE. Now I'm working
on fixing this issue.
Is that something like "UPDATE foo ... FROM bar ..." where both foo and bar
are remote? If so, I think it'd be better to push such an update down to
the remote, as discussed in [2], and I'd like to work on that together!
A part of it, perhaps. But at the moment I see many issues to solve
around pushing down complex UPDATE/DELETE. So for now I have tightened the
restriction so that joins between foreign tables are pushed down only if
they are part of a SELECT statement. Please see the v4 patch I'll post
soon.
Sorry for having been late for the party.
Best regards,
Etsuro Fujita
[1] /messages/by-id/23343.1418658355@sss.pgh.pa.us
[2] /messages/by-id/31942.1410534785@sss.pgh.pa.us
--
Shigeru HANADA
On 2015/03/04 17:57, Shigeru Hanada wrote:
2015-03-04 17:00 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
On 2015/03/03 21:34, Shigeru Hanada wrote:
I rebased "join push-down" patch onto Kaigai-san's Custom/Foreign Join
v6 patch.
but still the patch
has an issue about joins underlying UPDATE or DELETE. Now I'm working
on fixing this issue.
Is that something like "UPDATE foo ... FROM bar ..." where both foo and bar
are remote? If so, I think it'd be better to push such an update down to
the remote, as discussed in [2], and I'd like to work on that together!
A part of it, perhaps. But at the moment I see many issues to solve
around pushing down complex UPDATE/DELETE. So for now I have tightened the
restriction so that joins between foreign tables are pushed down only if
they are part of a SELECT statement. Please see the v4 patch I'll post
soon.
OK, thanks for the reply!
Best regards,
Etsuro Fujita
Here is the v4 patch of Join push-down support for foreign tables. This
patch requires the Custom/Foreign join patch v7 posted by Kaigai-san.
In this version I added a check on the query type, which gives up pushing
down joins when the join is part of the underlying query of an
UPDATE/DELETE.
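The v4 query-type check itself is not part of the diff quoted in this thread. A minimal sketch of the restriction as described, assuming it is decided from the command type of the top-level statement (in the planner that information lives in root->parse->commandType; SketchCmdType below is a hypothetical stand-in for CmdType, and the function name is made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PostgreSQL's CmdType; only these values matter here. */
typedef enum
{
    SK_CMD_SELECT,
    SK_CMD_INSERT,
    SK_CMD_UPDATE,
    SK_CMD_DELETE
} SketchCmdType;

/*
 * Restriction described for v4: consider a foreign join for push-down only
 * when the statement being planned is a SELECT; joins underneath
 * UPDATE/DELETE are then joined locally as before.
 */
static bool
sketch_foreign_join_allowed(SketchCmdType top_level_command)
{
    return top_level_command == SK_CMD_SELECT;
}
```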
Currently postgres_fdw builds a proper remote query, but it can't bring the
ctid value up to postgresExecForeignUpdate()...
I'm still working on supporting such queries, but I'm not sure that
supporting UPDATE/DELETE is required in the first version. I attached
a patch, foreign_join_update.patch, as a WIP for supporting
UPDATE/DELETE on top of foreign joins.
To reproduce the error, please execute the query below after running the
attached init_fdw.sql to build the test environment. Note that the
script drops "user1" and creates the databases "fdw" and "pgbench".
fdw=# explain (verbose) update pgbench_branches b set filler = 'foo'
from pgbench_tellers t where t.bid = b.bid and t.tid < 10;
QUERY PLAN
--------------------------------------------------------------------------------
Update on public.pgbench_branches b (cost=100.00..100.67 rows=67 width=390)
Remote SQL: UPDATE public.pgbench_branches SET filler = $2 WHERE ctid = $1
-> Foreign Scan (cost=100.00..100.67 rows=67 width=390)
Output: b.bid, b.bbalance, 'foo'::character(88), b.ctid, *
Remote SQL: SELECT r.a_0, r.a_1, r.a_2, l FROM (SELECT tid, bid, tbalance, filler FROM public.pgbench_tellers WHERE ((tid < 10))) l (a_0, a_1) INNER JOIN (SELECT bid, bbalance, NULL, ctid FROM public.pgbench_branches FOR UPDATE) r (a_0, a_1, a_2, a_3) ON ((r.a_0 = l.a_1))
(5 rows)
fdw=# explain (analyze, verbose) update pgbench_branches b set filler
= 'foo' from pgbench_tellers t where t.bid = b.bid and t.tid < 10;
ERROR: ctid is NULL
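In the inner subquery of the Remote SQL above, the select list is `bid, bbalance, NULL, ctid`: with the v4 change to deparseTargetList() included in the attached patch, a column that is not needed is emitted as NULL instead of being skipped, and retrieved_attrs lists every column, presumably so that each column keeps a fixed position the a_N aliases can refer to. A standalone sketch of that list-building rule with hypothetical names; for simplicity ctid is treated here as an ordinary trailing column, whereas the patch handles system columns separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a remote SELECT list in which every column keeps its position:
 * needed columns are emitted by name, unneeded ones as a NULL placeholder.
 * bufsize must be at least 1; output is truncated if the buffer is too small.
 */
static void
sketch_build_select_list(char *buf, size_t bufsize,
                         const char *const *colnames, const bool *needed,
                         int ncols)
{
    size_t      len = 0;

    buf[0] = '\0';
    for (int i = 0; i < ncols; i++)
    {
        int         n = snprintf(buf + len, bufsize - len, "%s%s",
                                 i > 0 ? ", " : "",
                                 needed[i] ? colnames[i] : "NULL");

        if (n < 0 || (size_t) n >= bufsize - len)
            return;             /* buffer full: stop, leaving a truncated list */
        len += (size_t) n;
    }
}
```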
2015-03-03 21:34 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
I rebased "join push-down" patch onto Kaigai-san's Custom/Foreign Join
v6 patch. I posted some comments on the v6 patch in this post: /messages/by-id/CAEZqfEcNvjqq-P=jxnW1Pb4T9wvpcPoRCN7G6cc46JGuB7dY8w@mail.gmail.com
Before applying my v3 patch, please apply Kaigai-san's v6 patch and my
mod_cjv6.patch.
Sorry for the complex patch combination. Those patches will be arranged
soon by Kaigai-san and me.
I fixed the issues pointed out by Thom and Kohei, but still the patch
has an issue about joins underlying UPDATE or DELETE. Now I'm working
on fixing this issue. Besides this issue, the existing regression tests
passed.
2015-03-03 19:48 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
* Bug reported by Thom Brown
-----------------------------
# EXPLAIN VERBOSE SELECT NULL FROM (SELECT people.id FROM people INNER JOIN countries ON people.country_id = countries.id LIMIT 3) x;
ERROR: could not open relation with OID 0
Sorry, it was a problem caused by my part. The patched setrefs.c
checks fdw_/custom_ps_tlist to determine whether a Foreign/CustomScan
node is associated with a certain base relation. If *_ps_tlist is
valid, it also expects scanrelid == 0.
However, what we checked was incorrect. We may have a case
with an empty *_ps_tlist if the remote join expects no columns.
So, I adjusted the condition to check scanrelid instead.
Is this issue fixed by the v5 custom/foreign join patch?
Yes, please rebase it.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Shigeru HANADA
--
Shigeru HANADA
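Kaigai-san's explanation of Thom's "could not open relation with OID 0" error can be illustrated with two predicates over a hypothetical summary of the scan node (ScanNodeSummary and both function names are made up for this sketch). A remote join that needs no columns, as in `SELECT NULL FROM (...)`, has an empty *_ps_tlist, so the tlist-based test misclassifies it as a scan of a base relation, which presumably is then looked up through scanrelid 0; testing scanrelid directly does not have that blind spot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a Foreign/CustomScan plan node. */
typedef struct
{
    unsigned    scanrelid;      /* range-table index; 0 for a pseudo scan */
    int         ps_tlist_len;   /* length of fdw_ps_tlist / custom_ps_tlist */
} ScanNodeSummary;

/* Original test: a non-empty *_ps_tlist marks a pseudo scan. */
static bool
sketch_is_pseudo_scan_by_tlist(const ScanNodeSummary *scan)
{
    return scan->ps_tlist_len > 0;
}

/* Adjusted test described above: look at scanrelid instead. */
static bool
sketch_is_pseudo_scan_by_relid(const ScanNodeSummary *scan)
{
    return scan->scanrelid == 0;
}
```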
Attachments:
foreign_join_v4.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 59cb053..5c08baa 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,7 +44,9 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "nodes/plannodes.h"
#include "optimizer/clauses.h"
#include "optimizer/var.h"
#include "parser/parsetree.h"
@@ -89,6 +91,8 @@ typedef struct deparse_expr_cxt
RelOptInfo *foreignrel; /* the foreign relation we are planning for */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params */
+ List *outertlist; /* outer child's target list */
+ List *innertlist; /* inner child's target list */
} deparse_expr_cxt;
/*
@@ -250,7 +254,7 @@ foreign_expr_walker(Node *node,
* Param's collation, ie it's not safe for it to have a
* non-default collation.
*/
- if (var->varno == glob_cxt->foreignrel->relid &&
+ if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
var->varlevelsup == 0)
{
/* Var belongs to foreign table */
@@ -743,18 +747,22 @@ deparseTargetList(StringInfo buf,
if (attr->attisdropped)
continue;
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
if (have_wholerow ||
bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
attrs_used))
{
- if (!first)
- appendStringInfoString(buf, ", ");
- first = false;
deparseColumnRef(buf, rtindex, i, root);
- *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
+ else
+ appendStringInfoString(buf, "NULL");
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
/*
@@ -794,12 +802,14 @@ deparseTargetList(StringInfo buf,
* so Params and other-relation Vars should be replaced by dummy values.
*/
void
-appendWhereClause(StringInfo buf,
- PlannerInfo *root,
- RelOptInfo *baserel,
- List *exprs,
- bool is_first,
- List **params)
+appendConditions(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *outertlist,
+ List *innertlist,
+ List *exprs,
+ const char *prefix,
+ List **params)
{
deparse_expr_cxt context;
int nestlevel;
@@ -813,6 +823,8 @@ appendWhereClause(StringInfo buf,
context.foreignrel = baserel;
context.buf = buf;
context.params_list = params;
+ context.outertlist = outertlist;
+ context.innertlist = innertlist;
/* Make sure any constants in the exprs are printed portably */
nestlevel = set_transmission_modes();
@@ -822,22 +834,180 @@ appendWhereClause(StringInfo buf,
RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
/* Connect expressions with "AND" and parenthesize each condition. */
- if (is_first)
- appendStringInfoString(buf, " WHERE ");
- else
- appendStringInfoString(buf, " AND ");
+ if (prefix)
+ appendStringInfo(buf, "%s", prefix);
appendStringInfoChar(buf, '(');
deparseExpr(ri->clause, &context);
appendStringInfoChar(buf, ')');
- is_first = false;
+ prefix = " AND ";
}
reset_transmission_modes(nestlevel);
}
/*
+ * Deparse given Var as a column reference to the outer (l) or inner (r) subquery.
+ */
+static TargetEntry *
+deparseJoinVar(Var *node, deparse_expr_cxt *context)
+{
+ const char *side;
+ ListCell *lc2;
+ TargetEntry *tle = NULL;
+ int j;
+
+ j = 0;
+ foreach(lc2, context->outertlist)
+ {
+ TargetEntry *childtle = (TargetEntry *) lfirst(lc2);
+
+ if (equal(childtle->expr, node))
+ {
+ tle = copyObject(childtle);
+ side = "l";
+ break;
+ }
+ j++;
+ }
+ if (tle == NULL)
+ {
+ j = 0;
+ foreach(lc2, context->innertlist)
+ {
+ TargetEntry *childtle = (TargetEntry *) lfirst(lc2);
+
+ if (equal(childtle->expr, node))
+ {
+ tle = copyObject(childtle);
+ side = "r";
+ break;
+ }
+ j++;
+ }
+ }
+ Assert(tle);
+
+ if (node->varattno == 0)
+ appendStringInfo(context->buf, "%s", side);
+ else
+ appendStringInfo(context->buf, "%s.a_%d", side, j);
+
+ return tle;
+}
+
+static void
+deparseColumnAliases(StringInfo buf, List *targetlist)
+{
+ int i;
+ ListCell *lc;
+
+ i = 0;
+ foreach(lc, targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+ Var *var = (Var *) tle->expr;
+
+ Assert(IsA(var, Var));
+
+ /* Skip whole-row reference */
+ if (var->varattno == 0)
+ continue;
+
+ /* Deparse column alias for the subquery */
+ if (i > 0)
+ appendStringInfoString(buf, ", ");
+ appendStringInfo(buf, "a_%d", i);
+ i++;
+ }
+}
+
+/*
+ * Construct a SELECT statement which contains join clause.
+ *
+ * We also create a TargetEntry List of the columns being retrieved, which is
+ * returned to *fdw_ps_tlist.
+ *
+ * path_o, plan_o, sql_o are respectively the path, plan, and remote query
+ * statement of the outer child relation. The suffix _i denotes those of the inner
+ * child relation. jointype and restrictlist describe the join itself.
+ * fdw_ps_tlist is output parameter to pass target list of the pseudo scan to
+ * caller.
+ */
+void
+deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *restrictlist,
+ List **fdw_ps_tlist)
+{
+ StringInfoData selbuf; /* buffer for SELECT clause */
+ StringInfoData abuf_o; /* buffer for column alias list of outer */
+ StringInfoData abuf_i; /* buffer for column alias list of inner */
+ int i;
+ ListCell *lc;
+ const char *jointype_str;
+ deparse_expr_cxt context;
+
+ context.root = root;
+ context.foreignrel = baserel;
+ context.buf = &selbuf;
+ context.params_list = NULL;
+ context.outertlist = plan_o->scan.plan.targetlist;
+ context.innertlist = plan_i->scan.plan.targetlist;
+
+ jointype_str = jointype == JOIN_INNER ? "INNER" :
+ jointype == JOIN_LEFT ? "LEFT" :
+ jointype == JOIN_RIGHT ? "RIGHT" :
+ jointype == JOIN_FULL ? "FULL" : "";
+
+ /* print SELECT clause of the join scan */
+ /* XXX: should extend deparseTargetList()? */
+ initStringInfo(&selbuf);
+ i = 0;
+ foreach(lc, baserel->reltargetlist)
+ {
+ Var *var = (Var *) lfirst(lc);
+ TargetEntry *tle;
+
+ if (i > 0)
+ appendStringInfoString(&selbuf, ", ");
+ deparseJoinVar(var, &context);
+
+ tle = makeTargetEntry((Expr *) copyObject(var),
+ i + 1, pstrdup(""), false);
+ if (fdw_ps_tlist)
+ *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle));
+
+ i++;
+ }
+
+ /* Deparse column alias portion of subquery in FROM clause. */
+ initStringInfo(&abuf_o);
+ deparseColumnAliases(&abuf_o, plan_o->scan.plan.targetlist);
+ initStringInfo(&abuf_i);
+ deparseColumnAliases(&abuf_i, plan_i->scan.plan.targetlist);
+
+ /* Construct SELECT statement */
+ appendStringInfo(sql, "SELECT %s FROM", selbuf.data);
+ appendStringInfo(sql, " (%s) l (%s) %s JOIN (%s) r (%s) ",
+ sql_o, abuf_o.data, jointype_str, sql_i, abuf_i.data);
+ /* Append ON clause */
+ appendConditions(sql, root, baserel,
+ plan_o->scan.plan.targetlist,
+ plan_i->scan.plan.targetlist,
+ restrictlist, " ON ", NULL);
+}
+
+/*
* deparse remote INSERT statement
*
* The statement text is appended to buf, and we also create an integer List
@@ -1261,6 +1431,8 @@ deparseExpr(Expr *node, deparse_expr_cxt *context)
/*
* Deparse given Var node into context->buf.
*
+ * If context->foreignrel is a join relation, this is invoked for join conditions.
+ *
* If the Var belongs to the foreign relation, just print its remote name.
* Otherwise, it's effectively a Param (and will in fact be a Param at
* run time). Handle it the same way we handle plain Params --- see
@@ -1271,39 +1443,46 @@ deparseVar(Var *node, deparse_expr_cxt *context)
{
StringInfo buf = context->buf;
- if (node->varno == context->foreignrel->relid &&
- node->varlevelsup == 0)
+ if (context->foreignrel->reloptkind == RELOPT_JOINREL)
{
- /* Var belongs to foreign table */
- deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ deparseJoinVar(node, context);
}
else
{
- /* Treat like a Param */
- if (context->params_list)
+ if (node->varno == context->foreignrel->relid &&
+ node->varlevelsup == 0)
{
- int pindex = 0;
- ListCell *lc;
-
- /* find its index in params_list */
- foreach(lc, *context->params_list)
+ /* Var belongs to foreign table */
+ deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ }
+ else
+ {
+ /* Treat like a Param */
+ if (context->params_list)
{
- pindex++;
- if (equal(node, (Node *) lfirst(lc)))
- break;
+ int pindex = 0;
+ ListCell *lc;
+
+ /* find its index in params_list */
+ foreach(lc, *context->params_list)
+ {
+ pindex++;
+ if (equal(node, (Node *) lfirst(lc)))
+ break;
+ }
+ if (lc == NULL)
+ {
+ /* not in list, so add it */
+ pindex++;
+ *context->params_list = lappend(*context->params_list, node);
+ }
+
+ printRemoteParam(pindex, node->vartype, node->vartypmod, context);
}
- if (lc == NULL)
+ else
{
- /* not in list, so add it */
- pindex++;
- *context->params_list = lappend(*context->params_list, node);
+ printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
-
- printRemoteParam(pindex, node->vartype, node->vartypmod, context);
- }
- else
- {
- printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
}
}
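For readers following the join-deparsing code above, here is a minimal standalone sketch of the text it assembles. The two child queries become subqueries aliased l and r, each with its own column-alias list, the join type is spelled out, and the join condition follows ON. The enum, the function names join_type_str/build_join_sql and the fixed-size buffer are hypothetical and exist only for illustration. The patch itself builds the text with StringInfo and deparsed expressions, and its exact spacing may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical join kinds, mirroring the jointype_str mapping in the patch. */
typedef enum
{
    SK_JOIN_INNER, SK_JOIN_LEFT, SK_JOIN_RIGHT, SK_JOIN_FULL, SK_JOIN_SEMI
} SketchJoinType;

/* Map a join kind to its SQL keyword; unsupported kinds map to "". */
static const char *
join_type_str(SketchJoinType jt)
{
    switch (jt)
    {
        case SK_JOIN_INNER: return "INNER";
        case SK_JOIN_LEFT:  return "LEFT";
        case SK_JOIN_RIGHT: return "RIGHT";
        case SK_JOIN_FULL:  return "FULL";
        default:            return "";   /* never pushed down */
    }
}

/*
 * Build "SELECT <sel> FROM (<sql_o>) l (<alias_o>) <TYPE> JOIN
 *        (<sql_i>) r (<alias_i>) ON (<cond>)" into buf.
 * Returns what snprintf returns (length of the full text).
 */
static int
build_join_sql(char *buf, size_t len, const char *sel,
               const char *sql_o, const char *alias_o,
               SketchJoinType jt,
               const char *sql_i, const char *alias_i,
               const char *cond)
{
    assert(len > 0);
    return snprintf(buf, len,
                    "SELECT %s FROM (%s) l (%s) %s JOIN (%s) r (%s) ON (%s)",
                    sel, sql_o, alias_o, join_type_str(jt),
                    sql_i, alias_i, cond);
}
```

The child queries are embedded verbatim, which is why the outer query can only refer to their columns through the l/r aliases.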
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 583cce7..7f96793 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -489,17 +489,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- can't
-- parameterized remote path
EXPLAIN (VERBOSE, COSTS false)
SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------
- Nested Loop
- Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
- -> Foreign Scan on public.ft2 a
- Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
- -> Foreign Scan on public.ft2 b
- Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan
+ Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT r.a_0, r.a_1, r.a_2, r.a_3, r.a_4, r.a_5, r.a_6, r.a_7, l.a_0, l.a_1, l.a_2, l.a_3, l.a_4, l.a_5, l.a_6, l.a_7 FROM (SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1") l (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7) INNER JOIN (SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))) r (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7) ON ((r.a_1 = l.a_0))
+(3 rows)
SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
@@ -656,16 +651,16 @@ SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
-- simple join
PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Nested Loop
Output: t1.c3, t2.c3
-> Foreign Scan on public.ft1 t1
Output: t1.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 1))
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 2))
(8 rows)
EXECUTE st1(1, 1);
@@ -683,8 +678,8 @@ EXECUTE st1(101, 101);
-- subquery using stable function (can't be sent to remote)
PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -699,7 +694,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
-> Foreign Scan on public.ft2 t2
Output: t2.c3
Filter: (date(t2.c4) = '01-17-1970'::date)
- Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10))
(15 rows)
EXECUTE st2(10, 20);
@@ -717,8 +712,8 @@ EXECUTE st2(101, 121);
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
- QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -732,7 +727,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
Output: t2.c3
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
(14 rows)
EXECUTE st3(10, 20);
@@ -1085,7 +1080,7 @@ INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
Output: ((ft2_1.c1 + 1000)), ((ft2_1.c2 + 100)), ((ft2_1.c3 || ft2_1.c3))
-> Foreign Scan on public.ft2 ft2_1
Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
- Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1"
+ Remote SQL: SELECT "C 1", c2, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1"
(9 rows)
INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
@@ -1219,7 +1214,7 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, NULL, c8, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -1231,14 +1226,14 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
EXPLAIN (verbose, costs off)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
- QUERY PLAN
-----------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------
Delete on public.ft2
Output: c1, c4
- Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", c4
+ Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", NULL, NULL, c4, NULL, NULL, NULL, NULL
-> Foreign Scan on public.ft2
Output: ctid
- Remote SQL: SELECT ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
(6 rows)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
@@ -1360,7 +1355,7 @@ DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.ctid, ft2.c2
- Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT NULL, c2, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -2594,12 +2589,12 @@ select c2, count(*) from "S 1"."T 1" where c2 < 500 group by 1 order by 1;
-- Consistent check constraints provide consistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2positive CHECK (c2 >= 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 < 0;
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 < 0;
@@ -2638,12 +2633,12 @@ ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c2positive;
-- But inconsistent check constraints provide inconsistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2negative CHECK (c2 < 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 >= 0;
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 >= 0;
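The new Remote SQL in the expected output above gives each child subquery positional column aliases a_0, a_1, ..., so that the outer SELECT list and the ON clause can refer to columns as l.a_N and r.a_N. Below is a minimal sketch of producing such a list. It assumes only that aliases are numbered from zero in target-list order. The helper name format_column_aliases is hypothetical and is not the patch's deparseColumnAliases().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "a_0, a_1, ..., a_{ncols-1}" into buf (hypothetical helper).
 * Returns 0 on success, -1 if buf is too small to hold the whole list.
 */
static int
format_column_aliases(char *buf, size_t len, int ncols)
{
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (int i = 0; i < ncols; i++)
    {
        /* separator only between entries, never before the first */
        int n = snprintf(buf + used, len - used, "%sa_%d",
                         i > 0 ? ", " : "", i);

        if (n < 0 || (size_t) n >= len - used)
            return -1;          /* output would be truncated */
        used += (size_t) n;
    }
    return 0;
}
```

Positional aliases avoid name clashes when both sides expose a column with the same remote name, as in the self-join of "S 1"."T 1" shown above.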
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..0cd5d7c 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -48,7 +48,8 @@ PG_MODULE_MAGIC;
/*
* FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table. This information is collected by postgresGetForeignRelSize.
+ * foreign table or foreign join. This information is collected by
+ * postgresGetForeignRelSize, or calculated from join source relations.
*/
typedef struct PgFdwRelationInfo
{
@@ -78,10 +79,30 @@ typedef struct PgFdwRelationInfo
ForeignTable *table;
ForeignServer *server;
UserMapping *user; /* only set in use_remote_estimate mode */
+ Oid checkAsUser;
} PgFdwRelationInfo;
/*
- * Indexes of FDW-private information stored in fdw_private lists.
+ * Indexes of FDW-private information stored in fdw_private of ForeignPath.
+ * We use fdw_private of a ForeignPath when the path represents a join that
+ * can be pushed down to the remote side.
+ *
+ * 1) Outer child path node
+ * 2) Inner child path node
+ * 3) Join type (as an Integer node)
+ * 4) RestrictInfo list of join conditions
+ */
+enum FdwPathPrivateIndex
+{
+ FdwPathPrivateOuterPath,
+ FdwPathPrivateInnerPath,
+ FdwPathPrivateJoinType,
+ FdwPathPrivateRestrictList,
+};
+
+/*
+ * Indexes of FDW-private information stored in fdw_private of ForeignScan of
+ * a simple foreign table scan for a SELECT statement.
*
* We store various information in ForeignScan.fdw_private to pass it from
* planner to executor. Currently we store:
@@ -98,7 +119,11 @@ enum FdwScanPrivateIndex
/* SQL statement to execute remotely (as a String node) */
FdwScanPrivateSelectSql,
/* Integer list of attribute numbers retrieved by the SELECT */
- FdwScanPrivateRetrievedAttrs
+ FdwScanPrivateRetrievedAttrs,
+ /* OID of the foreign server for the scan (as an Integer node) */
+ FdwScanPrivateServerOid,
+ /* checkAsUser for the scan (as an Integer node) */
+ FdwScanPrivatecheckAsUser,
};
/*
@@ -129,6 +154,7 @@ enum FdwModifyPrivateIndex
typedef struct PgFdwScanState
{
Relation rel; /* relcache entry for the foreign table */
+ TupleDesc tupdesc; /* tuple descriptor of the scan */
AttInMetadata *attinmeta; /* attribute datatype conversion metadata */
/* extracted fdw_private data */
@@ -288,6 +314,15 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
+static void postgresGetForeignJoinPaths(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlisti,
+ Relids extra_lateral_rels);
/*
* Helper functions
@@ -324,6 +359,7 @@ static void analyze_row_processor(PGresult *res, int row,
static HeapTuple make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context);
@@ -368,6 +404,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
/* Support functions for IMPORT FOREIGN SCHEMA */
routine->ImportForeignSchema = postgresImportForeignSchema;
+ /* Support functions for join push-down */
+ routine->GetForeignJoinPaths = postgresGetForeignJoinPaths;
+
PG_RETURN_POINTER(routine);
}
@@ -385,6 +424,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
{
PgFdwRelationInfo *fpinfo;
ListCell *lc;
+ RangeTblEntry *rte;
/*
* We use PgFdwRelationInfo to pass various information to subsequent
@@ -428,6 +468,13 @@ postgresGetForeignRelSize(PlannerInfo *root,
}
/*
+ * Retrieve the RTE to obtain checkAsUser, which determines the user whose
+ * user mapping is used for remote access.
+ */
+ rte = planner_rt_fetch(baserel->relid, root);
+ fpinfo->checkAsUser = rte->checkAsUser;
+
+ /*
* If the table or the server is configured to use remote estimates,
* identify which user to do remote access as during planning. This
* should match what ExecCheckRTEPerms() does. If we fail due to lack of
@@ -435,7 +482,6 @@ postgresGetForeignRelSize(PlannerInfo *root,
*/
if (fpinfo->use_remote_estimate)
{
- RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
Oid userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
fpinfo->user = GetUserMapping(userid, fpinfo->server->serverid);
@@ -752,6 +798,8 @@ postgresGetForeignPlan(PlannerInfo *root,
List *retrieved_attrs;
StringInfoData sql;
ListCell *lc;
+ List *fdw_ps_tlist = NIL;
+ ForeignScan *scan;
/*
* Separate the scan_clauses into those that can be executed remotely and
@@ -769,7 +817,7 @@ postgresGetForeignPlan(PlannerInfo *root,
* This code must match "extract_actual_clauses(scan_clauses, false)"
* except for the additional decision about remote versus local execution.
* Note however that we only strip the RestrictInfo nodes from the
- * local_exprs list, since appendWhereClause expects a list of
+ * local_exprs list, since appendConditions expects a list of
* RestrictInfos.
*/
foreach(lc, scan_clauses)
@@ -797,64 +845,123 @@ postgresGetForeignPlan(PlannerInfo *root,
* expressions to be sent as parameters.
*/
initStringInfo(&sql);
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
- if (remote_conds)
- appendWhereClause(&sql, root, baserel, remote_conds,
- true, &params_list);
-
- /*
- * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
- * initial row fetch, rather than later on as is done for local tables.
- * The extra roundtrips involved in trying to duplicate the local
- * semantics exactly don't seem worthwhile (see also comments for
- * RowMarkType).
- *
- * Note: because we actually run the query as a cursor, this assumes that
- * DECLARE CURSOR ... FOR UPDATE is supported, which it isn't before 8.3.
- */
- if (baserel->relid == root->parse->resultRelation &&
- (root->parse->commandType == CMD_UPDATE ||
- root->parse->commandType == CMD_DELETE))
- {
- /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
- appendStringInfoString(&sql, " FOR UPDATE");
- }
- else
+ if (scan_relid > 0)
{
- RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+ deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
+ &retrieved_attrs);
+ if (remote_conds)
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ remote_conds, " WHERE ", &params_list);
- if (rc)
+ /*
+ * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
+ * initial row fetch, rather than later on as is done for local tables.
+ * The extra roundtrips involved in trying to duplicate the local
+ * semantics exactly don't seem worthwhile (see also comments for
+ * RowMarkType).
+ *
+ * Note: because we actually run the query as a cursor, this assumes
+ * that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
+ * before 8.3.
+ */
+ if (baserel->relid == root->parse->resultRelation &&
+ (root->parse->commandType == CMD_UPDATE ||
+ root->parse->commandType == CMD_DELETE))
{
- /*
- * Relation is specified as a FOR UPDATE/SHARE target, so handle
- * that.
- *
- * For now, just ignore any [NO] KEY specification, since (a) it's
- * not clear what that means for a remote table that we don't have
- * complete information about, and (b) it wouldn't work anyway on
- * older remote servers. Likewise, we don't worry about NOWAIT.
- */
- switch (rc->strength)
+ /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
+ appendStringInfoString(&sql, " FOR UPDATE");
+ }
+ else
+ {
+ RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+
+ if (rc)
{
- case LCS_FORKEYSHARE:
- case LCS_FORSHARE:
- appendStringInfoString(&sql, " FOR SHARE");
- break;
- case LCS_FORNOKEYUPDATE:
- case LCS_FORUPDATE:
- appendStringInfoString(&sql, " FOR UPDATE");
- break;
+ /*
+ * Relation is specified as a FOR UPDATE/SHARE target, so handle
+ * that.
+ *
+ * For now, just ignore any [NO] KEY specification, since (a)
+ * it's not clear what that means for a remote table that we
+ * don't have complete information about, and (b) it wouldn't
+ * work anyway on older remote servers. Likewise, we don't
+ * worry about NOWAIT.
+ */
+ switch (rc->strength)
+ {
+ case LCS_FORKEYSHARE:
+ case LCS_FORSHARE:
+ appendStringInfoString(&sql, " FOR SHARE");
+ break;
+ case LCS_FORNOKEYUPDATE:
+ case LCS_FORUPDATE:
+ appendStringInfoString(&sql, " FOR UPDATE");
+ break;
+ }
}
}
}
+ else
+ {
+ /* Join case */
+ Path *path_o;
+ Path *path_i;
+ const char *sql_o;
+ const char *sql_i;
+ ForeignScan *plan_o;
+ ForeignScan *plan_i;
+ JoinType jointype;
+ List *restrictlist;
+ int i;
+
+ /*
+ * Retrieve information from fdw_private.
+ */
+ path_o = list_nth(best_path->fdw_private, FdwPathPrivateOuterPath);
+ path_i = list_nth(best_path->fdw_private, FdwPathPrivateInnerPath);
+ jointype = intVal(list_nth(best_path->fdw_private,
+ FdwPathPrivateJoinType));
+ restrictlist = list_nth(best_path->fdw_private,
+ FdwPathPrivateRestrictList);
+
+ /*
+ * Construct the remote query from the bottom to the top. The ForeignScan
+ * plan nodes of the underlying scans are not necessary for executing the
+ * plan tree, but they are handy for constructing the remote query recursively.
+ */
+ plan_o = (ForeignScan *) create_plan_recurse(root, path_o);
+ Assert(IsA(plan_o, ForeignScan));
+ sql_o = strVal(list_nth(plan_o->fdw_private, FdwScanPrivateSelectSql));
+
+ plan_i = (ForeignScan *) create_plan_recurse(root, path_i);
+ Assert(IsA(plan_i, ForeignScan));
+ sql_i = strVal(list_nth(plan_i->fdw_private, FdwScanPrivateSelectSql));
+
+ deparseJoinSql(&sql, root, baserel, path_o, path_i, plan_o, plan_i,
+ sql_o, sql_i, jointype, restrictlist, &fdw_ps_tlist);
+ retrieved_attrs = NIL;
+ for (i = 0; i < list_length(fdw_ps_tlist); i++)
+ retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
+ }
/*
* Build the fdw_private list that will be available to the executor.
* Items in the list must match enum FdwScanPrivateIndex, above.
*/
- fdw_private = list_make2(makeString(sql.data),
- retrieved_attrs);
+ fdw_private = list_make2(makeString(sql.data), retrieved_attrs);
+
+ /*
+ * In the pseudo-scan case, such as join push-down, add the server OID and
+ * checkAsUser as extra information.
+ * XXX: passing serverid and checkAsUser in all cases, simple scans and
+ * join push-down alike, might simplify the code.
+ */
+ if (scan_relid == 0)
+ {
+ fdw_private = lappend(fdw_private,
+ makeInteger(fpinfo->server->serverid));
+ fdw_private = lappend(fdw_private, makeInteger(fpinfo->checkAsUser));
+ }
/*
* Create the ForeignScan node from target list, local filtering
@@ -864,11 +971,18 @@ postgresGetForeignPlan(PlannerInfo *root,
* field of the finished plan node; we can't keep them in private state
* because then they wouldn't be subject to later planner processing.
*/
- return make_foreignscan(tlist,
+ scan = make_foreignscan(tlist,
local_exprs,
scan_relid,
params_list,
fdw_private);
+
+ /*
+ * Set fdw_ps_tlist so that tuples generated by this scan can be handled.
+ */
+ scan->fdw_ps_tlist = fdw_ps_tlist;
+
+ return scan;
}
/*
@@ -881,9 +995,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
PgFdwScanState *fsstate;
- RangeTblEntry *rte;
+ Oid serverid;
Oid userid;
- ForeignTable *table;
ForeignServer *server;
UserMapping *user;
int numParams;
@@ -903,22 +1016,51 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
node->fdw_state = (void *) fsstate;
/*
- * Identify which user to do the remote access as. This should match what
- * ExecCheckRTEPerms() does.
+ * Initialize fsstate.
+ *
+ * The following values must be determined:
+ * - fsstate->rel: relation to scan, or NULL if there is none (join)
+ * - serverid: OID of the foreign server to use for the scan
+ * - userid: user whose user mapping is used for the connection
*/
- rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
- userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+ if (fsplan->scan.scanrelid > 0)
+ {
+ /* Simple foreign table scan */
+ RangeTblEntry *rte;
+ ForeignTable *table;
- /* Get info about foreign table. */
- fsstate->rel = node->ss.ss_currentRelation;
- table = GetForeignTable(RelationGetRelid(fsstate->rel));
- server = GetForeignServer(table->serverid);
- user = GetUserMapping(userid, server->serverid);
+ /*
+ * Identify which user to do the remote access as. This should match
+ * what ExecCheckRTEPerms() does.
+ */
+ rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+ userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+ /* Get info about foreign table. */
+ fsstate->rel = node->ss.ss_currentRelation;
+ table = GetForeignTable(RelationGetRelid(fsstate->rel));
+ serverid = table->serverid;
+ }
+ else
+ {
+ Oid checkAsUser;
+
+ /* Join */
+ fsstate->rel = NULL; /* No actual relation to scan */
+
+ serverid = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivateServerOid));
+ checkAsUser = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivatecheckAsUser));
+ userid = checkAsUser ? checkAsUser : GetUserId();
+ }
/*
* Get connection to the foreign server. Connection manager will
* establish new connection if necessary.
*/
+ server = GetForeignServer(serverid);
+ user = GetUserMapping(userid, server->serverid);
fsstate->conn = GetConnection(server, user, false);
/* Assign a unique ID for my cursor */
@@ -929,7 +1071,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
fsstate->query = strVal(list_nth(fsplan->fdw_private,
FdwScanPrivateSelectSql));
fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
- FdwScanPrivateRetrievedAttrs);
+ FdwScanPrivateRetrievedAttrs);
/* Create contexts for batches of tuples and per-tuple temp workspace. */
fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -944,7 +1086,11 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ALLOCSET_SMALL_MAXSIZE);
/* Get info we'll need for input data conversion. */
- fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+ if (fsplan->scan.scanrelid > 0)
+ fsstate->tupdesc = RelationGetDescr(fsstate->rel);
+ else
+ fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+ fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc);
/* Prepare for output conversion of parameters used in remote query. */
numParams = list_length(fsplan->fdw_exprs);
@@ -1747,11 +1893,13 @@ estimate_path_cost_size(PlannerInfo *root,
deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
&retrieved_attrs);
if (fpinfo->remote_conds)
- appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
- true, NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ fpinfo->remote_conds, " WHERE ", NULL);
if (remote_join_conds)
- appendWhereClause(&sql, root, baserel, remote_join_conds,
- (fpinfo->remote_conds == NIL), NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ remote_join_conds,
+ fpinfo->remote_conds == NIL ? " WHERE " : " AND ",
+ NULL);
/* Get the remote estimate */
conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -2052,6 +2200,7 @@ fetch_more_data(ForeignScanState *node)
fsstate->tuples[i] =
make_tuple_from_result_row(res, i,
fsstate->rel,
+ fsstate->tupdesc,
fsstate->attinmeta,
fsstate->retrieved_attrs,
fsstate->temp_cxt);
@@ -2270,6 +2419,7 @@ store_returning_result(PgFdwModifyState *fmstate,
newtup = make_tuple_from_result_row(res, 0,
fmstate->rel,
+ RelationGetDescr(fmstate->rel),
fmstate->attinmeta,
fmstate->retrieved_attrs,
fmstate->temp_cxt);
@@ -2562,6 +2712,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
astate->rows[pos] = make_tuple_from_result_row(res, row,
astate->rel,
+ RelationGetDescr(astate->rel),
astate->attinmeta,
astate->retrieved_attrs,
astate->temp_cxt);
@@ -2835,6 +2986,215 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
}
/*
+ * Construct PgFdwRelationInfo from two join sources
+ */
+static PgFdwRelationInfo *
+merge_fpinfo(PgFdwRelationInfo *fpinfo_o,
+ PgFdwRelationInfo *fpinfo_i,
+ JoinType jointype)
+{
+ PgFdwRelationInfo *fpinfo;
+
+ fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
+ fpinfo->remote_conds = list_concat(copyObject(fpinfo_o->remote_conds),
+ copyObject(fpinfo_i->remote_conds));
+ fpinfo->local_conds = list_concat(copyObject(fpinfo_o->local_conds),
+ copyObject(fpinfo_i->local_conds));
+
+ fpinfo->attrs_used = NULL; /* Use fdw_ps_tlist */
+ fpinfo->local_conds_cost.startup = fpinfo_o->local_conds_cost.startup +
+ fpinfo_i->local_conds_cost.startup;
+ fpinfo->local_conds_cost.per_tuple = fpinfo_o->local_conds_cost.per_tuple +
+ fpinfo_i->local_conds_cost.per_tuple;
+ fpinfo->local_conds_sel = fpinfo_o->local_conds_sel *
+ fpinfo_i->local_conds_sel;
+ if (jointype == JOIN_INNER)
+ fpinfo->rows = Min(fpinfo_o->rows, fpinfo_i->rows);
+ else
+ fpinfo->rows = Max(fpinfo_o->rows, fpinfo_i->rows);
+ /* XXX the row estimate above is only a rough guess */
+ /* XXX we should consider only columns in fdw_ps_tlist */
+ fpinfo->width = fpinfo_o->width + fpinfo_i->width;
+ /* XXX we should estimate better costs */
+
+ fpinfo->use_remote_estimate = false; /* Never use in join case */
+ fpinfo->fdw_startup_cost = fpinfo_o->fdw_startup_cost;
+ fpinfo->fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
+
+ fpinfo->startup_cost = fpinfo->fdw_startup_cost;
+ fpinfo->total_cost =
+ fpinfo->startup_cost + fpinfo->fdw_tuple_cost * fpinfo->rows;
+
+ fpinfo->table = NULL; /* always NULL in join case */
+ fpinfo->server = fpinfo_o->server;
+ fpinfo->user = fpinfo_o->user ? fpinfo_o->user : fpinfo_i->user;
+ /* checkAsUser must be identical (checked by the caller) */
+ fpinfo->checkAsUser = fpinfo_o->checkAsUser;
+
+ return fpinfo;
+}
+
+/*
+ * postgresGetForeignJoinPaths
+ * Add possible ForeignPath to joinrel.
+ *
+ * Joins that satisfy the conditions below can be pushed down to the remote server.
+ *
+ * 1) Join type is inner (with join conditions) or outer.
+ * 2) Join conditions consist of remote-safe expressions.
+ * 3) Source relations have no local filter and share server and checkAsUser.
+ */
+static void
+postgresGetForeignJoinPaths(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels)
+{
+ ForeignPath *joinpath;
+ ForeignPath *path_o = (ForeignPath *) outerrel->cheapest_total_path;
+ ForeignPath *path_i = (ForeignPath *) innerrel->cheapest_total_path;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ PgFdwRelationInfo *fpinfo;
+ double rows;
+ Cost startup_cost;
+ Cost total_cost;
+ ListCell *lc;
+ List *fdw_private;
+
+ /*
+ * Currently we don't push down joins in UPDATE/DELETE queries. This
+ * restriction might be relaxed in a later release.
+ */
+ if (root->parse->commandType != CMD_SELECT)
+ {
+ ereport(DEBUG3, (errmsg("command type is not SELECT")));
+ return;
+ }
+
+ /* Source relations should be ForeignPath. */
+ if (!IsA(path_o, ForeignPath) || !IsA(path_i, ForeignPath))
+ {
+ ereport(DEBUG3, (errmsg("underlying path is not a ForeignPath")));
+ return;
+ }
+
+ /*
+ * Skip considering reversed join combination.
+ */
+ if (outerrel->relid < innerrel->relid)
+ {
+ ereport(DEBUG3, (errmsg("reversed combination")));
+ return;
+ }
+
+ /*
+ * Both relations in the join must belong to same server.
+ */
+ fpinfo_o = path_o->path.parent->fdw_private;
+ fpinfo_i = path_i->path.parent->fdw_private;
+ if (fpinfo_o->server->serverid != fpinfo_i->server->serverid)
+ {
+ ereport(DEBUG3, (errmsg("server mismatch")));
+ return;
+ }
+
+ /*
+ * We support all outer joins in addition to inner join.
+ */
+ if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+ jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+ {
+ ereport(DEBUG3, (errmsg("unsupported join type (SEMI, ANTI)")));
+ return;
+ }
+
+ /*
+ * Note that CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+ * with an empty restrictlist. Pushing down a CROSS JOIN returns more rows
+ * than retrieving each table separately, so we don't push down such joins.
+ */
+ if (jointype == JOIN_INNER && restrictlist == NIL)
+ {
+ ereport(DEBUG3, (errmsg("unsupported join type (CROSS)")));
+ return;
+ }
+
+ /*
+ * Neither source relation can have local conditions. This can be relaxed
+ * if the join is an inner join and local conditions don't contain volatile
+ * functions or operators, but for now we leave it as a future enhancement.
+ */
+ if (fpinfo_o->local_conds != NULL || fpinfo_i->local_conds != NULL)
+ {
+ ereport(DEBUG3, (errmsg("join with local filter is not supported")));
+ return;
+ }
+
+ /*
+ * Join condition must be safe to push down.
+ */
+ foreach(lc, restrictlist)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+ if (!is_foreign_expr(root, joinrel, rinfo->clause))
+ {
+ ereport(DEBUG3, (errmsg("a join condition is not safe to push down")));
+ return;
+ }
+ }
+
+ /*
+ * checkAsUser of source paths should match.
+ */
+ if (fpinfo_o->checkAsUser != fpinfo_i->checkAsUser)
+ {
+ ereport(DEBUG3, (errmsg("checkAsUser mismatch")));
+ return;
+ }
+
+ /* Here we know that this join can be pushed down to the remote side. */
+
+ /* Construct fpinfo for the join relation */
+ fpinfo = merge_fpinfo(fpinfo_o, fpinfo_i, jointype);
+ joinrel->fdw_private = fpinfo;
+
+ /* TODO determine cost and rows of the join. */
+ rows = fpinfo->rows;
+ startup_cost = fpinfo->startup_cost;
+ total_cost = fpinfo->total_cost;
+
+ fdw_private = list_make4(path_o,
+ path_i,
+ makeInteger(jointype),
+ restrictlist);
+
+ /*
+ * Create a new join path and add it to the joinrel which represents a join
+ * between foreign tables.
+ */
+ joinpath = create_foreignscan_path(root,
+ joinrel,
+ rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no required_outer */
+ fdw_private);
+
+ /* Add the generated path to joinrel with add_path(). */
+ add_path(joinrel, (Path *) joinpath);
+
+ /* TODO consider parameterized paths */
+}
+
+/*
* Create a tuple from the specified row of the PGresult.
*
* rel is the local representation of the foreign table, attinmeta is
@@ -2846,12 +3206,12 @@ static HeapTuple
make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context)
{
HeapTuple tuple;
- TupleDesc tupdesc = RelationGetDescr(rel);
Datum *values;
bool *nulls;
ItemPointer ctid = NULL;
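The checks at the top of postgresGetForeignJoinPaths() in this first patch can be condensed into a single predicate. The following is a minimal standalone sketch, not code from the patch: SkJoinType, SkRelInfo and sketch_join_is_pushable() are hypothetical stand-ins for JoinType, PgFdwRelationInfo and the early returns, and the per-clause is_foreign_expr() test is reduced to one boolean flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the planner's JoinType. */
typedef enum
{
    SK_JOIN_INNER, SK_JOIN_LEFT, SK_JOIN_FULL, SK_JOIN_RIGHT,
    SK_JOIN_SEMI, SK_JOIN_ANTI
} SkJoinType;

/* Hypothetical stand-in for the parts of PgFdwRelationInfo used here. */
typedef struct
{
    bool        has_local_conds;    /* fpinfo->local_conds != NIL */
    unsigned    check_as_user;      /* fpinfo->checkAsUser; 0 = current user */
} SkRelInfo;

/*
 * Returns true when the join could be shipped to the remote server,
 * following the order of the checks in the patch.
 */
static bool
sketch_join_is_pushable(SkJoinType jointype, int n_join_clauses,
                        bool all_clauses_safe,
                        const SkRelInfo *outer, const SkRelInfo *inner)
{
    /* Only INNER, LEFT, RIGHT and FULL joins; SEMI and ANTI are rejected. */
    if (jointype != SK_JOIN_INNER && jointype != SK_JOIN_LEFT &&
        jointype != SK_JOIN_RIGHT && jointype != SK_JOIN_FULL)
        return false;

    /* An inner join without join clauses is a CROSS JOIN. */
    if (jointype == SK_JOIN_INNER && n_join_clauses == 0)
        return false;

    /* Local filters on either input would have to run before the join. */
    if (outer->has_local_conds || inner->has_local_conds)
        return false;

    /* Every join clause must be safe to evaluate remotely. */
    if (!all_clauses_safe)
        return false;

    /* Both inputs must be checked as the same user. */
    return outer->check_as_user == inner->check_as_user;
}
```

The CROSS JOIN rule reflects the comment in the patch: joining tables of m and n rows remotely returns m*n rows, while scanning each side separately transfers only m + n.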
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 950c6f7..fd8b257 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -16,6 +16,7 @@
#include "foreign/foreign.h"
#include "lib/stringinfo.h"
#include "nodes/relation.h"
+#include "nodes/plannodes.h"
#include "utils/relcache.h"
#include "libpq-fe.h"
@@ -52,12 +53,26 @@ extern void deparseSelectSql(StringInfo buf,
RelOptInfo *baserel,
Bitmapset *attrs_used,
List **retrieved_attrs);
-extern void appendWhereClause(StringInfo buf,
+extern void appendConditions(StringInfo buf,
PlannerInfo *root,
RelOptInfo *baserel,
+ List *outertlist,
+ List *innertlist,
List *exprs,
- bool is_first,
+ const char *prefix,
List **params);
+extern void deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *restrictlist,
+ List **retrieved_attrs);
extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
List *targetAttrs, List *returningList,
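The header change above replaces appendWhereClause()'s is_first flag with a prefix string, so one routine can start either a WHERE clause (" WHERE ") or a join's ON clause (" ON "), with " AND " before every later condition. A minimal sketch of that convention in standalone C, assuming each condition arrives already deparsed as a string; sketch_append_conditions() is a hypothetical helper, not a postgres_fdw function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append "(cond)" for each condition to the NUL-terminated buf. The first
 * condition is preceded by prefix (nothing if prefix is NULL) and every
 * later one by " AND ", mirroring the loop in appendConditions().
 */
static void
sketch_append_conditions(char *buf, size_t bufsize,
                         const char *const *conds, int nconds,
                         const char *prefix)
{
    assert(buf != NULL && bufsize > 0);

    for (int i = 0; i < nconds; i++)
    {
        size_t      len = strlen(buf);

        snprintf(buf + len, bufsize - len, "%s(%s)",
                 prefix ? prefix : "", conds[i]);
        prefix = " AND ";
    }
}
```

Passing NULL as the prefix yields a bare conjunction, which is what the loop in the patch does when no prefix is given.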
Attachment: foreign_join_update.patch (application/octet-stream)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 59cb053..09b0115 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,7 +44,9 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "nodes/plannodes.h"
#include "optimizer/clauses.h"
#include "optimizer/var.h"
#include "parser/parsetree.h"
@@ -89,6 +91,8 @@ typedef struct deparse_expr_cxt
RelOptInfo *foreignrel; /* the foreign relation we are planning for */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params */
+ List *outertlist; /* outer child's target list */
+ List *innertlist; /* inner child's target list */
} deparse_expr_cxt;
/*
@@ -250,7 +254,7 @@ foreign_expr_walker(Node *node,
* Param's collation, ie it's not safe for it to have a
* non-default collation.
*/
- if (var->varno == glob_cxt->foreignrel->relid &&
+ if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
var->varlevelsup == 0)
{
/* Var belongs to foreign table */
@@ -743,18 +747,22 @@ deparseTargetList(StringInfo buf,
if (attr->attisdropped)
continue;
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
if (have_wholerow ||
bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
attrs_used))
{
- if (!first)
- appendStringInfoString(buf, ", ");
- first = false;
deparseColumnRef(buf, rtindex, i, root);
- *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
+ else
+ appendStringInfoString(buf, "NULL");
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
/*
@@ -794,12 +802,14 @@ deparseTargetList(StringInfo buf,
* so Params and other-relation Vars should be replaced by dummy values.
*/
void
-appendWhereClause(StringInfo buf,
- PlannerInfo *root,
- RelOptInfo *baserel,
- List *exprs,
- bool is_first,
- List **params)
+appendConditions(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *outertlist,
+ List *innertlist,
+ List *exprs,
+ const char *prefix,
+ List **params)
{
deparse_expr_cxt context;
int nestlevel;
@@ -813,6 +823,8 @@ appendWhereClause(StringInfo buf,
context.foreignrel = baserel;
context.buf = buf;
context.params_list = params;
+ context.outertlist = outertlist;
+ context.innertlist = innertlist;
/* Make sure any constants in the exprs are printed portably */
nestlevel = set_transmission_modes();
@@ -822,22 +834,186 @@ appendWhereClause(StringInfo buf,
RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
/* Connect expressions with "AND" and parenthesize each condition. */
- if (is_first)
- appendStringInfoString(buf, " WHERE ");
- else
- appendStringInfoString(buf, " AND ");
+ if (prefix)
+ appendStringInfo(buf, "%s", prefix);
appendStringInfoChar(buf, '(');
deparseExpr(ri->clause, &context);
appendStringInfoChar(buf, ')');
- is_first = false;
+ prefix = " AND ";
}
reset_transmission_modes(nestlevel);
}
/*
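+ * Deparse the given Var of a join relation into context->buf as "l.a_N" or
+ * "r.a_N" (just "l" or "r" for a whole-row reference), where N is the
+ * position of the matching entry in the outer or inner child's target list.
+ * Returns a copy of that target entry.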
+ */
+static TargetEntry *
+deparseJoinVar(Var *node, deparse_expr_cxt *context)
+{
+ const char *side = NULL;
+ ListCell *lc2;
+ TargetEntry *tle = NULL;
+ int j;
+
+ j = 0;
+ foreach(lc2, context->outertlist)
+ {
+ TargetEntry *childtle = (TargetEntry *) lfirst(lc2);
+
+ if (equal(childtle->expr, node))
+ {
+ tle = copyObject(childtle);
+ side = "l";
+ break;
+ }
+ j++;
+ }
+ if (tle == NULL)
+ {
+ j = 0;
+ foreach(lc2, context->innertlist)
+ {
+ TargetEntry *childtle = (TargetEntry *) lfirst(lc2);
+
+ if (equal(childtle->expr, node))
+ {
+ tle = copyObject(childtle);
+ side = "r";
+ break;
+ }
+ j++;
+ }
+ }
+ Assert(tle);
+
+ if (node->varattno == 0)
+ appendStringInfo(context->buf, "%s", side);
+ else
+ appendStringInfo(context->buf, "%s.a_%d", side, j);
+
+ return tle;
+}
+
+static void
+deparseColumnAliases(StringInfo buf, List *targetlist, bool has_ctid)
+{
+ int i;
+ ListCell *lc;
+
+ i = 0;
+ foreach(lc, targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+ Var *var = (Var *) tle->expr;
+
+ Assert(IsA(var, Var));
+
+ /* Deparse column alias for the subquery */
+ if (i > 0)
+ appendStringInfoString(buf, ", ");
+ appendStringInfo(buf, "a_%d", i);
+ i++;
+ }
+
+ /* Append alias for ctid system attribute */
+ if (has_ctid)
+ {
+ if (i > 0)
+ appendStringInfoString(buf, ", ");
+ appendStringInfo(buf, "a_%d", i);
+ }
+}
+
+/*
+ * Construct a SELECT statement that contains a join clause.
+ *
+ * We also create a TargetEntry list of the columns being retrieved, which is
+ * returned in *fdw_ps_tlist.
+ *
+ * path_o, plan_o, and sql_o are respectively the path, plan node, and remote
+ * query of the outer child relation; the _i suffix denotes the same for the
+ * inner child relation. has_ctid_o and has_ctid_i tell whether each child
+ * also retrieves ctid. jointype and restrictlist describe the join itself.
+ * fdw_ps_tlist is an output parameter that passes the target list of the
+ * pseudo scan to the caller.
+ */
+void
+deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ bool has_ctid_o,
+ bool has_ctid_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *restrictlist,
+ List **fdw_ps_tlist)
+{
+ StringInfoData selbuf; /* buffer for SELECT clause */
+ StringInfoData abuf_o; /* buffer for column alias list of outer */
+ StringInfoData abuf_i; /* buffer for column alias list of inner */
+ int i;
+ ListCell *lc;
+ const char *jointype_str;
+ deparse_expr_cxt context;
+
+ context.root = root;
+ context.foreignrel = baserel;
+ context.buf = &selbuf;
+ context.params_list = NULL;
+ context.outertlist = plan_o->scan.plan.targetlist;
+ context.innertlist = plan_i->scan.plan.targetlist;
+
+ jointype_str = jointype == JOIN_INNER ? "INNER" :
+ jointype == JOIN_LEFT ? "LEFT" :
+ jointype == JOIN_RIGHT ? "RIGHT" :
+ jointype == JOIN_FULL ? "FULL" : "";
+
+ /* print SELECT clause of the join scan */
+ /* XXX: should extend deparseTargetList()? */
+ initStringInfo(&selbuf);
+ i = 0;
+ foreach(lc, baserel->reltargetlist)
+ {
+ Var *var = (Var *) lfirst(lc);
+ TargetEntry *tle;
+
+ if (i > 0)
+ appendStringInfoString(&selbuf, ", ");
+ deparseJoinVar(var, &context);
+
+ tle = makeTargetEntry((Expr *) copyObject(var),
+ i + 1, pstrdup(""), false);
+ if (fdw_ps_tlist)
+ *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle));
+
+ i++;
+ }
+
+ /* Deparse column alias portion of subquery in FROM clause. */
+ initStringInfo(&abuf_o);
+ deparseColumnAliases(&abuf_o, plan_o->scan.plan.targetlist, has_ctid_o);
+ initStringInfo(&abuf_i);
+ deparseColumnAliases(&abuf_i, plan_i->scan.plan.targetlist, has_ctid_i);
+
+ /* Construct SELECT statement */
+ appendStringInfo(sql, "SELECT %s FROM", selbuf.data);
+ appendStringInfo(sql, " (%s) l (%s) %s JOIN (%s) r (%s) ",
+ sql_o, abuf_o.data, jointype_str, sql_i, abuf_i.data);
+ /* Append ON clause */
+ appendConditions(sql, root, baserel,
+ plan_o->scan.plan.targetlist,
+ plan_i->scan.plan.targetlist,
+ restrictlist, " ON ", NULL);
+}
+
+/*
* deparse remote INSERT statement
*
* The statement text is appended to buf, and we also create an integer List
@@ -1261,6 +1437,8 @@ deparseExpr(Expr *node, deparse_expr_cxt *context)
/*
* Deparse given Var node into context->buf.
*
+ * If context->foreignrel is a join relation, this is invoked for join
+ * conditions, and the Var is printed as a child subquery column alias.
+ *
* If the Var belongs to the foreign relation, just print its remote name.
* Otherwise, it's effectively a Param (and will in fact be a Param at
* run time). Handle it the same way we handle plain Params --- see
@@ -1271,39 +1449,46 @@ deparseVar(Var *node, deparse_expr_cxt *context)
{
StringInfo buf = context->buf;
- if (node->varno == context->foreignrel->relid &&
- node->varlevelsup == 0)
+ if (context->foreignrel->reloptkind == RELOPT_JOINREL)
{
- /* Var belongs to foreign table */
- deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ deparseJoinVar(node, context);
}
else
{
- /* Treat like a Param */
- if (context->params_list)
+ if (node->varno == context->foreignrel->relid &&
+ node->varlevelsup == 0)
{
- int pindex = 0;
- ListCell *lc;
-
- /* find its index in params_list */
- foreach(lc, *context->params_list)
+ /* Var belongs to foreign table */
+ deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ }
+ else
+ {
+ /* Treat like a Param */
+ if (context->params_list)
{
- pindex++;
- if (equal(node, (Node *) lfirst(lc)))
- break;
+ int pindex = 0;
+ ListCell *lc;
+
+ /* find its index in params_list */
+ foreach(lc, *context->params_list)
+ {
+ pindex++;
+ if (equal(node, (Node *) lfirst(lc)))
+ break;
+ }
+ if (lc == NULL)
+ {
+ /* not in list, so add it */
+ pindex++;
+ *context->params_list = lappend(*context->params_list, node);
+ }
+
+ printRemoteParam(pindex, node->vartype, node->vartypmod, context);
}
- if (lc == NULL)
+ else
{
- /* not in list, so add it */
- pindex++;
- *context->params_list = lappend(*context->params_list, node);
+ printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
-
- printRemoteParam(pindex, node->vartype, node->vartypmod, context);
- }
- else
- {
- printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
}
}
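In the second patch, a Var of the join relation is printed as l.a_N or r.a_N: deparseJoinVar() looks for an equal expression in the outer child's target list first, then the inner one, and N is the position of the match; deparseColumnAliases() names the subquery columns a_0, a_1, ... to match. The standalone sketch below reproduces that naming scheme under simplifying assumptions: SkVar is a hypothetical (varno, varattno) pair instead of a Var node, child target lists are plain arrays, ctid aliases are omitted, and the ON clause is a single equality. None of the sk_/sketch_ names exist in postgres_fdw.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical simplified Var: range table index and attribute number. */
typedef struct
{
    int         varno;
    int         varattno;
} SkVar;

/* Append printf-style text to a NUL-terminated, bounded buffer. */
static void
sk_appendf(char *buf, size_t size, const char *fmt, ...)
{
    size_t      len = strlen(buf);
    va_list     ap;

    va_start(ap, fmt);
    vsnprintf(buf + len, size - len, fmt, ap);
    va_end(ap);
}

/* Print v as "l.a_N" or "r.a_N" (bare "l"/"r" for a whole-row Var). */
static void
sk_deparse_join_var(char *buf, size_t size, SkVar v,
                    const SkVar *outer, int nouter,
                    const SkVar *inner, int ninner)
{
    const char *side = NULL;
    int         j = -1;

    for (int i = 0; i < nouter && side == NULL; i++)
        if (outer[i].varno == v.varno && outer[i].varattno == v.varattno)
        {
            side = "l";
            j = i;
        }
    for (int i = 0; i < ninner && side == NULL; i++)
        if (inner[i].varno == v.varno && inner[i].varattno == v.varattno)
        {
            side = "r";
            j = i;
        }
    assert(side != NULL);       /* like Assert(tle) in the patch */

    if (v.varattno == 0)
        sk_appendf(buf, size, "%s", side);
    else
        sk_appendf(buf, size, "%s.a_%d", side, j);
}

/* Column alias list "a_0, a_1, ..." for a child subquery. */
static void
sk_column_aliases(char *buf, size_t size, int ncols)
{
    for (int i = 0; i < ncols; i++)
        sk_appendf(buf, size, "%sa_%d", i > 0 ? ", " : "", i);
}

/* SELECT ... FROM (outer) l (...) <type> JOIN (inner) r (...) ON ((x = y)) */
static void
sketch_deparse_join_sql(char *sql, size_t size,
                        const SkVar *tlist, int ntlist,
                        const SkVar *outer, int nouter, const char *sql_o,
                        const SkVar *inner, int ninner, const char *sql_i,
                        const char *jointype_str, SkVar on_lhs, SkVar on_rhs)
{
    char        abuf_o[128] = "";
    char        abuf_i[128] = "";

    sk_appendf(sql, size, "SELECT ");
    for (int i = 0; i < ntlist; i++)
    {
        if (i > 0)
            sk_appendf(sql, size, ", ");
        sk_deparse_join_var(sql, size, tlist[i], outer, nouter, inner, ninner);
    }

    sk_column_aliases(abuf_o, sizeof(abuf_o), nouter);
    sk_column_aliases(abuf_i, sizeof(abuf_i), ninner);
    sk_appendf(sql, size, " FROM (%s) l (%s) %s JOIN (%s) r (%s)",
               sql_o, abuf_o, jointype_str, sql_i, abuf_i);

    /* One equality, parenthesized the way appendConditions() does it. */
    sk_appendf(sql, size, " ON ((");
    sk_deparse_join_var(sql, size, on_lhs, outer, nouter, inner, ninner);
    sk_appendf(sql, size, " = ");
    sk_deparse_join_var(sql, size, on_rhs, outer, nouter, inner, ninner);
    sk_appendf(sql, size, "))");
}
```

Because both children expose the same a_N names, the l/r qualifier is what keeps every column reference in the outer query unambiguous.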
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 583cce7..7f96793 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -489,17 +489,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- can't
-- parameterized remote path
EXPLAIN (VERBOSE, COSTS false)
SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------
- Nested Loop
- Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
- -> Foreign Scan on public.ft2 a
- Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
- -> Foreign Scan on public.ft2 b
- Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan
+ Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT r.a_0, r.a_1, r.a_2, r.a_3, r.a_4, r.a_5, r.a_6, r.a_7, l.a_0, l.a_1, l.a_2, l.a_3, l.a_4, l.a_5, l.a_6, l.a_7 FROM (SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1") l (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7) INNER JOIN (SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))) r (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7) ON ((r.a_1 = l.a_0))
+(3 rows)
SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
@@ -656,16 +651,16 @@ SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
-- simple join
PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Nested Loop
Output: t1.c3, t2.c3
-> Foreign Scan on public.ft1 t1
Output: t1.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 1))
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 2))
(8 rows)
EXECUTE st1(1, 1);
@@ -683,8 +678,8 @@ EXECUTE st1(101, 101);
-- subquery using stable function (can't be sent to remote)
PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -699,7 +694,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
-> Foreign Scan on public.ft2 t2
Output: t2.c3
Filter: (date(t2.c4) = '01-17-1970'::date)
- Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10))
(15 rows)
EXECUTE st2(10, 20);
@@ -717,8 +712,8 @@ EXECUTE st2(101, 121);
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
- QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -732,7 +727,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
Output: t2.c3
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
(14 rows)
EXECUTE st3(10, 20);
@@ -1085,7 +1080,7 @@ INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
Output: ((ft2_1.c1 + 1000)), ((ft2_1.c2 + 100)), ((ft2_1.c3 || ft2_1.c3))
-> Foreign Scan on public.ft2 ft2_1
Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
- Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1"
+ Remote SQL: SELECT "C 1", c2, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1"
(9 rows)
INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
@@ -1219,7 +1214,7 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, NULL, c8, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -1231,14 +1226,14 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
EXPLAIN (verbose, costs off)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
- QUERY PLAN
-----------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------
Delete on public.ft2
Output: c1, c4
- Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", c4
+ Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", NULL, NULL, c4, NULL, NULL, NULL, NULL
-> Foreign Scan on public.ft2
Output: ctid
- Remote SQL: SELECT ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
(6 rows)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
@@ -1360,7 +1355,7 @@ DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.ctid, ft2.c2
- Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT NULL, c2, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -2594,12 +2589,12 @@ select c2, count(*) from "S 1"."T 1" where c2 < 500 group by 1 order by 1;
-- Consistent check constraints provide consistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2positive CHECK (c2 >= 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 < 0;
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 < 0;
@@ -2638,12 +2633,12 @@ ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c2positive;
-- But inconsistent check constraints provide inconsistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2negative CHECK (c2 < 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 >= 0;
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 >= 0;
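The expected-output changes in this file follow from the deparseTargetList() hunk earlier in the second patch: every non-dropped column now keeps its position in the remote SELECT list, printed as its name when it is needed and as NULL otherwise, and every such column number is added to retrieved_attrs; ctid, when needed, still comes last. A minimal sketch of that rule, assuming plain arrays in place of the tuple descriptor and the attrs_used bitmap; sketch_target_list() is hypothetical, and SK_CTID_ATTNUM (-1) stands in for SelfItemPointerAttributeNumber.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SK_CTID_ATTNUM (-1)     /* stands in for SelfItemPointerAttributeNumber */

/*
 * Build the remote select list for a table with natts columns (1-based
 * attnos) and fill retrieved_attrs; returns the number of entries.
 */
static int
sketch_target_list(char *buf, size_t size,
                   const char *const *colnames, const bool *used,
                   const bool *dropped, int natts,
                   bool have_wholerow, bool want_ctid,
                   int *retrieved_attrs)
{
    int         n = 0;
    bool        first = true;

    buf[0] = '\0';
    for (int i = 1; i <= natts; i++)
    {
        size_t      len = strlen(buf);

        if (dropped[i - 1])
            continue;           /* dropped columns get no placeholder */

        /* Keep the position: the column if needed, NULL otherwise. */
        snprintf(buf + len, size - len, "%s%s", first ? "" : ", ",
                 (have_wholerow || used[i - 1]) ? colnames[i - 1] : "NULL");
        first = false;
        retrieved_attrs[n++] = i;
    }

    if (want_ctid)
    {
        size_t      len = strlen(buf);

        snprintf(buf + len, size - len, "%sctid", first ? "" : ", ");
        retrieved_attrs[n++] = SK_CTID_ATTNUM;
    }
    return n;
}
```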
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..9ba3a6d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -48,7 +48,8 @@ PG_MODULE_MAGIC;
/*
* FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table. This information is collected by postgresGetForeignRelSize.
+ * foreign table or foreign join. This information is collected by
+ * postgresGetForeignRelSize, or calculated from join source relations.
*/
typedef struct PgFdwRelationInfo
{
@@ -78,10 +79,30 @@ typedef struct PgFdwRelationInfo
ForeignTable *table;
ForeignServer *server;
UserMapping *user; /* only set in use_remote_estimate mode */
+ Oid checkAsUser;
} PgFdwRelationInfo;
/*
- * Indexes of FDW-private information stored in fdw_private lists.
+ * Indexes of FDW-private information stored in fdw_private of ForeignPath.
+ * We use the fdw_private of a ForeignPath when the path represents a join that
+ * can be pushed down to the remote side.
+ *
+ * 1) Outer child path node
+ * 2) Inner child path node
+ * 3) Join type number(as an Integer node)
+ * 4) RestrictInfo list of join conditions
+ */
+enum FdwPathPrivateIndex
+{
+ FdwPathPrivateOuterPath,
+ FdwPathPrivateInnerPath,
+ FdwPathPrivateJoinType,
+ FdwPathPrivateRestrictList,
+};
+
+/*
+ * Indexes of FDW-private information stored in the fdw_private of a
+ * ForeignScan for a SELECT statement, either a simple foreign table scan or
+ * a pushed-down join (the last two items exist only in the join case).
*
* We store various information in ForeignScan.fdw_private to pass it from
* planner to executor. Currently we store:
@@ -98,7 +119,11 @@ enum FdwScanPrivateIndex
/* SQL statement to execute remotely (as a String node) */
FdwScanPrivateSelectSql,
/* Integer list of attribute numbers retrieved by the SELECT */
- FdwScanPrivateRetrievedAttrs
+ FdwScanPrivateRetrievedAttrs,
+ /* OID of the foreign server for the scan (as an Integer node) */
+ FdwScanPrivateServerOid,
+ /* checkAsUser for the scan (as an Integer node) */
+ FdwScanPrivatecheckAsUser,
};
/*
@@ -129,6 +154,7 @@ enum FdwModifyPrivateIndex
typedef struct PgFdwScanState
{
Relation rel; /* relcache entry for the foreign table */
+ TupleDesc tupdesc; /* tuple descriptor of the scan */
AttInMetadata *attinmeta; /* attribute datatype conversion metadata */
/* extracted fdw_private data */
@@ -288,6 +314,15 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
+static void postgresGetForeignJoinPaths(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
/*
* Helper functions
@@ -324,6 +359,7 @@ static void analyze_row_processor(PGresult *res, int row,
static HeapTuple make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context);
@@ -368,6 +404,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
/* Support functions for IMPORT FOREIGN SCHEMA */
routine->ImportForeignSchema = postgresImportForeignSchema;
+ /* Support functions for join push-down */
+ routine->GetForeignJoinPaths = postgresGetForeignJoinPaths;
+
PG_RETURN_POINTER(routine);
}
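The enums above fix two positional layouts: the ForeignPath's fdw_private (outer path, inner path, join type, restrictlist) and the ForeignScan's, which grows from two items to four for a pushed-down join so that postgresBeginForeignScan() can find the server OID and checkAsUser without a range table entry, falling back to the current user when checkAsUser is 0. A minimal sketch of the ForeignScan side, assuming a fixed-size array of tagged items in place of a PostgreSQL List; every Sk*/sketch_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same order as enum FdwScanPrivateIndex in the patch. */
enum SkScanPrivateIndex
{
    SkScanPrivateSelectSql,         /* remote SQL text */
    SkScanPrivateRetrievedAttrs,    /* retrieved attnos (elided here) */
    SkScanPrivateServerOid,         /* only for a pushed-down join */
    SkScanPrivateCheckAsUser        /* only for a pushed-down join */
};

/* Hypothetical list item: text or integer, like a String or Integer node. */
typedef struct
{
    const char *str;
    long        ival;
} SkItem;

typedef struct
{
    SkItem      items[4];
    int         length;
} SkPrivateList;

/* Build the list as postgresGetForeignPlan() does: 2 items, or 4 for a join. */
static SkPrivateList
sketch_make_fdw_private(const char *sql, unsigned scan_relid,
                        long serverid, long check_as_user)
{
    SkPrivateList l = {.length = 0};

    l.items[l.length++] = (SkItem) {sql, 0};
    l.items[l.length++] = (SkItem) {NULL, 0};   /* retrieved_attrs elided */
    if (scan_relid == 0)        /* join: no base relation to consult */
    {
        l.items[l.length++] = (SkItem) {NULL, serverid};
        l.items[l.length++] = (SkItem) {NULL, check_as_user};
    }
    return l;
}

/* Like list_nth(): the index must lie inside the list. */
static const SkItem *
sk_nth(const SkPrivateList *l, int n)
{
    assert(n >= 0 && n < l->length);
    return &l->items[n];
}

/* Server OID of a join scan. */
static long
sketch_join_serverid(SkPrivateList l)
{
    return sk_nth(&l, SkScanPrivateServerOid)->ival;
}

/* User for the user mapping: checkAsUser if set, else the current user. */
static long
sketch_join_userid(SkPrivateList l, long current_user)
{
    long        check_as_user = sk_nth(&l, SkScanPrivateCheckAsUser)->ival;

    return check_as_user ? check_as_user : current_user;
}
```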
@@ -385,6 +424,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
{
PgFdwRelationInfo *fpinfo;
ListCell *lc;
+ RangeTblEntry *rte;
/*
* We use PgFdwRelationInfo to pass various information to subsequent
@@ -428,6 +468,13 @@ postgresGetForeignRelSize(PlannerInfo *root,
}
/*
+ * Retrieve the RTE to obtain checkAsUser, which determines the user whose
+ * user mapping is used.
+ */
+ rte = planner_rt_fetch(baserel->relid, root);
+ fpinfo->checkAsUser = rte->checkAsUser;
+
+ /*
* If the table or the server is configured to use remote estimates,
* identify which user to do remote access as during planning. This
* should match what ExecCheckRTEPerms() does. If we fail due to lack of
@@ -435,7 +482,6 @@ postgresGetForeignRelSize(PlannerInfo *root,
*/
if (fpinfo->use_remote_estimate)
{
- RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
Oid userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
fpinfo->user = GetUserMapping(userid, fpinfo->server->serverid);
@@ -752,6 +798,8 @@ postgresGetForeignPlan(PlannerInfo *root,
List *retrieved_attrs;
StringInfoData sql;
ListCell *lc;
+ List *fdw_ps_tlist = NIL;
+ ForeignScan *scan;
/*
* Separate the scan_clauses into those that can be executed remotely and
@@ -769,7 +817,7 @@ postgresGetForeignPlan(PlannerInfo *root,
* This code must match "extract_actual_clauses(scan_clauses, false)"
* except for the additional decision about remote versus local execution.
* Note however that we only strip the RestrictInfo nodes from the
- * local_exprs list, since appendWhereClause expects a list of
+ * local_exprs list, since appendConditions expects a list of
* RestrictInfos.
*/
foreach(lc, scan_clauses)
@@ -797,64 +845,139 @@ postgresGetForeignPlan(PlannerInfo *root,
* expressions to be sent as parameters.
*/
initStringInfo(&sql);
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
- if (remote_conds)
- appendWhereClause(&sql, root, baserel, remote_conds,
- true, ¶ms_list);
-
- /*
- * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
- * initial row fetch, rather than later on as is done for local tables.
- * The extra roundtrips involved in trying to duplicate the local
- * semantics exactly don't seem worthwhile (see also comments for
- * RowMarkType).
- *
- * Note: because we actually run the query as a cursor, this assumes that
- * DECLARE CURSOR ... FOR UPDATE is supported, which it isn't before 8.3.
- */
- if (baserel->relid == root->parse->resultRelation &&
- (root->parse->commandType == CMD_UPDATE ||
- root->parse->commandType == CMD_DELETE))
+ if (scan_relid > 0)
{
- /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
- appendStringInfoString(&sql, " FOR UPDATE");
- }
- else
- {
- RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+ deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
+ &retrieved_attrs);
+ if (remote_conds)
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ remote_conds, " WHERE ", ¶ms_list);
- if (rc)
+ /*
+ * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
+ * initial row fetch, rather than later on as is done for local tables.
+ * The extra roundtrips involved in trying to duplicate the local
+ * semantics exactly don't seem worthwhile (see also comments for
+ * RowMarkType).
+ *
+ * Note: because we actually run the query as a cursor, this assumes
+ * that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
+ * before 8.3.
+ */
+ if (baserel->relid == root->parse->resultRelation &&
+ (root->parse->commandType == CMD_UPDATE ||
+ root->parse->commandType == CMD_DELETE))
{
- /*
- * Relation is specified as a FOR UPDATE/SHARE target, so handle
- * that.
- *
- * For now, just ignore any [NO] KEY specification, since (a) it's
- * not clear what that means for a remote table that we don't have
- * complete information about, and (b) it wouldn't work anyway on
- * older remote servers. Likewise, we don't worry about NOWAIT.
- */
- switch (rc->strength)
+ /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
+ appendStringInfoString(&sql, " FOR UPDATE");
+ }
+ else
+ {
+ RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+
+ if (rc)
{
- case LCS_FORKEYSHARE:
- case LCS_FORSHARE:
- appendStringInfoString(&sql, " FOR SHARE");
- break;
- case LCS_FORNOKEYUPDATE:
- case LCS_FORUPDATE:
- appendStringInfoString(&sql, " FOR UPDATE");
- break;
+ /*
+ * Relation is specified as a FOR UPDATE/SHARE target, so handle
+ * that.
+ *
+ * For now, just ignore any [NO] KEY specification, since (a)
+ * it's not clear what that means for a remote table that we
+ * don't have complete information about, and (b) it wouldn't
+ * work anyway on older remote servers. Likewise, we don't
+ * worry about NOWAIT.
+ */
+ switch (rc->strength)
+ {
+ case LCS_FORKEYSHARE:
+ case LCS_FORSHARE:
+ appendStringInfoString(&sql, " FOR SHARE");
+ break;
+ case LCS_FORNOKEYUPDATE:
+ case LCS_FORUPDATE:
+ appendStringInfoString(&sql, " FOR UPDATE");
+ break;
+ }
}
}
}
+ else
+ {
+ /* Join case */
+ Path *path_o;
+ Path *path_i;
+ const char *sql_o;
+ const char *sql_i;
+ ForeignScan *plan_o;
+ ForeignScan *plan_i;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ Bitmapset *attrs_used_o;
+ Bitmapset *attrs_used_i;
+ bool has_ctid_o;
+ bool has_ctid_i;
+ JoinType jointype;
+ List *restrictlist;
+ int i;
+
+#define CTID_ATTNO (SelfItemPointerAttributeNumber - FirstLowInvalidHeapAttributeNumber)
+
+ /*
+ * Retrieve information from fdw_private.
+ */
+ path_o = list_nth(best_path->fdw_private, FdwPathPrivateOuterPath);
+ path_i = list_nth(best_path->fdw_private, FdwPathPrivateInnerPath);
+ jointype = intVal(list_nth(best_path->fdw_private,
+ FdwPathPrivateJoinType));
+ restrictlist = list_nth(best_path->fdw_private,
+ FdwPathPrivateRestrictList);
+
+ fpinfo_o = (PgFdwRelationInfo *) path_o->parent->fdw_private;
+ attrs_used_o = fpinfo_o->attrs_used;
+ has_ctid_o = bms_is_member(CTID_ATTNO, attrs_used_o);
+ fpinfo_i = (PgFdwRelationInfo *) path_i->parent->fdw_private;
+ attrs_used_i = fpinfo_i->attrs_used;
+ has_ctid_i = bms_is_member(CTID_ATTNO, attrs_used_i);
+
+ /*
+ * Construct the remote query from the bottom to the top. The ForeignScan
+ * plan nodes of the underlying scans are not necessary for executing the
+ * plan tree, but they are handy for constructing the remote query
+ * recursively.
+ */
+ plan_o = (ForeignScan *) create_plan_recurse(root, path_o);
+ Assert(IsA(plan_o, ForeignScan));
+ sql_o = strVal(list_nth(plan_o->fdw_private, FdwScanPrivateSelectSql));
+
+ plan_i = (ForeignScan *) create_plan_recurse(root, path_i);
+ Assert(IsA(plan_i, ForeignScan));
+ sql_i = strVal(list_nth(plan_i->fdw_private, FdwScanPrivateSelectSql));
+
+ deparseJoinSql(&sql, root, baserel, path_o, path_i,
+ has_ctid_o, has_ctid_i, plan_o, plan_i,
+ sql_o, sql_i, jointype, restrictlist, &fdw_ps_tlist);
+ retrieved_attrs = NIL;
+ for (i = 0; i < list_length(fdw_ps_tlist); i++)
+ retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
+ }
/*
* Build the fdw_private list that will be available to the executor.
* Items in the list must match enum FdwScanPrivateIndex, above.
*/
- fdw_private = list_make2(makeString(sql.data),
- retrieved_attrs);
+ fdw_private = list_make2(makeString(sql.data), retrieved_attrs);
+
+ /*
+ * For a pseudo scan such as a pushed-down join, add the OID of the server
+ * and checkAsUser as extra information.
+ * XXX: passing serverid and checkAsUser in all cases, for simple scans and
+ * join push-down alike, might simplify the code.
+ */
+ if (scan_relid == 0)
+ {
+ fdw_private = lappend(fdw_private,
+ makeInteger(fpinfo->server->serverid));
+ fdw_private = lappend(fdw_private, makeInteger(fpinfo->checkAsUser));
+ }
/*
* Create the ForeignScan node from target list, local filtering
@@ -864,11 +987,18 @@ postgresGetForeignPlan(PlannerInfo *root,
* field of the finished plan node; we can't keep them in private state
* because then they wouldn't be subject to later planner processing.
*/
- return make_foreignscan(tlist,
+ scan = make_foreignscan(tlist,
local_exprs,
scan_relid,
params_list,
fdw_private);
+
+ /*
+ * Set fdw_ps_tlist so that tuples generated by this scan can be handled.
+ */
+ scan->fdw_ps_tlist = fdw_ps_tlist;
+
+ return scan;
}
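Two pieces of arithmetic in the join branch above are easy to misread. attrs_used stores attribute numbers offset by FirstLowInvalidHeapAttributeNumber so that system columns (negative attnos) fit in a Bitmapset, which is why CTID_ATTNO is SelfItemPointerAttributeNumber - FirstLowInvalidHeapAttributeNumber; and because the join's remote SELECT list is built in the same order as fdw_ps_tlist, retrieved_attrs is simply 1..n. The sketch below assumes the access/sysattr.h values of PostgreSQL releases from this period (-1 and -8; later releases dropped the oid system column and moved the lower bound to -7) and uses a 64-bit mask as a hypothetical stand-in for Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values from access/sysattr.h of the PostgreSQL 9.x era. */
#define SK_SELF_ITEM_POINTER_ATTNO  (-1)
#define SK_FIRST_LOW_INVALID_ATTNO  (-8)

/* Same expression as the patch's CTID_ATTNO: -1 - (-8) = 7. */
#define SK_CTID_BIT (SK_SELF_ITEM_POINTER_ATTNO - SK_FIRST_LOW_INVALID_ATTNO)

/* Offset an attribute number into a bit position, as attrs_used does. */
static int
sk_attr_bit(int attno)
{
    return attno - SK_FIRST_LOW_INVALID_ATTNO;
}

/* Does the (hypothetical 64-bit) attrs_used mask request ctid? */
static bool
sk_has_ctid(uint64_t attrs_used)
{
    return (attrs_used >> SK_CTID_BIT) & 1u;
}

/* Join case: the remote SELECT list matches fdw_ps_tlist, so attrs are 1..n. */
static int
sketch_join_retrieved_attrs(int tlist_length, int *retrieved_attrs)
{
    for (int i = 0; i < tlist_length; i++)
        retrieved_attrs[i] = i + 1;
    return tlist_length;
}
```

With these values, ordinary column 1 lands on bit 9 and ctid on bit 7, so a Bitmapset holding both user and system columns never needs negative members.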
/*
@@ -881,9 +1011,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
PgFdwScanState *fsstate;
- RangeTblEntry *rte;
+ Oid serverid;
Oid userid;
- ForeignTable *table;
ForeignServer *server;
UserMapping *user;
int numParams;
@@ -903,22 +1032,51 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
node->fdw_state = (void *) fsstate;
/*
- * Identify which user to do the remote access as. This should match what
- * ExecCheckRTEPerms() does.
+ * Initialize fsstate.
+ *
+ * The following values must be determined:
+ * - fsstate->rel, NULL if there is no actual relation
+ * - serverid, OID of the foreign server to use for the scan
+ * - userid, used to look up the user mapping
*/
- rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
- userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+ if (fsplan->scan.scanrelid > 0)
+ {
+ /* Simple foreign table scan */
+ RangeTblEntry *rte;
+ ForeignTable *table;
- /* Get info about foreign table. */
- fsstate->rel = node->ss.ss_currentRelation;
- table = GetForeignTable(RelationGetRelid(fsstate->rel));
- server = GetForeignServer(table->serverid);
- user = GetUserMapping(userid, server->serverid);
+ /*
+ * Identify which user to do the remote access as. This should match
+ * what ExecCheckRTEPerms() does.
+ */
+ rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+ userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+ /* Get info about foreign table. */
+ fsstate->rel = node->ss.ss_currentRelation;
+ table = GetForeignTable(RelationGetRelid(fsstate->rel));
+ serverid = table->serverid;
+ }
+ else
+ {
+ Oid checkAsUser;
+
+ /* Join */
+ fsstate->rel = NULL; /* No actual relation to scan */
+
+ serverid = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivateServerOid));
+ checkAsUser = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivatecheckAsUser));
+ userid = checkAsUser ? checkAsUser : GetUserId();
+ }
/*
* Get connection to the foreign server. Connection manager will
* establish new connection if necessary.
*/
+ server = GetForeignServer(serverid);
+ user = GetUserMapping(userid, server->serverid);
fsstate->conn = GetConnection(server, user, false);
/* Assign a unique ID for my cursor */
@@ -929,7 +1087,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
fsstate->query = strVal(list_nth(fsplan->fdw_private,
FdwScanPrivateSelectSql));
fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
- FdwScanPrivateRetrievedAttrs);
+ FdwScanPrivateRetrievedAttrs);
/* Create contexts for batches of tuples and per-tuple temp workspace. */
fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -944,7 +1102,11 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ALLOCSET_SMALL_MAXSIZE);
/* Get info we'll need for input data conversion. */
- fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+ if (fsplan->scan.scanrelid > 0)
+ fsstate->tupdesc = RelationGetDescr(fsstate->rel);
+ else
+ fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+ fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc);
/* Prepare for output conversion of parameters used in remote query. */
numParams = list_length(fsplan->fdw_exprs);
@@ -1747,11 +1909,13 @@ estimate_path_cost_size(PlannerInfo *root,
deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
&retrieved_attrs);
if (fpinfo->remote_conds)
- appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
- true, NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ fpinfo->remote_conds, " WHERE ", NULL);
if (remote_join_conds)
- appendWhereClause(&sql, root, baserel, remote_join_conds,
- (fpinfo->remote_conds == NIL), NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ remote_join_conds,
+ fpinfo->remote_conds == NIL ? " WHERE " : " AND ",
+ NULL);
/* Get the remote estimate */
conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -2052,6 +2216,7 @@ fetch_more_data(ForeignScanState *node)
fsstate->tuples[i] =
make_tuple_from_result_row(res, i,
fsstate->rel,
+ fsstate->tupdesc,
fsstate->attinmeta,
fsstate->retrieved_attrs,
fsstate->temp_cxt);
@@ -2270,6 +2435,7 @@ store_returning_result(PgFdwModifyState *fmstate,
newtup = make_tuple_from_result_row(res, 0,
fmstate->rel,
+ RelationGetDescr(fmstate->rel),
fmstate->attinmeta,
fmstate->retrieved_attrs,
fmstate->temp_cxt);
@@ -2562,6 +2728,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
astate->rows[pos] = make_tuple_from_result_row(res, row,
astate->rel,
+ RelationGetDescr(astate->rel),
astate->attinmeta,
astate->retrieved_attrs,
astate->temp_cxt);
@@ -2835,6 +3002,205 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
}
/*
+ * Construct PgFdwRelationInfo from two join sources
+ */
+static PgFdwRelationInfo *
+merge_fpinfo(PgFdwRelationInfo *fpinfo_o,
+ PgFdwRelationInfo *fpinfo_i,
+ JoinType jointype)
+{
+ PgFdwRelationInfo *fpinfo;
+
+ fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
+ fpinfo->remote_conds = list_concat(copyObject(fpinfo_o->remote_conds),
+ copyObject(fpinfo_i->remote_conds));
+ fpinfo->local_conds = list_concat(copyObject(fpinfo_o->local_conds),
+ copyObject(fpinfo_i->local_conds));
+
+ fpinfo->attrs_used = NULL; /* Use fdw_ps_tlist */
+ fpinfo->local_conds_cost.startup = fpinfo_o->local_conds_cost.startup +
+ fpinfo_i->local_conds_cost.startup;
+ fpinfo->local_conds_cost.per_tuple = fpinfo_o->local_conds_cost.per_tuple +
+ fpinfo_i->local_conds_cost.per_tuple;
+ fpinfo->local_conds_sel = fpinfo_o->local_conds_sel *
+ fpinfo_i->local_conds_sel;
+ if (jointype == JOIN_INNER)
+ fpinfo->rows = Min(fpinfo_o->rows, fpinfo_i->rows);
+ else
+ fpinfo->rows = Max(fpinfo_o->rows, fpinfo_i->rows);
+ /* XXX row estimates for joins need improvement */
+ /* XXX we should consider only columns in fdw_ps_tlist */
+ fpinfo->width = fpinfo_o->width + fpinfo_i->width;
+ /* XXX we should estimate better costs */
+
+ fpinfo->use_remote_estimate = false; /* Never use in join case */
+ fpinfo->fdw_startup_cost = fpinfo_o->fdw_startup_cost;
+ fpinfo->fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
+
+ fpinfo->startup_cost = fpinfo->fdw_startup_cost;
+ fpinfo->total_cost =
+ fpinfo->startup_cost + fpinfo->fdw_tuple_cost * fpinfo->rows;
+
+ fpinfo->table = NULL; /* always NULL in join case */
+ fpinfo->server = fpinfo_o->server;
+ fpinfo->user = fpinfo_o->user ? fpinfo_o->user : fpinfo_i->user;
+ /* checkAsUser must be identical */
+ fpinfo->checkAsUser = fpinfo_o->checkAsUser;
+
+ return fpinfo;
+}
+
+/*
+ * postgresGetForeignJoinPaths
+ * Add possible ForeignPath to joinrel.
+ *
+ * Joins satisfying the conditions below can be pushed down to the remote PostgreSQL server.
+ *
+ * 1) Join type is inner or outer
+ * 2) Join conditions consist of remote-safe expressions.
+ * 3) Join source relations don't have any local filter.
+ */
+static void
+postgresGetForeignJoinPaths(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels)
+{
+ ForeignPath *joinpath;
+ ForeignPath *path_o = (ForeignPath *) outerrel->cheapest_total_path;
+ ForeignPath *path_i = (ForeignPath *) innerrel->cheapest_total_path;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ PgFdwRelationInfo *fpinfo;
+ double rows;
+ Cost startup_cost;
+ Cost total_cost;
+ ListCell *lc;
+ List *fdw_private;
+
+ /* Source relations should be ForeignPath. */
+ if (!IsA(path_o, ForeignPath) || !IsA(path_i, ForeignPath))
+ {
+ ereport(DEBUG3, (errmsg("underlying path is not a ForeignPath")));
+ return;
+ }
+
+ /*
+ * Skip considering reversed join combination.
+ */
+ if (outerrel->relid < innerrel->relid)
+ {
+ ereport(DEBUG3, (errmsg("reversed combination")));
+ return;
+ }
+
+ /*
+ * Both relations in the join must belong to same server.
+ */
+ fpinfo_o = path_o->path.parent->fdw_private;
+ fpinfo_i = path_i->path.parent->fdw_private;
+ if (fpinfo_o->server->serverid != fpinfo_i->server->serverid)
+ {
+ ereport(DEBUG3, (errmsg("server mismatch")));
+ return;
+ }
+
+ /*
+ * We support all outer joins in addition to inner join.
+ */
+ if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+ jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+ {
+ ereport(DEBUG3, (errmsg("unsupported join type (SEMI, ANTI)")));
+ return;
+ }
+
+ /*
+ * Note that CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+ * with an empty restrictlist. Pushing down a CROSS JOIN transfers more rows
+ * than retrieving each table separately, so we don't push down such joins.
+ */
+ if (jointype == JOIN_INNER && restrictlist == NIL)
+ {
+ ereport(DEBUG3, (errmsg("unsupported join type (CROSS)")));
+ return;
+ }
+
+ /*
+ * Neither source relation can have local conditions. This can be relaxed
+ * if the join is an inner join and local conditions don't contain volatile
+ * functions/operators, but for now we leave that as a future enhancement.
+ */
+ if (fpinfo_o->local_conds != NULL || fpinfo_i->local_conds != NULL)
+ {
+ ereport(DEBUG3, (errmsg("join with local filter is not supported")));
+ return;
+ }
+
+ /*
+ * Join condition must be safe to push down.
+ */
+ foreach(lc, restrictlist)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+ if (!is_foreign_expr(root, joinrel, rinfo->clause))
+ {
+ ereport(DEBUG3, (errmsg("one of the join conditions is not safe to push down")));
+ return;
+ }
+ }
+
+ /*
+ * checkAsUser of source paths should match.
+ */
+ if (fpinfo_o->checkAsUser != fpinfo_i->checkAsUser)
+ {
+ ereport(DEBUG3, (errmsg("checkAsUser mismatch")));
+ return;
+ }
+
+ /* Here we know that this join can be pushed-down to remote side. */
+
+ /* Construct fpinfo for the join relation */
+ fpinfo = merge_fpinfo(fpinfo_o, fpinfo_i, jointype);
+ joinrel->fdw_private = fpinfo;
+
+ /* TODO determine cost and rows of the join. */
+ rows = fpinfo->rows;
+ startup_cost = fpinfo->startup_cost;
+ total_cost = fpinfo->total_cost;
+
+ fdw_private = list_make4(path_o,
+ path_i,
+ makeInteger(jointype),
+ restrictlist);
+
+ /*
+ * Create a new join path and add it to the joinrel which represents a join
+ * between foreign tables.
+ */
+ joinpath = create_foreignscan_path(root,
+ joinrel,
+ rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no required_outer */
+ fdw_private);
+
+ /* Add generated path into joinrel by add_path(). */
+ add_path(joinrel, (Path *) joinpath);
+
+ /* TODO consider parameterized paths */
+}
+
+/*
* Create a tuple from the specified row of the PGresult.
*
* rel is the local representation of the foreign table, attinmeta is
@@ -2846,12 +3212,12 @@ static HeapTuple
make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context)
{
HeapTuple tuple;
- TupleDesc tupdesc = RelationGetDescr(rel);
Datum *values;
bool *nulls;
ItemPointer ctid = NULL;
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 950c6f7..d1b8bf2 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -16,6 +16,7 @@
#include "foreign/foreign.h"
#include "lib/stringinfo.h"
#include "nodes/relation.h"
+#include "nodes/plannodes.h"
#include "utils/relcache.h"
#include "libpq-fe.h"
@@ -52,12 +53,28 @@ extern void deparseSelectSql(StringInfo buf,
RelOptInfo *baserel,
Bitmapset *attrs_used,
List **retrieved_attrs);
-extern void appendWhereClause(StringInfo buf,
+extern void appendConditions(StringInfo buf,
PlannerInfo *root,
RelOptInfo *baserel,
+ List *outertlist,
+ List *innertlist,
List *exprs,
- bool is_first,
+ const char *prefix,
List **params);
+extern void deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ bool has_ctid_o,
+ bool has_ctid_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *restrictlist,
+ List **retrieved_attrs);
extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
List *targetAttrs, List *returningList,
Hi Hanada-san,
I am looking at the patch. Here are my comments
In create_foreignscan_path() we have lines like:
pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer);
Now that the same function is being used for creating foreign scan paths
for joins, we should be calling get_joinrel_parampathinfo() on a join rel
and get_baserel_parampathinfo() on a base rel.
The patch seems to handle all the restriction clauses in the same way.
There are two kinds of restriction clauses: a. join quals (specified using the
ON clause; the optimizer might move them to the other class if that doesn't
affect correctness) and b. quals on the join relation (specified in the WHERE
clause; the optimizer might move them to the other class if that doesn't affect
correctness). The quals in "a" are applied while the join is being computed,
whereas those in "b" are applied after the join is computed. For example,
postgres=# select * from lt;
val | val2
-----+------
1 | 2
1 | 3
(2 rows)
postgres=# select * from lt2;
val | val2
-----+------
1 | 2
(1 row)
postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
1 | 3 | |
(2 rows)
postgres=# select * from lt left join lt2 on (true) where (lt.val2 = lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
(1 row)
The difference between these two kinds is evident in the case of outer joins;
for inner joins the optimizer puts all of them in class "b". The remote query
sent to the foreign server has all of them in the ON clause. Consider foreign
tables ft1 and ft2 pointing to local tables on the same server.
postgres=# \d ft1
Foreign table "public.ft1"
Column | Type | Modifiers | FDW Options
--------+---------+-----------+-------------
val | integer | |
val2 | integer | |
Server: loopback
FDW Options: (table_name 'lt')
postgres=# \d ft2
Foreign table "public.ft2"
Column | Type | Modifiers | FDW Options
--------+---------+-----------+-------------
val | integer | |
val2 | integer | |
Server: loopback
FDW Options: (table_name 'lt2')
postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
QUERY PLAN
------------------------------------------------------------
Foreign Scan (cost=100.00..125.60 rows=2560 width=16)
Output: val, val2, val, val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 = l.a_1))
(3 rows)
The result is then wrong:
postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
val | val2 | val | val2
-----+------+-----+------
1 | 2 | |
1 | 3 | |
(2 rows)
which should match the result obtained by substituting the local tables for
the foreign ones:
postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where lt.val + lt2.val > lt.val2 or lt2.val is null;
val | val2 | val | val2
-----+------+-----+------
1 | 3 | |
(1 row)
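To make the difference concrete, here is a minimal standalone C sketch (not postgres_fdw code; the Row/JoinRow types and the left_join() helper are hypothetical) of a nested-loop LEFT JOIN over the lt/lt2 rows above. It applies the WHERE qual either after the join, or ANDed into the ON clause as the remote SQL above does. NULL handling is simplified to the single case this query needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-column row mirroring lt/lt2 (val, val2). */
typedef struct { int val; int val2; } Row;

/* LEFT JOIN output row; the inner side may be null-extended. */
typedef struct { Row outer; Row inner; bool inner_is_null; } JoinRow;

/* Class "a" join qual: lt.val2 = lt2.val2 */
static bool join_qual(const Row *o, const Row *i) { return o->val2 == i->val2; }

/*
 * Class "b" qual: lt.val + lt2.val > lt.val2 OR lt2.val IS NULL.
 * i == NULL stands for the null-extended inner side: the comparison is
 * then NULL (unknown) but "IS NULL" is true, so the OR is true.
 */
static bool other_qual(const Row *o, const Row *i)
{
    if (i == NULL)
        return true;
    return o->val + i->val > o->val2;
}

/*
 * Nested-loop LEFT JOIN. With on_includes_other == false the class "b"
 * qual is applied after the join (correct); with true it is ANDed into
 * the ON clause (wrong for outer joins). Returns the number of rows
 * written to out.
 */
static size_t left_join(const Row *outer, size_t n_outer,
                        const Row *inner, size_t n_inner,
                        bool on_includes_other, JoinRow *out)
{
    size_t n_out = 0;

    for (size_t o = 0; o < n_outer; o++)
    {
        bool matched = false;

        for (size_t i = 0; i < n_inner; i++)
        {
            bool on = join_qual(&outer[o], &inner[i]);

            if (on_includes_other)
                on = on && other_qual(&outer[o], &inner[i]);
            if (!on)
                continue;
            matched = true;
            /* When quals are separated, class "b" filters the joined row here. */
            if (on_includes_other || other_qual(&outer[o], &inner[i]))
                out[n_out++] = (JoinRow){ outer[o], inner[i], false };
        }
        if (!matched)
        {
            /* Null-extend; a WHERE-level qual still applies to this row. */
            if (on_includes_other || other_qual(&outer[o], NULL))
                out[n_out++] = (JoinRow){ outer[o], (Row){ 0, 0 }, true };
        }
    }
    return n_out;
}
```

With the quals separated, left_join() returns only (1, 3, NULL, NULL), matching the local-table result; with everything in ON it returns both outer rows null-extended, matching the wrong foreign-table result.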
Once we start distinguishing the two kinds of quals, some optimization
becomes possible. For pushing down a join it's essential that all the
quals in "a" are safe to be pushed down, but a join can be pushed down
even if quals in "b" are not safe to be pushed down; those can be applied
locally after the join. Still, the more clauses are pushed down to the
foreign server, the fewer rows are fetched from it. In
postgresGetForeignJoinPaths, instead of checking that all the restriction
clauses are safe to be pushed down, we need to check only those which are
join quals (class "a").
The following EXPLAIN output seems confusing. Here ft1 and ft2 both point
to the same table lt on the foreign server.
postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2 where ft1.val + ft1.val2 = ft2.val;
QUERY PLAN
------------------------------------------------------------
Foreign Scan (cost=100.00..132.00 rows=2560 width=8)
Output: (val + val2)
Remote SQL: SELECT r.a_0, r.a_1 FROM (SELECT val, NULL FROM public.lt) l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON (((r.a_0 + r.a_1) = l.a_0))
The Output line just says val + val2; it doesn't tell where those val and
val2 come from, nor is it evident from the rest of the context.
On Mon, Mar 2, 2015 at 6:18 PM, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
Attached is the revised/rebased version of the $SUBJECT.
This patch is based on Kaigai-san's custom/foreign join patch, so
please apply it before this patch. In this version I changed some
points from the original postgres_fdw.
1) Disabled SELECT clause optimization
postgres_fdw up to 9.4 lists only the columns actually used in the SELECT
clause, but AFAICS that makes SQL generation complex. So I disabled that
optimization and put "NULL" for unnecessary columns in the SELECT clause
of the remote query.
2) Extended deparse context
To allow deparsing based on multiple source relations, I added some
members to the context structure. They are unnecessary for a simple query
with a single foreign table, but IMO it should be integrated.
With Kaigai-san's advice, the changes for supporting foreign joins in
postgres_fdw are confined to postgres_fdw itself. But I added a new
FDW API named GetForeignJoinPaths() to keep the policy that all
interfaces between core and FDWs should be in FdwRoutine, instead of
using a hook function. I'm now writing documentation for it, and will post
it in a day.
2015-02-19 16:19 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
2015-02-17 10:39 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Let me put some comments in addition to what you're checking now.
[design issues]
* Cost estimation
Estimating and evaluating the cost of a remote join query is not
straightforward. In principle, the local side cannot determine the cost
of running a remote join without remote EXPLAIN, because the local side has
no information about the JOIN logic applied on the remote side.
Probably we have to make an assumption about the remote join algorithm,
because the local planner has no idea about the remote planner's choice
unless the foreign join uses "use_remote_estimate".
I think it is a reasonable assumption (even if it is incorrect) to
calculate the remote join cost based on the local hash-join algorithm.
If the user wants a more accurate estimate, remote EXPLAIN will give a
more reliable cost estimation.
Hm, I guess that you chose hash-join as the "least-costed join". In the
pgbench model, most combinations of two tables produce a hash join
as the cheapest path. Remote EXPLAIN is very expensive in the context of
planning, so it could easily make the plan optimization meaningless.
But giving users an option is good, I agree.
It also needs a consensus on whether the cost of remote CPU execution is
equivalent to local CPU. If we consider local CPU a scarcer resource
than remote CPU, a discount rate would make the planner more inclined
to choose a remote join over a local one.
Something like cpu_cost_ratio as a new server-level FDW option?
Once we assume a join algorithm for the remote join and a unit cost for
remote CPU, we can calculate the cost of a foreign join based on
the local join logic plus the cost of network transfer (maybe
fdw_tuple_cost?).
Yes, the sum of these costs is the total cost of a remote join.
o fdw_startup_cost
o hash-join cost, estimated as a local join
o fdw_tuple_cost * rows * width
* FDW options
Unlike a table scan, it is unclear which FDW options we should refer to.
Table-level FDW options are, literally, associated with a foreign table.
I think we have two options here:
1. Foreign join refers to the FDW options of the foreign server, but those
of the foreign tables are ignored.
2. Foreign join is prohibited unless both relations have
identical FDW options.
My preference is 2. Even for an N-way foreign join, it ensures that
all the tables involved in the (N-1)-way foreign join have identical
FDW options, so we can build the N-way foreign join with
all-identical FDW options.
One exception is the "updatable" flag of postgres_fdw. It does not
make sense for a remote join, so I think a mixture of updatable and
non-updatable foreign tables should be allowed; however, that is
a decision for the FDW driver.
Probably the above points will take time to reach consensus.
I'd like to hear your opinion before editing your patch.
postgres_fdw can't push down a join which contains foreign tables on
multiple servers, so use_remote_estimate and fdw_startup_cost are the
only FDW options to consider. So we have choices for each of them:
1-a. If all foreign tables in the join have identical
use_remote_estimate, allow pushing down.
1-b. If any foreign table in the join has use_remote_estimate set to
true, use remote estimate.
2-a. If all foreign tables in the join have identical fdw_startup_cost,
allow pushing down.
2-b. Always use the max value in the join. (cost would be more expensive)
2-c. Always use the min value in the join. (cost would be cheaper)
I prefer 1-a and 2-b, so that more joins avoid remote EXPLAIN but still have
a reasonable startup cost.
I agree about the "updatable" option.
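The cost sum listed above and the option policy preferred here (1-a and 2-b) can be sketched in C as follows. This is a standalone illustration under stated assumptions, not postgres_fdw code: the FdwOpts struct, merge_join_opts(), foreign_join_cost() and the cpu_cost_ratio parameter are hypothetical, and taking the maximum fdw_tuple_cost as well is an extra assumption not discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-table option values relevant to join push-down. */
typedef struct
{
    bool   use_remote_estimate;
    double fdw_startup_cost;
    double fdw_tuple_cost;
} FdwOpts;

/*
 * Merge the options of the tables in a join: refuse push-down unless
 * use_remote_estimate agrees (policy 1-a) and take the maximum
 * fdw_startup_cost (policy 2-b). Taking the max fdw_tuple_cost too is
 * an assumption of this sketch.
 */
static bool
merge_join_opts(const FdwOpts *opts, size_t n, FdwOpts *out)
{
    if (n == 0)
        return false;
    *out = opts[0];
    for (size_t i = 1; i < n; i++)
    {
        if (opts[i].use_remote_estimate != out->use_remote_estimate)
            return false;   /* 1-a: mismatch, don't push down */
        if (opts[i].fdw_startup_cost > out->fdw_startup_cost)
            out->fdw_startup_cost = opts[i].fdw_startup_cost;   /* 2-b */
        if (opts[i].fdw_tuple_cost > out->fdw_tuple_cost)
            out->fdw_tuple_cost = opts[i].fdw_tuple_cost;
    }
    return true;
}

/*
 * Total cost as the sum listed above: startup cost, plus the hash-join
 * cost estimated as a local join (scaled by the hypothetical
 * cpu_cost_ratio; 1.0 means remote CPU costs the same), plus
 * fdw_tuple_cost * rows * width.
 */
static double
foreign_join_cost(const FdwOpts *o, double hashjoin_cost,
                  double cpu_cost_ratio, double rows, double width)
{
    return o->fdw_startup_cost
        + cpu_cost_ratio * hashjoin_cost
        + o->fdw_tuple_cost * rows * width;
}
```

For example, merging startup costs 100 and 200 yields 200, and with a hash-join cost of 50, a ratio of 0.5, 8 rows of width 2 and a tuple cost of 0.25 the total is 200 + 25 + 4 = 229.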
[implementation issues]
The interface does not intend to add a new Path/Plan type for each scan
that replaces foreign joins. What postgres_fdw should do is add a
ForeignPath to a particular joinrel, then populate a ForeignScan
with the remote join query once it has been chosen by the planner.
That idea is interesting, and makes many things simpler. Please let me
consider it.
A few functions added in src/backend/foreign/foreign.c are not
called from anywhere at this moment.
create_plan_recurse() is reverted to static. It is needed for the custom-join
enhancement, if no other infrastructure can support it.
I made it static again because I thought that create_plan_recurse
can be called by core before giving control to FDWs. But I'm not sure
it can be applied to custom scans. I'll recheck that part.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Here is the v4 patch of Join push-down support for foreign tables. This
patch requires the Custom/Foreign join patch v7 posted by Kaigai-san.
Thanks for your efforts,
In this version I added a check on the query type which gives up pushing
down joins when the join is part of an underlying query of
UPDATE/DELETE.
As of now postgres_fdw builds a proper remote query but it can't bring the
ctid value up to postgresExecForeignUpdate()...
The "ctid" reference should exist as a usual column reference in
the target-list of the join-rel. It is not the origin of the problem.
See my investigation below. I guess the special treatment of whole-row
references is the problem, rather than ctid.
To reproduce the error, please execute the query below after running the
attached init_fdw.sql to build the test environment. Note that the
script drops "user1", and creates the databases "fdw" and "pgbench".
fdw=# explain (verbose) update pgbench_branches b set filler = 'foo' from pgbench_tellers t where t.bid = b.bid and t.tid < 10;
QUERY PLAN
------------------------------------------------------------
Update on public.pgbench_branches b (cost=100.00..100.67 rows=67 width=390)
Remote SQL: UPDATE public.pgbench_branches SET filler = $2 WHERE ctid = $1
-> Foreign Scan (cost=100.00..100.67 rows=67 width=390)
Output: b.bid, b.bbalance, 'foo'::character(88), b.ctid, *
Remote SQL: SELECT r.a_0, r.a_1, r.a_2, l FROM (SELECT tid, bid, tbalance, filler FROM public.pgbench_tellers WHERE ((tid < 10))) l (a_0, a_1) INNER JOIN (SELECT bid, bbalance, NULL, ctid FROM public.pgbench_branches FOR UPDATE) r (a_0, a_1, a_2, a_3) ON ((r.a_0 = l.a_1))
(5 rows)
fdw=# explain (analyze, verbose) update pgbench_branches b set filler = 'foo' from pgbench_tellers t where t.bid = b.bid and t.tid < 10;
ERROR: ctid is NULL
It seems to me the left relation has fewer alias definitions
than required. The left SELECT statement has 4 target-entries; however,
only a_0 and a_1 are defined.
The logic to assign aliases to relations/columns might be problematic,
because deparseColumnAliases() adds aliases for each target-entry of the
underlying SELECT statement, but skips the whole-row reference.
On the other hand, postgres_fdw gives special treatment to whole-row
references. Once a whole-row reference is used, postgres_fdw
puts all the non-system columns in the target-list of the remote SELECT
statement.
This causes a mismatch between baserel->targetlist and the generated
aliases.
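One way to rule out this kind of mismatch is to generate the column aliases from the same list that produces the subquery's SELECT list. A minimal standalone sketch, assuming a plain array of column names with a used[] flag per column; deparse_subquery() and append() are hypothetical helpers, not postgres_fdw's deparseColumnAliases().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append s to buf (current length *len); false if it would not fit. */
static bool
append(char *buf, size_t bufsz, size_t *len, const char *s)
{
    size_t slen = strlen(s);

    if (*len + slen + 1 > bufsz)
        return false;
    memcpy(buf + *len, s, slen + 1);
    *len += slen;
    return true;
}

/*
 * Deparse "(SELECT <cols> FROM <table>) <alias> (a_0, ..., a_{n-1})".
 * Both lists are driven by the same ncols, so they cannot disagree in
 * length. Unused columns become NULL placeholders so that positions,
 * and therefore aliases, stay aligned. Assumes bufsz > 0.
 */
static bool
deparse_subquery(char *buf, size_t bufsz, const char *table,
                 const char *const *cols, const bool *used, size_t ncols,
                 const char *alias)
{
    size_t len = 0;
    char   tmp[32];

    buf[0] = '\0';
    if (!append(buf, bufsz, &len, "(SELECT "))
        return false;
    for (size_t i = 0; i < ncols; i++)
    {
        if (i > 0 && !append(buf, bufsz, &len, ", "))
            return false;
        if (!append(buf, bufsz, &len, used[i] ? cols[i] : "NULL"))
            return false;
    }
    if (!append(buf, bufsz, &len, " FROM ") ||
        !append(buf, bufsz, &len, table) ||
        !append(buf, bufsz, &len, ") ") ||
        !append(buf, bufsz, &len, alias) ||
        !append(buf, bufsz, &len, " ("))
        return false;
    /* Exactly one alias per SELECT-list entry. */
    for (size_t i = 0; i < ncols; i++)
    {
        snprintf(tmp, sizeof(tmp), "%sa_%zu", i > 0 ? ", " : "", i);
        if (!append(buf, bufsz, &len, tmp))
            return false;
    }
    return append(buf, bufsz, &len, ")");
}
```

For the pgbench_branches side above, columns {bid, bbalance, filler, ctid} with filler unused deparse to "(SELECT bid, bbalance, NULL, ctid FROM public.pgbench_branches) r (a_0, a_1, a_2, a_3)".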
I think we have two options:
(1) Stop the special treatment of whole-row references, even for a
simple foreign scan (which does not involve a join).
(2) Add a new routine to reconstruct the whole-row reference of
a particular relation from the target-entries of the joined relations.
My preference is (2). It can keep the code simple, and I doubt
whether the network traffic optimization actually outweighs the
disadvantage of local tuple reconstruction (which consumes additional
CPU cycles).
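Option (2) amounts to picking a relation's columns back out of the joined tuple. A minimal standalone sketch, assuming the planner can supply, for each column of the relation, its position in the join's target list. The Val type, the position array and rebuild_whole_row() are hypothetical; a real implementation would work on Datum/isnull arrays and form the composite value with heap_form_tuple().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value with a null flag, standing in for Datum + isnull. */
typedef struct { long value; bool isnull; } Val;

/*
 * Rebuild the whole-row value of one base relation from a joined tuple.
 * attr_pos[k] is the 0-based position, within the join's target list,
 * of the relation's k-th column.
 */
static void
rebuild_whole_row(const Val *joined, size_t njoined,
                  const int *attr_pos, size_t natts, Val *row)
{
    for (size_t k = 0; k < natts; k++)
    {
        /* A whole-row reference needs every column in the join tlist. */
        assert(attr_pos[k] >= 0 && (size_t) attr_pos[k] < njoined);
        row[k] = joined[attr_pos[k]];
    }
}
```

For instance, if the joined target list is (r.bid, r.bbalance, l.tid, l.bid, l.tbalance, l.filler), the pgbench_tellers whole-row value is rebuilt from positions {2, 3, 4, 5}.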
Does it make sense for you?
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hi Ashutosh, thanks for the review.
2015-03-04 19:17 GMT+09:00 Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>:
In create_foreignscan_path() we have lines like -
pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer);
Now, that the same function is being used for creating foreign scan paths
for joins, we should be calling get_joinrel_parampathinfo() on a join rel
and get_baserel_parampathinfo() on base rel.
Got it. Please let me check the difference.
The patch seems to handle all the restriction clauses in the same way. There
are two kinds of restriction clauses - a. join quals (specified using ON
clause; optimizer might move them to the other class if that doesn't affect
correctness) and b. quals on join relation (specified in the WHERE clause,
optimizer might move them to the other class if that doesn't affect
correctness). The quals in "a" are applied while the join is being computed
whereas those in "b" are applied after the join is computed. For example,
postgres=# select * from lt;
val | val2
-----+------
1 | 2
1 | 3
(2 rows)
postgres=# select * from lt2;
val | val2
-----+------
1 | 2
(1 row)
postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
1 | 3 | |
(2 rows)
postgres=# select * from lt left join lt2 on (true) where (lt.val2 = lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
(1 row)
The difference between these two kinds is evident in case of outer joins,
for inner join optimizer puts all of them in class "b". The remote query
sent to the foreign server has all those in ON clause. Consider foreign
tables ft1 and ft2 pointing to local tables on the same server.
postgres=# \d ft1
Foreign table "public.ft1"
Column | Type | Modifiers | FDW Options
--------+---------+-----------+-------------
val | integer | |
val2 | integer | |
Server: loopback
FDW Options: (table_name 'lt')
postgres=# \d ft2
Foreign table "public.ft2"
Column | Type | Modifiers | FDW Options
--------+---------+-----------+-------------
val | integer | |
val2 | integer | |
Server: loopback
FDW Options: (table_name 'lt2')
postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
QUERY PLAN
------------------------------------------------------------
Foreign Scan (cost=100.00..125.60 rows=2560 width=16)
Output: val, val2, val, val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 = l.a_1))
(3 rows)
The result is then wrong
postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
ft1.val + ft2.val > ft1.val2 or ft2.val is null;
val | val2 | val | val2
-----+------+-----+------
1 | 2 | |
1 | 3 | |
(2 rows)
which should match the result obtained by substituting local tables for
foreign ones
postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
lt.val + lt2.val > lt.val2 or lt2.val is null;
val | val2 | val | val2
-----+------+-----+------
1 | 3 | |
(1 row)
Once we start distinguishing the two kinds of quals, there is some
optimization possible. For pushing down a join it's essential that all the
quals in "a" are safe to be pushed down. But a join can be pushed down, even
if quals in "b" are not safe to be pushed down. But the more clauses are pushed
down to the foreign server, the fewer rows are fetched from the foreign server.
In postgresGetForeignJoinPaths, instead of checking all the restriction
clauses to be safe to be pushed down, we need to check only those which are
join quals (class "a").
The argument restrictlist of GetForeignJoinPaths contains both kinds of
conditions mixed together, so I added a call to extract_actual_join_clauses()
to separate it into two lists, join_quals and other clauses. This is similar
to what create_nestloop_plan and its siblings do.
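The separation, and how the two lists would then be deparsed into the remote query, can be sketched as follows. This is a simplified standalone model: the Clause struct, split_clauses() and append_quals() are hypothetical, and the classification rule (is_pushed_down clauses are "other" quals, pseudoconstant ones are dropped, the rest are join quals) is assumed to follow extract_actual_join_clauses() as it stood in PostgreSQL at the time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for RestrictInfo (only the fields used here). */
typedef struct
{
    const char *clause;          /* already-deparsed SQL text */
    bool        is_pushed_down;  /* true: WHERE-level qual, not a join qual */
    bool        pseudoconstant;  /* true: constant qual, handled elsewhere */
} Clause;

/* Split clauses into join quals (ON) and other quals (WHERE). */
static void
split_clauses(const Clause *c, size_t n,
              const char **joinquals, size_t *njoin,
              const char **otherquals, size_t *nother)
{
    *njoin = *nother = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (c[i].is_pushed_down)
        {
            if (!c[i].pseudoconstant)
                otherquals[(*nother)++] = c[i].clause;
        }
        else
            joinquals[(*njoin)++] = c[i].clause;
    }
}

/*
 * Append "<prefix>(q1) AND (q2) ..." to the NUL-terminated buf; no-op
 * for an empty list. Output is silently truncated if buf is too small.
 */
static void
append_quals(char *buf, size_t bufsz, const char *prefix,
             const char **quals, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        strncat(buf, i == 0 ? prefix : " AND ", bufsz - strlen(buf) - 1);
        strncat(buf, "(", bufsz - strlen(buf) - 1);
        strncat(buf, quals[i], bufsz - strlen(buf) - 1);
        strncat(buf, ")", bufsz - strlen(buf) - 1);
    }
}
```

Applied to the earlier ft1/ft2 example, the join qual goes to ON and the OR-qual to WHERE, so it is evaluated after the outer join instead of inside it. For an inner join every clause may be classified as an "other" qual, so a deparser built this way would need a placeholder such as ON (TRUE).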
Following EXPLAIN output seems to be confusing
ft1 and ft2 both are pointing to same lt on a foreign server.
postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2 where ft1.val + ft1.val2 = ft2.val;
QUERY PLAN
------------------------------------------------------------
Foreign Scan (cost=100.00..132.00 rows=2560 width=8)
Output: (val + val2)
Remote SQL: SELECT r.a_0, r.a_1 FROM (SELECT val, NULL FROM public.lt) l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON (((r.a_0 + r.a_1) = l.a_0))
Output just specified val + val2, it doesn't tell, where those val and val2
come from, neither it's evident from the rest of the context.
Actually val and val2 come from public.lt on the "r" side, but as you say
it's too difficult to know that from the EXPLAIN output. Do you have any
idea how to make the "Output" item more readable?
--
Shigeru HANADA
Here is the v5 patch of Join push-down support for foreign tables.
Changes since v4:
- Separate remote conditions into ON and WHERE, per Ashutosh.
- Add regression test cases for foreign join.
- Don't skip reversed relation combinations in OUTER join cases.
I'm now working on two issues from Kaigai-san and Ashutosh: whole-row
reference handling and use of get_joinrel_parampathinfo().
2015-03-05 22:00 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
Hi Ashutosh, thanks for the review.
2015-03-04 19:17 GMT+09:00 Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>:
In create_foreignscan_path() we have lines like -
pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer);
Now, that the same function is being used for creating foreign scan paths
for joins, we should be calling get_joinrel_parampathinfo() on a join rel
and get_baserel_parampathinfo() on base rel.Got it. Please let me check the difference.
The patch seems to handle all the restriction clauses in the same way. There
are two kinds of restriction clauses: a. join quals (specified using the ON
clause; the optimizer might move them to the other class if that doesn't affect
correctness) and b. quals on the join relation (specified in the WHERE clause;
the optimizer might move them to the other class if that doesn't affect
correctness). The quals in "a" are applied while the join is being computed,
whereas those in "b" are applied after the join is computed. For example,
postgres=# select * from lt;
val | val2
-----+------
1 | 2
1 | 3
(2 rows)
postgres=# select * from lt2;
val | val2
-----+------
1 | 2
(1 row)
postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
1 | 3 | |
(2 rows)
postgres=# select * from lt left join lt2 on (true) where (lt.val2 =
lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
(1 row)
The difference between these two kinds is evident in the case of outer joins;
for inner joins the optimizer puts all of them in class "b". The remote query
sent to the foreign server has all of them in the ON clause. Consider foreign
tables ft1 and ft2 pointing to local tables on the same server.
postgres=# \d ft1
Foreign table "public.ft1"
Column | Type | Modifiers | FDW Options
--------+---------+-----------+-------------
val | integer | |
val2 | integer | |
Server: loopback
FDW Options: (table_name 'lt')
postgres=# \d ft2
Foreign table "public.ft2"
Column | Type | Modifiers | FDW Options
--------+---------+-----------+-------------
val | integer | |
val2 | integer | |
Server: loopback
FDW Options: (table_name 'lt2')
postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=100.00..125.60 rows=2560 width=16)
Output: val, val2, val, val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 = l.a_1))
(3 rows)
The result is then wrong
postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
ft1.val + ft2.val > ft1.val2 or ft2.val is null;
val | val2 | val | val2
-----+------+-----+------
1 | 2 | |
1 | 3 | |
(2 rows)
which should match the result obtained by substituting local tables for
foreign ones
postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
lt.val + lt2.val > lt.val2 or lt2.val is null;
val | val2 | val | val2
-----+------+-----+------
1 | 3 | |
(1 row)
Once we start distinguishing the two kinds of quals, some optimization is
possible. For pushing down a join, it's essential that all the quals in "a"
are safe to be pushed down. But a join can be pushed down even if quals in
"b" are not safe to be pushed down, since those can be applied locally. Still,
the more clauses are pushed down to the foreign server, the fewer rows are
fetched from it. In postgresGetForeignJoinPath, instead of checking all the
restriction clauses to be safe to be pushed down, we need to check only those
which are join quals (class "a").
The restrictlist argument of GetForeignJoinPaths contains both kinds of
conditions mixed together, so I added a call to extract_actual_join_clauses()
to separate it into two lists, join_quals and other clauses. This is similar
to what create_nestloop_plan and siblings do.
The following EXPLAIN output seems confusing.
ft1 and ft2 both point to the same table lt on a foreign server.
postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2 where
ft1.val + ft1.val2 = ft2.val;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan (cost=100.00..132.00 rows=2560 width=8)
Output: (val + val2)
Remote SQL: SELECT r.a_0, r.a_1 FROM (SELECT val, NULL FROM public.lt) l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON (((r.a_0 + r.a_1) = l.a_0))
The Output line just shows val + val2; it doesn't tell where those val and val2
come from, nor is that evident from the rest of the context.
Actually, val and val2 come from public.lt on the "r" side, but as you say,
it's too difficult to tell that from the EXPLAIN output. Do you have any
ideas for making the "Output" item more readable?
--
Shigeru HANADA
--
Shigeru HANADA
Attachments:
foreign_join_v5.patchapplication/octet-stream; name=foreign_join_v5.patchDownload
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 59cb053..8f86e50 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -44,7 +44,9 @@
#include "catalog/pg_proc.h"
#include "catalog/pg_type.h"
#include "commands/defrem.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "nodes/plannodes.h"
#include "optimizer/clauses.h"
#include "optimizer/var.h"
#include "parser/parsetree.h"
@@ -89,6 +91,8 @@ typedef struct deparse_expr_cxt
RelOptInfo *foreignrel; /* the foreign relation we are planning for */
StringInfo buf; /* output buffer to append to */
List **params_list; /* exprs that will become remote Params */
+ List *outertlist; /* outer child's target list */
+ List *innertlist; /* inner child's target list */
} deparse_expr_cxt;
/*
@@ -250,7 +254,7 @@ foreign_expr_walker(Node *node,
* Param's collation, ie it's not safe for it to have a
* non-default collation.
*/
- if (var->varno == glob_cxt->foreignrel->relid &&
+ if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
var->varlevelsup == 0)
{
/* Var belongs to foreign table */
@@ -743,18 +747,22 @@ deparseTargetList(StringInfo buf,
if (attr->attisdropped)
continue;
+ if (!first)
+ appendStringInfoString(buf, ", ");
+ first = false;
+
if (have_wholerow ||
bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
attrs_used))
{
- if (!first)
- appendStringInfoString(buf, ", ");
- first = false;
deparseColumnRef(buf, rtindex, i, root);
- *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
+ else
+ appendStringInfoString(buf, "NULL");
+
+ *retrieved_attrs = lappend_int(*retrieved_attrs, i);
}
/*
@@ -794,12 +802,14 @@ deparseTargetList(StringInfo buf,
* so Params and other-relation Vars should be replaced by dummy values.
*/
void
-appendWhereClause(StringInfo buf,
- PlannerInfo *root,
- RelOptInfo *baserel,
- List *exprs,
- bool is_first,
- List **params)
+appendConditions(StringInfo buf,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ List *outertlist,
+ List *innertlist,
+ List *exprs,
+ const char *prefix,
+ List **params)
{
deparse_expr_cxt context;
int nestlevel;
@@ -813,6 +823,8 @@ appendWhereClause(StringInfo buf,
context.foreignrel = baserel;
context.buf = buf;
context.params_list = params;
+ context.outertlist = outertlist;
+ context.innertlist = innertlist;
/* Make sure any constants in the exprs are printed portably */
nestlevel = set_transmission_modes();
@@ -820,24 +832,197 @@ appendWhereClause(StringInfo buf,
foreach(lc, exprs)
{
RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+ Expr *expr;
- /* Connect expressions with "AND" and parenthesize each condition. */
- if (is_first)
- appendStringInfoString(buf, " WHERE ");
+ /* List element is a RestrictInfo or an Expr */
+ if (IsA(ri, RestrictInfo))
+ expr = ri->clause;
else
- appendStringInfoString(buf, " AND ");
+ expr = (Expr *) lfirst(lc);
+
+ /* Connect expressions with "AND" and parenthesize each condition. */
+ if (prefix)
+ appendStringInfo(buf, "%s", prefix);
appendStringInfoChar(buf, '(');
- deparseExpr(ri->clause, &context);
+ deparseExpr(expr, &context);
appendStringInfoChar(buf, ')');
- is_first = false;
+ prefix = " AND ";
}
reset_transmission_modes(nestlevel);
}
/*
+ * Deparse given Var into buf.
+ */
+static TargetEntry *
+deparseJoinVar(Var *node, deparse_expr_cxt *context)
+{
+ const char *side;
+ ListCell *lc2;
+ TargetEntry *tle = NULL;
+ int j;
+
+ j = 0;
+ foreach(lc2, context->outertlist)
+ {
+ TargetEntry *childtle = (TargetEntry *) lfirst(lc2);
+
+ if (equal(childtle->expr, node))
+ {
+ tle = copyObject(childtle);
+ side = "l";
+ break;
+ }
+ j++;
+ }
+ if (tle == NULL)
+ {
+ j = 0;
+ foreach(lc2, context->innertlist)
+ {
+ TargetEntry *childtle = (TargetEntry *) lfirst(lc2);
+
+ if (equal(childtle->expr, node))
+ {
+ tle = copyObject(childtle);
+ side = "r";
+ break;
+ }
+ j++;
+ }
+ }
+ Assert(tle);
+
+ if (node->varattno == 0)
+ appendStringInfo(context->buf, "%s", side);
+ else
+ appendStringInfo(context->buf, "%s.a_%d", side, j);
+
+ return tle;
+}
+
+static void
+deparseColumnAliases(StringInfo buf, List *targetlist)
+{
+ int i;
+ ListCell *lc;
+
+ i = 0;
+ foreach(lc, targetlist)
+ {
+ TargetEntry *tle = (TargetEntry *) lfirst(lc);
+ Var *var = (Var *) tle->expr;
+
+ Assert(IsA(var, Var));
+
+ /* Skip whole-row reference */
+ if (var->varattno == 0)
+ continue;
+
+ /* Deparse column alias for the subquery */
+ if (i > 0)
+ appendStringInfoString(buf, ", ");
+ appendStringInfo(buf, "a_%d", i);
+ i++;
+ }
+}
+
+/*
+ * Construct a SELECT statement which contains join clause.
+ *
+ * We also create a TargetEntry List of the columns being retrieved, which is
+ * returned to *fdw_ps_tlist.
+ *
+ * path_o, plan_o, sql_o are respectively path, plan, and remote query
+ * statement of the outer child relation. Suffix _i means those for the inner
+ * child relation. jointype and joinclauses are information of join method.
+ * fdw_ps_tlist is output parameter to pass target list of the pseudo scan to
+ * caller.
+ */
+void
+deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *joinclauses,
+ List *otherclauses,
+ List **fdw_ps_tlist)
+{
+ StringInfoData selbuf; /* buffer for SELECT clause */
+ StringInfoData abuf_o; /* buffer for column alias list of outer */
+ StringInfoData abuf_i; /* buffer for column alias list of inner */
+ int i;
+ ListCell *lc;
+ const char *jointype_str;
+ deparse_expr_cxt context;
+
+ context.root = root;
+ context.foreignrel = baserel;
+ context.buf = &selbuf;
+ context.params_list = NULL;
+ context.outertlist = plan_o->scan.plan.targetlist;
+ context.innertlist = plan_i->scan.plan.targetlist;
+
+ jointype_str = jointype == JOIN_INNER ? "INNER" :
+ jointype == JOIN_LEFT ? "LEFT" :
+ jointype == JOIN_RIGHT ? "RIGHT" :
+ jointype == JOIN_FULL ? "FULL" : "";
+
+ /* print SELECT clause of the join scan */
+ /* XXX: should extend deparseTargetList()? */
+ initStringInfo(&selbuf);
+ i = 0;
+ foreach(lc, baserel->reltargetlist)
+ {
+ Var *var = (Var *) lfirst(lc);
+ TargetEntry *tle;
+
+ if (i > 0)
+ appendStringInfoString(&selbuf, ", ");
+ deparseJoinVar(var, &context);
+
+ tle = makeTargetEntry((Expr *) copyObject(var),
+ i + 1, pstrdup(""), false);
+ if (fdw_ps_tlist)
+ *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle));
+
+ i++;
+ }
+
+ /* Deparse column alias portion of subquery in FROM clause. */
+ initStringInfo(&abuf_o);
+ deparseColumnAliases(&abuf_o, plan_o->scan.plan.targetlist);
+ initStringInfo(&abuf_i);
+ deparseColumnAliases(&abuf_i, plan_i->scan.plan.targetlist);
+
+ /* Construct SELECT statement */
+ appendStringInfo(sql, "SELECT %s FROM", selbuf.data);
+ appendStringInfo(sql, " (%s) l (%s) %s JOIN (%s) r (%s)",
+ sql_o, abuf_o.data, jointype_str, sql_i, abuf_i.data);
+ /* Append ON clause */
+ if (joinclauses)
+ appendConditions(sql, root, baserel,
+ plan_o->scan.plan.targetlist,
+ plan_i->scan.plan.targetlist,
+ joinclauses, " ON ", NULL);
+ /* Append WHERE clause */
+ if (otherclauses)
+ appendConditions(sql, root, baserel,
+ plan_o->scan.plan.targetlist,
+ plan_i->scan.plan.targetlist,
+ otherclauses, " WHERE ", NULL);
+}
+
+/*
* deparse remote INSERT statement
*
* The statement text is appended to buf, and we also create an integer List
@@ -1261,6 +1446,8 @@ deparseExpr(Expr *node, deparse_expr_cxt *context)
/*
* Deparse given Var node into context->buf.
*
+ * If context->foreignrel is a join relation, this is invoked for join conditions.
+ *
* If the Var belongs to the foreign relation, just print its remote name.
* Otherwise, it's effectively a Param (and will in fact be a Param at
* run time). Handle it the same way we handle plain Params --- see
@@ -1271,39 +1458,46 @@ deparseVar(Var *node, deparse_expr_cxt *context)
{
StringInfo buf = context->buf;
- if (node->varno == context->foreignrel->relid &&
- node->varlevelsup == 0)
+ if (context->foreignrel->reloptkind == RELOPT_JOINREL)
{
- /* Var belongs to foreign table */
- deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ deparseJoinVar(node, context);
}
else
{
- /* Treat like a Param */
- if (context->params_list)
+ if (node->varno == context->foreignrel->relid &&
+ node->varlevelsup == 0)
{
- int pindex = 0;
- ListCell *lc;
-
- /* find its index in params_list */
- foreach(lc, *context->params_list)
+ /* Var belongs to foreign table */
+ deparseColumnRef(buf, node->varno, node->varattno, context->root);
+ }
+ else
+ {
+ /* Treat like a Param */
+ if (context->params_list)
{
- pindex++;
- if (equal(node, (Node *) lfirst(lc)))
- break;
+ int pindex = 0;
+ ListCell *lc;
+
+ /* find its index in params_list */
+ foreach(lc, *context->params_list)
+ {
+ pindex++;
+ if (equal(node, (Node *) lfirst(lc)))
+ break;
+ }
+ if (lc == NULL)
+ {
+ /* not in list, so add it */
+ pindex++;
+ *context->params_list = lappend(*context->params_list, node);
+ }
+
+ printRemoteParam(pindex, node->vartype, node->vartypmod, context);
}
- if (lc == NULL)
+ else
{
- /* not in list, so add it */
- pindex++;
- *context->params_list = lappend(*context->params_list, node);
+ printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
-
- printRemoteParam(pindex, node->vartype, node->vartypmod, context);
- }
- else
- {
- printRemotePlaceholder(node->vartype, node->vartypmod, context);
}
}
}
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 583cce7..78a3c20 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -35,6 +35,18 @@ CREATE TABLE "S 1"."T 2" (
c2 text,
CONSTRAINT t2_pkey PRIMARY KEY (c1)
);
+CREATE TABLE "S 1"."T 4" (
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ CONSTRAINT t4_pkey PRIMARY KEY (c1)
+);
+CREATE TABLE "S 1"."T 5" (
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c4 text,
+ CONSTRAINT t5_pkey PRIMARY KEY (c1)
+);
INSERT INTO "S 1"."T 1"
SELECT id,
id % 10,
@@ -49,8 +61,22 @@ INSERT INTO "S 1"."T 2"
SELECT id,
'AAA' || to_char(id, 'FM000')
FROM generate_series(1, 100) id;
+INSERT INTO "S 1"."T 4"
+ SELECT id,
+ id + 1,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+DELETE FROM "S 1"."T 4" WHERE c1 % 2 != 0; -- delete for outer join tests
+INSERT INTO "S 1"."T 5"
+ SELECT id,
+ id + 1,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+DELETE FROM "S 1"."T 5" WHERE c1 % 3 != 0; -- delete for outer join tests
ANALYZE "S 1"."T 1";
ANALYZE "S 1"."T 2";
+ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
-- ===================================================================
-- create foreign tables
-- ===================================================================
@@ -78,6 +104,16 @@ CREATE FOREIGN TABLE ft2 (
c8 user_enum
) SERVER loopback;
ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
+CREATE FOREIGN TABLE ft4 (
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');
+CREATE FOREIGN TABLE ft5 (
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
-- ===================================================================
-- tests for validator
-- ===================================================================
@@ -124,7 +160,9 @@ ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
--------+-------+----------+---------------------------------------+-------------
public | ft1 | loopback | (schema_name 'S 1', table_name 'T 1') |
public | ft2 | loopback | (schema_name 'S 1', table_name 'T 1') |
-(2 rows)
+ public | ft4 | loopback | (schema_name 'S 1', table_name 'T 4') |
+ public | ft5 | loopback | (schema_name 'S 1', table_name 'T 5') |
+(4 rows)
-- Now we should be able to run ANALYZE.
-- To exercise multiple code paths, we use local stats on ft1
@@ -277,22 +315,6 @@ SELECT COUNT(*) FROM ft1 t1;
1000
(1 row)
--- join two tables
-SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
- c1
------
- 101
- 102
- 103
- 104
- 105
- 106
- 107
- 108
- 109
- 110
-(10 rows)
-
-- subquery
SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
@@ -489,17 +511,12 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo'; -- can't
-- parameterized remote path
EXPLAIN (VERBOSE, COSTS false)
SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------
- Nested Loop
- Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
- -> Foreign Scan on public.ft2 a
- Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
- -> Foreign Scan on public.ft2 b
- Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
-(8 rows)
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan
+ Output: c1, c2, c3, c4, c5, c6, c7, c8, c1, c2, c3, c4, c5, c6, c7, c8
+ Remote SQL: SELECT r.a_0, r.a_1, r.a_2, r.a_3, r.a_4, r.a_5, r.a_6, r.a_7, l.a_0, l.a_1, l.a_2, l.a_3, l.a_4, l.a_5, l.a_6, l.a_7 FROM (SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1") l (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7) INNER JOIN (SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))) r (a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7) ON ((r.a_1 = l.a_0))
+(3 rows)
SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
@@ -651,21 +668,306 @@ SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
(4 rows)
-- ===================================================================
+-- JOIN queries
+-- ===================================================================
+-- join two tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: c1, c1, c3
+ -> Sort
+ Output: c1, c1, c3
+ Sort Key: c3, c1
+ -> Foreign Scan
+ Output: c1, c1, c3
+ Remote SQL: SELECT r.a_0, r.a_1, l.a_0 FROM (SELECT "C 1", NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1") l (a_0) INNER JOIN (SELECT "C 1", NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1") r (a_0, a_1) ON ((r.a_0 = l.a_0))
+(8 rows)
+
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ c1 | c1
+-----+-----
+ 101 | 101
+ 102 | 102
+ 103 | 103
+ 104 | 104
+ 105 | 105
+ 106 | 106
+ 107 | 107
+ 108 | 108
+ 109 | 109
+ 110 | 110
+(10 rows)
+
+-- join three tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1, t3.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: c1, c1, c1, c3
+ -> Sort
+ Output: c1, c1, c1, c3
+ Sort Key: c3, c1
+ -> Foreign Scan
+ Output: c1, c1, c1, c3
+ Remote SQL: SELECT r.a_0, r.a_1, r.a_2, l.a_0 FROM (SELECT c1, NULL, NULL FROM "S 1"."T 4") l (a_0, a_1, a_2) INNER JOIN (SELECT r.a_0, r.a_1, l.a_0 FROM (SELECT "C 1", NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1") l (a_0) INNER JOIN (SELECT "C 1", NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1") r (a_0, a_1) ON ((r.a_0 = l.a_0))) r (a_0, a_1, a_2) ON ((r.a_0 = l.a_0))
+(8 rows)
+
+SELECT t1.c1, t2.c1, t3.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+ c1 | c1 | c1
+----+----+----
+ 22 | 22 | 22
+ 24 | 24 | 24
+ 26 | 26 | 26
+ 28 | 28 | 28
+ 30 | 30 | 30
+ 32 | 32 | 32
+ 34 | 34 | 34
+ 36 | 36 | 36
+ 38 | 38 | 38
+ 40 | 40 | 40
+(10 rows)
+
+-- left outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: c1, c1
+ -> Sort
+ Output: c1, c1
+ Sort Key: c1, c1
+ -> Foreign Scan
+ Output: c1, c1
+ Remote SQL: SELECT l.a_0, r.a_0 FROM (SELECT c1, NULL, NULL FROM "S 1"."T 4") l (a_0, a_1, a_2) LEFT JOIN (SELECT c1, NULL, NULL FROM "S 1"."T 5") r (a_0, a_1, a_2) ON ((l.a_0 = r.a_0))
+(8 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ c1 | c1
+----+----
+ 22 |
+ 24 | 24
+ 26 |
+ 28 |
+ 30 | 30
+ 32 |
+ 34 |
+ 36 | 36
+ 38 |
+ 40 |
+(10 rows)
+
+-- right outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 RIGHT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t2.c1 OFFSET 10 LIMIT 10;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: c1, c1
+ -> Sort
+ Output: c1, c1
+ Sort Key: c1
+ -> Foreign Scan
+ Output: c1, c1
+ Remote SQL: SELECT l.a_0, r.a_0 FROM (SELECT c1, NULL, NULL FROM "S 1"."T 5") l (a_0, a_1, a_2) LEFT JOIN (SELECT c1, NULL, NULL FROM "S 1"."T 4") r (a_0, a_1, a_2) ON ((r.a_0 = l.a_0))
+(8 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 RIGHT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t2.c1 OFFSET 10 LIMIT 10;
+ c1 | c1
+----+----
+ | 33
+ 36 | 36
+ | 39
+ 42 | 42
+ | 45
+ 48 | 48
+ | 51
+ 54 | 54
+ | 57
+ 60 | 60
+(10 rows)
+
+-- full outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: c1, c1
+ Sort Key: c1, c1
+ -> Foreign Scan
+ Output: c1, c1
+ Remote SQL: SELECT l.a_0, r.a_0 FROM (SELECT c1, NULL, NULL FROM "S 1"."T 4") l (a_0, a_1, a_2) FULL JOIN (SELECT c1, NULL, NULL FROM "S 1"."T 5") r (a_0, a_1, a_2) ON ((l.a_0 = r.a_0))
+(6 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
+ c1 | c1
+-----+----
+ 2 |
+ 4 |
+ 6 | 6
+ 8 |
+ 10 |
+ 12 | 12
+ 14 |
+ 16 |
+ 18 | 18
+ 20 |
+ 22 |
+ 24 | 24
+ 26 |
+ 28 |
+ 30 | 30
+ 32 |
+ 34 |
+ 36 | 36
+ 38 |
+ 40 |
+ 42 | 42
+ 44 |
+ 46 |
+ 48 | 48
+ 50 |
+ 52 |
+ 54 | 54
+ 56 |
+ 58 |
+ 60 | 60
+ 62 |
+ 64 |
+ 66 | 66
+ 68 |
+ 70 |
+ 72 | 72
+ 74 |
+ 76 |
+ 78 | 78
+ 80 |
+ 82 |
+ 84 | 84
+ 86 |
+ 88 |
+ 90 | 90
+ 92 |
+ 94 |
+ 96 | 96
+ 98 |
+ 100 |
+ | 3
+ | 9
+ | 15
+ | 21
+ | 27
+ | 33
+ | 39
+ | 45
+ | 51
+ | 57
+ | 63
+ | 69
+ | 75
+ | 81
+ | 87
+ | 93
+ | 99
+(67 rows)
+
+-- full outer join + WHERE clause, only matched rows
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: c1, c1
+ Sort Key: c1, c1
+ -> Foreign Scan
+ Output: c1, c1
+ Remote SQL: SELECT l.a_0, r.a_0 FROM (SELECT c1, NULL, NULL FROM "S 1"."T 4") l (a_0, a_1, a_2) FULL JOIN (SELECT c1, NULL, NULL FROM "S 1"."T 5") r (a_0, a_1, a_2) ON ((l.a_0 = r.a_0)) WHERE (((l.a_0 = r.a_0) OR (l.a_0 IS NULL)))
+(6 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1;
+ c1 | c1
+----+----
+ 6 | 6
+ 12 | 12
+ 18 | 18
+ 24 | 24
+ 30 | 30
+ 36 | 36
+ 42 | 42
+ 48 | 48
+ 54 | 54
+ 60 | 60
+ 66 | 66
+ 72 | 72
+ 78 | 78
+ 84 | 84
+ 90 | 90
+ 96 | 96
+ | 3
+ | 9
+ | 15
+ | 21
+ | 27
+ | 33
+ | 39
+ | 45
+ | 51
+ | 57
+ | 63
+ | 69
+ | 75
+ | 81
+ | 87
+ | 93
+ | 99
+(33 rows)
+
+-- join at WHERE clause
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: c1, c1
+ -> Sort
+ Output: c1, c1
+ Sort Key: c1
+ -> Foreign Scan
+ Output: c1, c1
+ Remote SQL: SELECT l.a_0, r.a_0 FROM (SELECT c1, NULL, NULL FROM "S 1"."T 4") l (a_0, a_1, a_2) INNER JOIN (SELECT c1, NULL, NULL FROM "S 1"."T 5") r (a_0, a_1, a_2) ON ((l.a_0 = r.a_0))
+(8 rows)
+
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ c1 | c1
+----+----
+ 66 | 66
+ 72 | 72
+ 78 | 78
+ 84 | 84
+ 90 | 90
+ 96 | 96
+(6 rows)
+
+-- ===================================================================
-- parameterized queries
-- ===================================================================
-- simple join
PREPARE st1(int, int) AS SELECT t1.c3, t2.c3 FROM ft1 t1, ft2 t2 WHERE t1.c1 = $1 AND t2.c1 = $2;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st1(1, 2);
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Nested Loop
Output: t1.c3, t2.c3
-> Foreign Scan on public.ft1 t1
Output: t1.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 1))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 1))
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = 2))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" = 2))
(8 rows)
EXECUTE st1(1, 1);
@@ -683,8 +985,8 @@ EXECUTE st1(101, 101);
-- subquery using stable function (can't be sent to remote)
PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -699,7 +1001,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st2(10, 20);
-> Foreign Scan on public.ft2 t2
Output: t2.c3
Filter: (date(t2.c4) = '01-17-1970'::date)
- Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10))
+ Remote SQL: SELECT NULL, NULL, c3, c4, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10))
(15 rows)
EXECUTE st2(10, 20);
@@ -717,8 +1019,8 @@ EXECUTE st2(101, 121);
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
- QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
Sort Key: t1.c1
@@ -732,7 +1034,7 @@ EXPLAIN (VERBOSE, COSTS false) EXECUTE st3(10, 20);
Output: t2.c3
-> Foreign Scan on public.ft2 t2
Output: t2.c3
- Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
+ Remote SQL: SELECT NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))
(14 rows)
EXECUTE st3(10, 20);
@@ -1085,7 +1387,7 @@ INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
Output: ((ft2_1.c1 + 1000)), ((ft2_1.c2 + 100)), ((ft2_1.c3 || ft2_1.c3))
-> Foreign Scan on public.ft2 ft2_1
Output: (ft2_1.c1 + 1000), (ft2_1.c2 + 100), (ft2_1.c3 || ft2_1.c3)
- Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1"
+ Remote SQL: SELECT "C 1", c2, c3, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1"
(9 rows)
INSERT INTO ft2 (c1,c2,c3) SELECT c1+1000,c2+100, c3 || c3 FROM ft2 LIMIT 20;
@@ -1219,7 +1521,7 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, NULL, c8, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -1231,14 +1533,14 @@ UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
EXPLAIN (verbose, costs off)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
- QUERY PLAN
-----------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------
Delete on public.ft2
Output: c1, c4
- Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", c4
+ Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", NULL, NULL, c4, NULL, NULL, NULL, NULL
-> Foreign Scan on public.ft2
Output: ctid
- Remote SQL: SELECT ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 5)) FOR UPDATE
(6 rows)
DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
@@ -1360,7 +1662,7 @@ DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
Hash Cond: (ft2.c2 = ft1.c1)
-> Foreign Scan on public.ft2
Output: ft2.ctid, ft2.c2
- Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
+ Remote SQL: SELECT NULL, c2, NULL, NULL, NULL, NULL, NULL, NULL, ctid FROM "S 1"."T 1" FOR UPDATE
-> Hash
Output: ft1.*, ft1.c1
-> Foreign Scan on public.ft1
@@ -2594,12 +2896,12 @@ select c2, count(*) from "S 1"."T 1" where c2 < 500 group by 1 order by 1;
-- Consistent check constraints provide consistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2positive CHECK (c2 >= 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 < 0;
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 < 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 < 0;
@@ -2638,12 +2940,12 @@ ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c2positive;
-- But inconsistent check constraints provide inconsistent results
ALTER FOREIGN TABLE ft1 ADD CONSTRAINT ft1_c2negative CHECK (c2 < 0);
EXPLAIN (VERBOSE, COSTS false) SELECT count(*) FROM ft1 WHERE c2 >= 0;
- QUERY PLAN
---------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------
Aggregate
Output: count(*)
-> Foreign Scan on public.ft1
- Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
+ Remote SQL: SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL FROM "S 1"."T 1" WHERE ((c2 >= 0))
(4 rows)
SELECT count(*) FROM ft1 WHERE c2 >= 0;
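The expected-output changes above show the patched deparser listing every column position of the foreign table in the remote select list: the remote column name where the column is needed, NULL elsewhere, with ctid appended when it is required. A minimal standalone sketch of that construction follows; build_placeholder_select_list() and its boolean `needed` array are hypothetical stand-ins for the patch's deparser and its attrs_used Bitmapset, not the actual postgres_fdw code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a remote select list into buf with one entry
 * per column position. Needed columns appear by remote name, all others
 * as NULL; ctid is appended last when want_ctid is true.
 */
static void
build_placeholder_select_list(char *buf, size_t buflen,
                              const char *const *colnames,
                              const bool *needed, int ncols,
                              bool want_ctid)
{
    size_t used = 0;

    assert(buflen > 0);
    buf[0] = '\0';
    for (int i = 0; i < ncols; i++)
    {
        int n = snprintf(buf + used, buflen - used, "%s%s",
                         i > 0 ? ", " : "",
                         needed[i] ? colnames[i] : "NULL");

        /* Fail loudly instead of silently truncating the list. */
        assert(n >= 0 && used + (size_t) n < buflen);
        used += (size_t) n;
    }
    if (want_ctid)
    {
        int n = snprintf(buf + used, buflen - used, "%sctid",
                         ncols > 0 ? ", " : "");

        assert(n >= 0 && used + (size_t) n < buflen);
        used += (size_t) n;
    }
}
```

Needing only c3 of ft1's eight columns yields `NULL, NULL, c3, NULL, NULL, NULL, NULL, NULL`, the same list that appears in the expected output.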
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..c35cd54 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -48,7 +48,8 @@ PG_MODULE_MAGIC;
/*
* FDW-specific planner information kept in RelOptInfo.fdw_private for a
- * foreign table. This information is collected by postgresGetForeignRelSize.
+ * foreign table or foreign join. This information is collected by
+ * postgresGetForeignRelSize, or calculated from join source relations.
*/
typedef struct PgFdwRelationInfo
{
@@ -78,10 +79,31 @@ typedef struct PgFdwRelationInfo
ForeignTable *table;
ForeignServer *server;
UserMapping *user; /* only set in use_remote_estimate mode */
+ Oid checkAsUser;
} PgFdwRelationInfo;
/*
- * Indexes of FDW-private information stored in fdw_private lists.
+ * Indexes of FDW-private information stored in fdw_private of ForeignPath.
+ * We use fdw_private of a ForeignPath when the path represents a join which
+ * can be pushed down to remote side.
+ *
+ * 1) Outer child path node
+ * 2) Inner child path node
+ * 3) Join type number (as an Integer node)
+ * 4) List of join conditions, 5) list of other conditions
+ */
+enum FdwPathPrivateIndex
+{
+ FdwPathPrivateOuterPath,
+ FdwPathPrivateInnerPath,
+ FdwPathPrivateJoinType,
+ FdwPathPrivateJoinClauses,
+ FdwPathPrivateOtherClauses,
+};
+
+/*
+ * Indexes of FDW-private information stored in fdw_private of ForeignScan of
+ * a simple foreign table scan for a SELECT statement.
*
* We store various information in ForeignScan.fdw_private to pass it from
* planner to executor. Currently we store:
@@ -98,7 +120,11 @@ enum FdwScanPrivateIndex
/* SQL statement to execute remotely (as a String node) */
FdwScanPrivateSelectSql,
/* Integer list of attribute numbers retrieved by the SELECT */
- FdwScanPrivateRetrievedAttrs
+ FdwScanPrivateRetrievedAttrs,
+ /* OID of the foreign server for the scan (as an Integer node) */
+ FdwScanPrivateServerOid,
+ /* Integer value of checkAsUser for the scan */
+ FdwScanPrivatecheckAsUser,
};
/*
@@ -129,6 +155,7 @@ enum FdwModifyPrivateIndex
typedef struct PgFdwScanState
{
Relation rel; /* relcache entry for the foreign table */
+ TupleDesc tupdesc; /* tuple descriptor of the scan */
AttInMetadata *attinmeta; /* attribute datatype conversion metadata */
/* extracted fdw_private data */
@@ -288,6 +315,15 @@ static bool postgresAnalyzeForeignTable(Relation relation,
BlockNumber *totalpages);
static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
Oid serverOid);
+static void postgresGetForeignJoinPaths(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels);
/*
* Helper functions
@@ -324,6 +360,7 @@ static void analyze_row_processor(PGresult *res, int row,
static HeapTuple make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context);
@@ -368,6 +405,9 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
/* Support functions for IMPORT FOREIGN SCHEMA */
routine->ImportForeignSchema = postgresImportForeignSchema;
+ /* Support functions for join push-down */
+ routine->GetForeignJoinPaths = postgresGetForeignJoinPaths;
+
PG_RETURN_POINTER(routine);
}
@@ -385,6 +425,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
{
PgFdwRelationInfo *fpinfo;
ListCell *lc;
+ RangeTblEntry *rte;
/*
* We use PgFdwRelationInfo to pass various information to subsequent
@@ -428,6 +469,13 @@ postgresGetForeignRelSize(PlannerInfo *root,
}
/*
+ * Retrieve RTE to obtain checkAsUser, which determines the user whose
+ * user mapping is used.
+ */
+ rte = planner_rt_fetch(baserel->relid, root);
+ fpinfo->checkAsUser = rte->checkAsUser;
+
+ /*
* If the table or the server is configured to use remote estimates,
* identify which user to do remote access as during planning. This
* should match what ExecCheckRTEPerms() does. If we fail due to lack of
@@ -435,7 +483,6 @@ postgresGetForeignRelSize(PlannerInfo *root,
*/
if (fpinfo->use_remote_estimate)
{
- RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
Oid userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
fpinfo->user = GetUserMapping(userid, fpinfo->server->serverid);
@@ -752,6 +799,8 @@ postgresGetForeignPlan(PlannerInfo *root,
List *retrieved_attrs;
StringInfoData sql;
ListCell *lc;
+ List *fdw_ps_tlist = NIL;
+ ForeignScan *scan;
/*
* Separate the scan_clauses into those that can be executed remotely and
@@ -769,7 +818,7 @@ postgresGetForeignPlan(PlannerInfo *root,
* This code must match "extract_actual_clauses(scan_clauses, false)"
* except for the additional decision about remote versus local execution.
* Note however that we only strip the RestrictInfo nodes from the
- * local_exprs list, since appendWhereClause expects a list of
+ * local_exprs list, since appendConditions expects a list of
* RestrictInfos.
*/
foreach(lc, scan_clauses)
@@ -797,64 +846,127 @@ postgresGetForeignPlan(PlannerInfo *root,
* expressions to be sent as parameters.
*/
initStringInfo(&sql);
- deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
- &retrieved_attrs);
- if (remote_conds)
- appendWhereClause(&sql, root, baserel, remote_conds,
- true, &params_list);
-
- /*
- * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
- * initial row fetch, rather than later on as is done for local tables.
- * The extra roundtrips involved in trying to duplicate the local
- * semantics exactly don't seem worthwhile (see also comments for
- * RowMarkType).
- *
- * Note: because we actually run the query as a cursor, this assumes that
- * DECLARE CURSOR ... FOR UPDATE is supported, which it isn't before 8.3.
- */
- if (baserel->relid == root->parse->resultRelation &&
- (root->parse->commandType == CMD_UPDATE ||
- root->parse->commandType == CMD_DELETE))
- {
- /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
- appendStringInfoString(&sql, " FOR UPDATE");
- }
- else
+ if (scan_relid > 0)
{
- RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+ deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
+ &retrieved_attrs);
+ if (remote_conds)
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ remote_conds, " WHERE ", &params_list);
- if (rc)
+ /*
+ * Add FOR UPDATE/SHARE if appropriate. We apply locking during the
+ * initial row fetch, rather than later on as is done for local tables.
+ * The extra roundtrips involved in trying to duplicate the local
+ * semantics exactly don't seem worthwhile (see also comments for
+ * RowMarkType).
+ *
+ * Note: because we actually run the query as a cursor, this assumes
+ * that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
+ * before 8.3.
+ */
+ if (baserel->relid == root->parse->resultRelation &&
+ (root->parse->commandType == CMD_UPDATE ||
+ root->parse->commandType == CMD_DELETE))
{
- /*
- * Relation is specified as a FOR UPDATE/SHARE target, so handle
- * that.
- *
- * For now, just ignore any [NO] KEY specification, since (a) it's
- * not clear what that means for a remote table that we don't have
- * complete information about, and (b) it wouldn't work anyway on
- * older remote servers. Likewise, we don't worry about NOWAIT.
- */
- switch (rc->strength)
+ /* Relation is UPDATE/DELETE target, so use FOR UPDATE */
+ appendStringInfoString(&sql, " FOR UPDATE");
+ }
+ else
+ {
+ RowMarkClause *rc = get_parse_rowmark(root->parse, baserel->relid);
+
+ if (rc)
{
- case LCS_FORKEYSHARE:
- case LCS_FORSHARE:
- appendStringInfoString(&sql, " FOR SHARE");
- break;
- case LCS_FORNOKEYUPDATE:
- case LCS_FORUPDATE:
- appendStringInfoString(&sql, " FOR UPDATE");
- break;
+ /*
+ * Relation is specified as a FOR UPDATE/SHARE target, so handle
+ * that.
+ *
+ * For now, just ignore any [NO] KEY specification, since (a)
+ * it's not clear what that means for a remote table that we
+ * don't have complete information about, and (b) it wouldn't
+ * work anyway on older remote servers. Likewise, we don't
+ * worry about NOWAIT.
+ */
+ switch (rc->strength)
+ {
+ case LCS_FORKEYSHARE:
+ case LCS_FORSHARE:
+ appendStringInfoString(&sql, " FOR SHARE");
+ break;
+ case LCS_FORNOKEYUPDATE:
+ case LCS_FORUPDATE:
+ appendStringInfoString(&sql, " FOR UPDATE");
+ break;
+ }
}
}
}
+ else
+ {
+ /* Join case */
+ Path *path_o;
+ Path *path_i;
+ const char *sql_o;
+ const char *sql_i;
+ ForeignScan *plan_o;
+ ForeignScan *plan_i;
+ JoinType jointype;
+ List *joinclauses;
+ List *otherclauses;
+ int i;
+
+ /*
+ * Retrieve information from fdw_private.
+ */
+ path_o = list_nth(best_path->fdw_private, FdwPathPrivateOuterPath);
+ path_i = list_nth(best_path->fdw_private, FdwPathPrivateInnerPath);
+ jointype = intVal(list_nth(best_path->fdw_private,
+ FdwPathPrivateJoinType));
+ joinclauses = list_nth(best_path->fdw_private,
+ FdwPathPrivateJoinClauses);
+ otherclauses = list_nth(best_path->fdw_private,
+ FdwPathPrivateOtherClauses);
+
+ /*
+ * Construct the remote query from the bottom to the top. ForeignScan
+ * plan nodes of the underlying scans are not necessary for executing the
+ * plan tree, but they are handy for constructing the remote query recursively.
+ */
+ plan_o = (ForeignScan *) create_plan_recurse(root, path_o);
+ Assert(IsA(plan_o, ForeignScan));
+ sql_o = strVal(list_nth(plan_o->fdw_private, FdwScanPrivateSelectSql));
+
+ plan_i = (ForeignScan *) create_plan_recurse(root, path_i);
+ Assert(IsA(plan_i, ForeignScan));
+ sql_i = strVal(list_nth(plan_i->fdw_private, FdwScanPrivateSelectSql));
+
+ deparseJoinSql(&sql, root, baserel, path_o, path_i, plan_o, plan_i,
+ sql_o, sql_i, jointype, joinclauses, otherclauses,
+ &fdw_ps_tlist);
+ retrieved_attrs = NIL;
+ for (i = 0; i < list_length(fdw_ps_tlist); i++)
+ retrieved_attrs = lappend_int(retrieved_attrs, i + 1);
+ }
/*
* Build the fdw_private list that will be available to the executor.
* Items in the list must match enum FdwScanPrivateIndex, above.
*/
- fdw_private = list_make2(makeString(sql.data),
- retrieved_attrs);
+ fdw_private = list_make2(makeString(sql.data), retrieved_attrs);
+
+ /*
+ * In the pseudo-scan case, such as join push-down, add the server OID and
+ * checkAsUser as extra information.
+ * XXX: passing serverid and checkAsUser in all cases, both simple scans
+ * and join push-down, might simplify the code.
+ */
+ if (scan_relid == 0)
+ {
+ fdw_private = lappend(fdw_private,
+ makeInteger(fpinfo->server->serverid));
+ fdw_private = lappend(fdw_private, makeInteger(fpinfo->checkAsUser));
+ }
/*
* Create the ForeignScan node from target list, local filtering
@@ -864,11 +976,18 @@ postgresGetForeignPlan(PlannerInfo *root,
* field of the finished plan node; we can't keep them in private state
* because then they wouldn't be subject to later planner processing.
*/
- return make_foreignscan(tlist,
+ scan = make_foreignscan(tlist,
local_exprs,
scan_relid,
params_list,
fdw_private);
+
+ /*
+ * Set fdw_ps_tlist so that tuples generated by this scan can be handled.
+ */
+ scan->fdw_ps_tlist = fdw_ps_tlist;
+
+ return scan;
}
/*
@@ -881,9 +1000,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
EState *estate = node->ss.ps.state;
PgFdwScanState *fsstate;
- RangeTblEntry *rte;
+ Oid serverid;
Oid userid;
- ForeignTable *table;
ForeignServer *server;
UserMapping *user;
int numParams;
@@ -903,22 +1021,51 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
node->fdw_state = (void *) fsstate;
/*
- * Identify which user to do the remote access as. This should match what
- * ExecCheckRTEPerms() does.
+ * Initialize fsstate.
+ *
+ * The following values must be determined:
+ * - fsstate->rel, NULL if there is no actual relation
+ * - serverid, OID of the foreign server to use for the scan
+ * - userid, user whose user mapping is searched
*/
- rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
- userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+ if (fsplan->scan.scanrelid > 0)
+ {
+ /* Simple foreign table scan */
+ RangeTblEntry *rte;
+ ForeignTable *table;
- /* Get info about foreign table. */
- fsstate->rel = node->ss.ss_currentRelation;
- table = GetForeignTable(RelationGetRelid(fsstate->rel));
- server = GetForeignServer(table->serverid);
- user = GetUserMapping(userid, server->serverid);
+ /*
+ * Identify which user to do the remote access as. This should match
+ * what ExecCheckRTEPerms() does.
+ */
+ rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
+ userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
+
+ /* Get info about foreign table. */
+ fsstate->rel = node->ss.ss_currentRelation;
+ table = GetForeignTable(RelationGetRelid(fsstate->rel));
+ serverid = table->serverid;
+ }
+ else
+ {
+ Oid checkAsUser;
+
+ /* Join */
+ fsstate->rel = NULL; /* No actual relation to scan */
+
+ serverid = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivateServerOid));
+ checkAsUser = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivatecheckAsUser));
+ userid = checkAsUser ? checkAsUser : GetUserId();
+ }
/*
* Get connection to the foreign server. Connection manager will
* establish new connection if necessary.
*/
+ server = GetForeignServer(serverid);
+ user = GetUserMapping(userid, server->serverid);
fsstate->conn = GetConnection(server, user, false);
/* Assign a unique ID for my cursor */
@@ -929,7 +1076,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
fsstate->query = strVal(list_nth(fsplan->fdw_private,
FdwScanPrivateSelectSql));
fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
- FdwScanPrivateRetrievedAttrs);
+ FdwScanPrivateRetrievedAttrs);
/* Create contexts for batches of tuples and per-tuple temp workspace. */
fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -944,7 +1091,11 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
ALLOCSET_SMALL_MAXSIZE);
/* Get info we'll need for input data conversion. */
- fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
+ if (fsplan->scan.scanrelid > 0)
+ fsstate->tupdesc = RelationGetDescr(fsstate->rel);
+ else
+ fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+ fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc);
/* Prepare for output conversion of parameters used in remote query. */
numParams = list_length(fsplan->fdw_exprs);
@@ -1747,11 +1898,13 @@ estimate_path_cost_size(PlannerInfo *root,
deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
&retrieved_attrs);
if (fpinfo->remote_conds)
- appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
- true, NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ fpinfo->remote_conds, " WHERE ", NULL);
if (remote_join_conds)
- appendWhereClause(&sql, root, baserel, remote_join_conds,
- (fpinfo->remote_conds == NIL), NULL);
+ appendConditions(&sql, root, baserel, NULL, NULL,
+ remote_join_conds,
+ fpinfo->remote_conds == NIL ? " WHERE " : " AND ",
+ NULL);
/* Get the remote estimate */
conn = GetConnection(fpinfo->server, fpinfo->user, false);
@@ -2052,6 +2205,7 @@ fetch_more_data(ForeignScanState *node)
fsstate->tuples[i] =
make_tuple_from_result_row(res, i,
fsstate->rel,
+ fsstate->tupdesc,
fsstate->attinmeta,
fsstate->retrieved_attrs,
fsstate->temp_cxt);
@@ -2270,6 +2424,7 @@ store_returning_result(PgFdwModifyState *fmstate,
newtup = make_tuple_from_result_row(res, 0,
fmstate->rel,
+ RelationGetDescr(fmstate->rel),
fmstate->attinmeta,
fmstate->retrieved_attrs,
fmstate->temp_cxt);
@@ -2562,6 +2717,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
astate->rows[pos] = make_tuple_from_result_row(res, row,
astate->rel,
+ RelationGetDescr(astate->rel),
astate->attinmeta,
astate->retrieved_attrs,
astate->temp_cxt);
@@ -2835,6 +2991,246 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
}
/*
+ * Construct PgFdwRelationInfo from two join sources
+ */
+static PgFdwRelationInfo *
+merge_fpinfo(PgFdwRelationInfo *fpinfo_o,
+ PgFdwRelationInfo *fpinfo_i,
+ JoinType jointype)
+{
+ PgFdwRelationInfo *fpinfo;
+
+ fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
+ fpinfo->remote_conds = list_concat(copyObject(fpinfo_o->remote_conds),
+ copyObject(fpinfo_i->remote_conds));
+ fpinfo->local_conds = list_concat(copyObject(fpinfo_o->local_conds),
+ copyObject(fpinfo_i->local_conds));
+
+ fpinfo->attrs_used = NULL; /* Use fdw_ps_tlist */
+ fpinfo->local_conds_cost.startup = fpinfo_o->local_conds_cost.startup +
+ fpinfo_i->local_conds_cost.startup;
+ fpinfo->local_conds_cost.per_tuple = fpinfo_o->local_conds_cost.per_tuple +
+ fpinfo_i->local_conds_cost.per_tuple;
+ fpinfo->local_conds_sel = fpinfo_o->local_conds_sel *
+ fpinfo_i->local_conds_sel;
+ if (jointype == JOIN_INNER)
+ fpinfo->rows = Min(fpinfo_o->rows, fpinfo_i->rows);
+ else
+ fpinfo->rows = Max(fpinfo_o->rows, fpinfo_i->rows);
+
+ /* XXX we should consider only columns in fdw_ps_tlist */
+ fpinfo->width = fpinfo_o->width + fpinfo_i->width;
+ /* XXX we should estimate better costs */
+
+ fpinfo->use_remote_estimate = false; /* Never use in join case */
+ fpinfo->fdw_startup_cost = fpinfo_o->fdw_startup_cost;
+ fpinfo->fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
+
+ fpinfo->startup_cost = fpinfo->fdw_startup_cost;
+ fpinfo->total_cost =
+ fpinfo->startup_cost + fpinfo->fdw_tuple_cost * fpinfo->rows;
+
+ fpinfo->table = NULL; /* always NULL in join case */
+ fpinfo->server = fpinfo_o->server;
+ fpinfo->user = fpinfo_o->user ? fpinfo_o->user : fpinfo_i->user;
+ /* checkAsUser must be identical */
+ fpinfo->checkAsUser = fpinfo_o->checkAsUser;
+
+ return fpinfo;
+}
+
+/*
+ * postgresGetForeignJoinPaths
+ * Add possible ForeignPath to joinrel.
+ *
+ * Joins that satisfy the conditions below can be pushed down to the remote PostgreSQL server.
+ *
+ * 1) Join type is inner or outer
+ * 2) Join conditions consist of remote-safe expressions.
+ * 3) Join source relations don't have any local filter.
+ */
+static void
+postgresGetForeignJoinPaths(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ SemiAntiJoinFactors *semifactors,
+ List *restrictlist,
+ Relids extra_lateral_rels)
+{
+ ForeignPath *joinpath;
+ ForeignPath *path_o = (ForeignPath *) outerrel->cheapest_total_path;
+ ForeignPath *path_i = (ForeignPath *) innerrel->cheapest_total_path;
+ PgFdwRelationInfo *fpinfo_o;
+ PgFdwRelationInfo *fpinfo_i;
+ PgFdwRelationInfo *fpinfo;
+ double rows;
+ Cost startup_cost;
+ Cost total_cost;
+ ListCell *lc;
+ List *fdw_private;
+ List *joinclauses;
+ List *otherclauses;
+
+ /*
+ * Both outer and inner paths should be ForeignPaths.
+ * If either of the underlying relations is a SubqueryScan or similar,
+ * RelOptInfo.fdw_handler of that relation is set to InvalidOid to indicate
+ * that it can't be handled by the FDW.
+ */
+ Assert(IsA(path_o, ForeignPath) && IsA(path_i, ForeignPath));
+
+ joinclauses = restrictlist;
+ if (IS_OUTER_JOIN(jointype))
+ {
+ extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
+ }
+ else
+ {
+ joinclauses = extract_actual_clauses(joinclauses, false);
+ otherclauses = NIL;
+ }
+
+ /*
+ * Currently we don't push down joins in UPDATE/DELETE queries. This
+ * restriction might be relaxed in a later release.
+ */
+ if (root->parse->commandType != CMD_SELECT)
+ {
+ ereport(DEBUG3, (errmsg("command type is not SELECT")));
+ return;
+ }
+
+ /*
+ * Skip the reversed combination of an inner join.
+ * Other join types have a different meaning when outer and inner are
+ * swapped.
+ */
+ if (jointype == JOIN_INNER && outerrel->relid < innerrel->relid)
+ {
+ ereport(DEBUG3, (errmsg("reversed combination of INNER JOIN")));
+ return;
+ }
+
+ /*
+ * Both relations in the join must belong to the same server.
+ */
+ fpinfo_o = path_o->path.parent->fdw_private;
+ fpinfo_i = path_i->path.parent->fdw_private;
+ if (fpinfo_o->server->serverid != fpinfo_i->server->serverid)
+ {
+ ereport(DEBUG3, (errmsg("server mismatch")));
+ return;
+ }
+
+ /*
+ * We support all outer joins in addition to inner join.
+ */
+ if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+ jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+ {
+ ereport(DEBUG3, (errmsg("unsupported join type (SEMI, ANTI)")));
+ return;
+ }
+
+ /*
+ * Note that CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+ * with empty joinclauses. Pushing down a CROSS JOIN would transfer more rows
+ * than retrieving each table separately, so we don't push down such joins.
+ */
+ if (jointype == JOIN_INNER && joinclauses == NIL)
+ {
+ ereport(DEBUG3, (errmsg("unsupported join type (CROSS)")));
+ return;
+ }
+
+ /*
+ * Neither source relation can have local conditions. This can be relaxed
+ * if the join is an inner join and the local conditions don't contain volatile
+ * functions/operators, but for now we leave it as a future enhancement.
+ */
+ if (fpinfo_o->local_conds != NULL || fpinfo_i->local_conds != NULL)
+ {
+ ereport(DEBUG3, (errmsg("join with local filter is not supported")));
+ return;
+ }
+
+ /*
+ * Join condition must be safe to push down.
+ */
+ foreach(lc, joinclauses)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (!is_foreign_expr(root, joinrel, expr))
+ {
+ ereport(DEBUG3, (errmsg("one of join conditions is not safe to push-down")));
+ return;
+ }
+ }
+
+ /*
+ * Other conditions evaluated on the remote side must also be safe to push down.
+ */
+ foreach(lc, otherclauses)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (!is_foreign_expr(root, joinrel, expr))
+ {
+ ereport(DEBUG3, (errmsg("one of filter conditions is not safe to push-down")));
+ return;
+ }
+ }
+
+ /*
+ * checkAsUser of the source paths must match.
+ */
+ if (fpinfo_o->checkAsUser != fpinfo_i->checkAsUser)
+ {
+ ereport(DEBUG3, (errmsg("checkAsUser mismatch")));
+ return;
+ }
+
+ /* Here we know that this join can be pushed down to the remote side. */
+
+ /* Construct fpinfo for the join relation */
+ fpinfo = merge_fpinfo(fpinfo_o, fpinfo_i, jointype);
+ joinrel->fdw_private = fpinfo;
+
+ /* TODO determine cost and rows of the join. */
+ rows = fpinfo->rows;
+ startup_cost = fpinfo->startup_cost;
+ total_cost = fpinfo->total_cost;
+
+ fdw_private = list_make4(path_o,
+ path_i,
+ makeInteger(jointype),
+ joinclauses);
+ fdw_private = lappend(fdw_private, otherclauses);
+
+ /*
+ * Create a new join path and add it to the joinrel which represents a join
+ * between foreign tables.
+ */
+ joinpath = create_foreignscan_path(root,
+ joinrel,
+ rows,
+ startup_cost,
+ total_cost,
+ NIL, /* no pathkeys */
+ NULL, /* no required_outer */
+ fdw_private);
+
+ /* Add generated path into joinrel by add_path(). */
+ add_path(joinrel, (Path *) joinpath);
+
+ /* TODO consider parameterized paths */
+}
+
+/*
* Create a tuple from the specified row of the PGresult.
*
* rel is the local representation of the foreign table, attinmeta is
@@ -2846,12 +3242,12 @@ static HeapTuple
make_tuple_from_result_row(PGresult *res,
int row,
Relation rel,
+ TupleDesc tupdesc,
AttInMetadata *attinmeta,
List *retrieved_attrs,
MemoryContext temp_context)
{
HeapTuple tuple;
- TupleDesc tupdesc = RelationGetDescr(rel);
Datum *values;
bool *nulls;
ItemPointer ctid = NULL;
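The if/else in merge_fpinfo() above picks a join's row estimate by join type (the smaller input for an inner join, the larger for an outer join) and prices it with the outer side's fdw_startup_cost and fdw_tuple_cost. The sketch below reproduces only that arithmetic; RelEstimate, JoinEstimate and estimate_foreign_join() are hypothetical names, and the real function also merges conditions, widths and server information.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-side fields merge_fpinfo() reads. */
typedef struct RelEstimate
{
    double rows;
    double fdw_startup_cost;
    double fdw_tuple_cost;
} RelEstimate;

/* Hypothetical result holding the merged join estimate. */
typedef struct JoinEstimate
{
    double rows;
    double startup_cost;
    double total_cost;
} JoinEstimate;

static double min_double(double a, double b) { return a < b ? a : b; }
static double max_double(double a, double b) { return a > b ? a : b; }

/*
 * Inner join: use the smaller input's row count.
 * Outer join: use the larger input's row count.
 * These are rough heuristics, not bounds. Costs use the outer side's
 * FDW cost parameters.
 */
static JoinEstimate
estimate_foreign_join(RelEstimate outer, RelEstimate inner, bool inner_join)
{
    JoinEstimate est;

    assert(outer.rows >= 0 && inner.rows >= 0);
    est.rows = inner_join ? min_double(outer.rows, inner.rows)
                          : max_double(outer.rows, inner.rows);
    est.startup_cost = outer.fdw_startup_cost;
    est.total_cost = est.startup_cost + outer.fdw_tuple_cost * est.rows;
    return est;
}
```

With illustrative inputs of 50 outer and 33 inner rows, a startup cost of 100 and a per-tuple cost of 0.5, the inner-join estimate is 33 rows at a total cost of 116.5, and the outer-join estimate is 50 rows at 125.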
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 950c6f7..3e0c7fa 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -16,6 +16,7 @@
#include "foreign/foreign.h"
#include "lib/stringinfo.h"
#include "nodes/relation.h"
+#include "nodes/plannodes.h"
#include "utils/relcache.h"
#include "libpq-fe.h"
@@ -52,12 +53,27 @@ extern void deparseSelectSql(StringInfo buf,
RelOptInfo *baserel,
Bitmapset *attrs_used,
List **retrieved_attrs);
-extern void appendWhereClause(StringInfo buf,
+extern void appendConditions(StringInfo buf,
PlannerInfo *root,
RelOptInfo *baserel,
+ List *outertlist,
+ List *innertlist,
List *exprs,
- bool is_first,
+ const char *prefix,
List **params);
+extern void deparseJoinSql(StringInfo sql,
+ PlannerInfo *root,
+ RelOptInfo *baserel,
+ Path *path_o,
+ Path *path_i,
+ ForeignScan *plan_o,
+ ForeignScan *plan_i,
+ const char *sql_o,
+ const char *sql_i,
+ JoinType jointype,
+ List *joinclauses,
+ List *otherclauses,
+ List **retrieved_attrs);
extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
List *targetAttrs, List *returningList,
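The header change replaces appendWhereClause()'s `bool is_first` with a `const char *prefix` argument, so callers such as estimate_path_cost_size() pass " WHERE " or " AND " explicitly. A hypothetical string-buffer sketch of that convention follows, assuming clauses arrive already deparsed as text (the real function deparses expression nodes into a StringInfo); append_conditions() here is illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append already-deparsed clauses to the
 * NUL-terminated string in buf. The first clause is introduced by the
 * caller's prefix (e.g. " WHERE " or " AND "), later ones by " AND ",
 * and each clause is wrapped in parentheses.
 */
static void
append_conditions(char *buf, size_t buflen, const char *const *clauses,
                  int nclauses, const char *prefix)
{
    size_t used = strlen(buf);

    for (int i = 0; i < nclauses; i++)
    {
        int n = snprintf(buf + used, buflen - used, "%s(%s)",
                         i == 0 ? prefix : " AND ", clauses[i]);

        /* Fail loudly instead of silently truncating the query. */
        assert(n >= 0 && used + (size_t) n < buflen);
        used += (size_t) n;
    }
}
```

Appending the two conditions from the earlier expected output, first with " WHERE " and then with " AND ", reproduces its `WHERE (("C 1" > 10)) AND ((date(c5) = '1970-01-17'::date))` tail.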
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83e8fa7..db95e4e 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -39,6 +39,18 @@ CREATE TABLE "S 1"."T 2" (
c2 text,
CONSTRAINT t2_pkey PRIMARY KEY (c1)
);
+CREATE TABLE "S 1"."T 4" (
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text,
+ CONSTRAINT t4_pkey PRIMARY KEY (c1)
+);
+CREATE TABLE "S 1"."T 5" (
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c4 text,
+ CONSTRAINT t5_pkey PRIMARY KEY (c1)
+);
INSERT INTO "S 1"."T 1"
SELECT id,
@@ -54,9 +66,23 @@ INSERT INTO "S 1"."T 2"
SELECT id,
'AAA' || to_char(id, 'FM000')
FROM generate_series(1, 100) id;
+INSERT INTO "S 1"."T 4"
+ SELECT id,
+ id + 1,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+DELETE FROM "S 1"."T 4" WHERE c1 % 2 != 0; -- delete for outer join tests
+INSERT INTO "S 1"."T 5"
+ SELECT id,
+ id + 1,
+ 'AAA' || to_char(id, 'FM000')
+ FROM generate_series(1, 100) id;
+DELETE FROM "S 1"."T 5" WHERE c1 % 3 != 0; -- delete for outer join tests
ANALYZE "S 1"."T 1";
ANALYZE "S 1"."T 2";
+ANALYZE "S 1"."T 4";
+ANALYZE "S 1"."T 5";
-- ===================================================================
-- create foreign tables
@@ -87,6 +113,18 @@ CREATE FOREIGN TABLE ft2 (
) SERVER loopback;
ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
+CREATE FOREIGN TABLE ft4 (
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');
+
+CREATE FOREIGN TABLE ft5 (
+ c1 int NOT NULL,
+ c2 int NOT NULL,
+ c3 text
+) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 5');
+
-- ===================================================================
-- tests for validator
-- ===================================================================
@@ -158,8 +196,6 @@ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE;
SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE;
-- aggregate
SELECT COUNT(*) FROM ft1 t1;
--- join two tables
-SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-- subquery
SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
-- subquery+MAX
@@ -216,6 +252,38 @@ SELECT * FROM ft1 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft2 WHERE c1 < 5));
SELECT * FROM ft2 WHERE c1 = ANY (ARRAY(SELECT c1 FROM ft1 WHERE c1 < 5));
-- ===================================================================
+-- JOIN queries
+-- ===================================================================
+-- join two tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+-- join three tables
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1, t3.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c1, t3.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+-- left outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+-- right outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 RIGHT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t2.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft4 t1 RIGHT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t2.c1 OFFSET 10 LIMIT 10;
+-- full outer join
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
+-- full outer join + WHERE clause, only matched rows
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1;
+SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1;
+-- join at WHERE clause
+EXPLAIN (COSTS false, VERBOSE)
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+
+-- ===================================================================
-- parameterized queries
-- ===================================================================
-- simple join
Actually val and val2 come from public.lt on the "r" side, but as you say
it's too difficult to tell that from the EXPLAIN output. Do you have any
idea how to make the "Output" item more readable?
A fundamental reason why we need symbolic aliases here is that
postgres_fdw holds the remote query in cstring form. That makes it
complicated to deconstruct and reconstruct a query that has already been
built at the underlying foreign-path level.
If ForeignScan keeps items to construct remote query in expression node
form (and construction of remote query is delayed to beginning of the
executor, probably), we will be able to construct more human readable
remote query.
However, I don't recommend to work on this great refactoring stuff
within the scope of join push-down support project.
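To illustrate the idea above, here is a minimal C sketch (not postgres_fdw code) that keeps a two-way join in structured form and renders the SQL text only at the very end, so relation aliases can be chosen at deparse time instead of emitting generated a_N column aliases. The types JoinQuery, RemoteRel and ColRef and the functions append(), append_col() and deparse_join() are hypothetical, and using the local foreign-table names as remote aliases is an assumption made purely for readability.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical structured form of a two-way join (not postgres_fdw's types). */
typedef struct { const char *table; const char *alias; } RemoteRel;
typedef struct { int rel; const char *column; } ColRef;   /* rel indexes rels[] */

typedef struct
{
    RemoteRel   rels[2];
    const char *jointype;      /* e.g. "INNER JOIN" */
    ColRef      targets[4];
    int         ntargets;
    ColRef      lhs, rhs;      /* a single join clause: lhs = rhs */
} JoinQuery;

/* Append formatted text at *off; output is truncated rather than overrun. */
static void append(char *buf, size_t len, size_t *off, const char *fmt, ...)
{
    va_list ap;
    int     n;

    if (*off >= len)
        return;
    va_start(ap, fmt);
    n = vsnprintf(buf + *off, len - *off, fmt, ap);
    va_end(ap);
    if (n > 0)
        *off += (size_t) n;
}

/* Print a column reference qualified by its relation's alias. */
static void append_col(char *buf, size_t len, size_t *off,
                       const JoinQuery *q, ColRef c)
{
    append(buf, len, off, "%s.%s", q->rels[c.rel].alias, c.column);
}

/* Render the SQL text only now, from the structured description. */
static size_t deparse_join(const JoinQuery *q, char *buf, size_t len)
{
    size_t off = 0;

    assert(len > 0);
    buf[0] = '\0';
    append(buf, len, &off, "SELECT ");
    for (int i = 0; i < q->ntargets; i++)
    {
        if (i > 0)
            append(buf, len, &off, ", ");
        append_col(buf, len, &off, q, q->targets[i]);
    }
    append(buf, len, &off, " FROM %s %s %s %s %s ON (",
           q->rels[0].table, q->rels[0].alias, q->jointype,
           q->rels[1].table, q->rels[1].alias);
    append_col(buf, len, &off, q, q->lhs);
    append(buf, len, &off, " = ");
    append_col(buf, len, &off, q, q->rhs);
    append(buf, len, &off, ")");
    return strlen(buf);
}
```

For a JoinQuery describing public.lt joined to itself on val with aliases ft1 and ft2 and targets ft1.val, ft1.val2, ft2.val2, deparse_join() produces `SELECT ft1.val, ft1.val2, ft2.val2 FROM public.lt ft1 INNER JOIN public.lt ft2 ON (ft1.val = ft2.val)`. Because nothing is flattened into a string until the end, a caller could just as easily choose different aliases or reorder the target list.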
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Shigeru Hanada
Sent: Thursday, March 05, 2015 10:00 PM
To: Ashutosh Bapat
Cc: Kaigai Kouhei(海外 浩平); Robert Haas; PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tables
Hi Ashutosh, thanks for the review.
2015-03-04 19:17 GMT+09:00 Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>:
In create_foreignscan_path() we have lines like:
    pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
                                                          required_outer);
Now that the same function is being used to create foreign scan paths
for joins, we should call get_joinrel_parampathinfo() on a join rel
and get_baserel_parampathinfo() on a base rel.
Got it. Please let me check the difference.
The patch seems to handle all the restriction clauses in the same way. There
are two kinds of restriction clauses: a. join quals (specified in the ON
clause; the optimizer might move them to the other class if that doesn't affect
correctness) and b. quals on the join relation (specified in the WHERE clause;
the optimizer might move them to the other class if that doesn't affect
correctness). The quals in "a" are applied while the join is being computed,
whereas those in "b" are applied after the join is computed. For example,
postgres=# select * from lt;
val | val2
-----+------
1 | 2
1 | 3
(2 rows)
postgres=# select * from lt2;
val | val2
-----+------
1 | 2
(1 row)
postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
1 | 3 | |
(2 rows)
postgres=# select * from lt left join lt2 on (true) where (lt.val2 = lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
(1 row)
The difference between these two kinds is evident in the case of outer joins;
for an inner join, the optimizer puts all of them in class "b". The remote query
sent to the foreign server has all of them in the ON clause. Consider foreign
tables ft1 and ft2 pointing to local tables on the same server.
postgres=# \d ft1
Foreign table "public.ft1"
Column | Type | Modifiers | FDW Options
--------+---------+-----------+-------------
val | integer | |
val2 | integer | |
Server: loopback
FDW Options: (table_name 'lt')
postgres=# \d ft2
Foreign table "public.ft2"
Column | Type | Modifiers | FDW Options
--------+---------+-----------+-------------
val | integer | |
val2 | integer | |
Server: loopback
FDW Options: (table_name 'lt2')
postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
QUERY PLAN
------------------------------------------------------------------------------
Foreign Scan (cost=100.00..125.60 rows=2560 width=16)
Output: val, val2, val, val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM
public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r
(a_0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 =
l.a_1))
(3 rows)
The result is then wrong:
postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
ft1.val + ft2.val > ft1.val2 or ft2.val is null;
val | val2 | val | val2
-----+------+-----+------
1 | 2 | |
1 | 3 | |
(2 rows)
which should match the result obtained by substituting local tables for
the foreign ones:
postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
lt.val + lt2.val > lt.val2 or lt2.val is null;
val | val2 | val | val2
-----+------+-----+------
1 | 3 | |
(1 row)
Once we start distinguishing the two kinds of quals, some optimization
becomes possible. For pushing down a join it's essential that all the
quals in "a" are safe to be pushed down. But a join can still be pushed down
even if some quals in "b" are not safe to be pushed down, since those can be
applied locally after the join. And the more clauses are pushed down to the
foreign server, the fewer rows are fetched from it.
In postgresGetForeignJoinPath, instead of checking that all the restriction
clauses are safe to be pushed down, we need to check only those which are
join quals (class "a").
The argument restrictlist of GetForeignJoinPaths contains both kinds of
conditions mixed together, so I used extract_actual_join_clauses() to separate
it into two lists, join quals and other clauses. This is similar to
what create_nestloop_plan and its siblings do.
The following EXPLAIN output seems to be confusing;
ft1 and ft2 both point to the same table lt on the foreign server.
postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2 where
ft1.val + ft1.val2 = ft2.val;
QUERY PLAN
------------------------------------------------------------------------------
Foreign Scan (cost=100.00..132.00 rows=2560 width=8)
Output: (val + val2)
Remote SQL: SELECT r.a_0, r.a_1 FROM (SELECT val, NULL FROM public.lt) l
(a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON
(((r.a_0 + r.a_1) = l.a_0))
Output just specifies val + val2; it doesn't tell where those val and val2
come from, nor is that evident from the rest of the context.
Actually val and val2 come from public.lt on the "r" side, but as you say
it's too difficult to know that from the EXPLAIN output. Do you have any
idea how to make the "Output" item more readable?
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
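The wrong result discussed in the review above can be reproduced without any server. The following C sketch is a hypothetical nested-loop LEFT JOIN over the two sample tables lt and lt2 (the names Row, JoinRow, join_qual(), where_qual() and left_join() are illustrative only). With where_in_on set to false it applies the WHERE qual after the join, as a class "b" qual should be; with where_in_on set to true it folds that qual into the ON clause, which is what the generated remote SQL does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A row of lt or lt2: (val, val2). */
typedef struct
{
    int val;
    int val2;
} Row;

/* One output row of lt LEFT JOIN lt2; inner_null marks a null-extended row. */
typedef struct
{
    Row  outer;
    Row  inner;
    bool inner_null;
} JoinRow;

/* Class "a" qual from the ON clause: lt.val2 = lt2.val2 */
static bool join_qual(const Row *o, const Row *i)
{
    return o->val2 == i->val2;
}

/*
 * Class "b" qual from the WHERE clause:
 *     lt.val + lt2.val > lt.val2 OR lt2.val IS NULL
 * If the inner side is NULL, the comparison yields NULL but the IS NULL
 * arm is true, so the whole OR is true.
 */
static bool where_qual(const Row *o, const Row *i, bool inner_null)
{
    if (inner_null)
        return true;
    return o->val + i->val > o->val2;
}

/*
 * Nested-loop LEFT JOIN writing at most cap rows into out.
 * where_in_on == false: where_qual() filters the joined rows (correct).
 * where_in_on == true:  where_qual() is ANDed into the ON clause instead,
 *                       like the remote SQL in the review (wrong).
 */
static size_t left_join(const Row *lt, size_t nlt, const Row *lt2, size_t nlt2,
                        bool where_in_on, JoinRow *out, size_t cap)
{
    size_t n = 0;

    for (size_t a = 0; a < nlt; a++)
    {
        bool matched = false;

        for (size_t b = 0; b < nlt2; b++)
        {
            bool ok = join_qual(&lt[a], &lt2[b]);

            if (ok && where_in_on)
                ok = where_qual(&lt[a], &lt2[b], false);
            if (!ok)
                continue;
            matched = true;
            /* in the correct mode the WHERE qual filters the joined row */
            if (where_in_on || where_qual(&lt[a], &lt2[b], false))
            {
                assert(n < cap);
                out[n].outer = lt[a];
                out[n].inner = lt2[b];
                out[n].inner_null = false;
                n++;
            }
        }
        /* no inner match: emit a null-extended row (WHERE still filters it) */
        if (!matched && (where_in_on || where_qual(&lt[a], NULL, true)))
        {
            assert(n < cap);
            out[n].outer = lt[a];
            out[n].inner = (Row) {0, 0};
            out[n].inner_null = true;
            n++;
        }
    }
    return n;
}
```

With the sample rows (1,2) and (1,3) in lt and (1,2) in lt2, the first mode returns only the null-extended row for (1,3), while the second returns two null-extended rows. This matches the two psql results shown in the review.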
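The clause separation and push-down rule from the review above can be modelled roughly as follows. This is a simplified, hypothetical sketch rather than planner code: MockRestrictInfo stands in for RestrictInfo, only an is_pushed_down flag is used to separate join quals from other quals (pseudoconstant clauses are ignored), and the shippable flag is an assumed stand-in for postgres_fdw's check of whether an expression is safe to send to the remote server. Following the review, every join qual must be shippable, while unshippable other quals only need to be applied locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUALS 8

/* Hypothetical stand-in for a planner RestrictInfo. */
typedef struct
{
    const char *clause;         /* textual form, for illustration only */
    bool        is_pushed_down; /* true for WHERE-level ("other") quals */
    bool        shippable;      /* assumed: safe to evaluate remotely */
} MockRestrictInfo;

typedef struct
{
    const MockRestrictInfo *join_quals[MAX_QUALS];
    size_t      njoin;
    const MockRestrictInfo *other_quals[MAX_QUALS];
    size_t      nother;
} ClauseSplit;

/* Separate join quals (class "a") from other quals (class "b"). */
static void split_clauses(const MockRestrictInfo *rl, size_t n, ClauseSplit *out)
{
    out->njoin = 0;
    out->nother = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (rl[i].is_pushed_down)
        {
            assert(out->nother < MAX_QUALS);
            out->other_quals[out->nother++] = &rl[i];
        }
        else
        {
            assert(out->njoin < MAX_QUALS);
            out->join_quals[out->njoin++] = &rl[i];
        }
    }
}

/*
 * The join is pushable only if every join qual is shippable. Unshippable
 * other quals do not block push-down; they are counted in *nlocal so the
 * caller can apply them locally to the joined rows.
 */
static bool join_pushable(const ClauseSplit *s, size_t *nlocal)
{
    *nlocal = 0;
    for (size_t i = 0; i < s->njoin; i++)
    {
        if (!s->join_quals[i]->shippable)
            return false;
    }
    for (size_t i = 0; i < s->nother; i++)
    {
        if (!s->other_quals[i]->shippable)
            (*nlocal)++;
    }
    return true;
}
```

For the example query in the review, the ON clause ft1.val2 = ft2.val2 is the only join qual, so the join stays pushable even if a WHERE qual cannot be shipped; such a qual would simply be applied locally after the join.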
Hi Kaigai-san, Hanada-san,
Attached please find a patch to print the column names prefixed by the
relation names. I haven't tested the patch fully. The same changes will be
needed for CustomPlan node specific code.
Now I am able to make sense out of the Output information:
postgres=# explain verbose select * from ft1 join ft2 using (val);
QUERY PLAN
------------------------------------------------------------------------------
Foreign Scan (cost=100.00..125.60 rows=2560 width=12)
Output: ft1.val, ft1.val2, ft2.val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_1 FROM (SELECT val, val2 FROM
public.lt) l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r
(a_0, a_1) ON ((r.a_0 = l.a_0))
(3 rows)
On Fri, Mar 6, 2015 at 6:41 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Actually val and val2 come from public.lt on the "r" side, but as you say
it's too difficult to know that from the EXPLAIN output. Do you have any
idea how to make the "Output" item more readable?
The fundamental reason we need symbolic aliases here is that
postgres_fdw holds the remote query as a C string (cstring). That makes it
complicated to deconstruct and reconstruct a query once it has been built
at the underlying foreign-path level.
If ForeignScan kept the items needed to build the remote query in
expression-node form (and construction of the remote query were delayed,
probably until executor startup), we would be able to build a more
human-readable remote query.
However, I don't recommend taking on such a large refactoring within the
scope of the join push-down support project.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
explain_relnames.patch (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9281874..840890e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -723,29 +723,37 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_SeqScan:
case T_IndexScan:
case T_IndexOnlyScan:
case T_BitmapHeapScan:
case T_TidScan:
case T_SubqueryScan:
case T_FunctionScan:
case T_ValuesScan:
case T_CteScan:
case T_WorkTableScan:
- case T_ForeignScan:
case T_CustomScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
case T_ModifyTable:
*rels_used = bms_add_member(*rels_used,
((ModifyTable *) plan)->nominalRelation);
break;
+ case T_ForeignScan:
+ {
+ ForeignScan *foreign_scan = (ForeignScan *)plan;
+ if (foreign_scan->scan.scanrelid != 0)
+ *rels_used = bms_add_member(*rels_used, foreign_scan->scan.scanrelid);
+ else
+ *rels_used = bms_add_members(*rels_used, foreign_scan->relids);
+ }
+ break;
default:
break;
}
/* initPlan-s */
if (planstate->initPlan)
ExplainPreScanSubPlans(planstate->initPlan, rels_used);
/* lefttree */
if (outerPlanState(planstate))
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index b4b2dc5..0a9bf37 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2008,20 +2008,23 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
foreach (lc, scan_plan->fdw_ps_tlist)
{
TargetEntry *tle = lfirst(lc);
if (tle->resjunk)
found_resjunk = true;
else if (found_resjunk)
elog(ERROR, "junk TLE should not apper prior to valid one");
}
+
+ /* Set the relids that are represented by this foreign scan for Explain */
+ scan_plan->relids = best_path->path.parent->relids;
}
/* Copy cost data from Path to Plan; no need to make FDW do this */
copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
/* Track FDW server-id; no need to make FDW do this */
scan_plan->fdw_handler = rel->fdw_handler;
/*
* Replace any outer-relation variables with nestloop params in the qual
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 213034b..ba278de 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -484,20 +484,23 @@ typedef struct WorkTableScan
* ----------------
*/
typedef struct ForeignScan
{
Scan scan;
Oid fdw_handler; /* OID of FDW handler */
List *fdw_exprs; /* expressions that FDW may evaluate */
List *fdw_ps_tlist; /* optional pseudo-scan tlist for FDW */
List *fdw_private; /* private data for FDW */
bool fsSystemCol; /* true if any "system column" is needed */
+ Bitmapset *relids; /* When scan.scanrelid is 0, the list of
+ * relations represented by this node
+ */
} ForeignScan;
/* ----------------
* CustomScan node
*
* The comments for ForeignScan's fdw_exprs, fdw_varmap and fdw_private fields
* apply equally to custom_exprs, custom_ps_tlist and custom_private.
* Note that since Plan trees can be copied, custom scan providers *must*
* fit all plan data they need into those fields; embedding CustomScan in
* a larger struct will not work.
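The new T_ForeignScan branch in ExplainPreScanNode() can be modelled in isolation. In the C sketch below, RelidSet is a hypothetical 64-bit stand-in for Bitmapset (range-table indexes 1 to 63 only), and MockForeignScan keeps just the two fields the patch consults. A base-relation scan contributes its scanrelid; a join-level scan (scanrelid == 0) contributes every relid it represents, which is presumably what lets EXPLAIN assign names to the joined foreign tables and print qualified columns such as ft1.val.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit stand-in for Bitmapset; holds RT indexes 1..63. */
typedef uint64_t RelidSet;

static RelidSet relid_add(RelidSet s, int rti)
{
    assert(rti > 0 && rti < 64);
    return s | ((RelidSet) 1 << rti);
}

/* Only the two ForeignScan fields the patch consults. */
typedef struct
{
    int      scanrelid;   /* 0 when the scan stands for a pushed-down join */
    RelidSet relids;      /* base relations represented when scanrelid == 0 */
} MockForeignScan;

/* Mirrors the patched T_ForeignScan case in ExplainPreScanNode(). */
static RelidSet add_foreign_scan_rels(RelidSet rels_used, const MockForeignScan *fs)
{
    if (fs->scanrelid != 0)
        return relid_add(rels_used, fs->scanrelid);   /* like bms_add_member() */
    return rels_used | fs->relids;                    /* like bms_add_members() */
}
```

Without the else branch, a join-level scan would add nothing to rels_used, which matches the unqualified "Output: val, val2" seen before the patch.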
Hi Ashutosh,
Thanks for finding what we overlooked.
There is still a problem: the new 'relids' field is not updated
in setrefs.c (where scanrelid is incremented by rtoffset).
It is easy to shift the bitmapset by rtoffset; however, I would also
like to consider another approach.
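Shifting the set is indeed straightforward. In this hypothetical C sketch, RelidSet is a 64-bit stand-in for Bitmapset, and each member rti becomes rti + rtoffset, the same adjustment setrefs.c applies to scanrelid. With a real Bitmapset one would presumably iterate the members with bms_next_member() and rebuild the set with bms_add_member().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit stand-in for Bitmapset; holds RT indexes 1..63. */
typedef uint64_t RelidSet;

/*
 * Return relids with every member rti replaced by rti + rtoffset, i.e. the
 * adjustment setrefs.c already makes to scanrelid.
 */
static RelidSet offset_relids(RelidSet relids, int rtoffset)
{
    RelidSet result = 0;

    for (int rti = 1; rti < 64; rti++)
    {
        if (relids & ((RelidSet) 1 << rti))
        {
            int newrti = rti + rtoffset;

            assert(newrti > 0 && newrti < 64);
            result |= (RelidSet) 1 << newrti;
        }
    }
    return result;
}
```

Building a new set rather than shifting in place keeps the original relids intact, which matters if the same plan node can be visited more than once.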
My idea is to add a 'List *fdw_sub_paths' field to ForeignPath to inform
the planner of the underlying foreign-scan paths (with scanrelid > 0).
create_foreignscan_plan() would call create_plan_recurse() to
construct plan nodes from the attached path nodes.
Even though these foreign-scan nodes are never actually executed,
setrefs.c can update their scanrelid in the usual way, and ExplainPreScanNode
does not need any special handling for Foreign/CustomScan
nodes.
In addition, it lets us keep information about the underlying foreign
table scans, in case the planner needs other information in a
future version (not only relids).
What do you think?
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ashutosh Bapat
Sent: Friday, March 06, 2015 7:26 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Shigeru Hanada; Robert Haas; PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tables
Hi Kaigai-san, Hanada-san,
Attached please find a patch to print the column names prefixed by the relation
names. I haven't tested the patch fully. The same changes will be needed for
CustomPlan node specific code.
Now I am able to make sense out of the Output information:
postgres=# explain verbose select * from ft1 join ft2 using (val);
QUERY PLAN
------------------------------------------------------------------------------
Foreign Scan (cost=100.00..125.60 rows=2560 width=12)
Output: ft1.val, ft1.val2, ft2.val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_1 FROM (SELECT val, val2 FROM
public.lt) l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r
(a_0, a_1) ON ((r.a_0 = l.a_0))
(3 rows)
On Mon, Mar 9, 2015 at 5:46 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Hi Ashutosh,
Thanks for finding what we overlooked.
There is still a problem: the new 'relids' field is not updated
in setrefs.c (where scanrelid is incremented by rtoffset).
It is easy to shift the bitmapset by rtoffset; however, I would also
like to consider another approach.
I only made it work for EXPLAIN; other parts still need work. Sorry
about that. If we follow the INDEX_VAR approach, we should be able to get there.
My idea is to add a 'List *fdw_sub_paths' field to ForeignPath to inform
the planner of the underlying foreign-scan paths (with scanrelid > 0).
create_foreignscan_plan() would call create_plan_recurse() to
construct plan nodes from the attached path nodes.
Even though these foreign-scan nodes are never actually executed,
setrefs.c can update their scanrelid in the usual way, and ExplainPreScanNode
does not need any special handling for Foreign/CustomScan
nodes.
In addition, it lets us keep information about the underlying foreign
table scans, in case the planner needs other information in a
future version (not only relids).
What do you think?
I am not sure about keeping planner nodes that are never turned into
execution nodes. There's no precedent for that in the current code. It could
be risky.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ashutosh Bapat
Sent: Friday, March 06, 2015 7:26 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Shigeru Hanada; Robert Haas; PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tablesHi Kaigai-san, Hanada-san,
Attached please find a patch to print the column names prefixed by the
relation
names. I haven't tested the patch fully. The same changes will be needed
for
CustomPlan node specific code.
Now I am able to make sense out of the Output information
postgres=# explain verbose select * from ft1 join ft2 using (val);
QUERY PLAN
----------------------------------------------------------------------------
---------------------------------------------------------------------------
-----------------------
Foreign Scan (cost=100.00..125.60 rows=2560 width=12)
Output: ft1.val, ft1.val2, ft2.val2
Remote SQL: SELECT r.a_0, r.a_1, l.a_1 FROM (SELECT val, val2 FROMpublic.lt)
l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1)
ON ((r.a_0 = l.a_0))
(3 rows)On Fri, Mar 6, 2015 at 6:41 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
wrote:
Actually val and val2 come from public.lt in "r" side, but as
you say
it's too difficult to know that from EXPLAIN output. Do you
have any
idea to make the "Output" item more readable?
A fundamental reason why we need to have symbolic aliases here is
that
postgres_fdw has remote query in cstring form. It makes
implementation
complicated to deconstruct/construct a query that is once
constructed
on the underlying foreign-path level.
If ForeignScan keeps items to construct remote query in expressionnode
form (and construction of remote query is delayed to beginning of
the
executor, probably), we will be able to construct more human
readable
remote query.
However, I don't recommend to work on this great refactoring stuff
within the scope of join push-down support project.Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of ShigeruHanada
Sent: Thursday, March 05, 2015 10:00 PM
To: Ashutosh Bapat
Cc: Kaigai Kouhei(海外 浩平); Robert Haas; PostgreSQL-development
Subject: Re: [HACKERS] Join push-down support for foreign tablesHi Ashutosh, thanks for the review.
2015-03-04 19:17 GMT+09:00 Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com>:
In create_foreignscan_path() we have lines like -
1587 pathnode->path.param_info =get_baserel_parampathinfo(root, rel,
1588
required_outer);
Now, that the same function is being used for creating foreignscan
paths
for joins, we should be calling get_joinrel_parampathinfo() on
a join
rel
and get_baserel_parampathinfo() on base rel.
Got it. Please let me check the difference.
The patch seems to handle all the restriction clauses in the
same
way. There
are two kinds of restriction clauses - a. join quals
(specified using
ON
clause; optimizer might move them to the other class if that
doesn't
affect
correctness) and b. quals on join relation (specified in the
WHERE
clause,
optimizer might move them to the other class if that doesn't
affect
correctness). The quals in "a" are applied while the join is
being
computed
whereas those in "b" are applied after the join is computed.
For
example,
postgres=# select * from lt;
val | val2
-----+------
1 | 2
1 | 3
(2 rows)postgres=# select * from lt2;
val | val2
-----+------
1 | 2
(1 row)postgres=# select * from lt left join lt2 on (lt.val2 =
lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
1 | 3 | |
(2 rows)postgres=# select * from lt left join lt2 on (true) where
(lt.val2
=
lt2.val2);
val | val2 | val | val2
-----+------+-----+------
1 | 2 | 1 | 2
(1 row)The difference between these two kinds is evident in case of
outer
joins,
for inner join optimizer puts all of them in class "b". The
remote
query
sent to the foreign server has all those in ON clause.
Consider foreign tables ft1 and ft2 pointing to local tables on the same server.

postgres=# \d ft1
          Foreign table "public.ft1"
 Column |  Type   | Modifiers | FDW Options
--------+---------+-----------+-------------
 val    | integer |           |
 val2   | integer |           |
Server: loopback
FDW Options: (table_name 'lt')

postgres=# \d ft2
          Foreign table "public.ft2"
 Column |  Type   | Modifiers | FDW Options
--------+---------+-----------+-------------
 val    | integer |           |
 val2   | integer |           |
Server: loopback
FDW Options: (table_name 'lt2')

postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
                                QUERY PLAN
--------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..125.60 rows=2560 width=16)
   Output: val, val2, val, val2
   Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 = l.a_1))
(3 rows)

The result is then wrong:
postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
 val | val2 | val | val2
-----+------+-----+------
   1 |    2 |     |
   1 |    3 |     |
(2 rows)

It should match the result obtained by substituting local tables for the foreign ones:
postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where lt.val + lt2.val > lt.val2 or lt2.val is null;
 val | val2 | val | val2
-----+------+-----+------
   1 |    3 |     |
(1 row)

Once we start distinguishing the two kinds of quals, some optimization becomes possible. For pushing down a join, it's essential that all the quals in "a" are safe to be pushed down. But a join can still be pushed down even if some quals in "b" are not safe to be pushed down, because those can be evaluated locally after the join. Still, the more clauses are pushed down to the foreign server, the fewer rows are fetched from it.

In postgresGetForeignJoinPath, instead of checking that all the restriction clauses are safe to be pushed down, we need to check only those which are join quals (class "a").
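The check described above can be sketched as follows, assuming each restriction clause has already been classified as a join qual or an other qual and tested for shippability. The Qual struct and classify_for_pushdown() are hypothetical illustrations, not postgres_fdw code: the join is rejected only when some join qual is unsafe, and unsafe other quals are simply kept for local evaluation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one restriction clause of a join relation. */
typedef struct
{
    const char *text;           /* deparsed form, for illustration only */
    bool        is_join_qual;   /* class "a" (ON) rather than class "b" */
    bool        safe;           /* can be evaluated on the foreign server */
} Qual;

/*
 * Decide whether the join can be pushed down.  If it can, store the indexes
 * of shippable other quals in remote[] and unsafe ones in local[].
 * Returns false (with both counts left at 0) if any join qual is unsafe.
 */
static bool classify_for_pushdown(const Qual *quals, size_t nquals,
                                  size_t *remote, size_t *nremote,
                                  size_t *local, size_t *nlocal)
{
    assert(quals != NULL || nquals == 0);
    *nremote = *nlocal = 0;

    /* Every join qual must be shippable, or the join itself cannot be. */
    for (size_t i = 0; i < nquals; i++)
        if (quals[i].is_join_qual && !quals[i].safe)
            return false;

    /* Other quals: ship the safe ones, filter the rest locally. */
    for (size_t i = 0; i < nquals; i++)
    {
        if (quals[i].is_join_qual)
            continue;           /* always part of the remote join */
        if (quals[i].safe)
            remote[(*nremote)++] = i;
        else
            local[(*nlocal)++] = i;
    }
    return true;
}
```

For instance, a safe join qual such as r.a_1 = l.a_1 combined with an other qual that calls an unsafe function still allows pushdown, with that other qual listed in local[]; if the join qual itself were the unsafe one, the function returns false.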
The argument restrictlist of GetForeignJoinPaths contains both kinds of conditions mixed together, so I added a call to extract_actual_join_clauses() to separate it into two lists, join_quals and other clauses. This is similar to what create_nestloop_plan and its siblings do.
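Once the two lists are separated, the deparser can place them differently. The sketch below is a hypothetical string-building helper (deparse_join() and its parameters are illustrative, not postgres_fdw's deparser) that puts join quals into ON and the other quals into WHERE, so the latter are applied after the outer join has produced its null-extended rows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append s to buf (capacity cap), silently truncating when full. */
static void append(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf);

    if (len + 1 < cap)
        strncat(buf, s, cap - len - 1);
}

/* Append "(q1) AND (q2) ..." for n quals. */
static void append_quals(char *buf, size_t cap, const char *const *quals, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (i > 0)
            append(buf, cap, " AND ");
        append(buf, cap, "(");
        append(buf, cap, quals[i]);
        append(buf, cap, ")");
    }
}

/*
 * Hypothetical deparser for a two-way join: join quals (class "a") go into
 * ON, quals on the join relation (class "b") go into WHERE.
 */
static void deparse_join(char *buf, size_t cap, const char *select_list,
                         const char *outer_rel, const char *join_type,
                         const char *inner_rel,
                         const char *const *join_quals, size_t njoin,
                         const char *const *other_quals, size_t nother)
{
    assert(cap > 0);
    buf[0] = '\0';
    append(buf, cap, "SELECT ");
    append(buf, cap, select_list);
    append(buf, cap, " FROM ");
    append(buf, cap, outer_rel);
    append(buf, cap, " ");
    append(buf, cap, join_type);
    append(buf, cap, " JOIN ");
    append(buf, cap, inner_rel);
    append(buf, cap, " ON (");
    if (njoin == 0)
        append(buf, cap, "TRUE");
    else
        append_quals(buf, cap, join_quals, njoin);
    append(buf, cap, ")");
    if (nother > 0)
    {
        append(buf, cap, " WHERE ");
        append_quals(buf, cap, other_quals, nother);
    }
}
```

Fed the quals of the ft1/ft2 example, this produces a remote query ending in `LEFT JOIN ... ON ((r.a_1 = l.a_1)) WHERE (((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))`. Its WHERE clause discards the matched row (1, 2, 1, 2) after the join and keeps the null-extended (1, 3, NULL, NULL), which is the single row the local query returns.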
The following EXPLAIN output seems confusing. ft1 and ft2 both point to the same table lt on a foreign server.

postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2 where ft1.val + ft1.val2 = ft2.val;
                                QUERY PLAN
--------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..132.00 rows=2560 width=8)
   Output: (val + val2)
   Remote SQL: SELECT r.a_0, r.a_1 FROM (SELECT val, NULL FROM public.lt) l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON (((r.a_0 + r.a_1) = l.a_0))
Output just specifies val + val2; it doesn't tell where those val and val2 come from, nor is that evident from the rest of the context.

Actually val and val2 come from public.lt on the "r" side, but as you say, it's too difficult to tell that from the EXPLAIN output. Do you have any idea how to make the "Output" item more readable?
--
Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Thanks for finding what we overlooked.
There is still a problem: the new 'relids' field is not updated in setrefs.c (scanrelid is incremented by rtoffset there).
It is easy to shift the bitmapset by rtoffset; however, I would also like to see another approach.

I just made it work for EXPLAIN, but other parts still need work. Sorry about that. If we follow INDEX_VAR, we should be able to get there.
I tried to modify your patch a bit as below:
* add adjustment of bitmap fields on setrefs.c
* add support on outfuncs.c and copyfuncs.c.
* add bms_shift_members() in bitmapset.c
I think it is a reasonable enhancement; however, it has not been tested with real-life code such as postgres_fdw.
Hanada-san, could you add a feature to postgresExplainForeignScan() that prints the names of the foreign tables involved in remote queries?
ForeignScan->fdw_relids and ExplainState->rtable_names will tell you which joined foreign tables are replaced by the (pseudo) foreign scan.
I'll also update the interface patch soon.
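The suggested EXPLAIN addition amounts to walking the relid set and looking up each range-table entry's name. This standalone sketch makes two simplifying assumptions: a uint64_t stands in for the Bitmapset, and a plain array stands in for the rtable_names list (real code would iterate with something like bms_next_member()). The function name and the "Relations:" label are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "Relations: ft1, ft2" from a set of range-table indexes.
 * relids: bit k set means range-table entry k (1-based) is covered by the scan.
 * rtable_names[k - 1] is the name EXPLAIN uses for entry k.
 */
static void format_relations(char *buf, size_t cap, uint64_t relids,
                             const char *const *rtable_names, int nrtable)
{
    size_t len;
    int first = 1;

    assert(cap > 0);
    len = (size_t) snprintf(buf, cap, "Relations:");
    for (int rti = 1; rti < 64 && rti <= nrtable; rti++)
    {
        if (!(relids & ((uint64_t) 1 << rti)))
            continue;
        if (len >= cap)
            break;              /* output already truncated */
        len += (size_t) snprintf(buf + len, cap - len, "%s %s",
                                 first ? "" : ",", rtable_names[rti - 1]);
        first = 0;
    }
}
```

For the self-join example earlier in the thread, where ft1 and ft2 both map to lt, a line such as "Relations: ft1, ft2" under the Foreign Scan node would make clear which foreign tables the single remote query replaces.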
My idea adds a 'List *fdw_sub_paths' field to ForeignPath to inform the planner of the underlying foreign-scan paths (with scanrelid > 0).
create_foreignscan_plan() will call create_plan_recurse() to construct plan nodes from the attached path nodes.
Even though these foreign-scan nodes are not actually executed, setrefs.c can update scanrelid in the usual way, and ExplainPreScanNode does not need exceptional handling for Foreign/CustomScan nodes.
In addition, it allows us to keep information about the underlying foreign table scans, even if the planner needs other information (not only relids) in a future version.
What do you think?
I am not sure about keeping planner nodes which are not turned into execution nodes. There's no precedent for that in the current code. It could be risky.

Indeed, that is a fair point. At the moment, no other code creates plan nodes that are never actually executed.
Please forget the above idea.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachments:
add_fdw_custom_relids.patch (application/octet-stream)
src/backend/commands/explain.c | 10 ++++--
src/backend/nodes/bitmapset.c | 57 +++++++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 2 ++
src/backend/nodes/outfuncs.c | 2 ++
src/backend/optimizer/plan/createplan.c | 4 +++
src/backend/optimizer/plan/setrefs.c | 8 +++++
src/include/nodes/bitmapset.h | 1 +
src/include/nodes/plannodes.h | 4 +++
8 files changed, 86 insertions(+), 2 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9281874..8892dca 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -730,11 +730,17 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
case T_ValuesScan:
case T_CteScan:
case T_WorkTableScan:
- case T_ForeignScan:
- case T_CustomScan:
*rels_used = bms_add_member(*rels_used,
((Scan *) plan)->scanrelid);
break;
+ case T_ForeignScan:
+ *rels_used = bms_add_members(*rels_used,
+ ((ForeignScan *) plan)->fdw_relids);
+ break;
+ case T_CustomScan:
+ *rels_used = bms_add_members(*rels_used,
+ ((CustomScan *) plan)->custom_relids);
+ break;
case T_ModifyTable:
*rels_used = bms_add_member(*rels_used,
((ModifyTable *) plan)->nominalRelation);
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index a9c3b4b..4dc3286 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -301,6 +301,63 @@ bms_difference(const Bitmapset *a, const Bitmapset *b)
}
/*
+ * bms_shift_members - move all the bits by shift
+ */
+Bitmapset *
+bms_shift_members(const Bitmapset *a, int shift)
+{
+ Bitmapset *b;
+ bitmapword h_word;
+ bitmapword l_word;
+ int nwords;
+ int w_shift;
+ int b_shift;
+ int i, j;
+
+ /* fast path if result shall be NULL obviously */
+ if (a == NULL || a->nwords * BITS_PER_BITMAPWORD + shift <= 0)
+ return NULL;
+ /* actually, not shift members */
+ if (shift == 0)
+ return bms_copy(a);
+
+ nwords = (a->nwords * BITS_PER_BITMAPWORD + shift +
+ BITS_PER_BITMAPWORD - 1) / BITS_PER_BITMAPWORD;
+ b = palloc(BITMAPSET_SIZE(nwords));
+ b->nwords = nwords;
+
+ if (shift > 0)
+ {
+ /* Left shift */
+ w_shift = WORDNUM(shift);
+ b_shift = BITNUM(shift);
+
+ for (i=0, j=-w_shift; i < b->nwords; i++, j++)
+ {
+ h_word = (j >= 0 && j < a->nwords ? a->words[j] : 0);
+ l_word = (j-1 >= 0 && j-1 < a->nwords ? a->words[j-1] : 0);
+ b->words[i] = ((h_word << b_shift) |
+ (b_shift == 0 ? 0 : l_word >> (BITS_PER_BITMAPWORD - b_shift)));
+ }
+ }
+ else
+ {
+ /* Right shift */
+ w_shift = WORDNUM(-shift);
+ b_shift = BITNUM(-shift);
+
+ for (i=0, j=w_shift; i < b->nwords; i++, j++)
+ {
+ h_word = (j+1 >= 0 && j+1 < a->nwords ? a->words[j+1] : 0);
+ l_word = (j >= 0 && j < a->nwords ? a->words[j] : 0);
+ b->words[i] = ((l_word >> b_shift) |
+ (b_shift == 0 ? 0 : h_word << (BITS_PER_BITMAPWORD - b_shift)));
+ }
+ }
+ return b;
+}
+
+/*
* bms_is_subset - is A a subset of B?
*/
bool
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9300b70..7c85943 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -596,6 +596,7 @@ _copyForeignScan(const ForeignScan *from)
COPY_NODE_FIELD(fdw_exprs);
COPY_NODE_FIELD(fdw_ps_tlist);
COPY_NODE_FIELD(fdw_private);
+ COPY_BITMAPSET_FIELD(fdw_relids);
COPY_SCALAR_FIELD(fsSystemCol);
return newnode;
@@ -621,6 +622,7 @@ _copyCustomScan(const CustomScan *from)
COPY_NODE_FIELD(custom_exprs);
COPY_NODE_FIELD(custom_ps_tlist);
COPY_NODE_FIELD(custom_private);
+ COPY_BITMAPSET_FIELD(custom_relids);
/*
* NOTE: The method field of CustomScan is required to be a pointer to a
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f3676ec..edeee7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -562,6 +562,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
WRITE_NODE_FIELD(fdw_exprs);
WRITE_NODE_FIELD(fdw_ps_tlist);
WRITE_NODE_FIELD(fdw_private);
+ WRITE_BITMAPSET_FIELD(fdw_relids);
WRITE_BOOL_FIELD(fsSystemCol);
}
@@ -576,6 +577,7 @@ _outCustomScan(StringInfo str, const CustomScan *node)
WRITE_NODE_FIELD(custom_exprs);
WRITE_NODE_FIELD(custom_ps_tlist);
WRITE_NODE_FIELD(custom_private);
+ WRITE_BITMAPSET_FIELD(custom_relids);
appendStringInfoString(str, " :methods ");
_outToken(str, node->methods->CustomName);
if (node->methods->TextOutCustomScan)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 7a37824..514fcd9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2013,6 +2013,8 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
elog(ERROR, "junk TLE should not apper prior to valid one");
}
}
+ /* Set the relids that are represented by this foreign scan for Explain */
+ scan_plan->fdw_relids = best_path->path.parent->relids;
/* Copy cost data from Path to Plan; no need to make FDW do this */
copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
@@ -2119,6 +2121,8 @@ create_customscan_plan(PlannerInfo *root, CustomPath *best_path,
elog(ERROR, "junk TLE should not apper prior to valid one");
}
}
+ /* Set the relids that are represented by this custom scan for Explain */
+ cplan->custom_relids = best_path->path.parent->relids;
/*
* Copy cost data from Path to Plan; no need to make custom-plan providers
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index a41c4f0..2961f44 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -568,6 +568,10 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
{
ForeignScan *splan = (ForeignScan *) plan;
+ if (rtoffset > 0)
+ splan->fdw_relids =
+ bms_shift_members(splan->fdw_relids, rtoffset);
+
if (splan->scan.scanrelid == 0)
{
indexed_tlist *pscan_itlist =
@@ -610,6 +614,10 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
{
CustomScan *splan = (CustomScan *) plan;
+ if (rtoffset > 0)
+ splan->custom_relids =
+ bms_shift_members(splan->custom_relids, rtoffset);
+
if (splan->scan.scanrelid == 0)
{
indexed_tlist *pscan_itlist =
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 3a556ee..3ca9791 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -66,6 +66,7 @@ extern void bms_free(Bitmapset *a);
extern Bitmapset *bms_union(const Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_intersect(const Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_difference(const Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_shift_members(const Bitmapset *a, int shift);
extern bool bms_is_subset(const Bitmapset *a, const Bitmapset *b);
extern BMS_Comparison bms_subset_compare(const Bitmapset *a, const Bitmapset *b);
extern bool bms_is_member(int x, const Bitmapset *a);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 213034b..0f1e94c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -490,6 +490,8 @@ typedef struct ForeignScan
List *fdw_exprs; /* expressions that FDW may evaluate */
List *fdw_ps_tlist; /* optional pseudo-scan tlist for FDW */
List *fdw_private; /* private data for FDW */
+ Bitmapset *fdw_relids; /* set of relid (index of range-tables)
+ * represented by this node */
bool fsSystemCol; /* true if any "system column" is needed */
} ForeignScan;
@@ -523,6 +525,8 @@ typedef struct CustomScan
List *custom_exprs; /* expressions that custom code may evaluate */
List *custom_ps_tlist;/* optional pseudo-scan target list */
List *custom_private; /* private data for custom code */
+ Bitmapset *custom_relids; /* set of relid (index of range-tables)
+ * represented by this node */
const CustomScanMethods *methods;
} CustomScan;
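The left-shift branch of bms_shift_members() can be cross-checked against the obvious member-by-member approach. This standalone sketch uses a hypothetical fixed-size Bits type (32-bit words, 128 members) instead of Bitmapset. It also shows why the b_shift == 0 case needs a guard: shifting a 32-bit word by 32 is undefined behavior in C. Since relid sets are typically small, the member-by-member version may also be a simpler option for setrefs.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WORD_BITS 32
#define NWORDS 4                /* fixed capacity: members 0..127 */

typedef struct { uint32_t words[NWORDS]; } Bits;

static bool bits_is_member(const Bits *a, int x)
{
    return (a->words[x / WORD_BITS] >> (x % WORD_BITS)) & 1u;
}

/* Word-level left shift in the style of bms_shift_members();
 * members pushed past the fixed capacity are dropped. */
static Bits bits_shift_left(const Bits *a, int shift)
{
    Bits b;
    int w_shift = shift / WORD_BITS;
    int b_shift = shift % WORD_BITS;

    assert(shift >= 0);
    for (int i = 0; i < NWORDS; i++)
    {
        int j = i - w_shift;
        uint32_t h = (j >= 0 && j < NWORDS) ? a->words[j] : 0;
        uint32_t l = (j - 1 >= 0 && j - 1 < NWORDS) ? a->words[j - 1] : 0;

        /* A shift by WORD_BITS is undefined, so b_shift == 0 is special-cased. */
        b.words[i] = (h << b_shift) |
            (b_shift == 0 ? 0 : l >> (WORD_BITS - b_shift));
    }
    return b;
}

/* Reference version: add the offset to each member one at a time. */
static Bits bits_offset_members(const Bits *a, int offset)
{
    Bits b;

    memset(&b, 0, sizeof b);
    for (int x = 0; x < NWORDS * WORD_BITS; x++)
    {
        int y = x + offset;

        if (bits_is_member(a, x) && y < NWORDS * WORD_BITS)
            b.words[y / WORD_BITS] |= (uint32_t) 1 << (y % WORD_BITS);
    }
    return b;
}
```

Comparing both functions for every shift from 0 upward, including exact multiples of the word size, exercises the carry between words as well as the b_shift == 0 case.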
Hi Hanada-san,
I noticed that the patch doesn't have any tests for FDW joins in postgres_fdw. While you are updating the patch, can you please add a few tests? I suggest adding tests for combinations of these dimensions:
1. Types of joins
2. Joins between multiple foreign and local tables together, to test whether we push down as much of the join tree as possible with mixed tables.
3. Join/WHERE conditions with safe and unsafe-to-push expressions
4. Queries with sorting/aggregation on top of a join, to test setrefs.c processing.
5. Joins between foreign tables on different foreign servers (to check that those do not get accidentally pushed down).
I have attached a file with some example queries along those lines.
On Tue, Mar 10, 2015 at 8:37 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
On Wed, Mar 4, 2015 at 4:26 AM, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
Here is v4 patch of Join push-down support for foreign tables. This
patch requires Custom/Foreign join patch v7 posted by Kaigai-san.
Hi,
I just want to point out to the folks on this thread that the action
in this area is happening on the other thread, about the
custom/foreign join patch, and that Tom and I are suspecting that we
do not have the right design here. Your input is needed.
From my end, I am quite skeptical about the way
postgresGetForeignJoinPath in this patch works. It looks only at the
cheapest total path of the relations to be joined, which seems like it
could easily be wrong. What about some other path that is more
expensive but provides a convenient sort order? What about something
like A LEFT JOIN (B JOIN C ON B.x = C.x) ON A.y = B.y AND A.z = C.z,
which can't make a legal join until level 3? Tom's proposed hook
placement would instead invoke the FDW once per joinrel, passing root
and the joinrel. Then, you could cost a path based on the idea of
pushing that join entirely to the remote side, or exit without doing
anything if pushdown is not feasible.
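A per-joinrel hook as described could start with a feasibility test like the one sketched below, which also mirrors the "same server oid" propagation proposed at the start of this thread. BaseRelInfo, JoinRelInfo and consider_foreign_join() are hypothetical stand-ins, not RelOptInfo or the real FDW callback signature, and the sketch checks only server membership; join type and join-qual safety would still need checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-ins for planner state (index 0 unused, relids are 1-based). */
typedef struct
{
    bool is_foreign;    /* base relation is a foreign table */
    Oid  serverid;      /* its foreign server, if is_foreign */
} BaseRelInfo;

typedef struct
{
    uint64_t relids;    /* bit k set: base relation k is part of this join */
    Oid      serverid;  /* set when the whole join could run remotely */
    int      npaths;    /* paths added so far (stand-in for add_path) */
} JoinRelInfo;

/*
 * Called once per join relation.  If every base relation in the join is a
 * foreign table on a single server, record that server and add a "push the
 * whole join down" path; otherwise return without doing anything.
 */
static void consider_foreign_join(const BaseRelInfo *baserels, int nbaserels,
                                  JoinRelInfo *joinrel)
{
    Oid server = InvalidOid;

    assert(nbaserels < 64);
    for (int k = 1; k <= nbaserels; k++)
    {
        if (!(joinrel->relids & ((uint64_t) 1 << k)))
            continue;
        if (!baserels[k].is_foreign)
            return;             /* a local table is involved */
        if (server == InvalidOid)
            server = baserels[k].serverid;
        else if (baserels[k].serverid != server)
            return;             /* tables live on different servers */
    }
    if (server == InvalidOid)
        return;
    joinrel->serverid = server;
    joinrel->npaths++;          /* a real FDW would cost and add a path here */
}
```

Because the test is made for each joinrel independently, a case like A LEFT JOIN (B JOIN C ...) is simply considered again at the level where a legal join exists, rather than depending on which child paths happened to be cheapest.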
Please read the other thread and then respond either there or here
with thoughts on that design. If you don't provide some input on
this, both of these patches are going to get rejected as lacking
consensus, and we'll move on to other things. I'd really rather not
ship yet another release without this important feature, but that's
where we're heading if we can't talk this through.
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Mar 16, 2015 at 9:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Moved for now to the next CF as it was in state "Need review".
--
Michael