postgres_fdw bug in 9.6

Started by Jeff Janesabout 9 years ago128 messages

Jeff Janes

jeff.janes@gmail.com

about 9 years ago

I have a setup where a 9.6.1 server uses postgres_fdw to connect to a 9.4.9
hot standby server.

I have a DML statement which triggers the error:

ERROR: XX000: outer pathkeys do not match mergeclauses
LOCATION: create_mergejoin_plan, createplan.c:3722

The error first starts appearing with this commit (on the local side):

commit aa09cd242fa7e3a694a31f8aed521e80d1e626a4
Author: Robert Haas <rhaas@postgresql.org>
Date: Wed Mar 9 10:51:49 2016 -0500

postgres_fdw: Consider foreign joining and foreign sorting together.

The version of the remote side does not seem to matter. I've also promoted
a test instance of the remote from hot standby to master and then upgraded
to 9.6.1, and neither step fixes the issue.

The statement is like this:

explain update foo_local set col3=foo_remote.col3 from foo_remote where
foo_local.id=foo_remote.id and foo_local.id in ('aaa','bbb','ccc','ddd');

Where foo_remote is a pretty complicated view (defined locally) over the
join of 8 foreign tables.

I am having trouble producing a self-contained, disclosable test case for
this. Small changes causes the error to go away. On the local side, it
doesn't seem to depend on the contents of the table, only the structure.
But on the remote side, truncating the central table for the query makes
the error go away.

Any tips on investigating this further in situ? Or is the best option just
to work harder on a minimal and disclosable test case?

Cheers,

Jeff

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Jeff Janes (#1)

Re: postgres_fdw bug in 9.6

Jeff Janes <jeff.janes@gmail.com> writes:

I have a DML statement which triggers the error:
ERROR: XX000: outer pathkeys do not match mergeclauses
LOCATION: create_mergejoin_plan, createplan.c:3722

Hmm.

Any tips on investigating this further in situ? Or is the best option just
to work harder on a minimal and disclosable test case?

I think we need a test case --- not minimal necessarily, but something
other people can reproduce. You might find that setting enable_hashjoin
and/or enable_nestloop to false makes it easier to provoke the error,
since evidently this requires that we (a) generate a faulty mergejoin Path
and then (b) choose it as the cheapest one, since the error occurs while
converting it to a Plan.

BTW, if you're not doing this in a debug (--enable-cassert) build, it'd
be useful to try it in one. I'm a little suspicious that the root cause
might be a memory-stomp type of problem, ie somebody scribbling on a
pathkey data structure without accounting for it being shared with another
path. It's possible that cassert memory checking would help catch that.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Tom Lane (#2)

Re: postgres_fdw bug in 9.6

On Thu, Dec 8, 2016 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

I have a DML statement which triggers the error:
ERROR: XX000: outer pathkeys do not match mergeclauses
LOCATION: create_mergejoin_plan, createplan.c:3722

Hmm.

Any tips on investigating this further in situ? Or is the best option just
to work harder on a minimal and disclosable test case?

I think we need a test case --- not minimal necessarily, but something
other people can reproduce. You might find that setting enable_hashjoin
and/or enable_nestloop to false makes it easier to provoke the error,
since evidently this requires that we (a) generate a faulty mergejoin Path
and then (b) choose it as the cheapest one, since the error occurs while
converting it to a Plan.

Maybe it would help for Jeff to use elog_node_display() to the nodes
that are causing the problem - e.g. outerpathkeys and innerpathkeys
and best_path->path_mergeclauses, or just best_path - at the point
where the error is thrown. That might give us enough information to
see what's broken.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Robert Haas (#3)

Re: postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

Maybe it would help for Jeff to use elog_node_display() to the nodes
that are causing the problem - e.g. outerpathkeys and innerpathkeys
and best_path->path_mergeclauses, or just best_path - at the point
where the error is thrown. That might give us enough information to
see what's broken.

I'll be astonished if that's sufficient evidence. We already know that
the problem is that the input path doesn't claim to be sorted in a way
that would match the merge clauses, but that doesn't tell us how such
a path came to be generated (or, if it wasn't intentionally done, where
the data structure got clobbered later).

It's possible that setting a breakpoint at create_mergejoin_path and
capturing stack traces for all calls would yield usable insight. But
there are likely to be lots of calls if this is an 8-way join query,
and probably only a few are wrong.

I'd much rather have a test case than try to debug this remotely.
Bandwidth too low.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jeff Janes

jeff.janes@gmail.com

about 9 years ago

In reply to: Tom Lane (#4)

Re: postgres_fdw bug in 9.6

On Thu, Dec 8, 2016 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Maybe it would help for Jeff to use elog_node_display() to the nodes
that are causing the problem - e.g. outerpathkeys and innerpathkeys
and best_path->path_mergeclauses, or just best_path - at the point
where the error is thrown. That might give us enough information to
see what's broken.

I'll be astonished if that's sufficient evidence. We already know that
the problem is that the input path doesn't claim to be sorted in a way
that would match the merge clauses, but that doesn't tell us how such
a path came to be generated (or, if it wasn't intentionally done, where
the data structure got clobbered later).

It's possible that setting a breakpoint at create_mergejoin_path and
capturing stack traces for all calls would yield usable insight. But
there are likely to be lots of calls if this is an 8-way join query,
and probably only a few are wrong.

I'd much rather have a test case than try to debug this remotely.
Bandwidth too low.

I didn't get an asserts on the assert-enabled build.

I have a test case where I made the fdw connect back to itself, and
stripped out all the objects that I could and still reproduce the case. It
is large, 21MB compressed, 163MB uncompressed, so I am linking it here:

https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc

When running this against and default configured 9.6.1 or 10devel-a73491e,
I get the error.

psql -U postgres -f <(xzcat combined_anon.sql.xz)

...

ERROR: outer pathkeys do not match mergeclauses
STATEMENT: explain update local1 set local19 = jj_join.local19 from
jj_join where local1.local12=jj_join.local12 and local1.local12 in
('ddd','bbb','aaa','abc');

Cheers,

Jeff

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Jeff Janes (#5)

Re: postgres_fdw bug in 9.6

Jeff Janes <jeff.janes@gmail.com> writes:

I have a test case where I made the fdw connect back to itself, and
stripped out all the objects that I could and still reproduce the case. It
is large, 21MB compressed, 163MB uncompressed, so I am linking it here:
https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc

Thanks for the test case. I believe I understand the fundamental problem,
or one of the fundamental problems --- I'm not exactly convinced there
aren't several here.

The key issue is that GetExistingLocalJoinPath believes it can return
any random join path as what to use for the fdw_outerpath of a foreign
join. You can get away with that, perhaps, as long as you only consider
two-way joins. But in a nest of foreign joins, this fails to consider
the interrelationships of join paths and their children. In particular,
what we have happening in this example is that GetExistingLocalJoinPath
seizes randomly on a MergePath for an upper join relation, clones it,
sees that the outer child is a join ForeignPath, and replaces that outer
child with its fdw_outerpath ... which was chosen equally cavalierly by
some previous execution of GetExistingLocalJoinPath, and does not have
the sort ordering expected by the MergePath. So we'd generate an invalid
EPQ plan, except that the debug cross-check in create_mergejoin_plan
notices the discrepancy.

I believe there are probably more problems here, or at least if there
aren't, it's not clear why not. Because of GetExistingLocalJoinPath's
lack of curiosity about what's underneath the join pathnode it picks,
it seems to me that it's possible for it to return a path tree that
*isn't* all local joins. If we're looking at, say, a hash path for
a 4-way join, whose left input is a hash path for a 3-way join, whose
left input is a 2-way foreign join, what's stopping that from being
returned as a "local" path tree?

Likewise, it seems like the code is trying to reject any custom-path join
types, or at least this barely-intelligible comment seems to imply that:

* Just skip anything else. We don't know if corresponding
* plan would build the output row from whole-row references
* of base relations and execute the EPQ checks.

But this coding fails to notice any such join type that's below the
level of the immediate two join inputs.

We've probably managed to not notice this so far because foreign joins
generally ought to dominate any local join method, so that there wouldn't
often be cases where the surviving paths use local joins for input
sub-joins. But Jeff's test case proves it can happen.

I kind of wonder why this infrastructure exists at all; it's not the way
I'd have foreseen handling EPQ for remote joins. However, while "throw
it away and start again" might be good advice going forward, I suppose
it won't be very popular for applying to 9.6.

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill. We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply. At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join. So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top". This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.

(Hm, that does sound a lot like "throw it away and start again", doesn't
it. But what we've got here is busted enough that I'm not sure there
are good alternatives. Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)

BTW, what's "existing" about the result of GetExistingLocalJoinPath?
And for that matter, what's "outer" about the content of fdw_outerpath?
Good luck figuring that out from the documentation of the field, all
zero words of it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Tom Lane (#6)

Re: postgres_fdw bug in 9.6

On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

I have a test case where I made the fdw connect back to itself, and
stripped out all the objects that I could and still reproduce the case. It
is large, 21MB compressed, 163MB uncompressed, so I am linking it here:
https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc

Thanks for the test case. I believe I understand the fundamental problem,
or one of the fundamental problems --- I'm not exactly convinced there
aren't several here.

The key issue is that GetExistingLocalJoinPath believes it can return
any random join path as what to use for the fdw_outerpath of a foreign
join. You can get away with that, perhaps, as long as you only consider
two-way joins. But in a nest of foreign joins, this fails to consider
the interrelationships of join paths and their children. In particular,
what we have happening in this example is that GetExistingLocalJoinPath
seizes randomly on a MergePath for an upper join relation, clones it,
sees that the outer child is a join ForeignPath, and replaces that outer
child with its fdw_outerpath ... which was chosen equally cavalierly by
some previous execution of GetExistingLocalJoinPath, and does not have
the sort ordering expected by the MergePath. So we'd generate an invalid
EPQ plan, except that the debug cross-check in create_mergejoin_plan
notices the discrepancy.

Thanks for the detailed explanation.

I believe there are probably more problems here, or at least if there
aren't, it's not clear why not. Because of GetExistingLocalJoinPath's
lack of curiosity about what's underneath the join pathnode it picks,
it seems to me that it's possible for it to return a path tree that
*isn't* all local joins. If we're looking at, say, a hash path for
a 4-way join, whose left input is a hash path for a 3-way join, whose
left input is a 2-way foreign join, what's stopping that from being
returned as a "local" path tree?

Right, the so called "local join tree" can contain ForeignPath
representing joins between foreign relations. AFAIU what protects us
from getting into problem is that postgresRecheckForeignScan calls
ExecProcNode() on the outer plan. So, even if there are ForeignPaths
in fdw_outerpath tree, corresponding plans would use a local join plan
to apply EPQ recheck. The code in GetExistingLocalJoinPath() avoids
the redirection at least for the inner or outer ForeignPaths. Any
comment claiming that path tree under fdw_outerpath is entirely
"local" should be removed, if it survives the fix.

Likewise, it seems like the code is trying to reject any custom-path join
types, or at least this barely-intelligible comment seems to imply that:

* Just skip anything else. We don't know if corresponding
* plan would build the output row from whole-row references
* of base relations and execute the EPQ checks.

But this coding fails to notice any such join type that's below the
level of the immediate two join inputs.

Similarly, here we assume that any custom plan would tell us whether
the given row survives EPQ recheck. As long as a custom plan doing
that correctly, it shouldn't be a problem.

We've probably managed to not notice this so far because foreign joins
generally ought to dominate any local join method, so that there wouldn't
often be cases where the surviving paths use local joins for input
sub-joins. But Jeff's test case proves it can happen.

I kind of wonder why this infrastructure exists at all; it's not the way
I'd have foreseen handling EPQ for remote joins. However, while "throw
it away and start again" might be good advice going forward, I suppose
it won't be very popular for applying to 9.6.

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill. We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply. At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join. So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top". This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.

I am not able to understand how this strategy of applying all join
clauses on top of cross product would work if there are OUTER joins
involved. Wouldn't nullable sides change the result?

(Hm, that does sound a lot like "throw it away and start again", doesn't
it. But what we've got here is busted enough that I'm not sure there
are good alternatives. Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)

A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath. If postgres_fdw was able to come up with a
ForeignPath with those pathkeys, there is high chance that there
exists a local path with those pathkeys. Since we are yet to call
add_path() with the ForeignPath being built, that local path should
exist in the path list. Stick that path in fdw_outerpath. If such a
path doesn't exist, return NULL from GetExistingLocalJoinPath().
postgres_fdw wouldn't push down the join in that case and we will
loose some optimization, but that might be better than throwing an
error.

I agree that a join tree consisting of nested loop joins would be more
efficient in this case.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Tom Lane (#6)

Re: postgres_fdw bug in 9.6

On Mon, Dec 12, 2016 at 5:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill. We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply. At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join. So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top". This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.

(Hm, that does sound a lot like "throw it away and start again", doesn't
it. But what we've got here is busted enough that I'm not sure there
are good alternatives. Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)

I think there was an earlier version of the patch that actually built
up NestLoop joins as it went, but at some point (possibly at my
suggestion) it got rewritten to try to fish out existing paths
instead. Apparently, the other idea was better.

BTW, what's "existing" about the result of GetExistingLocalJoinPath?

It's an existing path - i.e. already added to the RelOptInfo - to
perform the join locally.

And for that matter, what's "outer" about the content of fdw_outerpath?

It's got the same meaning here that "outer path" does in general.
Right now, a Foreign Scan only has an outer subpath if it needs one
for EPQ rechecks, but in theory I suppose it could be used for
something else; every Plan has a lefttree, whether or not we choose to
populate it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#7)

Re: postgres_fdw bug in 9.6

On 2016/12/13 23:13, Ashutosh Bapat wrote:

On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I believe there are probably more problems here, or at least if there
aren't, it's not clear why not. Because of GetExistingLocalJoinPath's
lack of curiosity about what's underneath the join pathnode it picks,
it seems to me that it's possible for it to return a path tree that
*isn't* all local joins. If we're looking at, say, a hash path for
a 4-way join, whose left input is a hash path for a 3-way join, whose
left input is a 2-way foreign join, what's stopping that from being
returned as a "local" path tree?

Right, the so called "local join tree" can contain ForeignPath
representing joins between foreign relations. AFAIU what protects us
from getting into problem is that postgresRecheckForeignScan calls
ExecProcNode() on the outer plan. So, even if there are ForeignPaths
in fdw_outerpath tree, corresponding plans would use a local join plan
to apply EPQ recheck. The code in GetExistingLocalJoinPath() avoids
the redirection at least for the inner or outer ForeignPaths.

I think Ashutosh is right.

Any
comment claiming that path tree under fdw_outerpath is entirely
"local" should be removed, if it survives the fix.

Likewise, it seems like the code is trying to reject any custom-path join
types, or at least this barely-intelligible comment seems to imply that:

* Just skip anything else. We don't know if corresponding
* plan would build the output row from whole-row references
* of base relations and execute the EPQ checks.

But this coding fails to notice any such join type that's below the
level of the immediate two join inputs.

I wrote the first version of GetExistingLocalJoinPath (and
postgresRecheckForeignScan), to verify that the changes to core for
pushing down foreign/custom joins in 9.5 would work well for EPQ
rechecks. And I rejected custom paths in that version because I
intended to use that function not only for foreign joins but custom
joins. (IIRC, for 9.6, I proposed to put GetExistingLocalJoinPath in a
more common area such as src/backend/optimizer/path/joinpath.c, but that
was ignored...)

I kind of wonder why this infrastructure exists at all; it's not the way
I'd have foreseen handling EPQ for remote joins. However, while "throw
it away and start again" might be good advice going forward, I suppose
it won't be very popular for applying to 9.6.

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill. We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply. At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join. So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top". This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.

I am not able to understand how this strategy of applying all join
clauses on top of cross product would work if there are OUTER joins
involved. Wouldn't nullable sides change the result?

I'm not able to understand that, either.

(Hm, that does sound a lot like "throw it away and start again", doesn't
it. But what we've got here is busted enough that I'm not sure there
are good alternatives. Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)

An alternative I was thinking was (1) to create an equivalent nestloop
join path for each foreign/custom join path and store it in
fdw_outerpath as-is, except when that join path implements a full join,
in which case a merge or hash join path is created, and (2) to apply
postgresRecheckForeignScan to the outer subplan created from the
fdw_outerpath when executing EPQ rechecks. So, I was thinking to
provide a helper function that creates the equivalent local join path
for a given foreign/custom join path in #1.

A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath. If postgres_fdw was able to come up with a
ForeignPath with those pathkeys, there is high chance that there
exists a local path with those pathkeys. Since we are yet to call
add_path() with the ForeignPath being built, that local path should
exist in the path list. Stick that path in fdw_outerpath. If such a
path doesn't exist, return NULL from GetExistingLocalJoinPath().
postgres_fdw wouldn't push down the join in that case and we will
loose some optimization, but that might be better than throwing an
error.

Seems reasonable.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Etsuro Fujita (#9)

Re: postgres_fdw bug in 9.6

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

On 2016/12/13 23:13, Ashutosh Bapat wrote:

On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill. We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply.

I am not able to understand how this strategy of applying all join
clauses on top of cross product would work if there are OUTER joins
involved. Wouldn't nullable sides change the result?

I'm not able to understand that, either.

Yeah, you'd have to reproduce the outer join structure if any.

A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath.

Seems reasonable.

No, because GetExistingLocalJoinPath is called once per relation not once
per path. Even if we were willing to eat the overhead of calling it once
per path, we'd have to give up considering foreign paths with sort orders
that there wasn't any cheap way to produce locally.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Tom Lane (#10)

Re: postgres_fdw bug in 9.6

On 2016/12/16 1:39, Tom Lane wrote:

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

On 2016/12/13 23:13, Ashutosh Bapat wrote:

A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath.

Seems reasonable.

No, because GetExistingLocalJoinPath is called once per relation not once
per path. Even if we were willing to eat the overhead of calling it once
per path, we'd have to give up considering foreign paths with sort orders
that there wasn't any cheap way to produce locally.

Hmm, I agree on that point that giving it up might result in a bad plan.

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is. What do you think about that?

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Etsuro Fujita (#11)

Re: postgres_fdw bug in 9.6

On 2016/12/16 11:25, Etsuro Fujita wrote:

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is. What do you think about
that?

Let me explain about #1 and #2 a bit more. What I have in mind is:

* modify postgresGetForeignPaths so that a simple foreign table scan
path is stored into the base relation's PgFdwRelationInfo.
* modify postgresGetForeignJoinPaths so that
(a) a local join path for a 2-way foreign join is created using
simple foreign table scan paths stored in the base relations'
PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
(b) a local join path for a 3-way foreign join, whose left input is
a 2-way foreign join, is created using a local join path stored in the
left input join relation's PgFdwRelationInfo and a simple foreign table
scan path stored into the right input base relation's PgFdwRelationInfo.
(c) Likewise for higher level foreign joins.
(d) local join paths created are passed to create_foreignscan_path
and stored into the fdw_outerpaths of the resulting ForeignPahts.

I think that that would need special handling for foreign joins with
sort orders; add an explicit sort to the local join paths, if needed.

I am thinking to provide a helper function that creates a local join
path for (a), (b), and (c).

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Etsuro Fujita (#12)

Re: postgres_fdw bug in 9.6

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

On 2016/12/16 11:25, Etsuro Fujita wrote:

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is. What do you think about
that?

Let me explain about #1 and #2 a bit more. What I have in mind is:

* modify postgresGetForeignPaths so that a simple foreign table scan
path is stored into the base relation's PgFdwRelationInfo.
* modify postgresGetForeignJoinPaths so that
(a) a local join path for a 2-way foreign join is created using
simple foreign table scan paths stored in the base relations'
PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
(b) a local join path for a 3-way foreign join, whose left input is
a 2-way foreign join, is created using a local join path stored in the
left input join relation's PgFdwRelationInfo and a simple foreign table
scan path stored into the right input base relation's PgFdwRelationInfo.
(c) Likewise for higher level foreign joins.
(d) local join paths created are passed to create_foreignscan_path
and stored into the fdw_outerpaths of the resulting ForeignPahts.

Hm, isn't this overcomplicated? As you said earlier, we don't really
care all that much whether the fdw_outerpath is free of lower foreign
joins, because EPQ setup will select those lower join's substitute EPQ
plans anyway. All that we need is that the EPQ tree be a legal
implementation of the join order with join quals applied at the right
places. So I think the rule could be

"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair. Use this as the fdw_outerpath for
all foreign paths made for the joinrel."

The important point here is that we avoid using a merge join because that
has assumptions about input ordering that likely won't be satisfied by
the child paths chosen through this method. (I guess you could fall back
to it for the case of no quals in a fulljoin, because then the ordering
assumptions are vacuous anyway.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Tom Lane (#13)

Re: postgres_fdw bug in 9.6

On 2016/12/17 1:13, Tom Lane wrote:

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

On 2016/12/16 11:25, Etsuro Fujita wrote:

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is. What do you think about
that?

Let me explain about #1 and #2 a bit more. What I have in mind is:

* modify postgresGetForeignPaths so that a simple foreign table scan
path is stored into the base relation's PgFdwRelationInfo.
* modify postgresGetForeignJoinPaths so that
(a) a local join path for a 2-way foreign join is created using
simple foreign table scan paths stored in the base relations'
PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
(b) a local join path for a 3-way foreign join, whose left input is
a 2-way foreign join, is created using a local join path stored in the
left input join relation's PgFdwRelationInfo and a simple foreign table
scan path stored into the right input base relation's PgFdwRelationInfo.
(c) Likewise for higher level foreign joins.
(d) local join paths created are passed to create_foreignscan_path
and stored into the fdw_outerpaths of the resulting ForeignPahts.

Hm, isn't this overcomplicated? As you said earlier, we don't really
care all that much whether the fdw_outerpath is free of lower foreign
joins, because EPQ setup will select those lower join's substitute EPQ
plans anyway. All that we need is that the EPQ tree be a legal
implementation of the join order with join quals applied at the right
places.

Exactly. I thought the EPQ trees without lower foreign joins would be
better because that would be easier to understand.

So I think the rule could be

"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair.

Seems reasonable.

Use this as the fdw_outerpath for
all foreign paths made for the joinrel."

I'm not sure that would work well for foreign joins with sort orders.
Consider a merge join, whose left input is a 2-way foreign join with a
sort order that implements a full join and whose right input is a sorted
local table scan. If the EPQ subplan for the foreign join wouldn't
produce the right sort order, the merge join might break during EPQ
rechecks (note that in this case the EPQ subplan for the foreign join
might produce more than a single row during an EPQ recheck). So, I
think we would need to add an explicit sort to the fdw_outerpath for the
foreign join.

The important point here is that we avoid using a merge join because that
has assumptions about input ordering that likely won't be satisfied by
the child paths chosen through this method. (I guess you could fall back
to it for the case of no quals in a fulljoin, because then the ordering
assumptions are vacuous anyway.)

I agree on that point. I'll create a patch.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Tom Lane (#13)

Re: postgres_fdw bug in 9.6

On Fri, Dec 16, 2016 at 9:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

On 2016/12/16 11:25, Etsuro Fujita wrote:

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is. What do you think about
that?

Let me explain about #1 and #2 a bit more. What I have in mind is:

* modify postgresGetForeignPaths so that a simple foreign table scan
path is stored into the base relation's PgFdwRelationInfo.
* modify postgresGetForeignJoinPaths so that
(a) a local join path for a 2-way foreign join is created using
simple foreign table scan paths stored in the base relations'
PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
(b) a local join path for a 3-way foreign join, whose left input is
a 2-way foreign join, is created using a local join path stored in the
left input join relation's PgFdwRelationInfo and a simple foreign table
scan path stored into the right input base relation's PgFdwRelationInfo.
(c) Likewise for higher level foreign joins.
(d) local join paths created are passed to create_foreignscan_path
and stored into the fdw_outerpaths of the resulting ForeignPahts.

Hm, isn't this overcomplicated? As you said earlier, we don't really
care all that much whether the fdw_outerpath is free of lower foreign
joins, because EPQ setup will select those lower join's substitute EPQ
plans anyway. All that we need is that the EPQ tree be a legal
implementation of the join order with join quals applied at the right
places. So I think the rule could be

"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair. Use this as the fdw_outerpath for
all foreign paths made for the joinrel."

We could use fdw_outerpath of the left and right inputs instead of
looking for the cheapest paths. For base relations as left or right
relations, use the unparameterized cheapest foreign path. This will
ensure that all joins in fdw_outerpath are local joins, making EPQ
checks slightly efficient by avoiding redirection at every foreign
path.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#15)

Re: postgres_fdw bug in 9.6

On 2016/12/19 13:59, Ashutosh Bapat wrote:

On Fri, Dec 16, 2016 at 9:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

On 2016/12/16 11:25, Etsuro Fujita wrote:

As I said upthread, an alternative I am thinking is (1) to create an
equivalent nestloop join path using inner/outer paths of a foreign join
path, except when that join path implements a full join, in which case a
merge/hash join path is used, (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath as-is. What do you think about
that?

Let me explain about #1 and #2 a bit more. What I have in mind is:

* modify postgresGetForeignPaths so that a simple foreign table scan
path is stored into the base relation's PgFdwRelationInfo.
* modify postgresGetForeignJoinPaths so that
(a) a local join path for a 2-way foreign join is created using
simple foreign table scan paths stored in the base relations'
PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo.
(b) a local join path for a 3-way foreign join, whose left input is
a 2-way foreign join, is created using a local join path stored in the
left input join relation's PgFdwRelationInfo and a simple foreign table
scan path stored into the right input base relation's PgFdwRelationInfo.
(c) Likewise for higher level foreign joins.
(d) local join paths created are passed to create_foreignscan_path
and stored into the fdw_outerpaths of the resulting ForeignPahts.

Hm, isn't this overcomplicated? As you said earlier, we don't really
care all that much whether the fdw_outerpath is free of lower foreign
joins, because EPQ setup will select those lower join's substitute EPQ
plans anyway. All that we need is that the EPQ tree be a legal
implementation of the join order with join quals applied at the right
places. So I think the rule could be

"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair. Use this as the fdw_outerpath for
all foreign paths made for the joinrel."

We could use fdw_outerpath of the left and right inputs instead of
looking for the cheapest paths. For base relations as left or right
relations, use the unparameterized cheapest foreign path. This will
ensure that all joins in fdw_outerpath are local joins, making EPQ
checks slightly efficient by avoiding redirection at every foreign
path.

That seems close to what I had in mind described above. One additional
work required for that would to store the fdw_outerpath into the
RelOptInfo's fdw_private such as PgFdwRelationInfo before add_path for
the foreign-join path containing the fdw_outerpath, because add_path
might reject the foreign-join path. I think the additional work would
make that rather complicated.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Etsuro Fujita (#14)

Re: postgres_fdw bug in 9.6

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

On 2016/12/17 1:13, Tom Lane wrote:

So I think the rule could be

"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair.

Seems reasonable.

Use this as the fdw_outerpath for
all foreign paths made for the joinrel."

I'm not sure that would work well for foreign joins with sort orders.
Consider a merge join, whose left input is a 2-way foreign join with a
sort order that implements a full join and whose right input is a sorted
local table scan. If the EPQ subplan for the foreign join wouldn't
produce the right sort order, the merge join might break during EPQ
rechecks (note that in this case the EPQ subplan for the foreign join
might produce more than a single row during an EPQ recheck).

How so? We only recheck one row at a time, therefore it can be claimed to
have any sort order you care about.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Tom Lane (#17)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2016/12/20 0:37, Tom Lane wrote:

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

On 2016/12/17 1:13, Tom Lane wrote:

So I think the rule could be

"When first asked to produce a path for a given foreign joinrel, collect
the cheapest paths for its left and right inputs, and make a nestloop path
(or hashjoin path, if full join) from those, using the join quals needed
for the current input relation pair.

Seems reasonable.

Use this as the fdw_outerpath for
all foreign paths made for the joinrel."

I'm not sure that would work well for foreign joins with sort orders.
Consider a merge join, whose left input is a 2-way foreign join with a
sort order that implements a full join and whose right input is a sorted
local table scan. If the EPQ subplan for the foreign join wouldn't
produce the right sort order, the merge join might break during EPQ
rechecks (note that in this case the EPQ subplan for the foreign join
might produce more than a single row during an EPQ recheck).

How so? We only recheck one row at a time, therefore it can be claimed to
have any sort order you care about.

I'll have second thoughts about that. I agree with you except for that,
so I've created a patch; I removed GetExistingLocalJoinPath and added a
helper function, CreateLocalJoinPath, that generates a local join path
for a given foreign join, as described above. Please find attached a patch.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin.patchtext/x-patch; name=epqpath-for-foreignjoin.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1519,1540 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1563,1584 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1557,1572 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1608,1629 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1596,1611 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1652,1673 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1634,1649 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1684,1689 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1660,1726 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4016,4033 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4053,4068 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4159,4176 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4194,4209 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4365,4374 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4365,4375 ----
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = CreateLocalJoinPath(root, joinrel, outerrel, innerrel,
! 									   jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not created");
  			return;
  		}
  	}
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 445,450 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 445,455 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1007 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
!      which case a foreign data wrapper may build the local path by itself or
!      may choose not to create access paths for that join.)
      </para>
     </sect2>
  
--- 995,1008 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used as the alternative local
!      join plan.  <literal>CreateLocalJoinPath</> builds a nested loop join
!      path for the specified join relation, except when the join type is
!      <literal>FULL</>, in which case a merge or hash join path is built.
!      (If it does not build such a path, it returns NULL, in which case a
!      foreign data wrapper may build the local path by itself or may choose
!      not to create access paths for that join.)
      </para>
     </sect2>
  
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,29 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 693,805 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
   *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
! 
! 	Assert(joinrel->reloptkind == RELOPT_JOINREL);
! 
! 	foreach(lc, joinrel->pathlist)
! 	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
! 
! 		/* Skip parameterised paths. */
! 		if (path->param_info != NULL)
! 			continue;
! 
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
! 				{
! 					HashPath   *hash_path = makeNode(HashPath);
! 
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
! 				}
! 				break;
! 
! 			case T_NestLoop:
! 				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
! 				}
! 				break;
  
! 			case T_MergeJoin:
! 				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
! 				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
  
  		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
  		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
  		{
! 			ForeignPath *foreign_path;
  
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
  		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
  		{
! 			ForeignPath *foreign_path;
  
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
  		}
  
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 695,806 ----
  	return oid;
  }
  
+ 
  /*
!  * Generate a local join path for a foreign join
   *
!  * Note: currently, this function only supports unparameterized foreign joins.
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	JoinCostWorkspace workspace;
! 	Path	   *outer_path = outerrel->cheapest_total_path;
! 	Path	   *inner_path = innerrel->cheapest_total_path;
! 	Path	   *result;
  
! 	/* Should only be called in these cases. */
! 	Assert(jointype == JOIN_INNER ||
! 		   jointype == JOIN_LEFT ||
! 		   jointype == JOIN_FULL ||
! 		   jointype == JOIN_SEMI ||
! 		   jointype == JOIN_ANTI);
  
! 	/* Give up if the cheapest-total-cost paths are parameterized. */
! 	if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
! 		!bms_is_empty(PATH_REQ_OUTER(inner_path)))
! 		return NULL;
  
+ 	/*
+ 	 * If the jointype is JOIN_FULL, try to create a hashjoin join path from
+ 	 * the cheapest-total-cost paths; otherwise, create a nestloop join path.
+ 	 */
+ 	if (jointype == JOIN_FULL)
+ 	{
  		/*
! 		 * Special corner case: for "x FULL JOIN y ON true", there will be no
! 		 * join clauses at all.  Note that mergejoin is our only join type
! 		 * that supports FULL JOIN without any join clauses, and in that case
! 		 * it doesn't require the input paths to be well ordered, so generate
! 		 * a clauseless mergejoin path from the cheapest-total-cost paths.
  		 */
! 		if (extra->mergejoin_allowed && !extra->mergeclause_list)
  		{
! 			/* Get the initial cost estimates. */
! 			initial_cost_mergejoin(root, &workspace, jointype, NIL,
! 								   outer_path, inner_path, NIL, NIL,
! 								   extra->sjinfo);
  
! 			/* Generate a mergejoin join path. */
! 			result = (Path *) create_mergejoin_path(root,
! 													joinrel,
! 													jointype,
! 													&workspace,
! 													extra->sjinfo,
! 													outer_path,
! 													inner_path,
! 													extra->restrictlist,
! 													NIL, NULL,
! 													NIL, NIL, NIL);
  		}
! 		else if (extra->hashclause_list)
  		{
! 			/* Get the initial cost estimates. */
! 			initial_cost_hashjoin(root, &workspace, jointype,
! 								  extra->hashclause_list,
! 								  outer_path, inner_path,
! 								  extra->sjinfo, &extra->semifactors);
  
! 			/* Generate a hashjoin join path. */
! 			result = (Path *) create_hashjoin_path(root,
! 												   joinrel,
! 												   jointype,
! 												   &workspace,
! 												   extra->sjinfo,
! 												   &extra->semifactors,
! 												   outer_path,
! 												   inner_path,
! 												   extra->restrictlist,
! 												   NULL,
! 												   extra->hashclause_list);
  		}
+ 		else
+ 			result = NULL;
+ 	}
+ 	else
+ 	{
+ 		/* Get the initial cost estimates. */
+ 		initial_cost_nestloop(root, &workspace, jointype,
+ 						  outer_path, inner_path,
+ 						  extra->sjinfo, &extra->semifactors);
  
! 		/* Generate a nestloop join path. */
! 		result = (Path *) create_nestloop_path(root,
! 											   joinrel,
! 											   jointype,
! 											   &workspace,
! 											   extra->sjinfo,
! 											   &extra->semifactors,
! 											   outer_path,
! 											   inner_path,
! 											   extra->restrictlist,
! 											   NIL, NULL);
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 90,95 **** add_paths_to_joinrel(PlannerInfo *root,
--- 90,97 ----
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.hashclause_list = NIL;
+ 	extra.mergejoin_allowed = false;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
***************
*** 100,105 **** add_paths_to_joinrel(PlannerInfo *root,
--- 102,108 ----
  	 * it's a full join.
  	 */
  	if (enable_mergejoin || jointype == JOIN_FULL)
+ 	{
  		extra.mergeclause_list = select_mergejoin_clauses(root,
  														  joinrel,
  														  outerrel,
***************
*** 107,112 **** add_paths_to_joinrel(PlannerInfo *root,
--- 110,117 ----
  														  restrictlist,
  														  jointype,
  														  &mergejoin_allowed);
+ 		extra.mergejoin_allowed = mergejoin_allowed;
+ 	}
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
***************
*** 1320,1325 **** hash_inner_and_outer(PlannerInfo *root,
--- 1325,1333 ----
  		hashclauses = lappend(hashclauses, restrictinfo);
  	}
  
+ 	/* Save hashclauses for possible use by the FDW */
+ 	extra->hashclause_list = hashclauses;
+ 
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
  	{
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 235,240 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif   /* FDWAPI_H */
--- 235,242 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					JoinType jointype, JoinPathExtraData *extra);
  
  #endif   /* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 2006,2011 **** typedef struct SemiAntiJoinFactors
--- 2006,2014 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * hashclause_list is a list of RestrictInfo nodes for available
+  *		hashjoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI or ANTI joins)
   * param_source_rels are OK targets for parameterization of result paths
***************
*** 2014,2019 **** typedef struct JoinPathExtraData
--- 2017,2024 ----
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	List	   *hashclause_list;
+ 	bool		mergejoin_allowed;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;

#19

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#18)

1 attachment(s)

Re: postgres_fdw bug in 9.6

Some review comments

1. postgres_fdw doesn't push down semi and anti joins so you may want to
discount these two too.
+           jointype == JOIN_SEMI ||
+           jointype == JOIN_ANTI);

2. We should try to look for other not-so-cheap paths if the cheapest one is
paramterized. You might want to use get_cheapest_path_for_pathkeys() to find a
suitable unparameterized path by passing NULL for required_outer and NIL for
pathkeys, that's a very strange usage, but I think it will serve the purpose.
On the thread we discussed that we should save the epq_path create for lower
join and use it here. That may be better than searching for a path.
+    /* Give up if the cheapest-total-cost paths are parameterized. */
+    if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
+        !bms_is_empty(PATH_REQ_OUTER(inner_path)))
+        return NULL;

3. Adding new members to JoinPathExtraData may save some work for postgres_fdw
and other FDWs which would use CreateLocalJoinPath(), but it will add few bytes
to the structure even when there is no "FULL foreign join which requires EPQ"
involved in the query. That may not be so much of memory overhead since the
structure is used locally to add_paths_to_joinrel(), but it might be something
to think about. Instead, what if we call select_mergejoin_clauses() within
CreateLocalJoinPath() to get that information?

3. Talking about saving some CPU cycles - if a clauseless full join can be
implemented only using merge join, probably that's the only path available in
the list of paths for the given relation. Instead of building the same path
anew, should we use the existing path like GetExistingLocalJoinPath() for that
case? In fact, I am wondering whether we should look for an existing nestloop
path for all joins except full, in which case we should look for merge or hash
join path. We go on building a new path, if only there isn't an existing one.
That will certainly save some CPU cycles spent in costing the path.

4. Following comment mentions only hash join, but the code creates a merge join
or hash join path.
* If the jointype is JOIN_FULL, try to create a hashjoin join path from

5. Per comment below a clauseless full join can be implemented using only merge
join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true here?
/*
* Special corner case: for "x FULL JOIN y ON true", there will be no
* join clauses at all. Note that mergejoin is our only join type
* that supports FULL JOIN without any join clauses, and in that case
* it doesn't require the input paths to be well ordered, so generate
* a clauseless mergejoin path from the cheapest-total-cost paths.
*/
if (extra->mergejoin_allowed && !extra->mergeclause_list)

Rethinking about the problem, the error comes because the inner or outer plan
of a merge join plan doesn't have pathkeys required by the merge join. This
happens because the merge join path had foreign path with pathkeys as inner or
outer path and corresponding fdw_outerpath didn't have those pathkeys. I am
wondering whether the easy and possibly correct solution here is to not replace
a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we don't do
that, there won't be error building merge join plan and
postgresRecheckForeignScan() would correctly route the EPQ checks to the local
plan available as outer plan. Attached patch with that code removed.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

epq_plan_failure.patchtext/x-patch; charset=US-ASCII; name=epq_plan_failure.patchDownload

diff --git a/src/backend/foreign/foreign.c b/src/backend/foreign/foreign.c
index 242d6d2..872d2a1 100644
--- a/src/backend/foreign/foreign.c
+++ b/src/backend/foreign/foreign.c
@@ -775,30 +775,6 @@ GetExistingLocalJoinPath(RelOptInfo *joinrel)
 		if (!joinpath)
 			continue;
 
-		/*
-		 * If either inner or outer path is a ForeignPath corresponding to a
-		 * pushed down join, replace it with the fdw_outerpath, so that we
-		 * maintain path for EPQ checks built entirely of local join
-		 * strategies.
-		 */
-		if (IsA(joinpath->outerjoinpath, ForeignPath))
-		{
-			ForeignPath *foreign_path;
-
-			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
-			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
-				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
-		}
-
-		if (IsA(joinpath->innerjoinpath, ForeignPath))
-		{
-			ForeignPath *foreign_path;
-
-			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
-			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
-				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
-		}
-
 		return (Path *) joinpath;
 	}
 	return NULL;

#20

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Ashutosh Bapat (#19)

Re: postgres_fdw bug in 9.6

On Wed, Dec 21, 2016 at 11:04 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Some review comments

1. postgres_fdw doesn't push down semi and anti joins so you may want to
discount these two too.
+           jointype == JOIN_SEMI ||
+           jointype == JOIN_ANTI);

But in the future, it might. We shouldn't randomly leave foot-guns
lying around if there's an easy alternative.

3. Adding new members to JoinPathExtraData may save some work for postgres_fdw
and other FDWs which would use CreateLocalJoinPath(), but it will add few bytes
to the structure even when there is no "FULL foreign join which requires EPQ"
involved in the query. That may not be so much of memory overhead since the
structure is used locally to add_paths_to_joinrel(), but it might be something
to think about. Instead, what if we call select_mergejoin_clauses() within
CreateLocalJoinPath() to get that information?

I think that's exactly backwards. The few bytes of storage don't
matter, but extra CPU cycles might.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Etsuro Fujita (#18)

Re: postgres_fdw bug in 9.6

On 2016/12/21 21:44, Etsuro Fujita wrote:

On 2016/12/20 0:37, Tom Lane wrote:

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

On 2016/12/17 1:13, Tom Lane wrote:

So I think the rule could be

"When first asked to produce a path for a given foreign joinrel,
collect
the cheapest paths for its left and right inputs, and make a
nestloop path
(or hashjoin path, if full join) from those, using the join quals
needed
for the current input relation pair.

Seems reasonable.

Use this as the fdw_outerpath for
all foreign paths made for the joinrel."

I'm not sure that would work well for foreign joins with sort orders.
Consider a merge join, whose left input is a 2-way foreign join with a
sort order that implements a full join and whose right input is a sorted
local table scan. If the EPQ subplan for the foreign join wouldn't
produce the right sort order, the merge join might break during EPQ
rechecks (note that in this case the EPQ subplan for the foreign join
might produce more than a single row during an EPQ recheck).

How so? We only recheck one row at a time, therefore it can be
claimed to
have any sort order you care about.

I'll have second thoughts about that.

I noticed I was wrong and you are right. Sorry for the noise.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Robert Haas (#20)

Re: postgres_fdw bug in 9.6

On 2016/12/23 1:04, Robert Haas wrote:

On Wed, Dec 21, 2016 at 11:04 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
Some review comments
1. postgres_fdw doesn't push down semi and anti joins so you may want to
discount these two too.
+           jointype == JOIN_SEMI ||
+           jointype == JOIN_ANTI);

But in the future, it might.

I plan to work on adding those cases to postgres_fdw.

We shouldn't randomly leave foot-guns
lying around if there's an easy alternative.

Some FDWs might have already supported pushing down semi/anti joins. So
I think it's better to handle those joins as well.

3. Adding new members to JoinPathExtraData may save some work for postgres_fdw
and other FDWs which would use CreateLocalJoinPath(), but it will add few bytes
to the structure even when there is no "FULL foreign join which requires EPQ"
involved in the query. That may not be so much of memory overhead since the
structure is used locally to add_paths_to_joinrel(), but it might be something
to think about. Instead, what if we call select_mergejoin_clauses() within
CreateLocalJoinPath() to get that information?

I think that's exactly backwards. The few bytes of storage don't
matter, but extra CPU cycles might.

I agree with Robert.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#19)

Re: postgres_fdw bug in 9.6

On 2016/12/22 1:04, Ashutosh Bapat wrote:

Some review comments

Thanks for the review!

2. We should try to look for other not-so-cheap paths if the cheapest one is
paramterized. You might want to use get_cheapest_path_for_pathkeys() to find a
suitable unparameterized path by passing NULL for required_outer and NIL for
pathkeys, that's a very strange usage, but I think it will serve the purpose.
On the thread we discussed that we should save the epq_path create for lower
join and use it here. That may be better than searching for a path.
+    /* Give up if the cheapest-total-cost paths are parameterized. */
+    if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
+        !bms_is_empty(PATH_REQ_OUTER(inner_path)))
+        return NULL;

I did that because I think that would work well for postgres_fdw, but I
agree with you. Will revise.

3. Talking about saving some CPU cycles - if a clauseless full join can be
implemented only using merge join, probably that's the only path available in
the list of paths for the given relation. Instead of building the same path
anew, should we use the existing path like GetExistingLocalJoinPath() for that
case?

Hm, that might be an idea, but my concern about that is: the existing
path wouldn't always be guaranteed to be unprameterized.

In fact, I am wondering whether we should look for an existing nestloop
path for all joins except full, in which case we should look for merge or hash
join path. We go on building a new path, if only there isn't an existing one.
That will certainly save some CPU cycles spent in costing the path.

Maybe in many cases, nestloop paths for INNER/LEFT/SEMI/ANTI might be
removed from the rel's pathlist by dominated merge or hash join paths,
so searching the pathlist might cause a useless overhead.

4. Following comment mentions only hash join, but the code creates a merge join
or hash join path.
* If the jointype is JOIN_FULL, try to create a hashjoin join path from

Will revise.

5. Per comment below a clauseless full join can be implemented using only merge
join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true here?
/*
* Special corner case: for "x FULL JOIN y ON true", there will be no
* join clauses at all. Note that mergejoin is our only join type
* that supports FULL JOIN without any join clauses, and in that case
* it doesn't require the input paths to be well ordered, so generate
* a clauseless mergejoin path from the cheapest-total-cost paths.
*/
if (extra->mergejoin_allowed && !extra->mergeclause_list)

See the comments for select_mergejoin_clauses:

* select_mergejoin_clauses
* Select mergejoin clauses that are usable for a particular join.
* Returns a list of RestrictInfo nodes for those clauses.
*
* *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
* this is a right/full join and there are nonmergejoinable join clauses.
* The executor's mergejoin machinery cannot handle such cases, so we have
* to avoid generating a mergejoin plan. (Note that this flag does NOT
* consider whether there are actually any mergejoinable clauses. This is
* correct because in some cases we need to build a clauseless mergejoin.
* Simply returning NIL is therefore not enough to distinguish safe from
* unsafe cases.)

Rethinking about the problem, the error comes because the inner or outer plan
of a merge join plan doesn't have pathkeys required by the merge join. This
happens because the merge join path had foreign path with pathkeys as inner or
outer path and corresponding fdw_outerpath didn't have those pathkeys. I am
wondering whether the easy and possibly correct solution here is to not replace
a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we don't do
that, there won't be error building merge join plan and
postgresRecheckForeignScan() would correctly route the EPQ checks to the local
plan available as outer plan. Attached patch with that code removed.

That might be fine for PG9.6, but I'd like to propose replacing
GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
GetExistingLocalJoinPath might choose an overkill, merge or hash join
path for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might
cause an overhead at EPQ rechecks, and (2) choosing a local join path
randomly from the rel's pathlist wouldn't be easy to understand.

I'll add this to the next CF.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#23)

Re: postgres_fdw bug in 9.6

3. Talking about saving some CPU cycles - if a clauseless full join can be
implemented only using merge join, probably that's the only path available
in
the list of paths for the given relation. Instead of building the same
path
anew, should we use the existing path like GetExistingLocalJoinPath() for
that
case?

Hm, that might be an idea, but my concern about that is: the existing path
wouldn't always be guaranteed to be unprameterized.

Why? We retain all the parameterizations (including no
parameterization) available in the pathlist, so if it's possible to
create an unparameterized path, there will be one.

In fact, I am wondering whether we should look for an existing nestloop
path for all joins except full, in which case we should look for merge or
hash
join path. We go on building a new path, if only there isn't an existing
one.
That will certainly save some CPU cycles spent in costing the path.

Maybe in many cases, nestloop paths for INNER/LEFT/SEMI/ANTI might be
removed from the rel's pathlist by dominated merge or hash join paths, so
searching the pathlist might cause a useless overhead.

Fare point.

5. Per comment below a clauseless full join can be implemented using only
merge
join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true
here?
/*
* Special corner case: for "x FULL JOIN y ON true", there will be
no
* join clauses at all. Note that mergejoin is our only join type
* that supports FULL JOIN without any join clauses, and in that
case
* it doesn't require the input paths to be well ordered, so
generate
* a clauseless mergejoin path from the cheapest-total-cost paths.
*/
if (extra->mergejoin_allowed && !extra->mergeclause_list)

See the comments for select_mergejoin_clauses:

* select_mergejoin_clauses
* Select mergejoin clauses that are usable for a particular join.
* Returns a list of RestrictInfo nodes for those clauses.
*
* *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
* this is a right/full join and there are nonmergejoinable join clauses.
* The executor's mergejoin machinery cannot handle such cases, so we have
* to avoid generating a mergejoin plan. (Note that this flag does NOT
* consider whether there are actually any mergejoinable clauses. This is
* correct because in some cases we need to build a clauseless mergejoin.
* Simply returning NIL is therefore not enough to distinguish safe from
* unsafe cases.)

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.

Rethinking about the problem, the error comes because the inner or outer
plan
of a merge join plan doesn't have pathkeys required by the merge join.
This
happens because the merge join path had foreign path with pathkeys as
inner or
outer path and corresponding fdw_outerpath didn't have those pathkeys. I
am
wondering whether the easy and possibly correct solution here is to not
replace
a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we
don't do
that, there won't be error building merge join plan and
postgresRecheckForeignScan() would correctly route the EPQ checks to the
local
plan available as outer plan. Attached patch with that code removed.

That might be fine for PG9.6, but I'd like to propose replacing
GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
GetExistingLocalJoinPath might choose an overkill, merge or hash join path
for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
overhead at EPQ rechecks, and

The reason we chose to pick up an existing path was that the
discussion in thread [1]/messages/by-id/565E638E.8020703@lab.ntt.co.jp -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?

(2) choosing a local join path randomly from
the rel's pathlist wouldn't be easy to understand.

Easy to understand for whom? Can you please elaborate?

[1]: /messages/by-id/565E638E.8020703@lab.ntt.co.jp -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#24)

Re: postgres_fdw bug in 9.6

On 2016/12/27 22:03, Ashutosh Bapat wrote:

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.

Right, maybe we can do that by doing similar things as in
match_unsort_outer and/or sort_inner_and_outer. But as you mentioned,
the case is rare, so the problem would be whether it's worth
complicating the code (and if it's worth, whether we should do that at
the first version of the function).

I
am
wondering whether the easy and possibly correct solution here is to not
replace
a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we
don't do
that, there won't be error building merge join plan and
postgresRecheckForeignScan() would correctly route the EPQ checks to the
local
plan available as outer plan.

That might be fine for PG9.6, but I'd like to propose replacing
GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
GetExistingLocalJoinPath might choose an overkill, merge or hash join path
for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
overhead at EPQ rechecks, and

The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?

No, I won't. Usually, the overhead would be negligible, but in some
cases where there are many concurrent updates, the overhead might not be
negligible due to many EPQ rechecks. So it would be better to have an
efficient local plan.

(2) choosing a local join path randomly from
the rel's pathlist wouldn't be easy to understand.

Easy to understand for whom? Can you please elaborate?

Users. I think the ease of understanding for users is important.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#25)

Re: postgres_fdw bug in 9.6

On Wed, Dec 28, 2016 at 9:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2016/12/27 22:03, Ashutosh Bapat wrote:

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.

Right, maybe we can do that by doing similar things as in match_unsort_outer
and/or sort_inner_and_outer. But as you mentioned, the case is rare, so the
problem would be whether it's worth complicating the code (and if it's
worth, whether we should do that at the first version of the function).

All I am requesting is changing the order of conditions. That doesn't
complicate the code.

I
am
wondering whether the easy and possibly correct solution here is to not
replace
a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we
don't do
that, there won't be error building merge join plan and
postgresRecheckForeignScan() would correctly route the EPQ checks to the
local
plan available as outer plan.

That might be fine for PG9.6, but I'd like to propose replacing
GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
GetExistingLocalJoinPath might choose an overkill, merge or hash join
path
for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
overhead at EPQ rechecks, and

The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?

No, I won't. Usually, the overhead would be negligible, but in some cases
where there are many concurrent updates, the overhead might not be
negligible due to many EPQ rechecks. So it would be better to have an
efficient local plan.

All that the EPQ rechecks do is apply the join and other quals again
on the base relation rows. Will choice of plan affect the efficiency?

(2) choosing a local join path randomly from
the rel's pathlist wouldn't be easy to understand.

Easy to understand for whom? Can you please elaborate?

Users. I think the ease of understanding for users is important.

I doubt users care much about whether an existing path is returned or
a new one created as long as they get one to stuff in fdw_outerpath.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#26)

Re: postgres_fdw bug in 9.6

On 2016/12/28 15:54, Ashutosh Bapat wrote:

On Wed, Dec 28, 2016 at 9:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2016/12/27 22:03, Ashutosh Bapat wrote:

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.

Right, maybe we can do that by doing similar things as in match_unsort_outer
and/or sort_inner_and_outer. But as you mentioned, the case is rare, so the
problem would be whether it's worth complicating the code (and if it's
worth, whether we should do that at the first version of the function).

All I am requesting is changing the order of conditions. That doesn't
complicate the code.

I might have misunderstood your words, but you are saying we should
consider mergejoin paths with some mergeclauses in the case where
hashclauses is NIL, right? To do so, we would need to consider the sort
orders of outer/inner paths, which would make the code complicated.

The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?

No, I won't. Usually, the overhead would be negligible, but in some cases
where there are many concurrent updates, the overhead might not be
negligible due to many EPQ rechecks. So it would be better to have an
efficient local plan.

All that the EPQ rechecks do is apply the join and other quals again
on the base relation rows. Will choice of plan affect the efficiency?

Merge or hash joins would need extra steps to start that work (for
example, building a hash table from the inner relation for a hash join.)

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#27)

Re: postgres_fdw bug in 9.6

On Wed, Dec 28, 2016 at 1:29 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2016/12/28 15:54, Ashutosh Bapat wrote:

On Wed, Dec 28, 2016 at 9:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2016/12/27 22:03, Ashutosh Bapat wrote:

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.

Right, maybe we can do that by doing similar things as in
match_unsort_outer
and/or sort_inner_and_outer. But as you mentioned, the case is rare, so
the
problem would be whether it's worth complicating the code (and if it's
worth, whether we should do that at the first version of the function).

All I am requesting is changing the order of conditions. That doesn't
complicate the code.

I might have misunderstood your words, but you are saying we should consider
mergejoin paths with some mergeclauses in the case where hashclauses is NIL,
right? To do so, we would need to consider the sort orders of outer/inner
paths, which would make the code complicated.

Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.

The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?

No, I won't. Usually, the overhead would be negligible, but in some
cases
where there are many concurrent updates, the overhead might not be
negligible due to many EPQ rechecks. So it would be better to have an
efficient local plan.

All that the EPQ rechecks do is apply the join and other quals again
on the base relation rows. Will choice of plan affect the efficiency?

Merge or hash joins would need extra steps to start that work (for example,
building a hash table from the inner relation for a hash join.)

Hmm, I agree.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Etsuro Fujita (#23)

Re: postgres_fdw bug in 9.6

On 2016/12/27 16:41, Etsuro Fujita wrote:

On 2016/12/22 1:04, Ashutosh Bapat wrote:

2. We should try to look for other not-so-cheap paths if the cheapest
one is
paramterized. You might want to use get_cheapest_path_for_pathkeys()
to find a
suitable unparameterized path by passing NULL for required_outer and
NIL for
pathkeys, that's a very strange usage, but I think it will serve the
purpose.

+    /* Give up if the cheapest-total-cost paths are parameterized. */
+    if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
+        !bms_is_empty(PATH_REQ_OUTER(inner_path)))
+        return NULL;

I did that because I think that would work well for postgres_fdw, but I
agree with you. Will revise.

While working on this, I noticed that in that case
get_cheapest_path_for_pathkeys() would return NULL because if the
cheapest-total-cost path is parameterized, then there are no
unparameterized paths in the rel's pathlist (see set_cheapest).

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#24)

Re: postgres_fdw bug in 9.6

On 2016/12/27 22:03, Ashutosh Bapat wrote:

You wrote:

3. Talking about saving some CPU cycles - if a clauseless full join can be
implemented only using merge join, probably that's the only path available
in
the list of paths for the given relation. Instead of building the same
path
anew, should we use the existing path like GetExistingLocalJoinPath() for
that
case?

I wrote:

Hm, that might be an idea, but my concern about that is: the existing path
wouldn't always be guaranteed to be unprameterized.

Why? We retain all the parameterizations (including no
parameterization) available in the pathlist, so if it's possible to
create an unparameterized path, there will be one.

OK, but another concern is: in cases when we consider parameterized
paths, it might be inefficient to search the pathlist because the
unparameterized path might be at the rear of the lengthy pathlist. Note
that patameterized paths would produce fewer rows and have reduced
transfer and hence total cost, so they would be in the more front of the
pathlist, and the unparameterized path would be in the more rear. (Note
that the pathlist is kept sorted by total cost.)

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#28)

Re: postgres_fdw bug in 9.6

On 2016/12/28 17:34, Ashutosh Bapat wrote:

On Wed, Dec 28, 2016 at 1:29 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2016/12/28 15:54, Ashutosh Bapat wrote:

On Wed, Dec 28, 2016 at 9:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2016/12/27 22:03, Ashutosh Bapat wrote:

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.

Right, maybe we can do that by doing similar things as in
match_unsort_outer
and/or sort_inner_and_outer. But as you mentioned, the case is rare, so
the
problem would be whether it's worth complicating the code (and if it's
worth, whether we should do that at the first version of the function).

All I am requesting is changing the order of conditions. That doesn't
complicate the code.

I might have misunderstood your words, but you are saying we should consider
mergejoin paths with some mergeclauses in the case where hashclauses is NIL,
right? To do so, we would need to consider the sort orders of outer/inner
paths, which would make the code complicated.

Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.

Ok, will revise, but as I mentioned upthread, I'm not sure it's a good
idea to search the pathlist to get a merge join even in this case. I'd
vote for creating a merge join path from the inner/outer paths in this
case as well. I think that would simplify the code as well.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#31)

Re: postgres_fdw bug in 9.6

On Thu, Jan 5, 2017 at 8:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2016/12/27 16:41, Etsuro Fujita wrote:

On 2016/12/22 1:04, Ashutosh Bapat wrote:

2. We should try to look for other not-so-cheap paths if the cheapest
one is
paramterized. You might want to use get_cheapest_path_for_pathkeys()
to find a
suitable unparameterized path by passing NULL for required_outer and
NIL for
pathkeys, that's a very strange usage, but I think it will serve the
purpose.
+    /* Give up if the cheapest-total-cost paths are parameterized. */
+    if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
+        !bms_is_empty(PATH_REQ_OUTER(inner_path)))
+        return NULL;
I did that because I think that would work well for postgres_fdw, but I
agree with you. Will revise.

While working on this, I noticed that in that case
get_cheapest_path_for_pathkeys() would return NULL because if the
cheapest-total-cost path is parameterized, then there are no unparameterized
paths in the rel's pathlist (see set_cheapest).

I guess, that happens when there are lateral references, and every
path for that relation in parameterized. Please correct me if I am
wrong. If that's true, we should be searching for minimum
parameterization for that relation instead of no parameterization.
IIUC the bit about outer relations in the following comment in
add_path()

* We also remove from the rel's pathlist any old paths that are dominated
* by new_path --- that is, new_path is cheaper, at least as well ordered,
* generates no more rows, requires no outer rels not required by the old
* path, and is no less parallel-safe.

we do not discard an unparameterized path in favour of a cheaper
parameterized path. So, if the cheapest path is unparameterized one,
it's parameterized by minimum required outer relations.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Resolved by subject fallback

#33

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Ashutosh Bapat (#32)

Re: postgres_fdw bug in 9.6

Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.

Ok, will revise, but as I mentioned upthread, I'm not sure it's a good idea
to search the pathlist to get a merge join even in this case. I'd vote for
creating a merge join path from the inner/outer paths in this case as well.
I think that would simplify the code as well.

Creating a new path 1. requires memory 2. spends CPU cycles in costing
and creating it 3. requires a search in inner and outer relations'
pathlists (see my earlier reply). Searching for an existing path just
requires a search in one relation's pathlist. The path should be
there. Why do we want to spend extra resources in creating a new path
when an old one exists and searching it is more efficient.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Resolved by subject fallback

#34

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#33)

Re: postgres_fdw bug in 9.6

On 2017/01/05 13:19, Ashutosh Bapat wrote:

Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.

Ok, will revise, but as I mentioned upthread, I'm not sure it's a good idea
to search the pathlist to get a merge join even in this case. I'd vote for
creating a merge join path from the inner/outer paths in this case as well.
I think that would simplify the code as well.

Creating a new path 1. requires memory

The search approach would require memory for saving the path, too.

2. spends CPU cycles in costing
and creating it

The search approach would also need extra cycles in the cases mentioned
in [1]/messages/by-id/c1075e4e-8297-5cf6-3f30-cb21266aa2ee@lab.ntt.co.jp, wouldn't it? Since it would be useless to cost the
fdw_outerpath of a foreign join, we could skip that for the
fdw_outerpath if necessary.

3. requires a search in inner and outer relations'
pathlists (see my earlier reply).

What I'm thinking is basically to use the cheapest-total-cost paths of
the inner/outer relations, which wouldn't require any search.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/c1075e4e-8297-5cf6-3f30-cb21266aa2ee@lab.ntt.co.jp
/messages/by-id/c1075e4e-8297-5cf6-3f30-cb21266aa2ee@lab.ntt.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Etsuro Fujita (#31)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/01/05 12:10, Etsuro Fujita wrote:

On 2016/12/28 17:34, Ashutosh Bapat wrote:

Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.

Ok, will revise, but as I mentioned upthread, I'm not sure it's a good
idea to search the pathlist to get a merge join even in this case. I'd
vote for creating a merge join path from the inner/outer paths in this
case as well.

Done. Attached is the new version of the patch.

* I'm still not sure the search approach is the right way to go, so I
modified CreateLocalJoinPath so that it creates a mergejoin path that
explicitly sorts both the outer and inner relations as in
sort_inner_and_outer, by using the information saved in that function.
I think we could try to create a sort-free mergejoin as in
match_unsorted_outer, but I'm not sure it's worth complicating the code.
* I modified CreateLocalJoinPath so that it handles the cheapest-total
paths for the outer/inner relations that are parameterized if possible.
* I adjusted the code and revised some comments.

As I said upthread, we could skip costing for a local join path in
CreateLocalJoinPath, for efficiency, but I'm not sure we should do that.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-2.patchtext/x-patch; name=epqpath-for-foreignjoin-2.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1519,1540 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1563,1584 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1557,1572 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1608,1629 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1596,1611 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1652,1673 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1634,1649 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1684,1689 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1660,1726 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4016,4033 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4053,4068 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4159,4176 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4194,4209 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4365,4374 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4365,4375 ----
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = CreateLocalJoinPath(root, joinrel, outerrel, innerrel,
! 									   jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not created");
  			return;
  		}
  	}
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 445,450 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 445,455 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1007 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
!      which case a foreign data wrapper may build the local path by itself or
!      may choose not to create access paths for that join.)
      </para>
     </sect2>
  
--- 995,1008 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used as the alternative local
!      join plan.  <literal>CreateLocalJoinPath</> builds a nested loop join
!      path for the specified join relation, except when the join type is
!      <literal>FULL</>, in which case a merge or hash join path is built.
!      (If it does not build such a path, it returns NULL, in which case a
!      foreign data wrapper may build the local path by itself or may choose
!      not to create access paths for that join.)
      </para>
     </sect2>
  
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,29 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 693,805 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
   *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
! 
! 	Assert(joinrel->reloptkind == RELOPT_JOINREL);
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
! 
! 		/* Skip parameterised paths. */
! 		if (path->param_info != NULL)
! 			continue;
! 
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
! 				{
! 					HashPath   *hash_path = makeNode(HashPath);
! 
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
! 				}
! 				break;
! 
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
  				}
! 				break;
! 
! 			case T_MergeJoin:
  				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 695,880 ----
  	return oid;
  }
  
+ 
  /*
!  * Make a local join path for a foreign join
   *
!  * Note: currently, this function only supports unparameterized foreign joins.
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *outer_path = outerrel->cheapest_total_path;
! 	Path	   *inner_path = innerrel->cheapest_total_path;
! 	Path	   *result;
! 	Relids		required_outer;
! 	JoinCostWorkspace workspace;
  
! 	/*
! 	 * Generate a nestloop path (or hashjoin/mergejoin path, if full join)
! 	 * from the cheapest-total paths for the outer and inner relations.  (The
! 	 * point here is to avoid using a mergejoin, if possible, because that
! 	 * has assumptions about input ordering that likely won't be satisfied by
! 	 * the cheapest-total paths for the outer and inner relations.)
! 	 */
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			/*
! 			 * If the outer cheapest-total path is parameterized by the inner
! 			 * rel, we can't generate a nestloop path.  (There's no use
! 			 * looking for alternative outer paths, since it should already be
! 			 * the least-parameterized available path.)
! 			 */
! 			if (PATH_PARAM_BY_REL(outer_path, innerrel))
! 				return NULL;
! 			/* If proposed path is still parameterized, we can't use it. */
! 			required_outer = calc_nestloop_required_outer(outer_path,
! 														  inner_path);
! 			if (required_outer)
! 			{
! 				bms_free(required_outer);
! 				return NULL;
! 			}
! 			/* Get initial estimates */
! 			initial_cost_nestloop(root, &workspace, jointype,
! 								  outer_path, inner_path,
! 								  extra->sjinfo, &extra->semifactors);
! 			/* Generate a nestloop path */
! 			result = (Path *) create_nestloop_path(root,
! 												   joinrel,
! 												   jointype,
! 												   &workspace,
! 												   extra->sjinfo,
! 												   &extra->semifactors,
! 												   outer_path,
! 												   inner_path,
! 												   extra->restrictlist,
! 												   NIL, NULL);
! 			break;
! 		case JOIN_FULL:
! 			/*
! 			 * If either cheapest-total path is parameterized by the other
! 			 * rel, we can't generate a hashjoin/mergejoin path.  (There's no
! 			 * use looking for alternative input paths, since these should
! 			 * already be the least-parameterized available paths.)
! 			 */
! 			if (PATH_PARAM_BY_REL(outer_path, innerrel) ||
! 				PATH_PARAM_BY_REL(inner_path, outerrel))
! 				return NULL;
! 			/* If proposed path is still parameterized, we can't use it. */
! 			required_outer = calc_non_nestloop_required_outer(outer_path,
! 															  inner_path);
! 			if (required_outer)
! 			{
! 				bms_free(required_outer);
! 				return NULL;
! 			}
! 			/*
! 			 * Create a hashjoin path, if possible.  Else create a mergejoin
! 			 * path, if possible.
! 			 */
! 			if (extra->hashclause_list)
! 			{
! 				/* Get initial estimates */
! 				initial_cost_hashjoin(root, &workspace, jointype,
! 									  extra->hashclause_list,
! 									  outer_path, inner_path,
! 									  extra->sjinfo, &extra->semifactors);
! 				/* Generate a hashjoin path */
! 				result = (Path *) create_hashjoin_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra->sjinfo,
! 													   &extra->semifactors,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NULL,
! 													   extra->hashclause_list);
! 			}
! 			else if (extra->mergejoin_allowed)
! 			{
! 				/*
! 				 * If special case: for "x FULL JOIN y ON true", there will be
! 				 * no join clauses at all; create a clauseless mergejoin path.
! 				 * Else create a mergejoin path by explicitly sorting both the
! 				 * outer and inner relations as in sort_inner_and_outer.
! 				 */
! 				if (!extra->mergeclause_list)
  				{
! 					/* Get initial estimates */
! 					initial_cost_mergejoin(root, &workspace, jointype, NIL,
! 										   outer_path, inner_path, NIL, NIL,
! 										   extra->sjinfo);
! 					/* Generate a mergejoin path */
! 					result = (Path *) create_mergejoin_path(root,
! 															joinrel,
! 															jointype,
! 															&workspace,
! 															extra->sjinfo,
! 															outer_path,
! 															inner_path,
! 															extra->restrictlist,
! 															NIL, NULL,
! 															NIL, NIL, NIL);
  				}
! 				else
  				{
! 					List	   *outerkeys = extra->merge_outerkeys;
! 					List	   *innerkeys = extra->merge_innerkeys;
  
! 					/*
! 					 * If the paths are already well enough ordered, we can
! 					 * skip doing an explicit sort.
! 					 */
! 					if (outerkeys &&
! 						pathkeys_contained_in(outerkeys, outer_path->pathkeys))
! 						outerkeys = NIL;
! 					if (innerkeys &&
! 						pathkeys_contained_in(innerkeys, inner_path->pathkeys))
! 						innerkeys = NIL;
! 					/* Get initial estimates */
! 					initial_cost_mergejoin(root, &workspace, jointype,
! 										   extra->mergeclauses,
! 										   outer_path, inner_path,
! 										   outerkeys, innerkeys,
! 										   extra->sjinfo);
! 					/* Generate a mergejoin path */
! 					result = (Path *) create_mergejoin_path(root,
! 															joinrel,
! 															jointype,
! 															&workspace,
! 															extra->sjinfo,
! 															outer_path,
! 															inner_path,
! 															extra->restrictlist,
! 															extra->merge_pathkeys,
! 															NULL,
! 															extra->mergeclauses,
! 															outerkeys,
! 															innerkeys);
  				}
! 			}
! 			else
! 				result = NULL;
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 25,33 ****
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
- #define PATH_PARAM_BY_REL(path, rel)  \
- 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
- 
  static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
  					 JoinType jointype, JoinPathExtraData *extra);
--- 25,30 ----
***************
*** 85,97 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
--- 82,99 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.hashclause_list = NIL;
+ 	extra.mergejoin_allowed = true;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.mergeclauses = NIL;
+ 	extra.merge_pathkeys = NIL;
+ 	extra.merge_outerkeys = NIL;
+ 	extra.merge_innerkeys = NIL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
***************
*** 106,112 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
--- 108,114 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
***************
*** 169,175 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 171,177 ----
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 180,186 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 182,188 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 197,203 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 199,205 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 773,778 **** sort_inner_and_outer(PlannerInfo *root,
--- 775,789 ----
  						   innerkeys,
  						   jointype,
  						   extra);
+ 
+ 		/* Save first mergejoin information for possible use by the FDW */
+ 		if (outerkeys == all_pathkeys)
+ 		{
+ 			extra->mergeclauses = cur_mergeclauses;
+ 			extra->merge_pathkeys = merge_pathkeys;
+ 			extra->merge_outerkeys = outerkeys;
+ 			extra->merge_innerkeys = innerkeys;
+ 		}
  	}
  }
  
***************
*** 1320,1325 **** hash_inner_and_outer(PlannerInfo *root,
--- 1331,1339 ----
  		hashclauses = lappend(hashclauses, restrictinfo);
  	}
  
+ 	/* Save hashclauses for possible use by the FDW */
+ 	extra->hashclause_list = hashclauses;
+ 
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
  	{
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 235,240 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif   /* FDWAPI_H */
--- 235,242 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					JoinType jointype, JoinPathExtraData *extra);
  
  #endif   /* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 904,909 **** typedef struct Path
--- 904,913 ----
  #define PATH_REQ_OUTER(path)  \
  	((path)->param_info ? (path)->param_info->ppi_req_outer : (Relids) NULL)
  
+ /* Macro for determining whether path is parameterized by rel */
+ #define PATH_PARAM_BY_REL(path, rel)  \
+ 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+ 
  /*----------
   * IndexPath represents an index scan over a single index.
   *
***************
*** 2006,2022 **** typedef struct SemiAntiJoinFactors
--- 2010,2040 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * hashclause_list is a list of RestrictInfo nodes for available
+  *		hashjoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI or ANTI joins)
   * param_source_rels are OK targets for parameterization of result paths
+  * mergeclauses are the RestrictInfo nodes to be used as merge clauses for
+  *		a mergejoin path
+  * merge_pathkeys are the pathkeys of the mergejoin path
+  * merge_outerkeys are the sort pathkeys for the outer relation
+  * merge_innerkeys are the sort pathkeys for the inner relation
   */
  typedef struct JoinPathExtraData
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	List	   *hashclause_list;
+ 	bool		mergejoin_allowed;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	List	   *mergeclauses;
+ 	List	   *merge_pathkeys;
+ 	List	   *merge_outerkeys;
+ 	List	   *merge_innerkeys;
  } JoinPathExtraData;
  
  /*

#36

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#35)

Re: postgres_fdw bug in 9.6

Done. Attached is the new version of the patch.

* I'm still not sure the search approach is the right way to go, so I
modified CreateLocalJoinPath so that it creates a mergejoin path that
explicitly sorts both the outer and inner relations as in
sort_inner_and_outer, by using the information saved in that function. I
think we could try to create a sort-free mergejoin as in
match_unsorted_outer, but I'm not sure it's worth complicating the code.

Why is this so?
* If the outer cheapest-total path is parameterized by the inner
* rel, we can't generate a nestloop path.
and
* If either cheapest-total path is parameterized by the other
* rel, we can't generate a hashjoin/mergejoin path. (There's no

If the inner and/or outer paths are not ordered as required, we will need to
order them. Code below doesn't seem to handle that case.
/*
* If the paths are already well enough ordered, we can
* skip doing an explicit sort.
*/
if (outerkeys &&
pathkeys_contained_in(outerkeys, outer_path->pathkeys))
outerkeys = NIL;
if (innerkeys &&
pathkeys_contained_in(innerkeys, inner_path->pathkeys))
innerkeys = NIL;

* I modified CreateLocalJoinPath so that it handles the cheapest-total paths
for the outer/inner relations that are parameterized if possible.

I don't think we need to provide details of what kind of path the function
builds.
+     join plan.  <literal>CreateLocalJoinPath</> builds a nested loop join
+     path for the specified join relation, except when the join type is
+     <literal>FULL</>, in which case a merge or hash join path is built.

I am not able to understand the code or the comment below
+
+        /* Save first mergejoin information for possible use by the FDW */
+        if (outerkeys == all_pathkeys)
+        {
+            extra->mergeclauses = cur_mergeclauses;
+            extra->merge_pathkeys = merge_pathkeys;
+            extra->merge_outerkeys = outerkeys;
+            extra->merge_innerkeys = innerkeys;
+        }

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#36)

Re: postgres_fdw bug in 9.6

On 2017/01/09 22:36, Ashutosh Bapat wrote:

I wrote:

Done. Attached is the new version of the patch.

Why is this so?
* If the outer cheapest-total path is parameterized by the inner
* rel, we can't generate a nestloop path.

That parameterization means that for each row scanned from the inner
rel, the nestloop has to pass down the param values of the row into the
scan of the outer rel; but it's not possible for the nestloop to work
like that. You can find the same tests in match_unsorted_outer.

* If either cheapest-total path is parameterized by the other
* rel, we can't generate a hashjoin/mergejoin path.

The above applies to hashjoins and mergejoins. In addition, those joins
cannot pass down the param values of each outer-rel row into the
inner-rel scan, like nestloop joins, so the inner path mustn't be
parameterized by the outer rel in those joins. See the same tests in
sort_inner_and_outer and hash_inner_and_outer.

While re-reading the patch, I noticed this could be simplified:

! case JOIN_FULL:
! /*
! * If either cheapest-total path is parameterized by the other
! * rel, we can't generate a hashjoin/mergejoin path. (There's no
! * use looking for alternative input paths, since these should
! * already be the least-parameterized available paths.)
! */
! if (PATH_PARAM_BY_REL(outer_path, innerrel) ||
! PATH_PARAM_BY_REL(inner_path, outerrel))
! return NULL;
! /* If proposed path is still parameterized, we can't use it. */
! required_outer = calc_non_nestloop_required_outer(outer_path,
! inner_path);

This would mean that both the outer and inner paths must be
unparameterized. Will simplify if it's ok.

I don't think we need to provide details of what kind of path the function
builds.
+     join plan.  <literal>CreateLocalJoinPath</> builds a nested loop join
+     path for the specified join relation, except when the join type is
+     <literal>FULL</>, in which case a merge or hash join path is built.

Ok, will remove.

I am not able to understand the code or the comment below
+
+        /* Save first mergejoin information for possible use by the FDW */
+        if (outerkeys == all_pathkeys)
+        {
+            extra->mergeclauses = cur_mergeclauses;
+            extra->merge_pathkeys = merge_pathkeys;
+            extra->merge_outerkeys = outerkeys;
+            extra->merge_innerkeys = innerkeys;
+        }

In the case when mergejoin is allowed and we have merge clauses but no
hash clauses, CreateLocalJoinPath creates a mergejoin from the
cheapest-total paths for the outer and inner rels, by explicitly sorting
both the outer and inner rels. The above is for creating the mergejoin
in CreateLocalJoinPath; since sort_inner_and_outer also creates
mergejoins from the cheapest-total paths, the above code saves data
about one of mergejoins created there so that CreateLocalJoinPath uses
that data to create the mergejoin.

On second thought, I noticed that this could be simplified a bit as
well; since the output sort order (merge_pathkeys) of the mergejoin
implementing a FULL join is NIL (see build_join_pathkeys), we wouldn't
need to save that info in merge_pathkeys. Will simplify if it's ok.

If the inner and/or outer paths are not ordered as required, we will need to
order them. Code below doesn't seem to handle that case.
/*
* If the paths are already well enough ordered, we can
* skip doing an explicit sort.
*/
if (outerkeys &&
pathkeys_contained_in(outerkeys, outer_path->pathkeys))
outerkeys = NIL;
if (innerkeys &&
pathkeys_contained_in(innerkeys, inner_path->pathkeys))
innerkeys = NIL;

I might be missing something, but if the outer/inner paths are already
well sorted, we wouldn't need an explicit sort, so we can set the
outerkeys/innerkeys to NIL safely. (You can find the same optimization
in try_mergejoin_path.)

Thanks for the review!

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#37)

Re: postgres_fdw bug in 9.6

If the inner and/or outer paths are not ordered as required, we will need
to
order them. Code below doesn't seem to handle that case.
/*
* If the paths are already well enough ordered, we
can
* skip doing an explicit sort.
*/
if (outerkeys &&
pathkeys_contained_in(outerkeys,
outer_path->pathkeys))
outerkeys = NIL;
if (innerkeys &&
pathkeys_contained_in(innerkeys,
inner_path->pathkeys))
innerkeys = NIL;

I might be missing something, but if the outer/inner paths are already well
sorted, we wouldn't need an explicit sort, so we can set the
outerkeys/innerkeys to NIL safely. (You can find the same optimization in
try_mergejoin_path.)

Actually I didn't notice that create_mergejoin_path saves those keys
in the MergePath and then adds sorting steps during planning. Sorry
for the trouble.

Another concern here is that we are copying pieces of logic in
add_paths_to_joinrel() without arranging it as neatly as in that
function.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#39

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#35)

Re: postgres_fdw bug in 9.6

On Fri, Jan 6, 2017 at 6:31 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/01/05 12:10, Etsuro Fujita wrote:

On 2016/12/28 17:34, Ashutosh Bapat wrote:

Hmm. If I understand the patch correctly, it does not return any path
when merge join is allowed and there are merge clauses but no hash
clauses. In this case we will not create a foreign join path, loosing
some optimization. If we remove GetExistingLocalJoinPath, which
returns a path in those cases as well, we have a regression in
performance.

Ok, will revise, but as I mentioned upthread, I'm not sure it's a good
idea to search the pathlist to get a merge join even in this case. I'd
vote for creating a merge join path from the inner/outer paths in this
case as well.

Done. Attached is the new version of the patch.

* I'm still not sure the search approach is the right way to go, so I
modified CreateLocalJoinPath so that it creates a mergejoin path that
explicitly sorts both the outer and inner relations as in
sort_inner_and_outer, by using the information saved in that function. I
think we could try to create a sort-free mergejoin as in
match_unsorted_outer, but I'm not sure it's worth complicating the code.
* I modified CreateLocalJoinPath so that it handles the cheapest-total paths
for the outer/inner relations that are parameterized if possible.
* I adjusted the code and revised some comments.

As I said upthread, we could skip costing for a local join path in
CreateLocalJoinPath, for efficiency, but I'm not sure we should do that.

CreateLocalJoinPath tries to construct a nestloop path for the given
join relation because it wants to avoid merge/hash join paths which
require expensive setup not required for EPQ. But it chooses cheap
paths for joining relations which may not be nestloop paths. So,
effectively it could happen that the whole alternate local plan would
be filled with hash/merge joins except the uppermost join. Or it can
have foreign paths in there, which will need redirection. That's not
very good. Neither it's good to start constructing a nestloop join
tree top to bottom every time. We have to choose one of the following
alternatives

1. develop a mechanism to remember epq path for joining relations so
that it's available to CreateLocalJoinPath
2.let the caller pass epq local paths for joining relations to
CreateLocalJoinPath. How it remembers those, is FDW's problem.
2. Fix existing code by applying patch from [1]/messages/by-id/CAFjFpRd9oZEkur9aOrMh2Toxq4NYuUSw2kB9S+O=uvZ4xcR=Sw@mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company

[1]: /messages/by-id/CAFjFpRd9oZEkur9aOrMh2Toxq4NYuUSw2kB9S+O=uvZ4xcR=Sw@mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#39)

Re: postgres_fdw bug in 9.6

On 2017/01/11 13:40, Ashutosh Bapat wrote:

CreateLocalJoinPath tries to construct a nestloop path for the given
join relation because it wants to avoid merge/hash join paths which
require expensive setup not required for EPQ. But it chooses cheap
paths for joining relations which may not be nestloop paths. So,
effectively it could happen that the whole alternate local plan would
be filled with hash/merge joins except the uppermost join.

In many cases the cheapest-total-cost outer and inner paths for a higher
foreign-join relation would be foreign join paths, which would have
nestloop paths as their fdw_outerpaths if not full joins. So by
redirection, the plan tree for EPQ would be mostly constructed by
nestloop joins. No?

Or it can
have foreign paths in there, which will need redirection. That's not
very good.

Maybe I'm missing something, but redirection isn't a problem.

We have to choose one of the following
alternatives

1. develop a mechanism to remember epq path for joining relations so
that it's available to CreateLocalJoinPath
2.let the caller pass epq local paths for joining relations to
CreateLocalJoinPath. How it remembers those, is FDW's problem.

I thought such a thing, but I think that would be overcomplicated, as
pointed out by Tom [2]/messages/by-id/12565.1481904788@sss.pgh.pa.us.

2. Fix existing code by applying patch from [1]

As I said before, that might be fine for 9.6, but I don't think it's a
good idea to search the pathlist because once we support parameterized
foreign join paths, which is on my TODOs, we would have to traverse
through the possibly-lengthy pathlist to find a local-join path, as
mentioned in [3]/messages/by-id/c1075e4e-8297-5cf6-3f30-cb21266aa2ee@lab.ntt.co.jp.

Best regards,
Etsuro Fujita

[2]: /messages/by-id/12565.1481904788@sss.pgh.pa.us
[3]: /messages/by-id/c1075e4e-8297-5cf6-3f30-cb21266aa2ee@lab.ntt.co.jp
/messages/by-id/c1075e4e-8297-5cf6-3f30-cb21266aa2ee@lab.ntt.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#40)

Re: postgres_fdw bug in 9.6

On Wed, Jan 11, 2017 at 1:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/01/11 13:40, Ashutosh Bapat wrote:

CreateLocalJoinPath tries to construct a nestloop path for the given
join relation because it wants to avoid merge/hash join paths which
require expensive setup not required for EPQ. But it chooses cheap
paths for joining relations which may not be nestloop paths. So,
effectively it could happen that the whole alternate local plan would
be filled with hash/merge joins except the uppermost join.

In many cases the cheapest-total-cost outer and inner paths for a higher
foreign-join relation would be foreign join paths, which would have nestloop
paths as their fdw_outerpaths if not full joins. So by redirection, the
plan tree for EPQ would be mostly constructed by nestloop joins. No?

It's not guaranteed that we will always have foreign join paths there.
We have seen this in Jeff's example, which started this thread. We
don't know in what all cases we have a tree entirely consisting of
(cheapest) foreign join paths.

Or it can
have foreign paths in there, which will need redirection. That's not
very good.

Maybe I'm missing something, but redirection isn't a problem.

Peformance wise it is, correctness-wise it is not. Why do we want to
incur a hop, when we can avoid it.

We have to choose one of the following
alternatives

1. develop a mechanism to remember epq path for joining relations so
that it's available to CreateLocalJoinPath
2.let the caller pass epq local paths for joining relations to
CreateLocalJoinPath. How it remembers those, is FDW's problem.

I thought such a thing, but I think that would be overcomplicated, as
pointed out by Tom [2].

2. Fix existing code by applying patch from [1]

As I said before, that might be fine for 9.6, but I don't think it's a good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].

I don't agree that pathlists will be long enough to make this a
non-attractive solution. For parameterized foreign join paths, with
the approach that this patch takes, we will require to search in two
such pathlists, inner and outer.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#42

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#41)

Re: postgres_fdw bug in 9.6

On 2017/01/11 17:51, Ashutosh Bapat wrote:

On Wed, Jan 11, 2017 at 1:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/01/11 13:40, Ashutosh Bapat wrote:

CreateLocalJoinPath tries to construct a nestloop path for the given
join relation because it wants to avoid merge/hash join paths which
require expensive setup not required for EPQ. But it chooses cheap
paths for joining relations which may not be nestloop paths. So,
effectively it could happen that the whole alternate local plan would
be filled with hash/merge joins except the uppermost join.

In many cases the cheapest-total-cost outer and inner paths for a higher
foreign-join relation would be foreign join paths, which would have nestloop
paths as their fdw_outerpaths if not full joins. So by redirection, the
plan tree for EPQ would be mostly constructed by nestloop joins. No?

It's not guaranteed that we will always have foreign join paths there.
We have seen this in Jeff's example, which started this thread. We
don't know in what all cases we have a tree entirely consisting of
(cheapest) foreign join paths.

Right, but local-join plans need not be efficient since no base table
will return more than one row, as stated in the documentation. (I think
efficient plans without complicating the code would be better, though.)

Or it can
have foreign paths in there, which will need redirection. That's not
very good.

Maybe I'm missing something, but redirection isn't a problem.

Peformance wise it is, correctness-wise it is not. Why do we want to
incur a hop, when we can avoid it.

ISTM that's solving a problem that hasn't been proven to be a problem.

2. Fix existing code by applying patch from [1]

As I said before, that might be fine for 9.6, but I don't think it's a good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].

I don't agree that pathlists will be long enough to make this a
non-attractive solution. For parameterized foreign join paths, with
the approach that this patch takes, we will require to search in two
such pathlists, inner and outer.

Sorry, I don't understand this part.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 9 years ago

In reply to: Etsuro Fujita (#42)

Re: postgres_fdw bug in 9.6

On Wed, Jan 11, 2017 at 3:30 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/01/11 17:51, Ashutosh Bapat wrote:

On Wed, Jan 11, 2017 at 1:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/01/11 13:40, Ashutosh Bapat wrote:

CreateLocalJoinPath tries to construct a nestloop path for the given
join relation because it wants to avoid merge/hash join paths which
require expensive setup not required for EPQ. But it chooses cheap
paths for joining relations which may not be nestloop paths. So,
effectively it could happen that the whole alternate local plan would
be filled with hash/merge joins except the uppermost join.

In many cases the cheapest-total-cost outer and inner paths for a higher
foreign-join relation would be foreign join paths, which would have
nestloop
paths as their fdw_outerpaths if not full joins. So by redirection, the
plan tree for EPQ would be mostly constructed by nestloop joins. No?

It's not guaranteed that we will always have foreign join paths there.
We have seen this in Jeff's example, which started this thread. We
don't know in what all cases we have a tree entirely consisting of
(cheapest) foreign join paths.

Right, but local-join plans need not be efficient since no base table will
return more than one row, as stated in the documentation. (I think
efficient plans without complicating the code would be better, though.)

Or it can
have foreign paths in there, which will need redirection. That's not
very good.

Maybe I'm missing something, but redirection isn't a problem.

Peformance wise it is, correctness-wise it is not. Why do we want to
incur a hop, when we can avoid it.

ISTM that's solving a problem that hasn't been proven to be a problem.

A hop will consume a function call worth CPU at least.

2. Fix existing code by applying patch from [1]

As I said before, that might be fine for 9.6, but I don't think it's a
good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].

I don't agree that pathlists will be long enough to make this a
non-attractive solution. For parameterized foreign join paths, with
the approach that this patch takes, we will require to search in two
such pathlists, inner and outer.

Sorry, I don't understand this part.

A parameterized join is built if the joining paths are parameterized
as well. Thus building a parameterized local path would require one
to search suitably parameterized paths in joining relations in their
pathlists.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 9 years ago

In reply to: Ashutosh Bapat (#43)

Re: postgres_fdw bug in 9.6

On 2017/01/11 19:26, Ashutosh Bapat wrote:

On Wed, Jan 11, 2017 at 3:30 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/01/11 17:51, Ashutosh Bapat wrote:

On Wed, Jan 11, 2017 at 1:15 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/01/11 13:40, Ashutosh Bapat wrote:

Or it can
have foreign paths in there, which will need redirection. That's not
very good.

Maybe I'm missing something, but redirection isn't a problem.

Peformance wise it is, correctness-wise it is not. Why do we want to
incur a hop, when we can avoid it.

ISTM that's solving a problem that hasn't been proven to be a problem.

A hop will consume a function call worth CPU at least.

Is that a problem in practice?

2. Fix existing code by applying patch from [1]

As I said before, that might be fine for 9.6, but I don't think it's a
good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].

I don't agree that pathlists will be long enough to make this a
non-attractive solution. For parameterized foreign join paths, with
the approach that this patch takes, we will require to search in two
such pathlists, inner and outer.

Sorry, I don't understand this part.

A parameterized join is built if the joining paths are parameterized
as well. Thus building a parameterized local path would require one
to search suitably parameterized paths in joining relations in their
pathlists.

Yeah, I'm thinking to (1) create parameterized foreign-join paths from
the cheapest_parameterized_paths lists of the outer and inner rels, and
(2) create a local-join paths for any parameterized foreign-join path
from the component paths chosen from the lists in a similar way as
CreateLocalJoinPath creates an unparameterized local-join path from the
cheapest-total-cost paths of the outer and inner rels. That's the real
reason why I'm proposing to replace GetExistingLocalJoinPath with
CreateLocalJoinPath.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#45

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Etsuro Fujita (#40)

Re: postgres_fdw bug in 9.6

On Wed, Jan 11, 2017 at 2:45 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

As I said before, that might be fine for 9.6, but I don't think it's a good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].

I'm not sure that's really going to be a problem. The number of
possible parameterizations that need to be considered isn't usually
very big. I bet the path list will have ten or a few tens of entries
in it, not a hundred or a thousand. Traversing it isn't that
expensive.

That having been said, I haven't read the patches, so I'm not really
up to speed on the bigger issues here. But surely it's more important
to get the overall design right than to worry about the cost of
walking the pathlist or worrying about the cost of an extra function
call (?!).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#46

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Robert Haas (#45)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/01/13 0:43, Robert Haas wrote:

On Wed, Jan 11, 2017 at 2:45 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

As I said before, that might be fine for 9.6, but I don't think it's a good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].

I'm not sure that's really going to be a problem. The number of
possible parameterizations that need to be considered isn't usually
very big. I bet the path list will have ten or a few tens of entries
in it, not a hundred or a thousand. Traversing it isn't that
expensive.

That having been said, I haven't read the patches, so I'm not really
up to speed on the bigger issues here. But surely it's more important
to get the overall design right than to worry about the cost of
walking the pathlist or worrying about the cost of an extra function
call (?!).

My biggest concern about GetExistingLocalJoinPath is that might not be
extendable to the case of foreign-join paths with parameterization; in
which case, fdw_outerpath for a given foreign-join path would need to
have the same parameterization as the foreign-join path, but there might
not be any existing paths with the same parameterization in the path
list. You might think we could get the fdw_outerpath by getting an
existing path with no parameterization as in GetExistingLocalJoinPath
and then modifying the path's param_info to match the parameterization
of the foreign-join path. I don't know that really works, but that
might be inefficient.

What I have in mind to support foreign-join paths with parameterization
for postgres_fdw like this: (1) generate parameterized paths from any
joinable combination of the outer/inner cheapest-parameterized paths
that have pushed down the outer/inner relation to the remote server, in
a similar way as postgresGetForeignJoinPaths creates unparameterized
paths, and (2) create fdw_outerpath for each parameterized path from the
outer/inner paths used to generate the parameterized path, by
create_nestloop_path (or, create_hashjoin_path or create_mergejoin_path
if full join), so that the resulting fdw_outerpath has the same
parameterization as the paramterized path. This would probably work and
might be more efficient. And the patch I proposed would be easily
extended to this, by replacing the outer/inner cheapest-total paths with
the outer/inner cheapest-parameterized paths. Attached is the latest
version of the patch.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-3.patchtext/x-patch; name=epqpath-for-foreignjoin-3.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1519,1540 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1563,1584 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1557,1572 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1608,1629 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1596,1611 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1652,1673 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1634,1649 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1684,1689 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1660,1726 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4016,4033 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4053,4068 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4159,4176 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4194,4209 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4365,4374 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4365,4375 ----
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = CreateLocalJoinPath(root, joinrel, outerrel, innerrel,
! 									   jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not created");
  			return;
  		}
  	}
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 445,450 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 445,455 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1005 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 995,1003 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used as the alternative local
!      join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,29 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 693,805 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
   *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
!  *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
! 
! 	Assert(joinrel->reloptkind == RELOPT_JOINREL);
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
! 
! 		/* Skip parameterised paths. */
! 		if (path->param_info != NULL)
! 			continue;
! 
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
! 				{
! 					HashPath   *hash_path = makeNode(HashPath);
! 
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
! 				}
! 				break;
! 
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
  				}
! 				break;
! 
! 			case T_MergeJoin:
  				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 695,871 ----
  	return oid;
  }
  
+ 
  /*
!  * Make a local join path for a foreign join
   *
!  * Note: currently, this function only supports unparameterized foreign joins.
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *outer_path = outerrel->cheapest_total_path;
! 	Path	   *inner_path = innerrel->cheapest_total_path;
! 	Path	   *result;
! 	Relids		required_outer;
! 	JoinCostWorkspace workspace;
  
! 	/*
! 	 * Generate a nestloop path (or hashjoin/mergejoin path, if full join)
! 	 * from the cheapest-total paths for the outer and inner relations.
! 	 */
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			/*
! 			 * If the cheapest-total outer path is parameterized by the inner
! 			 * rel, we can't generate a nestloop path.  (There's no use
! 			 * looking for alternative outer paths, since it should already be
! 			 * the least-parameterized available path.)
! 			 */
! 			if (PATH_PARAM_BY_REL(outer_path, innerrel))
! 				return NULL;
! 			/* If proposed path is still parameterized, we can't use it. */
! 			required_outer = calc_nestloop_required_outer(outer_path,
! 														  inner_path);
! 			if (required_outer)
! 			{
! 				bms_free(required_outer);
! 				return NULL;
! 			}
! 			/* Get initial estimates */
! 			initial_cost_nestloop(root, &workspace, jointype,
! 								  outer_path, inner_path,
! 								  extra->sjinfo, &extra->semifactors);
! 			/* Generate a nestloop path */
! 			result = (Path *) create_nestloop_path(root,
! 												   joinrel,
! 												   jointype,
! 												   &workspace,
! 												   extra->sjinfo,
! 												   &extra->semifactors,
! 												   outer_path,
! 												   inner_path,
! 												   extra->restrictlist,
! 												   NIL, NULL);
! 			break;
! 		case JOIN_FULL:
! 			/*
! 			 * (1) if either cheapest-total path is parameterized by the other
! 			 * rel, we can't generate a hashjoin/mergejoin path, and (2) if
! 			 * proposed path is still parameterized (ie, the required_outer
! 			 * set calculated by calc_non_nestloop_required_outer isn't NULL),
! 			 * we can't use it; which means that both the cheapest-total paths
! 			 * should be unparameterized.
! 			 */
! 			if (outer_path->param_info || inner_path->param_info)
! 				return NULL;
! 			/*
! 			 * Create a hashjoin path, if possible.  Else create a mergejoin
! 			 * path, if possible.
! 			 */
! 			if (extra->hashclauses)
! 			{
! 				/* Get initial estimates */
! 				initial_cost_hashjoin(root, &workspace, jointype,
! 									  extra->hashclauses,
! 									  outer_path, inner_path,
! 									  extra->sjinfo, &extra->semifactors);
! 				/* Generate a hashjoin path */
! 				result = (Path *) create_hashjoin_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra->sjinfo,
! 													   &extra->semifactors,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NULL,
! 													   extra->hashclauses);
! 			}
! 			else if (extra->mergejoin_allowed)
! 			{
! 				/*
! 				 * If special case: for "x FULL JOIN y ON true", there will be
! 				 * no join clauses at all; create a clauseless mergejoin path.
! 				 * Else create a mergejoin path by explicitly sorting both the
! 				 * outer and inner relations.
! 				 */
! 				if (!extra->mergeclause_list)
  				{
! 					/* Get initial estimates */
! 					initial_cost_mergejoin(root, &workspace, jointype, NIL,
! 										   outer_path, inner_path, NIL, NIL,
! 										   extra->sjinfo);
! 					/* Generate a mergejoin path */
! 					result = (Path *) create_mergejoin_path(root,
! 															joinrel,
! 															jointype,
! 															&workspace,
! 															extra->sjinfo,
! 															outer_path,
! 															inner_path,
! 															extra->restrictlist,
! 															NIL, NULL,
! 															NIL, NIL, NIL);
  				}
! 				else
  				{
! 					List	   *outerkeys = extra->outersortkeys;
! 					List	   *innerkeys = extra->innersortkeys;
  
! 					/*
! 					 * It's possible that the cheapest-total paths will
! 					 * already be sorted properly; if so, suppress an explicit
! 					 * sort.
! 					 */
! 					if (outerkeys &&
! 						pathkeys_contained_in(outerkeys, outer_path->pathkeys))
! 						outerkeys = NIL;
! 					if (innerkeys &&
! 						pathkeys_contained_in(innerkeys, inner_path->pathkeys))
! 						innerkeys = NIL;
! 					/* Get initial estimates */
! 					initial_cost_mergejoin(root, &workspace, jointype,
! 										   extra->mergeclauses,
! 										   outer_path, inner_path,
! 										   outerkeys, innerkeys,
! 										   extra->sjinfo);
! 					/* Generate a mergejoin path */
! 					result = (Path *) create_mergejoin_path(root,
! 															joinrel,
! 															jointype,
! 															&workspace,
! 															extra->sjinfo,
! 															outer_path,
! 															inner_path,
! 															extra->restrictlist,
! 															NIL,
! 															NULL,
! 															extra->mergeclauses,
! 															outerkeys,
! 															innerkeys);
  				}
! 			}
! 			else
! 				result = NULL;
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 25,33 ****
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
- #define PATH_PARAM_BY_REL(path, rel)  \
- 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
- 
  static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
  					 JoinType jointype, JoinPathExtraData *extra);
--- 25,30 ----
***************
*** 85,97 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
--- 82,98 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.mergejoin_allowed = true;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.hashclauses = NIL;
+ 	extra.mergeclauses = NIL;
+ 	extra.outersortkeys = NIL;
+ 	extra.innersortkeys = NIL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
***************
*** 106,112 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
--- 107,113 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
***************
*** 169,175 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 170,176 ----
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 180,186 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 181,187 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 197,203 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 198,204 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 631,636 **** clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
--- 632,644 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into fields of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	mergeclauses	RestrictInfos to use as merge clauses in a mergejoin
+  *	outersortkeys	Sort pathkeys for the outer relation of the mergejoin
+  *	innersortkeys	Sort pathkeys for the inner relation of the mergejoin
   */
  static void
  sort_inner_and_outer(PlannerInfo *root,
***************
*** 773,778 **** sort_inner_and_outer(PlannerInfo *root,
--- 781,794 ----
  						   innerkeys,
  						   jointype,
  						   extra);
+ 
+ 		/* Save first mergejoin data for possible use by the FDW */
+ 		if (outerkeys == all_pathkeys)
+ 		{
+ 			extra->mergeclauses = cur_mergeclauses;
+ 			extra->outersortkeys = outerkeys;
+ 			extra->innersortkeys = innerkeys;
+ 		}
  	}
  }
  
***************
*** 1275,1280 **** consider_parallel_nestloop(PlannerInfo *root,
--- 1291,1301 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into a field of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	hashclauses		RestrictInfos to use as hash clauses in a hashjoin
   */
  static void
  hash_inner_and_outer(PlannerInfo *root,
***************
*** 1320,1325 **** hash_inner_and_outer(PlannerInfo *root,
--- 1341,1349 ----
  		hashclauses = lappend(hashclauses, restrictinfo);
  	}
  
+ 	/* Save hashclauses for possible use by the FDW */
+ 	extra->hashclauses = hashclauses;
+ 
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
  	{
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 235,240 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif   /* FDWAPI_H */
--- 235,242 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					JoinType jointype, JoinPathExtraData *extra);
  
  #endif   /* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 904,909 **** typedef struct Path
--- 904,913 ----
  #define PATH_REQ_OUTER(path)  \
  	((path)->param_info ? (path)->param_info->ppi_req_outer : (Relids) NULL)
  
+ /* Macro for determining whether path is parameterized by rel */
+ #define PATH_PARAM_BY_REL(path, rel)  \
+ 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+ 
  /*----------
   * IndexPath represents an index scan over a single index.
   *
***************
*** 2006,2022 **** typedef struct SemiAntiJoinFactors
--- 2010,2039 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI or ANTI joins)
   * param_source_rels are OK targets for parameterization of result paths
+  *
+  * Remaining fields are set only for possible use by the FDW:
+  *
+  * hashclauses are the RestrictInfos to use as hash clauses in a hashjoin
+  * mergeclauses are the RestrictInfos to use as merge clauses in a mergejoin
+  * outersortkeys are the sort pathkeys for the outer side of the mergejoin
+  * innersortkeys are the sort pathkeys for the inner side of the mergejoin
   */
  typedef struct JoinPathExtraData
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	bool		mergejoin_allowed;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	List	   *hashclauses;
+ 	List	   *mergeclauses;
+ 	List	   *outersortkeys;
+ 	List	   *innersortkeys;
  } JoinPathExtraData;
  
  /*

#47

Jeff Janes

jeff.janes@gmail.com

almost 9 years ago

In reply to: Etsuro Fujita (#46)

Re: postgres_fdw bug in 9.6

On Fri, Jan 13, 2017 at 3:22 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:

On 2017/01/13 0:43, Robert Haas wrote:

On Wed, Jan 11, 2017 at 2:45 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

As I said before, that might be fine for 9.6, but I don't think it's a
good
idea to search the pathlist because once we support parameterized foreign
join paths, which is on my TODOs, we would have to traverse through the
possibly-lengthy pathlist to find a local-join path, as mentioned in [3].

I'm not sure that's really going to be a problem. The number of

possible parameterizations that need to be considered isn't usually
very big. I bet the path list will have ten or a few tens of entries
in it, not a hundred or a thousand. Traversing it isn't that
expensive.

That having been said, I haven't read the patches, so I'm not really
up to speed on the bigger issues here. But surely it's more important
to get the overall design right than to worry about the cost of
walking the pathlist or worrying about the cost of an extra function
call (?!).

My biggest concern about GetExistingLocalJoinPath is that might not be
extendable to the case of foreign-join paths with parameterization; in
which case, fdw_outerpath for a given foreign-join path would need to have
the same parameterization as the foreign-join path, but there might not be
any existing paths with the same parameterization in the path list. You
might think we could get the fdw_outerpath by getting an existing path with
no parameterization as in GetExistingLocalJoinPath and then modifying the
path's param_info to match the parameterization of the foreign-join path.
I don't know that really works, but that might be inefficient.

What I have in mind to support foreign-join paths with parameterization
for postgres_fdw like this: (1) generate parameterized paths from any
joinable combination of the outer/inner cheapest-parameterized paths that
have pushed down the outer/inner relation to the remote server, in a
similar way as postgresGetForeignJoinPaths creates unparameterized paths,
and (2) create fdw_outerpath for each parameterized path from the
outer/inner paths used to generate the parameterized path, by
create_nestloop_path (or, create_hashjoin_path or create_mergejoin_path if
full join), so that the resulting fdw_outerpath has the same
parameterization as the paramterized path. This would probably work and
might be more efficient. And the patch I proposed would be easily extended
to this, by replacing the outer/inner cheapest-total paths with the
outer/inner cheapest-parameterized paths. Attached is the latest version
of the patch.

I'm afraid this discussion and the C code here are over my head. But I've
tested this patch against 9.6.stable-2d443ae1b0121e15265864d2b21 and
10.dev-0563a3a8b59150bf3cc8, and in both cases it fixes the original
problem.

I do get a compiler warning:

foreign.c: In function 'CreateLocalJoinPath':
foreign.c:832: warning: implicit declaration of function
'pathkeys_contained_in'

Cheers,

Jeff

#48

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Jeff Janes (#47)

Re: postgres_fdw bug in 9.6

On 2017/01/14 6:39, Jeff Janes wrote:

I've tested this patch against
9.6.stable-2d443ae1b0121e15265864d2b21 and 10.dev-0563a3a8b59150bf3cc8,
and in both cases it fixes the original problem.

Thanks for testing!

I do get a compiler warning:

foreign.c: In function 'CreateLocalJoinPath':
foreign.c:832: warning: implicit declaration of function
'pathkeys_contained_in'

Will fix.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Etsuro Fujita (#48)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/01/16 11:38, Etsuro Fujita wrote:

On 2017/01/14 6:39, Jeff Janes wrote:

I do get a compiler warning:

foreign.c: In function 'CreateLocalJoinPath':
foreign.c:832: warning: implicit declaration of function
'pathkeys_contained_in'

Will fix.

Done. Attached is the new version. I also adjusted the code a bit further.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-4.patchtext/x-patch; name=epqpath-for-foreignjoin-4.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1519,1540 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1563,1584 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1557,1572 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1608,1629 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1596,1611 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1652,1673 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1634,1649 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1684,1689 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1660,1726 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4016,4033 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4053,4068 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4159,4176 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4194,4209 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4365,4374 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4365,4375 ----
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = CreateLocalJoinPath(root, joinrel, outerrel, innerrel,
! 									   jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not created");
  			return;
  		}
  	}
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 445,450 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 445,455 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1005 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 995,1003 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used as the alternative local
!      join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 693,805 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
   *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
!  *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(joinrel->reloptkind == RELOPT_JOINREL);
! 
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
! 
! 		/* Skip parameterised paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
  				{
! 					HashPath   *hash_path = makeNode(HashPath);
! 
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
  				}
! 				break;
  
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
  				}
! 				break;
! 
! 			case T_MergeJoin:
  				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 696,878 ----
  	return oid;
  }
  
+ 
  /*
!  * Make a local join path for a foreign join
   *
!  * Note: currently, this function only supports unparameterized foreign joins.
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
! 	Path	   *outer_path = outerrel->cheapest_total_path;
! 	Path	   *inner_path = innerrel->cheapest_total_path;
  
! 	/*
! 	 * Generate a nestloop path (or hashjoin/mergejoin path, if full join)
! 	 * from the cheapest-total paths for the outer and inner relations.
! 	 */
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				Relids		required_outer;
! 				JoinCostWorkspace workspace;
  
! 				/*
! 				 * If the cheapest-total outer path is parameterized by the
! 				 * inner rel, we can't generate a nestloop path.  (There's no
! 				 * use looking for alternative outer paths, since it should
! 				 * already be the least-parameterized available path.)
! 				 */
! 				if (PATH_PARAM_BY_REL(outer_path, innerrel))
! 					return NULL;
! 				/* If proposed path is still parameterized, don't return it. */
! 				required_outer = calc_nestloop_required_outer(outer_path,
! 															  inner_path);
! 				if (required_outer)
  				{
! 					bms_free(required_outer);
! 					return NULL;
  				}
! 				/* Get initial estimates */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path,
! 									  extra->sjinfo, &extra->semifactors);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra->sjinfo,
! 													   &extra->semifactors,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL, NULL);
! 			}
! 			break;
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/*
! 				 * (1) if either cheapest-total path is parameterized by the
! 				 * other rel, we can't generate a hashjoin/mergejoin path, and
! 				 * (2) proposed hashjoin/mergejoin path is still parameterized
! 				 * (ie, the required_outer set calculated by
! 				 * calc_non_nestloop_required_outer isn't NULL), it's not what
! 				 * we want; which means that both the cheapest-total paths
! 				 * should be unparameterized.
! 				 */
! 				if (outer_path->param_info || inner_path->param_info)
! 					return NULL;
! 				/* Create a hashjoin/mergejoin path, if possible. */
! 				if (extra->hashclauses)
  				{
! 					/* Get initial estimates */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclauses,
! 										  outer_path, inner_path,
! 										  extra->sjinfo, &extra->semifactors);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra->sjinfo,
! 														   &extra->semifactors,
! 														   outer_path,
! 														   inner_path,
! 														   extra->restrictlist,
! 														   NULL,
! 														   extra->hashclauses);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If special case: for "x FULL JOIN y ON true", there
! 					 * will be no join clauses at all; create a clauseless
! 					 * mergejoin path.  Else create a mergejoin path by
! 					 * explicitly sorting both the outer and inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get initial estimates */
! 						initial_cost_mergejoin(root, &workspace, jointype, NIL,
! 											   outer_path, inner_path,
! 											   NIL, NIL, extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL, NULL,
! 																NIL, NIL, NIL);
! 					}
! 					else
! 					{
! 						List	   *outerkeys = extra->outersortkeys;
! 						List	   *innerkeys = extra->innersortkeys;
  
! 						/*
! 						 * It's possible that the cheapest-total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
! 						/* Get initial estimates */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   extra->mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys,
! 											   extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL, NULL,
! 																extra->mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 25,33 ****
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
- #define PATH_PARAM_BY_REL(path, rel)  \
- 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
- 
  static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
  					 JoinType jointype, JoinPathExtraData *extra);
--- 25,30 ----
***************
*** 85,97 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
--- 82,98 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.mergejoin_allowed = true;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.hashclauses = NIL;
+ 	extra.mergeclauses = NIL;
+ 	extra.outersortkeys = NIL;
+ 	extra.innersortkeys = NIL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
***************
*** 106,112 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
--- 107,113 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
***************
*** 169,175 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 170,176 ----
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 180,186 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 181,187 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 197,203 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 198,204 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 631,636 **** clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
--- 632,644 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into fields of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	mergeclauses	RestrictInfos to use as merge clauses in a mergejoin
+  *	outersortkeys	Sort pathkeys for the outer relation of the mergejoin
+  *	innersortkeys	Sort pathkeys for the inner relation of the mergejoin
   */
  static void
  sort_inner_and_outer(PlannerInfo *root,
***************
*** 773,778 **** sort_inner_and_outer(PlannerInfo *root,
--- 781,794 ----
  						   innerkeys,
  						   jointype,
  						   extra);
+ 
+ 		/* Save first mergejoin data for possible use by the FDW */
+ 		if (outerkeys == all_pathkeys)
+ 		{
+ 			extra->mergeclauses = cur_mergeclauses;
+ 			extra->outersortkeys = outerkeys;
+ 			extra->innersortkeys = innerkeys;
+ 		}
  	}
  }
  
***************
*** 1275,1280 **** consider_parallel_nestloop(PlannerInfo *root,
--- 1291,1301 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into a field of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	hashclauses		RestrictInfos to use as hash clauses in a hashjoin
   */
  static void
  hash_inner_and_outer(PlannerInfo *root,
***************
*** 1320,1325 **** hash_inner_and_outer(PlannerInfo *root,
--- 1341,1349 ----
  		hashclauses = lappend(hashclauses, restrictinfo);
  	}
  
+ 	/* Save hashclauses for possible use by the FDW */
+ 	extra->hashclauses = hashclauses;
+ 
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
  	{
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 235,240 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif   /* FDWAPI_H */
--- 235,242 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					JoinType jointype, JoinPathExtraData *extra);
  
  #endif   /* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 904,909 **** typedef struct Path
--- 904,913 ----
  #define PATH_REQ_OUTER(path)  \
  	((path)->param_info ? (path)->param_info->ppi_req_outer : (Relids) NULL)
  
+ /* Macro for determining whether path is parameterized by rel */
+ #define PATH_PARAM_BY_REL(path, rel)  \
+ 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+ 
  /*----------
   * IndexPath represents an index scan over a single index.
   *
***************
*** 2006,2022 **** typedef struct SemiAntiJoinFactors
--- 2010,2039 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI or ANTI joins)
   * param_source_rels are OK targets for parameterization of result paths
+  *
+  * Remaining fields are set only for possible use by the FDW:
+  *
+  * hashclauses are the RestrictInfos to use as hash clauses in a hashjoin
+  * mergeclauses are the RestrictInfos to use as merge clauses in a mergejoin
+  * outersortkeys are the sort pathkeys for the outer side of the mergejoin
+  * innersortkeys are the sort pathkeys for the inner side of the mergejoin
   */
  typedef struct JoinPathExtraData
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	bool		mergejoin_allowed;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	List	   *hashclauses;
+ 	List	   *mergeclauses;
+ 	List	   *outersortkeys;
+ 	List	   *innersortkeys;
  } JoinPathExtraData;
  
  /*

#50

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Etsuro Fujita (#49)

Re: postgres_fdw bug in 9.6

On Wed, Jan 18, 2017 at 8:18 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/01/16 11:38, Etsuro Fujita wrote:

On 2017/01/14 6:39, Jeff Janes wrote:

I do get a compiler warning:

foreign.c: In function 'CreateLocalJoinPath':
foreign.c:832: warning: implicit declaration of function
'pathkeys_contained_in'

Will fix.

Done. Attached is the new version. I also adjusted the code a bit further.

With this patch there are multiple cases, where CreateLocalJoinPath()
would return NULL and hence postgres_fdw would not push a join down
for example
/*
* (1) if either cheapest-total path is parameterized by the
* other rel, we can't generate a hashjoin/mergejoin path, and
* (2) proposed hashjoin/mergejoin path is still parameterized
* (ie, the required_outer set calculated by
* calc_non_nestloop_required_outer isn't NULL), it's not what
* we want; which means that both the cheapest-total paths
* should be unparameterized.
*/
if (outer_path->param_info || inner_path->param_info)
return NULL;
or
/*
* If the cheapest-total outer path is parameterized by the
* inner rel, we can't generate a nestloop path. (There's no
* use looking for alternative outer paths, since it should
* already be the least-parameterized available path.)
*/
if (PATH_PARAM_BY_REL(outer_path, innerrel))
return NULL;
/* If proposed path is still parameterized, don't return it. */
required_outer = calc_nestloop_required_outer(outer_path,
inner_path);
if (required_outer)
{
bms_free(required_outer);
return NULL;
}

Am I right?

The earlier version of this function GetExistingLocalJoinPath() used
to return a local path in those cases and hence postgres_fdw was able
to push down corresponding queries. So, we are introducing a
performance regression. We need to plug those holes.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#51

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Etsuro Fujita (#46)

Re: postgres_fdw bug in 9.6

On Fri, Jan 13, 2017 at 6:22 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

My biggest concern about GetExistingLocalJoinPath is that might not be
extendable to the case of foreign-join paths with parameterization; in which
case, fdw_outerpath for a given foreign-join path would need to have the
same parameterization as the foreign-join path, but there might not be any
existing paths with the same parameterization in the path list.

I agree that this is a problem.

You might
think we could get the fdw_outerpath by getting an existing path with no
parameterization as in GetExistingLocalJoinPath and then modifying the
path's param_info to match the parameterization of the foreign-join path. I
don't know that really works, but that might be inefficient.

I am not sure about this.

What I have in mind to support foreign-join paths with parameterization for
postgres_fdw like this: (1) generate parameterized paths from any joinable
combination of the outer/inner cheapest-parameterized paths that have pushed
down the outer/inner relation to the remote server, in a similar way as
postgresGetForeignJoinPaths creates unparameterized paths, and (2) create
fdw_outerpath for each parameterized path from the outer/inner paths used to
generate the parameterized path, by create_nestloop_path (or,
create_hashjoin_path or create_mergejoin_path if full join), so that the
resulting fdw_outerpath has the same parameterization as the paramterized
path. This would probably work and might be more efficient. And the patch
I proposed would be easily extended to this, by replacing the outer/inner
cheapest-total paths with the outer/inner cheapest-parameterized paths.
Attached is the latest version of the patch.

Yes, I think that's broadly the approach Tom was recommending.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#52

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Robert Haas (#51)

Re: postgres_fdw bug in 9.6

On Thu, Jan 19, 2017 at 2:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 13, 2017 at 6:22 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

My biggest concern about GetExistingLocalJoinPath is that might not be
extendable to the case of foreign-join paths with parameterization; in which
case, fdw_outerpath for a given foreign-join path would need to have the
same parameterization as the foreign-join path, but there might not be any
existing paths with the same parameterization in the path list.

I agree that this is a problem.

Effectively, it means that foreign join path creation will have a
parameterization different (per add_path()) from any local join
produced. But why would it be so? The parameterization bubbles up from
the base relation. The process for creating parameterized local and
foreign paths for a base relation is same. Thus we will have same
parameterizations considered for foreign as well as local joins. Those
different parameterizations will be retained add_path(), so we should
find one there or should be able to expand an existing
parameterization by reparameterization.

Even if such a case exists, the same problem exists while searching
paths from the joining relations. If the join doesn't have a local
path with required parameterization, so would be the joining
relations, parameterized paths from which are used to build
parameterized paths for join.

You might
think we could get the fdw_outerpath by getting an existing path with no
parameterization as in GetExistingLocalJoinPath and then modifying the
path's param_info to match the parameterization of the foreign-join path. I
don't know that really works, but that might be inefficient.

I am not sure about this.

What I have in mind to support foreign-join paths with parameterization for
postgres_fdw like this: (1) generate parameterized paths from any joinable
combination of the outer/inner cheapest-parameterized paths that have pushed
down the outer/inner relation to the remote server, in a similar way as
postgresGetForeignJoinPaths creates unparameterized paths, and (2) create
fdw_outerpath for each parameterized path from the outer/inner paths used to
generate the parameterized path, by create_nestloop_path (or,
create_hashjoin_path or create_mergejoin_path if full join), so that the
resulting fdw_outerpath has the same parameterization as the paramterized
path. This would probably work and might be more efficient. And the patch
I proposed would be easily extended to this, by replacing the outer/inner
cheapest-total paths with the outer/inner cheapest-parameterized paths.
Attached is the latest version of the patch.

Yes, I think that's broadly the approach Tom was recommending.

I don't have problem with that approach, but the latest patch has
following problems.
1. We are copying chunks of path creation logic, instead of reusing
corresponding functions.
2. There are many cases where the new function would return no local
path and hence postgres_fdw doesn't push down the join [1]https://www.mail-archive.com/pgsql-hackers@postgresql.org/msg302463.html -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company. This will
be performance regression compared to 9.6.

[1]: https://www.mail-archive.com/pgsql-hackers@postgresql.org/msg302463.html -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#53

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Ashutosh Bapat (#52)

Re: postgres_fdw bug in 9.6

On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Yes, I think that's broadly the approach Tom was recommending.

I don't have problem with that approach, but the latest patch has
following problems.
1. We are copying chunks of path creation logic, instead of reusing
corresponding functions.

Exactly which functions do you think we ought to be reusing that the
patch currently doesn't reuse?

2. There are many cases where the new function would return no local
path and hence postgres_fdw doesn't push down the join [1]. This will
be performance regression compared to 9.6.

Some, or many? How many? Which ones? At least some of the problems
you were complaining about look like basic validity checks that were
copied from joinpath.c, so they're probably necessary for correctness.
It's worth remembering that we're trying to fix a bug here; if that
makes some cases perform less well, that's sad, but not as sad as
throwing a bogus error, which is what's happening right now.

I'm a bit sketchy about this kind of thing, though:

! if (required_outer)
{
! bms_free(required_outer);
! return NULL;
}

I don't understand what would stop that from making this fail to
generate parameterized paths in some cases in which it would be legal
to do so, and the comment is very brief.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#54

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Ashutosh Bapat (#50)

Re: postgres_fdw bug in 9.6

On 2017/01/18 20:34, Ashutosh Bapat wrote:

On Wed, Jan 18, 2017 at 8:18 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Done. Attached is the new version. I also adjusted the code a bit further.

With this patch there are multiple cases, where CreateLocalJoinPath()
would return NULL and hence postgres_fdw would not push a join down
for example
/*
* (1) if either cheapest-total path is parameterized by the
* other rel, we can't generate a hashjoin/mergejoin path, and
* (2) proposed hashjoin/mergejoin path is still parameterized
* (ie, the required_outer set calculated by
* calc_non_nestloop_required_outer isn't NULL), it's not what
* we want; which means that both the cheapest-total paths
* should be unparameterized.
*/
if (outer_path->param_info || inner_path->param_info)
return NULL;
or
/*
* If the cheapest-total outer path is parameterized by the
* inner rel, we can't generate a nestloop path. (There's no
* use looking for alternative outer paths, since it should
* already be the least-parameterized available path.)
*/
if (PATH_PARAM_BY_REL(outer_path, innerrel))
return NULL;
/* If proposed path is still parameterized, don't return it. */
required_outer = calc_nestloop_required_outer(outer_path,
inner_path);
if (required_outer)
{
bms_free(required_outer);
return NULL;
}

Am I right?

The earlier version of this function GetExistingLocalJoinPath() used
to return a local path in those cases and hence postgres_fdw was able
to push down corresponding queries. So, we are introducing a
performance regression.

Really? My understanding is: postgres_fdw will never have those cases
because it can always get the cheapest-total paths that are unparameterized.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#55

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Ashutosh Bapat (#52)

Re: postgres_fdw bug in 9.6

On 2017/01/19 12:26, Ashutosh Bapat wrote:

On Thu, Jan 19, 2017 at 2:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 13, 2017 at 6:22 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

My biggest concern about GetExistingLocalJoinPath is that might not be
extendable to the case of foreign-join paths with parameterization; in which
case, fdw_outerpath for a given foreign-join path would need to have the
same parameterization as the foreign-join path, but there might not be any
existing paths with the same parameterization in the path list.

I agree that this is a problem.

Effectively, it means that foreign join path creation will have a
parameterization different (per add_path()) from any local join
produced. But why would it be so?

I think it's better to give the FDW a chance to do that because the FDW
might have more knowledge about the parameterization for joinrels than core.

The parameterization bubbles up from
the base relation. The process for creating parameterized local and
foreign paths for a base relation is same. Thus we will have same
parameterizations considered for foreign as well as local joins. Those
different parameterizations will be retained add_path(), so we should
find one there

Is that right? I think there would be cases where we can't find one
because add_path removes paths dominated by others.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#56

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Robert Haas (#53)

Re: postgres_fdw bug in 9.6

On Thu, Jan 19, 2017 at 10:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Yes, I think that's broadly the approach Tom was recommending.

I don't have problem with that approach, but the latest patch has
following problems.
1. We are copying chunks of path creation logic, instead of reusing
corresponding functions.

Exactly which functions do you think we ought to be reusing that the
patch currently doesn't reuse?

2. There are many cases where the new function would return no local
path and hence postgres_fdw doesn't push down the join [1]. This will
be performance regression compared to 9.6.

Some, or many? How many? Which ones? At least some of the problems
you were complaining about look like basic validity checks that were
copied from joinpath.c, so they're probably necessary for correctness.
It's worth remembering that we're trying to fix a bug here; if that
makes some cases perform less well, that's sad, but not as sad as
throwing a bogus error, which is what's happening right now.

AFAIU, the problem esp. with parameterized paths is this: If rel1 is
required to be parameterized by rel2 (because of lateral references?),
a nested loop join with rel2 as outer relation and rel1 as inner
relation is possible. postgres_fdw and hence this function, however
doesn't consider all the possible join combinations and thus when this
function is presented with rel1 as outer and rel2 as inner would
refuse to create a path. More generally, while creating local paths,
we try many combinations of participating relations, some of which do
not produce local paths and some of them do (AFAIK, it happens in case
of lateral references, but there might be other cases). postgres_fdw,
however considers only a single combination, which may or may not have
produced local path. In such a case, postgres_fdw would create a
foreign join path but won't get a local path and thus bail out.
Obviously, we don't want CreateLocalJoinPath() to try out all the
combinations. That is the simplicity of copying an existing path; all
combinations have been tried already. If we really want a nested loop
join, we may try pulling inner and outer paths from an existing path
and build a nested loop join (that's just a suggestion).

Take example of
/*
* If the cheapest-total outer path is parameterized by the
* inner rel, we can't generate a nestloop path. (There's no
* use looking for alternative outer paths, since it should
* already be the least-parameterized available path.)
*/
if (PATH_PARAM_BY_REL(outer_path, innerrel))
return NULL;

We may be able to create a nest loop path if we switch inner and outer
relations.

I have provided a patch in [1]/messages/by-id/CAFjFpRd9oZEkur9aOrMh2Toxq4NYuUSw2kB9S+O=uvZ4xcR=Sw@mail.gmail.com to fix the bogus error. But if we want
to replace one approach (when there's a way to fix bug in that
approach) with another (to fix that bug and improve), the other
approach should be equally efficient.

I'm a bit sketchy about this kind of thing, though:

! if (required_outer)
{
! bms_free(required_outer);
! return NULL;
}

I don't understand what would stop that from making this fail to
generate parameterized paths in some cases in which it would be legal
to do so, and the comment is very brief.

I am not so much worried about this though :).
GetExistingLocalJoinPath() also does not handle that case. The new
function is not making it worse in this case.
731 /* Skip parameterised paths. */
732 if (path->param_info != NULL)
733 continue;

[1]: /messages/by-id/CAFjFpRd9oZEkur9aOrMh2Toxq4NYuUSw2kB9S+O=uvZ4xcR=Sw@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#57

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Ashutosh Bapat (#56)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/01/20 14:24, Ashutosh Bapat wrote:

On Thu, Jan 19, 2017 at 10:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Yes, I think that's broadly the approach Tom was recommending.

I don't have problem with that approach, but the latest patch has
following problems.

2. There are many cases where the new function would return no local
path and hence postgres_fdw doesn't push down the join [1]. This will
be performance regression compared to 9.6.

Some, or many? How many?

AFAIU, the problem esp. with parameterized paths is this: If rel1 is
required to be parameterized by rel2 (because of lateral references?),
a nested loop join with rel2 as outer relation and rel1 as inner
relation is possible. postgres_fdw and hence this function, however
doesn't consider all the possible join combinations and thus when this
function is presented with rel1 as outer and rel2 as inner would
refuse to create a path. More generally, while creating local paths,
we try many combinations of participating relations, some of which do
not produce local paths and some of them do (AFAIK, it happens in case
of lateral references, but there might be other cases). postgres_fdw,
however considers only a single combination, which may or may not have
produced local path. In such a case, postgres_fdw would create a
foreign join path but won't get a local path and thus bail out.

I had second thoughts; one idea how to build parameterized paths to
avoid this issue is: as in postgresGetForeignPaths, to (1) identify
which outer relations could supply additional safe-to-send-to-remote
join clauses, and (2) build a parameterized path and its alternative
local-join path for each such outer relation. In #1, we could look at
the join relation's ppilist to identify interesting outer relations. In
#2, the local-join path corresponding to each such outer relation could
be built from the cheapest-total paths for the outer and inner
relations, using CreateLocalJoinPath, so that the result path has the
outer relation as its required_outer.

I'm a bit sketchy about this kind of thing, though:

! if (required_outer)
{
! bms_free(required_outer);
! return NULL;
}

I don't understand what would stop that from making this fail to
generate parameterized paths in some cases in which it would be legal
to do so, and the comment is very brief.

I am not so much worried about this though :).
GetExistingLocalJoinPath() also does not handle that case. The new
function is not making it worse in this case.
731 /* Skip parameterised paths. */
732 if (path->param_info != NULL)
733 continue;

One idea to remove such extra checks is to pass the required_outer to
CreateLocalJoinPath like the attached. As described above, the caller
would have that set before calling that function, so we wouldn't need to
calculate that set in that function.

Other changes:

* Also modified CreateLocalJoinPath so that we pass outer/inner paths,
not outer/inner rels, because it would be more flexible for the FDW to
build the local-join path from paths it chose.
* Fixed a bug that I missed the RIGHT JOIN case in CreateLocalJoinPath.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-5.patchtext/x-patch; name=epqpath-for-foreignjoin-5.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1519,1540 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1519,1534 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1563,1584 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1557,1572 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1608,1629 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1596,1611 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1652,1673 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1634,1649 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1684,1689 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1660,1726 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4016,4033 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4053,4068 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4159,4176 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4194,4209 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4353,4374 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * GetExistingLocalJoinPath will find a suitable path for this purpose in
! 	 * the path list of the joinrel, if one exists.  We must be careful to
! 	 * call it before adding any ForeignPath, since the ForeignPath might
! 	 * dominate the only suitable local path available.  We also do it before
! 	 * reconstruct the row for EvalPlanQual(). Find an alternative local path
! 	 * calling foreign_join_ok(), since that function updates fpinfo and marks
! 	 * it as pushable if the join is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4353,4380 ----
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * CreateLocalJoinPath will build an alternative local-join path for this
! 	 * purpose.  We must be careful to call it before calling foreign_join_ok,
! 	 * since that function updates fpinfo and marks it as pushable if the join
! 	 * is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		Path	   *outer_path = outerrel->cheapest_total_path;
! 		Path	   *inner_path = innerrel->cheapest_total_path;
! 
! 		/* Should be unparameterized */
! 		Assert(outer_path->param_info == NULL);
! 		Assert(inner_path->param_info == NULL);
! 
! 		/* Create an unparameterized local-join path */
! 		epq_path = CreateLocalJoinPath(root, joinrel, outer_path, inner_path,
! 									   NULL, jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not created");
  			return;
  		}
  	}
***************
*** 4377,4383 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we copied one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
--- 4383,4389 ----
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we created one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 445,450 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 445,455 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1005 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 995,1003 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used as the alternative local
!      join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 693,805 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
!  *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(joinrel->reloptkind == RELOPT_JOINREL);
! 
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
  
! 		/* Skip parameterised paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
! 				{
! 					HashPath   *hash_path = makeNode(HashPath);
  
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
! 				}
! 				break;
  
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
  				}
! 				break;
! 
! 			case T_MergeJoin:
  				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 696,855 ----
  	return oid;
  }
  
+ 
  /*
!  * Build an alternative local-join path to reconstruct a join tuple for the
!  * foreign-join path in EvalPlanQual
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					Path *outer_path,
! 					Path *inner_path,
! 					Relids required_outer,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
  
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* outer_path should not require rels from inner_path */
! 				Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent));
  
! 				/* Get initial estimates */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path,
! 									  extra->sjinfo, &extra->semifactors);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra->sjinfo,
! 													   &extra->semifactors,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL,
! 													   required_outer);
! 			}
! 			break;
! 		case JOIN_RIGHT:
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Neither path should require rels from the other path */
! 				Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
! 					   !PATH_PARAM_BY_REL(inner_path, outer_path->parent));
  
! 				/* Generate a hashjoin or mergejoin path, if possible */
! 				if (extra->hashclauses)
  				{
! 					/* Get initial estimates */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclauses,
! 										  outer_path, inner_path,
! 										  extra->sjinfo, &extra->semifactors);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra->sjinfo,
! 														   &extra->semifactors,
! 														   outer_path,
! 														   inner_path,
! 														   extra->restrictlist,
! 														   required_outer,
! 														   extra->hashclauses);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If special case: for "x FULL JOIN y ON true", there
! 					 * will be no join clauses at all; create a clauseless
! 					 * mergejoin path.  Else create a mergejoin path by
! 					 * explicitly sorting both the outer and inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get initial estimates */
! 						initial_cost_mergejoin(root, &workspace, jointype, NIL,
! 											   outer_path, inner_path,
! 											   NIL, NIL, extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																NIL, NIL, NIL);
! 					}
! 					else
! 					{
! 						List	   *outerkeys = extra->outersortkeys;
! 						List	   *innerkeys = extra->innersortkeys;
  
! 						/*
! 						 * It's possible that the cheapest-total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
! 						/* Get initial estimates */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   extra->mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys,
! 											   extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																extra->mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 25,33 ****
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
- #define PATH_PARAM_BY_REL(path, rel)  \
- 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
- 
  static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
  					 JoinType jointype, JoinPathExtraData *extra);
--- 25,30 ----
***************
*** 85,97 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
--- 82,98 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.mergejoin_allowed = true;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.hashclauses = NIL;
+ 	extra.mergeclauses = NIL;
+ 	extra.outersortkeys = NIL;
+ 	extra.innersortkeys = NIL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
***************
*** 106,112 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
--- 107,113 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
***************
*** 169,175 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 170,176 ----
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 180,186 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 181,187 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 197,203 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 198,204 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 631,636 **** clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
--- 632,644 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into fields of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	mergeclauses	RestrictInfos to use as merge clauses in a mergejoin
+  *	outersortkeys	Sort pathkeys for the outer relation of the mergejoin
+  *	innersortkeys	Sort pathkeys for the inner relation of the mergejoin
   */
  static void
  sort_inner_and_outer(PlannerInfo *root,
***************
*** 773,778 **** sort_inner_and_outer(PlannerInfo *root,
--- 781,794 ----
  						   innerkeys,
  						   jointype,
  						   extra);
+ 
+ 		/* Save first mergejoin data for possible use by the FDW */
+ 		if (outerkeys == all_pathkeys)
+ 		{
+ 			extra->mergeclauses = cur_mergeclauses;
+ 			extra->outersortkeys = outerkeys;
+ 			extra->innersortkeys = innerkeys;
+ 		}
  	}
  }
  
***************
*** 1275,1280 **** consider_parallel_nestloop(PlannerInfo *root,
--- 1291,1301 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into a field of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	hashclauses		RestrictInfos to use as hash clauses in a hashjoin
   */
  static void
  hash_inner_and_outer(PlannerInfo *root,
***************
*** 1320,1325 **** hash_inner_and_outer(PlannerInfo *root,
--- 1341,1349 ----
  		hashclauses = lappend(hashclauses, restrictinfo);
  	}
  
+ 	/* Save hashclauses for possible use by the FDW */
+ 	extra->hashclauses = hashclauses;
+ 
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
  	{
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 235,240 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif   /* FDWAPI_H */
--- 235,242 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root, RelOptInfo *joinrel,
! 					Path *outer_path, Path *inner_path, Relids required_outer,
! 					JoinType jointype, JoinPathExtraData *extra);
  
  #endif   /* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 904,909 **** typedef struct Path
--- 904,913 ----
  #define PATH_REQ_OUTER(path)  \
  	((path)->param_info ? (path)->param_info->ppi_req_outer : (Relids) NULL)
  
+ /* Macro for determining whether path is parameterized by rel */
+ #define PATH_PARAM_BY_REL(path, rel)  \
+ 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+ 
  /*----------
   * IndexPath represents an index scan over a single index.
   *
***************
*** 2006,2022 **** typedef struct SemiAntiJoinFactors
--- 2010,2039 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI or ANTI joins)
   * param_source_rels are OK targets for parameterization of result paths
+  *
+  * Remaining fields are set only for possible use by the FDW:
+  *
+  * hashclauses are the RestrictInfos to use as hash clauses in a hashjoin
+  * mergeclauses are the RestrictInfos to use as merge clauses in a mergejoin
+  * outersortkeys are the sort pathkeys for the outer side of the mergejoin
+  * innersortkeys are the sort pathkeys for the inner side of the mergejoin
   */
  typedef struct JoinPathExtraData
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	bool		mergejoin_allowed;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	List	   *hashclauses;
+ 	List	   *mergeclauses;
+ 	List	   *outersortkeys;
+ 	List	   *innersortkeys;
  } JoinPathExtraData;
  
  /*

#58

Michael Paquier

michael.paquier@gmail.com

almost 9 years ago

In reply to: Etsuro Fujita (#57)

Re: postgres_fdw bug in 9.6

On Mon, Jan 23, 2017 at 6:56 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Other changes:

* Also modified CreateLocalJoinPath so that we pass outer/inner paths, not
outer/inner rels, because it would be more flexible for the FDW to build the
local-join path from paths it chose.
* Fixed a bug that I missed the RIGHT JOIN case in CreateLocalJoinPath.

The patch is moved to CF 2017-03.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#59

David Steele

david@pgmasters.net

almost 9 years ago

In reply to: Etsuro Fujita (#57)

Re: postgres_fdw bug in 9.6

On 1/23/17 4:56 AM, Etsuro Fujita wrote:

On 2017/01/20 14:24, Ashutosh Bapat wrote:

On Thu, Jan 19, 2017 at 10:38 PM, Robert Haas <robertmhaas@gmail.com>
wrote:

On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Yes, I think that's broadly the approach Tom was recommending.

I don't have problem with that approach, but the latest patch has
following problems.

2. There are many cases where the new function would return no local
path and hence postgres_fdw doesn't push down the join [1]. This will
be performance regression compared to 9.6.

Some, or many? How many?

AFAIU, the problem esp. with parameterized paths is this: If rel1 is
required to be parameterized by rel2 (because of lateral references?),
a nested loop join with rel2 as outer relation and rel1 as inner
relation is possible. postgres_fdw and hence this function, however
doesn't consider all the possible join combinations and thus when this
function is presented with rel1 as outer and rel2 as inner would
refuse to create a path. More generally, while creating local paths,
we try many combinations of participating relations, some of which do
not produce local paths and some of them do (AFAIK, it happens in case
of lateral references, but there might be other cases). postgres_fdw,
however considers only a single combination, which may or may not have
produced local path. In such a case, postgres_fdw would create a
foreign join path but won't get a local path and thus bail out.

I had second thoughts; one idea how to build parameterized paths to
avoid this issue is: as in postgresGetForeignPaths, to (1) identify
which outer relations could supply additional safe-to-send-to-remote
join clauses, and (2) build a parameterized path and its alternative
local-join path for each such outer relation. In #1, we could look at
the join relation's ppilist to identify interesting outer relations. In
#2, the local-join path corresponding to each such outer relation could
be built from the cheapest-total paths for the outer and inner
relations, using CreateLocalJoinPath, so that the result path has the
outer relation as its required_outer.

I'm a bit sketchy about this kind of thing, though:

! if (required_outer)
{
! bms_free(required_outer);
! return NULL;
}

I don't understand what would stop that from making this fail to
generate parameterized paths in some cases in which it would be legal
to do so, and the comment is very brief.

I am not so much worried about this though :).
GetExistingLocalJoinPath() also does not handle that case. The new
function is not making it worse in this case.
731 /* Skip parameterised paths. */
732 if (path->param_info != NULL)
733 continue;

One idea to remove such extra checks is to pass the required_outer to
CreateLocalJoinPath like the attached. As described above, the caller
would have that set before calling that function, so we wouldn't need to
calculate that set in that function.

Other changes:

* Also modified CreateLocalJoinPath so that we pass outer/inner paths,
not outer/inner rels, because it would be more flexible for the FDW to
build the local-join path from paths it chose.
* Fixed a bug that I missed the RIGHT JOIN case in CreateLocalJoinPath.

This patch does not apply cleanly at cccbdde:

$ patch -p1 < ../other/epqpath-for-foreignjoin-5.patch
patching file contrib/postgres_fdw/expected/postgres_fdw.out
Hunk #6 succeeded at 4134 (offset 81 lines).
Hunk #7 succeeded at 4275 (offset 81 lines).
patching file contrib/postgres_fdw/postgres_fdw.c
Hunk #1 succeeded at 4356 (offset 3 lines).
Hunk #2 succeeded at 4386 (offset 3 lines).
patching file contrib/postgres_fdw/sql/postgres_fdw.sql
patching file doc/src/sgml/fdwhandler.sgml
patching file src/backend/foreign/foreign.c
Hunk #2 FAILED at 696.
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/foreign/foreign.c.rej
patching file src/backend/optimizer/path/joinpath.c
Hunk #1 FAILED at 25.
Hunk #2 succeeded at 109 (offset 27 lines).
Hunk #3 succeeded at 134 (offset 27 lines).
Hunk #4 succeeded at 197 (offset 27 lines).
Hunk #5 succeeded at 208 (offset 27 lines).
Hunk #6 succeeded at 225 (offset 27 lines).
Hunk #7 succeeded at 745 (offset 113 lines).
Hunk #8 FAILED at 894.
Hunk #9 succeeded at 1558 (offset 267 lines).
Hunk #10 succeeded at 1609 (offset 268 lines).
2 out of 10 hunks FAILED -- saving rejects to file
src/backend/optimizer/path/joinpath.c.rej
patching file src/include/foreign/fdwapi.h
Hunk #1 succeeded at 237 (offset 2 lines).
patching file src/include/nodes/relation.h
Hunk #1 succeeded at 914 (offset 10 lines).
Hunk #2 succeeded at 2057 (offset 47 lines).

Marked "Waiting on Author".

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#60

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: David Steele (#59)

Re: postgres_fdw bug in 9.6

On 2017/03/17 0:37, David Steele wrote:

This patch does not apply cleanly at cccbdde:

Marked "Waiting on Author".

Ok, I'll update the patch. One thing I'd like to revise in addition to
that is (1) add to JoinPathExtraData a flag member to indicate whether
to give the FDW a chance to consider a remote join, which will be set to
true if the joinrel's fdwroutine is not NULL and the fdwroutine's
GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info
to create an alternative local join path, such as hashclauses and
mergeclauses proposed in the patch, into JoinPathExtraData in
add_paths_to_joinrel. This would avoid useless overhead in saving such
info into JoinPathExtraData when we don't give the FDW that chance.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#61

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Etsuro Fujita (#60)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/03/21 18:40, Etsuro Fujita wrote:

Ok, I'll update the patch. One thing I'd like to revise in addition to
that is (1) add to JoinPathExtraData a flag member to indicate whether
to give the FDW a chance to consider a remote join, which will be set to
true if the joinrel's fdwroutine is not NULL and the fdwroutine's
GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info
to create an alternative local join path, such as hashclauses and
mergeclauses proposed in the patch, into JoinPathExtraData in
add_paths_to_joinrel. This would avoid useless overhead in saving such
info into JoinPathExtraData when we don't give the FDW that chance.

Done. Attached is a new version of the patch.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-6.patchbinary/octet-stream; name=epqpath-for-foreignjoin-6.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1629,1650 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1629,1644 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1673,1694 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1667,1682 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1718,1739 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1706,1721 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1762,1783 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1744,1759 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1794,1799 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1770,1836 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4225,4242 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4262,4277 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4368,4385 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4403,4418 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4405,4426 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * GetExistingLocalJoinPath will find a suitable path for this purpose in
! 	 * the path list of the joinrel, if one exists.  We must be careful to
! 	 * call it before adding any ForeignPath, since the ForeignPath might
! 	 * dominate the only suitable local path available.  We also do it before
! 	 * reconstruct the row for EvalPlanQual(). Find an alternative local path
! 	 * calling foreign_join_ok(), since that function updates fpinfo and marks
! 	 * it as pushable if the join is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4405,4432 ----
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * CreateLocalJoinPath will build an alternative local-join path for this
! 	 * purpose.  We must be careful to call it before calling foreign_join_ok,
! 	 * since that function updates fpinfo and marks it as pushable if the join
! 	 * is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		Path	   *outer_path = outerrel->cheapest_total_path;
! 		Path	   *inner_path = innerrel->cheapest_total_path;
! 
! 		/* Should be unparameterized */
! 		Assert(outer_path->param_info == NULL);
! 		Assert(inner_path->param_info == NULL);
! 
! 		/* Create an unparameterized local-join path */
! 		epq_path = CreateLocalJoinPath(root, joinrel, outer_path, inner_path,
! 									   NULL, jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not created");
  			return;
  		}
  	}
***************
*** 4429,4435 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we copied one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
--- 4435,4441 ----
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we created one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 462,467 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 462,472 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1005 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 995,1003 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used as the alternative local
!      join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 689,801 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
!  *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(joinrel->reloptkind == RELOPT_JOINREL);
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
  
! 		/* Skip parameterized paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
! 				{
! 					HashPath   *hash_path = makeNode(HashPath);
  
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
! 				}
! 				break;
  
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
  				}
! 				break;
! 
! 			case T_MergeJoin:
  				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 692,854 ----
  	return oid;
  }
  
+ 
  /*
!  * Build an alternative local-join path to reconstruct a join tuple for the
!  * foreign-join in EvalPlanQual
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					Path *outer_path,
! 					Path *inner_path,
! 					Relids required_outer,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
  
! 	/* Should be called only if consider_foreignjoin */
! 	Assert(extra->consider_foreignjoin);
  
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* outer_path should not require rels from inner_path */
! 				Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent));
  
! 				/* Get an initial estimate */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path,
! 									  extra->sjinfo, &extra->semifactors);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra->sjinfo,
! 													   &extra->semifactors,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL,
! 													   required_outer);
! 			}
! 			break;
! 		case JOIN_RIGHT:
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Neither path should require rels from the other path */
! 				Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
! 					   !PATH_PARAM_BY_REL(inner_path, outer_path->parent));
  
! 				/* Generate a hashjoin or mergejoin path, if possible */
! 				if (extra->hashclauses)
  				{
! 					/* Get an initial estimate */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclauses,
! 										  outer_path, inner_path,
! 										  extra->sjinfo, &extra->semifactors);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra->sjinfo,
! 														   &extra->semifactors,
! 														   outer_path,
! 														   inner_path,
! 														   extra->restrictlist,
! 														   required_outer,
! 														   extra->hashclauses);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If special case: for "x FULL JOIN y ON true", there
! 					 * will be no join clauses at all; create a clauseless
! 					 * mergejoin path.  Else create a mergejoin path by
! 					 * explicitly sorting both the outer and inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype, NIL,
! 											   outer_path, inner_path,
! 											   NIL, NIL, extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																NIL, NIL, NIL);
! 					}
! 					else
! 					{
! 						List	   *outerkeys = extra->outersortkeys;
! 						List	   *innerkeys = extra->innersortkeys;
  
! 						/*
! 						 * It's possible that the cheapest-total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   extra->mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys,
! 											   extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																extra->mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 25,33 ****
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
- #define PATH_PARAM_BY_REL(path, rel)  \
- 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
- 
  static void try_partial_mergejoin_path(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   Path *outer_path,
--- 25,30 ----
***************
*** 112,124 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
--- 109,126 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.mergejoin_allowed = true;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.consider_foreignjoin = false;
+ 	extra.hashclauses = NIL;
+ 	extra.mergeclauses = NIL;
+ 	extra.outersortkeys = NIL;
+ 	extra.innersortkeys = NIL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
***************
*** 133,139 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
--- 135,141 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
***************
*** 193,202 **** add_paths_to_joinrel(PlannerInfo *root,
  											  joinrel->lateral_relids);
  
  	/*
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 195,213 ----
  											  joinrel->lateral_relids);
  
  	/*
+ 	 * Decide whether to give the FDW a chance to push down the join to the
+ 	 * foreign server; if inner and outer relations are foreign tables (or
+ 	 * joins) belonging to the same server and assigned to the same user to
+ 	 * check access permissions as, we give the FDW that chance.
+ 	 */
+ 	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPaths)
+ 		extra.consider_foreignjoin = true;
+ 
+ 	/*
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 207,213 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 218,224 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 224,230 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 235,241 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 239,250 **** add_paths_to_joinrel(PlannerInfo *root,
  							 jointype, &extra);
  
  	/*
! 	 * 5. If inner and outer relations are foreign tables (or joins) belonging
! 	 * to the same server and assigned to the same user to check access
! 	 * permissions as, give the FDW a chance to push down joins.
  	 */
! 	if (joinrel->fdwroutine &&
! 		joinrel->fdwroutine->GetForeignJoinPaths)
  		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
  												 outerrel, innerrel,
  												 jointype, &extra);
--- 250,258 ----
  							 jointype, &extra);
  
  	/*
! 	 * 5. Give the FDW a chance to push down the join to the foreign server.
  	 */
! 	if (extra.consider_foreignjoin)
  		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
  												 outerrel, innerrel,
  												 jointype, &extra);
***************
*** 744,749 **** clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
--- 752,764 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into fields of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	mergeclauses	RestrictInfos to use as merge clauses in a mergejoin
+  *	outersortkeys	Sort pathkeys for the outer relation of the mergejoin
+  *	innersortkeys	Sort pathkeys for the inner relation of the mergejoin
   */
  static void
  sort_inner_and_outer(PlannerInfo *root,
***************
*** 930,935 **** sort_inner_and_outer(PlannerInfo *root,
--- 945,958 ----
  									   innerkeys,
  									   jointype,
  									   extra);
+ 
+ 		/* Save first mergejoin data for possible use by the FDW */
+ 		if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+ 		{
+ 			extra->mergeclauses = cur_mergeclauses;
+ 			extra->outersortkeys = outerkeys;
+ 			extra->innersortkeys = innerkeys;
+ 		}
  	}
  }
  
***************
*** 1542,1547 **** consider_parallel_nestloop(PlannerInfo *root,
--- 1565,1575 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into a field of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	hashclauses		RestrictInfos to use as hash clauses in a hashjoin
   */
  static void
  hash_inner_and_outer(PlannerInfo *root,
***************
*** 1750,1755 **** hash_inner_and_outer(PlannerInfo *root,
--- 1778,1787 ----
  										  hashclauses, jointype, extra);
  		}
  	}
+ 
+ 	/* Save hashclauses for possible use by the FDW */
+ 	if (extra->consider_foreignjoin && hashclauses)
+ 		extra->hashclauses = hashclauses;
  }
  
  /*
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 237,242 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif   /* FDWAPI_H */
--- 237,244 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root, RelOptInfo *joinrel,
! 					Path *outer_path, Path *inner_path, Relids required_outer,
! 					JoinType jointype, JoinPathExtraData *extra);
  
  #endif   /* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 914,919 **** typedef struct Path
--- 914,923 ----
  #define PATH_REQ_OUTER(path)  \
  	((path)->param_info ? (path)->param_info->ppi_req_outer : (Relids) NULL)
  
+ /* Macro for determining whether path is parameterized by rel */
+ #define PATH_PARAM_BY_REL(path, rel)  \
+ 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+ 
  /*----------
   * IndexPath represents an index scan over a single index.
   *
***************
*** 2053,2069 **** typedef struct SemiAntiJoinFactors
--- 2057,2089 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI or ANTI joins)
   * param_source_rels are OK targets for parameterization of result paths
+  *
+  * Remaining fields are set only for possible use by the FDW:
+  *
+  * consider_foreignjoin is a flag to indicate whether to give the FDW
+  *		a chance to push down the join to the foreign server
+  * hashclauses are the RestrictInfos to use as hash clauses in a hashjoin
+  * mergeclauses are the RestrictInfos to use as merge clauses in a mergejoin
+  * outersortkeys are the sort pathkeys for the outer side of the mergejoin
+  * innersortkeys are the sort pathkeys for the inner side of the mergejoin
   */
  typedef struct JoinPathExtraData
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	bool		mergejoin_allowed;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	bool		consider_foreignjoin;
+ 	List	   *hashclauses;
+ 	List	   *mergeclauses;
+ 	List	   *outersortkeys;
+ 	List	   *innersortkeys;
  } JoinPathExtraData;
  
  /*

#62

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Etsuro Fujita (#61)

Re: postgres_fdw bug in 9.6

The patch applies cleanly, compiles. make check in regress as well as
postgres_fdw works fine. Here are few comments

local-join should be local join.

The comments should explain why.
+        /* Should be unparameterized */
+        Assert(outer_path->param_info == NULL);
+        Assert(inner_path->param_info == NULL);

+ a suitable local join path, which can be used as the alternative local
May be we should reword this as ... which can be used to create an alternative
local ... This rewording is required even in the existing docs.

+                /* outer_path should not require rels from inner_path */
+                Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent));
Probably this should throw an error or return NULL in such case rather than
Asserting. This function is callable from any FDW, and that FDW may provide
such paths, may be because of an internal bug. Same case with
+                /* Neither path should require rels from the other path */
+                Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
+                       !PATH_PARAM_BY_REL(inner_path, outer_path->parent));

While the comment below mentions ON true, the testcase you have added is for ON
false. Either the testcase should change or this comment. That raises another
question, what happens when somebody does FULL JOIN ON false?
+ * If special case: for "x FULL JOIN y ON true", there

Why JOIN_RIGHT is being treated differently from JOIN_LEFT? We should be able
to create a nested loop join for JOIN_RIGHT?
+        case JOIN_RIGHT:
+        case JOIN_FULL:

On Thu, Mar 23, 2017 at 5:50 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/03/21 18:40, Etsuro Fujita wrote:

Ok, I'll update the patch. One thing I'd like to revise in addition to
that is (1) add to JoinPathExtraData a flag member to indicate whether
to give the FDW a chance to consider a remote join, which will be set to
true if the joinrel's fdwroutine is not NULL and the fdwroutine's
GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info
to create an alternative local join path, such as hashclauses and
mergeclauses proposed in the patch, into JoinPathExtraData in
add_paths_to_joinrel. This would avoid useless overhead in saving such
info into JoinPathExtraData when we don't give the FDW that chance.

Done. Attached is a new version of the patch.

Best regards,
Etsuro Fujita

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#63

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Ashutosh Bapat (#62)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/03/30 20:16, Ashutosh Bapat wrote:

The patch applies cleanly, compiles. make check in regress as well as
postgres_fdw works fine. Here are few comments

Thanks for the review!

local-join should be local join.

OK, done.

The comments should explain why.
+        /* Should be unparameterized */
+        Assert(outer_path->param_info == NULL);
+        Assert(inner_path->param_info == NULL);

Done.

+ a suitable local join path, which can be used as the alternative local
May be we should reword this as ... which can be used to create an alternative
local ... This rewording is required even in the existing docs.

Done.

+                /* outer_path should not require rels from inner_path */
+                Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent));
Probably this should throw an error or return NULL in such case rather than
Asserting. This function is callable from any FDW, and that FDW may provide
such paths, may be because of an internal bug. Same case with
+                /* Neither path should require rels from the other path */
+                Assert(!PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
+                       !PATH_PARAM_BY_REL(inner_path, outer_path->parent));

Good idea! I think it's better to throw an error because that is a bug
in the FDW; done that way.

While the comment below mentions ON true, the testcase you have added is for ON
false. Either the testcase should change or this comment. That raises another
question, what happens when somebody does FULL JOIN ON false?
+ * If special case: for "x FULL JOIN y ON true", there

FULL JOIN ON FALSE would be handled the same way as FULL JOIN ON TRUE,
so I think we should rewrite that comment into something like this: If
special case: for "x FULL JOIN y ON true" or "x FULL JOIN y ON false"...

Why JOIN_RIGHT is being treated differently from JOIN_LEFT? We should be able
to create a nested loop join for JOIN_RIGHT?
+        case JOIN_RIGHT:
+        case JOIN_FULL:

I don't think so, because nestloop joins aren't supported for
JOIN_RIGHT. See ExecInitNestLoop().

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-7.patchtext/x-patch; name=epqpath-for-foreignjoin-7.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1629,1650 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1629,1644 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1673,1694 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1667,1682 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1718,1739 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1706,1721 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1762,1783 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1744,1759 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1794,1799 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1770,1836 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4225,4242 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4262,4277 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4368,4385 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4403,4418 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4405,4426 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * GetExistingLocalJoinPath will find a suitable path for this purpose in
! 	 * the path list of the joinrel, if one exists.  We must be careful to
! 	 * call it before adding any ForeignPath, since the ForeignPath might
! 	 * dominate the only suitable local path available.  We also do it before
! 	 * reconstruct the row for EvalPlanQual(). Find an alternative local path
! 	 * calling foreign_join_ok(), since that function updates fpinfo and marks
! 	 * it as pushable if the join is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4405,4441 ----
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * CreateLocalJoinPath will build an alternative local join path for this
! 	 * purpose.  We must be careful to call it before calling foreign_join_ok,
! 	 * since that function updates fpinfo and marks it as pushable if the join
! 	 * is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		Path	   *outer_path = outerrel->cheapest_total_path;
! 		Path	   *inner_path = innerrel->cheapest_total_path;
! 
! 		/*
! 		 * The cheapest total paths should be unparameterized, because (1) if
! 		 * the input relation is a base relation, we should have created an
! 		 * unparameterized foreign scan path for the base relation and (2) if
! 		 * the input relation is a join relation, we should have created at
! 		 * least one unparameterized local join path (or foreign join path)
! 		 * for the join relation, by using the unparameterized foreign scan
! 		 * paths built for base relations involved in the join relation (else
! 		 * we shouldn't get here).
! 		 */
! 		Assert(outer_path->param_info == NULL);
! 		Assert(inner_path->param_info == NULL);
! 
! 		/* Create an unparameterized local join path */
! 		epq_path = CreateLocalJoinPath(root, joinrel, outer_path, inner_path,
! 									   NULL, jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not created");
  			return;
  		}
  	}
***************
*** 4429,4435 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we copied one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
--- 4444,4450 ----
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we created one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 462,467 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 462,472 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1005 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 995,1003 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used to create an alternative
!      local join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 689,801 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
!  *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(joinrel->reloptkind == RELOPT_JOINREL);
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
  
! 		/* Skip parameterized paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
! 				{
! 					HashPath   *hash_path = makeNode(HashPath);
  
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
! 				}
! 				break;
  
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
  				}
! 				break;
! 
! 			case T_MergeJoin:
  				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 692,857 ----
  	return oid;
  }
  
+ 
  /*
!  * Build an alternative local join path to reconstruct a join tuple for the
!  * foreign-join in EvalPlanQual
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					Path *outer_path,
! 					Path *inner_path,
! 					Relids required_outer,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
  
! 	/* Should be called only if consider_foreignjoin */
! 	Assert(extra->consider_foreignjoin);
  
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* outer_path should not require rels from inner_path */
! 				if (PATH_PARAM_BY_REL(outer_path, inner_path->parent))
! 					elog(ERROR, "outer_path should not require relations from inner_path");
  
! 				/* Get an initial estimate */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path,
! 									  extra->sjinfo, &extra->semifactors);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra->sjinfo,
! 													   &extra->semifactors,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL,
! 													   required_outer);
! 			}
! 			break;
! 		case JOIN_RIGHT:
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Neither path should require rels from the other path */
! 				if (PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
! 					PATH_PARAM_BY_REL(inner_path, outer_path->parent))
! 					elog(ERROR, "neither path should require relations from the other path");
  
! 				/* Generate a hashjoin or mergejoin path, if possible */
! 				if (extra->hashclauses)
  				{
! 					/* Get an initial estimate */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclauses,
! 										  outer_path, inner_path,
! 										  extra->sjinfo, &extra->semifactors);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra->sjinfo,
! 														   &extra->semifactors,
! 														   outer_path,
! 														   inner_path,
! 														   extra->restrictlist,
! 														   required_outer,
! 														   extra->hashclauses);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If special case: for "x FULL JOIN y ON true" or "x FULL
! 					 * JOIN y ON false", there will be no join clauses at all;
! 					 * create a clauseless mergejoin path.  Else create a
! 					 * mergejoin path by explicitly sorting both the outer and
! 					 * inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype, NIL,
! 											   outer_path, inner_path,
! 											   NIL, NIL, extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																NIL, NIL, NIL);
! 					}
! 					else
! 					{
! 						List	   *outerkeys = extra->outersortkeys;
! 						List	   *innerkeys = extra->innersortkeys;
  
! 						/*
! 						 * It's possible that the cheapest-total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   extra->mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys,
! 											   extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																extra->mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 25,33 ****
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
- #define PATH_PARAM_BY_REL(path, rel)  \
- 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
- 
  static void try_partial_mergejoin_path(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   Path *outer_path,
--- 25,30 ----
***************
*** 112,124 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
--- 109,126 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.mergejoin_allowed = true;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.consider_foreignjoin = false;
+ 	extra.hashclauses = NIL;
+ 	extra.mergeclauses = NIL;
+ 	extra.outersortkeys = NIL;
+ 	extra.innersortkeys = NIL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
***************
*** 133,139 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
--- 135,141 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
***************
*** 193,202 **** add_paths_to_joinrel(PlannerInfo *root,
  											  joinrel->lateral_relids);
  
  	/*
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 195,213 ----
  											  joinrel->lateral_relids);
  
  	/*
+ 	 * Decide whether to give the FDW a chance to push down the join to the
+ 	 * foreign server; if inner and outer relations are foreign tables (or
+ 	 * joins) belonging to the same server and assigned to the same user to
+ 	 * check access permissions as, we give the FDW that chance.
+ 	 */
+ 	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPaths)
+ 		extra.consider_foreignjoin = true;
+ 
+ 	/*
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 207,213 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 218,224 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 224,230 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 235,241 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 239,250 **** add_paths_to_joinrel(PlannerInfo *root,
  							 jointype, &extra);
  
  	/*
! 	 * 5. If inner and outer relations are foreign tables (or joins) belonging
! 	 * to the same server and assigned to the same user to check access
! 	 * permissions as, give the FDW a chance to push down joins.
  	 */
! 	if (joinrel->fdwroutine &&
! 		joinrel->fdwroutine->GetForeignJoinPaths)
  		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
  												 outerrel, innerrel,
  												 jointype, &extra);
--- 250,258 ----
  							 jointype, &extra);
  
  	/*
! 	 * 5. Give the FDW a chance to push down the join to the foreign server.
  	 */
! 	if (extra.consider_foreignjoin)
  		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
  												 outerrel, innerrel,
  												 jointype, &extra);
***************
*** 744,749 **** clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
--- 752,764 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into fields of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	mergeclauses	RestrictInfos to use as merge clauses in a mergejoin
+  *	outersortkeys	Sort pathkeys for the outer relation of the mergejoin
+  *	innersortkeys	Sort pathkeys for the inner relation of the mergejoin
   */
  static void
  sort_inner_and_outer(PlannerInfo *root,
***************
*** 930,935 **** sort_inner_and_outer(PlannerInfo *root,
--- 945,958 ----
  									   innerkeys,
  									   jointype,
  									   extra);
+ 
+ 		/* Save first mergejoin data for possible use by the FDW */
+ 		if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+ 		{
+ 			extra->mergeclauses = cur_mergeclauses;
+ 			extra->outersortkeys = outerkeys;
+ 			extra->innersortkeys = innerkeys;
+ 		}
  	}
  }
  
***************
*** 1542,1547 **** consider_parallel_nestloop(PlannerInfo *root,
--- 1565,1575 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into a field of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	hashclauses		RestrictInfos to use as hash clauses in a hashjoin
   */
  static void
  hash_inner_and_outer(PlannerInfo *root,
***************
*** 1750,1755 **** hash_inner_and_outer(PlannerInfo *root,
--- 1778,1787 ----
  										  hashclauses, jointype, extra);
  		}
  	}
+ 
+ 	/* Save hashclauses for possible use by the FDW */
+ 	if (extra->consider_foreignjoin && hashclauses)
+ 		extra->hashclauses = hashclauses;
  }
  
  /*
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 237,242 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif   /* FDWAPI_H */
--- 237,244 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root, RelOptInfo *joinrel,
! 					Path *outer_path, Path *inner_path, Relids required_outer,
! 					JoinType jointype, JoinPathExtraData *extra);
  
  #endif   /* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 914,919 **** typedef struct Path
--- 914,923 ----
  #define PATH_REQ_OUTER(path)  \
  	((path)->param_info ? (path)->param_info->ppi_req_outer : (Relids) NULL)
  
+ /* Macro for determining whether path is parameterized by rel */
+ #define PATH_PARAM_BY_REL(path, rel)  \
+ 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+ 
  /*----------
   * IndexPath represents an index scan over a single index.
   *
***************
*** 2053,2069 **** typedef struct SemiAntiJoinFactors
--- 2057,2089 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI or ANTI joins)
   * param_source_rels are OK targets for parameterization of result paths
+  *
+  * Remaining fields are set only for possible use by the FDW:
+  *
+  * consider_foreignjoin is a flag to indicate whether to give the FDW
+  *		a chance to push down the join to the foreign server
+  * hashclauses are the RestrictInfos to use as hash clauses in a hashjoin
+  * mergeclauses are the RestrictInfos to use as merge clauses in a mergejoin
+  * outersortkeys are the sort pathkeys for the outer side of the mergejoin
+  * innersortkeys are the sort pathkeys for the inner side of the mergejoin
   */
  typedef struct JoinPathExtraData
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	bool		mergejoin_allowed;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	bool		consider_foreignjoin;
+ 	List	   *hashclauses;
+ 	List	   *mergeclauses;
+ 	List	   *outersortkeys;
+ 	List	   *innersortkeys;
  } JoinPathExtraData;
  
  /*

#64

Jeff Janes

jeff.janes@gmail.com

almost 9 years ago

In reply to: Etsuro Fujita (#61)

Re: postgres_fdw bug in 9.6

On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:

On 2017/03/21 18:40, Etsuro Fujita wrote:

Ok, I'll update the patch. One thing I'd like to revise in addition to
that is (1) add to JoinPathExtraData a flag member to indicate whether
to give the FDW a chance to consider a remote join, which will be set to
true if the joinrel's fdwroutine is not NULL and the fdwroutine's
GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info
to create an alternative local join path, such as hashclauses and
mergeclauses proposed in the patch, into JoinPathExtraData in
add_paths_to_joinrel. This would avoid useless overhead in saving such
info into JoinPathExtraData when we don't give the FDW that chance.

Done. Attached is a new version of the patch.

Is the fix for 9.6.3 going to be just a back port of this, or will it look
different?

Cheers,

Jeff

#65

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Jeff Janes (#64)

Re: postgres_fdw bug in 9.6

On 2017/04/01 1:32, Jeff Janes wrote:

On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
Done. Attached is a new version of the patch.

Is the fix for 9.6.3 going to be just a back port of this, or will it
look different?

+1 for backporting; although that requires that GetForeignJoinPaths be
updated so that the FDW uses a new function to create an alternative
local join path (ie, CreateLocalJoinPath), that would make maintenance
of the code easy.

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#66

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Etsuro Fujita (#63)

Re: postgres_fdw bug in 9.6

Probably we should use "could not be created" instead of "was not created"
in "... a local path suitable for EPQ checks was not created".

"outer_path should not require relations from inner_path" may be reworded
as "outer paths should not be parameterized by the inner relations".

"neither path should require relations from the other path" may be reworded
as "neither path should be parameterized by the the other joining relation".

While the comment below mentions ON true, the testcase you have added is

for ON
false. Either the testcase should change or this comment. That raises
another
question, what happens when somebody does FULL JOIN ON false?
+ * If special case: for "x FULL JOIN y ON true",
there

FULL JOIN ON FALSE would be handled the same way as FULL JOIN ON TRUE, so
I think we should rewrite that comment into something like this: If special
case: for "x FULL JOIN y ON true" or "x FULL JOIN y ON false"...

Ok.

Why JOIN_RIGHT is being treated differently from JOIN_LEFT? We should be
able
to create a nested loop join for JOIN_RIGHT?
+        case JOIN_RIGHT:
+        case JOIN_FULL:
I don't think so, because nestloop joins aren't supported for JOIN_RIGHT.
See ExecInitNestLoop().

Hmm, I see in match_unsorted_outer()
1254 case JOIN_RIGHT:
1255 case JOIN_FULL:
1256 nestjoinOK = false;
1257 useallclauses = true;
1258 break;

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#67

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Ashutosh Bapat (#66)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/04/04 14:38, Ashutosh Bapat wrote:

Probably we should use "could not be created" instead of "was not
created" in "... a local path suitable for EPQ checks was not created".

Done.

"outer_path should not require relations from inner_path" may be
reworded as "outer paths should not be parameterized by the inner
relations".

"neither path should require relations from the other path" may be
reworded as "neither path should be parameterized by the the other
joining relation".

Done. I used "input" instead of "joining" in the latter, though.

Why JOIN_RIGHT is being treated differently from JOIN_LEFT? We
should be able
to create a nested loop join for JOIN_RIGHT?
+ case JOIN_RIGHT:
+ case JOIN_FULL:

I don't think so, because nestloop joins aren't supported for
JOIN_RIGHT. See ExecInitNestLoop().

Hmm, I see in match_unsorted_outer()
1254 case JOIN_RIGHT:
1255 case JOIN_FULL:
1256 nestjoinOK = false;
1257 useallclauses = true;
1258 break;

Yeah, I should have pointed that out as well.

I rebased the patch also. Please find attached an updated version of
the patch.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-8.patchtext/x-patch; name=epqpath-for-foreignjoin-8.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1629,1650 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1629,1644 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1673,1694 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1667,1682 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1718,1739 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1706,1721 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1762,1783 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1744,1759 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1794,1799 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1770,1836 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4230,4247 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4267,4282 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4373,4390 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4408,4423 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4413,4434 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * GetExistingLocalJoinPath will find a suitable path for this purpose in
! 	 * the path list of the joinrel, if one exists.  We must be careful to
! 	 * call it before adding any ForeignPath, since the ForeignPath might
! 	 * dominate the only suitable local path available.  We also do it before
! 	 * reconstruct the row for EvalPlanQual(). Find an alternative local path
! 	 * calling foreign_join_ok(), since that function updates fpinfo and marks
! 	 * it as pushable if the join is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4413,4449 ----
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * CreateLocalJoinPath will build an alternative local join path for this
! 	 * purpose.  We must be careful to call it before calling foreign_join_ok,
! 	 * since that function updates fpinfo and marks it as pushable if the join
! 	 * is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		Path	   *outer_path = outerrel->cheapest_total_path;
! 		Path	   *inner_path = innerrel->cheapest_total_path;
! 
! 		/*
! 		 * The cheapest total paths should be unparameterized, because (1) if
! 		 * the input relation is a base relation, we should have created an
! 		 * unparameterized foreign scan path for the base relation and (2) if
! 		 * the input relation is a join relation, we should have created at
! 		 * least one unparameterized local join path (where unparameterized
! 		 * foreign scan paths are chosen for base relations involved in the
! 		 * join relation) or foreign join path for the join relation (else we
! 		 * shouldn't get here).
! 		 */
! 		Assert(outer_path->param_info == NULL);
! 		Assert(inner_path->param_info == NULL);
! 
! 		/* Create an unparameterized local join path */
! 		epq_path = CreateLocalJoinPath(root, joinrel, outer_path, inner_path,
! 									   NULL, jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks could not be created");
  			return;
  		}
  	}
***************
*** 4437,4443 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we copied one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
--- 4452,4458 ----
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we created one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 462,467 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 462,472 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1005 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 995,1003 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used to create an alternative
!      local join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 689,801 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
!  *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(IS_JOIN_REL(joinrel));
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
  
! 		/* Skip parameterized paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
! 				{
! 					HashPath   *hash_path = makeNode(HashPath);
  
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
! 				}
! 				break;
  
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
  				}
! 				break;
! 
! 			case T_MergeJoin:
  				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 692,857 ----
  	return oid;
  }
  
+ 
  /*
!  * Build an alternative local join path to reconstruct a join tuple for the
!  * foreign-join in EvalPlanQual
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					Path *outer_path,
! 					Path *inner_path,
! 					Relids required_outer,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
  
! 	/* Should be called only if consider_foreignjoin */
! 	Assert(extra->consider_foreignjoin);
  
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* outer_path should not require rels from inner_path */
! 				if (PATH_PARAM_BY_REL(outer_path, inner_path->parent))
! 					elog(ERROR, "outer paths should not be parameterized by inner relations");
  
! 				/* Get an initial estimate */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path,
! 									  extra->sjinfo, &extra->semifactors);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra->sjinfo,
! 													   &extra->semifactors,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL,
! 													   required_outer);
! 			}
! 			break;
! 		case JOIN_RIGHT:
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Neither path should require rels from the other path */
! 				if (PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
! 					PATH_PARAM_BY_REL(inner_path, outer_path->parent))
! 					elog(ERROR, "neither path should not be parameterized by the other input relation");
  
! 				/* Generate a hashjoin or mergejoin path, if possible */
! 				if (extra->hashclauses)
  				{
! 					/* Get an initial estimate */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclauses,
! 										  outer_path, inner_path,
! 										  extra->sjinfo, &extra->semifactors);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra->sjinfo,
! 														   &extra->semifactors,
! 														   outer_path,
! 														   inner_path,
! 														   extra->restrictlist,
! 														   required_outer,
! 														   extra->hashclauses);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If special case: for "x FULL JOIN y ON true" or "x FULL
! 					 * JOIN y ON false", there will be no join clauses at all;
! 					 * create a clauseless mergejoin path.  Else create a
! 					 * mergejoin path by explicitly sorting both the outer and
! 					 * inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype, NIL,
! 											   outer_path, inner_path,
! 											   NIL, NIL, extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																NIL, NIL, NIL);
! 					}
! 					else
! 					{
! 						List	   *outerkeys = extra->outersortkeys;
! 						List	   *innerkeys = extra->innersortkeys;
  
! 						/*
! 						 * It's possible that the cheapest-total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   extra->mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys,
! 											   extra->sjinfo);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra->sjinfo,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																extra->mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 25,33 ****
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
- #define PATH_PARAM_BY_REL(path, rel)  \
- 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
- 
  static void try_partial_mergejoin_path(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   Path *outer_path,
--- 25,30 ----
***************
*** 112,124 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
--- 109,126 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.mergejoin_allowed = true;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.consider_foreignjoin = false;
+ 	extra.hashclauses = NIL;
+ 	extra.mergeclauses = NIL;
+ 	extra.outersortkeys = NIL;
+ 	extra.innersortkeys = NIL;
  
  	/*
  	 * Find potential mergejoin clauses.  We can skip this if we are not
***************
*** 133,139 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
--- 135,141 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
  
  	/*
  	 * If it's SEMI or ANTI join, compute correction factors for cost
***************
*** 193,202 **** add_paths_to_joinrel(PlannerInfo *root,
  											  joinrel->lateral_relids);
  
  	/*
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 195,213 ----
  											  joinrel->lateral_relids);
  
  	/*
+ 	 * Decide whether to give the FDW a chance to push down the join to the
+ 	 * foreign server; if inner and outer relations are foreign tables (or
+ 	 * joins) belonging to the same server and assigned to the same user to
+ 	 * check access permissions as, we give the FDW that chance.
+ 	 */
+ 	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPaths)
+ 		extra.consider_foreignjoin = true;
+ 
+ 	/*
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 207,213 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 218,224 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 224,230 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 235,241 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 239,250 **** add_paths_to_joinrel(PlannerInfo *root,
  							 jointype, &extra);
  
  	/*
! 	 * 5. If inner and outer relations are foreign tables (or joins) belonging
! 	 * to the same server and assigned to the same user to check access
! 	 * permissions as, give the FDW a chance to push down joins.
  	 */
! 	if (joinrel->fdwroutine &&
! 		joinrel->fdwroutine->GetForeignJoinPaths)
  		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
  												 outerrel, innerrel,
  												 jointype, &extra);
--- 250,258 ----
  							 jointype, &extra);
  
  	/*
! 	 * 5. Give the FDW a chance to push down the join to the foreign server.
  	 */
! 	if (extra.consider_foreignjoin)
  		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
  												 outerrel, innerrel,
  												 jointype, &extra);
***************
*** 744,749 **** clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
--- 752,764 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into fields of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	mergeclauses	RestrictInfos to use as merge clauses in a mergejoin
+  *	outersortkeys	Sort pathkeys for the outer relation of the mergejoin
+  *	innersortkeys	Sort pathkeys for the inner relation of the mergejoin
   */
  static void
  sort_inner_and_outer(PlannerInfo *root,
***************
*** 930,935 **** sort_inner_and_outer(PlannerInfo *root,
--- 945,958 ----
  									   innerkeys,
  									   jointype,
  									   extra);
+ 
+ 		/* Save first mergejoin data for possible use by the FDW */
+ 		if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+ 		{
+ 			extra->mergeclauses = cur_mergeclauses;
+ 			extra->outersortkeys = outerkeys;
+ 			extra->innersortkeys = innerkeys;
+ 		}
  	}
  }
  
***************
*** 1542,1547 **** consider_parallel_nestloop(PlannerInfo *root,
--- 1565,1575 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into a field of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	hashclauses		RestrictInfos to use as hash clauses in a hashjoin
   */
  static void
  hash_inner_and_outer(PlannerInfo *root,
***************
*** 1750,1755 **** hash_inner_and_outer(PlannerInfo *root,
--- 1778,1787 ----
  										  hashclauses, jointype, extra);
  		}
  	}
+ 
+ 	/* Save hashclauses for possible use by the FDW */
+ 	if (extra->consider_foreignjoin && hashclauses)
+ 		extra->hashclauses = hashclauses;
  }
  
  /*
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 237,242 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif   /* FDWAPI_H */
--- 237,244 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root, RelOptInfo *joinrel,
! 					Path *outer_path, Path *inner_path, Relids required_outer,
! 					JoinType jointype, JoinPathExtraData *extra);
  
  #endif   /* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 957,962 **** typedef struct Path
--- 957,966 ----
  #define PATH_REQ_OUTER(path)  \
  	((path)->param_info ? (path)->param_info->ppi_req_outer : (Relids) NULL)
  
+ /* Macro for determining whether path is parameterized by rel */
+ #define PATH_PARAM_BY_REL(path, rel)  \
+ 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+ 
  /*----------
   * IndexPath represents an index scan over a single index.
   *
***************
*** 2142,2158 **** typedef struct SemiAntiJoinFactors
--- 2146,2178 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI or ANTI joins)
   * param_source_rels are OK targets for parameterization of result paths
+  *
+  * Remaining fields are set only for possible use by the FDW:
+  *
+  * consider_foreignjoin is a flag to indicate whether to give the FDW
+  *		a chance to push down the join to the foreign server
+  * hashclauses are the RestrictInfos to use as hash clauses in a hashjoin
+  * mergeclauses are the RestrictInfos to use as merge clauses in a mergejoin
+  * outersortkeys are the sort pathkeys for the outer side of the mergejoin
+  * innersortkeys are the sort pathkeys for the inner side of the mergejoin
   */
  typedef struct JoinPathExtraData
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	bool		mergejoin_allowed;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	bool		consider_foreignjoin;
+ 	List	   *hashclauses;
+ 	List	   *mergeclauses;
+ 	List	   *outersortkeys;
+ 	List	   *innersortkeys;
  } JoinPathExtraData;
  
  /*

#68

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Etsuro Fujita (#67)

Re: postgres_fdw bug in 9.6

On Tue, Apr 4, 2017 at 2:31 PM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>
wrote:

I rebased the patch also. Please find attached an updated version of the
patch.

Thanks, marking this as "ready for committer".

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#69

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Ashutosh Bapat (#68)

Re: postgres_fdw bug in 9.6

On 2017/04/04 21:28, Ashutosh Bapat wrote:

On Tue, Apr 4, 2017 at 2:31 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
I rebased the patch also. Please find attached an updated version
of the patch.

Thanks, marking this as "ready for committer".

Thanks!

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#70

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Etsuro Fujita (#65)

Re: postgres_fdw bug in 9.6

On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/04/01 1:32, Jeff Janes wrote:

On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
Done. Attached is a new version of the patch.
Is the fix for 9.6.3 going to be just a back port of this, or will it
look different?

+1 for backporting; although that requires that GetForeignJoinPaths be
updated so that the FDW uses a new function to create an alternative local
join path (ie, CreateLocalJoinPath), that would make maintenance of the code
easy.

Well, the problem here is that this breaks ABI compatibility. If we
applied this to 9.6, and somebody tried to use a previously-compiled
FDW .so against a new server version, it would fail after the upgrade,
because the new server wouldn't have GetExistingLocalJoinPath and also
possibly because of the change to the structure of JoinPathExtraData.
Maybe there's no better alternative, and maybe nothing outside of
postgres_fdw is using this stuff anyway, but it seems like a concern.

Also, the CommitFest entry for this seems to be a bit sketchy. It
claims that Tom Lane is a co-author of this patch which, AFAICS, is
not the case. It is listed under Miscellaneous rather than "Bug
Fixes", which seems like a surprising decision. And it uses a subject
line which is neither very clear nor the same as the (also not
particularly helpful) subject line of the email thread.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#71

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Robert Haas (#70)

Re: postgres_fdw bug in 9.6

On Fri, Apr 7, 2017 at 2:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/04/01 1:32, Jeff Janes wrote:

On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
Done. Attached is a new version of the patch.
Is the fix for 9.6.3 going to be just a back port of this, or will it
look different?

+1 for backporting; although that requires that GetForeignJoinPaths be
updated so that the FDW uses a new function to create an alternative local
join path (ie, CreateLocalJoinPath), that would make maintenance of the code
easy.

Well, the problem here is that this breaks ABI compatibility. If we
applied this to 9.6, and somebody tried to use a previously-compiled
FDW .so against a new server version, it would fail after the upgrade,
because the new server wouldn't have GetExistingLocalJoinPath and also
possibly because of the change to the structure of JoinPathExtraData.
Maybe there's no better alternative, and maybe nothing outside of
postgres_fdw is using this stuff anyway, but it seems like a concern.

Also, the CommitFest entry for this seems to be a bit sketchy. It
claims that Tom Lane is a co-author of this patch which, AFAICS, is
not the case. It is listed under Miscellaneous rather than "Bug
Fixes", which seems like a surprising decision. And it uses a subject
line which is neither very clear nor the same as the (also not
particularly helpful) subject line of the email thread.

Looking at the code itself, I find the changes to joinpath.c rather alarming.

+    /* Save hashclauses for possible use by the FDW */
+    if (extra->consider_foreignjoin && hashclauses)
+        extra->hashclauses = hashclauses;

A minor consideration is that this is fairly far away from where
hashclauses actually gets populated, so if someone later added an
early "return" statement to this function -- after creating some paths
-- it could subtly break join pushdown. But I also think there's no
real need for this. The loop that extracts hash clauses is simple
enough that we could just refactor it into a separate function, or if
necessary duplicate the logic.

+        /* Save first mergejoin data for possible use by the FDW */
+        if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+        {
+            extra->mergeclauses = cur_mergeclauses;
+            extra->outersortkeys = outerkeys;
+            extra->innersortkeys = innerkeys;
+        }

Similarly here. select_outer_pathkeys_for_merge(),
find_mergeclauses_for_pathkeys(), and make_inner_pathkeys_for_merge()
are all extern, so there's nothing to keep CreateLocalJoinPath() from
just doing that work itself instead of getting help from joinpath,
which I guess seems better to me. I think it's just better if we
don't burden joinpath.c with keeping little bits of data around that
CreateLocalJoinPath() can easily get for itself.

There appears to be no regression test covering the case where we get
a Merge Full Join with a non-empty list of mergeclauses. Hash Full
Join is tested, as is Merge Full Join without merge clauses, but
there's no test for Merge Full Join with mergeclauses, and since that
is a separate code path it seems like it should be tested.

- /*
- * If either inner or outer path is a ForeignPath corresponding to a
- * pushed down join, replace it with the fdw_outerpath, so that we
- * maintain path for EPQ checks built entirely of local join
- * strategies.
- */
- if (IsA(joinpath->outerjoinpath, ForeignPath))
- {
- ForeignPath *foreign_path;
-
- foreign_path = (ForeignPath *) joinpath->outerjoinpath;
- if (IS_JOIN_REL(foreign_path->path.parent))
- joinpath->outerjoinpath = foreign_path->fdw_outerpath;
- }
-
- if (IsA(joinpath->innerjoinpath, ForeignPath))
- {
- ForeignPath *foreign_path;
-
- foreign_path = (ForeignPath *) joinpath->innerjoinpath;
- if (IS_JOIN_REL(foreign_path->path.parent))
- joinpath->innerjoinpath = foreign_path->fdw_outerpath;
- }

This logic is removed and not replaced with anything, but I don't see
what keeps this code...

+        Path       *outer_path = outerrel->cheapest_total_path;
+        Path       *inner_path = innerrel->cheapest_total_path;

...from picking a ForeignPath?

There's probably more to think about here, but those are my question
on an initial read-through.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#72

David Steele

david@pgmasters.net

almost 9 years ago

In reply to: Robert Haas (#71)

Re: postgres_fdw bug in 9.6

On 4/7/17 3:24 PM, Robert Haas wrote:

There's probably more to think about here, but those are my question
on an initial read-through.

This bug has been moved to CF 2017-07.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#73

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 9 years ago

In reply to: Robert Haas (#70)

Re: postgres_fdw bug in 9.6

On Sat, Apr 8, 2017 at 12:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/04/01 1:32, Jeff Janes wrote:

On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
Done. Attached is a new version of the patch.
Is the fix for 9.6.3 going to be just a back port of this, or will it
look different?

+1 for backporting; although that requires that GetForeignJoinPaths be
updated so that the FDW uses a new function to create an alternative local
join path (ie, CreateLocalJoinPath), that would make maintenance of the code
easy.

Well, the problem here is that this breaks ABI compatibility. If we
applied this to 9.6, and somebody tried to use a previously-compiled
FDW .so against a new server version, it would fail after the upgrade,
because the new server wouldn't have GetExistingLocalJoinPath and also
possibly because of the change to the structure of JoinPathExtraData.
Maybe there's no better alternative, and maybe nothing outside of
postgres_fdw is using this stuff anyway, but it seems like a concern.

I had submitted a patch in [1]. We thought that that patch is good to
fix the issue on the backbranches. But it got berried in the thread.
If you think that's a feasible solution for backbranches, I will work
on the comments.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#74

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 9 years ago

In reply to: Robert Haas (#70)

Re: postgres_fdw bug in 9.6

On 2017/04/08 3:33, Robert Haas wrote:

On Mon, Apr 3, 2017 at 2:12 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/04/01 1:32, Jeff Janes wrote:

On Thu, Mar 23, 2017 at 5:20 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp <mailto:fujita.etsuro@lab.ntt.co.jp>> wrote:
Done. Attached is a new version of the patch.
Is the fix for 9.6.3 going to be just a back port of this, or will it
look different?

+1 for backporting; although that requires that GetForeignJoinPaths be
updated so that the FDW uses a new function to create an alternative local
join path (ie, CreateLocalJoinPath), that would make maintenance of the code
easy.

Well, the problem here is that this breaks ABI compatibility. If we
applied this to 9.6, and somebody tried to use a previously-compiled
FDW .so against a new server version, it would fail after the upgrade,
because the new server wouldn't have GetExistingLocalJoinPath and also
possibly because of the change to the structure of JoinPathExtraData.
Maybe there's no better alternative, and maybe nothing outside of
postgres_fdw is using this stuff anyway, but it seems like a concern.

Yeah, but I was thinking that it'd be a good idea to add a note about
that to the release notes, to avoid such troubles.

Also, the CommitFest entry for this seems to be a bit sketchy. It
claims that Tom Lane is a co-author of this patch which, AFAICS, is
not the case. It is listed under Miscellaneous rather than "Bug
Fixes", which seems like a surprising decision. And it uses a subject
line which is neither very clear nor the same as the (also not
particularly helpful) subject line of the email thread.

Thanks for correcting that!

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#75

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 8 years ago

In reply to: Etsuro Fujita (#67)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/04/04 18:01, Etsuro Fujita wrote:

I rebased the patch also. Please find attached an updated version of
the patch.

I rebased the patch to HEAD. PFA a new version of the patch.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-9.patchtext/plain; charset=UTF-8; name=epqpath-for-foreignjoin-9.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1701,1722 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1701,1716 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1745,1766 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1739,1754 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1790,1811 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1778,1793 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1834,1855 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1816,1831 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1866,1871 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1842,1908 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4298,4315 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4335,4350 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4441,4458 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4476,4491 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4493,4514 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * GetExistingLocalJoinPath will find a suitable path for this purpose in
! 	 * the path list of the joinrel, if one exists.  We must be careful to
! 	 * call it before adding any ForeignPath, since the ForeignPath might
! 	 * dominate the only suitable local path available.  We also do it before
! 	 * reconstruct the row for EvalPlanQual(). Find an alternative local path
! 	 * calling foreign_join_ok(), since that function updates fpinfo and marks
! 	 * it as pushable if the join is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4493,4529 ----
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * CreateLocalJoinPath will build an alternative local join path for this
! 	 * purpose.  We must be careful to call it before calling foreign_join_ok,
! 	 * since that function updates fpinfo and marks it as pushable if the join
! 	 * is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		Path	   *outer_path = outerrel->cheapest_total_path;
! 		Path	   *inner_path = innerrel->cheapest_total_path;
! 
! 		/*
! 		 * The cheapest total paths should be unparameterized, because (1) if
! 		 * the input relation is a base relation, we should have created an
! 		 * unparameterized foreign scan path for the base relation and (2) if
! 		 * the input relation is a join relation, we should have created at
! 		 * least one unparameterized local join path (where unparameterized
! 		 * foreign scan paths are chosen for base relations involved in the
! 		 * join relation) or foreign join path for the join relation (else we
! 		 * shouldn't get here).
! 		 */
! 		Assert(outer_path->param_info == NULL);
! 		Assert(inner_path->param_info == NULL);
! 
! 		/* Create an unparameterized local join path */
! 		epq_path = CreateLocalJoinPath(root, joinrel, outer_path, inner_path,
! 									   NULL, jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks could not be created");
  			return;
  		}
  	}
***************
*** 4517,4523 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we copied one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
--- 4532,4538 ----
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we created one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 493,498 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 493,503 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1005 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 995,1003 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used to create an alternative
!      local join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 689,801 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
!  *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(IS_JOIN_REL(joinrel));
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
  
! 		/* Skip parameterized paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
! 				{
! 					HashPath   *hash_path = makeNode(HashPath);
  
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
! 				}
! 				break;
  
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
  				}
! 				break;
! 
! 			case T_MergeJoin:
  				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 692,852 ----
  	return oid;
  }
  
+ 
  /*
!  * Build an alternative local join path to reconstruct a join tuple for the
!  * foreign-join in EvalPlanQual
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					Path *outer_path,
! 					Path *inner_path,
! 					Relids required_outer,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
  
! 	/* Should be called only if consider_foreignjoin */
! 	Assert(extra->consider_foreignjoin);
  
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* outer_path should not require rels from inner_path */
! 				if (PATH_PARAM_BY_REL(outer_path, inner_path->parent))
! 					elog(ERROR, "outer paths should not be parameterized by inner relations");
  
! 				/* Get an initial estimate */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path, extra);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL,
! 													   required_outer);
! 			}
! 			break;
! 		case JOIN_RIGHT:
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Neither path should require rels from the other path */
! 				if (PATH_PARAM_BY_REL(outer_path, inner_path->parent) ||
! 					PATH_PARAM_BY_REL(inner_path, outer_path->parent))
! 					elog(ERROR, "neither path should not be parameterized by the other input relation");
  
! 				/* Generate a hashjoin or mergejoin path, if possible */
! 				if (extra->hashclauses)
  				{
! 					/* Get an initial estimate */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclauses,
! 										  outer_path, inner_path, extra);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra,
! 														   outer_path,
! 														   inner_path,
! 														   extra->restrictlist,
! 														   required_outer,
! 														   extra->hashclauses);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If special case: for "x FULL JOIN y ON true" or "x FULL
! 					 * JOIN y ON false", there will be no join clauses at all;
! 					 * create a clauseless mergejoin path.  Else create a
! 					 * mergejoin path by explicitly sorting both the outer and
! 					 * inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype, NIL,
! 											   outer_path, inner_path,
! 											   NIL, NIL, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																NIL, NIL, NIL);
! 					}
! 					else
! 					{
! 						List	   *outerkeys = extra->outersortkeys;
! 						List	   *innerkeys = extra->innersortkeys;
  
! 						/*
! 						 * It's possible that the cheapest-total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   extra->mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																required_outer,
! 																extra->mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 26,34 ****
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
- #define PATH_PARAM_BY_REL(path, rel)  \
- 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
- 
  static void try_partial_mergejoin_path(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   Path *outer_path,
--- 26,31 ----
***************
*** 113,125 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
  	/*
  	 * See if the inner relation is provably unique for this outer rel.
--- 110,127 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.mergejoin_allowed = true;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.consider_foreignjoin = false;
+ 	extra.hashclauses = NIL;
+ 	extra.mergeclauses = NIL;
+ 	extra.outersortkeys = NIL;
+ 	extra.innersortkeys = NIL;
  
  	/*
  	 * See if the inner relation is provably unique for this outer rel.
***************
*** 177,183 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
--- 179,185 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
***************
*** 237,246 **** add_paths_to_joinrel(PlannerInfo *root,
  											  joinrel->lateral_relids);
  
  	/*
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 239,257 ----
  											  joinrel->lateral_relids);
  
  	/*
+ 	 * Decide whether to give the FDW a chance to push down the join to the
+ 	 * foreign server; if inner and outer relations are foreign tables (or
+ 	 * joins) belonging to the same server and assigned to the same user to
+ 	 * check access permissions as, we give the FDW that chance.
+ 	 */
+ 	if (joinrel->fdwroutine && joinrel->fdwroutine->GetForeignJoinPaths)
+ 		extra.consider_foreignjoin = true;
+ 
+ 	/*
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 251,257 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 262,268 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 268,274 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 279,285 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 283,294 **** add_paths_to_joinrel(PlannerInfo *root,
  							 jointype, &extra);
  
  	/*
! 	 * 5. If inner and outer relations are foreign tables (or joins) belonging
! 	 * to the same server and assigned to the same user to check access
! 	 * permissions as, give the FDW a chance to push down joins.
  	 */
! 	if (joinrel->fdwroutine &&
! 		joinrel->fdwroutine->GetForeignJoinPaths)
  		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
  												 outerrel, innerrel,
  												 jointype, &extra);
--- 294,302 ----
  							 jointype, &extra);
  
  	/*
! 	 * 5. Give the FDW a chance to push down the join to the foreign server.
  	 */
! 	if (extra.consider_foreignjoin)
  		joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
  												 outerrel, innerrel,
  												 jointype, &extra);
***************
*** 780,785 **** clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
--- 788,800 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into fields of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	mergeclauses	RestrictInfos to use as merge clauses in a mergejoin
+  *	outersortkeys	Sort pathkeys for the outer relation of the mergejoin
+  *	innersortkeys	Sort pathkeys for the inner relation of the mergejoin
   */
  static void
  sort_inner_and_outer(PlannerInfo *root,
***************
*** 966,971 **** sort_inner_and_outer(PlannerInfo *root,
--- 981,994 ----
  									   innerkeys,
  									   jointype,
  									   extra);
+ 
+ 		/* Save first mergejoin data for possible use by the FDW */
+ 		if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+ 		{
+ 			extra->mergeclauses = cur_mergeclauses;
+ 			extra->outersortkeys = outerkeys;
+ 			extra->innersortkeys = innerkeys;
+ 		}
  	}
  }
  
***************
*** 1578,1583 **** consider_parallel_nestloop(PlannerInfo *root,
--- 1601,1611 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  *
+  * This function returns the following info into a field of the
+  * JoinPathExtraData struct for possible use by the FDW:
+  *
+  *	hashclauses		RestrictInfos to use as hash clauses in a hashjoin
   */
  static void
  hash_inner_and_outer(PlannerInfo *root,
***************
*** 1786,1791 **** hash_inner_and_outer(PlannerInfo *root,
--- 1814,1823 ----
  										  hashclauses, jointype, extra);
  		}
  	}
+ 
+ 	/* Save hashclauses for possible use by the FDW */
+ 	if (extra->consider_foreignjoin && hashclauses)
+ 		extra->hashclauses = hashclauses;
  }
  
  /*
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 237,242 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif							/* FDWAPI_H */
--- 237,244 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root, RelOptInfo *joinrel,
! 					Path *outer_path, Path *inner_path, Relids required_outer,
! 					JoinType jointype, JoinPathExtraData *extra);
  
  #endif							/* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 973,978 **** typedef struct Path
--- 973,982 ----
  #define PATH_REQ_OUTER(path)  \
  	((path)->param_info ? (path)->param_info->ppi_req_outer : (Relids) NULL)
  
+ /* Macro for determining whether path is parameterized by rel */
+ #define PATH_PARAM_BY_REL(path, rel)  \
+ 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
+ 
  /*----------
   * IndexPath represents an index scan over a single index.
   *
***************
*** 2169,2188 **** typedef struct SemiAntiJoinFactors
--- 2173,2208 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * inner_unique is true if each outer tuple provably matches no more
   *		than one inner tuple
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI/ANTI/inner_unique joins)
   * param_source_rels are OK targets for parameterization of result paths
+  *
+  * Remaining fields are set only for possible use by the FDW:
+  *
+  * consider_foreignjoin is a flag to indicate whether to give the FDW
+  *		a chance to push down the join to the foreign server
+  * hashclauses are the RestrictInfos to use as hash clauses in a hashjoin
+  * mergeclauses are the RestrictInfos to use as merge clauses in a mergejoin
+  * outersortkeys are the sort pathkeys for the outer side of the mergejoin
+  * innersortkeys are the sort pathkeys for the inner side of the mergejoin
   */
  typedef struct JoinPathExtraData
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	bool		mergejoin_allowed;
  	bool		inner_unique;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	bool		consider_foreignjoin;
+ 	List	   *hashclauses;
+ 	List	   *mergeclauses;
+ 	List	   *outersortkeys;
+ 	List	   *innersortkeys;
  } JoinPathExtraData;
  
  /*

#76

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Etsuro Fujita (#75)

Re: postgres_fdw bug in 9.6

On Thu, Aug 17, 2017 at 8:00 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2017/04/04 18:01, Etsuro Fujita wrote:

I rebased the patch also. Please find attached an updated version of the
patch.

I rebased the patch to HEAD. PFA a new version of the patch.

Tom, you were instrumental in identifying what was going wrong here
initially. Any chance you'd be willing to have a look at the patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#77

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#76)

Re: postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 17, 2017 at 8:00 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I rebased the patch to HEAD. PFA a new version of the patch.

Tom, you were instrumental in identifying what was going wrong here
initially. Any chance you'd be willing to have a look at the patch?

I will, but probably not for a week or so. Going eclipse-chasing.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#78

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Tom Lane (#77)

Re: postgres_fdw bug in 9.6

On Fri, Aug 18, 2017 at 3:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 17, 2017 at 8:00 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I rebased the patch to HEAD. PFA a new version of the patch.

Tom, you were instrumental in identifying what was going wrong here
initially. Any chance you'd be willing to have a look at the patch?

I will, but probably not for a week or so. Going eclipse-chasing.

Good luck:
https://xkcd.com/1876/
And enjoy:
https://xkcd.com/1877/
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#79

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 8 years ago

In reply to: Michael Paquier (#78)

Re: postgres_fdw bug in 9.6

On 2017/08/18 8:16, Michael Paquier wrote:

On Fri, Aug 18, 2017 at 3:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 17, 2017 at 8:00 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I rebased the patch to HEAD. PFA a new version of the patch.

Tom, you were instrumental in identifying what was going wrong here
initially. Any chance you'd be willing to have a look at the patch?

I will, but probably not for a week or so. Going eclipse-chasing.

Good luck:
https://xkcd.com/1876/
And enjoy:
https://xkcd.com/1877/

Enjoy!

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#80

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 8 years ago

In reply to: Robert Haas (#71)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/04/08 4:24, Robert Haas wrote:

Looking at the code itself, I find the changes to joinpath.c rather alarming.

I missed this mail. Sorry about that, Robert.

+    /* Save hashclauses for possible use by the FDW */
+    if (extra->consider_foreignjoin && hashclauses)
+        extra->hashclauses = hashclauses;
A minor consideration is that this is fairly far away from where
hashclauses actually gets populated, so if someone later added an
early "return" statement to this function -- after creating some paths
-- it could subtly break join pushdown. But I also think there's no
real need for this. The loop that extracts hash clauses is simple
enough that we could just refactor it into a separate function, or if
necessary duplicate the logic.

I refactored that into a new function so that we can call that function
at the top of add_paths_to_joinrel and store the result in
JoinPathExtraData.

+        /* Save first mergejoin data for possible use by the FDW */
+        if (extra->consider_foreignjoin && outerkeys == all_pathkeys)
+        {
+            extra->mergeclauses = cur_mergeclauses;
+            extra->outersortkeys = outerkeys;
+            extra->innersortkeys = innerkeys;
+        }
Similarly here. select_outer_pathkeys_for_merge(),
find_mergeclauses_for_pathkeys(), and make_inner_pathkeys_for_merge()
are all extern, so there's nothing to keep CreateLocalJoinPath() from
just doing that work itself instead of getting help from joinpath,
which I guess seems better to me. I think it's just better if we
don't burden joinpath.c with keeping little bits of data around that
CreateLocalJoinPath() can easily get for itself.

Done that way.

There appears to be no regression test covering the case where we get
a Merge Full Join with a non-empty list of mergeclauses. Hash Full
Join is tested, as is Merge Full Join without merge clauses, but
there's no test for Merge Full Join with mergeclauses, and since that
is a separate code path it seems like it should be tested.

Done.

- /*
- * If either inner or outer path is a ForeignPath corresponding to a
- * pushed down join, replace it with the fdw_outerpath, so that we
- * maintain path for EPQ checks built entirely of local join
- * strategies.
- */
- if (IsA(joinpath->outerjoinpath, ForeignPath))
- {
- ForeignPath *foreign_path;
-
- foreign_path = (ForeignPath *) joinpath->outerjoinpath;
- if (IS_JOIN_REL(foreign_path->path.parent))
- joinpath->outerjoinpath = foreign_path->fdw_outerpath;
- }
-
- if (IsA(joinpath->innerjoinpath, ForeignPath))
- {
- ForeignPath *foreign_path;
-
- foreign_path = (ForeignPath *) joinpath->innerjoinpath;
- if (IS_JOIN_REL(foreign_path->path.parent))
- joinpath->innerjoinpath = foreign_path->fdw_outerpath;
- }

This logic is removed and not replaced with anything, but I don't see
what keeps this code...
+        Path       *outer_path = outerrel->cheapest_total_path;
+        Path       *inner_path = innerrel->cheapest_total_path;
...from picking a ForeignPath?

CreateLocalJoinPath creates an alternative local join path for a foreign
join from the cheapest total paths for the outer/inner relations. The
reason for the above is to pass these paths to that function. On second
thought, however, I think it would be convenient for the caller to just
pass outerrel/innerrel to that function. So, I modified that function's
API as such. Another change is: the previous version of that function
allowed the caller to create a parameterized local-join path
corresponding to a parameterized foreign join, but that is a feature,
not a bug fix, so I dropped that. (I'll propose that as part of the
patch in [1]https://commitfest.postgresql.org/14/1042/.)

There's probably more to think about here, but those are my question
on an initial read-through.

Thanks for the review!

Attached is an updated version of the patch.

Best regards,
Etsuro Fujita

[1]: https://commitfest.postgresql.org/14/1042/

Attachments:

epqpath-for-foreignjoin-10.patchtext/plain; charset=UTF-8; name=epqpath-for-foreignjoin-10.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1701,1722 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1701,1716 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1745,1766 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1739,1754 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1790,1811 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1778,1793 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1834,1855 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1816,1831 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1866,1871 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1842,1943 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1::bit(16) = t3.c1::bit(16))) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                        QUERY PLAN                                                                                                                                        
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Sort
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          Sort Key: t2.c1, t3.c1
+          ->  Nested Loop
+                Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+                ->  Foreign Scan
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                      Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1::bit(16) = r3.c1::bit(16)))))
+                      ->  Merge Full Join
+                            Output: t2.c1, t2.*, t3.c1, t3.*
+                            Merge Cond: (((t2.c1)::bit(16)) = ((t3.c1)::bit(16)))
+                            ->  Sort
+                                  Output: t2.c1, t2.*, ((t2.c1)::bit(16))
+                                  Sort Key: ((t2.c1)::bit(16))
+                                  ->  Foreign Scan on public.ft4 t2
+                                        Output: t2.c1, t2.*, (t2.c1)::bit(16)
+                                        Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                            ->  Sort
+                                  Output: t3.c1, t3.*, ((t3.c1)::bit(16))
+                                  Sort Key: ((t3.c1)::bit(16))
+                                  ->  Foreign Scan on public.ft5 t3
+                                        Output: t3.c1, t3.*, (t3.c1)::bit(16)
+                                        Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+ (29 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4298,4315 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4370,4385 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4441,4458 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4511,4526 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4494,4514 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * GetExistingLocalJoinPath will find a suitable path for this purpose in
! 	 * the path list of the joinrel, if one exists.  We must be careful to
! 	 * call it before adding any ForeignPath, since the ForeignPath might
! 	 * dominate the only suitable local path available.  We also do it before
! 	 * calling foreign_join_ok(), since that function updates fpinfo and marks
! 	 * it as pushable if the join is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4494,4514 ----
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * CreateLocalJoinPath will build an alternative local join path for this
! 	 * purpose.  We must be careful to call it before calling foreign_join_ok,
! 	 * since that function updates fpinfo and marks it as pushable if the join
! 	 * is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		/* Create an unparameterized local join path */
! 		epq_path = CreateLocalJoinPath(root, joinrel, outerrel, innerrel,
! 									   jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks could not be created");
  			return;
  		}
  	}
***************
*** 4517,4523 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we copied one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
--- 4517,4523 ----
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we created one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 493,498 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 493,505 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1::bit(16) = t3.c1::bit(16))) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1005 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 995,1003 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used to create an alternative
!      local join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 689,801 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
   *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(IS_JOIN_REL(joinrel));
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
  
! 		/* Skip parameterized paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
  				{
! 					HashPath   *hash_path = makeNode(HashPath);
! 
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
  				}
! 				break;
! 
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
! 				}
! 				break;
  
! 			case T_MergeJoin:
! 				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 692,874 ----
  	return oid;
  }
  
+ 
  /*
!  * Build an alternative local join path to reconstruct a join tuple for the
!  * foreign-join in EvalPlanQual
   *
!  * 'joinrel' is the join relation
!  * 'outerrel' is the outer join relation
!  * 'innerrel' is the inner join relation
!  * 'jointype' is the type of join to do
!  * 'extra' contains additional input values
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
! 	Path	   *outer_path = outerrel->cheapest_total_path;
! 	Path	   *inner_path = innerrel->cheapest_total_path;
  
! 	/*
! 	 * We don't try to build the alternative local join path if the cheapest
! 	 * total outer and inner paths are parameterized; that seems unlikely in
! 	 * normal case.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return NULL;
  
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Get an initial estimate */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path, extra);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL,
! 													   NULL);
! 			}
! 			break;
! 		case JOIN_RIGHT:
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Generate a hashjoin or mergejoin path, if possible */
! 				if (extra->hashclause_list)
  				{
! 					/* Get an initial estimate */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclause_list,
! 										  outer_path, inner_path, extra);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra,
! 														   outer_path,
! 														   inner_path,
! 														   extra->restrictlist,
! 														   NULL,
! 														   extra->hashclause_list);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If special case: for "x FULL JOIN y ON true" or "x FULL
! 					 * JOIN y ON false", there will be no join clauses at all;
! 					 * create a clauseless mergejoin path.  Else create a
! 					 * mergejoin path by explicitly sorting both the outer and
! 					 * inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   NIL, outer_path, inner_path,
! 											   NIL, NIL, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																NULL,
! 																NIL,
! 																NIL,
! 																NIL);
! 					}
! 					else
! 					{
! 						List	   *mergeclauses;
! 						List	   *outerkeys;
! 						List	   *innerkeys;
  
! 						/* Build sort pathkeys for the outer side */
! 						outerkeys = select_outer_pathkeys_for_merge(root,
! 																	extra->mergeclause_list,
! 																	joinrel);
! 						/* Sort the mergeclauses into the corresponding ordering */
! 						mergeclauses = find_mergeclauses_for_pathkeys(root,
! 																	  outerkeys,
! 																	  true,
! 																	  extra->mergeclause_list);
! 						/* Should have used them all... */
! 						Assert(list_length(mergeclauses) == list_length(extra->mergeclause_list));
! 						/* Build sort pathkeys for the inner side */
! 						innerkeys = make_inner_pathkeys_for_merge(root,
! 																  mergeclauses,
! 																  outerkeys);
! 						/*
! 						 * It's possible that the cheapest total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
  
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																NULL,
! 																mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 68,73 **** static List *select_mergejoin_clauses(PlannerInfo *root,
--- 68,77 ----
  						 List *restrictlist,
  						 JoinType jointype,
  						 bool *mergejoin_allowed);
+ static List *select_hashjoin_clauses(RelOptInfo *outerrel,
+ 						RelOptInfo *innerrel,
+ 						List *restrictlist,
+ 						JoinType jointype);
  static void generate_mergejoin_paths(PlannerInfo *root,
  						 RelOptInfo *joinrel,
  						 RelOptInfo *innerrel,
***************
*** 113,123 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
--- 117,128 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
+ 	extra.mergejoin_allowed = true;
+ 	extra.hashclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
***************
*** 177,183 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
--- 182,203 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
! 
! 	/*
! 	 * Find hashjoin clauses.  As above, we can skip this if we are not
! 	 * interested in doing a hashjoin, but disregard enable_hashjoin for full
! 	 * joins, because there may be no other alternative.
! 	 *
! 	 * Note: we need to build only one hashclause list for any given pair of
! 	 * outer and inner relations; all of the hashable clauses will be used as
! 	 * keys.
! 	 */
! 	if (enable_hashjoin || jointype == JOIN_FULL)
! 		extra.hashclause_list = select_hashjoin_clauses(outerrel,
! 														innerrel,
! 														restrictlist,
! 														jointype);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
***************
*** 240,246 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 260,266 ----
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 251,257 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 271,277 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 268,274 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 288,294 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 1589,1629 **** hash_inner_and_outer(PlannerInfo *root,
  					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
! 	bool		isouterjoin = IS_OUTER_JOIN(jointype);
! 	List	   *hashclauses;
! 	ListCell   *l;
! 
! 	/*
! 	 * We need to build only one hashclauses list for any given pair of outer
! 	 * and inner relations; all of the hashable clauses will be used as keys.
! 	 *
! 	 * Scan the join's restrictinfo list to find hashjoinable clauses that are
! 	 * usable with this pair of sub-relations.
! 	 */
! 	hashclauses = NIL;
! 	foreach(l, extra->restrictlist)
! 	{
! 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
! 
! 		/*
! 		 * If processing an outer join, only use its own join clauses for
! 		 * hashing.  For inner joins we need not be so picky.
! 		 */
! 		if (isouterjoin && restrictinfo->is_pushed_down)
! 			continue;
! 
! 		if (!restrictinfo->can_join ||
! 			restrictinfo->hashjoinoperator == InvalidOid)
! 			continue;			/* not hashjoinable */
! 
! 		/*
! 		 * Check if clause has the form "outer op inner" or "inner op outer".
! 		 */
! 		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
! 			continue;			/* no good for these input relations */
! 
! 		hashclauses = lappend(hashclauses, restrictinfo);
! 	}
  
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
--- 1609,1615 ----
  					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
! 	List	   *hashclauses = extra->hashclause_list;
  
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
***************
*** 1910,1912 **** select_mergejoin_clauses(PlannerInfo *root,
--- 1896,1949 ----
  
  	return result_list;
  }
+ 
+ /*
+  * select_hashjoin_clauses
+  *	  Select hashjoin clauses that are usable for a particular join.
+  *	  Returns a list of RestrictInfo nodes for those clauses.
+  *
+  * As in select_mergejoin_clauses(), we mark each selected RestrictInfo to
+  * show which side is currently being considered as outer.  These are
+  * transient markings that are only good for the duration of the current
+  * add_paths_to_joinrel() call!
+  *
+  * We examine each restrictinfo clause known for the join to see
+  * if it is hashjoinable and involves vars from the two sub-relations
+  * currently of interest.
+  */
+ static List *
+ select_hashjoin_clauses(RelOptInfo *outerrel,
+ 						RelOptInfo *innerrel,
+ 						List *restrictlist,
+ 						JoinType jointype)
+ {
+ 	List	   *hashclauses = NIL;
+ 	bool		isouterjoin = IS_OUTER_JOIN(jointype);
+ 	ListCell   *l;
+ 
+ 	foreach(l, restrictlist)
+ 	{
+ 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
+ 
+ 		/*
+ 		 * If processing an outer join, only use its own join clauses for
+ 		 * hashing.  For inner joins we need not be so picky.
+ 		 */
+ 		if (isouterjoin && restrictinfo->is_pushed_down)
+ 			continue;
+ 
+ 		if (!restrictinfo->can_join ||
+ 			restrictinfo->hashjoinoperator == InvalidOid)
+ 			continue;			/* not hashjoinable */
+ 
+ 		/*
+ 		 * Check if clause has the form "outer op inner" or "inner op outer".
+ 		 */
+ 		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
+ 			continue;			/* no good for these input relations */
+ 
+ 		hashclauses = lappend(hashclauses, restrictinfo);
+ 	}
+ 
+ 	return hashclauses;
+ }
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 237,242 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif							/* FDWAPI_H */
--- 237,247 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra);
  
  #endif							/* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 2171,2176 **** typedef struct SemiAntiJoinFactors
--- 2171,2179 ----
   *		clauses that apply to this join
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
+  * hashclause_list is a list of RestrictInfo nodes for available
+  *		hashjoin clauses in this join
   * inner_unique is true if each outer tuple provably matches no more
   *		than one inner tuple
   * sjinfo is extra info about special joins for selectivity estimation
***************
*** 2181,2186 **** typedef struct JoinPathExtraData
--- 2184,2191 ----
  {
  	List	   *restrictlist;
  	List	   *mergeclause_list;
+ 	bool		mergejoin_allowed;
+ 	List	   *hashclause_list;
  	bool		inner_unique;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;

#81

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 8 years ago

In reply to: Etsuro Fujita (#80)

1 attachment(s)

Re: postgres_fdw bug in 9.6

On 2017/08/21 20:37, Etsuro Fujita wrote:

Attached is an updated version of the patch.

I corrected some comments. New version attached.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-11.patchtext/plain; charset=UTF-8; name=epqpath-for-foreignjoin-11.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1701,1722 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1701,1716 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1745,1766 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1739,1754 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1790,1811 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1778,1793 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1834,1855 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1816,1831 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1866,1871 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1842,1943 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1::bit(16) = t3.c1::bit(16))) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                        QUERY PLAN                                                                                                                                        
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Sort
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          Sort Key: t2.c1, t3.c1
+          ->  Nested Loop
+                Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+                ->  Foreign Scan
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                      Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1::bit(16) = r3.c1::bit(16)))))
+                      ->  Merge Full Join
+                            Output: t2.c1, t2.*, t3.c1, t3.*
+                            Merge Cond: (((t2.c1)::bit(16)) = ((t3.c1)::bit(16)))
+                            ->  Sort
+                                  Output: t2.c1, t2.*, ((t2.c1)::bit(16))
+                                  Sort Key: ((t2.c1)::bit(16))
+                                  ->  Foreign Scan on public.ft4 t2
+                                        Output: t2.c1, t2.*, (t2.c1)::bit(16)
+                                        Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                            ->  Sort
+                                  Output: t3.c1, t3.*, ((t3.c1)::bit(16))
+                                  Sort Key: ((t3.c1)::bit(16))
+                                  ->  Foreign Scan on public.ft5 t3
+                                        Output: t3.c1, t3.*, (t3.c1)::bit(16)
+                                        Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+ (29 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4298,4315 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4370,4385 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4441,4458 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4511,4526 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4494,4514 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * GetExistingLocalJoinPath will find a suitable path for this purpose in
! 	 * the path list of the joinrel, if one exists.  We must be careful to
! 	 * call it before adding any ForeignPath, since the ForeignPath might
! 	 * dominate the only suitable local path available.  We also do it before
! 	 * calling foreign_join_ok(), since that function updates fpinfo and marks
! 	 * it as pushable if the join is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4494,4514 ----
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * CreateLocalJoinPath will build an alternative local join path for this
! 	 * purpose.  We must be careful to call it before calling foreign_join_ok,
! 	 * since that function updates fpinfo and marks it as pushable if the join
! 	 * is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		/* Create a local join path */
! 		epq_path = CreateLocalJoinPath(root, joinrel, outerrel, innerrel,
! 									   jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks could not be created");
  			return;
  		}
  	}
***************
*** 4517,4523 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we copied one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
--- 4517,4523 ----
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we created one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 493,498 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 493,505 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1::bit(16) = t3.c1::bit(16))) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 995,1005 **** RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 995,1003 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</> may be used to build
!      a suitable local join path, which can be used to create an alternative
!      local join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 689,801 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
   *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(IS_JOIN_REL(joinrel));
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
  
! 		/* Skip parameterized paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
  				{
! 					HashPath   *hash_path = makeNode(HashPath);
! 
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
  				}
! 				break;
! 
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
! 				}
! 				break;
  
! 			case T_MergeJoin:
! 				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 692,873 ----
  	return oid;
  }
  
+ 
  /*
!  * Try to build an alternative local join path to reconstruct a join tuple for
!  * the foreign join in EvalPlanQual
   *
!  * 'joinrel' is the join relation
!  * 'outerrel' is the outer join relation
!  * 'innerrel' is the inner join relation
!  * 'jointype' is the type of join to do
!  * 'extra' contains additional input values
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
! 	Path	   *outer_path = outerrel->cheapest_total_path;
! 	Path	   *inner_path = innerrel->cheapest_total_path;
  
! 	/*
! 	 * We don't try to build the alternative local join path if the cheapest
! 	 * total outer and inner paths are parameterized; that seems unlikely in
! 	 * normal case.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return NULL;
  
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Get an initial estimate */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path, extra);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL,
! 													   NULL);
! 			}
! 			break;
! 		case JOIN_RIGHT:
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Generate a hashjoin or mergejoin path, if possible */
! 				if (extra->hashclause_list)
  				{
! 					/* Get an initial estimate */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclause_list,
! 										  outer_path, inner_path, extra);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra,
! 														   outer_path,
! 														   inner_path,
! 														   extra->restrictlist,
! 														   NULL,
! 														   extra->hashclause_list);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If right or full join with no mergejoinable clauses,
! 					 * create a clauseless mergejoin path.  Otherwise create a
! 					 * mergejoin path by explicitly sorting both the outer and
! 					 * inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   NIL, outer_path, inner_path,
! 											   NIL, NIL, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																NULL,
! 																NIL,
! 																NIL,
! 																NIL);
! 					}
! 					else
! 					{
! 						List	   *mergeclauses;
! 						List	   *outerkeys;
! 						List	   *innerkeys;
  
! 						/* Build sort pathkeys for the outer side */
! 						outerkeys = select_outer_pathkeys_for_merge(root,
! 																	extra->mergeclause_list,
! 																	joinrel);
! 						/* Sort the mergeclauses into the corresponding ordering */
! 						mergeclauses = find_mergeclauses_for_pathkeys(root,
! 																	  outerkeys,
! 																	  true,
! 																	  extra->mergeclause_list);
! 						/* Should have used them all... */
! 						Assert(list_length(mergeclauses) == list_length(extra->mergeclause_list));
! 						/* Build sort pathkeys for the inner side */
! 						innerkeys = make_inner_pathkeys_for_merge(root,
! 																  mergeclauses,
! 																  outerkeys);
! 						/*
! 						 * It's possible that the cheapest total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
  
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																NULL,
! 																mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 68,73 **** static List *select_mergejoin_clauses(PlannerInfo *root,
--- 68,77 ----
  						 List *restrictlist,
  						 JoinType jointype,
  						 bool *mergejoin_allowed);
+ static List *select_hashjoin_clauses(RelOptInfo *outerrel,
+ 						RelOptInfo *innerrel,
+ 						List *restrictlist,
+ 						JoinType jointype);
  static void generate_mergejoin_paths(PlannerInfo *root,
  						 RelOptInfo *joinrel,
  						 RelOptInfo *innerrel,
***************
*** 113,123 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
--- 117,128 ----
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
  	ListCell   *lc;
  
  	extra.restrictlist = restrictlist;
+ 	extra.mergejoin_allowed = true;
  	extra.mergeclause_list = NIL;
+ 	extra.hashclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
  
***************
*** 177,183 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
--- 182,203 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
! 
! 	/*
! 	 * Find hashjoin clauses.  As above, skip this if we are not interested in
! 	 * doing a hashjoin, but disregard enable_hashjoin for full joins, because
! 	 * there may be no other alternative.
! 	 *
! 	 * Note: we need to build only one hashclause list for any given pair of
! 	 * outer and inner relations; all of the hashable clauses will be used as
! 	 * keys.
! 	 */
! 	if (enable_hashjoin || jointype == JOIN_FULL)
! 		extra.hashclause_list = select_hashjoin_clauses(outerrel,
! 														innerrel,
! 														restrictlist,
! 														jointype);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
***************
*** 240,246 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 260,266 ----
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 251,257 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 271,277 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 268,274 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 288,294 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 1589,1629 **** hash_inner_and_outer(PlannerInfo *root,
  					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
! 	bool		isouterjoin = IS_OUTER_JOIN(jointype);
! 	List	   *hashclauses;
! 	ListCell   *l;
! 
! 	/*
! 	 * We need to build only one hashclauses list for any given pair of outer
! 	 * and inner relations; all of the hashable clauses will be used as keys.
! 	 *
! 	 * Scan the join's restrictinfo list to find hashjoinable clauses that are
! 	 * usable with this pair of sub-relations.
! 	 */
! 	hashclauses = NIL;
! 	foreach(l, extra->restrictlist)
! 	{
! 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
! 
! 		/*
! 		 * If processing an outer join, only use its own join clauses for
! 		 * hashing.  For inner joins we need not be so picky.
! 		 */
! 		if (isouterjoin && restrictinfo->is_pushed_down)
! 			continue;
! 
! 		if (!restrictinfo->can_join ||
! 			restrictinfo->hashjoinoperator == InvalidOid)
! 			continue;			/* not hashjoinable */
! 
! 		/*
! 		 * Check if clause has the form "outer op inner" or "inner op outer".
! 		 */
! 		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
! 			continue;			/* no good for these input relations */
! 
! 		hashclauses = lappend(hashclauses, restrictinfo);
! 	}
  
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
--- 1609,1615 ----
  					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
! 	List	   *hashclauses = extra->hashclause_list;
  
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
***************
*** 1910,1912 **** select_mergejoin_clauses(PlannerInfo *root,
--- 1896,1948 ----
  
  	return result_list;
  }
+ 
+ /*
+  * select_hashjoin_clauses
+  *	  Select hashjoin clauses that are usable for a particular join.
+  *	  Returns a list of RestrictInfo nodes for those clauses.
+  *
+  * We mark each selected RestrictInfo to show which side is currently
+  * being considered as outer.  These are transient markings that are only
+  * good for the duration of the current add_paths_to_joinrel() call!
+  *
+  * We examine each restrictinfo clause known for the join to see
+  * if it is hashjoinable and involves vars from the two sub-relations
+  * currently of interest.
+  */
+ static List *
+ select_hashjoin_clauses(RelOptInfo *outerrel,
+ 						RelOptInfo *innerrel,
+ 						List *restrictlist,
+ 						JoinType jointype)
+ {
+ 	List	   *hashclauses = NIL;
+ 	bool		isouterjoin = IS_OUTER_JOIN(jointype);
+ 	ListCell   *l;
+ 
+ 	foreach(l, restrictlist)
+ 	{
+ 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
+ 
+ 		/*
+ 		 * If processing an outer join, only use its own join clauses for
+ 		 * hashing.  For inner joins we need not be so picky.
+ 		 */
+ 		if (isouterjoin && restrictinfo->is_pushed_down)
+ 			continue;
+ 
+ 		if (!restrictinfo->can_join ||
+ 			restrictinfo->hashjoinoperator == InvalidOid)
+ 			continue;			/* not hashjoinable */
+ 
+ 		/*
+ 		 * Check if clause has the form "outer op inner" or "inner op outer".
+ 		 */
+ 		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
+ 			continue;			/* no good for these input relations */
+ 
+ 		hashclauses = lappend(hashclauses, restrictinfo);
+ 	}
+ 
+ 	return hashclauses;
+ }
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 237,242 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif							/* FDWAPI_H */
--- 237,247 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra);
  
  #endif							/* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 2169,2176 **** typedef struct SemiAntiJoinFactors
--- 2169,2179 ----
   *
   * restrictlist contains all of the RestrictInfo nodes for restriction
   *		clauses that apply to this join
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
   * mergeclause_list is a list of RestrictInfo nodes for available
   *		mergejoin clauses in this join
+  * hashclause_list is a list of RestrictInfo nodes for available
+  *		hashjoin clauses in this join
   * inner_unique is true if each outer tuple provably matches no more
   *		than one inner tuple
   * sjinfo is extra info about special joins for selectivity estimation
***************
*** 2180,2186 **** typedef struct SemiAntiJoinFactors
--- 2183,2191 ----
  typedef struct JoinPathExtraData
  {
  	List	   *restrictlist;
+ 	bool		mergejoin_allowed;
  	List	   *mergeclause_list;
+ 	List	   *hashclause_list;
  	bool		inner_unique;
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;

#82

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Etsuro Fujita (#81)

Re: postgres_fdw bug in 9.6

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

[ epqpath-for-foreignjoin-11.patch ]

I started looking at this. I've not yet wrapped my head around what
CreateLocalJoinPath() is doing, but I noted that Robert's concerns
about ABI breakage in the back branches would not be very difficult
to satisfy. What we'd need to do is

(1) In the back-branch patch, add the new fields of JoinPathExtraData
at the end, rather than in their more natural locations. This should
avoid any ABI break for uses of that struct, since there's no reason
why an FDW would be creating its own variables of that type rather
than just using what the core code passes it.

(2) In the back branches, just leave GetExistingLocalJoinPath in place
rather than deleting it. Existing FDWs would still be subject to the
bug until they were updated, but given how hard it is to trigger, that
doesn't seem like a huge problem.

A variant of (2) is to install a hack fix in GetExistingLocalJoinPath
rather than leaving it unchanged. I'm not very excited about that though;
it doesn't seem unlikely that we might introduce new bugs that way, and
it would certainly be a distraction from getting the real fix finished.

We'd definitely need to do things that way in 9.6. I'm not quite sure
whether it's too late to adopt the clean solution in v10.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#83

Ryan Murphy

ryanfmurphy@gmail.com

over 8 years ago

In reply to: Tom Lane (#82)

Re: postgres_fdw bug in 9.6

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

This feature is obviously a pretty deep cut, and I don't understand the JOIN system enough to comment on whether the code is correct. I did not see any issues when glancing through the patch.

I ran "make check", "make installcheck" and "make installcheck-world" and had ALMOST no problems: I marked it Tested and Passed because the only issue I had was a recurring issue I've had with "make installcheck-world" which I think is unrelated:

*** path/to/postgres/src/interfaces/ecpg/test/expected/connect-test5.stderr 2017-02-14 09:22:25.000000000 -0600
--- path/to/postgres/src/interfaces/ecpg/test/results/connect-test5.stderr  2017-09-04 23:31:50.000000000 -0500
***************
*** 36,42 ****
  [NO_PID]: sqlca: code: 0, state: 00000
  [NO_PID]: ECPGconnect: opening database <DEFAULT> on <DEFAULT> port <DEFAULT>  for user regress_ecpg_user2
  [NO_PID]: sqlca: code: 0, state: 00000
! [NO_PID]: ECPGconnect: could not open database: FATAL:  database "regress_ecpg_user2" does not exist
...

But this still Needs Review by someone who knows the JOIN system better.

Best,
Ryan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#84

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 8 years ago

In reply to: Ryan Murphy (#83)

Re: postgres_fdw bug in 9.6

On 2017/09/05 13:43, Ryan Murphy wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

This feature is obviously a pretty deep cut, and I don't understand the JOIN system enough to comment on whether the code is correct. I did not see any issues when glancing through the patch.

Thank you for the review!

I ran "make check", "make installcheck" and "make installcheck-world" and had ALMOST no problems: I marked it Tested and Passed because the only issue I had was a recurring issue I've had with "make installcheck-world" which I think is unrelated:
*** path/to/postgres/src/interfaces/ecpg/test/expected/connect-test5.stderr 2017-02-14 09:22:25.000000000 -0600
--- path/to/postgres/src/interfaces/ecpg/test/results/connect-test5.stderr  2017-09-04 23:31:50.000000000 -0500
***************
*** 36,42 ****
[NO_PID]: sqlca: code: 0, state: 00000
[NO_PID]: ECPGconnect: opening database <DEFAULT> on <DEFAULT> port <DEFAULT>  for user regress_ecpg_user2
[NO_PID]: sqlca: code: 0, state: 00000
! [NO_PID]: ECPGconnect: could not open database: FATAL:  database "regress_ecpg_user2" does not exist
...

I tested that with HEAD, but couldn't reproduce this problem. Didn't
you test that with HEAD?

But this still Needs Review by someone who knows the JOIN system better.

I think Tom is reviewing this patch [1]/messages/by-id/22168.1504041271@sss.pgh.pa.us.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/22168.1504041271@sss.pgh.pa.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#85

Ryan Murphy

ryanfmurphy@gmail.com

over 8 years ago

In reply to: Etsuro Fujita (#84)

Re: postgres_fdw bug in 9.6

I tested that with HEAD, but couldn't reproduce this problem. Didn't you

test that with HEAD?

Yeah, I know it's not an issue other people are having - I think it's
something specific about how my environment / postgres build is set up. In
any case I knew that it wasn't caused by your patch.

Thanks,
Ryan

#86

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

over 8 years ago

In reply to: Etsuro Fujita (#84)

Re: postgres_fdw bug in 9.6

On Tue, Sep 5, 2017 at 1:10 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I think Tom is reviewing this patch [1].

I am marking this as ready for committer as I don't have any new
comments and possibly other reviewers have also done reviewing it.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#87

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#82)

Re: postgres_fdw bug in 9.6

On Tue, Aug 29, 2017 at 5:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

[ epqpath-for-foreignjoin-11.patch ]

I started looking at this. I've not yet wrapped my head around what
CreateLocalJoinPath() is doing, but I noted that Robert's concerns
about ABI breakage in the back branches would not be very difficult
to satisfy. What we'd need to do is

(1) In the back-branch patch, add the new fields of JoinPathExtraData
at the end, rather than in their more natural locations. This should
avoid any ABI break for uses of that struct, since there's no reason
why an FDW would be creating its own variables of that type rather
than just using what the core code passes it.

(2) In the back branches, just leave GetExistingLocalJoinPath in place
rather than deleting it. Existing FDWs would still be subject to the
bug until they were updated, but given how hard it is to trigger, that
doesn't seem like a huge problem.

A variant of (2) is to install a hack fix in GetExistingLocalJoinPath
rather than leaving it unchanged. I'm not very excited about that though;
it doesn't seem unlikely that we might introduce new bugs that way, and
it would certainly be a distraction from getting the real fix finished.

We'd definitely need to do things that way in 9.6. I'm not quite sure
whether it's too late to adopt the clean solution in v10.

It probably is now. Are you still planning to do something about this patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#88

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#87)

Re: postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Aug 29, 2017 at 5:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

We'd definitely need to do things that way in 9.6. I'm not quite sure
whether it's too late to adopt the clean solution in v10.

It probably is now. Are you still planning to do something about this patch?

It's still on my list, but I didn't get to it during the CF.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#89

Tom Lane

tgl@sss.pgh.pa.us

about 8 years ago

In reply to: Tom Lane (#88)

Re: [HACKERS] postgres_fdw bug in 9.6

I wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Aug 29, 2017 at 5:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

We'd definitely need to do things that way in 9.6. I'm not quite sure
whether it's too late to adopt the clean solution in v10.

It probably is now. Are you still planning to do something about this patch?

It's still on my list, but I didn't get to it during the CF.

I got around to looking at this thread again. Perhaps as a result of
not having looked at it for awhile, the thing that had been nagging
at me from the very beginning (where I said "I kind of wonder why this
infrastructure exists at all") finally crystallized: I do not think
this infrastructure *should* exist, at least not for postgres_fdw.

The reason EPQ exists at all is basically to avoid applying the overhead
of SELECT FOR UPDATE in cases where we don't have to. If we chased up
to the end of every update chain and locked the frontmost tuple every
time, then EPQ rechecks wouldn't be needed because we'd already have
locked and qual-checked the right tuple. This is, I think, a reasonable
optimization on local tables; but it's far less clear that it would be
sane to do it like that on remote tables, because of the extra network
round trips required. Moreover, we already foreclosed the decision
because postgres_fdw always issues SELECT FOR UPDATE, not just SELECT,
to the remote side in any case where EPQ locking might happen.

Therefore, the remote side is already holding a tuple lock on the right
tuple, and it's already done anything EPQ-like that might need to have
happened, and it's guaranteed to have given us back a fully up-to-date
row value. This is true whether the remote query is for a single table
or for a join.

In short, we should get rid of all of this expensive and broken logic and
just make EPQ recheck on a foreign join be a no-op, just as it is for a
foreign base table. (Indeed, considering the fact that what we think is
a base table might be a join view on the far side, it's hard to come to
any conclusion but that making such a distinction is fundamentally wrong.)
If there are any cases where the join pushdown logic is failing to emit
"FOR UPDATE", we need to fix that, but a quick check says that it does
emit that at least in simple cases.

Comments?

regards, tom lane

#90

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Tom Lane (#89)

Re: [HACKERS] postgres_fdw bug in 9.6

On Tue, Nov 28, 2017 at 11:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

In short, we should get rid of all of this expensive and broken logic and
just make EPQ recheck on a foreign join be a no-op, just as it is for a
foreign base table.

I'm not sure that it is. What of
5fc4c26db5120bd90348b6ee3101fcddfdf54800? That was before I started
putting "Discussion" links into commit messages as a matter of
routine, but I'm pretty sure that fixed what seemed to Etsuro Fujita,
Kyotaro Horiguchi, and myself to be pretty clear misbehavior. See
also 385f337c9f39b21dca96ca4770552a10a6d5af24. We've put an awful lot
of effort into this over the last few years; I'm a bit hesitant to
believe none of that did anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#91

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: Robert Haas (#90)

Re: [HACKERS] postgres_fdw bug in 9.6

On Tue, Nov 28, 2017 at 11:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 28, 2017 at 11:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

In short, we should get rid of all of this expensive and broken logic and
just make EPQ recheck on a foreign join be a no-op, just as it is for a
foreign base table.

I'm not sure that it is. What of
5fc4c26db5120bd90348b6ee3101fcddfdf54800? That was before I started
putting "Discussion" links into commit messages as a matter of
routine, but I'm pretty sure that fixed what seemed to Etsuro Fujita,
Kyotaro Horiguchi, and myself to be pretty clear misbehavior. See
also 385f337c9f39b21dca96ca4770552a10a6d5af24. We've put an awful lot
of effort into this over the last few years; I'm a bit hesitant to
believe none of that did anything.

IIUC, the problem here is not about stale foreign tuples (or whether
they can change) - as you described above they can not, since we have
requested FOR UPDATE. But when a foreign join is further joined with a
local table, tuples in which change after they are joined, we need to
make sure that the total join tuple (local table joined with foreign
join) still qualifies. We do this by locally joining those rows again.
[1]: /messages/by-id/558A18B3.9050201@lab.ntt.co.jp

[1]: /messages/by-id/558A18B3.9050201@lab.ntt.co.jp

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#92

Tom Lane

tgl@sss.pgh.pa.us

about 8 years ago

In reply to: Ashutosh Bapat (#91)

Re: [HACKERS] postgres_fdw bug in 9.6

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:

On Tue, Nov 28, 2017 at 11:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 28, 2017 at 11:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

In short, we should get rid of all of this expensive and broken logic and
just make EPQ recheck on a foreign join be a no-op, just as it is for a
foreign base table.

IIUC, the problem here is not about stale foreign tuples (or whether
they can change) - as you described above they can not, since we have
requested FOR UPDATE. But when a foreign join is further joined with a
local table, tuples in which change after they are joined, we need to
make sure that the total join tuple (local table joined with foreign
join) still qualifies. We do this by locally joining those rows again.
[1] is where the discussion started
[1] /messages/by-id/558A18B3.9050201@lab.ntt.co.jp

AFAICT, [1] just demonstrates that nobody had thought about the scenario
up to that point, not that the subsequently chosen solution was a good
one. In that example,

LockRows (cost=100.00..101.18 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
-> Nested Loop (cost=100.00..101.14 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
Join Filter: (foo.a = tab.a)
-> Seq Scan on public.tab (cost=0.00..1.01 rows=1 width=14)
Output: tab.a, tab.b, tab.ctid
-> Foreign Scan (cost=100.00..100.08 rows=4 width=64)
Output: foo.*, foo.a, bar.*, bar.a
Relations: (public.foo) INNER JOIN (public.bar)
Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))

the output of the foreign join cannot change during EPQ, since the remote
server already locked the rows before returning them. The only thing that
can change is the output of the local scan on public.tab. Yes, we then
need to re-verify that foo.a = tab.a ... but in this example, that's the
responsibility of the NestLoop plan node, not the foreign join.

If the topmost join were done differently, passing tab.a down as a
parameter into the foreign join, then we would have a situation where
there was something that needed to be rechecked within the foreign join.
But that would require having made a parameterized path for the foreign
join, which postgres_fdw doesn't do and lacks a lot of other
infrastructure for besides this, I believe.

Moreover, even if we had parameterized foreign join paths, I'm not sure
that it makes sense to build the amount of plan infrastructure involved
in the current implementation just to recheck the pushed-down qual(s).
That could be done a whole lot more cheaply, not only in terms of the
actual EPQ execution (which maybe you could argue isn't performance
critical), but also in terms of the planner effort and executor startup
effort involved to create all those subsidiary plan nodes.

I regret not having paid more attention to the foreign join stuff, because
maybe that could have saved you guys a lot of work ... but I really feel
like there was a lot of misguided work here.

regards, tom lane

#93

Michael Paquier

michael.paquier@gmail.com

about 8 years ago

In reply to: Tom Lane (#92)

Re: [HACKERS] postgres_fdw bug in 9.6

On Thu, Nov 30, 2017 at 7:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

[snip]

Moving to next CF per the hotness of the topic.
--
Michael

#94

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 8 years ago

In reply to: Tom Lane (#92)

Re: [HACKERS] postgres_fdw bug in 9.6

(2017/11/30 7:32), Tom Lane wrote:

AFAICT, [1] just demonstrates that nobody had thought about the scenario
up to that point, not that the subsequently chosen solution was a good
one. In that example,

LockRows (cost=100.00..101.18 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
-> Nested Loop (cost=100.00..101.14 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
Join Filter: (foo.a = tab.a)
-> Seq Scan on public.tab (cost=0.00..1.01 rows=1 width=14)
Output: tab.a, tab.b, tab.ctid
-> Foreign Scan (cost=100.00..100.08 rows=4 width=64)
Output: foo.*, foo.a, bar.*, bar.a
Relations: (public.foo) INNER JOIN (public.bar)
Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))

the output of the foreign join cannot change during EPQ, since the remote
server already locked the rows before returning them. The only thing that
can change is the output of the local scan on public.tab. Yes, we then
need to re-verify that foo.a = tab.a ... but in this example, that's the
responsibility of the NestLoop plan node, not the foreign join.

That's right, but is that true when the FDW uses late row locking?

Best regards,
Etsuro Fujita

#95

Tom Lane

tgl@sss.pgh.pa.us

about 8 years ago

In reply to: Etsuro Fujita (#94)

Re: [HACKERS] postgres_fdw bug in 9.6

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

(2017/11/30 7:32), Tom Lane wrote:

the output of the foreign join cannot change during EPQ, since the remote
server already locked the rows before returning them. The only thing that
can change is the output of the local scan on public.tab. Yes, we then
need to re-verify that foo.a = tab.a ... but in this example, that's the
responsibility of the NestLoop plan node, not the foreign join.

That's right, but is that true when the FDW uses late row locking?

An FDW using late row locking would need to work harder, yes. But
that's true at the scan level as well as the join level. We have
already committed to using early locking in postgres_fdw, for the
network-round-trip-cost reasons I mentioned before, and I can't see
why we'd change that decision at the join level.

Right now we've got the worst of both worlds, in that we're actually
doing early row locking on the remote, but we're paying (some of) the
code complexity costs required for late locking.

regards, tom lane

#96

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Tom Lane (#92)

Re: [HACKERS] postgres_fdw bug in 9.6

On Wed, Nov 29, 2017 at 5:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

AFAICT, [1] just demonstrates that nobody had thought about the scenario
up to that point, not that the subsequently chosen solution was a good
one.

But have a look at this:

/messages/by-id/561E12D4.7040403@lab.ntt.co.jp

That shows a case where, before
5fc4c26db5120bd90348b6ee3101fcddfdf54800, a query that required the
foreign table to do an EPQ recheck produced an unambiguously wrong
answer; the query stipulates that inttab.a = ft1.a but the row
returned lacks that property. I think this clearly shows that EPQ
rechecks are needed at least for FDW paths that are parameterized.

However, I guess that doesn't actually refute your point, which IIUC
is that maybe we don't need these checks for FDW join paths, because
we don't build parameterized paths for those yet. Now I think it would
still be a good idea to have a way of handling that case, because
somebody's likely going want to fix that /* XXX Consider parameterized
paths for the join relation */ eventually, but I guess for the
back-branches just setting epq_path = NULL instead of calling
GetExistingLocalJoinPath() might be the way to go... assuming that
your contention that this won't break anything is indeed correct.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#97

Tom Lane

tgl@sss.pgh.pa.us

about 8 years ago

In reply to: Robert Haas (#96)

Re: [HACKERS] postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

But have a look at this:
/messages/by-id/561E12D4.7040403@lab.ntt.co.jp
That shows a case where, before
5fc4c26db5120bd90348b6ee3101fcddfdf54800, a query that required the
foreign table to do an EPQ recheck produced an unambiguously wrong
answer; the query stipulates that inttab.a = ft1.a but the row
returned lacks that property. I think this clearly shows that EPQ
rechecks are needed at least for FDW paths that are parameterized.

Certainly; that's what I said before.

However, I guess that doesn't actually refute your point, which IIUC
is that maybe we don't need these checks for FDW join paths, because
we don't build parameterized paths for those yet. Now I think it would
still be a good idea to have a way of handling that case, because
somebody's likely going want to fix that /* XXX Consider parameterized
paths for the join relation */ eventually, but I guess for the
back-branches just setting epq_path = NULL instead of calling
GetExistingLocalJoinPath() might be the way to go... assuming that
your contention that this won't break anything is indeed correct.

I thought some more about this. While it seems clear that we don't
actually need to recheck anything within a postgres_fdw foreign join,
there's still a problem: we have to be able to regurgitate the join
row during postgresRecheckForeignScan. Since, in the current system
design, the only available data is scan-level rowmarks (that is,
whole-row values from each base foreign relation), there isn't any
very good way to produce the join row except to insert the rowmark
values into dummy scan nodes and then re-execute the join locally.
So really, that is what the outerPlan infrastructure is doing for us.

We could imagine that, instead of the scan-level rowmarks, the foreign
join node could produce a ROW() expression containing the values it
actually has to output, and then use that as a "join-level rowmark" to
reconstitute the join row without any of the outerPlan infrastructure.
This would have some pretty significant advantages, notably that we
could (in principle) avoid fetching all the columns of remote tables
we no longer needed scan-level rowmarks for.

However, to do that, we'd have to dynamically decide which rowmark
expressions actually need to wind up getting computed in the final join
plan, based on which joins we choose to do remotely. The current scheme
of statically adding rowmarks to the query during base relation setup
doesn't cut it. So that's sounding like a nontrivial rewrite (and one
that would break the related FDW APIs, although the patch at hand does
that too).

As for the question of a future extension to parameterized paths,
I think that it would likely work to recheck the parameterized qual
at the join node, making it basically just like rechecks for scan-level
nodes. The objection to this is that it might not produce the same
answer if the parameterized qual references relations that are on the
insides of outer joins ... but I think that in practice we could just
blow off that case (ie, reject producing parameterized paths involving
such quals). The reason is that I think such cases arise only very
rarely. It could only happen for non-strict qual expressions, because
a strict qual would have resulted in the outer join getting strength
reduced to a regular join.

Not sure where that leaves us for the patch at hand. It feels to me
like it's a temporary band-aid for code that we want to get rid of
in the not-too-long run. As such, it'd be okay if it were smaller,
but it seems bigger and more invasive than I could wish for a band-aid.

regards, tom lane

#98

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 8 years ago

In reply to: Tom Lane (#95)

Re: [HACKERS] postgres_fdw bug in 9.6

(2017/11/30 23:22), Tom Lane wrote:

Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> writes:

(2017/11/30 7:32), Tom Lane wrote:

the output of the foreign join cannot change during EPQ, since the remote
server already locked the rows before returning them. The only thing that
can change is the output of the local scan on public.tab. Yes, we then
need to re-verify that foo.a = tab.a ... but in this example, that's the
responsibility of the NestLoop plan node, not the foreign join.

That's right, but is that true when the FDW uses late row locking?

An FDW using late row locking would need to work harder, yes. But
that's true at the scan level as well as the join level. We have
already committed to using early locking in postgres_fdw, for the
network-round-trip-cost reasons I mentioned before, and I can't see
why we'd change that decision at the join level.

My concern is FDWs that support join pushdown in combination with late
row locking. I don't know whether such FDWs really exist, but if so, an
output of a foreign join computed from re-fetched tuples might change.

Right now we've got the worst of both worlds, in that we're actually
doing early row locking on the remote, but we're paying (some of) the
code complexity costs required for late locking.

Because we have provided the late row locking API, I think we should pay
a certain degree of consideration for such FDWs.

Best regards,
Etsuro Fujita

#99

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Tom Lane (#97)

Re: [HACKERS] postgres_fdw bug in 9.6

On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Not sure where that leaves us for the patch at hand. It feels to me
like it's a temporary band-aid for code that we want to get rid of
in the not-too-long run. As such, it'd be okay if it were smaller,
but it seems bigger and more invasive than I could wish for a band-aid.

Well, the bug report is a year old today, so we should try to do
something. The patch needs a rebase, but reading through it, I'm not
convinced that it's particularly risky. I mean, it's going to break
FDWs that are calling GetExistingLocalJoinPath(), but there are
probably very few third-party FDWs that do that. Any that do are
precisely the ones that need this fix, so I think there's a good
chance they'll forgive of us for requiring them to make a code change.
I think we can contain the risk of anything else getting broken to an
acceptable level because there's not really very much other code
changing. The changes to joinpath.c need to be carefully audited to
make sure that they are not changing the semantics, but that seems
like it shouldn't take more than an hour of code-reading to check.
The new fields in JoinPathExtraData can be moved to the end of the
struct to minimize ABI breakage. I don't see what else there is that
could break apart from the EPQ rechecks themselves, and if that
weren't broken already, this patch wouldn't exist.

Now, if you're still super-concerned about this breaking something, we
could commit it only to master, where it will have 9 months to bake
before it gets released. I think that's overly conservative, but I
think it's still better than waiting for the rewrite you'd like to see
happen. We don't know when or if anyone is going to undertake that,
and if we wait, we may easing release a v11 that's got the same defect
as v9.6 and now v10. And I don't see that we lose much by committing
this now even if that rewrite does happen in time for v11. Ripping
out CreateLocalJoinPath() won't be any harder than ripping out
GetExistingLocalJoinPath().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#100

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 8 years ago

In reply to: Robert Haas (#99)

1 attachment(s)

Re: [HACKERS] postgres_fdw bug in 9.6

(2017/12/09 5:53), Robert Haas wrote:

On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Not sure where that leaves us for the patch at hand. It feels to me
like it's a temporary band-aid for code that we want to get rid of
in the not-too-long run. As such, it'd be okay if it were smaller,
but it seems bigger and more invasive than I could wish for a band-aid.

Well, the bug report is a year old today, so we should try to do
something. The patch needs a rebase, but reading through it, I'm not
convinced that it's particularly risky. I mean, it's going to break
FDWs that are calling GetExistingLocalJoinPath(), but there are
probably very few third-party FDWs that do that. Any that do are
precisely the ones that need this fix, so I think there's a good
chance they'll forgive of us for requiring them to make a code change.
I think we can contain the risk of anything else getting broken to an
acceptable level because there's not really very much other code
changing. The changes to joinpath.c need to be carefully audited to
make sure that they are not changing the semantics, but that seems
like it shouldn't take more than an hour of code-reading to check.
The new fields in JoinPathExtraData can be moved to the end of the
struct to minimize ABI breakage. I don't see what else there is that
could break apart from the EPQ rechecks themselves, and if that
weren't broken already, this patch wouldn't exist.

Now, if you're still super-concerned about this breaking something, we
could commit it only to master, where it will have 9 months to bake
before it gets released. I think that's overly conservative, but I
think it's still better than waiting for the rewrite you'd like to see
happen. We don't know when or if anyone is going to undertake that,
and if we wait, we may easing release a v11 that's got the same defect
as v9.6 and now v10. And I don't see that we lose much by committing
this now even if that rewrite does happen in time for v11. Ripping
out CreateLocalJoinPath() won't be any harder than ripping out
GetExistingLocalJoinPath().

Agreed. Attached is an rebased version which moved the new fields in
JoinPathExtraData to the end of that struct.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-12.patchtext/x-diff; name=epqpath-for-foreignjoin-12.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1701,1722 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1701,1716 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1745,1766 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1739,1754 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1790,1811 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1778,1793 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1834,1855 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1816,1831 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1866,1871 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1842,1943 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1::bit(16) = t3.c1::bit(16))) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                        QUERY PLAN                                                                                                                                        
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Sort
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          Sort Key: t2.c1, t3.c1
+          ->  Nested Loop
+                Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+                ->  Foreign Scan
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                      Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1::bit(16) = r3.c1::bit(16)))))
+                      ->  Merge Full Join
+                            Output: t2.c1, t2.*, t3.c1, t3.*
+                            Merge Cond: (((t2.c1)::bit(16)) = ((t3.c1)::bit(16)))
+                            ->  Sort
+                                  Output: t2.c1, t2.*, ((t2.c1)::bit(16))
+                                  Sort Key: ((t2.c1)::bit(16))
+                                  ->  Foreign Scan on public.ft4 t2
+                                        Output: t2.c1, t2.*, (t2.c1)::bit(16)
+                                        Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                            ->  Sort
+                                  Output: t3.c1, t3.*, ((t3.c1)::bit(16))
+                                  Sort Key: ((t3.c1)::bit(16))
+                                  ->  Foreign Scan on public.ft5 t3
+                                        Output: t3.c1, t3.*, (t3.c1)::bit(16)
+                                        Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+ (29 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4300,4317 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4372,4387 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4443,4460 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4513,4528 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4494,4514 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * GetExistingLocalJoinPath will find a suitable path for this purpose in
! 	 * the path list of the joinrel, if one exists.  We must be careful to
! 	 * call it before adding any ForeignPath, since the ForeignPath might
! 	 * dominate the only suitable local path available.  We also do it before
! 	 * calling foreign_join_ok(), since that function updates fpinfo and marks
! 	 * it as pushable if the join is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4494,4514 ----
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * CreateLocalJoinPath will build an alternative local join path for this
! 	 * purpose.  We must be careful to call it before calling foreign_join_ok,
! 	 * since that function updates fpinfo and marks it as pushable if the join
! 	 * is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		/* Create a local join path */
! 		epq_path = CreateLocalJoinPath(root, joinrel, outerrel, innerrel,
! 									   jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks could not be created");
  			return;
  		}
  	}
***************
*** 4517,4523 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we copied one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
--- 4517,4523 ----
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we created one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 493,498 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 493,505 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1::bit(16) = t3.c1::bit(16))) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 999,1009 **** RecheckForeignScan(ForeignScanState *node,
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</literal> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</literal>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 999,1007 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</literal> may be used to build
!      a suitable local join path, which can be used as the alternative
!      local join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 689,801 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
   *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(IS_JOIN_REL(joinrel));
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
  
! 		/* Skip parameterized paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
  				{
! 					HashPath   *hash_path = makeNode(HashPath);
! 
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
  				}
! 				break;
! 
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
! 				}
! 				break;
  
! 			case T_MergeJoin:
! 				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 692,873 ----
  	return oid;
  }
  
+ 
  /*
!  * Try to build an alternative local join path to reconstruct a join tuple for
!  * the foreign join in EvalPlanQual
   *
!  * 'joinrel' is the join relation
!  * 'outerrel' is the outer join relation
!  * 'innerrel' is the inner join relation
!  * 'jointype' is the type of join to do
!  * 'extra' contains additional input values
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
! 	Path	   *outer_path = outerrel->cheapest_total_path;
! 	Path	   *inner_path = innerrel->cheapest_total_path;
  
! 	/*
! 	 * We don't try to build the alternative local join path if the cheapest
! 	 * total outer and inner paths are parameterized; that seems unlikely in
! 	 * normal case.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return NULL;
  
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Get an initial estimate */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path, extra);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL,
! 													   NULL);
! 			}
! 			break;
! 		case JOIN_RIGHT:
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Generate a hashjoin or mergejoin path, if possible */
! 				if (extra->hashclause_list)
  				{
! 					/* Get an initial estimate */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclause_list,
! 										  outer_path, inner_path, extra);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra,
! 														   outer_path,
! 														   inner_path,
! 														   extra->restrictlist,
! 														   NULL,
! 														   extra->hashclause_list);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If right or full join with no mergejoinable clauses,
! 					 * create a clauseless mergejoin path.  Otherwise create a
! 					 * mergejoin path by explicitly sorting both the outer and
! 					 * inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   NIL, outer_path, inner_path,
! 											   NIL, NIL, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																NULL,
! 																NIL,
! 																NIL,
! 																NIL);
! 					}
! 					else
! 					{
! 						List	   *mergeclauses;
! 						List	   *outerkeys;
! 						List	   *innerkeys;
  
! 						/* Build sort pathkeys for the outer side */
! 						outerkeys = select_outer_pathkeys_for_merge(root,
! 																	extra->mergeclause_list,
! 																	joinrel);
! 						/* Sort the mergeclauses into the corresponding ordering */
! 						mergeclauses = find_mergeclauses_for_pathkeys(root,
! 																	  outerkeys,
! 																	  true,
! 																	  extra->mergeclause_list);
! 						/* Should have used them all... */
! 						Assert(list_length(mergeclauses) == list_length(extra->mergeclause_list));
! 						/* Build sort pathkeys for the inner side */
! 						innerkeys = make_inner_pathkeys_for_merge(root,
! 																  mergeclauses,
! 																  outerkeys);
! 						/*
! 						 * It's possible that the cheapest total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
  
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																NULL,
! 																mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 78,83 **** static List *select_mergejoin_clauses(PlannerInfo *root,
--- 78,87 ----
  						 List *restrictlist,
  						 JoinType jointype,
  						 bool *mergejoin_allowed);
+ static List *select_hashjoin_clauses(RelOptInfo *outerrel,
+ 						RelOptInfo *innerrel,
+ 						List *restrictlist,
+ 						JoinType jointype);
  static void generate_mergejoin_paths(PlannerInfo *root,
  						 RelOptInfo *joinrel,
  						 RelOptInfo *innerrel,
***************
*** 123,129 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  	Relids		joinrelids;
  
--- 127,132 ----
***************
*** 143,148 **** add_paths_to_joinrel(PlannerInfo *root,
--- 146,153 ----
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.mergejoin_allowed = true;
+ 	extra.hashclause_list = NIL;
  
  	/*
  	 * See if the inner relation is provably unique for this outer rel.
***************
*** 200,206 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
--- 205,226 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
! 
! 	/*
! 	 * Find hashjoin clauses.  As above, skip this if we are not interested in
! 	 * doing a hashjoin, but disregard enable_hashjoin for full joins, because
! 	 * there may be no other alternative.
! 	 *
! 	 * Note: we need to build only one hashclause list for any given pair of
! 	 * outer and inner relations; all of the hashable clauses will be used as
! 	 * keys.
! 	 */
! 	if (enable_hashjoin || jointype == JOIN_FULL)
! 		extra.hashclause_list = select_hashjoin_clauses(outerrel,
! 														innerrel,
! 														restrictlist,
! 														jointype);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
***************
*** 263,269 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 283,289 ----
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 274,280 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 294,300 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 291,297 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 311,317 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 1675,1715 **** hash_inner_and_outer(PlannerInfo *root,
  					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
! 	bool		isouterjoin = IS_OUTER_JOIN(jointype);
! 	List	   *hashclauses;
! 	ListCell   *l;
! 
! 	/*
! 	 * We need to build only one hashclauses list for any given pair of outer
! 	 * and inner relations; all of the hashable clauses will be used as keys.
! 	 *
! 	 * Scan the join's restrictinfo list to find hashjoinable clauses that are
! 	 * usable with this pair of sub-relations.
! 	 */
! 	hashclauses = NIL;
! 	foreach(l, extra->restrictlist)
! 	{
! 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
! 
! 		/*
! 		 * If processing an outer join, only use its own join clauses for
! 		 * hashing.  For inner joins we need not be so picky.
! 		 */
! 		if (isouterjoin && restrictinfo->is_pushed_down)
! 			continue;
! 
! 		if (!restrictinfo->can_join ||
! 			restrictinfo->hashjoinoperator == InvalidOid)
! 			continue;			/* not hashjoinable */
! 
! 		/*
! 		 * Check if clause has the form "outer op inner" or "inner op outer".
! 		 */
! 		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
! 			continue;			/* no good for these input relations */
! 
! 		hashclauses = lappend(hashclauses, restrictinfo);
! 	}
  
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
--- 1695,1701 ----
  					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
! 	List	   *hashclauses = extra->hashclause_list;
  
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
***************
*** 1996,1998 **** select_mergejoin_clauses(PlannerInfo *root,
--- 1982,2034 ----
  
  	return result_list;
  }
+ 
+ /*
+  * select_hashjoin_clauses
+  *	  Select hashjoin clauses that are usable for a particular join.
+  *	  Returns a list of RestrictInfo nodes for those clauses.
+  *
+  * We mark each selected RestrictInfo to show which side is currently
+  * being considered as outer.  These are transient markings that are only
+  * good for the duration of the current add_paths_to_joinrel() call!
+  *
+  * We examine each restrictinfo clause known for the join to see
+  * if it is hashjoinable and involves vars from the two sub-relations
+  * currently of interest.
+  */
+ static List *
+ select_hashjoin_clauses(RelOptInfo *outerrel,
+ 						RelOptInfo *innerrel,
+ 						List *restrictlist,
+ 						JoinType jointype)
+ {
+ 	List	   *hashclauses = NIL;
+ 	bool		isouterjoin = IS_OUTER_JOIN(jointype);
+ 	ListCell   *l;
+ 
+ 	foreach(l, restrictlist)
+ 	{
+ 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
+ 
+ 		/*
+ 		 * If processing an outer join, only use its own join clauses for
+ 		 * hashing.  For inner joins we need not be so picky.
+ 		 */
+ 		if (isouterjoin && restrictinfo->is_pushed_down)
+ 			continue;
+ 
+ 		if (!restrictinfo->can_join ||
+ 			restrictinfo->hashjoinoperator == InvalidOid)
+ 			continue;			/* not hashjoinable */
+ 
+ 		/*
+ 		 * Check if clause has the form "outer op inner" or "inner op outer".
+ 		 */
+ 		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
+ 			continue;			/* no good for these input relations */
+ 
+ 		hashclauses = lappend(hashclauses, restrictinfo);
+ 	}
+ 
+ 	return hashclauses;
+ }
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 247,252 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif							/* FDWAPI_H */
--- 247,257 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra);
  
  #endif							/* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 2270,2275 **** typedef struct SemiAntiJoinFactors
--- 2270,2278 ----
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI/ANTI/inner_unique joins)
   * param_source_rels are OK targets for parameterization of result paths
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
+  * hashclause_list is a list of RestrictInfo nodes for available
+  *		hashjoin clauses in this join
   */
  typedef struct JoinPathExtraData
  {
***************
*** 2279,2284 **** typedef struct JoinPathExtraData
--- 2282,2289 ----
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	bool		mergejoin_allowed;
+ 	List	   *hashclause_list;
  } JoinPathExtraData;
  
  /*

#101

Thomas Munro

thomas.munro@enterprisedb.com

about 8 years ago

In reply to: Etsuro Fujita (#100)

Re: [HACKERS] postgres_fdw bug in 9.6

On Mon, Dec 11, 2017 at 8:25 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Now, if you're still super-concerned about this breaking something, we
could commit it only to master, where it will have 9 months to bake
before it gets released. I think that's overly conservative, but I
think it's still better than waiting for the rewrite you'd like to see
happen. We don't know when or if anyone is going to undertake that,
and if we wait, we may easing release a v11 that's got the same defect
as v9.6 and now v10. And I don't see that we lose much by committing
this now even if that rewrite does happen in time for v11. Ripping
out CreateLocalJoinPath() won't be any harder than ripping out
GetExistingLocalJoinPath().

Agreed. Attached is an rebased version which moved the new fields in
JoinPathExtraData to the end of that struct.

FYI this doesn't compile anymore, because initial_cost_hashjoin() and
create_hashjoin_path() changed in master.

--
Thomas Munro
http://www.enterprisedb.com

#102

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

about 8 years ago

In reply to: Thomas Munro (#101)

1 attachment(s)

Re: [HACKERS] postgres_fdw bug in 9.6

(2018/01/12 10:41), Thomas Munro wrote:

On Mon, Dec 11, 2017 at 8:25 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Now, if you're still super-concerned about this breaking something, we
could commit it only to master, where it will have 9 months to bake
before it gets released. I think that's overly conservative, but I
think it's still better than waiting for the rewrite you'd like to see
happen. We don't know when or if anyone is going to undertake that,
and if we wait, we may easing release a v11 that's got the same defect
as v9.6 and now v10. And I don't see that we lose much by committing
this now even if that rewrite does happen in time for v11. Ripping
out CreateLocalJoinPath() won't be any harder than ripping out
GetExistingLocalJoinPath().

Agreed. Attached is an rebased version which moved the new fields in
JoinPathExtraData to the end of that struct.

FYI this doesn't compile anymore, because initial_cost_hashjoin() and
create_hashjoin_path() changed in master.

Thank you for letting me know about that! Here is an updated version.

Best regards,
Etsuro Fujita

Attachments:

epqpath-for-foreignjoin-13.patchtext/x-diff; name=epqpath-for-foreignjoin-13.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 1701,1722 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1701,1716 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1745,1766 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1739,1754 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1790,1811 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1778,1793 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1834,1855 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
                             Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
!                                  Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
                             Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
!                                  Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1816,1831 ----
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                 Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Nested Loop
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Join Filter: (t1.c1 = t2.c1)
!                      ->  Foreign Scan on public.ft1 t1
                             Output: t1.c1, t1.c3, t1.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Foreign Scan on public.ft2 t2
                             Output: t2.c1, t2.*
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (17 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1866,1871 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 1842,1943 ----
   110 | 110
  (10 rows)
  
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                      QUERY PLAN                                                                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Hash Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Hash Cond: (t2.c1 = t3.c1)
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Hash
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1::bit(16) = t3.c1::bit(16))) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                        QUERY PLAN                                                                                                                                        
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Sort
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          Sort Key: t2.c1, t3.c1
+          ->  Nested Loop
+                Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+                ->  Foreign Scan
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                      Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON (((r2.c1::bit(16) = r3.c1::bit(16)))))
+                      ->  Merge Full Join
+                            Output: t2.c1, t2.*, t3.c1, t3.*
+                            Merge Cond: (((t2.c1)::bit(16)) = ((t3.c1)::bit(16)))
+                            ->  Sort
+                                  Output: t2.c1, t2.*, ((t2.c1)::bit(16))
+                                  Sort Key: ((t2.c1)::bit(16))
+                                  ->  Foreign Scan on public.ft4 t2
+                                        Output: t2.c1, t2.*, (t2.c1)::bit(16)
+                                        Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                            ->  Sort
+                                  Output: t3.c1, t3.*, ((t3.c1)::bit(16))
+                                  Sort Key: ((t3.c1)::bit(16))
+                                  ->  Foreign Scan on public.ft5 t3
+                                        Output: t3.c1, t3.*, (t3.c1)::bit(16)
+                                        Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+ (29 rows)
+ 
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+                                                                                                                                                 QUERY PLAN                                                                                                                                                 
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  LockRows
+    Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+    ->  Nested Loop
+          Output: "T 3".c1, t2.c1, t3.c1, "T 3".ctid, t2.*, t3.*
+          ->  Foreign Scan
+                Output: t2.c1, t2.*, t3.c1, t3.*
+                Relations: (public.ft4 t2) FULL JOIN (public.ft5 t3)
+                Remote SQL: SELECT r2.c1, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r3.c1, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END FROM ("S 1"."T 3" r2 FULL JOIN "S 1"."T 4" r3 ON ((false))) ORDER BY r2.c1 ASC NULLS LAST, r3.c1 ASC NULLS LAST
+                ->  Merge Full Join
+                      Output: t2.c1, t2.*, t3.c1, t3.*
+                      Join Filter: false
+                      ->  Foreign Scan on public.ft4 t2
+                            Output: t2.c1, t2.*
+                            Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+                      ->  Materialize
+                            Output: t3.c1, t3.*
+                            ->  Foreign Scan on public.ft5 t3
+                                  Output: t3.c1, t3.*
+                                  Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+          ->  Materialize
+                Output: "T 3".c1, "T 3".ctid
+                ->  Seq Scan on "S 1"."T 3"
+                      Output: "T 3".c1, "T 3".ctid
+                      Filter: ("T 3".c1 = 50)
+ (24 rows)
+ 
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
***************
*** 4300,4317 **** UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (17 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 4372,4387 ----
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c8, r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 9)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (15 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 4443,4460 **** DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Hash Join
                 Output: ft2.ctid, ft1.*
!                Hash Cond: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Hash
                       Output: ft1.*, ft1.c1
!                      ->  Foreign Scan on public.ft1
!                            Output: ft1.*, ft1.c1
!                            Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (17 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 4513,4528 ----
           Output: ft2.ctid, ft1.*
           Relations: (public.ft2) INNER JOIN (public.ft1)
           Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1.c2 = r2."C 1")) AND (((r2."C 1" % 10) = 2)))) FOR UPDATE OF r1
!          ->  Nested Loop
                 Output: ft2.ctid, ft1.*
!                Join Filter: (ft2.c2 = ft1.c1)
                 ->  Foreign Scan on public.ft2
                       Output: ft2.ctid, ft2.c2
                       Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!                ->  Foreign Scan on public.ft1
                       Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (15 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 4494,4514 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * GetExistingLocalJoinPath will find a suitable path for this purpose in
! 	 * the path list of the joinrel, if one exists.  We must be careful to
! 	 * call it before adding any ForeignPath, since the ForeignPath might
! 	 * dominate the only suitable local path available.  We also do it before
! 	 * calling foreign_join_ok(), since that function updates fpinfo and marks
! 	 * it as pushable if the join is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		epq_path = GetExistingLocalJoinPath(joinrel);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks was not found");
  			return;
  		}
  	}
--- 4494,4514 ----
  	/*
  	 * If there is a possibility that EvalPlanQual will be executed, we need
  	 * to be able to reconstruct the row using scans of the base relations.
! 	 * CreateLocalJoinPath will build an alternative local join path for this
! 	 * purpose.  We must be careful to call it before calling foreign_join_ok,
! 	 * since that function updates fpinfo and marks it as pushable if the join
! 	 * is found to be pushable.
  	 */
  	if (root->parse->commandType == CMD_DELETE ||
  		root->parse->commandType == CMD_UPDATE ||
  		root->rowMarks)
  	{
! 		/* Create a local join path */
! 		epq_path = CreateLocalJoinPath(root, joinrel, outerrel, innerrel,
! 									   jointype, extra);
  		if (!epq_path)
  		{
! 			elog(DEBUG3, "could not push down foreign join because a local path suitable for EPQ checks could not be created");
  			return;
  		}
  	}
***************
*** 4517,4523 **** postgresGetForeignJoinPaths(PlannerInfo *root,
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we copied one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
--- 4517,4523 ----
  
  	if (!foreign_join_ok(root, joinrel, jointype, outerrel, innerrel, extra))
  	{
! 		/* Free path required for EPQ if we created one; we don't need it now */
  		if (epq_path)
  			pfree(epq_path);
  		return;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 493,498 **** SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
--- 493,505 ----
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
+ -- FOR UPDATE/SHARE in situations where a full join is pushed down
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1 = t3.c1)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (t2.c1::bit(16) = t3.c1::bit(16))) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, t2.c1, t3.c1 FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (ft4 t2 FULL JOIN ft5 t3 ON (FALSE)) ON (TRUE) ORDER BY t1.c1, t2.c1, t3.c1 FOR UPDATE OF t1;
  -- join in CTE
  EXPLAIN (VERBOSE, COSTS OFF)
  WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***************
*** 999,1009 **** RecheckForeignScan(ForeignScanState *node,
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>GetExistingLocalJoinPath</literal> may be used to search
!      existing paths for a suitable local join path, which can be used as the
!      alternative local join plan.  <literal>GetExistingLocalJoinPath</literal>
!      searches for an unparameterized path in the path list of the specified
!      join relation.  (If it does not find such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
--- 999,1007 ----
       can be executed and the resulting tuple can be stored in the slot.
       This plan need not be efficient since no base table will return more
       than one row; for example, it may implement all joins as nested loops.
!      The function <literal>CreateLocalJoinPath</literal> may be used to build
!      a suitable local join path, which can be used as the alternative
!      local join plan.  (If it does not build such a path, it returns NULL, in
       which case a foreign data wrapper may build the local path by itself or
       may choose not to create access paths for that join.)
      </para>
*** a/src/backend/foreign/foreign.c
--- b/src/backend/foreign/foreign.c
***************
*** 22,27 ****
--- 22,30 ----
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
  #include "miscadmin.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/paths.h"
  #include "utils/builtins.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
***************
*** 689,801 **** get_foreign_server_oid(const char *servername, bool missing_ok)
  	return oid;
  }
  
  /*
!  * Get a copy of an existing local path for a given join relation.
!  *
!  * This function is usually helpful to obtain an alternate local path for EPQ
!  * checks.
!  *
!  * Right now, this function only supports unparameterized foreign joins, so we
!  * only search for unparameterized path in the given list of paths. Since we
!  * are searching for a path which can be used to construct an alternative local
!  * plan for a foreign join, we look for only MergeJoin, HashJoin or NestLoop
!  * paths.
!  *
!  * If the inner or outer subpath of the chosen path is a ForeignScan, we
!  * replace it with its outer subpath.  For this reason, and also because the
!  * planner might free the original path later, the path returned by this
!  * function is a shallow copy of the original.  There's no need to copy
!  * the substructure, so we don't.
   *
!  * Since the plan created using this path will presumably only be used to
!  * execute EPQ checks, efficiency of the path is not a concern. But since the
!  * path list in RelOptInfo is anyway sorted by total cost we are likely to
!  * choose the most efficient path, which is all for the best.
   */
! extern Path *
! GetExistingLocalJoinPath(RelOptInfo *joinrel)
  {
! 	ListCell   *lc;
  
! 	Assert(IS_JOIN_REL(joinrel));
  
! 	foreach(lc, joinrel->pathlist)
  	{
! 		Path	   *path = (Path *) lfirst(lc);
! 		JoinPath   *joinpath = NULL;
  
! 		/* Skip parameterized paths. */
! 		if (path->param_info != NULL)
! 			continue;
  
! 		switch (path->pathtype)
! 		{
! 			case T_HashJoin:
  				{
! 					HashPath   *hash_path = makeNode(HashPath);
! 
! 					memcpy(hash_path, path, sizeof(HashPath));
! 					joinpath = (JoinPath *) hash_path;
  				}
! 				break;
! 
! 			case T_NestLoop:
  				{
! 					NestPath   *nest_path = makeNode(NestPath);
! 
! 					memcpy(nest_path, path, sizeof(NestPath));
! 					joinpath = (JoinPath *) nest_path;
! 				}
! 				break;
  
! 			case T_MergeJoin:
! 				{
! 					MergePath  *merge_path = makeNode(MergePath);
  
! 					memcpy(merge_path, path, sizeof(MergePath));
! 					joinpath = (JoinPath *) merge_path;
  				}
! 				break;
! 
! 			default:
! 
! 				/*
! 				 * Just skip anything else. We don't know if corresponding
! 				 * plan would build the output row from whole-row references
! 				 * of base relations and execute the EPQ checks.
! 				 */
! 				break;
! 		}
! 
! 		/* This path isn't good for us, check next. */
! 		if (!joinpath)
! 			continue;
! 
! 		/*
! 		 * If either inner or outer path is a ForeignPath corresponding to a
! 		 * pushed down join, replace it with the fdw_outerpath, so that we
! 		 * maintain path for EPQ checks built entirely of local join
! 		 * strategies.
! 		 */
! 		if (IsA(joinpath->outerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->outerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->outerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		if (IsA(joinpath->innerjoinpath, ForeignPath))
! 		{
! 			ForeignPath *foreign_path;
! 
! 			foreign_path = (ForeignPath *) joinpath->innerjoinpath;
! 			if (IS_JOIN_REL(foreign_path->path.parent))
! 				joinpath->innerjoinpath = foreign_path->fdw_outerpath;
! 		}
! 
! 		return (Path *) joinpath;
  	}
! 	return NULL;
  }
--- 692,875 ----
  	return oid;
  }
  
+ 
  /*
!  * Try to build an alternative local join path to reconstruct a join tuple for
!  * the foreign join in EvalPlanQual
   *
!  * 'joinrel' is the join relation
!  * 'outerrel' is the outer join relation
!  * 'innerrel' is the inner join relation
!  * 'jointype' is the type of join to do
!  * 'extra' contains additional input values
   */
! Path *
! CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra)
  {
! 	Path	   *result;
! 	Path	   *outer_path = outerrel->cheapest_total_path;
! 	Path	   *inner_path = innerrel->cheapest_total_path;
  
! 	/*
! 	 * We don't try to build the alternative local join path if the cheapest
! 	 * total outer and inner paths are parameterized; that seems unlikely in
! 	 * normal case.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return NULL;
  
! 	switch (jointype)
  	{
! 		case JOIN_INNER:
! 		case JOIN_LEFT:
! 		case JOIN_SEMI:
! 		case JOIN_ANTI:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Get an initial estimate */
! 				initial_cost_nestloop(root, &workspace, jointype,
! 									  outer_path, inner_path, extra);
! 				/* Generate a nestloop path */
! 				result = (Path *) create_nestloop_path(root,
! 													   joinrel,
! 													   jointype,
! 													   &workspace,
! 													   extra,
! 													   outer_path,
! 													   inner_path,
! 													   extra->restrictlist,
! 													   NIL,
! 													   NULL);
! 			}
! 			break;
! 		case JOIN_RIGHT:
! 		case JOIN_FULL:
! 			{
! 				JoinCostWorkspace workspace;
  
! 				/* Generate a hashjoin or mergejoin path, if possible */
! 				if (extra->hashclause_list)
  				{
! 					/* Get an initial estimate */
! 					initial_cost_hashjoin(root, &workspace, jointype,
! 										  extra->hashclause_list,
! 										  outer_path, inner_path,
! 										  extra, false);
! 					/* Generate a hashjoin path */
! 					result = (Path *) create_hashjoin_path(root,
! 														   joinrel,
! 														   jointype,
! 														   &workspace,
! 														   extra,
! 														   outer_path,
! 														   inner_path,
! 														   false,
! 														   extra->restrictlist,
! 														   NULL,
! 														   extra->hashclause_list);
  				}
! 				else if (extra->mergejoin_allowed)
  				{
! 					/*
! 					 * If right or full join with no mergejoinable clauses,
! 					 * create a clauseless mergejoin path.  Otherwise create a
! 					 * mergejoin path by explicitly sorting both the outer and
! 					 * inner relations.
! 					 */
! 					if (!extra->mergeclause_list)
! 					{
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   NIL, outer_path, inner_path,
! 											   NIL, NIL, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																NULL,
! 																NIL,
! 																NIL,
! 																NIL);
! 					}
! 					else
! 					{
! 						List	   *mergeclauses;
! 						List	   *outerkeys;
! 						List	   *innerkeys;
  
! 						/* Build sort pathkeys for the outer side */
! 						outerkeys = select_outer_pathkeys_for_merge(root,
! 																	extra->mergeclause_list,
! 																	joinrel);
! 						/* Sort the mergeclauses into the corresponding ordering */
! 						mergeclauses = find_mergeclauses_for_pathkeys(root,
! 																	  outerkeys,
! 																	  true,
! 																	  extra->mergeclause_list);
! 						/* Should have used them all... */
! 						Assert(list_length(mergeclauses) == list_length(extra->mergeclause_list));
! 						/* Build sort pathkeys for the inner side */
! 						innerkeys = make_inner_pathkeys_for_merge(root,
! 																  mergeclauses,
! 																  outerkeys);
! 						/*
! 						 * It's possible that the cheapest total paths will
! 						 * already be sorted properly; if so, suppress an
! 						 * explicit sort.
! 						 */
! 						if (outerkeys &&
! 							pathkeys_contained_in(outerkeys,
! 												  outer_path->pathkeys))
! 							outerkeys = NIL;
! 						if (innerkeys &&
! 							pathkeys_contained_in(innerkeys,
! 												  inner_path->pathkeys))
! 							innerkeys = NIL;
  
! 						/* Get an initial estimate */
! 						initial_cost_mergejoin(root, &workspace, jointype,
! 											   mergeclauses,
! 											   outer_path, inner_path,
! 											   outerkeys, innerkeys, extra);
! 						/* Generate a mergejoin path */
! 						result = (Path *) create_mergejoin_path(root,
! 																joinrel,
! 																jointype,
! 																&workspace,
! 																extra,
! 																outer_path,
! 																inner_path,
! 																extra->restrictlist,
! 																NIL,
! 																NULL,
! 																mergeclauses,
! 																outerkeys,
! 																innerkeys);
! 					}
  				}
! 				else
! 					result = NULL;
! 			}
! 			break;
! 		default:
! 			/* other values not expected here */
! 			elog(ERROR, "unrecognized join type: %d",
! 				 (int) jointype);
! 			result = NULL; /* keep compiler quiet */
! 			break;
  	}
! 
! 	return result;
  }
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 78,83 **** static List *select_mergejoin_clauses(PlannerInfo *root,
--- 78,87 ----
  						 List *restrictlist,
  						 JoinType jointype,
  						 bool *mergejoin_allowed);
+ static List *select_hashjoin_clauses(RelOptInfo *outerrel,
+ 						RelOptInfo *innerrel,
+ 						List *restrictlist,
+ 						JoinType jointype);
  static void generate_mergejoin_paths(PlannerInfo *root,
  						 RelOptInfo *joinrel,
  						 RelOptInfo *innerrel,
***************
*** 123,129 **** add_paths_to_joinrel(PlannerInfo *root,
  					 List *restrictlist)
  {
  	JoinPathExtraData extra;
- 	bool		mergejoin_allowed = true;
  	ListCell   *lc;
  	Relids		joinrelids;
  
--- 127,132 ----
***************
*** 143,148 **** add_paths_to_joinrel(PlannerInfo *root,
--- 146,153 ----
  	extra.mergeclause_list = NIL;
  	extra.sjinfo = sjinfo;
  	extra.param_source_rels = NULL;
+ 	extra.mergejoin_allowed = true;
+ 	extra.hashclause_list = NIL;
  
  	/*
  	 * See if the inner relation is provably unique for this outer rel.
***************
*** 200,206 **** add_paths_to_joinrel(PlannerInfo *root,
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 														  &mergejoin_allowed);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
--- 205,226 ----
  														  innerrel,
  														  restrictlist,
  														  jointype,
! 													&extra.mergejoin_allowed);
! 
! 	/*
! 	 * Find hashjoin clauses.  As above, skip this if we are not interested in
! 	 * doing a hashjoin, but disregard enable_hashjoin for full joins, because
! 	 * there may be no other alternative.
! 	 *
! 	 * Note: we need to build only one hashclause list for any given pair of
! 	 * outer and inner relations; all of the hashable clauses will be used as
! 	 * keys.
! 	 */
! 	if (enable_hashjoin || jointype == JOIN_FULL)
! 		extra.hashclause_list = select_hashjoin_clauses(outerrel,
! 														innerrel,
! 														restrictlist,
! 														jointype);
  
  	/*
  	 * If it's SEMI, ANTI, or inner_unique join, compute correction factors
***************
*** 263,269 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 283,289 ----
  	 * 1. Consider mergejoin paths where both relations must be explicitly
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
! 	if (extra.mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 274,280 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
--- 294,300 ----
  	 * (That's okay because we know that nestloop can't handle right/full
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  
***************
*** 291,297 **** add_paths_to_joinrel(PlannerInfo *root,
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
--- 311,317 ----
  	 * those made by match_unsorted_outer when add_paths_to_joinrel() is
  	 * invoked with the two rels given in the other order.
  	 */
! 	if (extra.mergejoin_allowed)
  		match_unsorted_inner(root, joinrel, outerrel, innerrel,
  							 jointype, &extra);
  #endif
***************
*** 1682,1722 **** hash_inner_and_outer(PlannerInfo *root,
  					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
! 	bool		isouterjoin = IS_OUTER_JOIN(jointype);
! 	List	   *hashclauses;
! 	ListCell   *l;
! 
! 	/*
! 	 * We need to build only one hashclauses list for any given pair of outer
! 	 * and inner relations; all of the hashable clauses will be used as keys.
! 	 *
! 	 * Scan the join's restrictinfo list to find hashjoinable clauses that are
! 	 * usable with this pair of sub-relations.
! 	 */
! 	hashclauses = NIL;
! 	foreach(l, extra->restrictlist)
! 	{
! 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
! 
! 		/*
! 		 * If processing an outer join, only use its own join clauses for
! 		 * hashing.  For inner joins we need not be so picky.
! 		 */
! 		if (isouterjoin && restrictinfo->is_pushed_down)
! 			continue;
! 
! 		if (!restrictinfo->can_join ||
! 			restrictinfo->hashjoinoperator == InvalidOid)
! 			continue;			/* not hashjoinable */
! 
! 		/*
! 		 * Check if clause has the form "outer op inner" or "inner op outer".
! 		 */
! 		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
! 			continue;			/* no good for these input relations */
! 
! 		hashclauses = lappend(hashclauses, restrictinfo);
! 	}
  
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
--- 1702,1708 ----
  					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
! 	List	   *hashclauses = extra->hashclause_list;
  
  	/* If we found any usable hashclauses, make paths */
  	if (hashclauses)
***************
*** 2024,2026 **** select_mergejoin_clauses(PlannerInfo *root,
--- 2010,2062 ----
  
  	return result_list;
  }
+ 
+ /*
+  * select_hashjoin_clauses
+  *	  Select hashjoin clauses that are usable for a particular join.
+  *	  Returns a list of RestrictInfo nodes for those clauses.
+  *
+  * We mark each selected RestrictInfo to show which side is currently
+  * being considered as outer.  These are transient markings that are only
+  * good for the duration of the current add_paths_to_joinrel() call!
+  *
+  * We examine each restrictinfo clause known for the join to see
+  * if it is hashjoinable and involves vars from the two sub-relations
+  * currently of interest.
+  */
+ static List *
+ select_hashjoin_clauses(RelOptInfo *outerrel,
+ 						RelOptInfo *innerrel,
+ 						List *restrictlist,
+ 						JoinType jointype)
+ {
+ 	List	   *hashclauses = NIL;
+ 	bool		isouterjoin = IS_OUTER_JOIN(jointype);
+ 	ListCell   *l;
+ 
+ 	foreach(l, restrictlist)
+ 	{
+ 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
+ 
+ 		/*
+ 		 * If processing an outer join, only use its own join clauses for
+ 		 * hashing.  For inner joins we need not be so picky.
+ 		 */
+ 		if (isouterjoin && restrictinfo->is_pushed_down)
+ 			continue;
+ 
+ 		if (!restrictinfo->can_join ||
+ 			restrictinfo->hashjoinoperator == InvalidOid)
+ 			continue;			/* not hashjoinable */
+ 
+ 		/*
+ 		 * Check if clause has the form "outer op inner" or "inner op outer".
+ 		 */
+ 		if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
+ 			continue;			/* no good for these input relations */
+ 
+ 		hashclauses = lappend(hashclauses, restrictinfo);
+ 	}
+ 
+ 	return hashclauses;
+ }
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
***************
*** 247,252 **** extern FdwRoutine *GetFdwRoutineByRelId(Oid relid);
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *GetExistingLocalJoinPath(RelOptInfo *joinrel);
  
  #endif							/* FDWAPI_H */
--- 247,257 ----
  extern FdwRoutine *GetFdwRoutineForRelation(Relation relation, bool makecopy);
  extern bool IsImportableForeignTable(const char *tablename,
  						 ImportForeignSchemaStmt *stmt);
! extern Path *CreateLocalJoinPath(PlannerInfo *root,
! 					RelOptInfo *joinrel,
! 					RelOptInfo *outerrel,
! 					RelOptInfo *innerrel,
! 					JoinType jointype,
! 					JoinPathExtraData *extra);
  
  #endif							/* FDWAPI_H */
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 2271,2276 **** typedef struct SemiAntiJoinFactors
--- 2271,2279 ----
   * sjinfo is extra info about special joins for selectivity estimation
   * semifactors is as shown above (only valid for SEMI/ANTI/inner_unique joins)
   * param_source_rels are OK targets for parameterization of result paths
+  * mergejoin_allowed is a flag to indicate whether mergejoins are allowed
+  * hashclause_list is a list of RestrictInfo nodes for available
+  *		hashjoin clauses in this join
   */
  typedef struct JoinPathExtraData
  {
***************
*** 2280,2285 **** typedef struct JoinPathExtraData
--- 2283,2290 ----
  	SpecialJoinInfo *sjinfo;
  	SemiAntiJoinFactors semifactors;
  	Relids		param_source_rels;
+ 	bool		mergejoin_allowed;
+ 	List	   *hashclause_list;
  } JoinPathExtraData;
  
  /*

#103

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Tom Lane (#97)

1 attachment(s)

Re: [HACKERS] postgres_fdw bug in 9.6

On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I thought some more about this. While it seems clear that we don't
actually need to recheck anything within a postgres_fdw foreign join,
there's still a problem: we have to be able to regurgitate the join
row during postgresRecheckForeignScan. Since, in the current system
design, the only available data is scan-level rowmarks (that is,
whole-row values from each base foreign relation), there isn't any
very good way to produce the join row except to insert the rowmark
values into dummy scan nodes and then re-execute the join locally.
So really, that is what the outerPlan infrastructure is doing for us.

I started to look at this patch again today and got cold feet. It
seems to me that this entire design on the posted patch is based on
your remarks in /messages/by-id/13242.1481582736@sss.pgh.pa.us --

# One way that we could make things better is to rely on the knowledge
# that EPQ isn't asked to evaluate joins for more than one row per input
# relation, and therefore using merge or hash join technology is really
# overkill. We could make a tree of simple nestloop joins, which aren't
# going to care about sort order, if we could identify the correct join
# clauses to apply.

However, those remarks are almost entirely wrong. It's true that the
relations for which we have EPQ tuples will only ever output a single
tuple, but because of what you said in the quoted text above, this
does NOT imply that the recheck plan never needs to worry about
dealing with a large number of rows, as the following example shows.
(The part about making a tree of simple nestloop joins is wrong not
only for this reason but because it won't work in the executor, as
previously discussed.)

rhaas=# explain select * from foo join (bar join baz on bar.a = baz.a)
on foo.a = bar.a where foo.b = 'hello' for update;
QUERY PLAN
-----------------------------------------------------------------------------------------------
LockRows (cost=109.02..121.46 rows=1 width=162)
-> Merge Join (cost=109.02..121.45 rows=1 width=162)
Merge Cond: (bar.a = foo.a)
-> Foreign Scan (cost=100.58..4914.58 rows=40000 width=119)
Relations: (public.bar) INNER JOIN (public.baz)
-> Hash Join (cost=2556.00..4962.00 rows=40000 width=119)
Hash Cond: (bar.a = baz.a)
-> Foreign Scan on bar (cost=100.00..1956.00
rows=40000 width=28)
-> Hash (cost=1956.00..1956.00 rows=40000 width=27)
-> Foreign Scan on baz
(cost=100.00..1956.00 rows=40000 width=27)
-> Sort (cost=8.44..8.45 rows=1 width=28)
Sort Key: foo.a
-> Index Scan using foo_b_idx on foo (cost=0.41..8.43
rows=1 width=28)
Index Cond: (b = 'hello'::text)

There's really nothing to prevent the bar/baz join from producing
arbitrarily many rows, which means that it's not good to turn the
recheck plan into a Nested Loop unconditionally -- and come to think
of it, I'm pretty sure this kind of case is why
GetExistingLocalJoinPath() tries to using an existing path: it wants a
path that isn't going suck if it gets executed.

After some thought, it seems that there's a much simpler way that we
could fix the problem you identified in that original email: if the
EPQ path isn't properly sorted, have postgres_fdw's
add_paths_with_pathkeys_for_rel stick a Sort node on top of it. I
tried this and it does indeed fix Jeff Janes' initial test case.
Patch attached. One could do better here if one wanted to revise
GetExistingLocalJoinPath() to take Last *pathkeys as an additional
argument; we could then try to find the optimal EPQ path for each
possible set of sortkeys. But I don't feel the need to work on that
right now, and it would be nicer not to back-patch a change to the
signature of GetExistingLocalJoinPath(), as has already been noted.

It is altogether possible that this doesn't fix every issue with the
GetExistingLocalJoinPath interface, and perhaps that is due for a
broader rewrite. However, I think that more recent discussion has
cast some doubt on the fiery criticism of that previous note. That
note asserted that we'd have trouble with join nests containing more
than 2 joins, but as was pointed out at the time, that's not really
true, because if we happen to pull a join out of the pathlist that has
ForeignScan children, we'll fetch out the EPQ-recheck subpaths that
they created and use those instead. And there are other problems with
the analysis in that email too as noted above. The only problem that
*has actually been demonstrated* is that we can end up using as an EPQ
path something that doesn't generate the right sort order, and that is
easily fixed by making it do so, which the attached patch does. So I
propose we commit and back-patch this and move on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

sort-epq-path-if-needed.patchapplication/octet-stream; name=sort-epq-path-if-needed.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 683d641fa7..72fe4ab8fd 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1371,24 +1371,27 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
                Remote SQL: SELECT s8.c1, s8.c2, s9.c1, s9.c2 FROM ((SELECT c1, ROW(c1, c2, c3) FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND ((c1 <= 60))) s8(c1, c2) FULL JOIN (SELECT c1, ROW(c1, c2, c3) FROM "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) s9(c1, c2) ON (((s8.c1 = s9.c1)))) WHERE (((s8.c1 IS NULL) OR (s8.c1 IS NOT NULL))) ORDER BY s8.c1 ASC NULLS LAST, s9.c1 ASC NULLS LAST
-               ->  Hash Full Join
+               ->  Sort
                      Output: ft4.c1, ft4.*, ft5.c1, ft5.*
-                     Hash Cond: (ft4.c1 = ft5.c1)
-                     Filter: ((ft4.c1 IS NULL) OR (ft4.c1 IS NOT NULL))
-                     ->  Foreign Scan on public.ft4
-                           Output: ft4.c1, ft4.*
-                           Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND ((c1 <= 60))
-                     ->  Hash
-                           Output: ft5.c1, ft5.*
-                           ->  Foreign Scan on public.ft5
+                     Sort Key: ft4.c1, ft5.c1
+                     ->  Hash Full Join
+                           Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+                           Hash Cond: (ft4.c1 = ft5.c1)
+                           Filter: ((ft4.c1 IS NULL) OR (ft4.c1 IS NOT NULL))
+                           ->  Foreign Scan on public.ft4
+                                 Output: ft4.c1, ft4.*
+                                 Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND ((c1 <= 60))
+                           ->  Hash
                                  Output: ft5.c1, ft5.*
-                                 Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))
+                                 ->  Foreign Scan on public.ft5
+                                       Output: ft5.c1, ft5.*
+                                       Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))
          ->  Materialize
                Output: "T 3".c1, "T 3".ctid
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(25 rows)
+(28 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
@@ -1701,22 +1704,25 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
-               ->  Merge Join
+               ->  Sort
                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
-                     Merge Cond: (t1.c1 = t2.c1)
-                     ->  Sort
-                           Output: t1.c1, t1.c3, t1.*
-                           Sort Key: t1.c1
-                           ->  Foreign Scan on public.ft1 t1
+                     Sort Key: t1.c3, t1.c1
+                     ->  Merge Join
+                           Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
+                           Merge Cond: (t1.c1 = t2.c1)
+                           ->  Sort
                                  Output: t1.c1, t1.c3, t1.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
-                     ->  Sort
-                           Output: t2.c1, t2.*
-                           Sort Key: t2.c1
-                           ->  Foreign Scan on public.ft2 t2
+                                 Sort Key: t1.c1
+                                 ->  Foreign Scan on public.ft1 t1
+                                       Output: t1.c1, t1.c3, t1.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
+                           ->  Sort
                                  Output: t2.c1, t2.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
-(23 rows)
+                                 Sort Key: t2.c1
+                                 ->  Foreign Scan on public.ft2 t2
+                                       Output: t2.c1, t2.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(26 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
  c1  | c1  
@@ -1745,22 +1751,25 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
-               ->  Merge Join
+               ->  Sort
                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
-                     Merge Cond: (t1.c1 = t2.c1)
-                     ->  Sort
-                           Output: t1.c1, t1.c3, t1.*
-                           Sort Key: t1.c1
-                           ->  Foreign Scan on public.ft1 t1
+                     Sort Key: t1.c3, t1.c1
+                     ->  Merge Join
+                           Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
+                           Merge Cond: (t1.c1 = t2.c1)
+                           ->  Sort
                                  Output: t1.c1, t1.c3, t1.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
-                     ->  Sort
-                           Output: t2.c1, t2.*
-                           Sort Key: t2.c1
-                           ->  Foreign Scan on public.ft2 t2
+                                 Sort Key: t1.c1
+                                 ->  Foreign Scan on public.ft1 t1
+                                       Output: t1.c1, t1.c3, t1.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
+                           ->  Sort
                                  Output: t2.c1, t2.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
-(23 rows)
+                                 Sort Key: t2.c1
+                                 ->  Foreign Scan on public.ft2 t2
+                                       Output: t2.c1, t2.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
+(26 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
  c1  | c1  
@@ -1790,22 +1799,25 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
-               ->  Merge Join
+               ->  Sort
                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
-                     Merge Cond: (t1.c1 = t2.c1)
-                     ->  Sort
-                           Output: t1.c1, t1.c3, t1.*
-                           Sort Key: t1.c1
-                           ->  Foreign Scan on public.ft1 t1
+                     Sort Key: t1.c3, t1.c1
+                     ->  Merge Join
+                           Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
+                           Merge Cond: (t1.c1 = t2.c1)
+                           ->  Sort
                                  Output: t1.c1, t1.c3, t1.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
-                     ->  Sort
-                           Output: t2.c1, t2.*
-                           Sort Key: t2.c1
-                           ->  Foreign Scan on public.ft2 t2
+                                 Sort Key: t1.c1
+                                 ->  Foreign Scan on public.ft1 t1
+                                       Output: t1.c1, t1.c3, t1.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
+                           ->  Sort
                                  Output: t2.c1, t2.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
-(23 rows)
+                                 Sort Key: t2.c1
+                                 ->  Foreign Scan on public.ft2 t2
+                                       Output: t2.c1, t2.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(26 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
  c1  | c1  
@@ -1834,22 +1846,25 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
-               ->  Merge Join
+               ->  Sort
                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
-                     Merge Cond: (t1.c1 = t2.c1)
-                     ->  Sort
-                           Output: t1.c1, t1.c3, t1.*
-                           Sort Key: t1.c1
-                           ->  Foreign Scan on public.ft1 t1
+                     Sort Key: t1.c3, t1.c1
+                     ->  Merge Join
+                           Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
+                           Merge Cond: (t1.c1 = t2.c1)
+                           ->  Sort
                                  Output: t1.c1, t1.c3, t1.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
-                     ->  Sort
-                           Output: t2.c1, t2.*
-                           Sort Key: t2.c1
-                           ->  Foreign Scan on public.ft2 t2
+                                 Sort Key: t1.c1
+                                 ->  Foreign Scan on public.ft1 t1
+                                       Output: t1.c1, t1.c3, t1.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
+                           ->  Sort
                                  Output: t2.c1, t2.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
-(23 rows)
+                                 Sort Key: t2.c1
+                                 ->  Foreign Scan on public.ft2 t2
+                                       Output: t2.c1, t2.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
+(26 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  c1  | c1  
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 7992ba5852..9c899f30fb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4330,10 +4330,23 @@ add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 		Cost		startup_cost;
 		Cost		total_cost;
 		List	   *useful_pathkeys = lfirst(lc);
+		Path	   *sorted_epq_path;
 
 		estimate_path_cost_size(root, rel, NIL, useful_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
+		/* epq_path might not be properly sorted */
+		sorted_epq_path = epq_path;
+		if (sorted_epq_path != NULL &&
+			!pathkeys_contained_in(useful_pathkeys,
+								   sorted_epq_path->pathkeys))
+			sorted_epq_path = (Path *)
+				create_sort_path(root,
+								 rel,
+								 sorted_epq_path,
+								 useful_pathkeys,
+								 -1.0);
+
 		add_path(rel, (Path *)
 				 create_foreignscan_path(root, rel,
 										 NULL,
@@ -4342,7 +4355,7 @@ add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 										 total_cost,
 										 useful_pathkeys,
 										 NULL,
-										 epq_path,
+										 sorted_epq_path,
 										 NIL));
 	}
 }

#104

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 8 years ago

In reply to: Robert Haas (#103)

Re: [HACKERS] postgres_fdw bug in 9.6

(2018/01/13 6:07), Robert Haas wrote:

On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

I thought some more about this. While it seems clear that we don't
actually need to recheck anything within a postgres_fdw foreign join,
there's still a problem: we have to be able to regurgitate the join
row during postgresRecheckForeignScan. Since, in the current system
design, the only available data is scan-level rowmarks (that is,
whole-row values from each base foreign relation), there isn't any
very good way to produce the join row except to insert the rowmark
values into dummy scan nodes and then re-execute the join locally.
So really, that is what the outerPlan infrastructure is doing for us.

I started to look at this patch again today and got cold feet. It
seems to me that this entire design on the posted patch is based on
your remarks in /messages/by-id/13242.1481582736@sss.pgh.pa.us --

# One way that we could make things better is to rely on the knowledge
# that EPQ isn't asked to evaluate joins for more than one row per input
# relation, and therefore using merge or hash join technology is really
# overkill. We could make a tree of simple nestloop joins, which aren't
# going to care about sort order, if we could identify the correct join
# clauses to apply.

That's right.

However, those remarks are almost entirely wrong. It's true that the
relations for which we have EPQ tuples will only ever output a single
tuple, but because of what you said in the quoted text above, this
does NOT imply that the recheck plan never needs to worry about
dealing with a large number of rows, as the following example shows.
(The part about making a tree of simple nestloop joins is wrong not
only for this reason but because it won't work in the executor, as
previously discussed.)

rhaas=# explain select * from foo join (bar join baz on bar.a = baz.a)
on foo.a = bar.a where foo.b = 'hello' for update;
QUERY PLAN
-----------------------------------------------------------------------------------------------
LockRows (cost=109.02..121.46 rows=1 width=162)
-> Merge Join (cost=109.02..121.45 rows=1 width=162)
Merge Cond: (bar.a = foo.a)
-> Foreign Scan (cost=100.58..4914.58 rows=40000 width=119)
Relations: (public.bar) INNER JOIN (public.baz)
-> Hash Join (cost=2556.00..4962.00 rows=40000 width=119)
Hash Cond: (bar.a = baz.a)
-> Foreign Scan on bar (cost=100.00..1956.00
rows=40000 width=28)
-> Hash (cost=1956.00..1956.00 rows=40000 width=27)
-> Foreign Scan on baz
(cost=100.00..1956.00 rows=40000 width=27)
-> Sort (cost=8.44..8.45 rows=1 width=28)
Sort Key: foo.a
-> Index Scan using foo_b_idx on foo (cost=0.41..8.43
rows=1 width=28)
Index Cond: (b = 'hello'::text)

There's really nothing to prevent the bar/baz join from producing
arbitrarily many rows, which means that it's not good to turn the
recheck plan into a Nested Loop unconditionally -- and come to think
of it, I'm pretty sure this kind of case is why
GetExistingLocalJoinPath() tries to using an existing path: it wants a
path that isn't going suck if it gets executed.

Yeah, but I don't think the above example is good enough to explain
that, because I think the bar/baz join would produce at most one tuple
in an EPQ recheck since we would have only one EPQ tuple for both bar
and baz in that recheck, and the join is inner. I think such an example
would probably be given e.g., by a modified version of the SQL where we
have a full join of bar and baz, not an inner join.

After some thought, it seems that there's a much simpler way that we
could fix the problem you identified in that original email: if the
EPQ path isn't properly sorted, have postgres_fdw's
add_paths_with_pathkeys_for_rel stick a Sort node on top of it. I
tried this and it does indeed fix Jeff Janes' initial test case.
Patch attached.

The patch looks good to me.

One could do better here if one wanted to revise
GetExistingLocalJoinPath() to take Last *pathkeys as an additional
argument; we could then try to find the optimal EPQ path for each
possible set of sortkeys. But I don't feel the need to work on that
right now, and it would be nicer not to back-patch a change to the
signature of GetExistingLocalJoinPath(), as has already been noted.

That makes sense to me.

The only problem that
*has actually been demonstrated* is that we can end up using as an EPQ
path something that doesn't generate the right sort order, and that is
easily fixed by making it do so, which the attached patch does. So I
propose we commit and back-patch this and move on.

Agreed.

Best regards,
Etsuro Fujita

#105

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Etsuro Fujita (#104)

Re: [HACKERS] postgres_fdw bug in 9.6

On Mon, Jan 15, 2018 at 3:09 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Yeah, but I don't think the above example is good enough to explain that,
because I think the bar/baz join would produce at most one tuple in an EPQ
recheck since we would have only one EPQ tuple for both bar and baz in that
recheck, and the join is inner. I think such an example would probably be
given e.g., by a modified version of the SQL where we have a full join of
bar and baz, not an inner join.

Hmm, I was thinking that bar and baz wouldn't be constrained to return
just one tuple in that case, but I'm wrong: there would just be one
tuple per relation in that case. However, that would also be true for
a full join, wouldn't it?

Regardless of that, the patch fixes the reported problem with very
little code change, and somebody can always improve it further later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#106

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Robert Haas (#103)

Re: [HACKERS] postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

I started to look at this patch again today and got cold feet. It
seems to me that this entire design on the posted patch is based on
your remarks in /messages/by-id/13242.1481582736@sss.pgh.pa.us --

# One way that we could make things better is to rely on the knowledge
# that EPQ isn't asked to evaluate joins for more than one row per input
# relation, and therefore using merge or hash join technology is really
# overkill. We could make a tree of simple nestloop joins, which aren't
# going to care about sort order, if we could identify the correct join
# clauses to apply.

However, those remarks are almost entirely wrong.

Hm, one of us is confused. Possibly it's me, but:

It's true that the
relations for which we have EPQ tuples will only ever output a single
tuple,

Yup, and that is all of them. We should have a tuple identity
(ctid or whole-row) for every base relation in an EPQ test.

but because of what you said in the quoted text above, this
does NOT imply that the recheck plan never needs to worry about
dealing with a large number of rows, as the following example shows.

This example doesn't prove anything whatsoever about what happens during
EPQ, though, or at least it shouldn't. If we're examining more than one
row then we're doing it wrong, because the point of the EPQ evaluation
is only to determine whether one single joined row (still) meets the
query conditions.

After some thought, it seems that there's a much simpler way that we
could fix the problem you identified in that original email: if the
EPQ path isn't properly sorted, have postgres_fdw's
add_paths_with_pathkeys_for_rel stick a Sort node on top of it. I
tried this and it does indeed fix Jeff Janes' initial test case.

Hm. Simple is certainly good, but if there's multiple rows coming
back during an EPQ recheck then I think we have a performance problem.

regards, tom lane

#107

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Tom Lane (#106)

Re: [HACKERS] postgres_fdw bug in 9.6

On Mon, Jan 15, 2018 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After some thought, it seems that there's a much simpler way that we
could fix the problem you identified in that original email: if the
EPQ path isn't properly sorted, have postgres_fdw's
add_paths_with_pathkeys_for_rel stick a Sort node on top of it. I
tried this and it does indeed fix Jeff Janes' initial test case.

Hm. Simple is certainly good, but if there's multiple rows coming
back during an EPQ recheck then I think we have a performance problem.

You are correct ... I was wrong about that part, and said so in an
email on this thread sent about 45 minutes before yours. However, I
still think the patch is a good fix for the immediate issue, unless
you see some problem with it. It's simple and back-patchable and does
not preclude further work anybody, including you, might want to do in
the future.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#108

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Robert Haas (#107)

Re: [HACKERS] postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Jan 15, 2018 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm. Simple is certainly good, but if there's multiple rows coming
back during an EPQ recheck then I think we have a performance problem.

You are correct ... I was wrong about that part, and said so in an
email on this thread sent about 45 minutes before yours. However, I
still think the patch is a good fix for the immediate issue, unless
you see some problem with it. It's simple and back-patchable and does
not preclude further work anybody, including you, might want to do in
the future.

I spent some more time poking at this. I confirmed that there is *not*
a performance problem: as I'd thought, during an EPQ recheck each base
scan relation will return its single EPQ tuple, without any communication
with the remote server, whether or not the rel was marked FOR UPDATE.

There *is* a problem with GetExistingLocalJoinPath not honoring its
API spec: it will sometimes return join nests that include remote
joins at levels below the top, as I'd speculated to begin with.
This turns out not to be a problem for postgres_fdw, because any
such remote-join node will just recursively fob off EPQ checks
onto its own outerpath, so that eventually you get down to base
scan relations anyway, and everything works. However, because
GetExistingLocalJoinPath is claimed to be a general purpose function
useful for other FDWs, the fact that its API contract is a lie
seems to me to be a problem anyway. Another FDW might have a stronger
dependency on there not being remote sub-joins. (Reviewing the thread
history, I see that Ashutosh agreed this was possible at
<CAFjFpRfYETZLiKERpxCeAN1MsiJypsyNd6x9FQ7OWjY=_TGK2Q@mail.gmail.com>
and suggested that we just change the function's commentary to not
make promises it won't keep. Maybe that's enough, but I think just
walking away isn't enough.)

I'm also still pretty unhappy with the amount of useless planning work
caused by doing GetExistingLocalJoinPath during path creation. It strikes
me that we could likely replace the entire thing with some code that just
reconstructs the join node's output tuple during EPQ using the rowmark
data for all the base relations. Outer joins aren't really a problem:
we could tell which relations were replaced by nulls because the rowmark
values that bubbled up to the top went to nulls themselves. However,
that's a nontrivial amount of work and probably wouldn't result in
something we cared to back-patch, especially since it's not really a bug
fix.

Anyway, I agree with you that we know of no observable bug here except
for Jeff's original complaint, and forcing sorting of the EPQ paths
would be enough to fix that.

Your patch seems OK codewise, but I think the comment is not nearly
adequate. Maybe something like "The EPQ path must be at least as well
sorted as the path itself, in case it gets used as input to a mergejoin."

Also, a regression test case would be appropriate perhaps. I tried
to reproduce Jeff's complaint in the postgres_fdw regression database,
and it actually failed on the very first multi-way join I tried:

contrib_regression=# explain select * from ft1,ft2,ft4,ft5 where ft1.c1=ft2.c1 and ft1.c1=ft4.c1 and ft1.c1=ft5.c1 for update;
ERROR: outer pathkeys do not match mergeclauses

Also, for the archives' sake, here's an example showing that
GetExistingLocalJoinPath is not doing what it says on the tin:

contrib_regression=# set from_collapse_limit TO 1;
SET
contrib_regression=# set enable_hashjoin TO 0;
SET
contrib_regression=# set enable_mergejoin TO 0;
SET
contrib_regression=# explain select ft1.c1,ft4.* from ft1,ft2 left join ft4 on ft2.c1=ft4.c1,ft5,loc1 where ft1.c1=ft2.c1 and ft1.c1=ft5.c1 and ft1.c1=loc1.f1 for update of ft1,loc1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
LockRows (cost=105.72..781.42 rows=210 width=286)
-> Nested Loop (cost=105.72..779.32 rows=210 width=286)
Join Filter: (ft2.c1 = loc1.f1)
-> Seq Scan on loc1 (cost=0.00..22.70 rows=1270 width=10)
-> Materialize (cost=105.72..128.06 rows=33 width=179)
-> Foreign Scan (cost=105.72..127.89 rows=33 width=179)
Relations: ((public.ft1) INNER JOIN ((public.ft2) LEFT JOIN (public.ft4))) INNER JOIN (public.ft5)
-> Nested Loop (cost=233.62..721.53 rows=136 width=179)
Join Filter: (ft2.c1 = ft5.c1)
-> Nested Loop (cost=202.12..12631.43 rows=822 width=137)
Join Filter: (ft1.c1 = ft2.c1)
-> Foreign Scan on ft1 (cost=100.00..141.00 rows=1000 width=75)
-> Materialize (cost=102.12..162.48 rows=822 width=83)
-> Foreign Scan (cost=102.12..158.37 rows=822 width=83)
Relations: (public.ft2) LEFT JOIN (public.ft4)
-> Nested Loop Left Join (cost=200.00..856.78 rows=822 width=83)
Join Filter: (ft2.c1 = ft4.c1)
-> Foreign Scan on ft2 (cost=100.00..137.66 rows=822 width=54)
-> Materialize (cost=100.00..102.75 rows=50 width=54)
-> Foreign Scan on ft4 (cost=100.00..102.50 rows=50 width=54)
-> Materialize (cost=100.00..102.16 rows=33 width=43)
-> Foreign Scan on ft5 (cost=100.00..101.99 rows=33 width=43)
(22 rows)

I don't think this latter example needs to become a regression test
in this form. If we wanted to gear up an isolation test that
actually exercised EPQ with postgres_fdw, maybe it'd be useful
in that context, to prove that nested remote joins work for EPQ.

regards, tom lane

#109

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 8 years ago

In reply to: Robert Haas (#105)

Re: [HACKERS] postgres_fdw bug in 9.6

(2018/01/16 1:47), Robert Haas wrote:

On Mon, Jan 15, 2018 at 3:09 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Yeah, but I don't think the above example is good enough to explain that,
because I think the bar/baz join would produce at most one tuple in an EPQ
recheck since we would have only one EPQ tuple for both bar and baz in that
recheck, and the join is inner. I think such an example would probably be
given e.g., by a modified version of the SQL where we have a full join of
bar and baz, not an inner join.

Hmm, I was thinking that bar and baz wouldn't be constrained to return
just one tuple in that case, but I'm wrong: there would just be one
tuple per relation in that case. However, that would also be true for
a full join, wouldn't it?

Consider:

postgres=# create table bar (a int, b text);
postgres=# create table baz (a int, b text);
postgres=# insert into bar values (1, 'bar');
postgres=# insert into baz values (2, 'baz');
postgres=# select * from bar full join baz on bar.a = baz.a;
a | b | a | b
---+-----+---+-----
1 | bar | |
| | 2 | baz
(2 rows)

Both relations have one tuple, but the full join produces two join
tuples. I think it would be possible that something like this happens
when executing a local join plan for a foreign join that performs a full
join remotely.

Regardless of that, the patch fixes the reported problem with very
little code change, and somebody can always improve it further later.

Agreed.

Best regards,
Etsuro Fujita

#110

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Etsuro Fujita (#109)

Re: [HACKERS] postgres_fdw bug in 9.6

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

(2018/01/16 1:47), Robert Haas wrote:

Hmm, I was thinking that bar and baz wouldn't be constrained to return
just one tuple in that case, but I'm wrong: there would just be one
tuple per relation in that case. However, that would also be true for
a full join, wouldn't it?

Consider:

postgres=# create table bar (a int, b text);
postgres=# create table baz (a int, b text);
postgres=# insert into bar values (1, 'bar');
postgres=# insert into baz values (2, 'baz');
postgres=# select * from bar full join baz on bar.a = baz.a;
a | b | a | b
---+-----+---+-----
1 | bar | |
| | 2 | baz
(2 rows)

Both relations have one tuple, but the full join produces two join
tuples. I think it would be possible that something like this happens
when executing a local join plan for a foreign join that performs a full
join remotely.

Doesn't really matter though, does it? Each of those join rows will
be processed as a separate EPQ event.

regards, tom lane

#111

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 8 years ago

In reply to: Tom Lane (#110)

Re: [HACKERS] postgres_fdw bug in 9.6

(2018/01/16 11:17), Tom Lane wrote:

Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> writes:

(2018/01/16 1:47), Robert Haas wrote:

Hmm, I was thinking that bar and baz wouldn't be constrained to return
just one tuple in that case, but I'm wrong: there would just be one
tuple per relation in that case. However, that would also be true for
a full join, wouldn't it?

Consider:

postgres=# create table bar (a int, b text);
postgres=# create table baz (a int, b text);
postgres=# insert into bar values (1, 'bar');
postgres=# insert into baz values (2, 'baz');
postgres=# select * from bar full join baz on bar.a = baz.a;
a | b | a | b
---+-----+---+-----
1 | bar | |
| | 2 | baz
(2 rows)

Both relations have one tuple, but the full join produces two join
tuples. I think it would be possible that something like this happens
when executing a local join plan for a foreign join that performs a full
join remotely.

Doesn't really matter though, does it? Each of those join rows will
be processed as a separate EPQ event.

I assume that such a local join plan is executed as part of a FOR UPDATE
query like the one shown by Robert (the bar/baz foreign join part in
that query), so I am thinking that those join rows will be processed as
a single event.

Best regards,
Etsuro Fujita

#112

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 8 years ago

In reply to: Etsuro Fujita (#111)

Re: [HACKERS] postgres_fdw bug in 9.6

(2018/01/16 12:00), Etsuro Fujita wrote:

(2018/01/16 11:17), Tom Lane wrote:

Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> writes:

(2018/01/16 1:47), Robert Haas wrote:

Hmm, I was thinking that bar and baz wouldn't be constrained to return
just one tuple in that case, but I'm wrong: there would just be one
tuple per relation in that case. However, that would also be true for
a full join, wouldn't it?

Consider:

postgres=# create table bar (a int, b text);
postgres=# create table baz (a int, b text);
postgres=# insert into bar values (1, 'bar');
postgres=# insert into baz values (2, 'baz');
postgres=# select * from bar full join baz on bar.a = baz.a;
a | b | a | b
---+-----+---+-----
1 | bar | |
| | 2 | baz
(2 rows)

Both relations have one tuple, but the full join produces two join
tuples. I think it would be possible that something like this happens
when executing a local join plan for a foreign join that performs a full
join remotely.

Doesn't really matter though, does it? Each of those join rows will
be processed as a separate EPQ event.

I assume that such a local join plan is executed as part of a FOR UPDATE
query like the one shown by Robert (the bar/baz foreign join part in
that query), so I am thinking that those join rows will be processed as
a single event.

I realized I am wrong; the local join execution plan would never produce
multiple tuples in a single event.

Best regards,
Etsuro Fujita

#113

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 8 years ago

In reply to: Tom Lane (#108)

Re: [HACKERS] postgres_fdw bug in 9.6

(2018/01/16 6:38), Tom Lane wrote:

Robert Haas<robertmhaas@gmail.com> writes:

On Mon, Jan 15, 2018 at 12:31 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Hm. Simple is certainly good, but if there's multiple rows coming
back during an EPQ recheck then I think we have a performance problem.

You are correct ... I was wrong about that part, and said so in an
email on this thread sent about 45 minutes before yours.

I was wrong too.

However, I
still think the patch is a good fix for the immediate issue, unless
you see some problem with it. It's simple and back-patchable and does
not preclude further work anybody, including you, might want to do in
the future.

I still think so too.

I'm also still pretty unhappy with the amount of useless planning work
caused by doing GetExistingLocalJoinPath during path creation. It strikes
me that we could likely replace the entire thing with some code that just
reconstructs the join node's output tuple during EPQ using the rowmark
data for all the base relations.

The join tuple would be reconstructed without a local join execution plan?

Outer joins aren't really a problem:
we could tell which relations were replaced by nulls because the rowmark
values that bubbled up to the top went to nulls themselves.

Yeah, but we would need null-extension or projection...

However,
that's a nontrivial amount of work and probably wouldn't result in
something we cared to back-patch, especially since it's not really a bug
fix.

What do you think about a future extension to parameterized foreign paths?

Best regards,
Etsuro Fujita

#114

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Etsuro Fujita (#113)

Re: [HACKERS] postgres_fdw bug in 9.6

Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:

(2018/01/16 6:38), Tom Lane wrote:

I'm also still pretty unhappy with the amount of useless planning work
caused by doing GetExistingLocalJoinPath during path creation. It strikes
me that we could likely replace the entire thing with some code that just
reconstructs the join node's output tuple during EPQ using the rowmark
data for all the base relations.

The join tuple would be reconstructed without a local join execution plan?

It seems clear to me that this could be done in principle.

Outer joins aren't really a problem:
we could tell which relations were replaced by nulls because the rowmark
values that bubbled up to the top went to nulls themselves.

Yeah, but we would need null-extension or projection...

Yes, the question is how complicated it would be to get the projection
right. As long as the join node's scan tuple only needs to contain values
(possibly whole-row Vars) propagated up from base relations and possibly
nulled out, it seems like it's not very hard. But if we ever try to
implement actual calculation of expressions in intermediate join nodes,
it could get to be a mess. (I haven't looked at the aggregate-pushdown
patch in this connection, although probably that's not a hazard since
we don't support aggregation together with FOR UPDATE. Perhaps we could
likewise simply refuse to push expressions into foreign join nests if
FOR UPDATE is active.)

As against that, the existing EPQ implementation is pretty messy already,
on top of consuming a lot of planning cycles.

What do you think about a future extension to parameterized foreign paths?

Don't have an opinion on that; but it doesn't seem like EPQ is the
principal blocking factor there. You could always skip generating
parameterized foreign paths if FOR UPDATE is needed.

regards, tom lane

#115

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Tom Lane (#108)

1 attachment(s)

Re: [HACKERS] postgres_fdw bug in 9.6

On Mon, Jan 15, 2018 at 4:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There *is* a problem with GetExistingLocalJoinPath not honoring its
API spec: it will sometimes return join nests that include remote
joins at levels below the top, as I'd speculated to begin with.
This turns out not to be a problem for postgres_fdw, because any
such remote-join node will just recursively fob off EPQ checks
onto its own outerpath, so that eventually you get down to base
scan relations anyway, and everything works. However, because
GetExistingLocalJoinPath is claimed to be a general purpose function
useful for other FDWs, the fact that its API contract is a lie
seems to me to be a problem anyway.

Yeah, that's not good. The idea, as the comments inside
GetExistingLocalJoinPath say, is that if we find an existing join
whose inner or outer side is a ForeignPath for a join, we fish out the
outer path which should be that subpath's EPQ path. I'm not 100% sure
why that's not happening in your example, but my first guess is that
it's because the inner path has a Materialize node on top of the
join-ForeignPath, and GetExistingLocalJoinPath just sees the
Materialize node and concludes that it doesn't need to look further.
If that's correct, bet we could fix that by telling it to drill
through any MaterialPath it sees and use the sub-path instead; given
that there is only one tuple per relation, the Materialize step can't
be important for performance.

I'm also still pretty unhappy with the amount of useless planning work
caused by doing GetExistingLocalJoinPath during path creation. It strikes
me that we could likely replace the entire thing with some code that just
reconstructs the join node's output tuple during EPQ using the rowmark
data for all the base relations. Outer joins aren't really a problem:
we could tell which relations were replaced by nulls because the rowmark
values that bubbled up to the top went to nulls themselves. However,
that's a nontrivial amount of work and probably wouldn't result in
something we cared to back-patch, especially since it's not really a bug
fix.

Yes, I believe something like this would be possible. I did consider
that approach, but this seemed easier to implement and, as it was, it
took two release cycles to get join pushdown committed. And multiple
hundreds of emails, most of them about EPQ, if I remember correctly.
I'm happy to have somebody implement something better, but I will
probably get a bit hypoxic if I hold my breath waiting for it to
happen.

Your patch seems OK codewise, but I think the comment is not nearly
adequate. Maybe something like "The EPQ path must be at least as well
sorted as the path itself, in case it gets used as input to a mergejoin."

Also, a regression test case would be appropriate perhaps. I tried
to reproduce Jeff's complaint in the postgres_fdw regression database,
and it actually failed on the very first multi-way join I tried:

contrib_regression=# explain select * from ft1,ft2,ft4,ft5 where ft1.c1=ft2.c1 and ft1.c1=ft4.c1 and ft1.c1=ft5.c1 for update;
ERROR: outer pathkeys do not match mergeclauses

Thanks for the review. Updated patch attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

sort-epq-path-if-needed-v2.patchapplication/octet-stream; name=sort-epq-path-if-needed-v2.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 993219133a..f88e0a2d52 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1371,24 +1371,27 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
                Remote SQL: SELECT s8.c1, s8.c2, s9.c1, s9.c2 FROM ((SELECT c1, ROW(c1, c2, c3) FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND ((c1 <= 60))) s8(c1, c2) FULL JOIN (SELECT c1, ROW(c1, c2, c3) FROM "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) s9(c1, c2) ON (((s8.c1 = s9.c1)))) WHERE (((s8.c1 IS NULL) OR (s8.c1 IS NOT NULL))) ORDER BY s8.c1 ASC NULLS LAST, s9.c1 ASC NULLS LAST
-               ->  Hash Full Join
+               ->  Sort
                      Output: ft4.c1, ft4.*, ft5.c1, ft5.*
-                     Hash Cond: (ft4.c1 = ft5.c1)
-                     Filter: ((ft4.c1 IS NULL) OR (ft4.c1 IS NOT NULL))
-                     ->  Foreign Scan on public.ft4
-                           Output: ft4.c1, ft4.*
-                           Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND ((c1 <= 60))
-                     ->  Hash
-                           Output: ft5.c1, ft5.*
-                           ->  Foreign Scan on public.ft5
+                     Sort Key: ft4.c1, ft5.c1
+                     ->  Hash Full Join
+                           Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+                           Hash Cond: (ft4.c1 = ft5.c1)
+                           Filter: ((ft4.c1 IS NULL) OR (ft4.c1 IS NOT NULL))
+                           ->  Foreign Scan on public.ft4
+                                 Output: ft4.c1, ft4.*
+                                 Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND ((c1 <= 60))
+                           ->  Hash
                                  Output: ft5.c1, ft5.*
-                                 Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))
+                                 ->  Foreign Scan on public.ft5
+                                       Output: ft5.c1, ft5.*
+                                       Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))
          ->  Materialize
                Output: "T 3".c1, "T 3".ctid
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(25 rows)
+(28 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
@@ -1701,22 +1704,25 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
-               ->  Merge Join
+               ->  Sort
                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
-                     Merge Cond: (t1.c1 = t2.c1)
-                     ->  Sort
-                           Output: t1.c1, t1.c3, t1.*
-                           Sort Key: t1.c1
-                           ->  Foreign Scan on public.ft1 t1
+                     Sort Key: t1.c3, t1.c1
+                     ->  Merge Join
+                           Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
+                           Merge Cond: (t1.c1 = t2.c1)
+                           ->  Sort
                                  Output: t1.c1, t1.c3, t1.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
-                     ->  Sort
-                           Output: t2.c1, t2.*
-                           Sort Key: t2.c1
-                           ->  Foreign Scan on public.ft2 t2
+                                 Sort Key: t1.c1
+                                 ->  Foreign Scan on public.ft1 t1
+                                       Output: t1.c1, t1.c3, t1.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
+                           ->  Sort
                                  Output: t2.c1, t2.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
-(23 rows)
+                                 Sort Key: t2.c1
+                                 ->  Foreign Scan on public.ft2 t2
+                                       Output: t2.c1, t2.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(26 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
  c1  | c1  
@@ -1745,22 +1751,25 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
-               ->  Merge Join
+               ->  Sort
                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
-                     Merge Cond: (t1.c1 = t2.c1)
-                     ->  Sort
-                           Output: t1.c1, t1.c3, t1.*
-                           Sort Key: t1.c1
-                           ->  Foreign Scan on public.ft1 t1
+                     Sort Key: t1.c3, t1.c1
+                     ->  Merge Join
+                           Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
+                           Merge Cond: (t1.c1 = t2.c1)
+                           ->  Sort
                                  Output: t1.c1, t1.c3, t1.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
-                     ->  Sort
-                           Output: t2.c1, t2.*
-                           Sort Key: t2.c1
-                           ->  Foreign Scan on public.ft2 t2
+                                 Sort Key: t1.c1
+                                 ->  Foreign Scan on public.ft1 t1
+                                       Output: t1.c1, t1.c3, t1.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
+                           ->  Sort
                                  Output: t2.c1, t2.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
-(23 rows)
+                                 Sort Key: t2.c1
+                                 ->  Foreign Scan on public.ft2 t2
+                                       Output: t2.c1, t2.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
+(26 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
  c1  | c1  
@@ -1790,22 +1799,25 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
-               ->  Merge Join
+               ->  Sort
                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
-                     Merge Cond: (t1.c1 = t2.c1)
-                     ->  Sort
-                           Output: t1.c1, t1.c3, t1.*
-                           Sort Key: t1.c1
-                           ->  Foreign Scan on public.ft1 t1
+                     Sort Key: t1.c3, t1.c1
+                     ->  Merge Join
+                           Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
+                           Merge Cond: (t1.c1 = t2.c1)
+                           ->  Sort
                                  Output: t1.c1, t1.c3, t1.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
-                     ->  Sort
-                           Output: t2.c1, t2.*
-                           Sort Key: t2.c1
-                           ->  Foreign Scan on public.ft2 t2
+                                 Sort Key: t1.c1
+                                 ->  Foreign Scan on public.ft1 t1
+                                       Output: t1.c1, t1.c3, t1.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
+                           ->  Sort
                                  Output: t2.c1, t2.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
-(23 rows)
+                                 Sort Key: t2.c1
+                                 ->  Foreign Scan on public.ft2 t2
+                                       Output: t2.c1, t2.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+(26 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
  c1  | c1  
@@ -1834,22 +1846,25 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
                Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
-               ->  Merge Join
+               ->  Sort
                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
-                     Merge Cond: (t1.c1 = t2.c1)
-                     ->  Sort
-                           Output: t1.c1, t1.c3, t1.*
-                           Sort Key: t1.c1
-                           ->  Foreign Scan on public.ft1 t1
+                     Sort Key: t1.c3, t1.c1
+                     ->  Merge Join
+                           Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
+                           Merge Cond: (t1.c1 = t2.c1)
+                           ->  Sort
                                  Output: t1.c1, t1.c3, t1.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
-                     ->  Sort
-                           Output: t2.c1, t2.*
-                           Sort Key: t2.c1
-                           ->  Foreign Scan on public.ft2 t2
+                                 Sort Key: t1.c1
+                                 ->  Foreign Scan on public.ft1 t1
+                                       Output: t1.c1, t1.c3, t1.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
+                           ->  Sort
                                  Output: t2.c1, t2.*
-                                 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
-(23 rows)
+                                 Sort Key: t2.c1
+                                 ->  Foreign Scan on public.ft2 t2
+                                       Output: t2.c1, t2.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
+(26 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
  c1  | c1  
@@ -2314,6 +2329,75 @@ SELECT ft5, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2 FROM ft5 left join ft4 on ft5
  (30,31,AAA030) | 30 | 31 | AAA030 | 30 | 31
 (4 rows)
 
+-- multi-way join involving multiple merge joins
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+    AND ft1.c1 = ft5.c1 FOR UPDATE;
+                                                                                                                                                                                                                                                                                                                                                                                                         QUERY PLAN                                                                                                                                                                                                                                                                                                                                                                                                         
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ LockRows
+   Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
+   ->  Foreign Scan
+         Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
+         Relations: (((public.ft1) INNER JOIN (public.ft2)) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
+         Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END, r3.c1, r3.c2, r3.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r4.c1, r4.c2, r4.c3, CASE WHEN (r4.*)::text IS NOT NULL THEN ROW(r4.c1, r4.c2, r4.c3) END FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r3 ON (((r1."C 1" = r3.c1)))) INNER JOIN "S 1"."T 4" r4 ON (((r1."C 1" = r4.c1)))) FOR UPDATE OF r1 FOR UPDATE OF r2 FOR UPDATE OF r3 FOR UPDATE OF r4
+         ->  Merge Join
+               Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*, ft5.c1, ft5.c2, ft5.c3, ft5.*
+               Merge Cond: (ft1.c1 = ft5.c1)
+               ->  Merge Join
+                     Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*
+                     Merge Cond: (ft1.c1 = ft4.c1)
+                     ->  Merge Join
+                           Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
+                           Merge Cond: (ft1.c1 = ft2.c1)
+                           ->  Sort
+                                 Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
+                                 Sort Key: ft1.c1
+                                 ->  Foreign Scan on public.ft1
+                                       Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
+                           ->  Sort
+                                 Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
+                                 Sort Key: ft2.c1
+                                 ->  Foreign Scan on public.ft2
+                                       Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
+                                       Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
+                     ->  Sort
+                           Output: ft4.c1, ft4.c2, ft4.c3, ft4.*
+                           Sort Key: ft4.c1
+                           ->  Foreign Scan on public.ft4
+                                 Output: ft4.c1, ft4.c2, ft4.c3, ft4.*
+                                 Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" FOR UPDATE
+               ->  Sort
+                     Output: ft5.c1, ft5.c2, ft5.c3, ft5.*
+                     Sort Key: ft5.c1
+                     ->  Foreign Scan on public.ft5
+                           Output: ft5.c1, ft5.c2, ft5.c3, ft5.*
+                           Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" FOR UPDATE
+(39 rows)
+
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+    AND ft1.c1 = ft5.c1 FOR UPDATE;
+ c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |   c3   | c1 | c2 |   c3   
+----+----+-------+------------------------------+--------------------------+----+------------+-----+----+----+-------+------------------------------+--------------------------+----+------------+-----+----+----+--------+----+----+--------
+  6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo |  6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
+ 12 |  2 | 00012 | Tue Jan 13 00:00:00 1970 PST | Tue Jan 13 00:00:00 1970 | 2  | 2          | foo | 12 |  2 | 00012 | Tue Jan 13 00:00:00 1970 PST | Tue Jan 13 00:00:00 1970 | 2  | 2          | foo | 12 | 13 | AAA012 | 12 | 13 | AAA012
+ 18 |  8 | 00018 | Mon Jan 19 00:00:00 1970 PST | Mon Jan 19 00:00:00 1970 | 8  | 8          | foo | 18 |  8 | 00018 | Mon Jan 19 00:00:00 1970 PST | Mon Jan 19 00:00:00 1970 | 8  | 8          | foo | 18 | 19 | AAA018 | 18 | 19 | 
+ 24 |  4 | 00024 | Sun Jan 25 00:00:00 1970 PST | Sun Jan 25 00:00:00 1970 | 4  | 4          | foo | 24 |  4 | 00024 | Sun Jan 25 00:00:00 1970 PST | Sun Jan 25 00:00:00 1970 | 4  | 4          | foo | 24 | 25 | AAA024 | 24 | 25 | AAA024
+ 30 |  0 | 00030 | Sat Jan 31 00:00:00 1970 PST | Sat Jan 31 00:00:00 1970 | 0  | 0          | foo | 30 |  0 | 00030 | Sat Jan 31 00:00:00 1970 PST | Sat Jan 31 00:00:00 1970 | 0  | 0          | foo | 30 | 31 | AAA030 | 30 | 31 | AAA030
+ 36 |  6 | 00036 | Fri Feb 06 00:00:00 1970 PST | Fri Feb 06 00:00:00 1970 | 6  | 6          | foo | 36 |  6 | 00036 | Fri Feb 06 00:00:00 1970 PST | Fri Feb 06 00:00:00 1970 | 6  | 6          | foo | 36 | 37 | AAA036 | 36 | 37 | 
+ 42 |  2 | 00042 | Thu Feb 12 00:00:00 1970 PST | Thu Feb 12 00:00:00 1970 | 2  | 2          | foo | 42 |  2 | 00042 | Thu Feb 12 00:00:00 1970 PST | Thu Feb 12 00:00:00 1970 | 2  | 2          | foo | 42 | 43 | AAA042 | 42 | 43 | AAA042
+ 48 |  8 | 00048 | Wed Feb 18 00:00:00 1970 PST | Wed Feb 18 00:00:00 1970 | 8  | 8          | foo | 48 |  8 | 00048 | Wed Feb 18 00:00:00 1970 PST | Wed Feb 18 00:00:00 1970 | 8  | 8          | foo | 48 | 49 | AAA048 | 48 | 49 | AAA048
+ 54 |  4 | 00054 | Tue Feb 24 00:00:00 1970 PST | Tue Feb 24 00:00:00 1970 | 4  | 4          | foo | 54 |  4 | 00054 | Tue Feb 24 00:00:00 1970 PST | Tue Feb 24 00:00:00 1970 | 4  | 4          | foo | 54 | 55 | AAA054 | 54 | 55 | 
+ 60 |  0 | 00060 | Mon Mar 02 00:00:00 1970 PST | Mon Mar 02 00:00:00 1970 | 0  | 0          | foo | 60 |  0 | 00060 | Mon Mar 02 00:00:00 1970 PST | Mon Mar 02 00:00:00 1970 | 0  | 0          | foo | 60 | 61 | AAA060 | 60 | 61 | AAA060
+ 66 |  6 | 00066 | Sun Mar 08 00:00:00 1970 PST | Sun Mar 08 00:00:00 1970 | 6  | 6          | foo | 66 |  6 | 00066 | Sun Mar 08 00:00:00 1970 PST | Sun Mar 08 00:00:00 1970 | 6  | 6          | foo | 66 | 67 | AAA066 | 66 | 67 | AAA066
+ 72 |  2 | 00072 | Sat Mar 14 00:00:00 1970 PST | Sat Mar 14 00:00:00 1970 | 2  | 2          | foo | 72 |  2 | 00072 | Sat Mar 14 00:00:00 1970 PST | Sat Mar 14 00:00:00 1970 | 2  | 2          | foo | 72 | 73 | AAA072 | 72 | 73 | 
+ 78 |  8 | 00078 | Fri Mar 20 00:00:00 1970 PST | Fri Mar 20 00:00:00 1970 | 8  | 8          | foo | 78 |  8 | 00078 | Fri Mar 20 00:00:00 1970 PST | Fri Mar 20 00:00:00 1970 | 8  | 8          | foo | 78 | 79 | AAA078 | 78 | 79 | AAA078
+ 84 |  4 | 00084 | Thu Mar 26 00:00:00 1970 PST | Thu Mar 26 00:00:00 1970 | 4  | 4          | foo | 84 |  4 | 00084 | Thu Mar 26 00:00:00 1970 PST | Thu Mar 26 00:00:00 1970 | 4  | 4          | foo | 84 | 85 | AAA084 | 84 | 85 | AAA084
+ 90 |  0 | 00090 | Wed Apr 01 00:00:00 1970 PST | Wed Apr 01 00:00:00 1970 | 0  | 0          | foo | 90 |  0 | 00090 | Wed Apr 01 00:00:00 1970 PST | Wed Apr 01 00:00:00 1970 | 0  | 0          | foo | 90 | 91 | AAA090 | 90 | 91 | 
+ 96 |  6 | 00096 | Tue Apr 07 00:00:00 1970 PST | Tue Apr 07 00:00:00 1970 | 6  | 6          | foo | 96 |  6 | 00096 | Tue Apr 07 00:00:00 1970 PST | Tue Apr 07 00:00:00 1970 | 6  | 6          | foo | 96 | 97 | AAA096 | 96 | 97 | AAA096
+(16 rows)
+
 -- check join pushdown in situations where multiple userids are involved
 CREATE ROLE regress_view_owner SUPERUSER;
 CREATE USER MAPPING FOR regress_view_owner SERVER loopback;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index dc302ca789..7ff43337a9 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4330,10 +4330,26 @@ add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 		Cost		startup_cost;
 		Cost		total_cost;
 		List	   *useful_pathkeys = lfirst(lc);
+		Path	   *sorted_epq_path;
 
 		estimate_path_cost_size(root, rel, NIL, useful_pathkeys,
 								&rows, &width, &startup_cost, &total_cost);
 
+		/*
+		 * The EPQ path must be at least as well sorted as the path itself,
+		 * in case it gets used as input to a mergejoin.
+		 */
+		sorted_epq_path = epq_path;
+		if (sorted_epq_path != NULL &&
+			!pathkeys_contained_in(useful_pathkeys,
+								   sorted_epq_path->pathkeys))
+			sorted_epq_path = (Path *)
+				create_sort_path(root,
+								 rel,
+								 sorted_epq_path,
+								 useful_pathkeys,
+								 -1.0);
+
 		add_path(rel, (Path *)
 				 create_foreignscan_path(root, rel,
 										 NULL,
@@ -4342,7 +4358,7 @@ add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 										 total_cost,
 										 useful_pathkeys,
 										 NULL,
-										 epq_path,
+										 sorted_epq_path,
 										 NIL));
 	}
 }
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2bdf7d8c1e..e73c258ff4 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -559,6 +559,13 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT ft5, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2 FROM ft5 left join ft4 on ft5.c1 = ft4.c1 WHERE ft4.c1 BETWEEN 10 and 30 ORDER BY ft5.c1, ft4.c1;
 SELECT ft5, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2 FROM ft5 left join ft4 on ft5.c1 = ft4.c1 WHERE ft4.c1 BETWEEN 10 and 30 ORDER BY ft5.c1, ft4.c1;
 
+-- multi-way join involving multiple merge joins
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+    AND ft1.c1 = ft5.c1 FOR UPDATE;
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+    AND ft1.c1 = ft5.c1 FOR UPDATE;
+
 -- check join pushdown in situations where multiple userids are involved
 CREATE ROLE regress_view_owner SUPERUSER;
 CREATE USER MAPPING FOR regress_view_owner SERVER loopback;

#116

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Robert Haas (#115)

Re: [HACKERS] postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

Thanks for the review. Updated patch attached.

Looks OK to me. Would it be worth annotating the added regression test
case with a comment that this once caused EPQ-related planning problems?

regards, tom lane

#117

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Tom Lane (#116)

Re: [HACKERS] postgres_fdw bug in 9.6

On Wed, Jan 17, 2018 at 3:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Thanks for the review. Updated patch attached.

Looks OK to me. Would it be worth annotating the added regression test
case with a comment that this once caused EPQ-related planning problems?

I tend to think somebody who is curious about the origin of any
particular test can just use 'git blame' and/or 'git log -Gwhatever'
to figure out which commits added it, and that therefore it's not
worth including that in the comment explicitly. But I don't care
deeply.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#118

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Robert Haas (#117)

Re: [HACKERS] postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jan 17, 2018 at 3:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Looks OK to me. Would it be worth annotating the added regression test
case with a comment that this once caused EPQ-related planning problems?

I tend to think somebody who is curious about the origin of any
particular test can just use 'git blame' and/or 'git log -Gwhatever'
to figure out which commits added it, and that therefore it's not
worth including that in the comment explicitly. But I don't care
deeply.

It's debatable perhaps -- I tend to err in the other direction.
But likewise, I don't care deeply. Just push it ...

regards, tom lane

#119

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Tom Lane (#118)

Re: [HACKERS] postgres_fdw bug in 9.6

On Wed, Jan 17, 2018 at 4:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's debatable perhaps -- I tend to err in the other direction.
But likewise, I don't care deeply. Just push it ...

Done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#120

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 8 years ago

In reply to: Robert Haas (#119)

Re: [HACKERS] postgres_fdw bug in 9.6

(2018/01/18 7:09), Robert Haas wrote:

On Wed, Jan 17, 2018 at 4:08 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

It's debatable perhaps -- I tend to err in the other direction.
But likewise, I don't care deeply. Just push it ...

Done.

Thanks for taking the time to work on this issue, Robert and Tom!

Best regards,
Etsuro Fujita

#121

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 8 years ago

In reply to: Etsuro Fujita (#120)

1 attachment(s)

Re: [HACKERS] postgres_fdw bug in 9.6

(2018/01/18 15:40), Etsuro Fujita wrote:

(2018/01/18 7:09), Robert Haas wrote:

On Wed, Jan 17, 2018 at 4:08 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

It's debatable perhaps -- I tend to err in the other direction.
But likewise, I don't care deeply. Just push it ...

Done.

I noticed that this test case added by the patch is not appropriate:

+-- multi-way join involving multiple merge joins
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+    AND ft1.c1 = ft5.c1 FOR UPDATE;
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+    AND ft1.c1 = ft5.c1 FOR UPDATE;

because it doesn't inject extra Sort nodes into EPQ recheck plans, so it
works well without the fix. I modified this to inject a Sort into the
recheck plan of the very first foreign join. Attached is a patch for that.

Best regards,
Etsuro Fujita

Attachments:

postgres-fdw-regress.patchtext/x-diff; name=postgres-fdw-regress.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 2330,2367 **** SELECT ft5, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2 FROM ft5 left join ft4 on ft5
  (4 rows)
  
  -- multi-way join involving multiple merge joins
  EXPLAIN (VERBOSE, COSTS OFF)
! SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
!     AND ft1.c1 = ft5.c1 FOR UPDATE;
!                                                                                                                                                                                                                                                                                                                                                                                                          QUERY PLAN                                                                                                                                                                                                                                                                                                                                                                                                         
! ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   LockRows
     Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
     ->  Foreign Scan
           Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
           Relations: (((public.ft1) INNER JOIN (public.ft2)) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
!          Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END, r3.c1, r3.c2, r3.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r4.c1, r4.c2, r4.c3, CASE WHEN (r4.*)::text IS NOT NULL THEN ROW(r4.c1, r4.c2, r4.c3) END FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r3 ON (((r1."C 1" = r3.c1)))) INNER JOIN "S 1"."T 4" r4 ON (((r1."C 1" = r4.c1)))) FOR UPDATE OF r1 FOR UPDATE OF r2 FOR UPDATE OF r3 FOR UPDATE OF r4
           ->  Merge Join
                 Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*, ft5.c1, ft5.c2, ft5.c3, ft5.*
!                Merge Cond: (ft1.c1 = ft5.c1)
                 ->  Merge Join
                       Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*
!                      Merge Cond: (ft1.c1 = ft4.c1)
!                      ->  Merge Join
                             Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
!                            Merge Cond: (ft1.c1 = ft2.c1)
!                            ->  Sort
!                                  Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
!                                  Sort Key: ft1.c1
!                                  ->  Foreign Scan on public.ft1
                                         Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                            ->  Sort
!                                  Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
!                                  Sort Key: ft2.c1
!                                  ->  Foreign Scan on public.ft2
                                         Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
                       ->  Sort
                             Output: ft4.c1, ft4.c2, ft4.c3, ft4.*
                             Sort Key: ft4.c1
--- 2330,2371 ----
  (4 rows)
  
  -- multi-way join involving multiple merge joins
+ SET enable_nestloop TO false;
+ SET enable_hashjoin TO false;
  EXPLAIN (VERBOSE, COSTS OFF)
! SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c2 = ft4.c1
!     AND ft1.c2 = ft5.c1 AND ft1.c1 < 100 AND ft2.c1 < 100 FOR UPDATE;
!                                                                                                                                                                                                                                                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                                                                                                                                                                                                                                                             
! --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   LockRows
     Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
     ->  Foreign Scan
           Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
           Relations: (((public.ft1) INNER JOIN (public.ft2)) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
!          Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END, r3.c1, r3.c2, r3.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r4.c1, r4.c2, r4.c3, CASE WHEN (r4.*)::text IS NOT NULL THEN ROW(r4.c1, r4.c2, r4.c3) END FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r2."C 1" < 100)) AND ((r1."C 1" < 100)))) INNER JOIN "S 1"."T 3" r3 ON (((r1.c2 = r3.c1)))) INNER JOIN "S 1"."T 4" r4 ON (((r1.c2 = r4.c1)))) FOR UPDATE OF r1 FOR UPDATE OF r2 FOR UPDATE OF r3 FOR UPDATE OF r4
           ->  Merge Join
                 Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*, ft5.c1, ft5.c2, ft5.c3, ft5.*
!                Merge Cond: (ft1.c2 = ft5.c1)
                 ->  Merge Join
                       Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*
!                      Merge Cond: (ft1.c2 = ft4.c1)
!                      ->  Sort
                             Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
!                            Sort Key: ft1.c2
!                            ->  Merge Join
!                                  Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
!                                  Merge Cond: (ft1.c1 = ft2.c1)
!                                  ->  Sort
                                         Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
!                                        Sort Key: ft1.c1
!                                        ->  Foreign Scan on public.ft1
!                                              Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*
!                                              Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 100)) FOR UPDATE
!                                  ->  Materialize
                                         Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
!                                        ->  Foreign Scan on public.ft2
!                                              Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*
!                                              Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 100)) ORDER BY "C 1" ASC NULLS LAST FOR UPDATE
                       ->  Sort
                             Output: ft4.c1, ft4.c2, ft4.c3, ft4.*
                             Sort Key: ft4.c1
***************
*** 2374,2403 **** SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
                       ->  Foreign Scan on public.ft5
                             Output: ft5.c1, ft5.c2, ft5.c3, ft5.*
                             Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" FOR UPDATE
! (39 rows)
  
! SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
!     AND ft1.c1 = ft5.c1 FOR UPDATE;
   c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |   c3   | c1 | c2 |   c3   
  ----+----+-------+------------------------------+--------------------------+----+------------+-----+----+----+-------+------------------------------+--------------------------+----+------------+-----+----+----+--------+----+----+--------
    6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo |  6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  12 |  2 | 00012 | Tue Jan 13 00:00:00 1970 PST | Tue Jan 13 00:00:00 1970 | 2  | 2          | foo | 12 |  2 | 00012 | Tue Jan 13 00:00:00 1970 PST | Tue Jan 13 00:00:00 1970 | 2  | 2          | foo | 12 | 13 | AAA012 | 12 | 13 | AAA012
!  18 |  8 | 00018 | Mon Jan 19 00:00:00 1970 PST | Mon Jan 19 00:00:00 1970 | 8  | 8          | foo | 18 |  8 | 00018 | Mon Jan 19 00:00:00 1970 PST | Mon Jan 19 00:00:00 1970 | 8  | 8          | foo | 18 | 19 | AAA018 | 18 | 19 | 
!  24 |  4 | 00024 | Sun Jan 25 00:00:00 1970 PST | Sun Jan 25 00:00:00 1970 | 4  | 4          | foo | 24 |  4 | 00024 | Sun Jan 25 00:00:00 1970 PST | Sun Jan 25 00:00:00 1970 | 4  | 4          | foo | 24 | 25 | AAA024 | 24 | 25 | AAA024
!  30 |  0 | 00030 | Sat Jan 31 00:00:00 1970 PST | Sat Jan 31 00:00:00 1970 | 0  | 0          | foo | 30 |  0 | 00030 | Sat Jan 31 00:00:00 1970 PST | Sat Jan 31 00:00:00 1970 | 0  | 0          | foo | 30 | 31 | AAA030 | 30 | 31 | AAA030
!  36 |  6 | 00036 | Fri Feb 06 00:00:00 1970 PST | Fri Feb 06 00:00:00 1970 | 6  | 6          | foo | 36 |  6 | 00036 | Fri Feb 06 00:00:00 1970 PST | Fri Feb 06 00:00:00 1970 | 6  | 6          | foo | 36 | 37 | AAA036 | 36 | 37 | 
!  42 |  2 | 00042 | Thu Feb 12 00:00:00 1970 PST | Thu Feb 12 00:00:00 1970 | 2  | 2          | foo | 42 |  2 | 00042 | Thu Feb 12 00:00:00 1970 PST | Thu Feb 12 00:00:00 1970 | 2  | 2          | foo | 42 | 43 | AAA042 | 42 | 43 | AAA042
!  48 |  8 | 00048 | Wed Feb 18 00:00:00 1970 PST | Wed Feb 18 00:00:00 1970 | 8  | 8          | foo | 48 |  8 | 00048 | Wed Feb 18 00:00:00 1970 PST | Wed Feb 18 00:00:00 1970 | 8  | 8          | foo | 48 | 49 | AAA048 | 48 | 49 | AAA048
!  54 |  4 | 00054 | Tue Feb 24 00:00:00 1970 PST | Tue Feb 24 00:00:00 1970 | 4  | 4          | foo | 54 |  4 | 00054 | Tue Feb 24 00:00:00 1970 PST | Tue Feb 24 00:00:00 1970 | 4  | 4          | foo | 54 | 55 | AAA054 | 54 | 55 | 
!  60 |  0 | 00060 | Mon Mar 02 00:00:00 1970 PST | Mon Mar 02 00:00:00 1970 | 0  | 0          | foo | 60 |  0 | 00060 | Mon Mar 02 00:00:00 1970 PST | Mon Mar 02 00:00:00 1970 | 0  | 0          | foo | 60 | 61 | AAA060 | 60 | 61 | AAA060
!  66 |  6 | 00066 | Sun Mar 08 00:00:00 1970 PST | Sun Mar 08 00:00:00 1970 | 6  | 6          | foo | 66 |  6 | 00066 | Sun Mar 08 00:00:00 1970 PST | Sun Mar 08 00:00:00 1970 | 6  | 6          | foo | 66 | 67 | AAA066 | 66 | 67 | AAA066
!  72 |  2 | 00072 | Sat Mar 14 00:00:00 1970 PST | Sat Mar 14 00:00:00 1970 | 2  | 2          | foo | 72 |  2 | 00072 | Sat Mar 14 00:00:00 1970 PST | Sat Mar 14 00:00:00 1970 | 2  | 2          | foo | 72 | 73 | AAA072 | 72 | 73 | 
!  78 |  8 | 00078 | Fri Mar 20 00:00:00 1970 PST | Fri Mar 20 00:00:00 1970 | 8  | 8          | foo | 78 |  8 | 00078 | Fri Mar 20 00:00:00 1970 PST | Fri Mar 20 00:00:00 1970 | 8  | 8          | foo | 78 | 79 | AAA078 | 78 | 79 | AAA078
!  84 |  4 | 00084 | Thu Mar 26 00:00:00 1970 PST | Thu Mar 26 00:00:00 1970 | 4  | 4          | foo | 84 |  4 | 00084 | Thu Mar 26 00:00:00 1970 PST | Thu Mar 26 00:00:00 1970 | 4  | 4          | foo | 84 | 85 | AAA084 | 84 | 85 | AAA084
!  90 |  0 | 00090 | Wed Apr 01 00:00:00 1970 PST | Wed Apr 01 00:00:00 1970 | 0  | 0          | foo | 90 |  0 | 00090 | Wed Apr 01 00:00:00 1970 PST | Wed Apr 01 00:00:00 1970 | 0  | 0          | foo | 90 | 91 | AAA090 | 90 | 91 | 
!  96 |  6 | 00096 | Tue Apr 07 00:00:00 1970 PST | Tue Apr 07 00:00:00 1970 | 6  | 6          | foo | 96 |  6 | 00096 | Tue Apr 07 00:00:00 1970 PST | Tue Apr 07 00:00:00 1970 | 6  | 6          | foo | 96 | 97 | AAA096 | 96 | 97 | AAA096
! (16 rows)
  
  -- check join pushdown in situations where multiple userids are involved
  CREATE ROLE regress_view_owner SUPERUSER;
  CREATE USER MAPPING FOR regress_view_owner SERVER loopback;
--- 2378,2403 ----
                       ->  Foreign Scan on public.ft5
                             Output: ft5.c1, ft5.c2, ft5.c3, ft5.*
                             Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" FOR UPDATE
! (41 rows)
  
! SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c2 = ft4.c1
!     AND ft1.c2 = ft5.c1 AND ft1.c1 < 100 AND ft2.c1 < 100 FOR UPDATE;
   c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |   c3   | c1 | c2 |   c3   
  ----+----+-------+------------------------------+--------------------------+----+------------+-----+----+----+-------+------------------------------+--------------------------+----+------------+-----+----+----+--------+----+----+--------
    6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo |  6 |  6 | 00006 | Wed Jan 07 00:00:00 1970 PST | Wed Jan 07 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  16 |  6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6  | 6          | foo | 16 |  6 | 00016 | Sat Jan 17 00:00:00 1970 PST | Sat Jan 17 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  26 |  6 | 00026 | Tue Jan 27 00:00:00 1970 PST | Tue Jan 27 00:00:00 1970 | 6  | 6          | foo | 26 |  6 | 00026 | Tue Jan 27 00:00:00 1970 PST | Tue Jan 27 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  36 |  6 | 00036 | Fri Feb 06 00:00:00 1970 PST | Fri Feb 06 00:00:00 1970 | 6  | 6          | foo | 36 |  6 | 00036 | Fri Feb 06 00:00:00 1970 PST | Fri Feb 06 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  46 |  6 | 00046 | Mon Feb 16 00:00:00 1970 PST | Mon Feb 16 00:00:00 1970 | 6  | 6          | foo | 46 |  6 | 00046 | Mon Feb 16 00:00:00 1970 PST | Mon Feb 16 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  56 |  6 | 00056 | Thu Feb 26 00:00:00 1970 PST | Thu Feb 26 00:00:00 1970 | 6  | 6          | foo | 56 |  6 | 00056 | Thu Feb 26 00:00:00 1970 PST | Thu Feb 26 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  66 |  6 | 00066 | Sun Mar 08 00:00:00 1970 PST | Sun Mar 08 00:00:00 1970 | 6  | 6          | foo | 66 |  6 | 00066 | Sun Mar 08 00:00:00 1970 PST | Sun Mar 08 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  76 |  6 | 00076 | Wed Mar 18 00:00:00 1970 PST | Wed Mar 18 00:00:00 1970 | 6  | 6          | foo | 76 |  6 | 00076 | Wed Mar 18 00:00:00 1970 PST | Wed Mar 18 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  86 |  6 | 00086 | Sat Mar 28 00:00:00 1970 PST | Sat Mar 28 00:00:00 1970 | 6  | 6          | foo | 86 |  6 | 00086 | Sat Mar 28 00:00:00 1970 PST | Sat Mar 28 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
!  96 |  6 | 00096 | Tue Apr 07 00:00:00 1970 PST | Tue Apr 07 00:00:00 1970 | 6  | 6          | foo | 96 |  6 | 00096 | Tue Apr 07 00:00:00 1970 PST | Tue Apr 07 00:00:00 1970 | 6  | 6          | foo |  6 |  7 | AAA006 |  6 |  7 | AAA006
! (10 rows)
  
+ RESET enable_nestloop;
+ RESET enable_hashjoin;
  -- check join pushdown in situations where multiple userids are involved
  CREATE ROLE regress_view_owner SUPERUSER;
  CREATE USER MAPPING FOR regress_view_owner SERVER loopback;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 560,570 **** SELECT ft5, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2 FROM ft5 left join ft4 on ft5
  SELECT ft5, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2 FROM ft5 left join ft4 on ft5.c1 = ft4.c1 WHERE ft4.c1 BETWEEN 10 and 30 ORDER BY ft5.c1, ft4.c1;
  
  -- multi-way join involving multiple merge joins
  EXPLAIN (VERBOSE, COSTS OFF)
! SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
!     AND ft1.c1 = ft5.c1 FOR UPDATE;
! SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
!     AND ft1.c1 = ft5.c1 FOR UPDATE;
  
  -- check join pushdown in situations where multiple userids are involved
  CREATE ROLE regress_view_owner SUPERUSER;
--- 560,574 ----
  SELECT ft5, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2 FROM ft5 left join ft4 on ft5.c1 = ft4.c1 WHERE ft4.c1 BETWEEN 10 and 30 ORDER BY ft5.c1, ft4.c1;
  
  -- multi-way join involving multiple merge joins
+ SET enable_nestloop TO false;
+ SET enable_hashjoin TO false;
  EXPLAIN (VERBOSE, COSTS OFF)
! SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c2 = ft4.c1
!     AND ft1.c2 = ft5.c1 AND ft1.c1 < 100 AND ft2.c1 < 100 FOR UPDATE;
! SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c2 = ft4.c1
!     AND ft1.c2 = ft5.c1 AND ft1.c1 < 100 AND ft2.c1 < 100 FOR UPDATE;
! RESET enable_nestloop;
! RESET enable_hashjoin;
  
  -- check join pushdown in situations where multiple userids are involved
  CREATE ROLE regress_view_owner SUPERUSER;

#122

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Etsuro Fujita (#121)

Re: [HACKERS] postgres_fdw bug in 9.6

On Fri, Jan 19, 2018 at 1:53 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I noticed that this test case added by the patch is not appropriate:
+-- multi-way join involving multiple merge joins
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+    AND ft1.c1 = ft5.c1 FOR UPDATE;
+SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+    AND ft1.c1 = ft5.c1 FOR UPDATE;
because it doesn't inject extra Sort nodes into EPQ recheck plans, so it
works well without the fix. I modified this to inject a Sort into the
recheck plan of the very first foreign join. Attached is a patch for that.

Mumble. Tom provided me with that example and said it failed without
the patch. I didn't check, I just believed him. But I'm surprised if
he was wrong; Tom usually tries to avoid being wrong...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#123

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Robert Haas (#122)

Re: [HACKERS] postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jan 19, 2018 at 1:53 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I noticed that this test case added by the patch is not appropriate:
because it doesn't inject extra Sort nodes into EPQ recheck plans, so it
works well without the fix. I modified this to inject a Sort into the
recheck plan of the very first foreign join. Attached is a patch for that.

Mumble. Tom provided me with that example and said it failed without
the patch. I didn't check, I just believed him. But I'm surprised if
he was wrong; Tom usually tries to avoid being wrong...

Hm. It did fail as advertised when I connected to the contrib_regression
database (after installcheck) and entered the query by hand; I
copied-and-pasted the result of that to show you. It's possible that it
would not have failed in the particular spot where you chose to insert it
in the regression script, if for example there were nondefault planner GUC
settings active at that spot. Did you check that the script produced the
expected failure against unpatched code?

regards, tom lane

#124

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Tom Lane (#123)

Re: [HACKERS] postgres_fdw bug in 9.6

On Fri, Jan 19, 2018 at 10:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jan 19, 2018 at 1:53 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I noticed that this test case added by the patch is not appropriate:
because it doesn't inject extra Sort nodes into EPQ recheck plans, so it
works well without the fix. I modified this to inject a Sort into the
recheck plan of the very first foreign join. Attached is a patch for that.

Mumble. Tom provided me with that example and said it failed without
the patch. I didn't check, I just believed him. But I'm surprised if
he was wrong; Tom usually tries to avoid being wrong...

Hm. It did fail as advertised when I connected to the contrib_regression
database (after installcheck) and entered the query by hand; I
copied-and-pasted the result of that to show you. It's possible that it
would not have failed in the particular spot where you chose to insert it
in the regression script, if for example there were nondefault planner GUC
settings active at that spot. Did you check that the script produced the
expected failure against unpatched code?

No. I guess I'll have to go debug this. Argh.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#125

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Robert Haas (#124)

Re: [HACKERS] postgres_fdw bug in 9.6

On Fri, Jan 19, 2018 at 10:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Hm. It did fail as advertised when I connected to the contrib_regression
database (after installcheck) and entered the query by hand; I
copied-and-pasted the result of that to show you. It's possible that it
would not have failed in the particular spot where you chose to insert it
in the regression script, if for example there were nondefault planner GUC
settings active at that spot. Did you check that the script produced the
expected failure against unpatched code?

No. I guess I'll have to go debug this. Argh.

Well, I'm a little stumped. When I do it the way you did it, it fails
with the same error you got:

contrib_regression=# EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
AND ft1.c1 = ft5.c1 FOR UPDATE;
ERROR: outer pathkeys do not match mergeclauses

But when I add the same command to postgres_fdw.sql and run it as part
of the regression tests, it works. I tried adding it at the end with
\c beforehand so that there wouldn't be any temporary settings in
effect. It still didn't generate an error. So then I ran 'SHOW ALL'
from the regression test file and from my psql terminal and comparing
the results:

< application_name | pg_regress/postgres_fdw
| Sets the application name to be reported in statistics and
logs.

application_name | psql | Sets the application name to be reported in statistics and logs.

< DateStyle | Postgres, MDY
| Sets the display format for date and time values.

DateStyle | ISO, MDY | Sets the display format for date and time values.

< IntervalStyle | postgres_verbose
| Sets the display format for interval values.

IntervalStyle | postgres | Sets the display format for interval values.

< TimeZone | PST8PDT
| Sets the time zone for displaying and interpreting time
stamps.

TimeZone | US/Eastern | Sets the time zone for displaying and interpreting time stamps.

I then tried setting all of those GUCs to have the same value in my
psql session that they had in the script, but I still got the same
error. So, with the exact same settings in effect, this query
reliable produces the error from psql and reliably fails to produce
the error from pg_regress. Furthermore, it remains true even when the
query is sent from pg_regress as the first query in a brand new
session, so the problem can't be some kind of session state left
behind by the previous postgres_fdw tests. In fact, if I extract the
exact psql command that pg_regress runs and execute it by hand, then
the query errors out:

"/Users/rhaas/install/dev/bin/psql" -X -a -q -d "contrib_regression" <
"/Users/rhaas/pgsql/contrib/postgres_fdw/sql/postgres_fdw.sql"

But the exact same command run by pg_regress does not error out.
Maybe the environment variables that pg_regress sets up affect
something? But what would they affect on the server, other than the
values of GUCs? I'm probably missing something dumb here, but I can't
think of what it is at the moment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#126

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Robert Haas (#125)

Re: [HACKERS] postgres_fdw bug in 9.6

Robert Haas <robertmhaas@gmail.com> writes:

Well, I'm a little stumped. When I do it the way you did it, it fails
with the same error you got:

contrib_regression=# EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
AND ft1.c1 = ft5.c1 FOR UPDATE;
ERROR: outer pathkeys do not match mergeclauses

But when I add the same command to postgres_fdw.sql and run it as part
of the regression tests, it works. I tried adding it at the end with
\c beforehand so that there wouldn't be any temporary settings in
effect.

I duplicated those results, and then added an ANALYZE, and then it fails:

***************
*** 7548,7550 ****
--- 7595,7604 ----
  (4 rows)

  RESET enable_partition_wise_join;
+ \c -
+ analyze "S 1"."T 1";
+ -- multi-way join involving multiple merge joins
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+     AND ft1.c1 = ft5.c1 FOR UPDATE;
+ ERROR:  outer pathkeys do not match mergeclauses

So apparently the reason our manual tests replicated the failure is
that an auto-analyze happened in between. Now that I realize that,
I find that the manual query doesn't fail if executed *immediately*
after the regression test run. Wait a minute, repeat, it does fail.

OTOH, put the same ANALYZE before the test case where it is in the
script, no change.

So the contributing factors here are (1) there are a bunch of data
changes to "S 1"."T 1" further down in the script than where you
added the test case, and (2) you need an auto-analyze to have seen
those changes before the problematic plan gets picked.

It looks like Etsuro-san's proposed patch locks down the choice of
plan more tightly, which is probably a reasonable answer.

regards, tom lane

#127

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Tom Lane (#126)

Re: [HACKERS] postgres_fdw bug in 9.6

On Sat, Jan 20, 2018 at 12:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It looks like Etsuro-san's proposed patch locks down the choice of
plan more tightly, which is probably a reasonable answer.

OK, committed. I added a comment along the lines you suggested
previously, since this no longer seems like a generic test that
happens to involve a bunch of merge joins.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#128

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 8 years ago

In reply to: Robert Haas (#127)

Re: [HACKERS] postgres_fdw bug in 9.6

(2018/01/31 4:56), Robert Haas wrote:

On Sat, Jan 20, 2018 at 12:20 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

It looks like Etsuro-san's proposed patch locks down the choice of
plan more tightly, which is probably a reasonable answer.

OK, committed. I added a comment along the lines you suggested
previously, since this no longer seems like a generic test that
happens to involve a bunch of merge joins.

Thank you!

Best regards,
Etsuro Fujita