Re: Foreign join pushdown vs EvalPlanQual

Started by Kouhei Kaigai · about 10 years ago · 83 messages
#1 Kouhei Kaigai
kaigai@ak.jp.nec.com

On Thu, Oct 29, 2015 at 6:05 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

In this case, the EPQ slot in which to store the joined tuple is still
a challenge to be solved.

Is it possible to use one of the EPQ slots that are set up for the
base relations represented by the ForeignScan/CustomScan?

Yes, I proposed that exact thing upthread.

When a ForeignScan runs a remote join that involves three base foreign
tables (relid=2, 3 and 5, for example), no other code touches these
slots. So it is safe to put a joined tuple on the EPQ slots of the
underlying base relations.

In this case, EPQ slots are initialized as below:

es_epqTuple[0] ... EPQ tuple of base relation (relid=1)
es_epqTuple[1] ... EPQ of the joined tuple (for relid=2, 3, 5)
es_epqTuple[2] ... EPQ of the joined tuple (for relid=2, 3, 5), copy of above
es_epqTuple[3] ... EPQ tuple of base relation (relid=4)
es_epqTuple[4] ... EPQ of the joined tuple (for relid=2, 3, 5), copy of above
es_epqTuple[5] ... EPQ tuple of base relation (relid=6)

You don't really need to initialize them all. You can just initialize
es_epqTuple[1] and leave 2 and 4 unused.
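To make that layout concrete, here is a small self-contained C sketch. All names here (ToyEPQState, toy_epq_set_tuple, and so on) are illustrative stand-ins for the executor's es_epqTuple[]/es_epqTupleSet[] machinery, not the real PostgreSQL types; string literals stand in for tuples.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of the EPQ slot layout discussed above: a join pushed down
 * to the remote side covers relids 2, 3 and 5, so the single remotely
 * joined tuple is stored in the slot of one member relation (relid=2
 * here) and the slots of the other members are simply left unset. */
enum { TOY_NUM_RELS = 6 };

typedef struct ToyEPQState
{
    const char *es_epqTuple[TOY_NUM_RELS];      /* one slot per rtindex */
    bool        es_epqTupleSet[TOY_NUM_RELS];   /* slot populated? */
} ToyEPQState;

/* Store one tuple for the given range-table index (1-based). */
static inline void
toy_epq_set_tuple(ToyEPQState *estate, int rtindex, const char *tuple)
{
    estate->es_epqTuple[rtindex - 1] = tuple;
    estate->es_epqTupleSet[rtindex - 1] = true;
}

/* Build the layout from the mail: relids 1, 4 and 6 are plain base
 * relations; relids 2, 3 and 5 are covered by the pushed-down join. */
static inline ToyEPQState
toy_epq_layout(void)
{
    ToyEPQState estate;

    memset(&estate, 0, sizeof(estate));
    toy_epq_set_tuple(&estate, 1, "base-tuple-1");
    toy_epq_set_tuple(&estate, 4, "base-tuple-4");
    toy_epq_set_tuple(&estate, 6, "base-tuple-6");
    /* The joined tuple goes into the slot of relid=2 only. */
    toy_epq_set_tuple(&estate, 2, "joined-tuple-(2,3,5)");
    return estate;
}
```

Only one member relation's slot carries the joined tuple; the slots for relids 3 and 5 stay unused, matching the "initialize es_epqTuple[1] and leave 2 and 4 unused" suggestion.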

Then, if the FDW/CSP is designed to utilize the remotely joined tuple
rather than a local join, it can simply return the tuple kept in one of
the EPQ slots of the underlying base relations.
If the FDW/CSP prefers a local join, it can behave as a local join
would: check the join conditions and construct a joined tuple by
itself, or by an alternative plan.

Right.

A challenge is that the junk whole-row references on behalf of ROW_MARK_COPY
are injected by preprocess_targetlist(). This happens before the main path
consideration by query_planner(), so at that point it is not yet predictable
how the remote query will be executed.
Under ROW_MARK_COPY, the base tuple image is fetched using this junk attribute.
So here are two options, if we allow a joined tuple to be put on either of
the es_epqTuple[] slots.

Option 1) We ignore the record type definition. The FDW returns a joined
tuple for the whole-row reference of one of the base relations in this join.
The junk attribute will eventually be filtered out and only the FDW driver
will see it, so this is (probably) harmless.
This option requires no big changes, but it takes a little bravery to adopt.

Option 2) We allow the FDW/CSP to adjust the target lists of the relevant
nodes after these paths are chosen by the planner. This makes it possible to
remove the whole-row references of the base relations and add an alternative
whole-row reference instead, if the FDW/CSP can support it.
This feature is relevant not only to EPQ rechecks but also to target-list
push-down to the remote side, because adjusting the target list means we
allow the FDW/CSP to determine which expressions shall be executed locally
and which shall not.
I think this option is more straightforward, but it needs somewhat deeper
consideration, because we have to design the best hook point and ensure how
path-ification will perform.

Therefore, I think we need two steps towards the entire solution.
Step 1) The FDW/CSP rechecks base EPQ tuples and supports local
reconstruction on the fly. This needs no special enhancement of the
planner, so we can fix it up by the v9.5 release.
Step 2) The FDW/CSP supports adjusting the target list to add a whole-row
reference of the joined tuple instead of those of the multiple base
relations; the FDW/CSP can then put a joined tuple on either EPQ slot if it
wants. This requires a new feature enhancement, so v9.6 is a suitable
timeline.

What is your opinion on this direction?
I don't want to drop the extra optimization opportunity, but it is already
November, and I am not brave enough to add a non-obvious new feature at this
point.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#1)

On 2015/11/03 22:15, Kouhei Kaigai wrote:

A challenge is that the junk whole-row references on behalf of ROW_MARK_COPY
are injected by preprocess_targetlist(). This happens before the main path
consideration by query_planner(), so at that point it is not yet predictable
how the remote query will be executed.
Under ROW_MARK_COPY, the base tuple image is fetched using this junk attribute.
So here are two options, if we allow a joined tuple to be put on either of
the es_epqTuple[] slots.

Option 1) We ignore the record type definition. The FDW returns a joined
tuple for the whole-row reference of one of the base relations in this join.
The junk attribute will eventually be filtered out and only the FDW driver
will see it, so this is (probably) harmless.
This option requires no big changes, but it takes a little bravery to adopt.

Option 2) We allow the FDW/CSP to adjust the target lists of the relevant
nodes after these paths are chosen by the planner. This makes it possible to
remove the whole-row references of the base relations and add an alternative
whole-row reference instead, if the FDW/CSP can support it.
This feature is relevant not only to EPQ rechecks but also to target-list
push-down to the remote side, because adjusting the target list means we
allow the FDW/CSP to determine which expressions shall be executed locally
and which shall not.
I think this option is more straightforward, but it needs somewhat deeper
consideration, because we have to design the best hook point and ensure how
path-ification will perform.

Therefore, I think we need two steps towards the entire solution.
Step 1) The FDW/CSP rechecks base EPQ tuples and supports local
reconstruction on the fly. This needs no special enhancement of the
planner, so we can fix it up by the v9.5 release.
Step 2) The FDW/CSP supports adjusting the target list to add a whole-row
reference of the joined tuple instead of those of the multiple base
relations; the FDW/CSP can then put a joined tuple on either EPQ slot if it
wants. This requires a new feature enhancement, so v9.6 is a suitable
timeline.

What is your opinion on this direction?
I don't want to drop the extra optimization opportunity, but it is already
November, and I am not brave enough to add a non-obvious new feature at this
point.

I think we need to consider a general solution that can be applied not
only to the case where the component tables in a foreign join all use
ROW_MARK_COPY but to the case where those tables use different rowmark
types such as ROW_MARK_COPY and ROW_MARK_EXCLUSIVE, as I pointed out
upthread.

Best regards,
Etsuro Fujita


#3 Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#2)

-----Original Message-----
From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
Sent: Wednesday, November 04, 2015 5:11 PM
To: Kaigai Kouhei(海外 浩平); Robert Haas
Cc: Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org; Shigeru Hanada
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

On 2015/11/03 22:15, Kouhei Kaigai wrote:

A challenge is that the junk whole-row references on behalf of ROW_MARK_COPY
are injected by preprocess_targetlist(). This happens before the main path
consideration by query_planner(), so at that point it is not yet predictable
how the remote query will be executed.
Under ROW_MARK_COPY, the base tuple image is fetched using this junk attribute.
So here are two options, if we allow a joined tuple to be put on either of
the es_epqTuple[] slots.

Option 1) We ignore the record type definition. The FDW returns a joined
tuple for the whole-row reference of one of the base relations in this join.
The junk attribute will eventually be filtered out and only the FDW driver
will see it, so this is (probably) harmless.
This option requires no big changes, but it takes a little bravery to adopt.

Option 2) We allow the FDW/CSP to adjust the target lists of the relevant
nodes after these paths are chosen by the planner. This makes it possible to
remove the whole-row references of the base relations and add an alternative
whole-row reference instead, if the FDW/CSP can support it.
This feature is relevant not only to EPQ rechecks but also to target-list
push-down to the remote side, because adjusting the target list means we
allow the FDW/CSP to determine which expressions shall be executed locally
and which shall not.
I think this option is more straightforward, but it needs somewhat deeper
consideration, because we have to design the best hook point and ensure how
path-ification will perform.

Therefore, I think we need two steps towards the entire solution.
Step 1) The FDW/CSP rechecks base EPQ tuples and supports local
reconstruction on the fly. This needs no special enhancement of the
planner, so we can fix it up by the v9.5 release.
Step 2) The FDW/CSP supports adjusting the target list to add a whole-row
reference of the joined tuple instead of those of the multiple base
relations; the FDW/CSP can then put a joined tuple on either EPQ slot if it
wants. This requires a new feature enhancement, so v9.6 is a suitable
timeline.

What is your opinion on this direction?
I don't want to drop the extra optimization opportunity, but it is already
November, and I am not brave enough to add a non-obvious new feature at this
point.

I think we need to consider a general solution that can be applied not
only to the case where the component tables in a foreign join all use
ROW_MARK_COPY but to the case where those tables use different rowmark
types such as ROW_MARK_COPY and ROW_MARK_EXCLUSIVE, as I pointed out
upthread.

In the mixed case, the FDW/CSP can choose local recheck and reconstruction
based on the EPQ tuples of the base relations. Nobody forces an FDW/CSP to
always return a joined tuple if its author doesn't want to support the
feature. Why do you think it is not a generic solution? The FDW/CSP driver
"can choose" the best approach according to its implementation and
capability.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#4 Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#3)

On 2015/11/04 17:28, Kouhei Kaigai wrote:

I think we need to consider a general solution that can be applied not
only to the case where the component tables in a foreign join all use
ROW_MARK_COPY but to the case where those tables use different rowmark
types such as ROW_MARK_COPY and ROW_MARK_EXCLUSIVE, as I pointed out
upthread.

In the mixed case, the FDW/CSP can choose local recheck and reconstruction
based on the EPQ tuples of the base relations. Nobody forces an FDW/CSP to
always return a joined tuple if its author doesn't want to support the
feature. Why do you think it is not a generic solution? The FDW/CSP driver
"can choose" the best approach according to its implementation and
capability.

It looked to me that you were discussing only the case where the component
foreign tables in a foreign join all use ROW_MARK_COPY, which is why I made
that comment. Sorry for my misunderstanding.

Best regards,
Etsuro Fujita


#5 Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#1)

On Tue, Nov 3, 2015 at 8:15 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

A challenge is that the junk whole-row references on behalf of ROW_MARK_COPY
are injected by preprocess_targetlist(). This happens before the main path
consideration by query_planner(), so at that point it is not yet predictable
how the remote query will be executed.

Oh, dear. That seems like a rather serious problem for my approach.

Under ROW_MARK_COPY, the base tuple image is fetched using this junk attribute.
So here are two options, if we allow a joined tuple to be put on either of
the es_epqTuple[] slots.

Neither of these sounds viable to me.

I'm inclined to go back to something like what you proposed here:

/messages/by-id/9A28C8860F777E439AA12E8AEA7694F80114B89D@BPXM15GP.gisp.nec.co.jp

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#6 Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#5)

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
Sent: Friday, November 06, 2015 9:40 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Etsuro Fujita; Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org;
Shigeru Hanada
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

On Tue, Nov 3, 2015 at 8:15 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

A challenge is that the junk whole-row references on behalf of ROW_MARK_COPY
are injected by preprocess_targetlist(). This happens before the main path
consideration by query_planner(), so at that point it is not yet predictable
how the remote query will be executed.

Oh, dear. That seems like a rather serious problem for my approach.

Under ROW_MARK_COPY, the base tuple image is fetched using this junk attribute.
So here are two options, if we allow a joined tuple to be put on either of
the es_epqTuple[] slots.

Neither of these sounds viable to me.

I'm inclined to go back to something like what you proposed here:

Good :-)

/messages/by-id/9A28C8860F777E439AA12E8AEA7694F80114B89D@BPXM15GP.gisp.nec.co.jp

This patch needs to be rebased.
One thing different from the latest version is that fdw_recheck_quals was
added to ForeignScan. So, ...

(1) The principle is that the FDW driver knows which qualifiers were pushed
down and how they are kept in its private field. So fdw_recheck_quals is
redundant and should be reverted.

(2) Even though the principle is as described in (1), the hard-wired logic
in ForeignRecheck() and fdw_recheck_quals is a useful default for most FDW
drivers. So it shall be kept, and be effective only when the
RecheckForeignScan callback is not defined.

Which is the better approach for the v3 patch?
My preference is (1), because fdw_recheck_quals is a new feature, so FDW
drivers have to be adjusted for v9.5 anyway, even if they already support
qualifier push-down.
In general, an interface becomes more graceful by sticking to its principle.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#7 Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#6)

On Fri, Nov 6, 2015 at 9:42 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

This patch needs to be rebased.
One thing different from the latest version is that fdw_recheck_quals was
added to ForeignScan. So, ...

(1) The principle is that the FDW driver knows which qualifiers were pushed
down and how they are kept in its private field. So fdw_recheck_quals is
redundant and should be reverted.

(2) Even though the principle is as described in (1), the hard-wired logic
in ForeignRecheck() and fdw_recheck_quals is a useful default for most FDW
drivers. So it shall be kept, and be effective only when the
RecheckForeignScan callback is not defined.

Which is the better approach for the v3 patch?
My preference is (1), because fdw_recheck_quals is a new feature, so FDW
drivers have to be adjusted for v9.5 anyway, even if they already support
qualifier push-down.
In general, an interface becomes more graceful by sticking to its principle.

fdw_recheck_quals seems likely to be very convenient for FDW authors,
and I think ripping it out would be a terrible decision.

I think ForeignRecheck should first call ExecQual to test
fdw_recheck_quals. If it returns false, return false. If it returns
true, then give the FDW callback a chance, if one is defined. If that
returns false, return false. If we haven't yet returned false,
return true.
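The ordering Robert describes can be sketched as a small self-contained C model. All names here (toy_foreign_recheck and friends) are hypothetical; the sketch only mirrors the proposed control flow, with an int payload standing in for a tuple, not the real executor code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for ExecQual on fdw_recheck_quals and for the optional
 * RecheckForeignScan callback. */
typedef bool (*toy_exec_qual_fn)(void *quals, void *tuple);
typedef bool (*toy_recheck_cb)(void *fdw_state, void *tuple);

/* Proposed ordering: test the pushed-down quals first; if they pass,
 * give the FDW callback a chance (when one is defined); the tuple
 * survives only if no check says no. */
static inline bool
toy_foreign_recheck(void *quals, toy_exec_qual_fn exec_qual,
                    toy_recheck_cb callback, void *fdw_state, void *tuple)
{
    if (quals != NULL && !exec_qual(quals, tuple))
        return false;           /* pushed-down quals reject the tuple */
    if (callback != NULL && !callback(fdw_state, tuple))
        return false;           /* FDW-specific recheck rejects it */
    return true;                /* neither check returned false */
}

/* Example checks used below: treat the tuple as an int payload. */
static inline bool
toy_qual_positive(void *quals, void *tuple)
{
    (void) quals;
    return *(int *) tuple > 0;
}

static inline bool
toy_cb_even(void *fdw_state, void *tuple)
{
    (void) fdw_state;
    return (*(int *) tuple % 2) == 0;
}
```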

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#8 Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#7)

On Fri, Nov 6, 2015 at 9:42 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

This patch needs to be rebased.
One thing different from the latest version is that fdw_recheck_quals was
added to ForeignScan. So, ...

(1) The principle is that the FDW driver knows which qualifiers were pushed
down and how they are kept in its private field. So fdw_recheck_quals is
redundant and should be reverted.

(2) Even though the principle is as described in (1), the hard-wired logic
in ForeignRecheck() and fdw_recheck_quals is a useful default for most FDW
drivers. So it shall be kept, and be effective only when the
RecheckForeignScan callback is not defined.

Which is the better approach for the v3 patch?
My preference is (1), because fdw_recheck_quals is a new feature, so FDW
drivers have to be adjusted for v9.5 anyway, even if they already support
qualifier push-down.
In general, an interface becomes more graceful by sticking to its principle.

fdw_recheck_quals seems likely to be very convenient for FDW authors,
and I think ripping it out would be a terrible decision.

OK, I'll try to make fdw_recheck_quals and the RecheckForeignScan callback co-exist.

I think ForeignRecheck should first call ExecQual to test
fdw_recheck_quals. If it returns false, return false. If it returns
true, then give the FDW callback a chance, if one is defined. If that
returns false, return false. If we haven't yet returned false,
return true.

I think ExecQual on fdw_recheck_quals should be called after the
RecheckForeignScan callback, because when scanrelid==0,
econtext->ecxt_scantuple cannot be reconstructed unless the
RecheckForeignScan callback has been called.

If RecheckForeignScan is called prior to ExecQual, the FDW driver can
take either of two options according to its preference.

(1) The RecheckForeignScan callback reconstructs a joined tuple based on
the primitive EPQ slots, but rechecks nothing by itself. ForeignRecheck
runs ExecQual on fdw_recheck_quals, which represents the qualifiers of
the base relations and the join conditions.

(2) The RecheckForeignScan callback reconstructs a joined tuple based on
the primitive EPQ slots, then rechecks the qualifiers of the base
relations and the join conditions by itself. It puts NIL in
fdw_recheck_quals, so the ExecQual in ForeignRecheck() always returns true.

In either case, we cannot run ExecQual prior to the reconstruction of the
joined tuple, because only the FDW driver knows how to reconstruct it. So,
if we put ExecQual before the callback, a ForeignScan with scanrelid==0
would always have to set NIL in fdw_recheck_quals.
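Option (1) under this callback-first ordering can be sketched as a small self-contained C model. All names are hypothetical, and a single int key per side stands in for a whole tuple: the callback only reconstructs the joined tuple from the base EPQ tuples, and the caller then applies a stand-in for ExecQual on fdw_recheck_quals.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy joined tuple: one key from each side of the join. */
typedef struct ToyJoinedTuple
{
    int  outer_key;
    int  inner_key;
    bool valid;                 /* has the slot been reconstructed? */
} ToyJoinedTuple;

/* Option (1) style callback: reconstruct the joined tuple from the
 * base EPQ tuples, recheck nothing itself. */
static inline bool
toy_recheck_reconstruct(const int *outer_epq, const int *inner_epq,
                        ToyJoinedTuple *slot)
{
    slot->outer_key = *outer_epq;
    slot->inner_key = *inner_epq;
    slot->valid = true;
    return true;                /* leave the qual checks to the caller */
}

/* Caller-side check, standing in for ExecQual on fdw_recheck_quals:
 * here the join condition is outer_key == inner_key. */
static inline bool
toy_exec_recheck_quals(const ToyJoinedTuple *slot)
{
    return slot->valid && slot->outer_key == slot->inner_key;
}

/* Callback-first ForeignRecheck: the quals can only run once the
 * callback has filled the scan tuple slot. */
static inline bool
toy_foreign_recheck_join(const int *outer_epq, const int *inner_epq,
                         ToyJoinedTuple *slot)
{
    if (!toy_recheck_reconstruct(outer_epq, inner_epq, slot))
        return false;
    return toy_exec_recheck_quals(slot);
}
```

The point of the ordering is visible in toy_exec_recheck_quals: without the reconstruction step there is simply no joined tuple for the quals to look at.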

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#9 Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#8)
1 attachment(s)

-----Original Message-----
From: Kaigai Kouhei(海外 浩平)
Sent: Sunday, November 08, 2015 12:38 AM
To: 'Robert Haas'
Cc: Etsuro Fujita; Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org;
Shigeru Hanada
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

On Fri, Nov 6, 2015 at 9:42 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

This patch needs to be rebased.
One thing different from the latest version is that fdw_recheck_quals was
added to ForeignScan. So, ...

(1) The principle is that the FDW driver knows which qualifiers were pushed
down and how they are kept in its private field. So fdw_recheck_quals is
redundant and should be reverted.

(2) Even though the principle is as described in (1), the hard-wired logic
in ForeignRecheck() and fdw_recheck_quals is a useful default for most FDW
drivers. So it shall be kept, and be effective only when the
RecheckForeignScan callback is not defined.

Which is the better approach for the v3 patch?
My preference is (1), because fdw_recheck_quals is a new feature, so FDW
drivers have to be adjusted for v9.5 anyway, even if they already support
qualifier push-down.
In general, an interface becomes more graceful by sticking to its principle.

fdw_recheck_quals seems likely to be very convenient for FDW authors,
and I think ripping it out would be a terrible decision.

OK, I'll try to make fdw_recheck_quals and the RecheckForeignScan callback co-exist.

I think ForeignRecheck should first call ExecQual to test
fdw_recheck_quals. If it returns false, return false. If it returns
true, then give the FDW callback a chance, if one is defined. If that
returns false, return false. If we haven't yet returned false,
return true.

I think ExecQual on fdw_recheck_quals should be called after the
RecheckForeignScan callback, because when scanrelid==0,
econtext->ecxt_scantuple cannot be reconstructed unless the
RecheckForeignScan callback has been called.

If RecheckForeignScan is called prior to ExecQual, the FDW driver can
take either of two options according to its preference.

(1) The RecheckForeignScan callback reconstructs a joined tuple based on
the primitive EPQ slots, but rechecks nothing by itself. ForeignRecheck
runs ExecQual on fdw_recheck_quals, which represents the qualifiers of
the base relations and the join conditions.

(2) The RecheckForeignScan callback reconstructs a joined tuple based on
the primitive EPQ slots, then rechecks the qualifiers of the base
relations and the join conditions by itself. It puts NIL in
fdw_recheck_quals, so the ExecQual in ForeignRecheck() always returns true.

In either case, we cannot run ExecQual prior to the reconstruction of the
joined tuple, because only the FDW driver knows how to reconstruct it. So,
if we put ExecQual before the callback, a ForeignScan with scanrelid==0
would always have to set NIL in fdw_recheck_quals.

The attached patch is an adjusted version of the previous one.
Even though the new callback co-exists with fdw_recheck_quals,
the callback is invoked first, as follows.

----------------<cut here>----------------
@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)

ResetExprContext(econtext);

+	/*
+	 * FDW driver has to recheck visibility of EPQ tuple towards
+	 * the scan qualifiers once it gets pushed down.
+	 * In addition, if this node represents a join sub-tree, not
+	 * a scan, FDW driver is also responsible to reconstruct
+	 * a joined tuple according to the primitive EPQ tuples.
+	 */
+	if (fdwroutine->RecheckForeignScan)
+	{
+		if (!fdwroutine->RecheckForeignScan(node, slot))
+			return false;
+	}
 	return ExecQual(node->fdw_recheck_quals, econtext, false);
 }
----------------<cut here>----------------

If the callback is invoked first, the FDW driver can reconstruct a joined
tuple in whatever way is convenient for it, and the remaining checks can
then be done by ExecQual on fdw_recheck_quals on the caller side.
If the callback were placed at the tail, the FDW driver would have no such
choice.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fdw-epq-recheck.v3.patch (application/octet-stream)
 doc/src/sgml/fdwhandler.sgml            | 26 ++++++++++++++++++++++++-
 src/backend/commands/explain.c          | 23 ++++++++++++++++++++++
 src/backend/executor/execScan.c         | 34 +++++++++++++++++++++++++++++----
 src/backend/executor/nodeForeignscan.c  | 13 +++++++++++++
 src/backend/nodes/copyfuncs.c           |  1 +
 src/backend/nodes/nodeFuncs.c           |  7 +++++++
 src/backend/nodes/outfuncs.c            |  2 ++
 src/backend/nodes/readfuncs.c           |  1 +
 src/backend/optimizer/plan/createplan.c | 13 ++++++++++++-
 src/backend/optimizer/plan/setrefs.c    | 14 ++++++++++++++
 src/backend/optimizer/plan/subselect.c  | 11 +++++++++++
 src/include/foreign/fdwapi.h            |  7 ++++++-
 src/include/nodes/execnodes.h           |  1 +
 src/include/nodes/plannodes.h           |  1 +
 src/include/nodes/relation.h            |  1 +
 15 files changed, 148 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 1533a6b..13bfad9 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -168,7 +168,8 @@ GetForeignPlan (PlannerInfo *root,
                 Oid foreigntableid,
                 ForeignPath *best_path,
                 List *tlist,
-                List *scan_clauses);
+                List *scan_clauses,
+                List *fdw_plans)
 </programlisting>
 
      Create a <structname>ForeignScan</> plan node from the selected foreign
@@ -259,6 +260,29 @@ IterateForeignScan (ForeignScanState *node);
 
     <para>
 <programlisting>
+bool
+RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
+</programlisting>
+     Recheck visibility of the EPQ tuple according to the qualifiers
+     that were pushed down.
+     This callback is optional if this <structname>ForeignScanState</>
+     runs on a base foreign table; <structfield>fdw_recheck_quals</>
+     can be used instead to let the backend recheck the target EPQ tuple.
+    </para>
+    <para>
+     On the other hand, if <literal>scanrelid</> equals zero and thus it
+     represents a join sub-tree of foreign tables, this callback is
+     expected to reconstruct a joined tuple using the primitive EPQ
+     tuples and fill up the supplied <literal>slot</> according to
+     the <structfield>fdw_scan_tlist</> definition.
+     Also, this callback can or must recheck scan qualifiers and join
+     conditions which are pushed down. Especially, it needs special
+     handling if not simple inner-join, instead of the backend support
+     by <structfield>fdw_recheck_quals</>.
+    </para>
+
+    <para>
+<programlisting>
 void
 ReScanForeignScan (ForeignScanState *node);
 </programlisting>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7fb8a14..60522ef 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -114,6 +114,8 @@ static void ExplainMemberNodes(List *plans, PlanState **planstates,
 				   List *ancestors, ExplainState *es);
 static void ExplainSubPlans(List *plans, List *ancestors,
 				const char *relationship, ExplainState *es);
+static void ExplainForeignChildren(ForeignScanState *fss,
+								   List *ancestors, ExplainState *es);
 static void ExplainCustomChildren(CustomScanState *css,
 					  List *ancestors, ExplainState *es);
 static void ExplainProperty(const char *qlabel, const char *value,
@@ -1547,6 +1549,8 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		IsA(plan, BitmapAnd) ||
 		IsA(plan, BitmapOr) ||
 		IsA(plan, SubqueryScan) ||
+		(IsA(planstate, ForeignScanState) &&
+		 ((ForeignScanState *) planstate)->fdw_ps != NIL) ||
 		(IsA(planstate, CustomScanState) &&
 		 ((CustomScanState *) planstate)->custom_ps != NIL) ||
 		planstate->subPlan;
@@ -1603,6 +1607,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			ExplainNode(((SubqueryScanState *) planstate)->subplan, ancestors,
 						"Subquery", NULL, es);
 			break;
+		case T_ForeignScan:
+			ExplainForeignChildren((ForeignScanState *) planstate,
+								   ancestors, es);
+			break;
 		case T_CustomScan:
 			ExplainCustomChildren((CustomScanState *) planstate,
 								  ancestors, es);
@@ -2643,6 +2651,21 @@ ExplainSubPlans(List *plans, List *ancestors,
 }
 
 /*
+ * Explain a list of children of a ForeignScan.
+ */
+static void
+ExplainForeignChildren(ForeignScanState *fss,
+					   List *ancestors, ExplainState *es)
+{
+	ListCell   *cell;
+	const char *label =
+		(list_length(fss->fdw_ps) != 1 ? "children" : "child");
+
+	foreach(cell, fss->fdw_ps)
+		ExplainNode((PlanState *) lfirst(cell), ancestors, label, NULL, es);
+}
+
+/*
  * Explain a list of children of a CustomScan.
  */
 static void
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index a96e826..b472bf7 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -49,8 +49,16 @@ ExecScanFetch(ScanState *node,
 		 */
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
-		if (estate->es_epqTupleSet[scanrelid - 1])
+		if (scanrelid == 0)
+		{
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (estate->es_epqTupleSet[scanrelid - 1])
 		{
 			TupleTableSlot *slot = node->ss_ScanTupleSlot;
 
@@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
 	{
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
+		if (scanrelid > 0)
+			estate->es_epqScanDone[scanrelid - 1] = false;
+		else
+		{
+			Bitmapset  *relids;
+			int			rtindex = -1;
+
+			if (IsA(node->ps.plan, ForeignScan))
+				relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+			else if (IsA(node->ps.plan, CustomScan))
+				relids = ((CustomScan *) node->ps.plan)->custom_relids;
+			else
+				elog(ERROR, "unexpected scan node: %d",
+					 (int)nodeTag(node->ps.plan));
 
-		estate->es_epqScanDone[scanrelid - 1] = false;
+			while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+			{
+				Assert(rtindex > 0);
+				estate->es_epqScanDone[rtindex - 1] = false;
+			}
+		}
 	}
 }
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 6165e4a..1344c32 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -73,6 +73,7 @@ ForeignNext(ForeignScanState *node)
 static bool
 ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 {
+	FdwRoutine	*fdwroutine = node->fdwroutine;
 	ExprContext *econtext;
 
 	/*
@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 
 	ResetExprContext(econtext);
 
+	/*
+	 * FDW driver has to recheck visibility of EPQ tuple towards
+	 * the scan qualifiers once it gets pushed down.
+	 * In addition, if this node represents a join sub-tree, not
+	 * a scan, FDW driver is also responsible to reconstruct
+	 * a joined tuple according to the primitive EPQ tuples.
+	 */
+	if (fdwroutine->RecheckForeignScan)
+	{
+		if (!fdwroutine->RecheckForeignScan(node, slot))
+			return false;
+	}
 	return ExecQual(node->fdw_recheck_quals, econtext, false);
 }
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c176ff9..21df5ce 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -645,6 +645,7 @@ _copyForeignScan(const ForeignScan *from)
 	 * copy remainder of node
 	 */
 	COPY_SCALAR_FIELD(fs_server);
+	COPY_NODE_FIELD(fdw_plans);
 	COPY_NODE_FIELD(fdw_exprs);
 	COPY_NODE_FIELD(fdw_private);
 	COPY_NODE_FIELD(fdw_scan_tlist);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index a11cb9f..99e03a9 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3485,6 +3485,13 @@ planstate_tree_walker(PlanState *planstate, bool (*walker) (), void *context)
 			if (walker(((SubqueryScanState *) planstate)->subplan, context))
 				return true;
 			break;
+		case T_ForeignScan:
+			foreach (lc, ((ForeignScanState *) planstate)->fdw_ps)
+			{
+				if (walker((PlanState *) lfirst(lc), context))
+					return true;
+			}
+			break;
 		case T_CustomScan:
 			foreach (lc, ((CustomScanState *) planstate)->custom_ps)
 			{
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 3e75cd1..fafd6b3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -591,6 +591,7 @@ _outForeignScan(StringInfo str, const ForeignScan *node)
 	_outScanInfo(str, (const Scan *) node);
 
 	WRITE_OID_FIELD(fs_server);
+	WRITE_NODE_FIELD(fdw_plans);
 	WRITE_NODE_FIELD(fdw_exprs);
 	WRITE_NODE_FIELD(fdw_private);
 	WRITE_NODE_FIELD(fdw_scan_tlist);
@@ -1680,6 +1681,7 @@ _outForeignPath(StringInfo str, const ForeignPath *node)
 
 	_outPathInfo(str, (const Path *) node);
 
+	WRITE_NODE_FIELD(fdw_paths);
 	WRITE_NODE_FIELD(fdw_private);
 }
 
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 94ba6dc..4b54016 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1795,6 +1795,7 @@ _readForeignScan(void)
 	ReadCommonScan(&local_node->scan);
 
 	READ_OID_FIELD(fs_server);
+	READ_NODE_FIELD(fdw_plans);
 	READ_NODE_FIELD(fdw_exprs);
 	READ_NODE_FIELD(fdw_private);
 	READ_NODE_FIELD(fdw_scan_tlist);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 791b64e..9dc445e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2095,11 +2095,20 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	Index		scan_relid = rel->relid;
 	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
+	List	   *fdw_plans = NIL;
 	ListCell   *lc;
 	int			i;
 
 	Assert(rel->fdwroutine != NULL);
 
+	/* Recursively transform child paths. */
+	foreach (lc, best_path->fdw_paths)
+	{
+		Plan   *plan = create_plan_recurse(root, (Path *) lfirst(lc));
+
+		fdw_plans = lappend(fdw_plans, plan);
+	}
+
 	/*
 	 * If we're scanning a base relation, fetch its OID.  (Irrelevant if
 	 * scanning a join relation.)
@@ -2129,7 +2138,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses,
+												fdw_plans);
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_path_costsize(&scan_plan->scan.plan, &best_path->path);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 48d6e6f..7e4d092 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1102,6 +1102,8 @@ set_foreignscan_references(PlannerInfo *root,
 						   ForeignScan *fscan,
 						   int rtoffset)
 {
+	ListCell   *lc;
+
 	/* Adjust scanrelid if it's valid */
 	if (fscan->scan.scanrelid > 0)
 		fscan->scan.scanrelid += rtoffset;
@@ -1129,6 +1131,12 @@ set_foreignscan_references(PlannerInfo *root,
 						   itlist,
 						   INDEX_VAR,
 						   rtoffset);
+		fscan->fdw_recheck_quals = (List *)
+			fix_upper_expr(root,
+						   (Node *) fscan->fdw_recheck_quals,
+						   itlist,
+						   INDEX_VAR,
+						   rtoffset);
 		pfree(itlist);
 		/* fdw_scan_tlist itself just needs fix_scan_list() adjustments */
 		fscan->fdw_scan_tlist =
@@ -1147,6 +1155,12 @@ set_foreignscan_references(PlannerInfo *root,
 			fix_scan_list(root, fscan->fdw_recheck_quals, rtoffset);
 	}
 
+	/* Adjust child plan-nodes recursively, if needed */
+	foreach (lc, fscan->fdw_plans)
+	{
+		lfirst(lc) = set_plan_refs(root, (Plan *) lfirst(lc), rtoffset);
+	}
+
 	/* Adjust fs_relids if needed */
 	if (rtoffset > 0)
 	{
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 82414d4..7b50455 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2396,6 +2396,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 		case T_ForeignScan:
 			{
 				ForeignScan *fscan = (ForeignScan *) plan;
+				ListCell	*lc;
 
 				finalize_primnode((Node *) fscan->fdw_exprs,
 								  &context);
@@ -2405,6 +2406,16 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 				/* We assume fdw_scan_tlist cannot contain Params */
 				context.paramids = bms_add_members(context.paramids,
 												   scan_params);
+				/* child nodes if any */
+				foreach (lc, fscan->fdw_plans)
+				{
+					context.paramids =
+						bms_add_members(context.paramids,
+										finalize_plan(root,
+													  (Plan *) lfirst(lc),
+													  valid_params,
+													  scan_params));
+				}
 			}
 			break;
 
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 69b48b4..4a41351 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -36,13 +36,17 @@ typedef ForeignScan *(*GetForeignPlan_function) (PlannerInfo *root,
 														  Oid foreigntableid,
 													  ForeignPath *best_path,
 															 List *tlist,
-														 List *scan_clauses);
+												 List *scan_clauses,
+												 List *fdw_plans);
 
 typedef void (*BeginForeignScan_function) (ForeignScanState *node,
 													   int eflags);
 
 typedef TupleTableSlot *(*IterateForeignScan_function) (ForeignScanState *node);
 
+typedef bool (*RecheckForeignScan_function) (ForeignScanState *node,
+											 TupleTableSlot *slot);
+
 typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
 
 typedef void (*EndForeignScan_function) (ForeignScanState *node);
@@ -138,6 +142,7 @@ typedef struct FdwRoutine
 	GetForeignPlan_function GetForeignPlan;
 	BeginForeignScan_function BeginForeignScan;
 	IterateForeignScan_function IterateForeignScan;
+	RecheckForeignScan_function RecheckForeignScan;
 	ReScanForeignScan_function ReScanForeignScan;
 	EndForeignScan_function EndForeignScan;
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 58ec889..c5c89de 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1582,6 +1582,7 @@ typedef struct ForeignScanState
 	List	   *fdw_recheck_quals;	/* original quals not in ss.ps.qual */
 	/* use struct pointer to avoid including fdwapi.h here */
 	struct FdwRoutine *fdwroutine;
+	List	   *fdw_ps;			/* list of child PlanState nodes, if any */
 	void	   *fdw_state;		/* foreign-data wrapper can keep state here */
 } ForeignScanState;
 
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 6b28c8e..bd73371 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -526,6 +526,7 @@ typedef struct ForeignScan
 {
 	Scan		scan;
 	Oid			fs_server;		/* OID of foreign server */
+	List	   *fdw_plans;		/* list of Plan nodes, if any */
 	List	   *fdw_exprs;		/* expressions that FDW may evaluate */
 	List	   *fdw_private;	/* private data for FDW */
 	List	   *fdw_scan_tlist; /* optional tlist describing scan tuple */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6cf2e24..707927c 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -907,6 +907,7 @@ typedef struct TidPath
 typedef struct ForeignPath
 {
 	Path		path;
+	List	   *fdw_paths;
 	List	   *fdw_private;
 } ForeignPath;
 
#10 Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#9)

On 2015/11/09 9:26, Kouhei Kaigai wrote:

I think ForeignRecheck should first call ExecQual to test
fdw_recheck_quals. If it returns false, return false. If it returns
true, then give the FDW callback a chance, if one is defined. If that
returns false, return false. If we haven't yet returned false,
return true.

I think ExecQual on fdw_recheck_quals shall be called next to the
RecheckForeignScan callback, because econtext->ecxt_scantuple shall
not be reconstructed unless RecheckForeignScan callback is not called
if scanrelid==0.

I agree with KaiGai-san. I think we can define fdw_recheck_quals for
the foreign-join case as quals not in scan.plan.qual, the same way as in
the simple foreign scan case. (In other words, the quals would be
defined as "otherclauses", ie, rinfo->is_pushed_down=true, that have been
pushed down to the remote server.) For checking the fdw_recheck_quals,
however, I think we should reconstruct the join tuple first, which I
think is essential for cases where an outer join is performed remotely,
to avoid changing the semantics. BTW, in my patch [1], a secondary plan
will be created to evaluate such otherclauses after reconstructing the
join tuple.

The attached patch is an adjusted version of the previous one.
Even though a new callback and fdw_recheck_quals co-exist, the
callback is invoked first, as follows.

Thanks for the patch!

----------------<cut here>----------------
@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)

ResetExprContext(econtext);

+	/*
+	 * FDW driver has to recheck visibility of EPQ tuple towards
+	 * the scan qualifiers once it gets pushed down.
+	 * In addition, if this node represents a join sub-tree, not
+	 * a scan, FDW driver is also responsible to reconstruct
+	 * a joined tuple according to the primitive EPQ tuples.
+	 */
+	if (fdwroutine->RecheckForeignScan)
+	{
+		if (!fdwroutine->RecheckForeignScan(node, slot))
+			return false;
+	}
return ExecQual(node->fdw_recheck_quals, econtext, false);
}
----------------<cut here>----------------

If the callback is invoked first, the FDW driver can reconstruct a
joined tuple in whatever way suits it, and the remaining checks can then
be done by ExecQual on fdw_recheck_quals on the caller side.
If the callback were invoked last, the FDW driver would have no such
choice.
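For illustration only, this ordering can be sketched as self-contained C
with stub types (none of these are the real executor structures;
quals_pass stands in for the ExecQual call on fdw_recheck_quals, and
example_recheck is a made-up callback):

```c
#include <stdbool.h>
#include <stddef.h>

/* Stub types for illustration; not the real PostgreSQL executor structures. */
typedef struct TupleTableSlot { bool joined; } TupleTableSlot;

struct ForeignScanState;
typedef bool (*RecheckForeignScan_fn)(struct ForeignScanState *node,
                                      TupleTableSlot *slot);

typedef struct ForeignScanState {
    RecheckForeignScan_fn RecheckForeignScan;   /* optional FDW callback */
    bool quals_pass;    /* stands in for ExecQual(fdw_recheck_quals, ...) */
} ForeignScanState;

/* Example callback: "reconstruct" the joined tuple and report visibility. */
static bool
example_recheck(ForeignScanState *node, TupleTableSlot *slot)
{
    (void) node;
    slot->joined = true;
    return true;
}

/* Control flow of the patched ForeignRecheck: callback first, quals last. */
static bool
ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
{
    if (node->RecheckForeignScan &&
        !node->RecheckForeignScan(node, slot))
        return false;
    return node->quals_pass;    /* ExecQual(node->fdw_recheck_quals, ...) */
}
```

Under these stubs, a callback that returns false short-circuits the qual
check, while a node with no callback falls through to the quals alone.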

To test this change, I think we should update the postgres_fdw patch so
as to add the RecheckForeignScan.

Having said that, as I said previously, I don't see much value in adding
the callback routine, to be honest. I know KaiGai-san considers that
that would be useful for custom joins, but I don't think that that would
be useful even for foreign joins, because I think that in case of
foreign joins, the practical implementation of that routine in FDWs
would be to create a secondary plan and execute that plan by performing
ExecProcNode, as my patch does [1]. Maybe I'm missing something, though.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/5624D583.10202@lab.ntt.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11 Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#10)
----------------<cut here>----------------
@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)

ResetExprContext(econtext);

+	/*
+	 * FDW driver has to recheck visibility of EPQ tuple towards
+	 * the scan qualifiers once it gets pushed down.
+	 * In addition, if this node represents a join sub-tree, not
+	 * a scan, FDW driver is also responsible to reconstruct
+	 * a joined tuple according to the primitive EPQ tuples.
+	 */
+	if (fdwroutine->RecheckForeignScan)
+	{
+		if (!fdwroutine->RecheckForeignScan(node, slot))
+			return false;
+	}
return ExecQual(node->fdw_recheck_quals, econtext, false);
}
----------------<cut here>----------------

If the callback is invoked first, the FDW driver can reconstruct a
joined tuple in whatever way suits it, and the remaining checks can then
be done by ExecQual on fdw_recheck_quals on the caller side.
If the callback were invoked last, the FDW driver would have no such
choice.

To test this change, I think we should update the postgres_fdw patch so
as to add the RecheckForeignScan.

Having said that, as I said previously, I don't see much value in adding
the callback routine, to be honest. I know KaiGai-san considers that
that would be useful for custom joins, but I don't think that that would
be useful even for foreign joins, because I think that in case of
foreign joins, the practical implementation of that routine in FDWs
would be to create a secondary plan and execute that plan by performing
ExecProcNode, as my patch does [1]. Maybe I'm missing something, though.

I've never denied that an alternative local sub-plan is one of the best
approaches for postgres_fdw; however, I've also never heard why you can
say the best approach for postgres_fdw is definitely also the best for
others.
If we would justify a less flexible interface specification because of
the comfort of a particular extension, it should not be an extension,
but a built-in feature.
My standpoint has been consistent through the discussion: we can never
predict which features shall be implemented on the FDW interface;
therefore, we also cannot predict which implementation is best for EPQ
rechecks. Only the FDW driver knows which is "best" for it, not us.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#12 Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#11)

On 2015/11/09 13:40, Kouhei Kaigai wrote:

Having said that, as I said previously, I don't see much value in adding
the callback routine, to be honest. I know KaiGai-san considers that
that would be useful for custom joins, but I don't think that that would
be useful even for foreign joins, because I think that in case of
foreign joins, the practical implementation of that routine in FDWs
would be to create a secondary plan and execute that plan by performing
ExecProcNode, as my patch does [1]. Maybe I'm missing something, though.

I've never denied that an alternative local sub-plan is one of the best
approaches for postgres_fdw; however, I've also never heard why you can
say the best approach for postgres_fdw is definitely also the best for
others.
If we would justify a less flexible interface specification because of
the comfort of a particular extension, it should not be an extension,
but a built-in feature.
My standpoint has been consistent through the discussion: we can never
predict which features shall be implemented on the FDW interface;
therefore, we also cannot predict which implementation is best for EPQ
rechecks. Only the FDW driver knows which is "best" for it, not us.

What the RecheckForeignScan routine does for the foreign-join case would
be the following for tuples stored in estate->es_epqTuple[]:

1. Apply relevant restriction clauses, including fdw_recheck_quals, to
the tuples for the baserels involved in a foreign-join, and see if the
tuples still pass the clauses.

2. If so, form a join tuple, while applying relevant join clauses to the
tuples, and set the join tuple in the given slot. Else set empty.
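For illustration only, the two steps could be sketched as self-contained
C with stub tuple types and hypothetical qual functions (outer_qual,
inner_qual, and join_clause are made-up stand-ins, not executor APIs):

```c
#include <stdbool.h>

/* Stub EPQ tuples for two base relations: one key and one payload each. */
typedef struct EpqTuple { int key; int payload; } EpqTuple;
typedef struct JoinTuple { int outer_payload; int inner_payload; bool empty; } JoinTuple;

/* Step 1: hypothetical pushed-down restriction clauses per base relation. */
static bool outer_qual(const EpqTuple *t) { return t->payload > 0; }
static bool inner_qual(const EpqTuple *t) { return t->payload < 100; }

/* Step 2: hypothetical join clause (a simple equi-join on "key"). */
static bool join_clause(const EpqTuple *o, const EpqTuple *i)
{
    return o->key == i->key;
}

/*
 * Recheck the EPQ tuples of both base relations, then form the join
 * tuple if every clause still passes; otherwise mark the result empty.
 */
static JoinTuple
recheck_foreign_join(const EpqTuple *outer, const EpqTuple *inner)
{
    JoinTuple result = { 0, 0, true };

    if (!outer_qual(outer) || !inner_qual(inner))   /* step 1 */
        return result;
    if (!join_clause(outer, inner))                 /* step 2 */
        return result;
    result.outer_payload = outer->payload;
    result.inner_payload = inner->payload;
    result.empty = false;
    return result;
}
```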

I think these would be more efficiently processed internally in core
than externally in FDWs. That's why I don't see much value in adding
the routine. I have to admit that that means no flexibility, though.

However, the routine as-is doesn't seem good enough, either. For
example, since the routine is called after each of the tuples was
re-fetched from the remote end or re-computed from the whole-row var and
stored in the corresponding estate->es_epqTuple[], the routine wouldn't
allow for what Robert proposed in [2]. To do such a thing, I think we
would probably need to change the existing EPQ machinery more
drastically and rethink the right place for calling the routine.

Best regards,
Etsuro Fujita

[2]: /messages/by-id/CA+TgmoZdPU_fcSpOzXxpD1xvyq3cZCAwD7-x3aVWbKgSFoHvRA@mail.gmail.com


#13 Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#12)

On 2015/11/09 13:40, Kouhei Kaigai wrote:

Having said that, as I said previously, I don't see much value in adding
the callback routine, to be honest. I know KaiGai-san considers that
that would be useful for custom joins, but I don't think that that would
be useful even for foreign joins, because I think that in case of
foreign joins, the practical implementation of that routine in FDWs
would be to create a secondary plan and execute that plan by performing
ExecProcNode, as my patch does [1]. Maybe I'm missing something, though.

I've never denied that an alternative local sub-plan is one of the best
approaches for postgres_fdw; however, I've also never heard why you can
say the best approach for postgres_fdw is definitely also the best for
others.
If we would justify a less flexible interface specification because of
the comfort of a particular extension, it should not be an extension,
but a built-in feature.
My standpoint has been consistent through the discussion: we can never
predict which features shall be implemented on the FDW interface;
therefore, we also cannot predict which implementation is best for EPQ
rechecks. Only the FDW driver knows which is "best" for it, not us.

What the RecheckForeignScan routine does for the foreign-join case would
be the following for tuples stored in estate->es_epqTuple[]:

1. Apply relevant restriction clauses, including fdw_recheck_quals, to
the tuples for the baserels involved in a foreign-join, and see if the
tuples still pass the clauses.

It depends on how the FDW driver keeps its restriction clauses, but you
should not use fdw_recheck_quals to recheck individual base relations,
because it is initialized to run on the joined tuple according to
fdw_scan_tlist; the per-relation restriction clauses have to be kept in
some other private field.

2. If so, form a join tuple, while applying relevant join clauses to the
tuples, and set the join tuple in the given slot. Else set empty.

There is no need to form a joined tuple after rechecking the base
relations' clauses. If the FDW supports only inner joins, it can
reconstruct a joined tuple first, then let the caller run
fdw_recheck_quals, which contains both relations' clauses and the join
clause.
The FDW driver can choose whichever way suits its implementation and
capability.

I think these would be more efficiently processed internally in core
than externally in FDWs. That's why I don't see much value in adding
the routine. I have to admit that that means no flexibility, though.

Words like "efficient", "better", "reasonable", etc. are your opinions
from your standpoint. What is important is why you think X is better
and Y is worse. That is what I've wanted to see for three months, but
never have.

Discussion becomes unproductive without understanding the reason for a
different conclusion. Please don't omit why you think it is "efficient"
enough to justify enforcing a particular implementation manner on all
FDW drivers as a part of the interface contract.

However, the routine as-is doesn't seem good enough, either. For
example, since the routine is called after each of the tuples was
re-fetched from the remote end or re-computed from the whole-row var and
stored in the corresponding estate->es_epqTuple[], the routine wouldn't
allow for what Robert proposed in [2]. To do such a thing, I think we
would probably need to change the existing EPQ machinery more
drastically and rethink the right place for calling the routine.

Please also see my message:
/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8011617C6@BPXM15GP.gisp.nec.co.jp

And, why Robert thought here is a tough challenge:
/messages/by-id/CA+TgmoY5Lf+vYy1Bha=U7__S3qtMQP7d+gSSfd+LN4Xz6Fybkg@mail.gmail.com

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#14 Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#10)

On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

To test this change, I think we should update the postgres_fdw patch so as
to add the RecheckForeignScan.

Having said that, as I said previously, I don't see much value in adding the
callback routine, to be honest. I know KaiGai-san considers that that would
be useful for custom joins, but I don't think that that would be useful even
for foreign joins, because I think that in case of foreign joins, the
practical implementation of that routine in FDWs would be to create a
secondary plan and execute that plan by performing ExecProcNode, as my patch
does [1]. Maybe I'm missing something, though.

I really don't see why you're fighting on this point. Making this a
generic feature will require only a few extra lines of code for FDW
authors. If this were going to cause some great inconvenience for FDW
authors, then I'd agree it isn't worth it. But I see zero evidence
that this is actually the case. From my point of view I'm now
thinking this solution has two parts:

(1) Let foreign scans have inner and outer subplans. For this
purpose, we only need one, but it's no more work to enable both, so we
may as well. If we had some reason, we could add a list of subplans
of arbitrary length, but there doesn't seem to be an urgent need for
that.

(2) Add a recheck callback.

If the foreign data wrapper wants to adopt the solution you're
proposing, the recheck callback can call
ExecProcNode(outerPlanState(node)). I don't think this should end up
being more than a few lines of code, although of course we should
verify that. So no problem: postgres_fdw and any other FDWs where the
remote side is a database can easily delegate to a subplan, and
anybody who wants to do something else still can.
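A minimal sketch of that delegation, using stub types rather than the
real ForeignScanState, PlanState, and TupleTableSlot (in the actual
executor the callback would call ExecProcNode(outerPlanState(node)) and
copy the result into the given slot with ExecCopySlot):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stubs standing in for executor structures; not the real PostgreSQL types. */
typedef struct TupleTableSlot { int values[4]; bool empty; } TupleTableSlot;

typedef struct PlanState {
    /* the stub "subplan" produces one precomputed joined tuple, or none */
    TupleTableSlot result;
    bool has_tuple;
} PlanState;

typedef struct ForeignScanState {
    PlanState *outer_subplan;   /* outerPlanState(node) in the real executor */
} ForeignScanState;

/* Stub ExecProcNode: return the subplan's next tuple, or an empty slot. */
static TupleTableSlot *
ExecProcNode(PlanState *ps)
{
    ps->result.empty = !ps->has_tuple;
    return &ps->result;
}

/*
 * Sketch of a RecheckForeignScan callback that simply delegates to a
 * local secondary join plan: run the subplan over the EPQ tuples and
 * report whether it produced a rechecked joined tuple.
 */
static bool
RecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
{
    TupleTableSlot *result = ExecProcNode(node->outer_subplan);

    if (result->empty)
        return false;           /* join condition no longer holds */
    memcpy(slot, result, sizeof(TupleTableSlot));
    return true;
}
```

Under these stub definitions the callback is indeed only a few lines;
the real version would additionally rescan the subplan before pulling a
tuple from it.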

What is not to like about that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#15 Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#14)

On 2015/11/12 2:53, Robert Haas wrote:

On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

To test this change, I think we should update the postgres_fdw patch so as
to add the RecheckForeignScan.

Having said that, as I said previously, I don't see much value in adding the
callback routine, to be honest. I know KaiGai-san considers that that would
be useful for custom joins, but I don't think that that would be useful even
for foreign joins, because I think that in case of foreign joins, the
practical implementation of that routine in FDWs would be to create a
secondary plan and execute that plan by performing ExecProcNode, as my patch
does [1]. Maybe I'm missing something, though.

I really don't see why you're fighting on this point. Making this a
generic feature will require only a few extra lines of code for FDW
authors. If this were going to cause some great inconvenience for FDW
authors, then I'd agree it isn't worth it. But I see zero evidence
that this is actually the case.

Really? I think there would be no small burden on an FDW author; when
postgres_fdw delegates the recheck to the remote server, for example, it
would need to create a remote join query by looking at the tuples
possibly fetched and stored in estate->es_epqTuple[], then send the
query and receive the result during the callback routine. Furthermore,
what I'm most concerned about is that that wouldn't be efficient. So,
my question about that approach is whether FDWs would really do
something like that during the callback routine, instead of performing
a secondary join plan locally. As I said before, I know that KaiGai-san
considers that that approach would be useful for custom joins. But I
see zero evidence that there is a good use-case for an FDW.

From my point of view I'm now
thinking this solution has two parts:

(1) Let foreign scans have inner and outer subplans. For this
purpose, we only need one, but it's no more work to enable both, so we
may as well. If we had some reason, we could add a list of subplans
of arbitrary length, but there doesn't seem to be an urgent need for
that.

(2) Add a recheck callback.

If the foreign data wrapper wants to adopt the solution you're
proposing, the recheck callback can call
ExecProcNode(outerPlanState(node)). I don't think this should end up
being more than a few lines of code, although of course we should
verify that. So no problem: postgres_fdw and any other FDWs where the
remote side is a database can easily delegate to a subplan, and
anybody who wants to do something else still can.

What is not to like about that?


#16 Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#15)

-----Original Message-----
From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
Sent: Thursday, November 12, 2015 2:54 PM
To: Robert Haas
Cc: Kaigai Kouhei(海外 浩平); Tom Lane; Kyotaro HORIGUCHI;
pgsql-hackers@postgresql.org; Shigeru Hanada
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

On 2015/11/12 2:53, Robert Haas wrote:

On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

To test this change, I think we should update the postgres_fdw patch so as
to add the RecheckForeignScan.

Having said that, as I said previously, I don't see much value in adding the
callback routine, to be honest. I know KaiGai-san considers that that would
be useful for custom joins, but I don't think that that would be useful even
for foreign joins, because I think that in case of foreign joins, the
practical implementation of that routine in FDWs would be to create a
secondary plan and execute that plan by performing ExecProcNode, as my patch
does [1]. Maybe I'm missing something, though.

I really don't see why you're fighting on this point. Making this a
generic feature will require only a few extra lines of code for FDW
authors. If this were going to cause some great inconvenience for FDW
authors, then I'd agree it isn't worth it. But I see zero evidence
that this is actually the case.

Really? I think there would be not a little burden on an FDW author;
when postgres_fdw delegates the recheck to the remote server, for
example, it would need to create a remote join query by looking at
tuples possibly fetched and stored in estate->es_epqTuple[], send the
query and receive the result during the callback routine.

I cannot understand why that is the only solution.
Our assumption is that the FDW driver knows the best way to do it. So,
you can take the best way for your FDW driver, including what you want
to implement in the built-in feature.

Furthermore,
what I'm most concerned about is that wouldn't be efficient. So, my

You have to add a "because ..." sentence here, because Robert and I
think a little inefficiency is not a problem. If you try to persuade
other people who have a different opinion, you need to explain WHY you
reached a different conclusion. (Of course, we might be overlooking
something.)
Please don't start the sentence with "I think ...". We all know your
opinion, but what I've wanted to see is "the reason why my approach is
valuable is ...".

I never suggest anything technically difficult; it is a problem of
communication.

question about that approach is whether FDWs really do some thing like
that during the callback routine, instead of performing a secondary join
plan locally.

Nobody prohibits postgres_fdw from performing a secondary join here.
All you need to do is pick up a sub-plan tree from the FDW's private
field, then call ExecProcNode() inside the callback.

As I said before, I know that KaiGai-san considers that
that approach would be useful for custom joins. But I see zero evidence
that there is a good use-case for an FDW.

From my point of view I'm now
thinking this solution has two parts:

(1) Let foreign scans have inner and outer subplans. For this
purpose, we only need one, but it's no more work to enable both, so we
may as well. If we had some reason, we could add a list of subplans
of arbitrary length, but there doesn't seem to be an urgent need for
that.

(2) Add a recheck callback.

If the foreign data wrapper wants to adopt the solution you're
proposing, the recheck callback can call
ExecProcNode(outerPlanState(node)). I don't think this should end up
being more than a few lines of code, although of course we should
verify that. So no problem: postgres_fdw and any other FDWs where the
remote side is a database can easily delegate to a subplan, and
anybody who wants to do something else still can.

What is not to like about that?

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#17 Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#15)

Hello,

I really don't see why you're fighting on this point. Making this a
generic feature will require only a few extra lines of code for FDW
authors. If this were going to cause some great inconvenience for FDW
authors, then I'd agree it isn't worth it. But I see zero evidence
that this is actually the case.

Really? I think there would be not a little burden on an FDW author;
when postgres_fdw delegates the recheck to the remote server, for
example, it would need to create a remote join query by looking at
tuples possibly fetched and stored in estate->es_epqTuple[], send the
query and receive the result during the callback routine.

Do you mind that the FDW cannot generate a plan that makes a tuple
from the epqTuples and then applies fdw_quals using predefined executor
nodes?

The returned tuple itself can be stored in fdw_private, as I think
Kaigai-san said before. So it is enough if we can fabricate a Result
node whose outerPlan is the ForeignScan, which somehow returns the
tuple to examine.

I should be missing something, though.

regards,

Furthermore, what I'm most concerned about is that wouldn't be
efficient. So, my question about that approach is whether FDWs really
do some thing like that during the callback routine, instead of
performing a secondary join plan locally. As I said before, I know
that KaiGai-san considers that that approach would be useful for
custom joins. But I see zero evidence that there is a good use-case
for an FDW.

From my point of view I'm now
thinking this solution has two parts:

(1) Let foreign scans have inner and outer subplans. For this
purpose, we only need one, but it's no more work to enable both, so we
may as well. If we had some reason, we could add a list of subplans
of arbitrary length, but there doesn't seem to be an urgent need for
that.

(2) Add a recheck callback.

If the foreign data wrapper wants to adopt the solution you're
proposing, the recheck callback can call
ExecProcNode(outerPlanState(node)). I don't think this should end up
being more than a few lines of code, although of course we should
verify that. So no problem: postgres_fdw and any other FDWs where the
remote side is a database can easily delegate to a subplan, and
anybody who wants to do something else still can.

What is not to like about that?

--
Kyotaro Horiguchi
NTT Open Source Software Center


#18 Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#16)

Robert and Kaigai-san,

Sorry, I sent in an unfinished email.

On 2015/11/12 15:30, Kouhei Kaigai wrote:

On 2015/11/12 2:53, Robert Haas wrote:

On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

To test this change, I think we should update the postgres_fdw patch so as
to add the RecheckForeignScan.

Having said that, as I said previously, I don't see much value in adding the
callback routine, to be honest. I know KaiGai-san considers that that would
be useful for custom joins, but I don't think that that would be useful even
for foreign joins, because I think that in case of foreign joins, the
practical implementation of that routine in FDWs would be to create a
secondary plan and execute that plan by performing ExecProcNode, as my patch
does [1]. Maybe I'm missing something, though.

I really don't see why you're fighting on this point. Making this a
generic feature will require only a few extra lines of code for FDW
authors. If this were going to cause some great inconvenience for FDW
authors, then I'd agree it isn't worth it. But I see zero evidence
that this is actually the case.

Really? I think there would be not a little burden on an FDW author;
when postgres_fdw delegates the subplan to the remote server, for
example, it would need to create a remote join query by looking at
tuples possibly fetched and stored in estate->es_epqTuple[], send the
query and receive the result during the callback routine.

I cannot understand why it is the only solution.

I didn't say that.

Furthermore,
what I'm most concerned about is that wouldn't be efficient. So, my

You have to add a "because ..." sentence here, because Robert and I
think a little inefficiency is not a problem.

Sorry, my explanation was not enough. The reason for that is that in
the above postgres_fdw case for example, the overhead in sending the
query to the remote end and transferring the result to the local end
would not be negligible. Yeah, we might be able to apply a special
handling for the improved efficiency when using early row locking, but
otherwise can we do the same thing?

Please don't start the sentence with "I think ...". We all know
your opinion, but what I've wanted to see is "the reason why my
approach is valuable is ...".

I didn't say that my approach is *valuable* either. What I think is, I
see zero evidence that there is a good use-case for an FDW to do
something other than doing an ExecProcNode in the callback routine, as I
said below, so I don't see the need to add such a routine when it would
place a maybe not large, but not small, burden on FDW authors for
writing such a routine.

Nobody prohibits postgres_fdw from performing a secondary join here.
All you need to do is pick up a sub-plan tree from the FDW's private
field, then call ExecProcNode() inside the callback.

As I said before, I know that KaiGai-san considers that
that approach would be useful for custom joins. But I see zero evidence
that there is a good use-case for an FDW.

From my point of view I'm now
thinking this solution has two parts:

(1) Let foreign scans have inner and outer subplans. For this
purpose, we only need one, but it's no more work to enable both, so we
may as well. If we had some reason, we could add a list of subplans
of arbitrary length, but there doesn't seem to be an urgent need for
that.

I did the same thing in an earlier version of the patch I posted.
Although I agreed with Robert's comment "The Plan tree and the PlanState
tree should be mirror images of each other; breaking that equivalence
will cause confusion, at least.", I think that that would make the code
much simpler, especially the code for setting chgParam for the
inner/outer subplans. But one thing I'm concerned about is enabling
both inner and outer plans, because I think that would make the planner
postprocessing complicated, depending on what the foreign scans do with
the inner/outer subplans. Is it worth doing so? Maybe I'm missing
something, though.

(2) Add a recheck callback.

If the foreign data wrapper wants to adopt the solution you're
proposing, the recheck callback can call
ExecProcNode(outerPlanState(node)). I don't think this should end up
being more than a few lines of code, although of course we should
verify that.

Yeah, I think FDWs would probably need to create a subplan accordingly
at planning time, and then initialize/close the plan at execution time.
I think we could facilitate subplan creation by providing helper
functions for that, though.

Best regards,
Etsuro Fujita


#19Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#17)

Horiguchi-san,

On 2015/11/12 16:10, Kyotaro HORIGUCHI wrote:

I really don't see why you're fighting on this point. Making this a
generic feature will require only a few extra lines of code for FDW
authors. If this were going to cause some great inconvenience for FDW
authors, then I'd agree it isn't worth it. But I see zero evidence
that this is actually the case.

Really? I think there would be not a little burden on an FDW author;
when postgres_fdw delegates the subplan to the remote server, for
example, it would need to create a remote join query by looking at
tuples possibly fetched and stored in estate->es_epqTuple[], send the
query and receive the result during the callback routine.

Do you mind that an FDW cannot generate a plan that makes a tuple
from epqTuples and then applies fdw_quals using predefined executor
nodes?

No. Please see my previous email. Sorry for my unfinished email.

Best regards,
Etsuro Fujita


#20Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#18)

-----Original Message-----
From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
Sent: Thursday, November 12, 2015 6:54 PM
To: Kaigai Kouhei(海外 浩平); Robert Haas
Cc: Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org; Shigeru Hanada
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

Robert and Kaigai-san,

Sorry, I sent in an unfinished email.

On 2015/11/12 15:30, Kouhei Kaigai wrote:

On 2015/11/12 2:53, Robert Haas wrote:

On Sun, Nov 8, 2015 at 11:13 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

To test this change, I think we should update the postgres_fdw patch so as
to add the RecheckForeignScan.

Having said that, as I said previously, I don't see much value in adding

the

callback routine, to be honest. I know KaiGai-san considers that that would
be useful for custom joins, but I don't think that that would be useful even
for foreign joins, because I think that in case of foreign joins, the
practical implementation of that routine in FDWs would be to create a
secondary plan and execute that plan by performing ExecProcNode, as my patch
does [1]. Maybe I'm missing something, though.

I really don't see why you're fighting on this point. Making this a
generic feature will require only a few extra lines of code for FDW
authors. If this were going to cause some great inconvenience for FDW
authors, then I'd agree it isn't worth it. But I see zero evidence
that this is actually the case.

Really? I think there would be not a little burden on an FDW author;
when postgres_fdw delegates the subplan to the remote server, for
example, it would need to create a remote join query by looking at
tuples possibly fetched and stored in estate->es_epqTuple[], send the
query and receive the result during the callback routine.

I cannot understand why it is the only solution.

I didn't say that.

Furthermore,
what I'm most concerned about is that wouldn't be efficient. So, my

You have to add a "because ..." sentence here, because Robert and I
think a little inefficiency is not a problem.

Sorry, my explanation was not enough. The reason for that is that in
the above postgres_fdw case, for example, the overhead of sending the
query to the remote end and transferring the result to the local end
would not be negligible. Yeah, we might be able to apply special
handling for improved efficiency when using early row locking, but
otherwise can we do the same thing?

It is a trade-off. Late locking semantics allows locking a relatively
smaller number of remote rows, but it takes extra latency.
Also, it became clear we have a challenge in pulling a joined tuple at
once.

Please don't start the sentence with "I think ...". We all know
your opinion, but what I've wanted to see is "the reason why my
approach is valuable is ...".

I didn't say that my approach is *valuable* either. What I think is, I
see zero evidence that there is a good use-case for an FDW to do
something other than doing an ExecProcNode in the callback routine, as I
said below, so I don't see the need to add such a routine when it would
place a maybe not large, but not small, burden on FDW authors for
writing such a routine.

It is quite natural, because we cannot predict what kind of extension
will be implemented on the FDW interface. You might know the initial
version of PG-Strom was implemented on FDW (about 4 years ago...). If I
had continued to stick with FDW, it would have become an FDW driver
with its own join engine.
(cstore_fdw may potentially support its own join logic on top of its
columnar storage, for instance?)

From the standpoint of interface design, if we would not admit
flexibility of implementation until the community sees a working
example, a reasonable tactic *for an extension author* is to follow the
interface restriction even if it is not the best approach from his
standpoint.
It does not mean the majority's approach is also best for the minority.
It just forces a compromise on the minority.

Nobody prohibits postgres_fdw from performing a secondary join here.
All you need to do is pick up a sub-plan tree from the FDW's private
field, then call ExecProcNode() inside the callback.

As I said before, I know that KaiGai-san considers that
that approach would be useful for custom joins. But I see zero evidence
that there is a good use-case for an FDW.

From my point of view I'm now
thinking this solution has two parts:

(1) Let foreign scans have inner and outer subplans. For this
purpose, we only need one, but it's no more work to enable both, so we
may as well. If we had some reason, we could add a list of subplans
of arbitrary length, but there doesn't seem to be an urgent need for
that.

I did the same thing in an earlier version of the patch I posted.
Although I agreed with Robert's comment "The Plan tree and the PlanState
tree should be mirror images of each other; breaking that equivalence
will cause confusion, at least.", I think that that would make the code
much simpler, especially the code for setting chgParam for the
inner/outer subplans. But one thing I'm concerned about is enabling
both inner and outer plans, because I think that would make the planner
postprocessing complicated, depending on what the foreign scans do with
the inner/outer subplans. Is it worth doing so? Maybe I'm missing
something, though.

If you want to persuade another person who has a different opinion, you
need to explain why it was complicated, how complicated it was, and
what solution you tried at that time.
"Complicated" is a subjective term. At least, we don't share your
experience, so it is hard to understand the complexity.

I guess it is similar to what the built-in logic usually does, so it
should not be a problem we cannot solve. A utility routine the FDW
driver can call will solve the issue (even if it is not supported in
v9.5 yet).

(2) Add a recheck callback.

If the foreign data wrapper wants to adopt the solution you're
proposing, the recheck callback can call
ExecProcNode(outerPlanState(node)). I don't think this should end up
being more than a few lines of code, although of course we should
verify that.

Yeah, I think FDWs would probably need to create a subplan accordingly
at planning time, and then initialize/close the plan at execution time.
I think we could facilitate subplan creation by providing helper
functions for that, though.

I can agree that we ought to provide a utility routine to construct
a local alternative subplan; however, we are in the beta2 stage for
v9.5. So, I'd like to suggest only the callback for v9.5 (the FDW
driver can handle its subplan by itself, no need to patch the backend),
then design the utility routine after that.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#21Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#20)

Hello, I am also uncertain about what exactly the blocker is..

At Fri, 13 Nov 2015 02:31:53 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F80116F7AF@BPXM15GP.gisp.nec.co.jp>

Sorry, my explanation was not enough. The reason for that is that in
the above postgres_fdw case, for example, the overhead of sending the
query to the remote end and transferring the result to the local end
would not be negligible. Yeah, we might be able to apply special
handling for improved efficiency when using early row locking, but
otherwise can we do the same thing?

It is a trade-off. Late locking semantics allows locking a relatively
smaller number of remote rows, but it takes extra latency.
Also, it became clear we have a challenge in pulling a joined tuple at
once.

Late row locking needs to send a query to the remote side anyway, and
needs to generate the joined row on either side of the connection.
Early row locking on FDW doesn't need that, since the necessary tuples
are already in our hands. Is there any performance issue in this?
Unfortunately I haven't comprehended what the problem is :(

Or are you, Fujita-san, thinking about bulk late row locking or some
such? If so, it is a matter for the future, like update/insert
pushdown, I suppose.

I didn't say that my approach is *valuable* either. What I think is, I
see zero evidence that there is a good use-case for an FDW to do
something other than doing an ExecProcNode in the callback routine, as I
said below, so I don't see the need to add such a routine when it would
place a maybe not large, but not small, burden on FDW authors for
writing such a routine.

It is quite natural, because we cannot predict what kind of extension
will be implemented on the FDW interface. You might know the initial
version of PG-Strom was implemented on FDW (about 4 years ago...). If I
had continued to stick with FDW, it would have become an FDW driver
with its own join engine.
(cstore_fdw may potentially support its own join logic on top of its
columnar storage, for instance?)

From the standpoint of interface design, if we would not admit
flexibility of implementation until the community sees a working
example, a reasonable tactic *for an extension author* is to follow the
interface restriction even if it is not the best approach from his
standpoint.
It does not mean the majority's approach is also best for the minority.
It just forces a compromise on the minority.

Or try to open the way to introduce the feature he/she wants. If a
workable postgres_fdw with join pushdown based on this API were shown
here to any extent, we could investigate the problems there. But
perhaps the deadline is just before us..

I did the same thing in an earlier version of the patch I posted.
Although I agreed with Robert's comment "The Plan tree and the PlanState
tree should be mirror images of each other; breaking that equivalence
will cause confusion, at least.", I think that that would make the code
much simpler, especially the code for setting chgParam for the
inner/outer subplans.

I see that Kaigai-san's patch doesn't put nodes different from the
paths during plan creation; in other words, it doesn't break the
coherence between paths and plans from the core's point of view.
Fujita-san's patch mentioned above altered a node in the core's sight.
I understand that this is the most significant difference between
them..

But one thing I'm concerned about is enabling both inner and outer
plans, because I think that would make the planner postprocessing
complicated, depending on what the foreign scans do with the
inner/outer subplans. Is it worth doing so? Maybe I'm missing
something, though.

Is this a discussion about late row locking? Join pushdown itself is a
kind of complicated process, and since it fools the planner in one
aspect, the additional feature is inevitably complex to some extent. We
could discuss that after some specific problem comes into sight.

If you want to persuade another person who has a different opinion, you
need to explain why it was complicated, how complicated it was, and
what solution you tried at that time.
"Complicated" is a subjective term. At least, we don't share your
experience, so it is hard to understand the complexity.

Me too. It surely might be complicated (though the extent is mainly in
an individual's mind..) but I also don't see how Fujita-san's patch
resolves that "problem".

I guess it is similar to what the built-in logic usually does, so it
should not be a problem we cannot solve. A utility routine the FDW
driver can call will solve the issue (even if it is not supported in
v9.5 yet).

(2) Add a recheck callback.

If the foreign data wrapper wants to adopt the solution you're
proposing, the recheck callback can call
ExecProcNode(outerPlanState(node)). I don't think this should end up
being more than a few lines of code, although of course we should
verify that.

Yeah, I think FDWs would probably need to create a subplan accordingly
at planning time, and then initialize/close the plan at execution time.
I think we could facilitate subplan creation by providing helper
functions for that, though.

I can agree that we ought to provide a utility routine to construct
a local alternative subplan; however, we are in the beta2 stage for
v9.5. So, I'd like to suggest only the callback for v9.5 (the FDW
driver can handle its subplan by itself, no need to patch the backend),
then design the utility routine after that.

The support routine itself won't be a blocker, since it can be copied
into the FDW for the moment; then we can propose exposing it if it is
found to be essential. It would be a problem if some essential data
were found to be out of reach, but I guess we already have all the
required data in hand.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


#22Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#20)

On 2015/11/13 11:31, Kouhei Kaigai wrote:

On 2015/11/12 2:53, Robert Haas wrote:

From my point of view I'm now
thinking this solution has two parts:

(1) Let foreign scans have inner and outer subplans. For this
purpose, we only need one, but it's no more work to enable both, so we
may as well. If we had some reason, we could add a list of subplans
of arbitrary length, but there doesn't seem to be an urgent need for
that.

I wrote:

But one thing I'm concerned about is enabling both inner and outer
plans, because I think that would make the planner postprocessing
complicated, depending on what the foreign scans do with the
inner/outer subplans. Is it worth doing so? Maybe I'm missing
something, though.

If you want to persuade another person who has a different opinion, you
need to explain why it was complicated, how complicated it was, and
what solution you tried at that time.
"Complicated" is a subjective term. At least, we don't share your
experience, so it is hard to understand the complexity.

I don't mean to object to that idea. I'm unfamiliar with it, so I just
wanted to know the reason, or the use cases.

Best regards,
Etsuro Fujita


#23Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#21)

On 2015/11/13 13:44, Kyotaro HORIGUCHI wrote:

I wrote:

What I think is, I
see zero evidence that there is a good use-case for an FDW to do
something other than doing an ExecProcNode in the callback routine, as I
said below, so I don't see the need to add such a routine when it would
place a maybe not large, but not small, burden on FDW authors for
writing such a routine.

KaiGai-san wrote:

It is quite natural, because we cannot predict what kind of extension
will be implemented on the FDW interface. You might know the initial
version of PG-Strom was implemented on FDW (about 4 years ago...). If I
had continued to stick with FDW, it would have become an FDW driver
with its own join engine.

From the standpoint of interface design, if we would not admit
flexibility of implementation until the community sees a working
example, a reasonable tactic *for an extension author* is to follow the
interface restriction even if it is not the best approach from his
standpoint.
It does not mean the majority's approach is also best for the minority.
It just forces a compromise on the minority.

Or try to open the way to introduce the feature he/she wants.

I think the biggest difference between KaiGai-san's patch and mine is
that KaiGai-san's patch introduces a callback routine that allows an FDW
author not only to execute a secondary plan but to do something else
instead of executing the plan, if he/she wants to do so. His approach
would provide that flexibility, but IMHO major FDWs that would implement
join pushdown, such as postgres_fdw, wouldn't utilize the flexibility;
probably, they would just execute the secondary plan in the routine.
Furthermore, since for executing the plan his approach would require
that an FDW author add code not only for creating the plan but for
initializing/executing/ending it in his/her FDW by itself, while in my
approach he/she only has to add code for the plan creation, his approach
would impose a greater development burden on such major FDWs' authors
than mine. I think the flexibility would be a good thing, but I also
think it's important not to burden FDW authors. Maybe I'm missing
something, though.

Best regards,
Etsuro Fujita


#24Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#23)

On 2015/11/13 13:44, Kyotaro HORIGUCHI wrote:

I wrote:

What I think is, I
see zero evidence that there is a good use-case for an FDW to do
something other than doing an ExecProcNode in the callback routine, as I
said below, so I don't see the need to add such a routine when it would
place a maybe not large, but not small, burden on FDW authors for
writing such a routine.

KaiGai-san wrote:

It is quite natural, because we cannot predict what kind of extension
will be implemented on the FDW interface. You might know the initial
version of PG-Strom was implemented on FDW (about 4 years ago...). If I
had continued to stick with FDW, it would have become an FDW driver
with its own join engine.

From the standpoint of interface design, if we would not admit
flexibility of implementation until the community sees a working
example, a reasonable tactic *for an extension author* is to follow the
interface restriction even if it is not the best approach from his
standpoint.
It does not mean the majority's approach is also best for the minority.
It just forces a compromise on the minority.

Or try to open the way to introduce the feature he/she wants.

I think the biggest difference between KaiGai-san's patch and mine is
that KaiGai-san's patch introduces a callback routine that allows an FDW
author not only to execute a secondary plan but to do something else
instead of executing the plan, if he/she wants to do so. His approach
would provide that flexibility, but IMHO major FDWs that would implement
join pushdown, such as postgres_fdw, wouldn't utilize the flexibility;
probably, they would just execute the secondary plan in the routine.

Yes; my approach never denies that.

Furthermore, since that for executing the plan,
his approach would require that an FDW author has to add code not only
for creating the plan but for initializing

Pick up a plan from fdw_plans, then call ExecInitNode()

executing

Pick up a plan-state from fdw_ps then call ExecProcNode()

ending it to

Also, call ExecEndNode() on the plan-state.

his/her FDW by itself, while in my approach he/she only has to add code
for the plan creation, his approach would impose a greater development
burden on such major FDWs' authors than mine.

It looks to me like the additional development burden is three extra
lines.

Both of our approaches commonly need to construct an alternative local
plan, likely an unparameterized nest-loop, in the planner phase; that
shall be supported by a utility function in the core backend.
So, one more additional line will eventually be needed.
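
As a rough illustration of those three calls (hypothetical names throughout: fdw_plans is the list field from the proposed, uncommitted patch, and PgFdwJoinState and the example* functions are invented for this sketch, not taken from any posted code):

```c
/* Illustrative private state holding the alternative local join plan. */
typedef struct PgFdwJoinState
{
    PlanState  *subplan_ps;     /* initialized alternative sub-plan */
} PgFdwJoinState;

static void
exampleBeginForeignScan(ForeignScanState *node, int eflags)
{
    ForeignScan *fscan = (ForeignScan *) node->ss.ps.plan;
    PgFdwJoinState *fdw_state = palloc0(sizeof(PgFdwJoinState));
    Plan       *subplan = (Plan *) linitial(fscan->fdw_plans);

    /* (1) initialize the sub-plan picked up from the FDW's private field */
    fdw_state->subplan_ps = ExecInitNode(subplan, node->ss.ps.state, eflags);
    node->fdw_state = fdw_state;
}

static bool
exampleRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
{
    PgFdwJoinState *fdw_state = (PgFdwJoinState *) node->fdw_state;

    /* (2) run the sub-plan to re-join the EPQ tuples */
    TupleTableSlot *result = ExecProcNode(fdw_state->subplan_ps);

    if (TupIsNull(result))
        return false;
    ExecCopySlot(slot, result);
    return true;
}

static void
exampleEndForeignScan(ForeignScanState *node)
{
    PgFdwJoinState *fdw_state = (PgFdwJoinState *) node->fdw_state;

    /* (3) shut the sub-plan down */
    ExecEndNode(fdw_state->subplan_ps);
}
```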

I think the flexibility
would be a good thing, but I also think it's important not to burden FDW
authors. Maybe I'm missing something, though.

The actual pain is that people cannot design/implement their module as
they want. I've repeatedly pointed out that an FDW driver can have its
own join implementation, and people may want to use their own logic
rather than a local plan. At least, if PG-Strom still ran on FDW, I
would *want* to reuse its CPU-fallback routine instead of the
alternative sub-plan.

Could you explain why the above sequence (a few additional lines) is an
unacceptable burden that justifies eliminating flexibility for the
minority?
If you can implement the "common part" for the majority, we can
implement the same stuff as utility functions that can be called from
the callbacks.

My questions are:
* How many lines do you expect the additional burden to be?
* Why does it justify eliminating the flexibility of the interface?
* Why can't we implement the common part as utility routines that
can be called from the callback?

Please don't hesitate to point out flaws in my proposition, if you have
noticed something significant we have never noticed.
However, at this moment, it does not seem to me that your concern is
something significant.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#25Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#15)

On Thu, Nov 12, 2015 at 12:54 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Really? I think there would be not a little burden on an FDW author; when
postgres_fdw delegates the subplan to the remote server, for example, it
would need to create a remote join query by looking at tuples possibly
fetched and stored in estate->es_epqTuple[], send the query and receive the
result during the callback routine. Furthermore, what I'm most concerned
about is that wouldn't be efficient. So, my question about that approach is
whether FDWs really do something like that during the callback routine,
instead of performing a secondary join plan locally. As I said before, I
know that KaiGai-san considers that that approach would be useful for custom
joins. But I see zero evidence that there is a good use-case for an FDW.

It could do that. But it could also just invoke a subplan as you are
proposing. Or at least, I think we should set it up so that such a
thing is possible. In which case I don't see the problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#26Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#9)

On Sun, Nov 8, 2015 at 7:26 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch is an adjusted version of the previous one.
Even though it co-exists a new callback and fdw_recheck_quals,
the callback is kicked first as follows.

This seems excessive to me: why would we need an arbitrary-length list
of plans for an FDW? I think we should just allow an outer child and
an inner child, which is probably one more than we'll ever need in
practice.

This looks like an independent bug fix:

+               fscan->fdw_recheck_quals = (List *)
+                       fix_upper_expr(root,
+                                      (Node *) fscan->fdw_recheck_quals,
+                                      itlist,
+                                      INDEX_VAR,
+                                      rtoffset);
                pfree(itlist);

If so, it should be committed separately and back-patched to 9.5.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#27Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#26)

On Sun, Nov 8, 2015 at 7:26 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch is an adjusted version of the previous one.
Even though it co-exists a new callback and fdw_recheck_quals,
the callback is kicked first as follows.

This seems excessive to me: why would we need an arbitrary-length list
of plans for an FDW? I think we should just allow an outer child and
an inner child, which is probably one more than we'll ever need in
practice.

It just intends to keep code symmetry with the custom-scan case, so
there is no significant reason.
Also, I expected ForeignScan will soon need multiple sub-plans to
support more intelligent push-down, like:
/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010F47DA@BPXM15GP.gisp.nec.co.jp

It is a separate discussion, of course, so I don't have a strong
preference here.

This looks like an independent bug fix:

+               fscan->fdw_recheck_quals = (List *)
+                       fix_upper_expr(root,
+                                      (Node *) fscan->fdw_recheck_quals,
+                                      itlist,
+                                      INDEX_VAR,
+                                      rtoffset);
                pfree(itlist);

If so, it should be committed separately and back-patched to 9.5.

OK, I'll split the patch into two.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#28Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kouhei Kaigai (#27)
1 attachment(s)

On Sun, Nov 8, 2015 at 7:26 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch is an adjusted version of the previous one.
Even though it co-exists a new callback and fdw_recheck_quals,
the callback is kicked first as follows.

This seems excessive to me: why would we need an arbitrary-length list
of plans for an FDW? I think we should just allow an outer child and
an inner child, which is probably one more than we'll ever need in
practice.

It just intends to keep code symmetry with the custom-scan case, so
there is no significant reason.
Also, I expected ForeignScan will soon need multiple sub-plans to
support more intelligent push-down, like:
/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010F47DA@BPXM15GP.gisp.nec.co.jp

It is a separate discussion, of course, so I don't have a strong
preference here.

This looks like an independent bug fix:

+               fscan->fdw_recheck_quals = (List *)
+                       fix_upper_expr(root,
+                                      (Node *) fscan->fdw_recheck_quals,
+                                      itlist,
+                                      INDEX_VAR,
+                                      rtoffset);
                pfree(itlist);

If so, it should be committed separately and back-patched to 9.5.

OK, I'll split the patch into two.

The attached patch is the portion cut from the previous EPQ recheck
patch.

Regarding fdw_plans vs. fdw_plan, I'll follow your suggestion.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-bugfix-fdw_recheck_quals-on-setrefs.patch (application/octet-stream)
 src/backend/optimizer/plan/setrefs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e1e1d7a..77a694a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1120,6 +1120,12 @@ set_foreignscan_references(PlannerInfo *root,
 						   itlist,
 						   INDEX_VAR,
 						   rtoffset);
+		fscan->fdw_recheck_quals = (List *)
+			fix_upper_expr(root,
+						   (Node *) fscan->fdw_recheck_quals,
+						   itlist,
+						   INDEX_VAR,
+						   rtoffset);
 		pfree(itlist);
 		/* fdw_scan_tlist itself just needs fix_scan_list() adjustments */
 		fscan->fdw_scan_tlist =
#29Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#25)

On 2015/11/18 3:19, Robert Haas wrote:

On Thu, Nov 12, 2015 at 12:54 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Really? I think there would be not a little burden on an FDW author; when
postgres_fdw delegates the subplan to the remote server, for example, it
would need to create a remote join query by looking at tuples possibly
fetched and stored in estate->es_epqTuple[], then send the query and receive
the result during the callback routine. Furthermore, what I'm most concerned
about is that it wouldn't be efficient. So, my question about that approach is
whether FDWs would really do something like that during the callback routine,
instead of performing a secondary join plan locally. As I said before, I
know that KaiGai-san considers that that approach would be useful for custom
joins. But I see zero evidence that there is a good use-case for an FDW.

It could do that. But it could also just invoke a subplan as you are
proposing. Or at least, I think we should set it up so that such a
thing is possible. In which case I don't see the problem.

I suppose you (and KaiGai-san) are probably right, but I really fail to
see it actually doing that.

Best regards,
Etsuro Fujita


#30Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#27)

On Tue, Nov 17, 2015 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

It just intends to keep code symmetry with the custom-scan case, so there
is no significant reason.
Also, I expected that ForeignScan will need multiple sub-plans soon
to support more intelligent push-down like:
/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010F47DA@BPXM15GP.gisp.nec.co.jp

I might be missing something, but why would that require multiple child plans?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#31Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#30)

On Tue, Nov 17, 2015 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

It just intends to keep code symmetry with the custom-scan case, so there
is no significant reason.
Also, I expected that ForeignScan will need multiple sub-plans soon
to support more intelligent push-down like:

/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010F47DA@BPXM15GP.gisp.nec.co.jp

I might be missing something, but why would that require multiple child plans?

Apart from EPQ rechecks, the above aggressive push-down idea allows sending
the contents of multiple relations to the remote side. In this case, the
ForeignScan needs to have multiple sub-plans.

For example, please assume there are three relations: tbl_A and tbl_B are
local and small, while tbl_F is remote and large.
In the case when both (tbl_A JOIN tbl_F) and (tbl_B JOIN tbl_F) produce a
large number of rows, and thus consume a fair amount of network traffic, but
(tbl_A JOIN tbl_B JOIN tbl_F) produces a small number of rows, the optimal
strategy is to send the local contents to the remote side once, then run
a remote query there to produce relatively fewer rows.
At the implementation level, the ForeignScan shall represent (tbl_A JOIN tbl_B
JOIN tbl_F), then return a bunch of joined tuples. Its remote query
contains VALUES(...) clauses to pack the contents of tbl_A and tbl_B; thus,
it needs to be capable of executing the underlying multiple scan plans and
fetching their tuples prior to remote query execution.
So, a ForeignScan may also have multiple sub-plans.

Of course, it is a feature independent of the EPQ rechecks.
It does not matter even if we extend this field later.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#32Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#28)

On Tue, Nov 17, 2015 at 8:47 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch is the portion cut from the previous EPQ recheck
patch.

Thanks, committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#33Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#29)

On Tue, Nov 17, 2015 at 9:30 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I suppose you (and KaiGai-san) are probably right, but I really fail to see
it actually doing that.

Noted, but let's do it that way and move on. It would be a shame if
we didn't end up with a working FDW join pushdown system in 9.6
because of a disagreement on this point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#34Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#31)

On Tue, Nov 17, 2015 at 10:22 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Apart from EPQ rechecks, the above aggressive push-down idea allows sending
the contents of multiple relations to the remote side. In this case, the
ForeignScan needs to have multiple sub-plans.

For example, please assume there are three relations: tbl_A and tbl_B are
local and small, while tbl_F is remote and large.
In the case when both (tbl_A JOIN tbl_F) and (tbl_B JOIN tbl_F) produce a
large number of rows, and thus consume a fair amount of network traffic, but
(tbl_A JOIN tbl_B JOIN tbl_F) produces a small number of rows, the optimal
strategy is to send the local contents to the remote side once, then run
a remote query there to produce relatively fewer rows.
At the implementation level, the ForeignScan shall represent (tbl_A JOIN tbl_B
JOIN tbl_F), then return a bunch of joined tuples. Its remote query
contains VALUES(...) clauses to pack the contents of tbl_A and tbl_B; thus,
it needs to be capable of executing the underlying multiple scan plans and
fetching their tuples prior to remote query execution.

Hmm, maybe. I'm not entirely sure multiple subplans is the best way
to implement that, but let's argue about that another day.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#35Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#33)

On 2015/11/19 12:34, Robert Haas wrote:

On Tue, Nov 17, 2015 at 9:30 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I suppose you (and KaiGai-san) are probably right, but I really fail to see
it actually doing that.

Noted, but let's do it that way and move on. It would be a shame if
we didn't end up with a working FDW join pushdown system in 9.6
because of a disagreement on this point.

Another idea would be to consider join pushdown as unsupported for now
when select-for-update is involved in 9.5, as described in [1], and
revisit this issue when adding join pushdown to postgres_fdw in 9.6.

Best regards,
Etsuro Fujita

[1]: https://wiki.postgresql.org/wiki/Open_Items


#36Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#35)

On Tue, Nov 17, 2015 at 10:22 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Apart from EPQ rechecks, the above aggressive push-down idea allows sending
the contents of multiple relations to the remote side. In this case, the
ForeignScan needs to have multiple sub-plans.

For example, please assume there are three relations: tbl_A and tbl_B are
local and small, while tbl_F is remote and large.
In the case when both (tbl_A JOIN tbl_F) and (tbl_B JOIN tbl_F) produce a
large number of rows, and thus consume a fair amount of network traffic, but
(tbl_A JOIN tbl_B JOIN tbl_F) produces a small number of rows, the optimal
strategy is to send the local contents to the remote side once, then run
a remote query there to produce relatively fewer rows.
At the implementation level, the ForeignScan shall represent (tbl_A JOIN tbl_B
JOIN tbl_F), then return a bunch of joined tuples. Its remote query
contains VALUES(...) clauses to pack the contents of tbl_A and tbl_B; thus,
it needs to be capable of executing the underlying multiple scan plans and
fetching their tuples prior to remote query execution.

Hmm, maybe. I'm not entirely sure multiple subplans is the best way
to implement that, but let's argue about that another day.

So, are you suggesting to make a patch that allows ForeignScan to have
multiple sub-plans right now? Or, one sub-plan?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#37Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#36)

On Thu, Nov 19, 2015 at 6:39 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

So, are you suggesting to make a patch that allows ForeignScan to have
multiple sub-plans right now? Or, one sub-plan?

Two:

/messages/by-id/CA+TgmoYZeje+ot1kX4wdoB7R7DPS0CWXAzfqZ-14yKfkgKREAQ@mail.gmail.com

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#38Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#35)

On Wed, Nov 18, 2015 at 10:54 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Noted, but let's do it that way and move on. It would be a shame if
we didn't end up with a working FDW join pushdown system in 9.6
because of a disagreement on this point.

Another idea would be to consider join pushdown as unsupported for now when
select-for-update is involved in 9.5, as described in [1], and revisit this
issue when adding join pushdown to postgres_fdw in 9.6.

Well, I think it's probably too late to squeeze this into 9.5 at this
point, but I'm eager to get it fixed for 9.6.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#39Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#38)

On 2015/11/20 6:57, Robert Haas wrote:

On Wed, Nov 18, 2015 at 10:54 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Noted, but let's do it that way and move on. It would be a shame if
we didn't end up with a working FDW join pushdown system in 9.6
because of a disagreement on this point.

Another idea would be to consider join pushdown as unsupported for now when
select-for-update is involved in 9.5, as described in [1], and revisit this
issue when adding join pushdown to postgres_fdw in 9.6.

Well, I think it's probably too late to squeeze this into 9.5 at this
point, but I'm eager to get it fixed for 9.6.

OK, I'll update the postgres_fdw-join-pushdown patch so as to work with
that callback routine, if needed.

Best regards,
Etsuro Fujita


#40Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#37)

On Thu, Nov 19, 2015 at 6:39 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

So, are you suggesting to make a patch that allows ForeignScan to have
multiple sub-plans right now? Or, one sub-plan?

Two:

/messages/by-id/CA+TgmoYZeje+ot1kX4wdoB7R7DPS0CWXAzfqZ-14yKfkgKREAQ@mail.gmail.com

Hmm. Two is a bit mysterious to me, because two sub-plans (likely)
mean this ForeignScan node checks join clauses and reconstructs
a joined tuple by itself, but does not check the scan clauses pushed
down (that is the job of the inner/outer scan plans, isn't it?).
In this case, how do we treat N-way remote join cases (N>2) if we
assume such a capability in the FDW driver?

One subplan means the FDW driver runs an entire join sub-tree with a local
alternative sub-plan; that is my expectation for the majority of cases.
However, I cannot explain well why two subplans, but not multiple.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#41Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#32)

On 2015/11/19 12:32, Robert Haas wrote:

On Tue, Nov 17, 2015 at 8:47 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch is the portion cut from the previous EPQ recheck
patch.

Thanks, committed.

Thanks, Robert and KaiGai-san.

Sorry, I'm a bit late to the party. Here are my questions:

* This patch means we can define fdw_recheck_quals even for the case of
foreign tables with non-NIL fdw_scan_tlist. However, we discussed in
another thread [1] that such foreign tables might break EvalPlanQual
tests. Where are we on that issue?

* For the case of foreign joins, I think fdw_recheck_quals can be
defined, for example, the same way as for the case of foreign tables, ie,
quals not in scan.plan.qual, or ones defined as "otherclauses"
(rinfo->is_pushed_down=true) pushed down to the remote. But since it's
required that the FDW add to fdw_scan_tlist the set of
columns needed to check the quals in fdw_recheck_quals in preparation for
EvalPlanQual tests, it's likely that fdw_scan_tlist will end up being
long, leading to an increase in the total data transfer amount from the
remote. So, that does not seem practical to me. Maybe I'm missing
something, but what use cases are you thinking of?

Best regards,
Etsuro Fujita

[1]: /messages/by-id/55AF3C08.1070409@lab.ntt.co.jp


#42Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#41)

On 2015/11/19 12:32, Robert Haas wrote:

On Tue, Nov 17, 2015 at 8:47 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch is the portion cut from the previous EPQ recheck
patch.

Thanks, committed.

Thanks, Robert and KaiGai-san.

Sorry, I'm a bit late to the party. Here are my questions:

* This patch means we can define fdw_recheck_quals even for the case of
foreign tables with non-NIL fdw_scan_tlist. However, we discussed in
another thread [1] that such foreign tables might break EvalPlanQual
tests. Where are we on that issue?

In the case of late locking, RefetchForeignRow() will set a base tuple
that has a layout compatible with the base relation, not fdw_scan_tlist,
because RefetchForeignRow() does not have information about the scan node.
Here are two solutions: 1) do not use fdw_scan_tlist for an FDW that
uses the late locking mechanism, or 2) have the recheck callback apply a
projection to fit fdw_scan_tlist (which is not difficult to provide
as a utility function in the core).

Even though we allow setting up fdw_scan_tlist in simple scan cases,
it does not mean it works in every case.

* For the case of foreign joins, I think fdw_recheck_quals can be
defined, for example, the same way as for the case of foreign tables, ie,
quals not in scan.plan.qual, or ones defined as "otherclauses"
(rinfo->is_pushed_down=true) pushed down to the remote. But since it's
required that the FDW add to fdw_scan_tlist the set of
columns needed to check the quals in fdw_recheck_quals in preparation for
EvalPlanQual tests, it's likely that fdw_scan_tlist will end up being
long, leading to an increase in the total data transfer amount from the
remote. So, that does not seem practical to me. Maybe I'm missing
something, but what use cases are you thinking of?

It is a trade-off. What solution do you think we can have?
To avoid data transfer used only for the EPQ recheck, we could implement
the FDW driver to issue the remote join again on EPQ recheck; however,
that is not a wise design, is it?

If we were able to have no extra data transfer and no remote
join execution during the EPQ recheck, it would be perfect.
However, we have to weigh both the advantages and disadvantages when
we decide on an implementation. We usually choose the way that
has more advantages than disadvantages, but that does not mean there
are no disadvantages.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#43Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#40)

On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

On Thu, Nov 19, 2015 at 6:39 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

So, are you suggesting to make a patch that allows ForeignScan to have
multiple sub-plans right now? Or, one sub-plan?

Two:

/messages/by-id/CA+TgmoYZeje+ot1kX4wdoB7R7DPS0CWXAzfqZ-14yKfkgKREAQ@mail.gmail.com

Hmm. Two is a bit mysterious to me, because two sub-plans (likely)
mean this ForeignScan node checks join clauses and reconstructs
a joined tuple by itself, but does not check the scan clauses pushed
down (that is the job of the inner/outer scan plans, isn't it?).
In this case, how do we treat N-way remote join cases (N>2) if we
assume such a capability in the FDW driver?

One subplan means the FDW driver runs an entire join sub-tree with a local
alternative sub-plan; that is my expectation for the majority of cases.
However, I cannot explain well why two subplans, but not multiple.

What I'm imagining is that we'd add handling that allows the
ForeignScan to have inner and outer children. If the FDW wants to
delegate the EvalPlanQual handling to a local plan, it can use the
outer child for that. Or the inner one, if it likes. The other one
is available for some other purposes which we can't imagine yet. If
this is too weird, we can only add handling for an outer subplan and
forget about having an inner subplan for now. I just thought to make
it symmetric, since outer and inner subplans are pretty deeply baked
into the structure of the system.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#44Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#43)

On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

On Thu, Nov 19, 2015 at 6:39 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

So, are you suggesting to make a patch that allows ForeignScan to have
multiple sub-plans right now? Or, one sub-plan?

Two:

/messages/by-id/CA+TgmoYZeje+ot1kX4wdoB7R7DPS0CWXAzfqZ-14yKfkgKREAQ@mail.gmail.com

Hmm. Two is a bit mysterious to me, because two sub-plans (likely)
mean this ForeignScan node checks join clauses and reconstructs
a joined tuple by itself, but does not check the scan clauses pushed
down (that is the job of the inner/outer scan plans, isn't it?).
In this case, how do we treat N-way remote join cases (N>2) if we
assume such a capability in the FDW driver?

One subplan means the FDW driver runs an entire join sub-tree with a local
alternative sub-plan; that is my expectation for the majority of cases.
However, I cannot explain well why two subplans, but not multiple.

What I'm imagining is that we'd add handling that allows the
ForeignScan to have inner and outer children. If the FDW wants to
delegate the EvalPlanQual handling to a local plan, it can use the
outer child for that. Or the inner one, if it likes. The other one
is available for some other purposes which we can't imagine yet. If
this is too weird, we can only add handling for an outer subplan and
forget about having an inner subplan for now.

I'd like to agree with the last sentence. Having one sub-plan is better
than a fixed two subplans (though it is the second best from my standpoint),
because ...

I just thought to make
it symmetric, since outer and inner subplans are pretty deeply baked
into the structure of the system.

Yep, if we had a special ForeignJoinPath to handle a join of two foreign
tables, it would be natural. However, our choice allows an N-way join
at once if the sub-plan consists of three or more foreign tables.
In this case, a ForeignScan (scanrelid==0) can represent a sub-plan that
shall be equivalent to a stack of joins; that looks like a ForeignScan
with inner, outer, and a variable number of "middle" input streams.

If and when we assume a ForeignScan has its own join mechanism but processes
scan-qualifiers by local sub-plans, a fixed number of sub-plans is not
sufficient. (Probably, it is a minority case, though.)

I'm inclined to put just one outer path at this moment, because the
purpose of the FDW sub-plans is the EPQ recheck right now. So, we will
be able to enhance the feature when we implement other things - more
aggressive join push-down, for example.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#45Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#43)

On 2015/11/24 2:41, Robert Haas wrote:

On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

One subplan means FDW driver run an entire join sub-tree with local
alternative sub-plan; that is my expectation for the majority case.

What I'm imagining is that we'd add handling that allows the
ForeignScan to have inner and outer children. If the FDW wants to
delegate the EvalPlanQual handling to a local plan, it can use the
outer child for that. Or the inner one, if it likes. The other one
is available for some other purposes which we can't imagine yet. If
this is too weird, we can only add handling for an outer subplan and
forget about having an inner subplan for now. I just thought to make
it symmetric, since outer and inner subplans are pretty deeply baked
into the structure of the system.

I'd vote for only allowing an outer subplan.

Best regards,
Etsuro Fujita


#46Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#42)

On 2015/11/20 22:45, Kouhei Kaigai wrote:
I wrote:

* This patch means we can define fdw_recheck_quals even for the case of
foreign tables with non-NIL fdw_scan_tlist. However, we discussed in
another thread [1] that such foreign tables might break EvalPlanQual
tests. Where are we on that issue?

In the case of late locking, RefetchForeignRow() will set a base tuple
that has a layout compatible with the base relation, not fdw_scan_tlist,
because RefetchForeignRow() does not have information about the scan node.

IIUC, I think the base tuple would be stored into the EPQ state not only
in the case of late row locking but also in the case of early row locking.

* For the case of foreign joins, I think fdw_recheck_quals can be
defined, for example, the same way as for the case of foreign tables, ie,
quals not in scan.plan.qual, or ones defined as "otherclauses"
(rinfo->is_pushed_down=true) pushed down to the remote. But since it's
required that the FDW add to fdw_scan_tlist the set of
columns needed to check the quals in fdw_recheck_quals in preparation for
EvalPlanQual tests, it's likely that fdw_scan_tlist will end up being
long, leading to an increase in the total data transfer amount from the
remote. So, that does not seem practical to me. Maybe I'm missing
something, but what use cases are you thinking of?

It is a trade-off. What solution do you think we can have?
To avoid data transfer used only for the EPQ recheck, we could implement
the FDW driver to issue the remote join again on EPQ recheck; however,
that is not a wise design, is it?

If we were able to have no extra data transfer and no remote
join execution during the EPQ recheck, it would be perfect.

I was thinking that in an approach using a local join execution plan, I
would just set fdw_recheck_quals to NIL and evaluate the
otherclauses as part of the local join execution plan, so that
fdw_scan_tlist won't end up being longer, as in the patch [1]. (Note
that in that patch, remote_exprs==NIL when calling make_foreignscan
during postgresGetForeignPlan in the case of foreign joins.)

Best regards,
Etsuro Fujita

[1]: /messages/by-id/5624D583.10202@lab.ntt.co.jp


#47Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#9)

On 2015/11/09 9:26, Kouhei Kaigai wrote:

The attached patch is an adjusted version of the previous one.

There seem to be no changes to make_foreignscan. Is that OK?

Best regards,
Etsuro Fujita


#48Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#45)
1 attachment(s)

-----Original Message-----
From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
Sent: Tuesday, November 24, 2015 12:45 PM
To: Robert Haas; Kaigai Kouhei(海外 浩平)
Cc: Tom Lane; Kyotaro HORIGUCHI; pgsql-hackers@postgresql.org; Shigeru Hanada
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

On 2015/11/24 2:41, Robert Haas wrote:

On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

One subplan means FDW driver run an entire join sub-tree with local
alternative sub-plan; that is my expectation for the majority case.

What I'm imagining is that we'd add handling that allows the
ForeignScan to have inner and outer children. If the FDW wants to
delegate the EvalPlanQual handling to a local plan, it can use the
outer child for that. Or the inner one, if it likes. The other one
is available for some other purposes which we can't imagine yet. If
this is too weird, we can only add handling for an outer subplan and
forget about having an inner subplan for now. I just thought to make
it symmetric, since outer and inner subplans are pretty deeply baked
into the structure of the system.

I'd vote for only allowing an outer subplan.

The attached patch adds a "Path *fdw_outerpath" field to the ForeignPath node.
The FDW driver can set an arbitrary, but single, path-node here.
After that, this path-node shall be transformed into a plan-node by
createplan.c, then passed to the FDW driver using the GetForeignPlan callback.
We expect the FDW driver to set this plan-node on lefttree (a.k.a. outerPlan).
Plan->outerPlan is a common field, so the patch size becomes relatively
small. The FDW driver can initialize this plan at BeginForeignScan, then
execute this sub-plan-tree on demand.

The remaining portions are as in the previous version. ExecScanFetch is
revised to always call recheckMtd when scanrelid==0, so the FDW driver
can get control using the RecheckForeignScan callback.
It allows the FDW driver to handle (1) EPQ recheck on the underlying scan
nodes, (2) reconstruction of the joined tuple, and (3) EPQ recheck on the
join clauses, by its preferred implementation - including execution of an
alternative sub-plan.
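In pseudocode, a RecheckForeignScan implementation covering steps (1)-(3) might be structured as follows (a sketch only; the helper steps are hypothetical and not part of the committed API):

```
RecheckForeignScan(node, slot):
    /* (1) EPQ recheck on the underlying base relations */
    for each base relation R in this ForeignScan's fs_relids:
        take the EPQ tuple of R from estate->es_epqTuple[]
        if the pushed-down scan quals of R fail on that tuple:
            return false
    /* (2) reconstruct the joined tuple */
    fill slot according to fdw_scan_tlist from the EPQ tuples
        (or run the alternative local sub-plan in outerPlan instead)
    /* (3) EPQ recheck on the pushed-down join clauses */
    return whether the join clauses evaluate to true on slot
```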

There seem to be no changes to make_foreignscan. Is that OK?

create_foreignscan_path(), not only make_foreignscan().

This patch is not tested by actual FDW extensions, so it would be helpful
to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-fdw-epq-recheck.v4.patch (application/octet-stream)
 doc/src/sgml/fdwhandler.sgml            | 26 ++++++++++++++++++++++++-
 src/backend/executor/execScan.c         | 34 +++++++++++++++++++++++++++++----
 src/backend/executor/nodeForeignscan.c  | 13 +++++++++++++
 src/backend/nodes/outfuncs.c            |  1 +
 src/backend/optimizer/plan/createplan.c | 14 +++++++++++---
 src/backend/optimizer/util/pathnode.c   |  2 ++
 src/include/foreign/fdwapi.h            |  7 ++++++-
 src/include/nodes/relation.h            |  1 +
 src/include/optimizer/pathnode.h        |  1 +
 src/include/optimizer/planmain.h        |  3 ++-
 10 files changed, 92 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 1533a6b..7113862 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -168,7 +168,8 @@ GetForeignPlan (PlannerInfo *root,
                 Oid foreigntableid,
                 ForeignPath *best_path,
                 List *tlist,
-                List *scan_clauses);
+                List *scan_clauses,
+                Plan *fdw_outerplan)
 </programlisting>
 
      Create a <structname>ForeignScan</> plan node from the selected foreign
@@ -259,6 +260,29 @@ IterateForeignScan (ForeignScanState *node);
 
     <para>
 <programlisting>
+bool
+RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
+</programlisting>
+     Rechecks visibility of the EPQ tuples according to the qualifiers
+     pushed-down.
+     This callback is optional, if this <structname>ForeignScanState</>
+     runs on a base foreign table. <structfield>fdw_recheck_quals</>
+     can be used instead to recheck on the target EPQ tuple by the backend.
+    </para>
+    <para>
+     On the other hands, if <literal>scanrelid</> equals zero thus it
+     represents a join sub-tree of foreign tables, this callback is
+     expected to reconstruct a joined tuple using the primitive EPQ
+     tuples and fill up the supplied <literal>slot</> according to
+     the <structfield>fdw_scan_tlist</> definition.
+     Also, this callback can or must recheck scan qualifiers and join
+     conditions which are pushed down. Especially, it needs special
+     handling if not simple inner-join, instead of the backend support
+     by <structfield>fdw_recheck_quals</>.
+    </para>
+
+    <para>
+<programlisting>
 void
 ReScanForeignScan (ForeignScanState *node);
 </programlisting>
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index a96e826..b472bf7 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -49,8 +49,16 @@ ExecScanFetch(ScanState *node,
 		 */
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
-		if (estate->es_epqTupleSet[scanrelid - 1])
+		if (scanrelid == 0)
+		{
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/* Check if it meets the access-method conditions */
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (estate->es_epqTupleSet[scanrelid - 1])
 		{
 			TupleTableSlot *slot = node->ss_ScanTupleSlot;
 
@@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
 	{
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
+		if (scanrelid > 0)
+			estate->es_epqScanDone[scanrelid - 1] = false;
+		else
+		{
+			Bitmapset  *relids;
+			int			rtindex = -1;
+
+			if (IsA(node->ps.plan, ForeignScan))
+				relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+			else if (IsA(node->ps.plan, CustomScan))
+				relids = ((CustomScan *) node->ps.plan)->custom_relids;
+			else
+				elog(ERROR, "unexpected scan node: %d",
+					 (int)nodeTag(node->ps.plan));
 
-		estate->es_epqScanDone[scanrelid - 1] = false;
+			while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+			{
+				Assert(rtindex > 0);
+				estate->es_epqScanDone[rtindex - 1] = false;
+			}
+		}
 	}
 }
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 6165e4a..1344c32 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -73,6 +73,7 @@ ForeignNext(ForeignScanState *node)
 static bool
 ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 {
+	FdwRoutine	*fdwroutine = node->fdwroutine;
 	ExprContext *econtext;
 
 	/*
@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 
 	ResetExprContext(econtext);
 
+	/*
+	 * FDW driver has to recheck visibility of EPQ tuple towards
+	 * the scan qualifiers once it gets pushed down.
+	 * In addition, if this node represents a join sub-tree, not
+	 * a scan, FDW driver is also responsible to reconstruct
+	 * a joined tuple according to the primitive EPQ tuples.
+	 */
+	if (fdwroutine->RecheckForeignScan)
+	{
+		if (!fdwroutine->RecheckForeignScan(node, slot))
+			return false;
+	}
 	return ExecQual(node->fdw_recheck_quals, econtext, false);
 }
 
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 012c14b..aff27ea 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1683,6 +1683,7 @@ _outForeignPath(StringInfo str, const ForeignPath *node)
 
 	_outPathInfo(str, (const Path *) node);
 
+	WRITE_NODE_FIELD(fdw_outerpath);
 	WRITE_NODE_FIELD(fdw_private);
 }
 
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 411b36c..51bdd8f 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2095,11 +2095,16 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	Index		scan_relid = rel->relid;
 	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
+	Plan	   *fdw_outerplan = NULL;
 	ListCell   *lc;
 	int			i;
 
 	Assert(rel->fdwroutine != NULL);
 
+	/* transform the child path if any */
+	if (best_path->fdw_outerpath)
+		fdw_outerplan = create_plan_recurse(root, best_path->fdw_outerpath);
+
 	/*
 	 * If we're scanning a base relation, fetch its OID.  (Irrelevant if
 	 * scanning a join relation.)
@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses,
+												fdw_outerplan);
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
@@ -3747,7 +3754,8 @@ make_foreignscan(List *qptlist,
 				 List *fdw_exprs,
 				 List *fdw_private,
 				 List *fdw_scan_tlist,
-				 List *fdw_recheck_quals)
+				 List *fdw_recheck_quals,
+				 Plan *fdw_outerplan)
 {
 	ForeignScan *node = makeNode(ForeignScan);
 	Plan	   *plan = &node->scan.plan;
@@ -3755,7 +3763,7 @@ make_foreignscan(List *qptlist,
 	/* cost will be filled in by create_foreignscan_plan */
 	plan->targetlist = qptlist;
 	plan->qual = qpqual;
-	plan->lefttree = NULL;
+	plan->lefttree = fdw_outerplan;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	/* fs_server will be filled in by create_foreignscan_plan */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 09c3244..ec0910d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1507,6 +1507,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						double rows, Cost startup_cost, Cost total_cost,
 						List *pathkeys,
 						Relids required_outer,
+						Path *fdw_outerpath,
 						List *fdw_private)
 {
 	ForeignPath *pathnode = makeNode(ForeignPath);
@@ -1521,6 +1522,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
+	pathnode->fdw_outerpath = fdw_outerpath;
 	pathnode->fdw_private = fdw_private;
 
 	return pathnode;
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 69b48b4..aa0fc18 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -36,13 +36,17 @@ typedef ForeignScan *(*GetForeignPlan_function) (PlannerInfo *root,
 														  Oid foreigntableid,
 													  ForeignPath *best_path,
 															 List *tlist,
-														 List *scan_clauses);
+												 List *scan_clauses,
+												 Plan *fdw_outerplan);
 
 typedef void (*BeginForeignScan_function) (ForeignScanState *node,
 													   int eflags);
 
 typedef TupleTableSlot *(*IterateForeignScan_function) (ForeignScanState *node);
 
+typedef bool (*RecheckForeignScan_function) (ForeignScanState *node,
+											 TupleTableSlot *slot);
+
 typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
 
 typedef void (*EndForeignScan_function) (ForeignScanState *node);
@@ -138,6 +142,7 @@ typedef struct FdwRoutine
 	GetForeignPlan_function GetForeignPlan;
 	BeginForeignScan_function BeginForeignScan;
 	IterateForeignScan_function IterateForeignScan;
+	RecheckForeignScan_function RecheckForeignScan;
 	ReScanForeignScan_function ReScanForeignScan;
 	EndForeignScan_function EndForeignScan;
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9a0dd28..b072e1e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -909,6 +909,7 @@ typedef struct TidPath
 typedef struct ForeignPath
 {
 	Path		path;
+	Path	   *fdw_outerpath;
 	List	   *fdw_private;
 } ForeignPath;
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f28b4e2..35e17e7 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -86,6 +86,7 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						double rows, Cost startup_cost, Cost total_cost,
 						List *pathkeys,
 						Relids required_outer,
+						Path *fdw_outerpath,
 						List *fdw_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 1fb8504..dc12841 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -45,7 +45,8 @@ extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
 				 Index scanrelid, List *fdw_exprs, List *fdw_private,
-				 List *fdw_scan_tlist, List *fdw_recheck_quals);
+				 List *fdw_scan_tlist, List *fdw_recheck_quals,
+				 Plan *fdw_outerplan);
 extern Append *make_append(List *appendplans, List *tlist);
 extern RecursiveUnion *make_recursive_union(List *tlist,
 					 Plan *lefttree, Plan *righttree, int wtParam,
#49Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#48)

Hello,

At Thu, 26 Nov 2015 05:04:32 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F801176205@BPXM15GP.gisp.nec.co.jp>

On 2015/11/24 2:41, Robert Haas wrote:

On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

One subplan means the FDW driver runs an entire join sub-tree with a local
alternative sub-plan; that is my expectation for the majority case.

What I'm imagining is that we'd add handling that allows the
ForeignScan to have inner and outer children. If the FDW wants to
delegate the EvalPlanQual handling to a local plan, it can use the
outer child for that. Or the inner one, if it likes. The other one
is available for some other purposes which we can't imagine yet. If
this is too weird, we can only add handling for an outer subplan and
forget about having an inner subplan for now. I just thought to make
it symmetric, since outer and inner subplans are pretty deeply baked
into the structure of the system.

I'd vote for only allowing an outer subplan.

The attached patch adds a Path *fdw_outerpath field to the ForeignPath node.
The FDW driver can set an arbitrary path node here, but only one.

It is named "outerpath/plan". Surely we used the term 'outer' in
association with other nodes for disign decision but is it valid
to call it outer? Addition to that, there's no innerpath in this
patch and have "path" instead.

After that, this path-node shall be transformed to plan-node by
createplan.c, then passed to FDW driver using GetForeignPlan callback.
We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
The Plan->outerPlan is a common field, so patch size become relatively

Plan->outerPlan => Plan->lefttree?

small. FDW driver can initialize this plan at BeginForeignScan, then
execute this sub-plan-tree on demand.

Remaining portions are as previous version. ExecScanFetch is revised
to call recheckMtd always when scanrelid==0, then FDW driver can get
control using RecheckForeignScan callback.

Perhaps we need a comment in ExecScanReScan about a ForeignScan acting
as a fake join for the case with scanrelid == 0.

It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
(2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
by its preferable implementation - including execution of an alternative
sub-plan.

In ForeignRecheck, ExecQual on fdw_recheck_quals is executed if
RecheckForeignScan returns true, which I take as the signal that
the returned tuple matches the recheck qual. When RecheckForeignScan
is registered, which do you think has the responsibility to check
the reconstructed tuple: RecheckForeignScan or ExecQual?

The documentation says the following, so I think the former does.

# I don't understand what 'can or must' means, though... 'can and
# must'?

+     Also, this callback can or must recheck scan qualifiers and join
+     conditions which are pushed down. Especially, it needs special

There seem to be no changes to make_foreignscan. Is that OK?

create_foreignscan_path(), not only make_foreignscan().

This patch has not been tested with actual FDW extensions, so it would be
helpful to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#50Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#48)

On 2015/11/26 14:04, Kouhei Kaigai wrote:

On 2015/11/24 2:41, Robert Haas wrote:

On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

One subplan means the FDW driver runs an entire join sub-tree with a local
alternative sub-plan; that is my expectation for the majority case.

What I'm imagining is that we'd add handling that allows the
ForeignScan to have inner and outer children. If the FDW wants to
delegate the EvalPlanQual handling to a local plan, it can use the
outer child for that. Or the inner one, if it likes. The other one
is available for some other purposes which we can't imagine yet. If
this is too weird, we can only add handling for an outer subplan and
forget about having an inner subplan for now. I just thought to make
it symmetric, since outer and inner subplans are pretty deeply baked
into the structure of the system.

I'd vote for only allowing an outer subplan.

The attached patch adds a Path *fdw_outerpath field to the ForeignPath node.
The FDW driver can set an arbitrary path node here, but only one.
After that, this path-node shall be transformed to plan-node by
createplan.c, then passed to FDW driver using GetForeignPlan callback.

I understand this, as I also did the same thing in my patches, but
actually, that seems a bit complicated to me. Instead, could we keep
the fdw_outerpath in the fdw_private of a ForeignPath node when creating
the path node during GetForeignPaths, and then create an outerplan
accordingly from the fdw_outerpath stored in the fdw_private during
GetForeignPlan, by using create_plan_recurse there? I think that
would make the core involvement much simpler.

We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
The Plan->outerPlan is a common field, so patch size become relatively
small. FDW driver can initialize this plan at BeginForeignScan, then
execute this sub-plan-tree on demand.

Another idea would be to add the core support for
initializing/closing/rescanning the outerplan tree when the tree is given.

Remaining portions are as previous version. ExecScanFetch is revised
to call recheckMtd always when scanrelid==0, then FDW driver can get
control using RecheckForeignScan callback.
It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
(2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
by its preferable implementation - including execution of an alternative
sub-plan.

@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)

ResetExprContext(econtext);

+	/*
+	 * FDW driver has to recheck visibility of EPQ tuple towards
+	 * the scan qualifiers once it gets pushed down.
+	 * In addition, if this node represents a join sub-tree, not
+	 * a scan, FDW driver is also responsible to reconstruct
+	 * a joined tuple according to the primitive EPQ tuples.
+	 */
+	if (fdwroutine->RecheckForeignScan)
+	{
+		if (!fdwroutine->RecheckForeignScan(node, slot))
+			return false;
+	}

Maybe I'm missing something, but I think we should let FDW do the work
if scanrelid==0, not just if fdwroutine->RecheckForeignScan is given.
(And if scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we
should abort the transaction.)

Another thing I'm concerned about is

@@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
	{
		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;

-		Assert(scanrelid > 0);
+		if (scanrelid > 0)
+			estate->es_epqScanDone[scanrelid - 1] = false;
+		else
+		{
+			Bitmapset  *relids;
+			int			rtindex = -1;
+
+			if (IsA(node->ps.plan, ForeignScan))
+				relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+			else if (IsA(node->ps.plan, CustomScan))
+				relids = ((CustomScan *) node->ps.plan)->custom_relids;
+			else
+				elog(ERROR, "unexpected scan node: %d",
+					 (int)nodeTag(node->ps.plan));
-		estate->es_epqScanDone[scanrelid - 1] = false;
+			while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+			{
+				Assert(rtindex > 0);
+				estate->es_epqScanDone[rtindex - 1] = false;
+			}
+		}
  	}

That seems the outerplan's business to me, so I think it'd be better to
just return, right before the assertion, as I said before. Seen from
another angle, ISTM that FDWs that don't use a local join execution plan
wouldn't need to be aware of handling the es_epqScanDone flags. (Do you
think that such FDWs should do something like what ExecScanFetch is doing
about the flags, in their RecheckForeignScan callbacks? If so, I think we
need docs for that.)

There seem to be no changes to make_foreignscan. Is that OK?

create_foreignscan_path(), not only make_foreignscan().

OK

This patch has not been tested with actual FDW extensions, so it would be
helpful to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

Will do.

Best regards,
Etsuro Fujita


#51Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kyotaro HORIGUCHI (#49)

At Thu, 26 Nov 2015 05:04:32 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote
in <9A28C8860F777E439AA12E8AEA7694F801176205@BPXM15GP.gisp.nec.co.jp>

On 2015/11/24 2:41, Robert Haas wrote:

On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

One subplan means the FDW driver runs an entire join sub-tree with a local
alternative sub-plan; that is my expectation for the majority case.

What I'm imagining is that we'd add handling that allows the
ForeignScan to have inner and outer children. If the FDW wants to
delegate the EvalPlanQual handling to a local plan, it can use the
outer child for that. Or the inner one, if it likes. The other one
is available for some other purposes which we can't imagine yet. If
this is too weird, we can only add handling for an outer subplan and
forget about having an inner subplan for now. I just thought to make
it symmetric, since outer and inner subplans are pretty deeply baked
into the structure of the system.

I'd vote for only allowing an outer subplan.

The attached patch adds a Path *fdw_outerpath field to the ForeignPath node.
The FDW driver can set an arbitrary path node here, but only one.

It is named "outerpath/plan". Surely we chose the term 'outer' by
association with other nodes as a design decision, but is it valid
to call it 'outer' here? In addition, there is no innerpath in this
patch, so it could simply be named "path" instead.

Just "path" is too simple; it does not inform people of the expected
usage of the node.
If we were to assign another name, my preference would be "fdw_subpath"
or "fdw_altpath".

After that, this path-node shall be transformed to plan-node by
createplan.c, then passed to FDW driver using GetForeignPlan callback.
We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
The Plan->outerPlan is a common field, so patch size become relatively

Plan->outerPlan => Plan->lefttree?

Yes, s/outerPlan/lefttree/g

small. FDW driver can initialize this plan at BeginForeignScan, then
execute this sub-plan-tree on demand.

Remaining portions are as previous version. ExecScanFetch is revised
to call recheckMtd always when scanrelid==0, then FDW driver can get
control using RecheckForeignScan callback.

Perhaps we need a comment in ExecScanReScan about a ForeignScan acting
as a fake join for the case with scanrelid == 0.

Indeed,

It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
(2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
by its preferable implementation - including execution of an alternative
sub-plan.

In ForeignRecheck, ExecQual on fdw_recheck_quals is executed if
RecheckForeignScan returns true, which I take as the signal that
the returned tuple matches the recheck qual. When RecheckForeignScan
is registered, which do you think has the responsibility to check
the reconstructed tuple: RecheckForeignScan or ExecQual?

Only RecheckForeignScan can reconstruct a joined tuple. On the other
hand, both facilities can recheck scan qualifiers and join clauses;
the FDW author can choose the design according to his preference.
If fdw_recheck_quals == NIL, the FDW can apply all the rechecks within
the RecheckForeignScan callback.

The documentation says the following, so I think the former does.

# I don't understand what 'can or must' means, though... 'can and
# must'?

+     Also, this callback can or must recheck scan qualifiers and join
+     conditions which are pushed down. Especially, it needs special

If fdw_recheck_quals is set up correctly and the join type is an inner
join, the FDW driver does not need to recheck by itself. Otherwise, it
has to recheck the joined tuple, not only reconstruct it.
I will try to revise the SGML stuff.

There seem to be no changes to make_foreignscan. Is that OK?

create_foreignscan_path(), not only make_foreignscan().

This patch has not been tested with actual FDW extensions, so it would be
helpful to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

regards,

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#52Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#50)

On 2015/11/26 14:04, Kouhei Kaigai wrote:

On 2015/11/24 2:41, Robert Haas wrote:

On Fri, Nov 20, 2015 at 12:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

One subplan means the FDW driver runs an entire join sub-tree with a local
alternative sub-plan; that is my expectation for the majority case.

What I'm imagining is that we'd add handling that allows the
ForeignScan to have inner and outer children. If the FDW wants to
delegate the EvalPlanQual handling to a local plan, it can use the
outer child for that. Or the inner one, if it likes. The other one
is available for some other purposes which we can't imagine yet. If
this is too weird, we can only add handling for an outer subplan and
forget about having an inner subplan for now. I just thought to make
it symmetric, since outer and inner subplans are pretty deeply baked
into the structure of the system.

I'd vote for only allowing an outer subplan.

The attached patch adds a Path *fdw_outerpath field to the ForeignPath node.
The FDW driver can set an arbitrary path node here, but only one.
After that, this path-node shall be transformed to plan-node by
createplan.c, then passed to FDW driver using GetForeignPlan callback.

I understand this, as I also did the same thing in my patches, but
actually, that seems a bit complicated to me. Instead, could we keep
the fdw_outerpath in the fdw_private of a ForeignPath node when creating
the path node during GetForeignPaths, and then create an outerplan
accordingly from the fdw_outerpath stored in the fdw_private during
GetForeignPlan, by using create_plan_recurse there? I think that
would make the core involvement much simpler.

How can an extension use create_plan_recurse? It is a static function.

We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
The Plan->outerPlan is a common field, so patch size become relatively
small. FDW driver can initialize this plan at BeginForeignScan, then
execute this sub-plan-tree on demand.

Another idea would be to add the core support for
initializing/closing/rescanning the outerplan tree when the tree is given.

No. Please don't repeat the same discussion again.

Remaining portions are as previous version. ExecScanFetch is revised
to call recheckMtd always when scanrelid==0, then FDW driver can get
control using RecheckForeignScan callback.
It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
(2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
by its preferable implementation - including execution of an alternative
sub-plan.

@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)

ResetExprContext(econtext);

+	/*
+	 * FDW driver has to recheck visibility of EPQ tuple towards
+	 * the scan qualifiers once it gets pushed down.
+	 * In addition, if this node represents a join sub-tree, not
+	 * a scan, FDW driver is also responsible to reconstruct
+	 * a joined tuple according to the primitive EPQ tuples.
+	 */
+	if (fdwroutine->RecheckForeignScan)
+	{
+		if (!fdwroutine->RecheckForeignScan(node, slot))
+			return false;
+	}

Maybe I'm missing something, but I think we should let FDW do the work
if scanrelid==0, not just if fdwroutine->RecheckForeignScan is given.
(And if scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we
should abort the transaction.)

It should be an Assert(). A node with scanrelid == 0 never happens
unless the FDW driver adds such a path explicitly.

Another thing I'm concerned about is

@@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
	{
		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;

-		Assert(scanrelid > 0);
+		if (scanrelid > 0)
+			estate->es_epqScanDone[scanrelid - 1] = false;
+		else
+		{
+			Bitmapset  *relids;
+			int			rtindex = -1;
+
+			if (IsA(node->ps.plan, ForeignScan))
+				relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+			else if (IsA(node->ps.plan, CustomScan))
+				relids = ((CustomScan *) node->ps.plan)->custom_relids;
+			else
+				elog(ERROR, "unexpected scan node: %d",
+					 (int)nodeTag(node->ps.plan));
-		estate->es_epqScanDone[scanrelid - 1] = false;
+			while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+			{
+				Assert(rtindex > 0);
+				estate->es_epqScanDone[rtindex - 1] = false;
+			}
+		}
	}

That seems the outerplan's business to me, so I think it'd be better to
just return, right before the assertion, as I said before. Seen from
another angle, ISTM that FDWs that don't use a local join execution plan
wouldn't need to be aware of handling the es_epqScanDone flags. (Do you
think that such FDWs should do something like what ExecScanFetch is doing
about the flags, in their RecheckForeignScan callbacks? If so, I think we
need docs for that.)

Execution of the alternative local subplan (outerplan) is discretionary.
We have to pay attention to FDW drivers that handle the EPQ recheck by
themselves. Even though you argue that the callback can violate the state
of the es_epqScanDone flags, it is safe to follow the existing behavior.

There seem to be no changes to make_foreignscan. Is that OK?

create_foreignscan_path(), not only make_foreignscan().

OK

This patch has not been tested with actual FDW extensions, so it would be
helpful to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

Will do.

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#53Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#51)

On 2015/11/27 0:14, Kouhei Kaigai wrote:

The documentation says the following, so I think the former does.

# I don't understand what 'can or must' means, though... 'can and
# must'?

+     Also, this callback can or must recheck scan qualifiers and join
+     conditions which are pushed down. Especially, it needs special

If fdw_recheck_quals is set up correctly and join type is inner join,
FDW driver does not recheck by itself. Elsewhere, it has to recheck
the joined tuple, not only reconstruction.

Sorry, I don't understand this. In my understanding, fdw_recheck_quals
can be defined for a foreign join, regardless of the join type, and when
the fdw_recheck_quals are defined, the RecheckForeignScan callback
routine doesn't need to evaluate the fdw_recheck_quals by itself. No?

Best regards,
Etsuro Fujita


#54Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#53)

-----Original Message-----
From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
Sent: Friday, November 27, 2015 2:40 PM
To: Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI
Cc: robertmhaas@gmail.com; tgl@sss.pgh.pa.us; pgsql-hackers@postgresql.org;
shigeru.hanada@gmail.com
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

On 2015/11/27 0:14, Kouhei Kaigai wrote:

The documentation says the following, so I think the former does.

# I don't understand what 'can or must' means, though... 'can and
# must'?

+     Also, this callback can or must recheck scan qualifiers and join
+     conditions which are pushed down. Especially, it needs special

If fdw_recheck_quals is set up correctly and the join type is an inner
join, the FDW driver does not need to recheck by itself. Otherwise, it
has to recheck the joined tuple, not only reconstruct it.

Sorry, I don't understand this. In my understanding, fdw_recheck_quals
can be defined for a foreign join, regardless of the join type,

Yes, it "can be defined", but it will not work correctly if either side
of the joined tuple is NULL because of an outer join. Strict SQL functions
return NULL without evaluation when an input is NULL, and ExecQual() then
treats that result as FALSE. However, a joined tuple that has NULL fields
may still be a valid tuple.

We don't need to care about unmatched tuples in the case of an INNER JOIN.

and when
the fdw_recheck_quals are defined, the RecheckForeignScan callback
routine doesn't need to evaluate the fdw_recheck_quals by itself. No?

Yes, it does not need to run fdw_recheck_quals by itself (as long as
they can guarantee correct results for any corner case).
Of course, if the FDW driver keeps expressions for scan qualifiers and
join clauses in another place (like fdw_exprs), it is the FDW driver's
responsibility to execute them, regardless of fdw_recheck_quals.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#55Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#52)

On 2015/11/27 0:14, Kouhei Kaigai wrote:

On 2015/11/26 14:04, Kouhei Kaigai wrote:

The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
FDW driver can set arbitrary but one path-node here.
After that, this path-node shall be transformed to plan-node by
createplan.c, then passed to FDW driver using GetForeignPlan callback.

I understand this, as I also did the same thing in my patches, but
actually, that seems a bit complicated to me. Instead, could we keep
the fdw_outerpath in the fdw_private of a ForeignPath node when creating
the path node during GetForeignPaths, and then create an outerplan
accordingly from the fdw_outerpath stored in the fdw_private during
GetForeignPlan, by using create_plan_recurse there? I think that
would make the core involvement much simpler.

How can an extension use create_plan_recurse? It is a static function.

I was just thinking of a change to make that function extern.

We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
The Plan->outerPlan is a common field, so patch size become relatively
small. FDW driver can initialize this plan at BeginForeignScan, then
execute this sub-plan-tree on demand.

Another idea would be to add the core support for
initializing/closing/rescanning the outerplan tree when the tree is given.

No. Please don't repeat the same discussion again.

IIUC, your point is to allow FDWs to do something else, instead of
performing a local join execution plan, during RecheckForeignScan.
So what's wrong with the core providing that support in that case?

@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)

ResetExprContext(econtext);

+	/*
+	 * FDW driver has to recheck visibility of EPQ tuple towards
+	 * the scan qualifiers once it gets pushed down.
+	 * In addition, if this node represents a join sub-tree, not
+	 * a scan, FDW driver is also responsible to reconstruct
+	 * a joined tuple according to the primitive EPQ tuples.
+	 */
+	if (fdwroutine->RecheckForeignScan)
+	{
+		if (!fdwroutine->RecheckForeignScan(node, slot))
+			return false;
+	}

Maybe I'm missing something, but I think we should let FDW do the work
if scanrelid==0, not just if fdwroutine->RecheckForeignScan is given.
(And if scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we
should abort the transaction.)

It should be an Assert(). A node with scanrelid==0 never happens
unless the FDW driver explicitly adds such a path.

That's an idea. But the abort seems to me more consistent with other
places (see eg, RefetchForeignRow in EvalPlanQualFetchRowMarks).

Another thing I'm concerned about is

@@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
{
Index scanrelid = ((Scan *) node->ps.plan)->scanrelid;

-		Assert(scanrelid > 0);
+		if (scanrelid > 0)
+			estate->es_epqScanDone[scanrelid - 1] = false;
+		else
+		{
+			Bitmapset  *relids;
+			int			rtindex = -1;
+
+			if (IsA(node->ps.plan, ForeignScan))
+				relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+			else if (IsA(node->ps.plan, CustomScan))
+				relids = ((CustomScan *) node->ps.plan)->custom_relids;
+			else
+				elog(ERROR, "unexpected scan node: %d",
+					 (int) nodeTag(node->ps.plan));
-		estate->es_epqScanDone[scanrelid - 1] = false;
+			while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+			{
+				Assert(rtindex > 0);
+				estate->es_epqScanDone[rtindex - 1] = false;
+			}
+		}
}

That seems the outerplan's business to me, so I think it'd be better to
just return, right before the assertion, as I said before. Seen from
another angle, ISTM that FDWs that don't use a local join execution plan
wouldn't need to be aware of handling the es_epqScanDone flags. (Do you
think that such FDWs should do something like what ExecScanFetch is doing
about the flags, in their RecheckForeignScans? If so, I think we need
docs for that.)

Execution of the alternative local sub-plan (outerplan) is discretionary.
We have to pay attention to FDW drivers that handle the EPQ recheck by
themselves. Even though you argue that the callback can violate the state
of the es_epqScanDone flags, it is safe to follow the existing behavior.

So, I think the documentation needs more work.

Yet another thing that I'm concerned about is

@@ -3747,7 +3754,8 @@ make_foreignscan(List *qptlist,
List *fdw_exprs,
List *fdw_private,
List *fdw_scan_tlist,
- List *fdw_recheck_quals)
+ List *fdw_recheck_quals,
+ Plan *fdw_outerplan)
{
ForeignScan *node = makeNode(ForeignScan);
Plan *plan = &node->scan.plan;
@@ -3755,7 +3763,7 @@ make_foreignscan(List *qptlist,
/* cost will be filled in by create_foreignscan_plan */
plan->targetlist = qptlist;
plan->qual = qpqual;
- plan->lefttree = NULL;
+ plan->lefttree = fdw_outerplan;
plan->righttree = NULL;
node->scan.scanrelid = scanrelid;

I think that that would break the EXPLAIN output. One option to avoid
that is to set the fdw_outerplan in ExecInitForeignScan as in my patch
[1], or BeginForeignScan as you proposed. That breaks the equivalence
that the Plan tree and the PlanState tree should be mirror images of
each other, but I think that that break would be harmless.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/55DEF5F0.308@lab.ntt.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#56Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#50)

On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
FDW driver can set arbitrary but one path-node here.
After that, this path-node shall be transformed to plan-node by
createplan.c, then passed to FDW driver using GetForeignPlan callback.

I understand this, as I also did the same thing in my patches, but actually,
that seems a bit complicated to me. Instead, could we keep the
fdw_outerpath in the fdw_private of a ForeignPath node when creating the
path node during GetForeignPaths, and then create an outerplan accordingly
from the fdw_outerpath stored into the fdw_private during GetForeignPlan, by
using create_plan_recurse there? I think that that would make the core
involvement much simpler.

I can't see how it's going to get much simpler than this. The core
code is well under a hundred lines, and it all looks pretty
straightforward to me. All of our existing path and plan types keep
lists of paths and plans separate from other kinds of data, and I
don't think we're going to win any awards for deviating from that
principle here.

@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)

ResetExprContext(econtext);

+       /*
+        * FDW driver has to recheck visibility of EPQ tuple towards
+        * the scan qualifiers once it gets pushed down.
+        * In addition, if this node represents a join sub-tree, not
+        * a scan, FDW driver is also responsible to reconstruct
+        * a joined tuple according to the primitive EPQ tuples.
+        */
+       if (fdwroutine->RecheckForeignScan)
+       {
+               if (!fdwroutine->RecheckForeignScan(node, slot))
+                       return false;
+       }

Maybe I'm missing something, but I think we should let FDW do the work if
scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
abort the transaction.)

That would be unnecessarily restrictive. On the one hand, even if
scanrelid != 0, the FDW can decide that it prefers to do the rechecks
using RecheckForeignScan rather than fdw_recheck_quals. For most
FDWs, I expect using fdw_recheck_quals to be more convenient, but
there may be cases where somebody prefers to use RecheckForeignScan,
and allowing that costs nothing. On the flip side, an FDW could
choose to support join pushdown but not worry about EPQ rechecks: it
can just refuse to push down joins when any rowmarks are present.
Requiring the FDW author to supply a dummy RecheckForeignScan method
in that case is pointless. So I think KaiGai's check is exactly
right.
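Robert's point is that a missing callback should simply mean "fall back to fdw_recheck_quals", which is the usual optional-callback pattern. As a standalone sketch of that dispatch shape (the MiniRoutine type and all names below are illustrative stand-ins, not PostgreSQL's actual FdwRoutine or ForeignRecheck definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A table of optional function pointers, where NULL means "not provided". */
typedef struct MiniRoutine
{
	/* returns true if the tuple still passes; may be NULL */
	bool		(*recheck) (int tuple_id);
} MiniRoutine;

/* Stand-in for evaluating fdw_recheck_quals on a tuple. */
static bool
quals_pass(int tuple_id)
{
	return tuple_id % 2 == 0;
}

/*
 * Mirrors the shape of the patched ForeignRecheck(): invoke the callback
 * only when the driver supplied one, then evaluate the ordinary quals.
 * A driver that skips pushing down joins when rowmarks are present never
 * needs to supply a dummy callback.
 */
static bool
mini_recheck(const MiniRoutine *routine, int tuple_id)
{
	if (routine->recheck && !routine->recheck(tuple_id))
		return false;
	return quals_pass(tuple_id);
}
```

With recheck left NULL, only the quals decide; a supplied callback can veto the tuple first.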

Another thing I'm concerned about is

@@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
{
Index scanrelid = ((Scan *) node->ps.plan)->scanrelid;

-               Assert(scanrelid > 0);
+               if (scanrelid > 0)
+                       estate->es_epqScanDone[scanrelid - 1] = false;
+               else
+               {
+                       Bitmapset  *relids;
+                       int                     rtindex = -1;
+
+                       if (IsA(node->ps.plan, ForeignScan))
+                               relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+                       else if (IsA(node->ps.plan, CustomScan))
+                               relids = ((CustomScan *) node->ps.plan)->custom_relids;
+                       else
+                               elog(ERROR, "unexpected scan node: %d",
+                                        (int) nodeTag(node->ps.plan));
-               estate->es_epqScanDone[scanrelid - 1] = false;
+                       while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+                       {
+                               Assert(rtindex > 0);
+                               estate->es_epqScanDone[rtindex - 1] = false;
+                       }
+               }
}

That seems the outerplan's business to me, so I think it'd be better to just
return, right before the assertion, as I said before. Seen from another
angle, ISTM that FDWs that don't use a local join execution plan wouldn't
need to be aware of handling the es_epqScanDone flags. (Do you think that
such FDWs should do something like what ExecScanFetch is doing about the
flags, in their RecheckForeignScans? If so, I think we need docs for that.)

I noticed this too when reviewing KaiGai's patch, but ultimately I
think the way KaiGai has it is fine. It may not be useful in some
cases, but AFAICS it should be harmless.
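The rescan hunk under discussion walks a relid bitmapset and clears es_epqScanDone for every member. That per-member reset loop can be illustrated standalone, with a plain uint32_t in place of PostgreSQL's Bitmapset and a hypothetical next_member() in place of bms_next_member() (all names here are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return the smallest member of the bitmap greater than prev, or -1. */
static int
next_member(uint32_t relids, int prev)
{
	for (int i = prev + 1; i < 32; i++)
		if (relids & (UINT32_C(1) << i))
			return i;
	return -1;
}

/*
 * Clear the per-relation "EPQ scan done" flag for every member of the
 * bitmap.  RT indexes are 1-based while the flag array is 0-based,
 * exactly as in the quoted hunk.
 */
static void
reset_epq_flags(uint32_t relids, bool *epq_scan_done)
{
	int			rtindex = -1;

	while ((rtindex = next_member(relids, rtindex)) >= 0)
	{
		assert(rtindex > 0);	/* RT indexes start at 1 */
		epq_scan_done[rtindex - 1] = false;
	}
}
```

Using the relid set {2, 3, 5} from earlier in the thread, only those three flags get cleared; the other relations' flags are untouched.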

This patch is not tested by actual FDW extensions, so it is helpful
to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

Will do.

That would be great.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#57Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#55)

On Fri, Nov 27, 2015 at 1:33 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Plan *plan = &node->scan.plan;

@@ -3755,7 +3763,7 @@ make_foreignscan(List *qptlist,
/* cost will be filled in by create_foreignscan_plan */
plan->targetlist = qptlist;
plan->qual = qpqual;
- plan->lefttree = NULL;
+ plan->lefttree = fdw_outerplan;
plan->righttree = NULL;
node->scan.scanrelid = scanrelid;

I think that that would break the EXPLAIN output.

In what way? EXPLAIN recurses into the left and right trees of every
plan node regardless of what type it is, so superficially I feel like
this ought to just work. What problem do you foresee?

I do think that ExecInitForeignScan ought to be changed to
ExecInitNode on its outer plan if present rather than leaving that to
the FDW's BeginForeignScan method.

One option to avoid that
is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or
BeginForeignScan as you proposed. That breaks the equivalence that the Plan
tree and the PlanState tree should be mirror images of each other, but I
think that that break would be harmless.

I'm not sure how many times I have to say this, but we are not doing
that. I will not commit any patch that does that, and I will
vigorously argue against anyone else committing such a patch either.
That *would* break EXPLAIN, because EXPLAIN relies on being able to
walk the PlanState tree and find all the Plan nodes from the
corresponding PlanState nodes. Now you might think that it would be
OK to omit a plan node that we decided we weren't ever going to
execute, but today we don't do that, and I don't think we should. I
think it could be very confusing if EXPLAIN and EXPLAIN ANALYZE show
different sets of plan nodes for the same query. Quite apart from
EXPLAIN, there are numerous other places that assume that they can
walk the PlanState tree and find all the Plan nodes. Breaking that
assumption would be bad news.
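The invariant Robert defends here can be shown with a toy mirrored pair of trees: a walker that traverses only the state tree can reach a plan node only if its state node exists. The types below are illustrative stand-ins, not the real Plan/PlanState:

```c
#include <assert.h>
#include <stddef.h>

typedef struct MiniPlan
{
	int				id;
	struct MiniPlan *lefttree;
} MiniPlan;

typedef struct MiniPlanState
{
	MiniPlan			 *plan;		/* back-link to the plan node */
	struct MiniPlanState *lefttree; /* mirrors plan->lefttree */
} MiniPlanState;

/*
 * Count plan nodes reachable by walking only the state tree, as EXPLAIN
 * ANALYZE and similar walkers do.  If a state node were omitted for a
 * plan we "decided not to execute", its plan node would be invisible to
 * every such walker.
 */
static int
count_plans_via_state(const MiniPlanState *ps)
{
	if (ps == NULL)
		return 0;
	return 1 + count_plans_via_state(ps->lefttree);
}
```

Dropping a state node from the mirror immediately hides its plan node from any state-tree walker, which is the breakage being objected to.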

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#58Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#54)

On Fri, Nov 27, 2015 at 1:25 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Sorry, I don't understand this. In my understanding, fdw_recheck_quals
can be defined for a foreign join, regardless of the join type,

Yes, it "can be defined", but it will not be workable if either side of
the joined tuple is NULL because of an outer join. SQL functions return
NULL prior to evaluation, and ExecQual() treats that result as FALSE.
However, a joined tuple that has NULL fields may be a valid tuple.

We don't need to care about unmatched tuples in the case of INNER JOIN.

This is a really good point, and a very strong argument for the design
KaiGai has chosen here.
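The NULL-extension problem KaiGai describes can be modeled in isolation: under SQL's three-valued logic a strict qual over a NULL column yields NULL, and ExecQual-style evaluation counts anything other than TRUE as a failure, so re-applying base-table quals would wrongly discard a valid outer-join tuple. Everything below is an illustrative model, not PostgreSQL code:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

typedef struct
{
	bool	isnull;
	int		value;
} Datum32;

/* "col = const" under SQL semantics: a NULL input yields NULL. */
static TriBool
eval_equals(Datum32 col, int constant)
{
	if (col.isnull)
		return TV_NULL;
	return (col.value == constant) ? TV_TRUE : TV_FALSE;
}

/* ExecQual-style strictness: only a TRUE result counts as passing. */
static bool
qual_passes(TriBool result)
{
	return result == TV_TRUE;
}
```

A NULL-extended inner side therefore fails any strict qual on its columns, even though the joined tuple itself is perfectly valid output of the outer join; that is why fdw_recheck_quals alone is insufficient once outer joins are pushed down.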

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#57)

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Nov 27, 2015 at 1:33 AM, Etsuro Fujita

One option to avoid that
is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or
BeginForeignScan as you proposed. That breaks the equivalence that the Plan
tree and the PlanState tree should be mirror images of each other, but I
think that that break would be harmless.

I'm not sure how many times I have to say this, but we are not doing
that. I will not commit any patch that does that, and I will
vigorously argue against anyone else committing such a patch either.

And I'll back him up. That's a horrible idea. You're proposing to break
a very fundamental structural property for the convenience of one little
corner of the system.

regards, tom lane


#60Robert Haas
robertmhaas@gmail.com
In reply to: Kouhei Kaigai (#48)
1 attachment(s)

On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
FDW driver can set arbitrary but one path-node here.
After that, this path-node shall be transformed to plan-node by
createplan.c, then passed to FDW driver using GetForeignPlan callback.
We expect the FDW driver to set this plan node on lefttree (a.k.a. outerPlan).
Plan->lefttree is a common field, so the patch size becomes relatively
small. The FDW driver can initialize this plan at BeginForeignScan, then
execute this sub-plan tree on demand.

Remaining portions are as previous version. ExecScanFetch is revised
to call recheckMtd always when scanrelid==0, then FDW driver can get
control using RecheckForeignScan callback.
It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
(2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
by its preferable implementation - including execution of an alternative
sub-plan.

There seem to be no changes to make_foreignscan. Is that OK?

create_foreignscan_path(), not only make_foreignscan().

This patch is not tested by actual FDW extensions, so it is helpful
to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

I have done some editing and some small revisions on this patch.
Here's what I came up with. The revisions are mostly cosmetic, but I
revised it a bit so that the signature of GetForeignPlan need not
change. Also, I made nodeForeignScan.c do some of the outer plan
handling automatically, and I fixed the compile breaks in
contrib/file_fdw and contrib/postgres_fdw.

Comments/review/testing are very welcome.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

epq-recheck-v5.patch (text/x-patch; charset=US-ASCII)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 5ce8f90..1966b51 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -525,6 +525,7 @@ fileGetForeignPaths(PlannerInfo *root,
 									 total_cost,
 									 NIL,		/* no pathkeys */
 									 NULL,		/* no outer rel either */
+									 NULL,		/* no extra plan */
 									 coptions));
 
 	/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a6ba672..dd63159 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -535,6 +535,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 								   fpinfo->total_cost,
 								   NIL, /* no pathkeys */
 								   NULL,		/* no outer rel either */
+								   NULL,		/* no extra plan */
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
@@ -589,6 +590,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 										 total_cost,
 										 usable_pathkeys,
 										 NULL,
+										 NULL,
 										 NIL));
 	}
 
@@ -756,6 +758,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 									   total_cost,
 									   NIL,		/* no pathkeys */
 									   param_info->ppi_req_outer,
+									   NULL,
 									   NIL);	/* no fdw_private list */
 		add_path(baserel, (Path *) path);
 	}
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 1533a6b..a646b2a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -765,6 +765,35 @@ RefetchForeignRow (EState *estate,
      See <xref linkend="fdw-row-locking"> for more information.
     </para>
 
+    <para>
+<programlisting>
+bool
+RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
+</programlisting>
+     Recheck that a previously-returned tuple still matches the relevant
+     scan and join qualifiers, and possibly provide a modified version of
+     the tuple.  For foreign data wrappers which do not perform join pushdown,
+     it will typically be more convenient to set this to <literal>NULL</> and
+     instead set <structfield>fdw_recheck_quals</structfield> appropriately.
+     When outer joins are pushed down, however, it isn't sufficient to
+     reapply the checks relevant to all the base tables to the result tuple,
+     even if all needed attributes are present, because failure to match some
+     qualifier might result in some attributes going to NULL, rather than in
+     no tuple being returned.  <literal>RecheckForeignScan</> can recheck
+     qualifiers and return true if they are still satisfied and false
+     otherwise, but it can also store a replacement tuple into the supplied
+     slot.
+    </para>
+
+    <para>
+     To implement join pushdown, a foreign data wrapper will typically
+     construct an alternative local join plan which is used only for
+     rechecks; this will become the outer subplan of the
+     <literal>ForeignScan</>.  When a recheck is required, this subplan
+     can be executed and the resulting tuple can be stored in the slot.
+     This plan need not be efficient since no base table will return more
+     that one row; for example, it may implement all joins as nested loops.
+    </para>
    </sect2>
 
    <sect2 id="fdw-callbacks-explain">
@@ -1137,11 +1166,17 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
     <para>
      Any clauses removed from the plan node's qual list must instead be added
-     to <literal>fdw_recheck_quals</> in order to ensure correct behavior
+     to <literal>fdw_recheck_quals</> or rechecked by
+     <literal>RecheckForeignScan</> in order to ensure correct behavior
      at the <literal>READ COMMITTED</> isolation level.  When a concurrent
      update occurs for some other table involved in the query, the executor
      may need to verify that all of the original quals are still satisfied for
-     the tuple, possibly against a different set of parameter values.
+     the tuple, possibly against a different set of parameter values.  Using
+     <literal>fdw_recheck_quals</> is typically easier than implementing checks
+     inside <literal>RecheckForeignScan</>, but this method will be
+     insufficient when outer joins have been pushed down, since the join tuples
+     in that case might have some fields go to NULL without rejecting the
+     tuple entirely.
     </para>
 
     <para>
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index a96e826..3faf7f9 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -49,8 +49,21 @@ ExecScanFetch(ScanState *node,
 		 */
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
-		if (estate->es_epqTupleSet[scanrelid - 1])
+		if (scanrelid == 0)
+		{
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/*
+			 * This is a ForeignScan or CustomScan which has pushed down a
+			 * join to the remote side.  The recheck method is responsible not
+			 * only for rechecking the scan/join quals but also for storing
+			 * the correct tuple in the slot.
+			 */
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (estate->es_epqTupleSet[scanrelid - 1])
 		{
 			TupleTableSlot *slot = node->ss_ScanTupleSlot;
 
@@ -347,8 +360,31 @@ ExecScanReScan(ScanState *node)
 	{
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
+		if (scanrelid > 0)
+			estate->es_epqScanDone[scanrelid - 1] = false;
+		else
+		{
+			Bitmapset  *relids;
+			int			rtindex = -1;
 
-		estate->es_epqScanDone[scanrelid - 1] = false;
+			/*
+			 * If an FDW or custom scan provider has replaced the join with a
+			 * scan, there are multiple RTIs; reset the epqScanDone flag for
+			 * all of them.
+			 */
+			if (IsA(node->ps.plan, ForeignScan))
+				relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+			else if (IsA(node->ps.plan, CustomScan))
+				relids = ((CustomScan *) node->ps.plan)->custom_relids;
+			else
+				elog(ERROR, "unexpected scan node: %d",
+					 (int) nodeTag(node->ps.plan));
+
+			while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+			{
+				Assert(rtindex > 0);
+				estate->es_epqScanDone[rtindex - 1] = false;
+			}
+		}
 	}
 }
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 6165e4a..fdf88ff 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -73,6 +73,7 @@ ForeignNext(ForeignScanState *node)
 static bool
 ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 {
+	FdwRoutine *fdwroutine = node->fdwroutine;
 	ExprContext *econtext;
 
 	/*
@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 
 	ResetExprContext(econtext);
 
+	/*
+	 * If an outer join is pushed down, RecheckForeignScan may need to store a
+	 * different tuple in the slot, because a different set of columns may go
+	 * to NULL upon recheck.  Otherwise, it shouldn't need to change the slot
+	 * contents, just return true or false to indicate whether the quals still
+	 * pass.  For simple cases, setting fdw_recheck_quals may be easier than
+	 * providing this callback.
+	 */
+	if (fdwroutine->RecheckForeignScan &&
+		!fdwroutine->RecheckForeignScan(node, slot))
+		return false;
+
 	return ExecQual(node->fdw_recheck_quals, econtext, false);
 }
 
@@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
+	/* Initialize any outer plan. */
+	if (outerPlanState(scanstate))
+		outerPlanState(scanstate) =
+			ExecInitNode(outerPlan(node), estate, eflags);
+
 	/*
 	 * Tell the FDW to initialize the scan.
 	 */
@@ -225,6 +243,10 @@ ExecEndForeignScan(ForeignScanState *node)
 	/* Let the FDW shut down */
 	node->fdwroutine->EndForeignScan(node);
 
+	/* Shut down any outer plan. */
+	if (outerPlanState(node))
+		ExecEndNode(outerPlanState(node));
+
 	/* Free the exprcontext */
 	ExecFreeExprContext(&node->ss.ps);
 
@@ -246,7 +268,17 @@ ExecEndForeignScan(ForeignScanState *node)
 void
 ExecReScanForeignScan(ForeignScanState *node)
 {
+	PlanState  *outerPlan = outerPlanState(node);
+
 	node->fdwroutine->ReScanForeignScan(node);
 
+	/*
+	 * If chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.  outerPlan may also be NULL, in which case there
+	 * is nothing to rescan at all.
+	 */
+	if (outerPlan != NULL && outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
 	ExecScanReScan(&node->ss);
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 012c14b..aff27ea 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1683,6 +1683,7 @@ _outForeignPath(StringInfo str, const ForeignPath *node)
 
 	_outPathInfo(str, (const Path *) node);
 
+	WRITE_NODE_FIELD(fdw_outerpath);
 	WRITE_NODE_FIELD(fdw_private);
 }
 
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 411b36c..7335579 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2095,11 +2095,16 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	Index		scan_relid = rel->relid;
 	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
+	Plan	   *fdw_outerplan = NULL;
 	ListCell   *lc;
 	int			i;
 
 	Assert(rel->fdwroutine != NULL);
 
+	/* transform the child path if any */
+	if (best_path->fdw_outerpath)
+		fdw_outerplan = create_plan_recurse(root, best_path->fdw_outerpath);
+
 	/*
 	 * If we're scanning a base relation, fetch its OID.  (Irrelevant if
 	 * scanning a join relation.)
@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses);
+	outerPlan(scan_plan) = fdw_outerplan;
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
@@ -3755,7 +3762,6 @@ make_foreignscan(List *qptlist,
 	/* cost will be filled in by create_foreignscan_plan */
 	plan->targetlist = qptlist;
 	plan->qual = qpqual;
-	plan->lefttree = NULL;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	/* fs_server will be filled in by create_foreignscan_plan */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 09c3244..ec0910d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1507,6 +1507,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						double rows, Cost startup_cost, Cost total_cost,
 						List *pathkeys,
 						Relids required_outer,
+						Path *fdw_outerpath,
 						List *fdw_private)
 {
 	ForeignPath *pathnode = makeNode(ForeignPath);
@@ -1521,6 +1522,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
+	pathnode->fdw_outerpath = fdw_outerpath;
 	pathnode->fdw_private = fdw_private;
 
 	return pathnode;
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 69b48b4..1a5e1fd 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -36,13 +36,16 @@ typedef ForeignScan *(*GetForeignPlan_function) (PlannerInfo *root,
 														  Oid foreigntableid,
 													  ForeignPath *best_path,
 															 List *tlist,
-														 List *scan_clauses);
+												 List *scan_clauses);
 
 typedef void (*BeginForeignScan_function) (ForeignScanState *node,
 													   int eflags);
 
 typedef TupleTableSlot *(*IterateForeignScan_function) (ForeignScanState *node);
 
+typedef bool (*RecheckForeignScan_function) (ForeignScanState *node,
+											 TupleTableSlot *slot);
+
 typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
 
 typedef void (*EndForeignScan_function) (ForeignScanState *node);
@@ -162,6 +165,7 @@ typedef struct FdwRoutine
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
+	RecheckForeignScan_function RecheckForeignScan;
 
 	/* Support functions for EXPLAIN */
 	ExplainForeignScan_function ExplainForeignScan;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9a0dd28..b072e1e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -909,6 +909,7 @@ typedef struct TidPath
 typedef struct ForeignPath
 {
 	Path		path;
+	Path	   *fdw_outerpath;
 	List	   *fdw_private;
 } ForeignPath;
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f28b4e2..35e17e7 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -86,6 +86,7 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						double rows, Cost startup_cost, Cost total_cost,
 						List *pathkeys,
 						Relids required_outer,
+						Path *fdw_outerpath,
 						List *fdw_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
#61Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Robert Haas (#60)

Hello, thank you for taking time for this.

At Tue, 1 Dec 2015 14:56:54 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoY+1Cq0bjXBP+coeKtkOMbpUMVQsfL2fJQY+ws7Nu=wgg@mail.gmail.com>

On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

This patch is not tested by actual FDW extensions, so it is helpful
to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

I have done some editing and some small revisions on this patch.
Here's what I came up with. The revisions are mostly cosmetic, but I
revised it a bit so that the signature of GetForeignPlan need not
change. Also, I made nodeForeignScan.c do some of the outer plan
handling automatically, and I fixed the compile breaks in
contrib/file_fdw and contrib/postgres_fdw.

Comments/review/testing are very welcome.

Applied on HEAD with no error. Regtests of core, postgres_fdw and
file_fdw finished with no error. (I haven't done any further testing)

nodeScan.c:

The comments in nodeScan.c look way clearer. Thank you for rewriting.

nodeForeignscan.c:

Is this a mistake?

@@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
scanstate->fdwroutine = fdwroutine;
scanstate->fdw_state = NULL;

+ /* Initialize any outer plan. */

-	if (outerPlanState(scanstate))
+	if (outerPlanState(node))

+ outerPlanState(scanstate) =

createplan.c, planmain.h:

I agree with reverting the signature of GetForeignPlan.

fdwapi.h:

Reverting the additional parameter leaves only a change in the
indentation of the last parameter.

fdwhandler.sgml:

This is easy for me to understand. Thank you.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


#62Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#48)

Sorry, I made a mistake.

At Wed, 02 Dec 2015 10:29:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20151202.102917.50152198.horiguchi.kyotaro@lab.ntt.co.jp>

Hello, thank you for editing.

At Tue, 1 Dec 2015 14:56:54 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoY+1Cq0bjXBP+coeKtkOMbpUMVQsfL2fJQY+ws7Nu=wgg@mail.gmail.com>

On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

This patch is not tested by actual FDW extensions, so it is helpful
to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

I have done some editing and some small revisions on this patch.
Here's what I came up with. The revisions are mostly cosmetic, but I
revised it a bit so that the signature of GetForeignPlan need not
change. Also, I made nodeForeignScan.c do some of the outer plan
handling automatically, and I fixed the compile breaks in
contrib/file_fdw and contrib/postgres_fdw.

Comments/review/testing are very welcome.

Applied on HEAD with no error. Regtests of core, postgres_fdw and
file_fdw finished with no error.

nodeScan.c:

The comments in nodeScan.c look way clearer. Thank you for rewriting.

nodeForeignscan.c:

Is this a mistake?

@@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
scanstate->fdwroutine = fdwroutine;
scanstate->fdw_state = NULL;

+ /* Initialize any outer plan. */

-	if (outerPlanState(scanstate))
+	if (outerPlanState(node))

+ outerPlanState(scanstate) =

No, the above is wrong.

-	if (outerPlanState(scanstate))
+	if (outerPlan(node))

+ outerPlanState(scanstate) =

createplan.c, planmain.h:

I agree with reverting the signature of GetForeignPlan.

fdwapi.h:

Reverting the additional parameter leaves only a change in the
indentation of the last parameter.

fdwhandler.sgml:

This is easy for me to understand. Thank you.

--
Kyotaro Horiguchi
NTT Open Source Software Center


#63Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#56)

On 2015/12/02 1:41, Robert Haas wrote:

On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

The attached patch adds a Path *fdw_outerpath field to the ForeignPath node.
The FDW driver can set an arbitrary, but single, path node here.
After that, this path node shall be transformed into a plan node by
createplan.c, then passed to the FDW driver via the GetForeignPlan callback.

I understand this, as I also did the same thing in my patches, but actually,
that seems a bit complicated to me. Instead, could we keep the
fdw_outerpath in the fdw_private of a ForeignPath node when creating the
path node during GetForeignPaths, and then create an outerplan accordingly
from the fdw_outerpath stored in the fdw_private during GetForeignPlan, by
using create_plan_recurse there? I think that that would make the core
involvement much simpler.

I can't see how it's going to get much simpler than this. The core
code is well under a hundred lines, and it all looks pretty
straightforward to me. All of our existing path and plan types keep
lists of paths and plans separate from other kinds of data, and I
don't think we're going to win any awards for deviating from that
principle here.

One thing I can think of is that we can keep both the structure of a
ForeignPath node and the API of create_foreignscan_path as-is. The
latter is a good thing for FDW authors. And IIUC the patch you posted
today, I think we could make create_foreignscan_plan a bit simpler too.
Ie, in your patch, you modified that function as follows:

@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses);
+	outerPlan(scan_plan) = fdw_outerplan;

I think that would be OK, but I think we would have to do a bit more
here about the fdw_outerplan's targetlist and qual; I think that the
targetlist needs to be changed to fdw_scan_tlist, as in the patch [1],
and that it'd be better to change the qual to remote conditions, i.e.,
quals not in the scan_plan's scan.plan.qual, to avoid duplicate
evaluation of local conditions. (In the patch [1], I didn't do anything
about the qual because the current postgres_fdw join pushdown patch
assumes that all the scan_plan's scan.plan.qual are pushed down.)
Or, FDW authors might want to do something about fdw_recheck_quals for a
foreign-join while creating the fdw_outerplan. So if we do that during
GetForeignPlan, I think we could make create_foreignscan_plan a bit
simpler, or provide flexibility to FDW authors.

@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot
*slot)

ResetExprContext(econtext);

+       /*
+        * FDW driver has to recheck visibility of EPQ tuple towards
+        * the scan qualifiers once it gets pushed down.
+        * In addition, if this node represents a join sub-tree, not
+        * a scan, FDW driver is also responsible to reconstruct
+        * a joined tuple according to the primitive EPQ tuples.
+        */
+       if (fdwroutine->RecheckForeignScan)
+       {
+               if (!fdwroutine->RecheckForeignScan(node, slot))
+                       return false;
+       }

Maybe I'm missing something, but I think we should let FDW do the work if
scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
abort the transaction.)

That would be unnecessarily restrictive. On the one hand, even if
scanrelid != 0, the FDW can decide that it prefers to do the rechecks
using RecheckForeignScan rather than fdw_recheck_quals. For most
FDWs, I expect using fdw_recheck_quals to be more convenient, but
there may be cases where somebody prefers to use RecheckForeignScan,
and allowing that costs nothing.
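The dispatch being discussed might be modeled, outside PostgreSQL, roughly as follows (toy structs, not the real ForeignScanState; the real ForeignRecheck() evaluates fdw_recheck_quals via ExecQual() after the callback): an FDW-supplied RecheckForeignScan gets the first say, whether or not scanrelid is zero, and fdw_recheck_quals still apply afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; the real ForeignScanState/TupleTableSlot are far richer. */
typedef struct TupleTableSlot { int dummy; } TupleTableSlot;
typedef struct ForeignScanState ForeignScanState;

typedef bool (*RecheckForeignScan_fn)(ForeignScanState *node,
                                      TupleTableSlot *slot);

struct ForeignScanState {
    RecheckForeignScan_fn RecheckForeignScan;  /* optional FDW callback */
    bool fdw_recheck_quals_pass;  /* toy result of ExecQual(fdw_recheck_quals) */
};

/* Mirrors the patched ForeignRecheck(): if the FDW supplies
 * RecheckForeignScan, it is consulted first, regardless of scanrelid;
 * the fdw_recheck_quals are then evaluated as before. */
static bool foreign_recheck(ForeignScanState *node, TupleTableSlot *slot)
{
    if (node->RecheckForeignScan &&
        !node->RecheckForeignScan(node, slot))
        return false;
    return node->fdw_recheck_quals_pass;
}

static bool always_false_recheck(ForeignScanState *n, TupleTableSlot *s)
{
    (void) n; (void) s;
    return false;
}
```

This is why leaving the callback NULL costs nothing for simple FDWs, while a join-pushdown FDW can take over the whole recheck.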

I suppose that the flexibility would probably be a good thing, but I'm a
little bit concerned that that might be rather confusing to FDW authors.
Maybe I'm missing something, though.

Best regards,
Etsuro Fujita

[1]: /messages/by-id/5624D583.10202@lab.ntt.co.jp


#64Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#63)

On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

The attached patch adds a Path *fdw_outerpath field to the ForeignPath node.
The FDW driver can set an arbitrary, but single, path node here.
After that, this path node shall be transformed into a plan node by
createplan.c, then passed to the FDW driver via the GetForeignPlan callback.
We expect the FDW driver to set this plan node as the lefttree (a.k.a. outerPlan).
Plan->outerPlan is a common field, so the patch size becomes relatively
small. The FDW driver can initialize this plan at BeginForeignScan, then
execute the sub-plan tree on demand.

The remaining portions are as in the previous version. ExecScanFetch is
revised to always call recheckMtd when scanrelid==0, so the FDW driver can
get control via the RecheckForeignScan callback.
This allows the FDW driver to handle (1) EPQ rechecks on underlying scan
nodes, (2) reconstruction of the joined tuple, and (3) EPQ rechecks on join
clauses, with whatever implementation it prefers - including execution of
an alternative sub-plan.

There seem to be no changes to make_foreignscan. Is that OK?

create_foreignscan_path(), not only make_foreignscan().

This patch is not tested by actual FDW extensions, so it is helpful
to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

I have done some editing and some small revisions on this patch.
Here's what I came up with. The revisions are mostly cosmetic, but I
revised it a bit so that the signature of GetForeignPlan need not
change.

Thanks for the revision. (I could not be online for a few days, sorry.)

Also, I made nodeForeignScan.c do some of the outer plan
handling automatically,

That's OK with me. We could omit initialization/shutdown of the sub-plan
when it is not actually needed, even if the FDW driver sets one up, but
that is a very tiny advantage.

and I fixed the compile breaks in
contrib/file_fdw and contrib/postgres_fdw.

Sorry, I didn't fix up the contrib side.

Comments/review/testing are very welcome.

One small point:

@@ -3755,7 +3762,6 @@ make_foreignscan(List *qptlist,
/* cost will be filled in by create_foreignscan_plan */
plan->targetlist = qptlist;
plan->qual = qpqual;
- plan->lefttree = NULL;
plan->righttree = NULL;
node->scan.scanrelid = scanrelid;
/* fs_server will be filled in by create_foreignscan_plan */

Although it is harmless, I would prefer to keep this line, because
callers of make_foreignscan() expect a ForeignScan node with an empty
lefttree, even if it is filled in later.

Best regards,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#65Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Etsuro Fujita (#63)

On 2015/12/02 1:41, Robert Haas wrote:

On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

The attached patch adds a Path *fdw_outerpath field to the ForeignPath node.
The FDW driver can set an arbitrary, but single, path node here.
After that, this path node shall be transformed into a plan node by
createplan.c, then passed to the FDW driver via the GetForeignPlan callback.

I understand this, as I also did the same thing in my patches, but actually,
that seems a bit complicated to me. Instead, could we keep the
fdw_outerpath in the fdw_private of a ForeignPath node when creating the
path node during GetForeignPaths, and then create an outerplan accordingly
from the fdw_outerpath stored in the fdw_private during GetForeignPlan, by
using create_plan_recurse there? I think that that would make the core
involvement much simpler.

I can't see how it's going to get much simpler than this. The core
code is well under a hundred lines, and it all looks pretty
straightforward to me. All of our existing path and plan types keep
lists of paths and plans separate from other kinds of data, and I
don't think we're going to win any awards for deviating from that
principle here.

One thing I can think of is that we can keep both the structure of a
ForeignPath node and the API of create_foreignscan_path as-is. The
latter is a good thing for FDW authors. And IIUC the patch you posted
today, I think we could make create_foreignscan_plan a bit simpler too.
Ie, in your patch, you modified that function as follows:

@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses);
+	outerPlan(scan_plan) = fdw_outerplan;

I think that would be OK, but I think we would have to do a bit more
here about the fdw_outerplan's targetlist and qual; I think that the
targetlist needs to be changed to fdw_scan_tlist, as in the patch [1],

Hmm... you are right. The sub-plan should generate a tuple according to
the fdw_scan_tlist, if valid. Do you think surgically applying the
alternative target list is better than using build_path_tlist()?

and that it'd be better to change the qual to remote conditions, ie,
quals not in the scan_plan's scan.plan.qual, to avoid duplicate
evaluation of local conditions. (In the patch [1], I didn't do anything
about the qual because the current postgres_fdw join pushdown patch
assumes that all the scan_plan's scan.plan.qual are pushed down.)
Or, FDW authors might want to do something about fdw_recheck_quals for a
foreign-join while creating the fdw_outerplan. So if we do that during
GetForeignPlan, I think we could make create_foreignscan_plan a bit
simpler, or provide flexibility to FDW authors.

So, you suggest it is better to pass fdw_outerplan to the GetForeignPlan
callback, to allow the FDW to adjust the target list and quals of its
sub-plan. I think that is a reasonable argument. Only the FDW knows which
qualifiers are executable on the remote side, so it is not easy for anyone
else to remove the qualifiers executed only on the host side from the
sub-plan tree.

@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot
*slot)

ResetExprContext(econtext);

+       /*
+        * FDW driver has to recheck visibility of EPQ tuple towards
+        * the scan qualifiers once it gets pushed down.
+        * In addition, if this node represents a join sub-tree, not
+        * a scan, FDW driver is also responsible to reconstruct
+        * a joined tuple according to the primitive EPQ tuples.
+        */
+       if (fdwroutine->RecheckForeignScan)
+       {
+               if (!fdwroutine->RecheckForeignScan(node, slot))
+                       return false;
+       }

Maybe I'm missing something, but I think we should let FDW do the work if
scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
abort the transaction.)

That would be unnecessarily restrictive. On the one hand, even if
scanrelid != 0, the FDW can decide that it prefers to do the rechecks
using RecheckForeignScan rather than fdw_recheck_quals. For most
FDWs, I expect using fdw_recheck_quals to be more convenient, but
there may be cases where somebody prefers to use RecheckForeignScan,
and allowing that costs nothing.

I suppose that the flexibility would probably be a good thing, but I'm a
little bit concerned that that might be rather confusing to FDW authors.

We expect FDW authors, like Hanada-san, to have deep knowledge of PostgreSQL
internals. It is not a feature for SQL newbies.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Maybe I'm missing something, though.

Best regards,
Etsuro Fujita

[1] /messages/by-id/5624D583.10202@lab.ntt.co.jp


#66Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#57)

On 2015/12/02 1:53, Robert Haas wrote:

On Fri, Nov 27, 2015 at 1:33 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote: Plan *plan =
&node->scan.plan;

@@ -3755,7 +3763,7 @@ make_foreignscan(List *qptlist,
/* cost will be filled in by create_foreignscan_plan */
plan->targetlist = qptlist;
plan->qual = qpqual;
- plan->lefttree = NULL;
+ plan->lefttree = fdw_outerplan;
plan->righttree = NULL;
node->scan.scanrelid = scanrelid;

I think that that would break the EXPLAIN output.

In what way? EXPLAIN recurses into the left and right trees of every
plan node regardless of what type it is, so superficially I feel like
this ought to just work. What problem do you foresee?

I do think that ExecInitForeignScan ought to be changed to
ExecInitNode on its outer plan if present rather than leaving that to
the FDW's BeginForeignScan method.

IIUC, I think the EXPLAIN output for eg,

select localtab.* from localtab, ft1, ft2 where localtab.a = ft1.a and
ft1.a = ft2.a for update

would be something like this:

LockRows
-> Nested Loop
Join Filter: (ft1.a = localtab.a)
-> Seq Scan on localtab
-> Foreign Scan on ft1/ft2-foreign-join
-> Nested Loop
Join Filter: (ft1.a = ft2.a)
-> Foreign Scan on ft1
-> Foreign Scan on ft2

The subplan below the Foreign Scan on the foreign-join seems odd to me.
One option to avoid that is to handle the subplan as in my patch [2],
which I created to address your comment that we should not break the
equivalence discussed below. I'm not sure that the patch's handling of
chgParam for the subplan is a good idea, though.

One option to avoid that
is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or
BeginForeignScan as you proposed. That breaks the equivalence that the Plan
tree and the PlanState tree should be mirror images of each other, but I
think that that break would be harmless.

I'm not sure how many times I have to say this, but we are not doing
that. I will not commit any patch that does that, and I will
vigorously argue against anyone else committing such a patch either.
That *would* break EXPLAIN, because EXPLAIN relies on being able to
walk the PlanState tree and find all the Plan nodes from the
corresponding PlanState nodes. Now you might think that it would be
OK to omit a plan node that we decided we weren't ever going to
execute, but today we don't do that, and I don't think we should. I
think it could be very confusing if EXPLAIN and EXPLAIN ANALYZE show
different sets of plan nodes for the same query. Quite apart from
EXPLAIN, there are numerous other places that assume that they can
walk the PlanState tree and find all the Plan nodes. Breaking that
assumption would be bad news.

Agreed. Thanks for the explanation!

Best regards,
Etsuro Fujita

[2]: /messages/by-id/5624D583.10202@lab.ntt.co.jp


#67Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#65)

On 2015/12/02 14:54, Kouhei Kaigai wrote:

On 2015/12/02 1:41, Robert Haas wrote:

On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

The attached patch adds a Path *fdw_outerpath field to the ForeignPath node.
The FDW driver can set an arbitrary, but single, path node here.
After that, this path node shall be transformed into a plan node by
createplan.c, then passed to the FDW driver via the GetForeignPlan callback.

I understand this, as I also did the same thing in my patches, but actually,
that seems a bit complicated to me. Instead, could we keep the
fdw_outerpath in the fdw_private of a ForeignPath node when creating the
path node during GetForeignPaths, and then create an outerplan accordingly
from the fdw_outerpath stored in the fdw_private during GetForeignPlan, by
using create_plan_recurse there? I think that that would make the core
involvement much simpler.

I can't see how it's going to get much simpler than this. The core
code is well under a hundred lines, and it all looks pretty
straightforward to me. All of our existing path and plan types keep
lists of paths and plans separate from other kinds of data, and I
don't think we're going to win any awards for deviating from that
principle here.

One thing I can think of is that we can keep both the structure of a
ForeignPath node and the API of create_foreignscan_path as-is. The
latter is a good thing for FDW authors. And IIUC the patch you posted
today, I think we could make create_foreignscan_plan a bit simpler too.
Ie, in your patch, you modified that function as follows:

@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses);
+	outerPlan(scan_plan) = fdw_outerplan;

I think that would be OK, but I think we would have to do a bit more
here about the fdw_outerplan's targetlist and qual; I think that the
targetlist needs to be changed to fdw_scan_tlist, as in the patch [1],

Hmm... you are right. The sub-plan should generate a tuple according to
the fdw_scan_tlist, if valid. Do you think surgically applying the
alternative target list is better than using build_path_tlist()?

Sorry, I'm not sure about that. I thought of changing it to fdw_scan_tlist
simply because that's simple.

and that it'd be better to change the qual to remote conditions, ie,
quals not in the scan_plan's scan.plan.qual, to avoid duplicate
evaluation of local conditions. (In the patch [1], I didn't do anything
about the qual because the current postgres_fdw join pushdown patch
assumes that all the scan_plan's scan.plan.qual are pushed down.)
Or, FDW authors might want to do something about fdw_recheck_quals for a
foreign-join while creating the fdw_outerplan. So if we do that during
GetForeignPlan, I think we could make create_foreignscan_plan a bit
simpler, or provide flexibility to FDW authors.

So, you suggest it is better to pass fdw_outerplan to the GetForeignPlan
callback, to allow the FDW to adjust the target list and quals of its sub-plan.

I think that is one option for us. Another option, which I proposed
above, is to 1) store an fdw_outerpath in the fdw_private when creating the
ForeignPath node in GetForeignPaths, and then 2) create an fdw_outerplan
from the fdw_outerpath stored in the fdw_private when creating the
ForeignScan node in GetForeignPlan, by using create_plan_recurse in
GetForeignPlan. (To do so, I was thinking of making that function
extern.) One good point about that is that we can keep the API of
create_foreignscan_path as-is, which I think would be a good thing for
FDW authors that don't care about join pushdown.
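Fujita-san's alternative could be sketched, very loosely, like this (all types and helpers below are toy stand-ins, not PostgreSQL's): the outer path rides along in fdw_private, so create_foreignscan_path() is untouched, and GetForeignPlan recovers it and recurses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for Path/Plan and the FDW private slot;
 * the real fdw_private is a List, and the real paths/plans are node trees. */
typedef struct Path { int cost; } Path;
typedef struct Plan { const Path *source; } Plan;

typedef struct ForeignPath {
    void *fdw_private;      /* FDW-opaque; here it smuggles the outer Path */
} ForeignPath;

static Plan made_plan;

/* Stand-in for create_plan_recurse(): turns a Path into a Plan. */
static Plan *create_plan_recurse(const Path *path)
{
    made_plan.source = path;
    return &made_plan;
}

/* GetForeignPaths side: keep fdw_outerpath in fdw_private, so the
 * create_foreignscan_path() signature stays unchanged. */
static void get_foreign_paths(ForeignPath *fpath, Path *outerpath)
{
    fpath->fdw_private = outerpath;
}

/* GetForeignPlan side: pull the stored path back out and recurse on it. */
static Plan *get_foreign_plan(ForeignPath *fpath)
{
    Path *outerpath = (Path *) fpath->fdw_private;
    return outerpath ? create_plan_recurse(outerpath) : NULL;
}
```

The trade-off debated in the thread is that this hides the sub-path from the core, whereas a first-class fdw_outerpath field keeps paths and plans visible to the planner like every other node type.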

I think it is reasonable argue. Only FDW knows which qualifiers are
executable on remote side, so it is not easy to remove qualifiers to be
executed on host-side only, from the sub-plan tree.

Yeah, we could provide the flexibility to FDW authors.

@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot
*slot)

ResetExprContext(econtext);

+       /*
+        * FDW driver has to recheck visibility of EPQ tuple towards
+        * the scan qualifiers once it gets pushed down.
+        * In addition, if this node represents a join sub-tree, not
+        * a scan, FDW driver is also responsible to reconstruct
+        * a joined tuple according to the primitive EPQ tuples.
+        */
+       if (fdwroutine->RecheckForeignScan)
+       {
+               if (!fdwroutine->RecheckForeignScan(node, slot))
+                       return false;
+       }

Maybe I'm missing something, but I think we should let FDW do the work if
scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
abort the transaction.)

That would be unnecessarily restrictive. On the one hand, even if
scanrelid != 0, the FDW can decide that it prefers to do the rechecks
using RecheckForeignScan rather than fdw_recheck_quals. For most
FDWs, I expect using fdw_recheck_quals to be more convenient, but
there may be cases where somebody prefers to use RecheckForeignScan,
and allowing that costs nothing.

I suppose that the flexibility would probably be a good thing, but I'm a
little bit concerned that that might be rather confusing to FDW authors.

We expect FDW authors, like Hanada-san, to have deep knowledge of PostgreSQL
internals. It is not a feature for SQL newbies.

That's right!

Best regards,
Etsuro Fujita


#68Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#58)

On 2015/12/02 1:54, Robert Haas wrote:

On Fri, Nov 27, 2015 at 1:25 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Sorry, I don't understand this. In my understanding, fdw_recheck_quals
can be defined for a foreign join, regardless of the join type,

Yes, they "can be defined", but they will not be workable if either side of
the joined tuple is NULL because of an outer join. A strict SQL function
returns NULL without evaluation when any input is NULL, and ExecQual()
treats that result as FALSE. However, a joined tuple that has NULL fields
may be a valid tuple.

We don't need to care about unmatched tuples in an INNER JOIN.

This is a really good point, and a very strong argument for the design
KaiGai has chosen here.
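KaiGai-san's point can be illustrated with a toy model of SQL's three-valued logic (the names below are hypothetical, not PostgreSQL code): reapplying a base-table qual to a NULL-extended outer-join tuple yields unknown, which ExecQual()-style evaluation collapses to false, wrongly rejecting a valid joined tuple.

```c
#include <assert.h>
#include <stdbool.h>

/* Three-valued SQL logic result of evaluating one qual. */
typedef enum { Q_FALSE, Q_TRUE, Q_NULL } QualResult;

/* One attribute of a joined tuple, possibly NULL-extended by a LEFT JOIN. */
typedef struct { bool isnull; int value; } Datum1;

/* Toy qual "attr = 1" under SQL semantics: NULL input gives unknown. */
static QualResult eval_equals_one(Datum1 d)
{
    if (d.isnull)
        return Q_NULL;                  /* SQL: NULL = 1 is unknown */
    return d.value == 1 ? Q_TRUE : Q_FALSE;
}

/* ExecQual-style collapse: anything but true counts as failure. */
static bool exec_qual(QualResult r)
{
    return r == Q_TRUE;
}
```

So for an INNER JOIN the base quals can safely be rechecked against the result, but once an outer join is pushed down the NULL-extended side defeats that approach, which is the argument for RecheckForeignScan.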

Maybe my explanation was not enough. Sorry about that. But I mean that
we define fdw_recheck_quals for a foreign-join as quals that 1) were
extracted by extract_actual_join_clauses as "otherclauses"
(rinfo->is_pushed_down=true) and that 2) were pushed down to the remote
server, not scan quals relevant to all the base tables involved in the
foreign-join. So in this definition, I think fdw_recheck_quals for a
foreign-join will be workable, regardless of the join type.

Best regards,
Etsuro Fujita


#69Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#63)
1 attachment(s)

On Tue, Dec 1, 2015 at 10:20 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

One thing I can think of is that we can keep both the structure of a
ForeignPath node and the API of create_foreignscan_path as-is. The latter
is a good thing for FDW authors. And IIUC the patch you posted today, I
think we could make create_foreignscan_plan a bit simpler too. Ie, in your
patch, you modified that function as follows:

@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses);
+	outerPlan(scan_plan) = fdw_outerplan;

I think that would be OK, but I think we would have to do a bit more here
about the fdw_outerplan's targetlist and qual; I think that the targetlist
needs to be changed to fdw_scan_tlist, as in the patch [1], and that it'd be
better to change the qual to remote conditions, ie, quals not in the
scan_plan's scan.plan.qual, to avoid duplicate evaluation of local
conditions. (In the patch [1], I didn't do anything about the qual because
the current postgres_fdw join pushdown patch assumes that all the
scan_plan's scan.plan.qual are pushed down.) Or, FDW authors might want to
do something about fdw_recheck_quals for a foreign-join while creating the
fdw_outerplan. So if we do that during GetForeignPlan, I think we could
make create_foreignscan_plan a bit simpler, or provide flexibility to FDW
authors.

It's certainly true that we need the alternative plan's tlist to match
that of the main plan; otherwise, it's going to be difficult for the
FDW to make use of that alternative subplan to fill its slot, which is
kinda the point of all this. However, I'm quite reluctant to
introduce code into create_foreignscan_plan() that forces the
subplan's tlist to match that of the main plan. For one thing, that
would likely foreclose the possibility of an FDW ever using the outer
plan for any purpose other than EPQ rechecks. It may be hard to
imagine what else you'd do with the outer plan as things are today,
but right now the two halves of the patch - letting FDWs have an outer
subplan, and providing them with a way of overriding the EPQ recheck
behavior - are technically independent. Putting tlist-altering
behavior into create_foreignscan_plan() ties those two things together
irrevocably.

Instead, I think we should go the opposite direction and pass the
outerplan to GetForeignPlan after all. I was lulled into a false sense
of security by the realization that every FDW that uses this feature
MUST want to do outerPlan(scan_plan) = fdw_outerplan. That's true,
but irrelevant. The point is that the FDW might want to do something
additional, like frob the outer plan's tlist, and it can't do that if
we don't pass it fdw_outerplan. So we should do that, after all.
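As a rough sketch of the direction described here (toy types, not the real planner API): once GetForeignPlan receives the already-created outer_plan, the FDW can first frob the subplan's targetlist, for example forcing it to match fdw_scan_tlist, and only then attach it as the ForeignScan's outer plan.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins; real Plan nodes carry target lists, quals, costs, etc.
 * The targetlist here is just a label to show the frobbing step. */
typedef struct Plan {
    struct Plan *lefttree;
    const char  *targetlist;
} Plan;

#define outerPlan(p) ((p)->lefttree)

/* Sketch of an FDW's GetForeignPlan once the core hands it the created
 * outer_plan: adjust the subplan first, then attach it explicitly. */
static void get_foreign_plan(Plan *scan_plan, Plan *outer_plan,
                             const char *fdw_scan_tlist)
{
    if (outer_plan)
    {
        outer_plan->targetlist = fdw_scan_tlist;  /* match the main plan */
        outerPlan(scan_plan) = outer_plan;
    }
}
```

The design point is that the core does not force the tlist match itself; the FDW does, which leaves the outer subplan usable for purposes other than EPQ rechecks.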

Updated patch attached. This fixes a couple of whitespace issues that
were pointed out, also.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

epq-recheck-v6.patch (application/x-patch)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 5ce8f90..83bbfa1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -121,7 +121,8 @@ static ForeignScan *fileGetForeignPlan(PlannerInfo *root,
 				   Oid foreigntableid,
 				   ForeignPath *best_path,
 				   List *tlist,
-				   List *scan_clauses);
+				   List *scan_clauses,
+				   Plan *outer_plan);
 static void fileExplainForeignScan(ForeignScanState *node, ExplainState *es);
 static void fileBeginForeignScan(ForeignScanState *node, int eflags);
 static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
@@ -525,6 +526,7 @@ fileGetForeignPaths(PlannerInfo *root,
 									 total_cost,
 									 NIL,		/* no pathkeys */
 									 NULL,		/* no outer rel either */
+									 NULL,		/* no extra plan */
 									 coptions));
 
 	/*
@@ -544,7 +546,8 @@ fileGetForeignPlan(PlannerInfo *root,
 				   Oid foreigntableid,
 				   ForeignPath *best_path,
 				   List *tlist,
-				   List *scan_clauses)
+				   List *scan_clauses,
+				   Plan *outer_plan)
 {
 	Index		scan_relid = baserel->relid;
 
@@ -564,7 +567,8 @@ fileGetForeignPlan(PlannerInfo *root,
 							NIL,	/* no expressions to evaluate */
 							best_path->fdw_private,
 							NIL,	/* no custom tlist */
-							NIL /* no remote quals */ );
+							NIL,	/* no remote quals */
+							outer_plan);
 }
 
 /*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a6ba672..9a014d4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -214,7 +214,8 @@ static ForeignScan *postgresGetForeignPlan(PlannerInfo *root,
 					   Oid foreigntableid,
 					   ForeignPath *best_path,
 					   List *tlist,
-					   List *scan_clauses);
+					   List *scan_clauses,
+					   Plan *outer_plan);
 static void postgresBeginForeignScan(ForeignScanState *node, int eflags);
 static TupleTableSlot *postgresIterateForeignScan(ForeignScanState *node);
 static void postgresReScanForeignScan(ForeignScanState *node);
@@ -535,6 +536,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 								   fpinfo->total_cost,
 								   NIL, /* no pathkeys */
 								   NULL,		/* no outer rel either */
+								   NULL,		/* no extra plan */
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
@@ -589,6 +591,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 										 total_cost,
 										 usable_pathkeys,
 										 NULL,
+										 NULL,
 										 NIL));
 	}
 
@@ -756,6 +759,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 									   total_cost,
 									   NIL,		/* no pathkeys */
 									   param_info->ppi_req_outer,
+									   NULL,
 									   NIL);	/* no fdw_private list */
 		add_path(baserel, (Path *) path);
 	}
@@ -771,7 +775,8 @@ postgresGetForeignPlan(PlannerInfo *root,
 					   Oid foreigntableid,
 					   ForeignPath *best_path,
 					   List *tlist,
-					   List *scan_clauses)
+					   List *scan_clauses,
+					   Plan *outer_plan)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
 	Index		scan_relid = baserel->relid;
@@ -915,7 +920,8 @@ postgresGetForeignPlan(PlannerInfo *root,
 							params_list,
 							fdw_private,
 							NIL,	/* no custom tlist */
-							remote_exprs);
+							remote_exprs,
+							outer_plan);
 }
 
 /*
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 1533a6b..0090e24 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -168,7 +168,8 @@ GetForeignPlan (PlannerInfo *root,
                 Oid foreigntableid,
                 ForeignPath *best_path,
                 List *tlist,
-                List *scan_clauses);
+                List *scan_clauses,
+                Plan *outer_plan);
 </programlisting>
 
      Create a <structname>ForeignScan</> plan node from the selected foreign
@@ -765,6 +766,35 @@ RefetchForeignRow (EState *estate,
      See <xref linkend="fdw-row-locking"> for more information.
     </para>
 
+    <para>
+<programlisting>
+bool
+RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
+</programlisting>
+     Recheck that a previously-returned tuple still matches the relevant
+     scan and join qualifiers, and possibly provide a modified version of
+     the tuple.  For foreign data wrappers which do not perform join pushdown,
+     it will typically be more convenient to set this to <literal>NULL</> and
+     instead set <structfield>fdw_recheck_quals</structfield> appropriately.
+     When outer joins are pushed down, however, it isn't sufficient to
+     reapply the checks relevant to all the base tables to the result tuple,
+     even if all needed attributes are present, because failure to match some
+     qualifier might result in some attributes going to NULL, rather than in
+     no tuple being returned.  <literal>RecheckForeignScan</> can recheck
+     qualifiers and return true if they are still satisfied and false
+     otherwise, but it can also store a replacement tuple into the supplied
+     slot.
+    </para>
+
+    <para>
+     To implement join pushdown, a foreign data wrapper will typically
+     construct an alternative local join plan which is used only for
+     rechecks; this will become the outer subplan of the
+     <literal>ForeignScan</>.  When a recheck is required, this subplan
+     can be executed and the resulting tuple can be stored in the slot.
+     This plan need not be efficient since no base table will return more
+     than one row; for example, it may implement all joins as nested loops.
+    </para>
    </sect2>
 
    <sect2 id="fdw-callbacks-explain">
@@ -1137,11 +1167,17 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
     <para>
      Any clauses removed from the plan node's qual list must instead be added
-     to <literal>fdw_recheck_quals</> in order to ensure correct behavior
+     to <literal>fdw_recheck_quals</> or rechecked by
+     <literal>RecheckForeignScan</> in order to ensure correct behavior
      at the <literal>READ COMMITTED</> isolation level.  When a concurrent
      update occurs for some other table involved in the query, the executor
      may need to verify that all of the original quals are still satisfied for
-     the tuple, possibly against a different set of parameter values.
+     the tuple, possibly against a different set of parameter values.  Using
+     <literal>fdw_recheck_quals</> is typically easier than implementing checks
+     inside <literal>RecheckForeignScan</>, but this method will be
+     insufficient when outer joins have been pushed down, since the join tuples
+     in that case might have some fields go to NULL without rejecting the
+     tuple entirely.
     </para>
 
     <para>
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index a96e826..3faf7f9 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -49,8 +49,21 @@ ExecScanFetch(ScanState *node,
 		 */
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
-		if (estate->es_epqTupleSet[scanrelid - 1])
+		if (scanrelid == 0)
+		{
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/*
+			 * This is a ForeignScan or CustomScan which has pushed down a
+			 * join to the remote side.  The recheck method is responsible not
+			 * only for rechecking the scan/join quals but also for storing
+			 * the correct tuple in the slot.
+			 */
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (estate->es_epqTupleSet[scanrelid - 1])
 		{
 			TupleTableSlot *slot = node->ss_ScanTupleSlot;
 
@@ -347,8 +360,31 @@ ExecScanReScan(ScanState *node)
 	{
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
+		if (scanrelid > 0)
+			estate->es_epqScanDone[scanrelid - 1] = false;
+		else
+		{
+			Bitmapset  *relids;
+			int			rtindex = -1;
 
-		estate->es_epqScanDone[scanrelid - 1] = false;
+			/*
+			 * If an FDW or custom scan provider has replaced the join with a
+			 * scan, there are multiple RTIs; reset the epqScanDone flag for
+			 * all of them.
+			 */
+			if (IsA(node->ps.plan, ForeignScan))
+				relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+			else if (IsA(node->ps.plan, CustomScan))
+				relids = ((CustomScan *) node->ps.plan)->custom_relids;
+			else
+				elog(ERROR, "unexpected scan node: %d",
+					 (int) nodeTag(node->ps.plan));
+
+			while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+			{
+				Assert(rtindex > 0);
+				estate->es_epqScanDone[rtindex - 1] = false;
+			}
+		}
 	}
 }
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 6165e4a..fdf88ff 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -73,6 +73,7 @@ ForeignNext(ForeignScanState *node)
 static bool
 ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 {
+	FdwRoutine *fdwroutine = node->fdwroutine;
 	ExprContext *econtext;
 
 	/*
@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 
 	ResetExprContext(econtext);
 
+	/*
+	 * If an outer join is pushed down, RecheckForeignScan may need to store a
+	 * different tuple in the slot, because a different set of columns may go
+	 * to NULL upon recheck.  Otherwise, it shouldn't need to change the slot
+	 * contents, just return true or false to indicate whether the quals still
+	 * pass.  For simple cases, setting fdw_recheck_quals may be easier than
+	 * providing this callback.
+	 */
+	if (fdwroutine->RecheckForeignScan &&
+		!fdwroutine->RecheckForeignScan(node, slot))
+		return false;
+
 	return ExecQual(node->fdw_recheck_quals, econtext, false);
 }
 
@@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
+	/* Initialize any outer plan. */
+	if (outerPlanState(scanstate))
+		outerPlanState(scanstate) =
+			ExecInitNode(outerPlan(node), estate, eflags);
+
 	/*
 	 * Tell the FDW to initialize the scan.
 	 */
@@ -225,6 +243,10 @@ ExecEndForeignScan(ForeignScanState *node)
 	/* Let the FDW shut down */
 	node->fdwroutine->EndForeignScan(node);
 
+	/* Shut down any outer plan. */
+	if (outerPlanState(node))
+		ExecEndNode(outerPlanState(node));
+
 	/* Free the exprcontext */
 	ExecFreeExprContext(&node->ss.ps);
 
@@ -246,7 +268,17 @@ ExecEndForeignScan(ForeignScanState *node)
 void
 ExecReScanForeignScan(ForeignScanState *node)
 {
+	PlanState  *outerPlan = outerPlanState(node);
+
 	node->fdwroutine->ReScanForeignScan(node);
 
+	/*
+	 * If chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.  outerPlan may also be NULL, in which case there
+	 * is nothing to rescan at all.
+	 */
+	if (outerPlan != NULL && outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
 	ExecScanReScan(&node->ss);
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 012c14b..aff27ea 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1683,6 +1683,7 @@ _outForeignPath(StringInfo str, const ForeignPath *node)
 
 	_outPathInfo(str, (const Path *) node);
 
+	WRITE_NODE_FIELD(fdw_outerpath);
 	WRITE_NODE_FIELD(fdw_private);
 }
 
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 411b36c..32f903d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2095,11 +2095,16 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	Index		scan_relid = rel->relid;
 	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
+	Plan	   *outer_plan = NULL;
 	ListCell   *lc;
 	int			i;
 
 	Assert(rel->fdwroutine != NULL);
 
+	/* transform the child path if any */
+	if (best_path->fdw_outerpath)
+		outer_plan = create_plan_recurse(root, best_path->fdw_outerpath);
+
 	/*
 	 * If we're scanning a base relation, fetch its OID.  (Irrelevant if
 	 * scanning a join relation.)
@@ -2129,7 +2134,8 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist, scan_clauses,
+												outer_plan);
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
@@ -3747,7 +3753,8 @@ make_foreignscan(List *qptlist,
 				 List *fdw_exprs,
 				 List *fdw_private,
 				 List *fdw_scan_tlist,
-				 List *fdw_recheck_quals)
+				 List *fdw_recheck_quals,
+				 Plan *outer_plan)
 {
 	ForeignScan *node = makeNode(ForeignScan);
 	Plan	   *plan = &node->scan.plan;
@@ -3755,7 +3762,7 @@ make_foreignscan(List *qptlist,
 	/* cost will be filled in by create_foreignscan_plan */
 	plan->targetlist = qptlist;
 	plan->qual = qpqual;
-	plan->lefttree = NULL;
+	plan->lefttree = outer_plan;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	/* fs_server will be filled in by create_foreignscan_plan */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 09c3244..ec0910d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1507,6 +1507,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						double rows, Cost startup_cost, Cost total_cost,
 						List *pathkeys,
 						Relids required_outer,
+						Path *fdw_outerpath,
 						List *fdw_private)
 {
 	ForeignPath *pathnode = makeNode(ForeignPath);
@@ -1521,6 +1522,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
+	pathnode->fdw_outerpath = fdw_outerpath;
 	pathnode->fdw_private = fdw_private;
 
 	return pathnode;
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 69b48b4..e9fdacd 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -36,13 +36,17 @@ typedef ForeignScan *(*GetForeignPlan_function) (PlannerInfo *root,
 														  Oid foreigntableid,
 													  ForeignPath *best_path,
 															 List *tlist,
-														 List *scan_clauses);
+														  List *scan_clauses,
+														   Plan *outer_plan);
 
 typedef void (*BeginForeignScan_function) (ForeignScanState *node,
 													   int eflags);
 
 typedef TupleTableSlot *(*IterateForeignScan_function) (ForeignScanState *node);
 
+typedef bool (*RecheckForeignScan_function) (ForeignScanState *node,
+													   TupleTableSlot *slot);
+
 typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
 
 typedef void (*EndForeignScan_function) (ForeignScanState *node);
@@ -162,6 +166,7 @@ typedef struct FdwRoutine
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
+	RecheckForeignScan_function RecheckForeignScan;
 
 	/* Support functions for EXPLAIN */
 	ExplainForeignScan_function ExplainForeignScan;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9a0dd28..b072e1e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -909,6 +909,7 @@ typedef struct TidPath
 typedef struct ForeignPath
 {
 	Path		path;
+	Path	   *fdw_outerpath;
 	List	   *fdw_private;
 } ForeignPath;
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f28b4e2..35e17e7 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -86,6 +86,7 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						double rows, Cost startup_cost, Cost total_cost,
 						List *pathkeys,
 						Relids required_outer,
+						Path *fdw_outerpath,
 						List *fdw_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 1fb8504..f96e9ee 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -45,7 +45,8 @@ extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
 				 Index scanrelid, List *fdw_exprs, List *fdw_private,
-				 List *fdw_scan_tlist, List *fdw_recheck_quals);
+				 List *fdw_scan_tlist, List *fdw_recheck_quals,
+				 Plan *outer_plan);
 extern Append *make_append(List *appendplans, List *tlist);
 extern RecursiveUnion *make_recursive_union(List *tlist,
 					 Plan *lefttree, Plan *righttree, int wtParam,
#70Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#69)

On 2015/12/05 5:15, Robert Haas wrote:

On Tue, Dec 1, 2015 at 10:20 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

One thing I can think of is that we can keep both the structure of a
ForeignPath node and the API of create_foreignscan_path as-is. The latter
is a good thing for FDW authors. And IIUC the patch you posted today, I
think we could make create_foreignscan_plan a bit simpler too. Ie, in your
patch, you modified that function as follows:

@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
	 */
	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses);
+	outerPlan(scan_plan) = fdw_outerplan;

I think that would be OK, but I think we would have to do a bit more here
about the fdw_outerplan's targetlist and qual; I think that the targetlist
needs to be changed to fdw_scan_tlist, as in the patch [1], and that it'd be
better to change the qual to remote conditions, ie, quals not in the
scan_plan's scan.plan.qual, to avoid duplicate evaluation of local
conditions. (In the patch [1], I didn't do anything about the qual because
the current postgres_fdw join pushdown patch assumes that all the
scan_plan's scan.plan.qual are pushed down.) Or, FDW authors might want to
do something about fdw_recheck_quals for a foreign-join while creating the
fdw_outerplan. So if we do that during GetForeignPlan, I think we could
make create_foreignscan_plan a bit simpler, or provide flexibility to FDW
authors.

It's certainly true that we need the alternative plan's tlist to match
that of the main plan; otherwise, it's going to be difficult for the
FDW to make use of that alternative subplan to fill its slot, which is
kinda the point of all this.

OK.

However, I'm quite reluctant to
introduce code into create_foreignscan_plan() that forces the
subplan's tlist to match that of the main plan. For one thing, that
would likely foreclose the possibility of an FDW ever using the outer
plan for any purpose other than EPQ rechecks. It may be hard to
imagine what else you'd do with the outer plan as things are today,
but right now the two halves of the patch - letting FDWs have an outer
subplan, and providing them with a way of overriding the EPQ recheck
behavior - are technically independent. Putting tlist-altering
behavior into create_foreignscan_plan() ties those two things together
irrevocably.

Agreed.

Instead, I think we should go the opposite direction and pass the
outerplan to GetForeignPlan after all. I was lulled into a false sense
of security by the realization that every FDW that uses this feature
MUST want to do outerPlan(scan_plan) = fdw_outerplan. That's true,
but irrelevant. The point is that the FDW might want to do something
additional, like frob the outer plan's tlist, and it can't do that if
we don't pass it fdw_outerplan. So we should do that, after all.

As I proposed upthread, another idea would be 1) to store an
fdw_outerpath in the fdw_private list of a ForeignPath node, and then 2)
to create an fdw_outerplan from *the fdw_outerpath stored in
fdw_private* in GetForeignPlan. One good thing about this is that it
keeps the API of create_foreignscan_path as-is. What do you think about
that?

Updated patch attached. This fixes a couple of whitespace issues that
were pointed out, also.

Thanks for updating the patch!

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#71Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#70)

On Mon, Dec 7, 2015 at 12:25 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Instead, I think we should go the opposite direction and pass the
outerplan to GetForeignPlan after all. I was lulled into a false sense
of security by the realization that every FDW that uses this feature
MUST want to do outerPlan(scan_plan) = fdw_outerplan. That's true,
but irrelevant. The point is that the FDW might want to do something
additional, like frob the outer plan's tlist, and it can't do that if
we don't pass it fdw_outerplan. So we should do that, after all.

As I proposed upthread, another idea would be 1) to store an
fdw_outerpath in the fdw_private list of a ForeignPath node, and then 2) to
create an fdw_outerplan from *the fdw_outerpath stored in
fdw_private* in GetForeignPlan. One good thing about this is that it keeps
the API of create_foreignscan_path as-is. What do you think about that?

I don't think it's a good idea, per what I said in the first paragraph
of this email:

/messages/by-id/CA+TgmoZ5G+ZGPh3STMGM6cWgTOywz3N1PjSw6Lvhz31ofgLZVw@mail.gmail.com

I think the core system likely needs visibility into where paths and
plans are present in node trees, and putting them somewhere inside
fdw_private would be going in the opposite direction.
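Robert's visibility argument can be sketched with a toy tree walker (hypothetical struct and function names, not PostgreSQL code): a core traversal only recurses into fields it knows about, such as lefttree or a declared fdw_outerpath, so anything stashed behind an opaque fdw_private pointer is simply never visited.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of a plan node.  The first two fields are "declared"
 * and known to the core walker; fdw_private is opaque, so the walker has
 * no way to recurse into whatever an FDW hides there.
 */
typedef struct Node
{
    struct Node *lefttree;       /* known to the walker */
    struct Node *fdw_outerpath;  /* known to the walker (declared field) */
    void        *fdw_private;    /* opaque: walker cannot recurse here */
} Node;

/* Count every node the core walker can reach. */
static int
walk(Node *n)
{
    if (n == NULL)
        return 0;
    return 1 + walk(n->lefttree) + walk(n->fdw_outerpath);
}
```

With a subpath in the declared field the walker sees two nodes; with the same subpath stashed in fdw_private it sees only one, which is the kind of blindness (e.g. for setrefs.c-style processing) being objected to.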

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#72Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#71)

Robert Haas <robertmhaas@gmail.com> writes:

I think the core system likely needs visibility into where paths and
plans are present in node trees, and putting them somewhere inside
fdw_private would be going in the opposite direction.

Absolutely. You don't really want FDWs having to take responsibility
for setrefs.c processing of their node trees, for example. This is why
e.g. ForeignScan has both fdw_exprs and fdw_private.

I'm not too concerned about whether we have to adjust FDW-related APIs
as we go along. It's been clear from the beginning that we'd have to
do that, and we are nowhere near a point where we should promise that
we're done doing so.

regards, tom lane

#73Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Tom Lane (#72)
2 attachment(s)

On 2015/12/08 3:06, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I think the core system likely needs visibility into where paths and
plans are present in node trees, and putting them somewhere inside
fdw_private would be going in the opposite direction.

Absolutely. You don't really want FDWs having to take responsibility
for setrefs.c processing of their node trees, for example. This is why
e.g. ForeignScan has both fdw_exprs and fdw_private.

I'm not too concerned about whether we have to adjust FDW-related APIs
as we go along. It's been clear from the beginning that we'd have to
do that, and we are nowhere near a point where we should promise that
we're done doing so.

OK, I'd vote for Robert's idea, then. I'd like to discuss the next
thing about his patch. As I mentioned in [1], the following change in
the patch will break the EXPLAIN output.

@@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
	scanstate->fdwroutine = fdwroutine;
	scanstate->fdw_state = NULL;

+	/* Initialize any outer plan. */
+	if (outerPlanState(scanstate))
+		outerPlanState(scanstate) =
+			ExecInitNode(outerPlan(node), estate, eflags);
+

As pointed out by Horiguchi-san, that's not correct, though; we should
initialize the outer plan if outerPlan(node) != NULL, not
outerPlanState(scanstate) != NULL. Attached is an updated version of
his patch. I'm also attaching an updated version of the postgres_fdw
join pushdown patch. You can find the breaking examples by doing the
regression tests in the postgres_fdw patch. Please apply the patches in
the following order:

epq-recheck-v6-efujita (attached)
usermapping_matching.patch in [2]
add_GetUserMappingById.patch in [2]
foreign_join_v16_efujita2.patch (attached)
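The initialization-order bug discussed above can be reduced to a toy model (hypothetical names, not the real executor): at the point where the outer plan would be initialized, nothing has yet set outerPlanState(scanstate), so a guard on the state field can never fire, while a guard on the plan tree behaves as intended.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL's Plan/PlanState trees (hypothetical). */
typedef struct Plan { struct Plan *lefttree; } Plan;
typedef struct PlanState { struct PlanState *lefttree; } PlanState;

static PlanState dummy_state;

/* Stand-in for ExecInitNode: produce a state for a non-NULL plan. */
static PlanState *
ExecInitNode(Plan *plan)
{
    return plan ? &dummy_state : NULL;
}

/* Buggy guard: tests the state field, which is still NULL at init time. */
static PlanState *
init_buggy(Plan *node, PlanState *state)
{
    if (state->lefttree)            /* never true: nothing has set it yet */
        state->lefttree = ExecInitNode(node->lefttree);
    return state->lefttree;
}

/* Fixed guard: tests the plan tree, as in the corrected patch. */
static PlanState *
init_fixed(Plan *node, PlanState *state)
{
    if (node->lefttree)
        state->lefttree = ExecInitNode(node->lefttree);
    return state->lefttree;
}
```

The buggy guard silently skips the subplan, which is exactly why the EXPLAIN output (and EPQ rechecks relying on the subplan) break.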

As I proposed upthread, I think we could fix that by handling the outer
plan as in the patch [3]; a) the core initializes the outer plan and
stores it into somewhere in the ForeignScanState node, not the lefttree
of the ForeignScanState node, during ExecInitForeignScan, and b) when
the RecheckForeignScan routine gets called, the FDW extracts the plan
from the given ForeignScanState node and executes it. What do you think
about that?
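The a)/b) control flow proposed here can be illustrated with a minimal model (hypothetical names and types, not the real executor API): the core stores the initialized alternative subplan in the scan state, and the FDW's recheck callback later extracts and runs it to rebuild the joined tuple in the slot.

```c
#include <assert.h>
#include <stddef.h>

typedef int Tuple;                       /* stand-in for a tuple slot value */
typedef Tuple (*SubplanExec) (void);

/* Toy scan state; the core would fill fdw_epq_subplan at init time. */
typedef struct FScanStateModel
{
    SubplanExec fdw_epq_subplan;
} FScanStateModel;

/*
 * FDW recheck callback: execute the stored local-join subplan and put the
 * replacement joined tuple in the slot; report failure if there is none.
 */
static int
recheck_foreign_scan(FScanStateModel *node, Tuple *slot)
{
    if (node->fdw_epq_subplan == NULL)
        return 0;                        /* no subplan available */
    *slot = node->fdw_epq_subplan();     /* replacement joined tuple */
    return 1;
}

/* Pretend alternative local join plan, executed only during rechecks. */
static Tuple
local_join_subplan(void)
{
    return 42;                           /* the rejoined row */
}
```

The point of the model is only the division of labor: the core owns initialization and teardown of the subplan, while the FDW decides when to execute it during a recheck.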

Best regards,
Etsuro Fujita

[1]: /messages/by-id/565EA539.1080703@lab.ntt.co.jp
[2]: /messages/by-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
[3]: /messages/by-id/5624D583.10202@lab.ntt.co.jp

Attachments:

epq-recheck-v6-efujita.patchapplication/x-patch; name=epq-recheck-v6-efujita.patchDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 5ce8f90..83bbfa1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -121,7 +121,8 @@ static ForeignScan *fileGetForeignPlan(PlannerInfo *root,
 				   Oid foreigntableid,
 				   ForeignPath *best_path,
 				   List *tlist,
-				   List *scan_clauses);
+				   List *scan_clauses,
+				   Plan *outer_plan);
 static void fileExplainForeignScan(ForeignScanState *node, ExplainState *es);
 static void fileBeginForeignScan(ForeignScanState *node, int eflags);
 static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
@@ -525,6 +526,7 @@ fileGetForeignPaths(PlannerInfo *root,
 									 total_cost,
 									 NIL,		/* no pathkeys */
 									 NULL,		/* no outer rel either */
+									 NULL,		/* no extra plan */
 									 coptions));
 
 	/*
@@ -544,7 +546,8 @@ fileGetForeignPlan(PlannerInfo *root,
 				   Oid foreigntableid,
 				   ForeignPath *best_path,
 				   List *tlist,
-				   List *scan_clauses)
+				   List *scan_clauses,
+				   Plan *outer_plan)
 {
 	Index		scan_relid = baserel->relid;
 
@@ -564,7 +567,8 @@ fileGetForeignPlan(PlannerInfo *root,
 							NIL,	/* no expressions to evaluate */
 							best_path->fdw_private,
 							NIL,	/* no custom tlist */
-							NIL /* no remote quals */ );
+							NIL,	/* no remote quals */
+							outer_plan);
 }
 
 /*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a6ba672..9a014d4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -214,7 +214,8 @@ static ForeignScan *postgresGetForeignPlan(PlannerInfo *root,
 					   Oid foreigntableid,
 					   ForeignPath *best_path,
 					   List *tlist,
-					   List *scan_clauses);
+					   List *scan_clauses,
+					   Plan *outer_plan);
 static void postgresBeginForeignScan(ForeignScanState *node, int eflags);
 static TupleTableSlot *postgresIterateForeignScan(ForeignScanState *node);
 static void postgresReScanForeignScan(ForeignScanState *node);
@@ -535,6 +536,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 								   fpinfo->total_cost,
 								   NIL, /* no pathkeys */
 								   NULL,		/* no outer rel either */
+								   NULL,		/* no extra plan */
 								   NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
@@ -589,6 +591,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 										 total_cost,
 										 usable_pathkeys,
 										 NULL,
+										 NULL,
 										 NIL));
 	}
 
@@ -756,6 +759,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 									   total_cost,
 									   NIL,		/* no pathkeys */
 									   param_info->ppi_req_outer,
+									   NULL,
 									   NIL);	/* no fdw_private list */
 		add_path(baserel, (Path *) path);
 	}
@@ -771,7 +775,8 @@ postgresGetForeignPlan(PlannerInfo *root,
 					   Oid foreigntableid,
 					   ForeignPath *best_path,
 					   List *tlist,
-					   List *scan_clauses)
+					   List *scan_clauses,
+					   Plan *outer_plan)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
 	Index		scan_relid = baserel->relid;
@@ -915,7 +920,8 @@ postgresGetForeignPlan(PlannerInfo *root,
 							params_list,
 							fdw_private,
 							NIL,	/* no custom tlist */
-							remote_exprs);
+							remote_exprs,
+							outer_plan);
 }
 
 /*
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 1533a6b..0090e24 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -168,7 +168,8 @@ GetForeignPlan (PlannerInfo *root,
                 Oid foreigntableid,
                 ForeignPath *best_path,
                 List *tlist,
-                List *scan_clauses);
+                List *scan_clauses,
+                Plan *outer_plan);
 </programlisting>
 
      Create a <structname>ForeignScan</> plan node from the selected foreign
@@ -765,6 +766,35 @@ RefetchForeignRow (EState *estate,
      See <xref linkend="fdw-row-locking"> for more information.
     </para>
 
+    <para>
+<programlisting>
+bool
+RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
+</programlisting>
+     Recheck that a previously-returned tuple still matches the relevant
+     scan and join qualifiers, and possibly provide a modified version of
+     the tuple.  For foreign data wrappers which do not perform join pushdown,
+     it will typically be more convenient to set this to <literal>NULL</> and
+     instead set <structfield>fdw_recheck_quals</structfield> appropriately.
+     When outer joins are pushed down, however, it isn't sufficient to
+     reapply the checks relevant to all the base tables to the result tuple,
+     even if all needed attributes are present, because failure to match some
+     qualifier might result in some attributes going to NULL, rather than in
+     no tuple being returned.  <literal>RecheckForeignScan</> can recheck
+     qualifiers and return true if they are still satisfied and false
+     otherwise, but it can also store a replacement tuple into the supplied
+     slot.
+    </para>
+
+    <para>
+     To implement join pushdown, a foreign data wrapper will typically
+     construct an alternative local join plan which is used only for
+     rechecks; this will become the outer subplan of the
+     <literal>ForeignScan</>.  When a recheck is required, this subplan
+     can be executed and the resulting tuple can be stored in the slot.
+     This plan need not be efficient since no base table will return more
+     than one row; for example, it may implement all joins as nested loops.
+    </para>
    </sect2>
 
    <sect2 id="fdw-callbacks-explain">
@@ -1137,11 +1167,17 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
     <para>
      Any clauses removed from the plan node's qual list must instead be added
-     to <literal>fdw_recheck_quals</> in order to ensure correct behavior
+     to <literal>fdw_recheck_quals</> or rechecked by
+     <literal>RecheckForeignScan</> in order to ensure correct behavior
      at the <literal>READ COMMITTED</> isolation level.  When a concurrent
      update occurs for some other table involved in the query, the executor
      may need to verify that all of the original quals are still satisfied for
-     the tuple, possibly against a different set of parameter values.
+     the tuple, possibly against a different set of parameter values.  Using
+     <literal>fdw_recheck_quals</> is typically easier than implementing checks
+     inside <literal>RecheckForeignScan</>, but this method will be
+     insufficient when outer joins have been pushed down, since the join tuples
+     in that case might have some fields go to NULL without rejecting the
+     tuple entirely.
     </para>
 
     <para>
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index a96e826..3faf7f9 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -49,8 +49,21 @@ ExecScanFetch(ScanState *node,
 		 */
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
-		if (estate->es_epqTupleSet[scanrelid - 1])
+		if (scanrelid == 0)
+		{
+			TupleTableSlot *slot = node->ss_ScanTupleSlot;
+
+			/*
+			 * This is a ForeignScan or CustomScan which has pushed down a
+			 * join to the remote side.  The recheck method is responsible not
+			 * only for rechecking the scan/join quals but also for storing
+			 * the correct tuple in the slot.
+			 */
+			if (!(*recheckMtd) (node, slot))
+				ExecClearTuple(slot);	/* would not be returned by scan */
+			return slot;
+		}
+		else if (estate->es_epqTupleSet[scanrelid - 1])
 		{
 			TupleTableSlot *slot = node->ss_ScanTupleSlot;
 
@@ -347,8 +360,31 @@ ExecScanReScan(ScanState *node)
 	{
 		Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;
 
-		Assert(scanrelid > 0);
+		if (scanrelid > 0)
+			estate->es_epqScanDone[scanrelid - 1] = false;
+		else
+		{
+			Bitmapset  *relids;
+			int			rtindex = -1;
 
-		estate->es_epqScanDone[scanrelid - 1] = false;
+			/*
+			 * If an FDW or custom scan provider has replaced the join with a
+			 * scan, there are multiple RTIs; reset the epqScanDone flag for
+			 * all of them.
+			 */
+			if (IsA(node->ps.plan, ForeignScan))
+				relids = ((ForeignScan *) node->ps.plan)->fs_relids;
+			else if (IsA(node->ps.plan, CustomScan))
+				relids = ((CustomScan *) node->ps.plan)->custom_relids;
+			else
+				elog(ERROR, "unexpected scan node: %d",
+					 (int) nodeTag(node->ps.plan));
+
+			while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+			{
+				Assert(rtindex > 0);
+				estate->es_epqScanDone[rtindex - 1] = false;
+			}
+		}
 	}
 }
diff --git a/src/backend/executor/nodeForeignscan.c b/src/backend/executor/nodeForeignscan.c
index 6165e4a..62959e3 100644
--- a/src/backend/executor/nodeForeignscan.c
+++ b/src/backend/executor/nodeForeignscan.c
@@ -73,6 +73,7 @@ ForeignNext(ForeignScanState *node)
 static bool
 ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 {
+	FdwRoutine *fdwroutine = node->fdwroutine;
 	ExprContext *econtext;
 
 	/*
@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
 
 	ResetExprContext(econtext);
 
+	/*
+	 * If an outer join is pushed down, RecheckForeignScan may need to store a
+	 * different tuple in the slot, because a different set of columns may go
+	 * to NULL upon recheck.  Otherwise, it shouldn't need to change the slot
+	 * contents, just return true or false to indicate whether the quals still
+	 * pass.  For simple cases, setting fdw_recheck_quals may be easier than
+	 * providing this callback.
+	 */
+	if (fdwroutine->RecheckForeignScan &&
+		!fdwroutine->RecheckForeignScan(node, slot))
+		return false;
+
 	return ExecQual(node->fdw_recheck_quals, econtext, false);
 }
 
@@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 	scanstate->fdwroutine = fdwroutine;
 	scanstate->fdw_state = NULL;
 
+	/* Initialize any outer plan. */
+	if (outerPlan(node))
+		outerPlanState(scanstate) =
+			ExecInitNode(outerPlan(node), estate, eflags);
+
 	/*
 	 * Tell the FDW to initialize the scan.
 	 */
@@ -225,6 +243,10 @@ ExecEndForeignScan(ForeignScanState *node)
 	/* Let the FDW shut down */
 	node->fdwroutine->EndForeignScan(node);
 
+	/* Shut down any outer plan. */
+	if (outerPlanState(node))
+		ExecEndNode(outerPlanState(node));
+
 	/* Free the exprcontext */
 	ExecFreeExprContext(&node->ss.ps);
 
@@ -246,7 +268,17 @@ ExecEndForeignScan(ForeignScanState *node)
 void
 ExecReScanForeignScan(ForeignScanState *node)
 {
+	PlanState  *outerPlan = outerPlanState(node);
+
 	node->fdwroutine->ReScanForeignScan(node);
 
+	/*
+	 * If chgParam of subnode is not null then plan will be re-scanned by
+	 * first ExecProcNode.  outerPlan may also be NULL, in which case there
+	 * is nothing to rescan at all.
+	 */
+	if (outerPlan != NULL && outerPlan->chgParam == NULL)
+		ExecReScan(outerPlan);
+
 	ExecScanReScan(&node->ss);
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 012c14b..aff27ea 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1683,6 +1683,7 @@ _outForeignPath(StringInfo str, const ForeignPath *node)
 
 	_outPathInfo(str, (const Path *) node);
 
+	WRITE_NODE_FIELD(fdw_outerpath);
 	WRITE_NODE_FIELD(fdw_private);
 }
 
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 411b36c..32f903d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2095,11 +2095,16 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	Index		scan_relid = rel->relid;
 	Oid			rel_oid = InvalidOid;
 	Bitmapset  *attrs_used = NULL;
+	Plan	   *outer_plan = NULL;
 	ListCell   *lc;
 	int			i;
 
 	Assert(rel->fdwroutine != NULL);
 
+	/* transform the child path if any */
+	if (best_path->fdw_outerpath)
+		outer_plan = create_plan_recurse(root, best_path->fdw_outerpath);
+
 	/*
 	 * If we're scanning a base relation, fetch its OID.  (Irrelevant if
 	 * scanning a join relation.)
@@ -2129,7 +2134,8 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist, scan_clauses,
+												outer_plan);
 
 	/* Copy cost data from Path to Plan; no need to make FDW do this */
 	copy_generic_path_info(&scan_plan->scan.plan, &best_path->path);
@@ -3747,7 +3753,8 @@ make_foreignscan(List *qptlist,
 				 List *fdw_exprs,
 				 List *fdw_private,
 				 List *fdw_scan_tlist,
-				 List *fdw_recheck_quals)
+				 List *fdw_recheck_quals,
+				 Plan *outer_plan)
 {
 	ForeignScan *node = makeNode(ForeignScan);
 	Plan	   *plan = &node->scan.plan;
@@ -3755,7 +3762,7 @@ make_foreignscan(List *qptlist,
 	/* cost will be filled in by create_foreignscan_plan */
 	plan->targetlist = qptlist;
 	plan->qual = qpqual;
-	plan->lefttree = NULL;
+	plan->lefttree = outer_plan;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
 	/* fs_server will be filled in by create_foreignscan_plan */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 09c3244..ec0910d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1507,6 +1507,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						double rows, Cost startup_cost, Cost total_cost,
 						List *pathkeys,
 						Relids required_outer,
+						Path *fdw_outerpath,
 						List *fdw_private)
 {
 	ForeignPath *pathnode = makeNode(ForeignPath);
@@ -1521,6 +1522,7 @@ create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 	pathnode->path.total_cost = total_cost;
 	pathnode->path.pathkeys = pathkeys;
 
+	pathnode->fdw_outerpath = fdw_outerpath;
 	pathnode->fdw_private = fdw_private;
 
 	return pathnode;
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 69b48b4..e9fdacd 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -36,13 +36,17 @@ typedef ForeignScan *(*GetForeignPlan_function) (PlannerInfo *root,
 														  Oid foreigntableid,
 													  ForeignPath *best_path,
 															 List *tlist,
-														 List *scan_clauses);
+														  List *scan_clauses,
+														   Plan *outer_plan);
 
 typedef void (*BeginForeignScan_function) (ForeignScanState *node,
 													   int eflags);
 
 typedef TupleTableSlot *(*IterateForeignScan_function) (ForeignScanState *node);
 
+typedef bool (*RecheckForeignScan_function) (ForeignScanState *node,
+													   TupleTableSlot *slot);
+
 typedef void (*ReScanForeignScan_function) (ForeignScanState *node);
 
 typedef void (*EndForeignScan_function) (ForeignScanState *node);
@@ -162,6 +166,7 @@ typedef struct FdwRoutine
 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
 	GetForeignRowMarkType_function GetForeignRowMarkType;
 	RefetchForeignRow_function RefetchForeignRow;
+	RecheckForeignScan_function RecheckForeignScan;
 
 	/* Support functions for EXPLAIN */
 	ExplainForeignScan_function ExplainForeignScan;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9a0dd28..b072e1e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -909,6 +909,7 @@ typedef struct TidPath
 typedef struct ForeignPath
 {
 	Path		path;
+	Path	   *fdw_outerpath;
 	List	   *fdw_private;
 } ForeignPath;
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f28b4e2..35e17e7 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -86,6 +86,7 @@ extern ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
 						double rows, Cost startup_cost, Cost total_cost,
 						List *pathkeys,
 						Relids required_outer,
+						Path *fdw_outerpath,
 						List *fdw_private);
 
 extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 1fb8504..f96e9ee 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -45,7 +45,8 @@ extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan);
 extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
 				 Index scanrelid, List *fdw_exprs, List *fdw_private,
-				 List *fdw_scan_tlist, List *fdw_recheck_quals);
+				 List *fdw_scan_tlist, List *fdw_recheck_quals,
+				 Plan *outer_plan);
 extern Append *make_append(List *appendplans, List *tlist);
 extern RecursiveUnion *make_recursive_union(List *tlist,
 					 Plan *lefttree, Plan *righttree, int wtParam,
Attachment: foreign_join_v16_efujita2.patch (application/x-patch)
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 44,51 ****
--- 44,54 ----
  #include "catalog/pg_proc.h"
  #include "catalog/pg_type.h"
  #include "commands/defrem.h"
+ #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
+ #include "nodes/plannodes.h"
  #include "optimizer/clauses.h"
+ #include "optimizer/prep.h"
  #include "optimizer/var.h"
  #include "parser/parsetree.h"
  #include "utils/builtins.h"
***************
*** 92,97 **** typedef struct deparse_expr_cxt
--- 95,102 ----
  	RelOptInfo *foreignrel;		/* the foreign relation we are planning for */
  	StringInfo	buf;			/* output buffer to append to */
  	List	  **params_list;	/* exprs that will become remote Params */
+ 	List	   *outertlist;		/* outer child's target list */
+ 	List	   *innertlist;		/* inner child's target list */
  } deparse_expr_cxt;
  
  /*
***************
*** 111,117 **** static void deparseTargetList(StringInfo buf,
  				  Index rtindex,
  				  Relation rel,
  				  Bitmapset *attrs_used,
! 				  List **retrieved_attrs);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  					 Index rtindex, Relation rel,
  					 bool trig_after_row,
--- 116,123 ----
  				  Index rtindex,
  				  Relation rel,
  				  Bitmapset *attrs_used,
! 				  List **retrieved_attrs,
! 				  bool alias);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  					 Index rtindex, Relation rel,
  					 bool trig_after_row,
***************
*** 139,144 **** static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
--- 145,151 ----
  				 deparse_expr_cxt *context);
  static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
  					   deparse_expr_cxt *context);
+ static const char *get_jointype_name(JoinType jointype);
  
  
  /*
***************
*** 261,267 **** foreign_expr_walker(Node *node,
  				 * Param's collation, ie it's not safe for it to have a
  				 * non-default collation.
  				 */
! 				if (var->varno == glob_cxt->foreignrel->relid &&
  					var->varlevelsup == 0)
  				{
  					/* Var belongs to foreign table */
--- 268,274 ----
  				 * Param's collation, ie it's not safe for it to have a
  				 * non-default collation.
  				 */
! 				if (bms_is_member(var->varno, glob_cxt->foreignrel->relids) &&
  					var->varlevelsup == 0)
  				{
  					/* Var belongs to foreign table */
***************
*** 703,720 **** deparse_type_name(Oid type_oid, int32 typemod)
   *
   * We also create an integer List of the columns being retrieved, which is
   * returned to *retrieved_attrs.
   */
  void
  deparseSelectSql(StringInfo buf,
  				 PlannerInfo *root,
  				 RelOptInfo *baserel,
  				 Bitmapset *attrs_used,
! 				 List **retrieved_attrs)
  {
  	RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
  	Relation	rel;
  
  	/*
  	 * Core code already has some lock on each rel being planned, so we can
  	 * use NoLock here.
  	 */
--- 710,798 ----
   *
   * We also create an integer List of the columns being retrieved, which is
   * returned to *retrieved_attrs.
+  *
+  * relations is a string buffer for the "Relations" portion of EXPLAIN output,
+  * or NULL if the caller doesn't need it.  Note that the caller must have
+  * initialized it.
+  *
+  * alias is a flag requesting aliases for columns and tables.  It should be
+  * false in the initial call, and is set to true when this function is called
+  * to build part of a join query.
   */
  void
  deparseSelectSql(StringInfo buf,
  				 PlannerInfo *root,
  				 RelOptInfo *baserel,
  				 Bitmapset *attrs_used,
! 				 List *remote_conds,
! 				 List *pathkeys,
! 				 List **params_list,
! 				 List **fdw_scan_tlist,
! 				 List **retrieved_attrs,
! 				 StringInfo relations,
! 				 bool alias)
  {
+ 	PgFdwRelationInfo  *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
  	RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
  	Relation	rel;
  
  	/*
+ 	 * If the given relation is a join relation, recursively construct the
+ 	 * statement by putting the outer and inner relations in the FROM clause
+ 	 * as aliased subqueries.
+ 	 */
+ 	if (baserel->reloptkind == RELOPT_JOINREL)
+ 	{
+ 		RelOptInfo		   *rel_o = fpinfo->outerrel;
+ 		RelOptInfo		   *rel_i = fpinfo->innerrel;
+ 		PgFdwRelationInfo  *fpinfo_o = (PgFdwRelationInfo *) rel_o->fdw_private;
+ 		PgFdwRelationInfo  *fpinfo_i = (PgFdwRelationInfo *) rel_i->fdw_private;
+ 		StringInfoData		sql_o;
+ 		StringInfoData		sql_i;
+ 		List			   *ret_attrs_tmp;	/* not used */
+ 		StringInfoData		relations_o;
+ 		StringInfoData		relations_i;
+ 		const char		   *jointype_str;
+ 
+ 		/*
+ 		 * Deparse the queries for the outer and inner relations, and combine
+ 		 * them into one query.
+ 		 *
+ 		 * Here we don't pass fdw_scan_tlist, because the targets of the
+ 		 * underlying relations are already in joinrel->reltargetlist, and
+ 		 * deparseJoinSql() takes care of them.
+ 		 */
+ 		initStringInfo(&sql_o);
+ 		initStringInfo(&relations_o);
+ 		deparseSelectSql(&sql_o, root, rel_o, fpinfo_o->attrs_used,
+ 						 fpinfo_o->remote_conds, NIL, params_list,
+ 						 NULL, &ret_attrs_tmp, &relations_o, true);
+ 		initStringInfo(&sql_i);
+ 		initStringInfo(&relations_i);
+ 		deparseSelectSql(&sql_i, root, rel_i, fpinfo_i->attrs_used,
+ 						 fpinfo_i->remote_conds, NIL, params_list,
+ 						 NULL, &ret_attrs_tmp, &relations_i, true);
+ 
+ 		/* For EXPLAIN output */
+ 		jointype_str = get_jointype_name(fpinfo->jointype);
+ 		if (relations)
+ 			appendStringInfo(relations, "(%s) %s JOIN (%s)",
+ 							 relations_o.data, jointype_str, relations_i.data);
+ 
+ 		deparseJoinSql(buf, root, baserel,
+ 					   fpinfo->outerrel,
+ 					   fpinfo->innerrel,
+ 					   sql_o.data,
+ 					   sql_i.data,
+ 					   fpinfo->jointype,
+ 					   fpinfo->joinclauses,
+ 					   fpinfo->otherclauses,
+ 					   fdw_scan_tlist,
+ 					   retrieved_attrs);
+ 		return;
+ 	}
+ 
+ 	/*
  	 * Core code already has some lock on each rel being planned, so we can
  	 * use NoLock here.
  	 */
***************
*** 725,731 **** deparseSelectSql(StringInfo buf,
  	 */
  	appendStringInfoString(buf, "SELECT ");
  	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
! 					  retrieved_attrs);
  
  	/*
  	 * Construct FROM clause
--- 803,809 ----
  	 */
  	appendStringInfoString(buf, "SELECT ");
  	deparseTargetList(buf, root, baserel->relid, rel, attrs_used,
! 					  retrieved_attrs, alias);
  
  	/*
  	 * Construct FROM clause
***************
*** 733,738 **** deparseSelectSql(StringInfo buf,
--- 811,901 ----
  	appendStringInfoString(buf, " FROM ");
  	deparseRelation(buf, rel);
  
+ 	/*
+ 	 * Return the local relation name for EXPLAIN output.  We can't know
+ 	 * whether the VERBOSE option is specified, so always add the schema
+ 	 * name.
+ 	 */
+ 	if (relations)
+ 	{
+ 		const char	   *namespace;
+ 		const char	   *relname;
+ 		const char	   *refname;
+ 
+ 		namespace = get_namespace_name(get_rel_namespace(rte->relid));
+ 		relname = get_rel_name(rte->relid);
+ 		refname = rte->eref->aliasname;
+ 		appendStringInfo(relations, "%s.%s",
+ 						 quote_identifier(namespace),
+ 						 quote_identifier(relname));
+ 		if (*refname && strcmp(refname, relname) != 0)
+ 			appendStringInfo(relations, " %s",
+ 							 quote_identifier(rte->eref->aliasname));
+ 	}
+ 
+ 	/*
+ 	 * Construct WHERE clause
+ 	 */
+ 	if (remote_conds)
+ 		appendConditions(buf, root, baserel, NULL, NULL, remote_conds,
+ 						 " WHERE ", params_list);
+ 
+ 	/* Add ORDER BY clause if we found any useful pathkeys */
+ 	if (pathkeys)
+ 		appendOrderByClause(buf, root, baserel, pathkeys);
+ 
+ 	/*
+ 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
+ 	 * initial row fetch, rather than later on as is done for local tables.
+ 	 * The extra roundtrips involved in trying to duplicate the local
+ 	 * semantics exactly don't seem worthwhile (see also comments for
+ 	 * RowMarkType).
+ 	 *
+ 	 * Note: because we actually run the query as a cursor, this assumes
+ 	 * that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
+ 	 * before 8.3.
+ 	 */
+ 	if (baserel->relid == root->parse->resultRelation &&
+ 		(root->parse->commandType == CMD_UPDATE ||
+ 		 root->parse->commandType == CMD_DELETE))
+ 	{
+ 		/* Relation is UPDATE/DELETE target, so use FOR UPDATE */
+ 		appendStringInfoString(buf, " FOR UPDATE");
+ 	}
+ 	else
+ 	{
+ 		PlanRowMark *rc = get_plan_rowmark(root->rowMarks, baserel->relid);
+ 
+ 		if (rc)
+ 		{
+ 			/*
+ 			 * Relation is specified as a FOR UPDATE/SHARE target, so handle
+ 			 * that.  (But we could also see LCS_NONE, meaning this isn't a
+ 			 * target relation after all.)
+ 			 *
+ 			 * For now, just ignore any [NO] KEY specification, since (a)
+ 			 * it's not clear what that means for a remote table that we
+ 			 * don't have complete information about, and (b) it wouldn't
+ 			 * work anyway on older remote servers.  Likewise, we don't
+ 			 * worry about NOWAIT.
+ 			 */
+ 			switch (rc->strength)
+ 			{
+ 				case LCS_NONE:
+ 					/* No locking needed */
+ 					break;
+ 				case LCS_FORKEYSHARE:
+ 				case LCS_FORSHARE:
+ 					appendStringInfoString(buf, " FOR SHARE");
+ 					break;
+ 				case LCS_FORNOKEYUPDATE:
+ 				case LCS_FORUPDATE:
+ 					appendStringInfoString(buf, " FOR UPDATE");
+ 					break;
+ 			}
+ 		}
+ 	}
+ 
  	heap_close(rel, NoLock);
  }
  
***************
*** 749,755 **** deparseTargetList(StringInfo buf,
  				  Index rtindex,
  				  Relation rel,
  				  Bitmapset *attrs_used,
! 				  List **retrieved_attrs)
  {
  	TupleDesc	tupdesc = RelationGetDescr(rel);
  	bool		have_wholerow;
--- 912,919 ----
  				  Index rtindex,
  				  Relation rel,
  				  Bitmapset *attrs_used,
! 				  List **retrieved_attrs,
! 				  bool alias)
  {
  	TupleDesc	tupdesc = RelationGetDescr(rel);
  	bool		have_wholerow;
***************
*** 780,785 **** deparseTargetList(StringInfo buf,
--- 944,952 ----
  			first = false;
  
  			deparseColumnRef(buf, rtindex, i, root);
+ 			if (alias)
+ 				appendStringInfo(buf, " a%d",
+ 								 i - FirstLowInvalidHeapAttributeNumber);
  
  			*retrieved_attrs = lappend_int(*retrieved_attrs, i);
  		}
***************
*** 797,802 **** deparseTargetList(StringInfo buf,
--- 964,972 ----
  		first = false;
  
  		appendStringInfoString(buf, "ctid");
+ 		if (alias)
+ 			appendStringInfo(buf, " a%d",
+ 							 SelfItemPointerAttributeNumber - FirstLowInvalidHeapAttributeNumber);
  
  		*retrieved_attrs = lappend_int(*retrieved_attrs,
  									   SelfItemPointerAttributeNumber);
***************
*** 808,818 **** deparseTargetList(StringInfo buf,
  }
  
  /*
!  * Deparse WHERE clauses in given list of RestrictInfos and append them to buf.
   *
   * baserel is the foreign table we're planning for.
   *
!  * If no WHERE clause already exists in the buffer, is_first should be true.
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
--- 978,990 ----
  }
  
  /*
!  * Deparse conditions, such as WHERE clause and ON clause of JOIN, in the given
!  * list, consist of RestrictInfo or Expr, and append string representation of
!  * them to buf.
   *
   * baserel is the foreign table we're planning for.
   *
!  * prefix is placed before the conditions, if any.
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
***************
*** 822,837 **** deparseTargetList(StringInfo buf,
   * so Params and other-relation Vars should be replaced by dummy values.
   */
  void
! appendWhereClause(StringInfo buf,
! 				  PlannerInfo *root,
! 				  RelOptInfo *baserel,
! 				  List *exprs,
! 				  bool is_first,
! 				  List **params)
  {
  	deparse_expr_cxt context;
  	int			nestlevel;
  	ListCell   *lc;
  
  	if (params)
  		*params = NIL;			/* initialize result list to empty */
--- 994,1012 ----
   * so Params and other-relation Vars should be replaced by dummy values.
   */
  void
! appendConditions(StringInfo buf,
! 				 PlannerInfo *root,
! 				 RelOptInfo *baserel,
! 				 List *outertlist,
! 				 List *innertlist,
! 				 List *exprs,
! 				 const char *prefix,
! 				 List **params)
  {
  	deparse_expr_cxt context;
  	int			nestlevel;
  	ListCell   *lc;
+ 	bool		is_first = (prefix != NULL);
  
  	if (params)
  		*params = NIL;			/* initialize result list to empty */
***************
*** 841,862 **** appendWhereClause(StringInfo buf,
  	context.foreignrel = baserel;
  	context.buf = buf;
  	context.params_list = params;
  
  	/* Make sure any constants in the exprs are printed portably */
  	nestlevel = set_transmission_modes();
  
  	foreach(lc, exprs)
  	{
  		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
  
  		/* Connect expressions with "AND" and parenthesize each condition. */
  		if (is_first)
! 			appendStringInfoString(buf, " WHERE ");
  		else
  			appendStringInfoString(buf, " AND ");
  
  		appendStringInfoChar(buf, '(');
! 		deparseExpr(ri->clause, &context);
  		appendStringInfoChar(buf, ')');
  
  		is_first = false;
--- 1016,1051 ----
  	context.foreignrel = baserel;
  	context.buf = buf;
  	context.params_list = params;
+ 	context.outertlist = outertlist;
+ 	context.innertlist = innertlist;
  
  	/* Make sure any constants in the exprs are printed portably */
  	nestlevel = set_transmission_modes();
  
  	foreach(lc, exprs)
  	{
+ 		Node	   *node = (Node *) lfirst(lc);
  		RestrictInfo *ri = (RestrictInfo *) lfirst(lc);
+ 		Expr	   *expr;
+ 
+ 		/* Extract the clause from the RestrictInfo, if any */
+ 		if (IsA(node, RestrictInfo))
+ 			expr = ri->clause;
+ 		else
+ 			expr = (Expr *) node;
  
  		/* Connect expressions with "AND" and parenthesize each condition. */
  		if (is_first)
! 			appendStringInfoString(buf, prefix);
  		else
  			appendStringInfoString(buf, " AND ");
  
  		appendStringInfoChar(buf, '(');
! 		deparseExpr(expr, &context);
  		appendStringInfoChar(buf, ')');
  
  		is_first = false;
***************
*** 866,871 **** appendWhereClause(StringInfo buf,
--- 1055,1351 ----
  }
  
  /*
+  * Returns the position (starting with 1) of the given var in the given
+  * target list, or 0 if not found.
+  */
+ static int
+ find_var_pos(Var *node, List *tlist)
+ {
+ 	int		pos = 1;
+ 	ListCell *lc;
+ 
+ 	foreach(lc, tlist)
+ 	{
+ 		Var *var = (Var *) lfirst(lc);
+ 
+ 		if (equal(var, node))
+ 		{
+ 			return pos;
+ 		}
+ 		pos++;
+ 	}
+ 
+ 	return 0;
+ }
+ 
+ /*
+  * Deparse given Var into buf.
+  */
+ static void
+ deparseJoinVar(Var *node, deparse_expr_cxt *context)
+ {
+ 	char		side;
+ 	int			pos;
+ 
+ 	pos = find_var_pos(node, context->outertlist);
+ 	if (pos > 0)
+ 		side = 'l';
+ 	else
+ 	{
+ 		side = 'r';
+ 		pos = find_var_pos(node, context->innertlist);
+ 	}
+ 
+ 	/*
+ 	 * We treat a whole-row reference the same as ordinary attribute
+ 	 * references, because such transformation should have been done at a
+ 	 * lower level.
+ 	 */
+ 	appendStringInfo(context->buf, "%c.a%d", side, pos);
+ }
+ 
+ /*
+  * Deparse column alias list for a subquery in FROM clause.
+  */
+ static void
+ deparseColumnAliases(StringInfo buf, List *tlist)
+ {
+ 	int			pos;
+ 	ListCell   *lc;
+ 
+ 	pos = 1;
+ 	foreach(lc, tlist)
+ 	{
+ 		/* Deparse column alias for the subquery */
+ 		if (pos > 1)
+ 			appendStringInfoString(buf, ", ");
+ 		appendStringInfo(buf, "a%d", pos);
+ 		pos++;
+ 	}
+ }
+ 
+ /*
+  * Deparse "wrapper" SQL for a query which projects the target list in the
+  * proper order and contents.  Note that this treatment is necessary only for
+  * queries used in the FROM clause of a join query.
+  *
+  * Even if the SQL is simple enough (no ctid, no whole-row reference), the
+  * order of the output columns might differ from the underlying scan's, so we
+  * always need to wrap the queries for join sources.
+  */
+ static const char *
+ deparseProjectionSql(PlannerInfo *root,
+ 					 RelOptInfo *baserel,
+ 					 const char *sql,
+ 					 char side)
+ {
+ 	StringInfoData wholerow;
+ 	StringInfoData buf;
+ 	ListCell   *lc;
+ 	bool		first;
+ 	bool		have_wholerow = false;
+ 
+ 	/*
+ 	 * Check whether the targetlist contains a whole-row reference, which
+ 	 * must be expanded with ROW() syntax below.
+ 	 */
+ 	foreach(lc, baserel->reltargetlist)
+ 	{
+ 		Var		   *var = (Var *) lfirst(lc);
+ 		if (var->varattno == 0)
+ 		{
+ 			have_wholerow = true;
+ 			break;
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Construct whole-row reference with ROW() syntax
+ 	 */
+ 	if (have_wholerow)
+ 	{
+ 		RangeTblEntry *rte;
+ 		Relation		rel;
+ 		TupleDesc		tupdesc;
+ 		int				i;
+ 
+ 		/* Obtain TupleDesc for deparsing all valid columns */
+ 		rte = planner_rt_fetch(baserel->relid, root);
+ 		rel = heap_open(rte->relid, NoLock);
+ 		tupdesc = rel->rd_att;
+ 
+ 		/* Print all valid columns in ROW() to generate whole-row value */
+ 		initStringInfo(&wholerow);
+ 		appendStringInfoString(&wholerow, "ROW(");
+ 		first = true;
+ 		for (i = 1; i <= tupdesc->natts; i++)
+ 		{
+ 			Form_pg_attribute attr = tupdesc->attrs[i - 1];
+ 
+ 			/* Ignore dropped columns. */
+ 			if (attr->attisdropped)
+ 				continue;
+ 
+ 			if (!first)
+ 				appendStringInfoString(&wholerow, ", ");
+ 			first = false;
+ 
+ 			appendStringInfo(&wholerow, "%c.a%d", side,
+ 							 i - FirstLowInvalidHeapAttributeNumber);
+ 		}
+ 		appendStringInfoString(&wholerow, ")");
+ 
+ 		heap_close(rel, NoLock);
+ 	}
+ 
+ 	/*
+ 	 * Construct a SELECT statement which has the original query in its FROM
+ 	 * clause, and has the target list entries in its SELECT clause.  The
+ 	 * numbers used in column aliases are attnum -
+ 	 * FirstLowInvalidHeapAttributeNumber, to make all numbers positive even
+ 	 * for system columns, whose attnums are negative.
+ 	 */
+ 	initStringInfo(&buf);
+ 	appendStringInfoString(&buf, "SELECT ");
+ 	first = true;
+ 	foreach(lc, baserel->reltargetlist)
+ 	{
+ 		Var *var = (Var *) lfirst(lc);
+ 
+ 		if (!first)
+ 			appendStringInfoString(&buf, ", ");
+ 	
+ 		if (var->varattno == 0)
+ 			appendStringInfo(&buf, "%s", wholerow.data);
+ 		else
+ 			appendStringInfo(&buf, "%c.a%d", side,
+ 							 var->varattno - FirstLowInvalidHeapAttributeNumber);
+ 
+ 		first = false;
+ 	}
+ 	appendStringInfo(&buf, " FROM (%s) %c", sql, side);
+ 
+ 	return buf.data;
+ }
+ 
+ static const char *
+ get_jointype_name(JoinType jointype)
+ {
+ 	if (jointype == JOIN_INNER)
+ 		return "INNER";
+ 	else if (jointype == JOIN_LEFT)
+ 		return "LEFT";
+ 	else if (jointype == JOIN_RIGHT)
+ 		return "RIGHT";
+ 	else if (jointype == JOIN_FULL)
+ 		return "FULL";
+ 
+ 	elog(ERROR, "unsupported join type %d", (int) jointype);
+ 	return NULL;				/* keep compiler quiet */
+ }
+ 
+ /*
+  * Construct a SELECT statement which contains join clause.
+  *
+  * We also create a TargetEntry List of the columns being retrieved, which is
+  * returned to *fdw_scan_tlist.
+  *
+  * sql_o is the remote query statement for the outer child relation, and the
+  * suffix _i means the same for the inner child relation.  jointype,
+  * joinclauses and otherclauses describe the join.  fdw_scan_tlist is an
+  * output parameter that passes the target list of the pseudo scan to the
+  * caller.
+  */
+ void
+ deparseJoinSql(StringInfo buf,
+ 			   PlannerInfo *root,
+ 			   RelOptInfo *baserel,
+ 			   RelOptInfo *outerrel,
+ 			   RelOptInfo *innerrel,
+ 			   const char *sql_o,
+ 			   const char *sql_i,
+ 			   JoinType jointype,
+ 			   List *joinclauses,
+ 			   List *otherclauses,
+ 			   List **fdw_scan_tlist,
+ 			   List **retrieved_attrs)
+ {
+ 	StringInfoData selbuf;		/* buffer for SELECT clause */
+ 	StringInfoData abuf_o;		/* buffer for column alias list of outer */
+ 	StringInfoData abuf_i;		/* buffer for column alias list of inner */
+ 	int			i;
+ 	ListCell   *lc;
+ 	const char *jointype_str;
+ 	deparse_expr_cxt context;
+ 
+ 	context.root = root;
+ 	context.foreignrel = baserel;
+ 	context.buf = &selbuf;
+ 	context.params_list = NULL;
+ 	context.outertlist = outerrel->reltargetlist;
+ 	context.innertlist = innerrel->reltargetlist;
+ 
+ 	jointype_str = get_jointype_name(jointype);
+ 	*retrieved_attrs = NIL;
+ 
+ 	/* print SELECT clause of the join scan */
+ 	initStringInfo(&selbuf);
+ 	i = 0;
+ 	foreach(lc, baserel->reltargetlist)
+ 	{
+ 		Var		   *var = (Var *) lfirst(lc);
+ 		TargetEntry *tle;
+ 
+ 		if (i > 0)
+ 			appendStringInfoString(&selbuf, ", ");
+ 		deparseJoinVar(var, &context);
+ 
+ 		tle = makeTargetEntry((Expr *) var, i + 1, NULL, false);
+ 		if (fdw_scan_tlist)
+ 			*fdw_scan_tlist = lappend(*fdw_scan_tlist, tle);
+ 
+ 		*retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+ 
+ 		i++;
+ 	}
+ 	if (i == 0)
+ 		appendStringInfoString(&selbuf, "NULL");
+ 
+ 	/*
+ 	 * Do pseudo-projection for an underlying scan on a foreign table if the
+ 	 * relation is a base relation; this also expands any whole-row reference
+ 	 * in its targetlist.
+ 	 */
+ 	if (outerrel->reloptkind == RELOPT_BASEREL)
+ 		sql_o = deparseProjectionSql(root, outerrel, sql_o, 'l');
+ 	if (innerrel->reloptkind == RELOPT_BASEREL)
+ 		sql_i = deparseProjectionSql(root, innerrel, sql_i, 'r');
+ 
+ 	/* Deparse column alias portion of subquery in FROM clause. */
+ 	initStringInfo(&abuf_o);
+ 	deparseColumnAliases(&abuf_o, outerrel->reltargetlist);
+ 	initStringInfo(&abuf_i);
+ 	deparseColumnAliases(&abuf_i, innerrel->reltargetlist);
+ 
+ 	/* Construct SELECT statement */
+ 	appendStringInfo(buf, "SELECT %s FROM", selbuf.data);
+ 	appendStringInfo(buf, " (%s) l (%s) %s JOIN (%s) r (%s)",
+ 					 sql_o, abuf_o.data, jointype_str, sql_i, abuf_i.data);
+ 	/* Append ON clause */
+ 	if (joinclauses)
+ 		appendConditions(buf, root, baserel,
+ 						 outerrel->reltargetlist, innerrel->reltargetlist,
+ 						 joinclauses,
+ 						 " ON ", NULL);
+ 	/* Append WHERE clause */
+ 	if (otherclauses)
+ 		appendConditions(buf, root, baserel,
+ 						 outerrel->reltargetlist, innerrel->reltargetlist,
+ 						 otherclauses,
+ 						 " WHERE ", NULL);
+ }
+ 
+ /*
   * deparse remote INSERT statement
   *
   * The statement text is appended to buf, and we also create an integer List
***************
*** 1025,1031 **** deparseReturningList(StringInfo buf, PlannerInfo *root,
  	{
  		appendStringInfoString(buf, " RETURNING ");
  		deparseTargetList(buf, root, rtindex, rel, attrs_used,
! 						  retrieved_attrs);
  	}
  	else
  		*retrieved_attrs = NIL;
--- 1505,1511 ----
  	{
  		appendStringInfoString(buf, " RETURNING ");
  		deparseTargetList(buf, root, rtindex, rel, attrs_used,
! 						  retrieved_attrs, false);
  	}
  	else
  		*retrieved_attrs = NIL;
***************
*** 1292,1297 **** deparseExpr(Expr *node, deparse_expr_cxt *context)
--- 1772,1779 ----
  /*
   * Deparse given Var node into context->buf.
   *
+  * If the context's foreign relation is a join relation, this is invoked for
+  * join conditions.
+  *
   * If the Var belongs to the foreign relation, just print its remote name.
   * Otherwise, it's effectively a Param (and will in fact be a Param at
   * run time).  Handle it the same way we handle plain Params --- see
***************
*** 1302,1340 **** deparseVar(Var *node, deparse_expr_cxt *context)
  {
  	StringInfo	buf = context->buf;
  
! 	if (node->varno == context->foreignrel->relid &&
! 		node->varlevelsup == 0)
  	{
! 		/* Var belongs to foreign table */
! 		deparseColumnRef(buf, node->varno, node->varattno, context->root);
  	}
  	else
  	{
! 		/* Treat like a Param */
! 		if (context->params_list)
  		{
! 			int			pindex = 0;
! 			ListCell   *lc;
! 
! 			/* find its index in params_list */
! 			foreach(lc, *context->params_list)
  			{
! 				pindex++;
! 				if (equal(node, (Node *) lfirst(lc)))
! 					break;
  			}
! 			if (lc == NULL)
  			{
! 				/* not in list, so add it */
! 				pindex++;
! 				*context->params_list = lappend(*context->params_list, node);
  			}
- 
- 			printRemoteParam(pindex, node->vartype, node->vartypmod, context);
- 		}
- 		else
- 		{
- 			printRemotePlaceholder(node->vartype, node->vartypmod, context);
  		}
  	}
  }
--- 1784,1829 ----
  {
  	StringInfo	buf = context->buf;
  
! 	if (context->foreignrel->reloptkind == RELOPT_JOINREL)
  	{
! 		deparseJoinVar(node, context);
  	}
  	else
  	{
! 		if (node->varno == context->foreignrel->relid &&
! 			node->varlevelsup == 0)
  		{
! 			/* Var belongs to foreign table */
! 			deparseColumnRef(buf, node->varno, node->varattno, context->root);
! 		}
! 		else
! 		{
! 			/* Treat like a Param */
! 			if (context->params_list)
  			{
! 				int			pindex = 0;
! 				ListCell   *lc;
! 
! 				/* find its index in params_list */
! 				foreach(lc, *context->params_list)
! 				{
! 					pindex++;
! 					if (equal(node, (Node *) lfirst(lc)))
! 						break;
! 				}
! 				if (lc == NULL)
! 				{
! 					/* not in list, so add it */
! 					pindex++;
! 					*context->params_list = lappend(*context->params_list, node);
! 				}
! 
! 				printRemoteParam(pindex, node->vartype, node->vartypmod, context);
  			}
! 			else
  			{
! 				printRemotePlaceholder(node->vartype, node->vartypmod, context);
  			}
  		}
  	}
  }
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 9,19 **** DO $d$
--- 9,24 ----
              OPTIONS (dbname '$$||current_database()||$$',
                       port '$$||current_setting('port')||$$'
              )$$;
+         EXECUTE $$CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+             OPTIONS (dbname '$$||current_database()||$$',
+                      port '$$||current_setting('port')||$$'
+             )$$;
      END;
  $d$;
  CREATE USER MAPPING FOR public SERVER testserver1
  	OPTIONS (user 'value', password 'value');
  CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
+ CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
  -- ===================================================================
  -- create objects used through FDW loopback server
  -- ===================================================================
***************
*** 35,40 **** CREATE TABLE "S 1"."T 2" (
--- 40,57 ----
  	c2 text,
  	CONSTRAINT t2_pkey PRIMARY KEY (c1)
  );
+ CREATE TABLE "S 1"."T 3" (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c3 text,
+ 	CONSTRAINT t3_pkey PRIMARY KEY (c1)
+ );
+ CREATE TABLE "S 1"."T 4" (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c4 text,
+ 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
+ );
  INSERT INTO "S 1"."T 1"
  	SELECT id,
  	       id % 10,
***************
*** 49,56 **** INSERT INTO "S 1"."T 2"
--- 66,87 ----
  	SELECT id,
  	       'AAA' || to_char(id, 'FM000')
  	FROM generate_series(1, 100) id;
+ INSERT INTO "S 1"."T 3"
+ 	SELECT id,
+ 	       id + 1,
+ 	       'AAA' || to_char(id, 'FM000')
+ 	FROM generate_series(1, 100) id;
+ DELETE FROM "S 1"."T 3" WHERE c1 % 2 != 0;	-- delete for outer join tests
+ INSERT INTO "S 1"."T 4"
+ 	SELECT id,
+ 	       id + 1,
+ 	       'AAA' || to_char(id, 'FM000')
+ 	FROM generate_series(1, 100) id;
+ DELETE FROM "S 1"."T 4" WHERE c1 % 3 != 0;	-- delete for outer join tests
  ANALYZE "S 1"."T 1";
  ANALYZE "S 1"."T 2";
+ ANALYZE "S 1"."T 3";
+ ANALYZE "S 1"."T 4";
  -- ===================================================================
  -- create foreign tables
  -- ===================================================================
***************
*** 78,83 **** CREATE FOREIGN TABLE ft2 (
--- 109,134 ----
  	c8 user_enum
  ) SERVER loopback;
  ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
+ CREATE FOREIGN TABLE ft4 (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c3 text
+ ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');
+ CREATE FOREIGN TABLE ft5 (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c3 text
+ ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');
+ CREATE FOREIGN TABLE ft6 (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c3 text
+ ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+ CREATE USER view_owner;
+ GRANT ALL ON ft5 TO view_owner;
+ CREATE VIEW v_ft5 AS SELECT * FROM ft5;
+ ALTER VIEW v_ft5 OWNER TO view_owner;
+ CREATE USER MAPPING FOR view_owner SERVER loopback;
  -- ===================================================================
  -- tests for validator
  -- ===================================================================
***************
*** 127,138 **** ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
  ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
  ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
  \det+
!                              List of foreign tables
!  Schema | Table |  Server  |              FDW Options              | Description 
! --------+-------+----------+---------------------------------------+-------------
!  public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1') | 
!  public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1') | 
! (2 rows)
  
  -- Now we should be able to run ANALYZE.
  -- To exercise multiple code paths, we use local stats on ft1
--- 178,192 ----
  ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
  ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
  \det+
!                               List of foreign tables
!  Schema | Table |  Server   |              FDW Options              | Description 
! --------+-------+-----------+---------------------------------------+-------------
!  public | ft1   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
!  public | ft2   | loopback  | (schema_name 'S 1', table_name 'T 1') | 
!  public | ft4   | loopback  | (schema_name 'S 1', table_name 'T 3') | 
!  public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
!  public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
! (5 rows)
  
  -- Now we should be able to run ANALYZE.
  -- To exercise multiple code paths, we use local stats on ft1
***************
*** 281,302 **** SELECT COUNT(*) FROM ft1 t1;
    1000
  (1 row)
  
- -- join two tables
- SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
-  c1  
- -----
-  101
-  102
-  103
-  104
-  105
-  106
-  107
-  108
-  109
-  110
- (10 rows)
- 
  -- subquery
  SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
   c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
--- 335,340 ----
***************
*** 446,462 **** EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c8 = 'foo';  -- can't
  -- parameterized remote path
  EXPLAIN (VERBOSE, COSTS false)
    SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
!                                                  QUERY PLAN                                                  
! -------------------------------------------------------------------------------------------------------------
!  Nested Loop
     Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
!    ->  Foreign Scan on public.ft2 a
!          Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8
!          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 47))
!    ->  Foreign Scan on public.ft2 b
!          Output: b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
!          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (($1::integer = "C 1"))
! (8 rows)
  
  SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
   c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
--- 484,496 ----
  -- parameterized remote path
  EXPLAIN (VERBOSE, COSTS false)
    SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
!                                                                                                                                                                                                                                                                                      QUERY PLAN                                                                                                                                                                                                                                                                                      
! -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
!  Foreign Scan
     Output: a.c1, a.c2, a.c3, a.c4, a.c5, a.c6, a.c7, a.c8, b.c1, b.c2, b.c3, b.c4, b.c5, b.c6, b.c7, b.c8
!    Relations: (public.ft2 a) INNER JOIN (public.ft2 b)
!    Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8 FROM (SELECT l.a9, l.a10, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17 FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE (("C 1" = 47))) l) l (a1, a2, a3, a4, a5, a6, a7, a8) INNER JOIN (SELECT r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17 FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8) ON ((l.a2 = r.a1))
! (4 rows)
  
  SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
   c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  | c1 | c2 |  c3   |              c4              |            c5            | c6 |     c7     | c8  
***************
*** 757,762 **** SELECT count(c3) FROM ft1 t1 WHERE t1.c1 === t1.c2;
--- 791,1434 ----
  (1 row)
  
  -- ===================================================================
+ -- JOIN queries
+ -- ===================================================================
+ -- join two tables
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+                                                                                                                QUERY PLAN                                                                                                                
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1, t1.c3
+    ->  Sort
+          Output: t1.c1, t2.c1, t1.c3
+          Sort Key: t1.c3, t1.c1
+          ->  Foreign Scan
+                Output: t1.c1, t2.c1, t1.c3
+                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+                Remote SQL: SELECT l.a1, l.a2, r.a1 FROM (SELECT l.a10, l.a12 FROM (SELECT "C 1" a10, c3 a12 FROM "S 1"."T 1") l) l (a1, a2) INNER JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+ (9 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+  c1  | c1  
+ -----+-----
+  101 | 101
+  102 | 102
+  103 | 103
+  104 | 104
+  105 | 105
+  106 | 106
+  107 | 107
+  108 | 108
+  109 | 109
+  110 | 110
+ (10 rows)
+ 
+ -- join three tables
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+                                                                                                                                                                                                               QUERY PLAN                                                                                                                                                                                                               
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c2, t3.c3, t1.c3
+    ->  Sort
+          Output: t1.c1, t2.c2, t3.c3, t1.c3
+          Sort Key: t1.c3, t1.c1
+          ->  Foreign Scan
+                Output: t1.c1, t2.c2, t3.c3, t1.c3
+                Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
+                Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1 FROM (SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT l.a10, l.a12 FROM (SELECT "C 1" a10, c3 a12 FROM "S 1"."T 1") l) l (a1, a2) INNER JOIN (SELECT r.a10, r.a9 FROM (SELECT "C 1" a9, c2 a10 FROM "S 1"."T 1") r) r (a1, a2) ON ((l.a1 = r.a2))) l (a1, a2, a3, a4) INNER JOIN (SELECT r.a11, r.a9 FROM (SELECT c1 a9, c3 a11 FROM "S 1"."T 3") r) r (a1, a2) ON ((l.a1 = r.a2))
+ (9 rows)
+ 
+ SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+  c1 | c2 |   c3   
+ ----+----+--------
+  22 |  2 | AAA022
+  24 |  4 | AAA024
+  26 |  6 | AAA026
+  28 |  8 | AAA028
+  30 |  0 | AAA030
+  32 |  2 | AAA032
+  34 |  4 | AAA034
+  36 |  6 | AAA036
+  38 |  8 | AAA038
+  40 |  0 | AAA040
+ (10 rows)
+ 
+ -- left outer join
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+                                                                                               QUERY PLAN                                                                                               
+ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1
+    ->  Sort
+          Output: t1.c1, t2.c1
+          Sort Key: t1.c1, t2.c1
+          ->  Foreign Scan
+                Output: t1.c1, t2.c1
+                Relations: (public.ft4 t1) LEFT JOIN (public.ft5 t2)
+                Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") l) l (a1) LEFT JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 4") r) r (a1) ON ((l.a1 = r.a1))
+ (9 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+  c1 | c1 
+ ----+----
+  22 |   
+  24 | 24
+  26 |   
+  28 |   
+  30 | 30
+  32 |   
+  34 |   
+  36 | 36
+  38 |   
+  40 |   
+ (10 rows)
+ 
+ -- right outer join
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft5 t1 RIGHT JOIN ft4 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t1.c1 OFFSET 10 LIMIT 10;
+                                                                                               QUERY PLAN                                                                                               
+ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1
+    ->  Sort
+          Output: t1.c1, t2.c1
+          Sort Key: t2.c1, t1.c1
+          ->  Foreign Scan
+                Output: t1.c1, t2.c1
+                Relations: (public.ft4 t2) LEFT JOIN (public.ft5 t1)
+                Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") l) l (a1) LEFT JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 4") r) r (a1) ON ((r.a1 = l.a1))
+ (9 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft5 t1 RIGHT JOIN ft4 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t1.c1 OFFSET 10 LIMIT 10;
+  c1 | c1 
+ ----+----
+     | 22
+  24 | 24
+     | 26
+     | 28
+  30 | 30
+     | 32
+     | 34
+  36 | 36
+     | 38
+     | 40
+ (10 rows)
+ 
+ -- full outer join
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 45 LIMIT 10;
+                                                                                               QUERY PLAN                                                                                               
+ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1
+    ->  Sort
+          Output: t1.c1, t2.c1
+          Sort Key: t1.c1, t2.c1
+          ->  Foreign Scan
+                Output: t1.c1, t2.c1
+                Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
+                Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") l) l (a1) FULL JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 4") r) r (a1) ON ((l.a1 = r.a1))
+ (9 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 45 LIMIT 10;
+  c1  | c1 
+ -----+----
+   92 |   
+   94 |   
+   96 | 96
+   98 |   
+  100 |   
+      |  3
+      |  9
+      | 15
+      | 21
+      | 27
+ (10 rows)
+ 
+ -- full outer join + WHERE clause, only matched rows
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+                                                                                                                    QUERY PLAN                                                                                                                    
+ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1
+    ->  Sort
+          Output: t1.c1, t2.c1
+          Sort Key: t1.c1, t2.c1
+          ->  Foreign Scan
+                Output: t1.c1, t2.c1
+                Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
+                Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") l) l (a1) FULL JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 4") r) r (a1) ON ((l.a1 = r.a1)) WHERE (((l.a1 = r.a1) OR (l.a1 IS NULL)))
+ (9 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+  c1 | c1 
+ ----+----
+  66 | 66
+  72 | 72
+  78 | 78
+  84 | 84
+  90 | 90
+  96 | 96
+     |  3
+     |  9
+     | 15
+     | 21
+ (10 rows)
+ 
+ -- join condition in WHERE clause
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+                                                                                                                QUERY PLAN                                                                                                                
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1, t1.c3
+    ->  Sort
+          Output: t1.c1, t2.c1, t1.c3
+          Sort Key: t1.c3, t1.c1
+          ->  Foreign Scan
+                Output: t1.c1, t2.c1, t1.c3
+                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+                Remote SQL: SELECT l.a1, l.a2, r.a1 FROM (SELECT l.a10, l.a12 FROM (SELECT "C 1" a10, c3 a12 FROM "S 1"."T 1") l) l (a1, a2) INNER JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+ (9 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+  c1  | c1  
+ -----+-----
+  101 | 101
+  102 | 102
+  103 | 103
+  104 | 104
+  105 | 105
+  106 | 106
+  107 | 107
+  108 | 108
+  109 | 109
+  110 | 110
+ (10 rows)
+ 
+ -- join in CTE
+ EXPLAIN (COSTS false, VERBOSE)
+ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+                                                                                                              QUERY PLAN                                                                                                              
+ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t.c1_1, t.c2_1, t.c1_3
+    CTE t
+      ->  Foreign Scan
+            Output: t1.c1, t1.c3, t2.c1
+            Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+            Remote SQL: SELECT l.a1, l.a2, r.a1 FROM (SELECT l.a10, l.a12 FROM (SELECT "C 1" a10, c3 a12 FROM "S 1"."T 1") l) l (a1, a2) INNER JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+    ->  Sort
+          Output: t.c1_1, t.c2_1, t.c1_3
+          Sort Key: t.c1_3, t.c1_1
+          ->  CTE Scan on t
+                Output: t.c1_1, t.c2_1, t.c1_3
+ (12 rows)
+ 
+ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+  c1_1 | c2_1 
+ ------+------
+   101 |  101
+   102 |  102
+   103 |  103
+   104 |  104
+   105 |  105
+   106 |  106
+   107 |  107
+   108 |  108
+   109 |  109
+   110 |  110
+ (10 rows)
+ 
+ -- ctid with whole-row reference
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+                                                                                                                                                                                                                                                    QUERY PLAN                                                                                                                                                                                                                                                    
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
+    ->  Sort
+          Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
+          Sort Key: t1.c3, t1.c1
+          ->  Foreign Scan
+                Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
+                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+                Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT l.a7, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a10, l.a12 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17, ctid a7 FROM "S 1"."T 1") l) l (a1, a2, a3, a4) INNER JOIN (SELECT ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17), r.a9 FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2) ON ((l.a3 = r.a2))
+ (9 rows)
+ 
+ SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+   ctid  |                                             t1                                             |                                             t2                                             | c1  
+ --------+--------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+-----
+  (1,4)  | (101,1,00101,"Fri Jan 02 00:00:00 1970 PST","Fri Jan 02 00:00:00 1970",1,"1         ",foo) | (101,1,00101,"Fri Jan 02 00:00:00 1970 PST","Fri Jan 02 00:00:00 1970",1,"1         ",foo) | 101
+  (1,5)  | (102,2,00102,"Sat Jan 03 00:00:00 1970 PST","Sat Jan 03 00:00:00 1970",2,"2         ",foo) | (102,2,00102,"Sat Jan 03 00:00:00 1970 PST","Sat Jan 03 00:00:00 1970",2,"2         ",foo) | 102
+  (1,6)  | (103,3,00103,"Sun Jan 04 00:00:00 1970 PST","Sun Jan 04 00:00:00 1970",3,"3         ",foo) | (103,3,00103,"Sun Jan 04 00:00:00 1970 PST","Sun Jan 04 00:00:00 1970",3,"3         ",foo) | 103
+  (1,7)  | (104,4,00104,"Mon Jan 05 00:00:00 1970 PST","Mon Jan 05 00:00:00 1970",4,"4         ",foo) | (104,4,00104,"Mon Jan 05 00:00:00 1970 PST","Mon Jan 05 00:00:00 1970",4,"4         ",foo) | 104
+  (1,8)  | (105,5,00105,"Tue Jan 06 00:00:00 1970 PST","Tue Jan 06 00:00:00 1970",5,"5         ",foo) | (105,5,00105,"Tue Jan 06 00:00:00 1970 PST","Tue Jan 06 00:00:00 1970",5,"5         ",foo) | 105
+  (1,9)  | (106,6,00106,"Wed Jan 07 00:00:00 1970 PST","Wed Jan 07 00:00:00 1970",6,"6         ",foo) | (106,6,00106,"Wed Jan 07 00:00:00 1970 PST","Wed Jan 07 00:00:00 1970",6,"6         ",foo) | 106
+  (1,10) | (107,7,00107,"Thu Jan 08 00:00:00 1970 PST","Thu Jan 08 00:00:00 1970",7,"7         ",foo) | (107,7,00107,"Thu Jan 08 00:00:00 1970 PST","Thu Jan 08 00:00:00 1970",7,"7         ",foo) | 107
+  (1,11) | (108,8,00108,"Fri Jan 09 00:00:00 1970 PST","Fri Jan 09 00:00:00 1970",8,"8         ",foo) | (108,8,00108,"Fri Jan 09 00:00:00 1970 PST","Fri Jan 09 00:00:00 1970",8,"8         ",foo) | 108
+  (1,12) | (109,9,00109,"Sat Jan 10 00:00:00 1970 PST","Sat Jan 10 00:00:00 1970",9,"9         ",foo) | (109,9,00109,"Sat Jan 10 00:00:00 1970 PST","Sat Jan 10 00:00:00 1970",9,"9         ",foo) | 109
+  (1,13) | (110,0,00110,"Sun Jan 11 00:00:00 1970 PST","Sun Jan 11 00:00:00 1970",0,"0         ",foo) | (110,0,00110,"Sun Jan 11 00:00:00 1970 PST","Sun Jan 11 00:00:00 1970",0,"0         ",foo) | 110
+ (10 rows)
+ 
+ -- join partially unsafe to push down; only the safe part is pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON t2.c1 = t2.c1 JOIN ft4 t3 ON t2.c1 = t3.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+                                                                                                             QUERY PLAN                                                                                                             
+ -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1
+    ->  Nested Loop
+          Output: t1.c1
+          ->  Foreign Scan on public.ft1 t1
+                Output: t1.c1
+                Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+          ->  Materialize
+                ->  Foreign Scan
+                      Relations: (public.ft2 t2) INNER JOIN (public.ft4 t3)
+                      Remote SQL: SELECT NULL FROM (SELECT l.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1" WHERE (("C 1" = "C 1"))) l) l (a1) INNER JOIN (SELECT r.a9 FROM (SELECT c1 a9 FROM "S 1"."T 3") r) r (a1) ON ((l.a1 = r.a1))
+ (11 rows)
+ 
+ SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON t2.c1 = t2.c1 JOIN ft4 t3 ON t2.c1 = t3.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+  c1 
+ ----
+   1
+   1
+   1
+   1
+   1
+   1
+   1
+   1
+   1
+   1
+ (10 rows)
+ 
+ -- SEMI JOIN, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1 FROM ft1 t1 WHERE EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c1) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+                                     QUERY PLAN                                    
+ ----------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1
+    ->  Merge Semi Join
+          Output: t1.c1
+          Merge Cond: (t1.c1 = t2.c1)
+          ->  Foreign Scan on public.ft1 t1
+                Output: t1.c1
+                Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+          ->  Materialize
+                Output: t2.c1
+                ->  Foreign Scan on public.ft2 t2
+                      Output: t2.c1
+                      Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+ (13 rows)
+ 
+ SELECT t1.c1 FROM ft1 t1 WHERE EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c1) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+  c1  
+ -----
+  101
+  102
+  103
+  104
+  105
+  106
+  107
+  108
+  109
+  110
+ (10 rows)
+ 
+ -- ANTI JOIN, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+                                  QUERY PLAN                                 
+ ----------------------------------------------------------------------------
+  Limit
+    Output: t1.c1
+    ->  Merge Anti Join
+          Output: t1.c1
+          Merge Cond: (t1.c1 = t2.c2)
+          ->  Foreign Scan on public.ft1 t1
+                Output: t1.c1
+                Remote SQL: SELECT "C 1" FROM "S 1"."T 1" ORDER BY "C 1" ASC
+          ->  Sort
+                Output: t2.c2
+                Sort Key: t2.c2
+                ->  Foreign Scan on public.ft2 t2
+                      Output: t2.c2
+                      Remote SQL: SELECT c2 FROM "S 1"."T 1"
+ (14 rows)
+ 
+ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+  c1  
+ -----
+  110
+  111
+  112
+  113
+  114
+  115
+  116
+  117
+  118
+  119
+ (10 rows)
+ 
+ -- CROSS JOIN, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+                              QUERY PLAN                              
+ ---------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1
+    ->  Sort
+          Output: t1.c1, t2.c1
+          Sort Key: t1.c1, t2.c1
+          ->  Nested Loop
+                Output: t1.c1, t2.c1
+                ->  Foreign Scan on public.ft1 t1
+                      Output: t1.c1
+                      Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+                ->  Materialize
+                      Output: t2.c1
+                      ->  Foreign Scan on public.ft2 t2
+                            Output: t2.c1
+                            Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+ (15 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+  c1 | c1  
+ ----+-----
+   1 | 101
+   1 | 102
+   1 | 103
+   1 | 104
+   1 | 105
+   1 | 106
+   1 | 107
+   1 | 108
+   1 | 109
+   1 | 110
+ (10 rows)
+ 
+ -- different server, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+                                  QUERY PLAN                                 
+ ----------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1
+    ->  Merge Join
+          Output: t1.c1, t2.c1
+          Merge Cond: (t1.c1 = t2.c1)
+          ->  Foreign Scan on public.ft5 t1
+                Output: t1.c1, t1.c2, t1.c3
+                Remote SQL: SELECT c1 FROM "S 1"."T 4" ORDER BY c1 ASC
+          ->  Materialize
+                Output: t2.c1, t2.c2, t2.c3
+                ->  Foreign Scan on public.ft6 t2
+                      Output: t2.c1, t2.c2, t2.c3
+                      Remote SQL: SELECT c1 FROM "S 1"."T 4" ORDER BY c1 ASC
+ (13 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+  c1 | c1 
+ ----+----
+ (0 rows)
+ 
+ -- different effective user for permission check, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN v_ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+                                  QUERY PLAN                                 
+ ----------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, ft5.c1
+    ->  Merge Join
+          Output: t1.c1, ft5.c1
+          Merge Cond: (t1.c1 = ft5.c1)
+          ->  Foreign Scan on public.ft5 t1
+                Output: t1.c1, t1.c2, t1.c3
+                Remote SQL: SELECT c1 FROM "S 1"."T 4" ORDER BY c1 ASC
+          ->  Materialize
+                Output: ft5.c1, ft5.c2, ft5.c3
+                ->  Foreign Scan on public.ft5
+                      Output: ft5.c1, ft5.c2, ft5.c3
+                      Remote SQL: SELECT c1 FROM "S 1"."T 4" ORDER BY c1 ASC
+ (13 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN v_ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+  c1 | c1 
+ ----+----
+ (0 rows)
+ 
+ -- unsafe join conditions, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c8 = t2.c8) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+                                          QUERY PLAN                                         
+ --------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1, t1.c3
+    ->  Nested Loop
+          Output: t1.c1, t2.c1, t1.c3
+          Join Filter: (t1.c8 = t2.c8)
+          ->  Foreign Scan on public.ft1 t1
+                Output: t1.c1, t1.c3, t1.c8
+                Remote SQL: SELECT "C 1", c3, c8 FROM "S 1"."T 1" ORDER BY c3 ASC, "C 1" ASC
+          ->  Materialize
+                Output: t2.c1, t2.c8
+                ->  Foreign Scan on public.ft2 t2
+                      Output: t2.c1, t2.c8
+                      Remote SQL: SELECT "C 1", c8 FROM "S 1"."T 1"
+ (13 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c8 = t2.c8) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+  c1 | c1  
+ ----+-----
+   1 | 101
+   1 | 102
+   1 | 103
+   1 | 104
+   1 | 105
+   1 | 106
+   1 | 107
+   1 | 108
+   1 | 109
+   1 | 110
+ (10 rows)
+ 
+ -- unsafe conditions on one side, join not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+                                  QUERY PLAN                                  
+ -----------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1, t1.c3
+    ->  Sort
+          Output: t1.c1, t2.c1, t1.c3
+          Sort Key: t1.c3, t1.c1
+          ->  Hash Join
+                Output: t1.c1, t2.c1, t1.c3
+                Hash Cond: (t2.c1 = t1.c1)
+                ->  Foreign Scan on public.ft2 t2
+                      Output: t2.c1
+                      Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
+                ->  Hash
+                      Output: t1.c1, t1.c3
+                      ->  Foreign Scan on public.ft1 t1
+                            Output: t1.c1, t1.c3
+                            Filter: (t1.c8 = 'foo'::user_enum)
+                            Remote SQL: SELECT "C 1", c3, c8 FROM "S 1"."T 1"
+ (17 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+  c1  | c1  
+ -----+-----
+  101 | 101
+  102 | 102
+  103 | 103
+  104 | 104
+  105 | 105
+  106 | 106
+  107 | 107
+  108 | 108
+  109 | 109
+  110 | 110
+ (10 rows)
+ 
+ -- Aggregate after UNION, for testing setrefs
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
+                                                                                                             QUERY PLAN                                                                                                            
+ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, (avg((t1.c1 + t2.c1)))
+    ->  Sort
+          Output: t1.c1, (avg((t1.c1 + t2.c1)))
+          Sort Key: t1.c1
+          ->  HashAggregate
+                Output: t1.c1, avg((t1.c1 + t2.c1))
+                Group Key: t1.c1
+                ->  HashAggregate
+                      Output: t1.c1, t2.c1
+                      Group Key: t1.c1, t2.c1
+                      ->  Append
+                            ->  Foreign Scan
+                                  Output: t1.c1, t2.c1
+                                  Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+                                  Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a10 FROM (SELECT "C 1" a10 FROM "S 1"."T 1") l) l (a1) INNER JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+                            ->  Foreign Scan
+                                  Output: t1_1.c1, t2_1.c1
+                                  Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+                                  Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a10 FROM (SELECT "C 1" a10 FROM "S 1"."T 1") l) l (a1) INNER JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+ (20 rows)
+ 
+ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
+  t1c1 |         avg          
+ ------+----------------------
+   101 | 202.0000000000000000
+   102 | 204.0000000000000000
+   103 | 206.0000000000000000
+   104 | 208.0000000000000000
+   105 | 210.0000000000000000
+   106 | 212.0000000000000000
+   107 | 214.0000000000000000
+   108 | 216.0000000000000000
+   109 | 218.0000000000000000
+   110 | 220.0000000000000000
+ (10 rows)
+ 
+ -- join two foreign tables and two local tables
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON t1.c1 = t2.c1 JOIN "S 1"."T 1" t3 ON t1.c1 = t3."C 1" JOIN "S 1"."T 2" t4 ON t1.c1 = t4.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+                                                                                                      QUERY PLAN                                                                                                      
+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  Limit
+    Output: t1.c1, t2.c1
+    ->  Sort
+          Output: t1.c1, t2.c1
+          Sort Key: t1.c1
+          ->  Hash Join
+                Output: t1.c1, t2.c1
+                Hash Cond: (t1.c1 = t3."C 1")
+                ->  Foreign Scan
+                      Output: t1.c1, t2.c1
+                      Relations: (public.ft1 t1) LEFT JOIN (public.ft2 t2)
+                      Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a10 FROM (SELECT "C 1" a10 FROM "S 1"."T 1") l) l (a1) LEFT JOIN (SELECT r.a9 FROM (SELECT "C 1" a9 FROM "S 1"."T 1") r) r (a1) ON ((l.a1 = r.a1))
+                ->  Hash
+                      Output: t3."C 1", t4.c1
+                      ->  Merge Join
+                            Output: t3."C 1", t4.c1
+                            Merge Cond: (t3."C 1" = t4.c1)
+                            ->  Index Only Scan using t1_pkey on "S 1"."T 1" t3
+                                  Output: t3."C 1"
+                            ->  Sort
+                                  Output: t4.c1
+                                  Sort Key: t4.c1
+                                  ->  Seq Scan on "S 1"."T 2" t4
+                                        Output: t4.c1
+ (24 rows)
+ 
+ SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON t1.c1 = t2.c1 JOIN "S 1"."T 1" t3 ON t1.c1 = t3."C 1" JOIN "S 1"."T 2" t4 ON t1.c1 = t4.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+  c1 | c1 
+ ----+----
+  11 | 11
+  12 | 12
+  13 | 13
+  14 | 14
+  15 | 15
+  16 | 16
+  17 | 17
+  18 | 18
+  19 | 19
+  20 | 20
+ (10 rows)
+ 
+ -- ===================================================================
  -- parameterized queries
  -- ===================================================================
  -- simple join
***************
*** 1355,1376 **** UPDATE ft2 SET c2 = c2 + 400, c3 = c3 || '_update7' WHERE c1 % 10 = 7 RETURNING
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
!                                                                             QUERY PLAN                                                                             
! -------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
     Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c3 = $3, c7 = $4 WHERE ctid = $1
!    ->  Hash Join
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
!          Hash Cond: (ft2.c2 = ft1.c1)
!          ->  Foreign Scan on public.ft2
!                Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c8, ft2.ctid
!                Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c8, ctid FROM "S 1"."T 1" FOR UPDATE
!          ->  Hash
!                Output: ft1.*, ft1.c1
!                ->  Foreign Scan on public.ft1
!                      Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))
! (13 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
--- 2027,2041 ----
  EXPLAIN (verbose, costs off)
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
!                                                                                                                                                                                                                                                                        QUERY PLAN                                                                                                                                                                                                                                                                       
! --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Update on public.ft2
     Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2, c3 = $3, c7 = $4 WHERE ctid = $1
!    ->  Foreign Scan
           Output: ft2.c1, (ft2.c2 + 500), NULL::integer, (ft2.c3 || '_update9'::text), ft2.c4, ft2.c5, ft2.c6, 'ft2       '::character(10), ft2.c8, ft2.ctid, ft1.*
!          Relations: (public.ft2) INNER JOIN (public.ft1)
!          Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, r.a1 FROM (SELECT l.a9, l.a10, l.a12, l.a13, l.a14, l.a15, l.a17, l.a7 FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c8 a17, ctid a7 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2, a3, a4, a5, a6, a7, a8) INNER JOIN (SELECT ROW(r.a10, r.a11, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17), r.a10 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 9))) r) r (a1, a2) ON ((l.a2 = r.a2))
! (6 rows)
  
  UPDATE ft2 SET c2 = ft2.c2 + 500, c3 = ft2.c3 || '_update9', c7 = DEFAULT
    FROM ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 9;
***************
*** 1496,1517 **** DELETE FROM ft2 WHERE c1 % 10 = 5 RETURNING c1, c4;
  
  EXPLAIN (verbose, costs off)
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
!                                                       QUERY PLAN                                                      
! ----------------------------------------------------------------------------------------------------------------------
   Delete on public.ft2
     Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1
!    ->  Hash Join
           Output: ft2.ctid, ft1.*
!          Hash Cond: (ft2.c2 = ft1.c1)
!          ->  Foreign Scan on public.ft2
!                Output: ft2.ctid, ft2.c2
!                Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" FOR UPDATE
!          ->  Hash
!                Output: ft1.*, ft1.c1
!                ->  Foreign Scan on public.ft1
!                      Output: ft1.*, ft1.c1
!                      Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))
! (13 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 2161,2175 ----
  
  EXPLAIN (verbose, costs off)
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
!                                                                                                                                                                                         QUERY PLAN                                                                                                                                                                                         
! -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Delete on public.ft2
     Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1
!    ->  Foreign Scan
           Output: ft2.ctid, ft1.*
!          Relations: (public.ft2) INNER JOIN (public.ft1)
!          Remote SQL: SELECT l.a1, r.a1 FROM (SELECT l.a7, l.a10 FROM (SELECT c2 a10, ctid a7 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT ROW(r.a10, r.a11, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17), r.a10 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" WHERE ((("C 1" % 10) = 2))) r) r (a1, a2) ON ((l.a2 = r.a2))
! (6 rows)
  
  DELETE FROM ft2 USING ft1 WHERE ft1.c1 = ft2.c2 AND ft1.c1 % 10 = 2;
  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
***************
*** 3786,3788 **** QUERY:  CREATE FOREIGN TABLE t5 (
--- 4444,4449 ----
  OPTIONS (schema_name 'import_source', table_name 't5');
  CONTEXT:  importing foreign table "t5"
  ROLLBACK;
+ -- Cleanup
+ DROP OWNED BY view_owner;
+ DROP USER view_owner;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 28,34 ****
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
- #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
  #include "optimizer/var.h"
  #include "parser/parsetree.h"
--- 28,33 ----
***************
*** 68,74 **** enum FdwScanPrivateIndex
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs
  };
  
  /*
--- 67,79 ----
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs,
! 	/* OID of the foreign server for the scan, as an Integer node */
! 	FdwScanPrivateServerOid,
! 	/* OID of the user mapping for the scan, as an Integer node */
! 	FdwScanPrivateUserMappingOid,
! 	/* Names of the relations scanned, added when the scan is a join */
! 	FdwScanPrivateRelations,
  };
  
  /*
***************
*** 98,104 **** enum FdwModifyPrivateIndex
   */
  typedef struct PgFdwScanState
  {
! 	Relation	rel;			/* relcache entry for the foreign table */
  	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
  
  	/* extracted fdw_private data */
--- 103,110 ----
   */
  typedef struct PgFdwScanState
  {
! 	const char *relname;		/* name of relation being scanned */
! 	TupleDesc	tupdesc;		/* tuple descriptor of the scan */
  	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
  
  	/* extracted fdw_private data */
***************
*** 164,169 **** typedef struct PgFdwAnalyzeState
--- 170,177 ----
  	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
  	List	   *retrieved_attrs;	/* attr numbers retrieved by query */
  
+ 	char	   *query;			/* text of SELECT command */
+ 
  	/* collected sample rows */
  	HeapTuple  *rows;			/* array of size targrows */
  	int			targrows;		/* target # of sample rows */
***************
*** 184,190 **** typedef struct PgFdwAnalyzeState
   */
  typedef struct ConversionLocation
  {
! 	Relation	rel;			/* foreign table's relcache entry */
  	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
  } ConversionLocation;
  
--- 192,201 ----
   */
  typedef struct ConversionLocation
  {
! 	const char *relname;		/* name of relation being processed, or NULL for
! 								   a foreign join */
! 	const char *query;			/* query being processed */
! 	TupleDesc	tupdesc;		/* tuple descriptor for attribute names */
  	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
  } ConversionLocation;
  
***************
*** 247,252 **** static TupleTableSlot *postgresExecForeignDelete(EState *estate,
--- 258,265 ----
  static void postgresEndForeignModify(EState *estate,
  						 ResultRelInfo *resultRelInfo);
  static int	postgresIsForeignRelUpdatable(Relation rel);
+ static bool postgresRecheckForeignScan(ForeignScanState *node,
+ 									   TupleTableSlot *slot);
  static void postgresExplainForeignScan(ForeignScanState *node,
  						   ExplainState *es);
  static void postgresExplainForeignModify(ModifyTableState *mtstate,
***************
*** 259,264 **** static bool postgresAnalyzeForeignTable(Relation relation,
--- 272,283 ----
  							BlockNumber *totalpages);
  static List *postgresImportForeignSchema(ImportForeignSchemaStmt *stmt,
  							Oid serverOid);
+ static void postgresGetForeignJoinPaths(PlannerInfo *root,
+ 										RelOptInfo *joinrel,
+ 										RelOptInfo *outerrel,
+ 										RelOptInfo *innerrel,
+ 										JoinType jointype,
+ 										JoinPathExtraData *extra);
  
  /*
   * Helper functions
***************
*** 295,306 **** static void analyze_row_processor(PGresult *res, int row,
  					  PgFdwAnalyzeState *astate);
  static HeapTuple make_tuple_from_result_row(PGresult *res,
  						   int row,
! 						   Relation rel,
  						   AttInMetadata *attinmeta,
  						   List *retrieved_attrs,
  						   MemoryContext temp_context);
  static void conversion_error_callback(void *arg);
  
  
  /*
   * Foreign-data wrapper handler function: return a struct with pointers
--- 314,354 ----
  					  PgFdwAnalyzeState *astate);
  static HeapTuple make_tuple_from_result_row(PGresult *res,
  						   int row,
! 						   const char *relname,
! 						   const char *query,
! 						   TupleDesc tupdesc,
  						   AttInMetadata *attinmeta,
  						   List *retrieved_attrs,
  						   MemoryContext temp_context);
  static void conversion_error_callback(void *arg);
+ static Path *get_unsorted_unparameterized_path(List *paths);
+ 
+ /*
+  * Describe a Bitmapset as a comma-separated integer list,
+  * for debugging purposes.
+  * XXX Can this become a member of bitmapset.c?
+  */
+ static char *
+ bms_to_str(Bitmapset *bmp)
+ {
+ 	StringInfoData buf;
+ 	bool		first = true;
+ 	int			x;
  
+ 	initStringInfo(&buf);
+ 
+ 	x = -1;
+ 	while ((x = bms_next_member(bmp, x)) >= 0)
+ 	{
+ 		if (!first)
+ 			appendStringInfoString(&buf, ", ");
+ 		appendStringInfo(&buf, "%d", x);
+ 
+ 		first = false;
+ 	}
+ 
+ 	return buf.data;
+ }
  
  /*
   * Foreign-data wrapper handler function: return a struct with pointers
***************
*** 330,335 **** postgres_fdw_handler(PG_FUNCTION_ARGS)
--- 378,386 ----
  	routine->EndForeignModify = postgresEndForeignModify;
  	routine->IsForeignRelUpdatable = postgresIsForeignRelUpdatable;
  
+ 	/* Functions for SELECT FOR UPDATE/SHARE row locking */
+ 	routine->RecheckForeignScan = postgresRecheckForeignScan;
+ 
  	/* Support functions for EXPLAIN */
  	routine->ExplainForeignScan = postgresExplainForeignScan;
  	routine->ExplainForeignModify = postgresExplainForeignModify;
***************
*** 340,345 **** postgres_fdw_handler(PG_FUNCTION_ARGS)
--- 391,399 ----
  	/* Support functions for IMPORT FOREIGN SCHEMA */
  	routine->ImportForeignSchema = postgresImportForeignSchema;
  
+ 	/* Support functions for join push-down */
+ 	routine->GetForeignJoinPaths = postgresGetForeignJoinPaths;
+ 
  	PG_RETURN_POINTER(routine);
  }
  
***************
*** 365,373 **** postgresGetForeignRelSize(PlannerInfo *root,
--- 419,431 ----
  	fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
  	baserel->fdw_private = (void *) fpinfo;
  
+ 	/* This scan can be pushed down to the remote server. */
+ 	fpinfo->pushdown_safe = true;
+ 
  	/* Look up foreign-table catalog info. */
  	fpinfo->table = GetForeignTable(foreigntableid);
  	fpinfo->server = GetForeignServer(fpinfo->table->serverid);
+ 	fpinfo->umid = baserel->umid;
  
  	/*
  	 * Extract user-settable option values.  Note that per-table setting of
***************
*** 404,425 **** postgresGetForeignRelSize(PlannerInfo *root,
  	}
  
  	/*
- 	 * If the table or the server is configured to use remote estimates,
- 	 * identify which user to do remote access as during planning.  This
- 	 * should match what ExecCheckRTEPerms() does.  If we fail due to lack of
- 	 * permissions, the query would have failed at runtime anyway.
- 	 */
- 	if (fpinfo->use_remote_estimate)
- 	{
- 		RangeTblEntry *rte = planner_rt_fetch(baserel->relid, root);
- 		Oid			userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
- 
- 		fpinfo->user = GetUserMapping(userid, fpinfo->server->serverid);
- 	}
- 	else
- 		fpinfo->user = NULL;
- 
- 	/*
  	 * Identify which baserestrictinfo clauses can be sent to the remote
  	 * server and which can't.
  	 */
--- 462,467 ----
***************
*** 788,793 **** postgresGetForeignPlan(PlannerInfo *root,
--- 830,837 ----
  	List	   *retrieved_attrs;
  	StringInfoData sql;
  	ListCell   *lc;
+ 	List	   *fdw_scan_tlist = NIL;
+ 	StringInfoData relations;
  
  	/*
  	 * Separate the scan_clauses into those that can be executed remotely and
***************
*** 834,910 **** postgresGetForeignPlan(PlannerInfo *root,
  			local_exprs = lappend(local_exprs, rinfo->clause);
  	}
  
  	/*
  	 * Build the query string to be sent for execution, and identify
  	 * expressions to be sent as parameters.
  	 */
  	initStringInfo(&sql);
! 	deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
! 					 &retrieved_attrs);
! 	if (remote_conds)
! 		appendWhereClause(&sql, root, baserel, remote_conds,
! 						  true, &params_list);
! 
! 	/* Add ORDER BY clause if we found any useful pathkeys */
! 	if (best_path->path.pathkeys)
! 		appendOrderByClause(&sql, root, baserel, best_path->path.pathkeys);
! 
! 	/*
! 	 * Add FOR UPDATE/SHARE if appropriate.  We apply locking during the
! 	 * initial row fetch, rather than later on as is done for local tables.
! 	 * The extra roundtrips involved in trying to duplicate the local
! 	 * semantics exactly don't seem worthwhile (see also comments for
! 	 * RowMarkType).
! 	 *
! 	 * Note: because we actually run the query as a cursor, this assumes that
! 	 * DECLARE CURSOR ... FOR UPDATE is supported, which it isn't before 8.3.
! 	 */
! 	if (baserel->relid == root->parse->resultRelation &&
! 		(root->parse->commandType == CMD_UPDATE ||
! 		 root->parse->commandType == CMD_DELETE))
! 	{
! 		/* Relation is UPDATE/DELETE target, so use FOR UPDATE */
! 		appendStringInfoString(&sql, " FOR UPDATE");
! 	}
! 	else
! 	{
! 		PlanRowMark *rc = get_plan_rowmark(root->rowMarks, baserel->relid);
! 
! 		if (rc)
! 		{
! 			/*
! 			 * Relation is specified as a FOR UPDATE/SHARE target, so handle
! 			 * that.  (But we could also see LCS_NONE, meaning this isn't a
! 			 * target relation after all.)
! 			 *
! 			 * For now, just ignore any [NO] KEY specification, since (a) it's
! 			 * not clear what that means for a remote table that we don't have
! 			 * complete information about, and (b) it wouldn't work anyway on
! 			 * older remote servers.  Likewise, we don't worry about NOWAIT.
! 			 */
! 			switch (rc->strength)
! 			{
! 				case LCS_NONE:
! 					/* No locking needed */
! 					break;
! 				case LCS_FORKEYSHARE:
! 				case LCS_FORSHARE:
! 					appendStringInfoString(&sql, " FOR SHARE");
! 					break;
! 				case LCS_FORNOKEYUPDATE:
! 				case LCS_FORUPDATE:
! 					appendStringInfoString(&sql, " FOR UPDATE");
! 					break;
! 			}
! 		}
! 	}
  
  	/*
  	 * Build the fdw_private list that will be available to the executor.
  	 * Items in the list must match enum FdwScanPrivateIndex, above.
  	 */
! 	fdw_private = list_make2(makeString(sql.data),
! 							 retrieved_attrs);
  
  	/*
  	 * Create the ForeignScan node from target list, filtering expressions,
--- 878,921 ----
  			local_exprs = lappend(local_exprs, rinfo->clause);
  	}
  
+ 	if (scan_relid == 0)
+ 	{
+ 		Assert(scan_clauses == NIL);
+ 		Assert(remote_conds == NIL);
+ 		Assert(remote_exprs == NIL);
+ 	}
+ 
  	/*
  	 * Build the query string to be sent for execution, and identify
  	 * expressions to be sent as parameters.
  	 */
  	initStringInfo(&sql);
! 	if (baserel->reloptkind == RELOPT_JOINREL)
! 		initStringInfo(&relations);
! 	deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used, remote_conds,
! 					 best_path->path.pathkeys,
! 					 &params_list, &fdw_scan_tlist, &retrieved_attrs,
! 					 baserel->reloptkind == RELOPT_JOINREL ? &relations : NULL,
! 					 false);
  
  	/*
  	 * Build the fdw_private list that will be available to the executor.
  	 * Items in the list must match enum FdwScanPrivateIndex, above.
  	 */
! 	fdw_private = list_make4(makeString(sql.data),
! 							 retrieved_attrs,
! 							 makeInteger(fpinfo->server->serverid),
! 							 makeInteger(fpinfo->umid));
! 	if (baserel->reloptkind == RELOPT_JOINREL)
! 		fdw_private = lappend(fdw_private, makeString(relations.data));
! 
! 	/* Adjust outer_plan */
! 	if (scan_relid == 0)
! 	{
! 		Assert(outer_plan != NULL);
! 		outer_plan->targetlist = fdw_scan_tlist;
! 		outer_plan->qual = fpinfo->otherclauses;
! 	}
  
  	/*
  	 * Create the ForeignScan node from target list, filtering expressions,
***************
*** 919,925 **** postgresGetForeignPlan(PlannerInfo *root,
  							scan_relid,
  							params_list,
  							fdw_private,
! 							NIL,	/* no custom tlist */
  							remote_exprs,
  							outer_plan);
  }
--- 930,936 ----
  							scan_relid,
  							params_list,
  							fdw_private,
! 							fdw_scan_tlist,
  							remote_exprs,
  							outer_plan);
  }
***************
*** 934,942 **** postgresBeginForeignScan(ForeignScanState *node, int eflags)
  	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
  	EState	   *estate = node->ss.ps.state;
  	PgFdwScanState *fsstate;
! 	RangeTblEntry *rte;
! 	Oid			userid;
! 	ForeignTable *table;
  	ForeignServer *server;
  	UserMapping *user;
  	int			numParams;
--- 945,952 ----
  	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
  	EState	   *estate = node->ss.ps.state;
  	PgFdwScanState *fsstate;
! 	Oid			serverid;
! 	Oid			umid;
  	ForeignServer *server;
  	UserMapping *user;
  	int			numParams;
***************
*** 956,977 **** postgresBeginForeignScan(ForeignScanState *node, int eflags)
  	node->fdw_state = (void *) fsstate;
  
  	/*
- 	 * Identify which user to do the remote access as.  This should match what
- 	 * ExecCheckRTEPerms() does.
- 	 */
- 	rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
- 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
- 
- 	/* Get info about foreign table. */
- 	fsstate->rel = node->ss.ss_currentRelation;
- 	table = GetForeignTable(RelationGetRelid(fsstate->rel));
- 	server = GetForeignServer(table->serverid);
- 	user = GetUserMapping(userid, server->serverid);
- 
- 	/*
  	 * Get connection to the foreign server.  Connection manager will
  	 * establish new connection if necessary.
  	 */
  	fsstate->conn = GetConnection(server, user, false);
  
  	/* Assign a unique ID for my cursor */
--- 966,978 ----
  	node->fdw_state = (void *) fsstate;
  
  	/*
  	 * Get connection to the foreign server.  Connection manager will
  	 * establish new connection if necessary.
  	 */
+ 	serverid = intVal(list_nth(fsplan->fdw_private, FdwScanPrivateServerOid));
+ 	umid = intVal(list_nth(fsplan->fdw_private, FdwScanPrivateUserMappingOid));
+ 	server = GetForeignServer(serverid);
+ 	user = GetUserMappingById(umid);
  	fsstate->conn = GetConnection(server, user, false);
  
  	/* Assign a unique ID for my cursor */
***************
*** 996,1003 **** postgresBeginForeignScan(ForeignScanState *node, int eflags)
  											  ALLOCSET_SMALL_INITSIZE,
  											  ALLOCSET_SMALL_MAXSIZE);
  
! 	/* Get info we'll need for input data conversion. */
! 	fsstate->attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(fsstate->rel));
  
  	/* Prepare for output conversion of parameters used in remote query. */
  	numParams = list_length(fsplan->fdw_exprs);
--- 997,1014 ----
  											  ALLOCSET_SMALL_INITSIZE,
  											  ALLOCSET_SMALL_MAXSIZE);
  
! 	/* Get info we'll need for input data conversion and error reporting. */
! 	if (fsplan->scan.scanrelid > 0)
! 	{
! 		fsstate->relname = RelationGetRelationName(node->ss.ss_currentRelation);
! 		fsstate->tupdesc = RelationGetDescr(node->ss.ss_currentRelation);
! 	}
! 	else
! 	{
! 		fsstate->relname = NULL;
! 		fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
! 	}
! 	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc);
  
  	/* Prepare for output conversion of parameters used in remote query. */
  	numParams = list_length(fsplan->fdw_exprs);
***************
*** 1718,1723 **** postgresIsForeignRelUpdatable(Relation rel)
--- 1729,1761 ----
  }
  
  /*
+  * postgresRecheckForeignScan
+  *		Execute a local join execution plan for a foreign join
+  */
+ static bool
+ postgresRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
+ {
+ 	Index		scanrelid = ((Scan *) node->ss.ps.plan)->scanrelid;
+ 	PlanState  *outerPlan = outerPlanState(node);
+ 	TupleTableSlot *result;
+ 
+ 	if (scanrelid > 0)
+ 		return true;
+ 
+ 	Assert(outerPlan != NULL);
+ 
+ 	/* Execute a local join execution plan */
+ 	result = ExecProcNode(outerPlan);
+ 	if (TupIsNull(result))
+ 		return false;
+ 
+ 	/* Store result in the given slot */
+ 	ExecCopySlot(slot, result);
+ 
+ 	return true;
+ }
+ 
+ /*
   * postgresExplainForeignScan
   *		Produce extra output for EXPLAIN of a ForeignScan on a foreign table
   */
***************
*** 1726,1735 **** postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
  {
  	List	   *fdw_private;
  	char	   *sql;
  
  	if (es->verbose)
  	{
- 		fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
  		sql = strVal(list_nth(fdw_private, FdwScanPrivateSelectSql));
  		ExplainPropertyText("Remote SQL", sql, es);
  	}
--- 1764,1788 ----
  {
  	List	   *fdw_private;
  	char	   *sql;
+ 	char	   *relations;
+ 
+ 	fdw_private = ((ForeignScan *) node->ss.ps.plan)->fdw_private;
+ 
+ 	/*
+ 	 * Add names of the relations handled by the foreign scan when the scan
+ 	 * is a join.
+ 	 */
+ 	if (list_length(fdw_private) > FdwScanPrivateRelations)
+ 	{
+ 		relations = strVal(list_nth(fdw_private, FdwScanPrivateRelations));
+ 		ExplainPropertyText("Relations", relations, es);
+ 	}
  
+ 	/*
+ 	 * Add the remote query when the VERBOSE option is specified.
+ 	 */
  	if (es->verbose)
  	{
  		sql = strVal(list_nth(fdw_private, FdwScanPrivateSelectSql));
  		ExplainPropertyText("Remote SQL", sql, es);
  	}
***************
*** 1789,1798 **** estimate_path_cost_size(PlannerInfo *root,
--- 1842,1853 ----
  	 */
  	if (fpinfo->use_remote_estimate)
  	{
+ 		List	   *remote_conds;
  		List	   *remote_join_conds;
  		List	   *local_join_conds;
  		StringInfoData sql;
  		List	   *retrieved_attrs;
+ 		UserMapping *user;
  		PGconn	   *conn;
  		Selectivity local_sel;
  		QualCost	local_cost;
***************
*** 1804,1830 **** estimate_path_cost_size(PlannerInfo *root,
  		classifyConditions(root, baserel, join_conds,
  						   &remote_join_conds, &local_join_conds);
  
  		/*
  		 * Construct EXPLAIN query including the desired SELECT, FROM, and
  		 * WHERE clauses.  Params and other-relation Vars are replaced by
  		 * dummy values.
  		 */
  		initStringInfo(&sql);
  		appendStringInfoString(&sql, "EXPLAIN ");
! 		deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used,
! 						 &retrieved_attrs);
! 		if (fpinfo->remote_conds)
! 			appendWhereClause(&sql, root, baserel, fpinfo->remote_conds,
! 							  true, NULL);
! 		if (remote_join_conds)
! 			appendWhereClause(&sql, root, baserel, remote_join_conds,
! 							  (fpinfo->remote_conds == NIL), NULL);
! 
! 		if (pathkeys)
! 			appendOrderByClause(&sql, root, baserel, pathkeys);
  
  		/* Get the remote estimate */
! 		conn = GetConnection(fpinfo->server, fpinfo->user, false);
  		get_remote_estimate(sql.data, conn, &rows, &width,
  							&startup_cost, &total_cost);
  		ReleaseConnection(conn);
--- 1859,1882 ----
  		classifyConditions(root, baserel, join_conds,
  						   &remote_join_conds, &local_join_conds);
  
+ 		remote_conds = copyObject(fpinfo->remote_conds);
+ 		remote_conds = list_concat(remote_conds, remote_join_conds);
+ 
  		/*
  		 * Construct EXPLAIN query including the desired SELECT, FROM, and
  		 * WHERE clauses.  Params and other-relation Vars are replaced by
  		 * dummy values.
+ 	 * Here we discard params_list and fdw_scan_tlist because they are
+ 	 * unnecessary for EXPLAIN.
  		 */
  		initStringInfo(&sql);
  		appendStringInfoString(&sql, "EXPLAIN ");
! 		deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used, remote_conds,
! 						 pathkeys, NULL, NULL, &retrieved_attrs, NULL, false);
  
  		/* Get the remote estimate */
! 		user = GetUserMappingById(fpinfo->umid);
! 		conn = GetConnection(fpinfo->server, user, false);
  		get_remote_estimate(sql.data, conn, &rows, &width,
  							&startup_cost, &total_cost);
  		ReleaseConnection(conn);
***************
*** 2136,2142 **** fetch_more_data(ForeignScanState *node)
  		{
  			fsstate->tuples[i] =
  				make_tuple_from_result_row(res, i,
! 										   fsstate->rel,
  										   fsstate->attinmeta,
  										   fsstate->retrieved_attrs,
  										   fsstate->temp_cxt);
--- 2188,2196 ----
  		{
  			fsstate->tuples[i] =
  				make_tuple_from_result_row(res, i,
! 										   fsstate->relname,
! 										   fsstate->query,
! 										   fsstate->tupdesc,
  										   fsstate->attinmeta,
  										   fsstate->retrieved_attrs,
  										   fsstate->temp_cxt);
***************
*** 2354,2360 **** store_returning_result(PgFdwModifyState *fmstate,
  		HeapTuple	newtup;
  
  		newtup = make_tuple_from_result_row(res, 0,
! 											fmstate->rel,
  											fmstate->attinmeta,
  											fmstate->retrieved_attrs,
  											fmstate->temp_cxt);
--- 2408,2416 ----
  		HeapTuple	newtup;
  
  		newtup = make_tuple_from_result_row(res, 0,
! 										RelationGetRelationName(fmstate->rel),
! 											fmstate->query,
! 											RelationGetDescr(fmstate->rel),
  											fmstate->attinmeta,
  											fmstate->retrieved_attrs,
  											fmstate->temp_cxt);
***************
*** 2504,2509 **** postgresAcquireSampleRowsFunc(Relation relation, int elevel,
--- 2560,2566 ----
  	initStringInfo(&sql);
  	appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
  	deparseAnalyzeSql(&sql, relation, &astate.retrieved_attrs);
+ 	astate.query = sql.data;
  
  	/* In what follows, do not risk leaking any PGresults. */
  	PG_TRY();
***************
*** 2645,2651 **** analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
  		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
  
  		astate->rows[pos] = make_tuple_from_result_row(res, row,
! 													   astate->rel,
  													   astate->attinmeta,
  													 astate->retrieved_attrs,
  													   astate->temp_cxt);
--- 2702,2710 ----
  		oldcontext = MemoryContextSwitchTo(astate->anl_cxt);
  
  		astate->rows[pos] = make_tuple_from_result_row(res, row,
! 										   RelationGetRelationName(astate->rel),
! 													   astate->query,
! 											   RelationGetDescr(astate->rel),
  													   astate->attinmeta,
  													 astate->retrieved_attrs,
  													   astate->temp_cxt);
***************
*** 2919,2924 **** postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
--- 2978,3308 ----
  }
  
  /*
+  * Construct PgFdwRelationInfo from two join sources
+  */
+ static void
+ merge_fpinfo(RelOptInfo *outerrel,
+ 			 RelOptInfo *innerrel,
+ 			 PgFdwRelationInfo *fpinfo,
+ 			 JoinType jointype,
+ 			 double rows,
+ 			 int width)
+ {
+ 	PgFdwRelationInfo *fpinfo_o;
+ 	PgFdwRelationInfo *fpinfo_i;
+ 
+ 	fpinfo_o = (PgFdwRelationInfo *) outerrel->fdw_private;
+ 	fpinfo_i = (PgFdwRelationInfo *) innerrel->fdw_private;
+ 
+ 	/* Mark that this join can be pushed down safely */
+ 	fpinfo->pushdown_safe = true;
+ 
+ 	/* The join relation takes over the conditions of both sources */
+ 	fpinfo->remote_conds = list_concat(copyObject(fpinfo_o->remote_conds),
+ 									   copyObject(fpinfo_i->remote_conds));
+ 	fpinfo->local_conds = list_concat(copyObject(fpinfo_o->local_conds),
+ 									  copyObject(fpinfo_i->local_conds));
+ 
+ 	/* attrs_used is used only for a simple foreign table scan */
+ 	fpinfo->attrs_used = NULL;
+ 
+ 	/* rows and width will be set later */
+ 	fpinfo->rows = rows;
+ 	fpinfo->width = width;
+ 
+ 	/* A join has local conditions from both outer and inner, so sum them up. */
+ 	fpinfo->local_conds_cost.startup = fpinfo_o->local_conds_cost.startup +
+ 									   fpinfo_i->local_conds_cost.startup;
+ 	fpinfo->local_conds_cost.per_tuple = fpinfo_o->local_conds_cost.per_tuple +
+ 										 fpinfo_i->local_conds_cost.per_tuple;
+ 
+ 	/* Don't consider correlation between local filters. */
+ 	fpinfo->local_conds_sel = fpinfo_o->local_conds_sel *
+ 							  fpinfo_i->local_conds_sel;
+ 
+ 	fpinfo->use_remote_estimate = false;
+ 
+ 	/*
+ 	 * These two come from the default or per-server settings, so outer and
+ 	 * inner must have the same values.
+ 	 */
+ 	fpinfo->fdw_startup_cost = fpinfo_o->fdw_startup_cost;
+ 	fpinfo->fdw_tuple_cost = fpinfo_o->fdw_tuple_cost;
+ 
+ 	/*
+ 	 * TODO estimate more accurately
+ 	 */
+ 	fpinfo->startup_cost = fpinfo->fdw_startup_cost +
+ 						   fpinfo->local_conds_cost.startup;
+ 	fpinfo->total_cost = fpinfo->startup_cost +
+ 						 (fpinfo->fdw_tuple_cost +
+ 						  fpinfo->local_conds_cost.per_tuple +
+ 						  cpu_tuple_cost) * fpinfo->rows;
+ 
+ 	/* server and umid are identical for outer and inner, respectively */
+ 	fpinfo->server = fpinfo_o->server;
+ 	fpinfo->umid = fpinfo_o->umid;
+ 
+ 	fpinfo->outerrel = outerrel;
+ 	fpinfo->innerrel = innerrel;
+ 	fpinfo->jointype = jointype;
+ 
+ 	/* joinclauses and otherclauses will be set later */
+ }
+ 
+ /*
+  * Get a copy of an unsorted, unparameterized path from the given list
+  */
+ static Path *
+ get_unsorted_unparameterized_path(List *paths)
+ {
+ 	ListCell   *l;
+ 
+ 	foreach(l, paths)
+ 	{
+ 		Path	   *path = (Path *) lfirst(l);
+ 
+ 		if (path->pathkeys == NIL && path->param_info == NULL)
+ 		{
+ 			switch (path->pathtype)
+ 			{
+ 				case T_MergeJoin:
+ 					{
+ 						MergePath  *retval = makeNode(MergePath);
+ 						*retval = *((MergePath *) path);
+ 						return (Path *) retval;
+ 					}
+ 				case T_HashJoin:
+ 					{
+ 						HashPath   *retval = makeNode(HashPath);
+ 						*retval = *((HashPath *) path);
+ 						return (Path *) retval;
+ 					}
+ 				case T_NestLoop:
+ 					{
+ 						NestPath   *retval = makeNode(NestPath);
+ 						*retval = *((NestPath *) path);
+ 						return (Path *) retval;
+ 					}
+ 				default:
+ 					elog(ERROR, "unrecognized node type: %d",
+ 						 (int) path->pathtype);
+ 					return NULL;
+ 			}
+ 		}
+ 	}
+ 	return NULL;
+ }
+ 
+ /*
+  * postgresGetForeignJoinPaths
+  *		Add possible ForeignPath to joinrel.
+  *
+  * Joins satisfying the conditions below can be pushed down to the remote
+  * PostgreSQL server.
+  *
+  * 1) Join type is INNER or OUTER (one of LEFT/RIGHT/FULL)
+  * 2) Both outer and inner portions are safe to push down
+  * 3) All foreign tables in the join belong to the same foreign server
+  * 4) All join conditions are safe to push down
+  * 5) No relation has a local filter (this could be relaxed for an INNER
+  * JOIN with no volatile functions/operators, but for now we take the
+  * safer way)
+  */
+ static void
+ postgresGetForeignJoinPaths(PlannerInfo *root,
+ 							RelOptInfo *joinrel,
+ 							RelOptInfo *outerrel,
+ 							RelOptInfo *innerrel,
+ 							JoinType jointype,
+ 							JoinPathExtraData *extra)
+ {
+ 	PgFdwRelationInfo *fpinfo;
+ 	PgFdwRelationInfo *fpinfo_o;
+ 	PgFdwRelationInfo *fpinfo_i;
+ 	ForeignPath	   *joinpath;
+ 	double			rows;
+ 	Cost			startup_cost;
+ 	Cost			total_cost;
+ 	Path		   *subpath;
+ 
+ 	ListCell	   *lc;
+ 	List		   *joinclauses;
+ 	List		   *otherclauses;
+ 
+ 	/*
+ 	 * Skip if this join combination has been considered already.
+ 	 */
+ 	if (joinrel->fdw_private)
+ 	{
+ 		ereport(DEBUG3, (errmsg("combination already considered")));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Create an unfinished PgFdwRelationInfo entry, which is used to indicate
+ 	 * that the join relation has already been considered but the join can't
+ 	 * be pushed down.  Once we know that this join can be pushed down, we
+ 	 * fill the entry and make it valid by calling merge_fpinfo.
+ 	 *
+ 	 * This unfinished entry prevents redundant checks for a join combination
+ 	 * that is already known to be unsafe to push down.
+ 	 */
+ 	fpinfo = (PgFdwRelationInfo *) palloc0(sizeof(PgFdwRelationInfo));
+ 	fpinfo->pushdown_safe = false;
+ 	joinrel->fdw_private = fpinfo;
+ 
+ 	/*
+ 	 * We support all outer joins in addition to inner joins.  A CROSS JOIN
+ 	 * is internally an INNER JOIN with no conditions, so it is checked later.
+ 	 */
+ 	if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
+ 		jointype != JOIN_RIGHT && jointype != JOIN_FULL)
+ 	{
+ 		ereport(DEBUG3, (errmsg("unsupported join type (SEMI, ANTI)")));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * A valid PgFdwRelationInfo marked as "safe to push down" in
+ 	 * RelOptInfo#fdw_private indicates that scanning the relation can be
+ 	 * pushed down.  If either side lacks a PgFdwRelationInfo, or it is not
+ 	 * marked as safe, give up pushing down this join relation.
+ 	 */
+ 	fpinfo_o = (PgFdwRelationInfo *) outerrel->fdw_private;
+ 	if (!fpinfo_o || !fpinfo_o->pushdown_safe)
+ 	{
+ 		ereport(DEBUG3, (errmsg("outer is not safe to push-down")));
+ 		return;
+ 	}
+ 	fpinfo_i = (PgFdwRelationInfo *) innerrel->fdw_private;
+ 	if (!fpinfo_i || !fpinfo_i->pushdown_safe)
+ 	{
+ 		ereport(DEBUG3, (errmsg("inner is not safe to push-down")));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * All relations in the join must belong to the same server.  Having a
+ 	 * valid fdw_private means that all relations under it belong to the
+ 	 * server it records, so it suffices to compare the serverids of the
+ 	 * outer and inner relations.
+ 	 */
+ 	if (fpinfo_o->server->serverid != fpinfo_i->server->serverid)
+ 	{
+ 		ereport(DEBUG3, (errmsg("server mismatch")));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * No source relation can have local conditions.  This can be relaxed
+ 	 * if the join is an inner join and the local conditions don't contain
+ 	 * volatile functions/operators, but for now we leave it as a future
+ 	 * enhancement.
+ 	 */
+ 	if (fpinfo_o->local_conds != NULL || fpinfo_i->local_conds != NULL)
+ 	{
+ 		ereport(DEBUG3, (errmsg("join with local filter")));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Separate restrictlist into two lists, join conditions and remote filters.
+ 	 */
+ 	joinclauses = extra->restrictlist;
+ 	if (IS_OUTER_JOIN(jointype))
+ 	{
+ 		extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
+ 	}
+ 	else
+ 	{
+ 		joinclauses = extract_actual_clauses(joinclauses, false);
+ 		otherclauses = NIL;
+ 	}
+ 
+ 	/*
+ 	 * Note that a CROSS JOIN (cartesian product) is transformed to JOIN_INNER
+ 	 * with empty joinclauses.  Pushing down a CROSS JOIN usually produces a
+ 	 * larger result than retrieving each table separately, so we don't push
+ 	 * down such joins.
+ 	 */
+ 	if (jointype == JOIN_INNER && joinclauses == NIL)
+ 	{
+ 		ereport(DEBUG3, (errmsg("unsupported join type (CROSS)")));
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Join conditions must be safe to push down.
+ 	 */
+ 	foreach(lc, joinclauses)
+ 	{
+ 		Expr *expr = (Expr *) lfirst(lc);
+ 
+ 		if (!is_foreign_expr(root, joinrel, expr))
+ 		{
+ 			ereport(DEBUG3, (errmsg("join quals contain unsafe conditions")));
+ 			return;
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Other conditions for the join must also be safe to push down.
+ 	 */
+ 	foreach(lc, otherclauses)
+ 	{
+ 		Expr *expr = (Expr *) lfirst(lc);
+ 
+ 		if (!is_foreign_expr(root, joinrel, expr))
+ 		{
+ 			ereport(DEBUG3, (errmsg("filter contains unsafe conditions")));
+ 			return;
+ 		}
+ 	}
+ 
+ 	/* Here we know that this join can be pushed down to the remote side. */
+ 
+ 	/* Construct fpinfo for the join relation */
+ 	merge_fpinfo(outerrel, innerrel, fpinfo, jointype, joinrel->rows,
+ 				 joinrel->width); 
+ 	fpinfo->joinclauses = joinclauses;
+ 	fpinfo->otherclauses = otherclauses;
+ 
+ 	/* TODO determine more accurate cost and rows of the join. */
+ 	rows = joinrel->rows;
+ 	startup_cost = fpinfo->startup_cost;
+ 	total_cost = fpinfo->total_cost;
+ 
+ 	/* Get an alternative path for this foreign join */
+ 	subpath = get_unsorted_unparameterized_path(joinrel->pathlist);
+ 	if (subpath == NULL)
+ 		elog(ERROR, "could not get any alternative path for a foreign join");
+ 
+ 	/*
+ 	 * Create a new join path and add it to the joinrel which represents a join
+ 	 * between foreign tables.
+ 	 */
+ 	joinpath = create_foreignscan_path(root,
+ 									   joinrel,
+ 									   rows,
+ 									   startup_cost,
+ 									   total_cost,
+ 									   NIL,		/* no pathkeys */
+ 									   NULL,	/* no required_outer */
+ 									   subpath,
+ 									   NIL);	/* no fdw_private */
+ 
+ 	/* Add generated path into joinrel by add_path(). */
+ 	add_path(joinrel, (Path *) joinpath);
+ 	elog(DEBUG3, "join path added for (%s) join (%s)",
+ 		 bms_to_str(outerrel->relids), bms_to_str(innerrel->relids));
+ 
+ 	/* TODO consider parameterized paths */
+ }
+ 
+ /*
   * Create a tuple from the specified row of the PGresult.
   *
   * rel is the local representation of the foreign table, attinmeta is
***************
*** 2929,2941 **** postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
  static HeapTuple
  make_tuple_from_result_row(PGresult *res,
  						   int row,
! 						   Relation rel,
  						   AttInMetadata *attinmeta,
  						   List *retrieved_attrs,
  						   MemoryContext temp_context)
  {
  	HeapTuple	tuple;
- 	TupleDesc	tupdesc = RelationGetDescr(rel);
  	Datum	   *values;
  	bool	   *nulls;
  	ItemPointer ctid = NULL;
--- 3313,3326 ----
  static HeapTuple
  make_tuple_from_result_row(PGresult *res,
  						   int row,
! 						   const char *relname,
! 						   const char *query,
! 						   TupleDesc tupdesc,
  						   AttInMetadata *attinmeta,
  						   List *retrieved_attrs,
  						   MemoryContext temp_context)
  {
  	HeapTuple	tuple;
  	Datum	   *values;
  	bool	   *nulls;
  	ItemPointer ctid = NULL;
***************
*** 2962,2968 **** make_tuple_from_result_row(PGresult *res,
  	/*
  	 * Set up and install callback to report where conversion error occurs.
  	 */
! 	errpos.rel = rel;
  	errpos.cur_attno = 0;
  	errcallback.callback = conversion_error_callback;
  	errcallback.arg = (void *) &errpos;
--- 3347,3355 ----
  	/*
  	 * Set up and install callback to report where conversion error occurs.
  	 */
! 	errpos.relname = relname;
! 	errpos.query = query;
! 	errpos.tupdesc = tupdesc;
  	errpos.cur_attno = 0;
  	errcallback.callback = conversion_error_callback;
  	errcallback.arg = (void *) &errpos;
***************
*** 3052,3064 **** make_tuple_from_result_row(PGresult *res,
  static void
  conversion_error_callback(void *arg)
  {
  	ConversionLocation *errpos = (ConversionLocation *) arg;
! 	TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
  
  	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
! 		errcontext("column \"%s\" of foreign table \"%s\"",
! 				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
! 				   RelationGetRelationName(errpos->rel));
  }
  
  /*
--- 3439,3479 ----
  static void
  conversion_error_callback(void *arg)
  {
+ 	const char *attname;
+ 	const char *relname;
  	ConversionLocation *errpos = (ConversionLocation *) arg;
! 	TupleDesc	tupdesc = errpos->tupdesc;
! 	StringInfoData buf;
! 
! 	if (errpos->relname)
! 	{
! 		/* error occurred in a scan against a foreign table */ 
! 		initStringInfo(&buf);
! 		if (errpos->cur_attno > 0)
! 			appendStringInfo(&buf, "column \"%s\"",
! 					 NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname));
! 		else if (errpos->cur_attno == SelfItemPointerAttributeNumber)
! 			appendStringInfoString(&buf, "column \"ctid\"");
! 		attname = buf.data;
! 
! 		initStringInfo(&buf);
! 		appendStringInfo(&buf, "foreign table \"%s\"", errpos->relname);
! 		relname = buf.data;
! 	}
! 	else
! 	{
! 		/* error occurred in a scan against a foreign join */ 
! 		initStringInfo(&buf);
! 		appendStringInfo(&buf, "column %d", errpos->cur_attno - 1);
! 		attname = buf.data;
! 
! 		initStringInfo(&buf);
! 		appendStringInfo(&buf, "foreign join \"%s\"", errpos->query);
! 		relname = buf.data;
! 	}
  
  	if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
! 		errcontext("%s of %s", attname, relname);
  }
  
  /*
*** a/contrib/postgres_fdw/postgres_fdw.h
--- b/contrib/postgres_fdw/postgres_fdw.h
***************
*** 15,20 ****
--- 15,21 ----
  
  #include "foreign/foreign.h"
  #include "lib/stringinfo.h"
+ #include "nodes/plannodes.h"
  #include "nodes/relation.h"
  #include "utils/relcache.h"
  
***************
*** 26,31 ****
--- 27,38 ----
   */
  typedef struct PgFdwRelationInfo
  {
+ 	/*
+ 	 * True means that scanning this relation can be pushed down.  Always
+ 	 * true for a simple foreign scan.
+ 	 */
+ 	bool		pushdown_safe;
+ 
  	/* baserestrictinfo clauses, broken down into safe and unsafe subsets. */
  	List	   *remote_conds;
  	List	   *local_conds;
***************
*** 52,58 **** typedef struct PgFdwRelationInfo
  	/* Cached catalog information. */
  	ForeignTable *table;
  	ForeignServer *server;
! 	UserMapping *user;			/* only set in use_remote_estimate mode */
  } PgFdwRelationInfo;
  
  /* in postgres_fdw.c */
--- 59,72 ----
  	/* Cached catalog information. */
  	ForeignTable *table;
  	ForeignServer *server;
! 	Oid			umid;
! 
! 	/* Join information */
! 	RelOptInfo *outerrel;
! 	RelOptInfo *innerrel;
! 	JoinType	jointype;
! 	List	   *joinclauses;
! 	List	   *otherclauses;
  } PgFdwRelationInfo;
  
  /* in postgres_fdw.c */
***************
*** 88,100 **** extern void deparseSelectSql(StringInfo buf,
  				 PlannerInfo *root,
  				 RelOptInfo *baserel,
  				 Bitmapset *attrs_used,
! 				 List **retrieved_attrs);
! extern void appendWhereClause(StringInfo buf,
  				  PlannerInfo *root,
  				  RelOptInfo *baserel,
  				  List *exprs,
! 				  bool is_first,
  				  List **params);
  extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
  				 Index rtindex, Relation rel,
  				 List *targetAttrs, bool doNothing, List *returningList,
--- 102,134 ----
  				 PlannerInfo *root,
  				 RelOptInfo *baserel,
  				 Bitmapset *attrs_used,
! 				 List *remote_conds,
! 				 List *pathkeys,
! 				 List **params_list,
! 				 List **fdw_scan_tlist,
! 				 List **retrieved_attrs,
! 				 StringInfo relations,
! 				 bool alias);
! extern void appendConditions(StringInfo buf,
  				  PlannerInfo *root,
  				  RelOptInfo *baserel,
+ 				  List *outertlist,
+ 				  List *innertlist,
  				  List *exprs,
! 				  const char *prefix,
  				  List **params);
+ extern void deparseJoinSql(StringInfo sql,
+ 			   PlannerInfo *root,
+ 			   RelOptInfo *baserel,
+ 			   RelOptInfo *outerrel,
+ 			   RelOptInfo *innerrel,
+ 			   const char *sql_o,
+ 			   const char *sql_i,
+ 			   JoinType jointype,
+ 			   List *joinclauses,
+ 			   List *otherclauses,
+ 			   List **fdw_scan_tlist,
+ 			   List **retrieved_attrs);
  extern void deparseInsertSql(StringInfo buf, PlannerInfo *root,
  				 Index rtindex, Relation rel,
  				 List *targetAttrs, bool doNothing, List *returningList,
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 11,22 **** DO $d$
--- 11,27 ----
              OPTIONS (dbname '$$||current_database()||$$',
                       port '$$||current_setting('port')||$$'
              )$$;
+         EXECUTE $$CREATE SERVER loopback2 FOREIGN DATA WRAPPER postgres_fdw
+             OPTIONS (dbname '$$||current_database()||$$',
+                      port '$$||current_setting('port')||$$'
+             )$$;
      END;
  $d$;
  
  CREATE USER MAPPING FOR public SERVER testserver1
  	OPTIONS (user 'value', password 'value');
  CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
+ CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
  
  -- ===================================================================
  -- create objects used through FDW loopback server
***************
*** 39,44 **** CREATE TABLE "S 1"."T 2" (
--- 44,61 ----
  	c2 text,
  	CONSTRAINT t2_pkey PRIMARY KEY (c1)
  );
+ CREATE TABLE "S 1"."T 3" (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c3 text,
+ 	CONSTRAINT t3_pkey PRIMARY KEY (c1)
+ );
+ CREATE TABLE "S 1"."T 4" (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c4 text,
+ 	CONSTRAINT t4_pkey PRIMARY KEY (c1)
+ );
  
  INSERT INTO "S 1"."T 1"
  	SELECT id,
***************
*** 54,62 **** INSERT INTO "S 1"."T 2"
--- 71,93 ----
  	SELECT id,
  	       'AAA' || to_char(id, 'FM000')
  	FROM generate_series(1, 100) id;
+ INSERT INTO "S 1"."T 3"
+ 	SELECT id,
+ 	       id + 1,
+ 	       'AAA' || to_char(id, 'FM000')
+ 	FROM generate_series(1, 100) id;
+ DELETE FROM "S 1"."T 3" WHERE c1 % 2 != 0;	-- delete for outer join tests
+ INSERT INTO "S 1"."T 4"
+ 	SELECT id,
+ 	       id + 1,
+ 	       'AAA' || to_char(id, 'FM000')
+ 	FROM generate_series(1, 100) id;
+ DELETE FROM "S 1"."T 4" WHERE c1 % 3 != 0;	-- delete for outer join tests
  
  ANALYZE "S 1"."T 1";
  ANALYZE "S 1"."T 2";
+ ANALYZE "S 1"."T 3";
+ ANALYZE "S 1"."T 4";
  
  -- ===================================================================
  -- create foreign tables
***************
*** 87,92 **** CREATE FOREIGN TABLE ft2 (
--- 118,146 ----
  ) SERVER loopback;
  ALTER FOREIGN TABLE ft2 DROP COLUMN cx;
  
+ CREATE FOREIGN TABLE ft4 (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c3 text
+ ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 3');
+ 
+ CREATE FOREIGN TABLE ft5 (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c3 text
+ ) SERVER loopback OPTIONS (schema_name 'S 1', table_name 'T 4');
+ 
+ CREATE FOREIGN TABLE ft6 (
+ 	c1 int NOT NULL,
+ 	c2 int NOT NULL,
+ 	c3 text
+ ) SERVER loopback2 OPTIONS (schema_name 'S 1', table_name 'T 4');
+ CREATE USER view_owner;
+ GRANT ALL ON ft5 TO view_owner;
+ CREATE VIEW v_ft5 AS SELECT * FROM ft5;
+ ALTER VIEW v_ft5 OWNER TO view_owner;
+ CREATE USER MAPPING FOR view_owner SERVER loopback;
+ 
  -- ===================================================================
  -- tests for validator
  -- ===================================================================
***************
*** 168,175 **** EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE;
  SELECT * FROM ft1 t1 WHERE c1 = 102 FOR SHARE;
  -- aggregate
  SELECT COUNT(*) FROM ft1 t1;
- -- join two tables
- SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
  -- subquery
  SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <= 10) ORDER BY c1;
  -- subquery+MAX
--- 222,227 ----
***************
*** 257,262 **** EXPLAIN (VERBOSE, COSTS false)
--- 309,394 ----
  SELECT count(c3) FROM ft1 t1 WHERE t1.c1 === t1.c2;
  
  -- ===================================================================
+ -- JOIN queries
+ -- ===================================================================
+ -- join two tables
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ -- join three tables
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+ SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
+ -- left outer join
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft4 t1 LEFT JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ -- right outer join
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft5 t1 RIGHT JOIN ft4 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t1.c1 OFFSET 10 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft5 t1 RIGHT JOIN ft4 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t1.c1 OFFSET 10 LIMIT 10;
+ -- full outer join
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 45 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 45 LIMIT 10;
+ -- full outer join + WHERE clause, only matched rows
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
+ -- join at WHERE clause 
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON true WHERE (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ -- join in CTE
+ EXPLAIN (COSTS false, VERBOSE)
+ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+ -- ctid with whole-row reference
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ -- partially unsafe to push down, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON t2.c1 = t2.c1 JOIN ft4 t3 ON t2.c1 = t3.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+ SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON t2.c1 = t2.c1 JOIN ft4 t3 ON t2.c1 = t3.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+ -- SEMI JOIN, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1 FROM ft1 t1 WHERE EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c1) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.c1 FROM ft1 t1 WHERE EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c1) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ -- ANTI JOIN, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
+ -- CROSS JOIN, not pushed down
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+ -- different server
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+ -- different effective user for permission check
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN v_ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN v_ft5 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
+ -- unsafe join conditions
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c8 = t2.c8) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c8 = t2.c8) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ -- local filter (unsafe conditions on one side)
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = 'foo' ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
+ -- Aggregate after UNION, for testing setrefs
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
+ SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
+ -- join two foreign tables and two local tables
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON t1.c1 = t2.c1 JOIN "S 1"."T 1" t3 ON t1.c1 = t3."C 1" JOIN "S 1"."T 2" t4 ON t1.c1 = t4.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+ SELECT t1.c1, t2.c1 FROM ft1 t1 LEFT JOIN ft2 t2 ON t1.c1 = t2.c1 JOIN "S 1"."T 1" t3 ON t1.c1 = t3."C 1" JOIN "S 1"."T 2" t4 ON t1.c1 = t4.c1 ORDER BY t1.c1 OFFSET 10 LIMIT 10;
+ 
+ -- ===================================================================
  -- parameterized queries
  -- ===================================================================
  -- simple join
***************
*** 880,882 **** DROP TYPE "Colors" CASCADE;
--- 1012,1018 ----
  IMPORT FOREIGN SCHEMA import_source LIMIT TO (t5)
    FROM SERVER loopback INTO import_dest5;  -- ERROR
  ROLLBACK;
+ 
+ -- Cleanup
+ DROP OWNED BY view_owner;
+ DROP USER view_owner;
#74Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#73)

On Tue, Dec 8, 2015 at 5:49 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

On 2015/12/08 3:06, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I think the core system likely needs visibility into where paths and
plans are present in node trees, and putting them somewhere inside
fdw_private would be going in the opposite direction.

Absolutely. You don't really want FDWs having to take responsibility
for setrefs.c processing of their node trees, for example. This is why
e.g. ForeignScan has both fdw_exprs and fdw_private.

I'm not too concerned about whether we have to adjust FDW-related APIs
as we go along. It's been clear from the beginning that we'd have to
do that, and we are nowhere near a point where we should promise that
we're done doing so.

OK, I'd vote for Robert's idea, then. I'd like to discuss the next
thing about his patch. As I mentioned in [1], the following change in
the patch will break the EXPLAIN output.

@@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState
*estate, int eflags)
scanstate->fdwroutine = fdwroutine;
scanstate->fdw_state = NULL;

+       /* Initialize any outer plan. */
+       if (outerPlanState(scanstate))
+               outerPlanState(scanstate) =
+                       ExecInitNode(outerPlan(node), estate, eflags);
+

As pointed out by Horiguchi-san, that's not correct, though; we should
initialize the outer plan if outerPlan(node) != NULL, not
outerPlanState(scanstate) != NULL. Attached is an updated version of
his patch.

Oops, good catch.

I'm also attaching an updated version of the postgres_fdw
join pushdown patch.

Is that based on Ashutosh's version of the patch, or are the two of
you developing independent of each other? We should avoid dueling
patches if possible.

You can find the breaking examples by doing the
regression tests in the postgres_fdw patch. Please apply the patches in
the following order:

epq-recheck-v6-efujita (attached)
usermapping_matching.patch in [2]
add_GetUserMappingById.patch in [2]
foreign_join_v16_efujita2.patch (attached)

As I proposed upthread, I think we could fix that by handling the outer
plan as in the patch [3]; a) the core initializes the outer plan and
stores it into somewhere in the ForeignScanState node, not the lefttree
of the ForeignScanState node, during ExecInitForeignScan, and b) when
the RecheckForeignScan routine gets called, the FDW extracts the plan
from the given ForeignScanState node and executes it. What do you think
about that?

I think the actual regression test outputs are fine, and that your
desire to suppress part of the plan tree from showing up in the
EXPLAIN output is misguided. I like it just the way it is. To
prevent user confusion, I think that when we add support to
postgres_fdw for this we might also want to add some documentation
explaining how to interpret this EXPLAIN output, but I don't think
there's any problem with the output itself.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#75Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Robert Haas (#74)

On 2015/12/09 1:13, Robert Haas wrote:

On Tue, Dec 8, 2015 at 5:49 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I'd like to discuss the next
thing about his patch. As I mentioned in [1], the following change in
the patch will break the EXPLAIN output.

@@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState
*estate, int eflags)
scanstate->fdwroutine = fdwroutine;
scanstate->fdw_state = NULL;

+       /* Initialize any outer plan. */
+       if (outerPlanState(scanstate))
+               outerPlanState(scanstate) =
+                       ExecInitNode(outerPlan(node), estate, eflags);
+

As pointed out by Horiguchi-san, that's not correct, though; we should
initialize the outer plan if outerPlan(node) != NULL, not
outerPlanState(scanstate) != NULL. Attached is an updated version of
his patch.

I'm also attaching an updated version of the postgres_fdw
join pushdown patch.

Is that based on Ashutosh's version of the patch, or are the two of
you developing independent of each other? We should avoid dueling
patches if possible.

That's not based on his version. I'll add the changes I've made to his
patch. IIUC, his version is an updated version of Hanada-san's original
patches that I've modified, so I guess that I could do that easily.
(I've added a helper function for creating a local join execution plan
for a given foreign join, but that is rushed work, so I'll rewrite it.)

You can find the breaking examples by doing the
regression tests in the postgres_fdw patch. Please apply the patches in
the following order:

epq-recheck-v6-efujita (attached)
usermapping_matching.patch in [2]
add_GetUserMappingById.patch in [2]
foreign_join_v16_efujita2.patch (attached)

As I proposed upthread, I think we could fix that by handling the outer
plan as in the patch [3]; a) the core initializes the outer plan and
stores it into somewhere in the ForeignScanState node, not the lefttree
of the ForeignScanState node, during ExecInitForeignScan, and b) when
the RecheckForeignScan routine gets called, the FDW extracts the plan
from the given ForeignScanState node and executes it. What do you think
about that?

I think the actual regression test outputs are fine, and that your
desire to suppress part of the plan tree from showing up in the
EXPLAIN output is misguided. I like it just the way it is. To
prevent user confusion, I think that when we add support to
postgres_fdw for this we might also want to add some documentation
explaining how to interpret this EXPLAIN output, but I don't think
there's any problem with the output itself.

I'm not sure that's a good idea. One reason is that I think it would be
more confusing to users when more than two foreign tables are involved
in a foreign join, as shown in the following example. Note that the
outer plans will be shown recursively. Another reason is that there is
no consistency between the costs of the outer plans and those of the
main plan.

postgres=# explain verbose select * from foo, bar, baz where foo.a =
bar.a and bar.a = baz.a for update;

                              QUERY PLAN
------------------------------------------------------------------------
 LockRows  (cost=100.00..100.45 rows=15 width=96)
   Output: foo.a, bar.a, baz.a, foo.*, bar.*, baz.*
   ->  Foreign Scan  (cost=100.00..100.30 rows=15 width=96)
         Output: foo.a, bar.a, baz.a, foo.*, bar.*, baz.*
         Relations: ((public.foo) INNER JOIN (public.bar)) INNER JOIN (public.baz)
         Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1, r.a2 FROM (SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT l.a9, ROW(l.a9) FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT r.a9, ROW(r.a9) FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a1 = r.a1))) l (a1, a2, a3, a4) INNER JOIN (SELECT r.a9, ROW(r.a9) FROM (SELECT a a9 FROM public.baz FOR UPDATE) r) r (a1, a2) ON ((l.a1 = r.a1))
         ->  Hash Join  (cost=272.13..272.69 rows=15 width=96)
               Output: foo.a, foo.*, bar.a, bar.*, baz.a, baz.*
               Hash Cond: (foo.a = baz.a)
               ->  Foreign Scan  (cost=100.00..100.04 rows=2 width=64)
                     Output: foo.a, foo.*, bar.a, bar.*
                     Relations: (public.foo) INNER JOIN (public.bar)
                     Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT l.a9, ROW(l.a9) FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1, a2) INNER JOIN (SELECT r.a9, ROW(r.a9) FROM (SELECT a a9 FROM public.bar FOR UPDATE) r) r (a1, a2) ON ((l.a1 = r.a1))
                     ->  Nested Loop  (cost=200.00..202.18 rows=2 width=64)
                           Output: foo.a, foo.*, bar.a, bar.*
                           Join Filter: (foo.a = bar.a)
                           ->  Foreign Scan on public.foo  (cost=100.00..101.06 rows=2 width=32)
                                 Output: foo.a, foo.*
                                 Remote SQL: SELECT a FROM public.foo FOR UPDATE
                           ->  Materialize  (cost=100.00..101.07 rows=2 width=32)
                                 Output: bar.a, bar.*
                                 ->  Foreign Scan on public.bar  (cost=100.00..101.06 rows=2 width=32)
                                       Output: bar.a, bar.*
                                       Remote SQL: SELECT a FROM public.bar FOR UPDATE
               ->  Hash  (cost=153.86..153.86 rows=1462 width=32)
                     Output: baz.a, baz.*
                     ->  Foreign Scan on public.baz  (cost=100.00..153.86 rows=1462 width=32)
                           Output: baz.a, baz.*
                           Remote SQL: SELECT a FROM public.baz FOR UPDATE
(29 rows)

Best regards,
Etsuro Fujita


#76Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#75)

On Tue, Dec 8, 2015 at 10:00 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I think the actual regression test outputs are fine, and that your
desire to suppress part of the plan tree from showing up in the
EXPLAIN output is misguided. I like it just the way it is. To
prevent user confusion, I think that when we add support to
postgres_fdw for this we might also want to add some documentation
explaining how to interpret this EXPLAIN output, but I don't think
there's any problem with the output itself.

I'm not sure that's a good idea. One reason is that I think it would be more
confusing to users when more than two foreign tables are involved in a foreign
join, as shown in the following example. Note that the outer plans will be
shown recursively. Another reason is that there is no consistency between the
costs of the outer plans and those of the main plan.

I still don't really see a problem here, but, regardless, the solution
can't be to hide nodes that are in fact present from the user. We can
talk about making further changes here, but hiding the nodes
altogether is categorically out in my mind.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#77Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#76)

On Tue, Dec 8, 2015 at 10:00 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I think the actual regression test outputs are fine, and that your
desire to suppress part of the plan tree from showing up in the
EXPLAIN output is misguided. I like it just the way it is. To
prevent user confusion, I think that when we add support to
postgres_fdw for this we might also want to add some documentation
explaining how to interpret this EXPLAIN output, but I don't think
there's any problem with the output itself.

I'm not sure that's a good idea. One reason is that I think it would be more
confusing to users when more than two foreign tables are involved in a foreign
join, as shown in the following example. Note that the outer plans will be
shown recursively. Another reason is that there is no consistency between the
costs of the outer plans and those of the main plan.

I still don't really see a problem here, but, regardless, the solution
can't be to hide nodes that are in fact present from the user. We can
talk about making further changes here, but hiding the nodes
altogether is categorically out in my mind.

Fujita-san,

If you really want to hide the alternative sub-plan, you can move the
outer planstate onto somewhere private field on BeginForeignScan,
then kick ExecProcNode() at the ForeignRecheck callback by itself.
Explain walks down the sub-plan if outerPlanState(planstate) is
valid. So, as long as your extension keeps the planstate privately,
it is not visible from the EXPLAIN.

Of course, I don't recommend it.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


#78Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Kouhei Kaigai (#77)

On 2015/12/09 13:26, Kouhei Kaigai wrote:

On Tue, Dec 8, 2015 at 10:00 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

I think the actual regression test outputs are fine, and that your
desire to suppress part of the plan tree from showing up in the
EXPLAIN output is misguided. I like it just the way it is. To
prevent user confusion, I think that when we add support to
postgres_fdw for this we might also want to add some documentation
explaining how to interpret this EXPLAIN output, but I don't think
there's any problem with the output itself.

I'm not sure that's a good idea. One reason is that I think it would be more
confusing to users when more than two foreign tables are involved in a foreign
join, as shown in the following example. Note that the outer plans will be
shown recursively. Another reason is that there is no consistency between the
costs of the outer plans and those of the main plan.

I still don't really see a problem here, but, regardless, the solution
can't be to hide nodes that are in fact present from the user. We can
talk about making further changes here, but hiding the nodes
altogether is categorically out in my mind.

If you really want to hide the alternative sub-plan, you can move the
outer planstate onto somewhere private field on BeginForeignScan,
then kick ExecProcNode() at the ForeignRecheck callback by itself.
Explain walks down the sub-plan if outerPlanState(planstate) is
valid. So, as long as your extension keeps the planstate privately,
it is not visible from the EXPLAIN.

Of course, I don't recommend it.

Sorry, my explanation might not have been enough, but I'm not saying to
hide the subplan. I think it would be better to show the subplan
somewhere in the EXPLAIN output, but I'm not sure it's a good idea to
show it in the current form. We have two plan trees: one for normal
query execution and another for EvalPlanQual testing. I think it'd be
better to present the EXPLAIN output in a way that allows users to
easily identify each of the plan trees.

Best regards,
Etsuro Fujita


#79Robert Haas
robertmhaas@gmail.com
In reply to: Etsuro Fujita (#78)

On Wed, Dec 9, 2015 at 3:22 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Sorry, my explanation might not have been enough, but I'm not saying to hide
the subplan. I think it would be better to show the subplan somewhere in the
EXPLAIN output, but I'm not sure it's a good idea to show it in the current
form. We have two plan trees: one for normal query execution and another for
EvalPlanQual testing. I think it'd be better to present the EXPLAIN output in
a way that allows users to easily identify each of the plan trees.

It's hard to do that because we don't identify that internally
anywhere. Like I said before, the possibility of a ForeignScan having
an outer subplan is formally independent of the new EPQ stuff, and I'd
prefer to maintain that separation and just address this with
documentation.

Getting this bug fixed has been one of the more exhausting experiences
of my involvement with PostgreSQL, and to be honest, I think I'd like
to stop spending too much time on this now and work on getting the
feature that this is intended to support working. Right now, the only
people who can have an opinion on this topic are those who are
following this thread in detail, and there really aren't that many of
those. If we get the feature - join pushdown for postgres_fdw -
working, then we might get some feedback from users about what they
like about it or don't, and certainly if this is a frequent complaint
then that bolsters the case for doing something about it, and possibly
also helps us figure out what that thing should be. On the other
hand, if we don't get the feature because we're busy debating
interface details related to this patch, then none of these details
matter anyway, because nobody except developers is actually running the
code in question.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#80Michael Paquier
michael.paquier@gmail.com
In reply to: Robert Haas (#79)

On Thu, Dec 10, 2015 at 1:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Dec 9, 2015 at 3:22 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:

Sorry, my explanation might not have been enough, but I'm not saying to hide
the subplan. I think it would be better to show the subplan somewhere in the
EXPLAIN output, but I'm not sure it's a good idea to show it in the current
form. We have two plan trees: one for normal query execution and another for
EvalPlanQual testing. I think it'd be better to present the EXPLAIN output in
a way that allows users to easily identify each of the plan trees.

It's hard to do that because we don't identify that internally
anywhere. Like I said before, the possibility of a ForeignScan having
an outer subplan is formally independent of the new EPQ stuff, and I'd
prefer to maintain that separation and just address this with
documentation.

Fujita-san, others, could this be addressed with documentation?

Getting this bug fixed has been one of the more exhausting experiences
of my involvement with PostgreSQL, and to be honest, I think I'd like
to stop spending too much time on this now and work on getting the
feature that this is intended to support working. Right now, the only
people who can have an opinion on this topic are those who are
following this thread in detail, and there really aren't that many of
those.

I'd put that number at mainly 3 people, you included :)

If we get the feature - join pushdown for postgres_fdw -
working, then we might get some feedback from users about what they
like about it or don't, and certainly if this is a frequent complaint
then that bolsters the case for doing something about it, and possibly
also helps us figure out what that thing should be. On the other
hand, if we don't get the feature because we're busy debating
interface details related to this patch, then none of these details
matter anyway, because nobody except developers is actually running the
code in question.

As this debate continues, I think that moving this patch to the next
CF would make the most sense. So done this way.
--
Michael


#81Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Michael Paquier (#80)

On 2015/12/22 15:24, Michael Paquier wrote:

On Thu, Dec 10, 2015 at 1:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

If we get the feature - join pushdown for postgres_fdw -
working, then we might get some feedback from users about what they
like about it or don't, and certainly if this is a frequent complaint
then that bolsters the case for doing something about it, and possibly
also helps us figure out what that thing should be. On the other
hand, if we don't get the feature because we're busy debating
interface details related to this patch, then none of these details
matter anyway, because nobody except developers is actually running the
code in question.

As this debate continues, I think that moving this patch to the next
CF would make the most sense. So done this way.

Perhaps, this ended (?) with the following commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=385f337c9f39b21dca96ca4770552a10a6d5af24

Thanks,
Amit


#82Michael Paquier
michael.paquier@gmail.com
In reply to: Amit Langote (#81)

On Tue, Dec 22, 2015 at 3:52 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2015/12/22 15:24, Michael Paquier wrote:

As this debate continues, I think that moving this patch to the next
CF would make the most sense. So done this way.

Perhaps, this ended (?) with the following commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=385f337c9f39b21dca96ca4770552a10a6d5af24

Ah, thanks! What has been committed is actually more or less
epq-recheck-v6-efujita.patch posted upthread, so I'll mark the patch as
committed.
--
Michael


#83Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#82)

On Tue, Dec 22, 2015 at 2:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Dec 22, 2015 at 3:52 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2015/12/22 15:24, Michael Paquier wrote:

As this debate continues, I think that moving this patch to the next
CF would make the most sense. So done this way.

Perhaps, this ended (?) with the following commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=385f337c9f39b21dca96ca4770552a10a6d5af24

Ah, thanks! What has been committed is actually more or less
epq-recheck-v6-efujita.patch posted upthread, so I'll mark the patch as
committed.

+1. And thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
