Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

Started by KaiGai Koheiabout 11 years ago39 messageshackers
Jump to latest
#1KaiGai Kohei
kaigai@ak.jp.nec.com

On Wed, Mar 18, 2015 at 9:33 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

So, overall consensus for the FDW hook location is just before the set_chepest()
at standard_join_search() and merge_clump(), isn't it?

Yes, I think so.

Let me make a design of FDW hook to reduce code duplications for each FDW driver,
especially, to identify baserel/joinrel to be involved in this join.

Great, thanks!

One issue, which I think Ashutosh alluded to upthread, is that we need
to make sure it's not unreasonably difficult for foreign data wrappers
to construct the FROM clause of an SQL query to be pushed down to the
remote side. It should be simple when there are only inner joins
involved, but when there are all outer joins it might be a bit
complex. It would be very good if someone could try to write that
code, based on the new hook locations, and see how it turns out, so
that we can figure out how to address any issues that may crop up
there.

Here is an idea that provides a common utility function that break down
the supplied RelOptInfo of joinrel into a pair of join-type and a list of
baserel/joinrel being involved in the relations join. It intends to be
called by FDW driver to list up underlying relations.
IIUC, root->join_info_list will provide information of how relations are
combined to the upper joined relations, thus, I expect it is not
unreasonably complicated way to solve.
Once a RelOptInfo of the target joinrel is broken down into multiple sub-
relations (N>=2 if all inner join, elsewhere N=2), FDW driver can
reference the RestrictInfo to be used in relations join.

Anyway, I'll try to investigate the existing code for more detail today,
to clarify whether the above approach is feasible.

Sounds good. Keep in mind that, while the parse tree will obviously
reflect the way that the user actually specified the join
syntactically, it's not the job of the join_info_list to make it
simple to reconstruct that information. To the contrary,
join_info_list is supposed to be structured in a way that makes it
easy to determine whether *a particular join order is one of the legal
join orders* not *whether it is the specific join order selected by
the user*. See join_is_legal().

For FDW pushdown, I think it's sufficient to be able to identify *any
one* legal join order, not necessarily the same order the user
originally entered. For exampe, if the user entered A LEFT JOIN B ON
A.x = B.x LEFT JOIN C ON A.y = C.y and the FDW generates a query that
instead does A LEFT JOIN C ON a.y = C.y LEFT JOIN B ON A.x = B.x, I
suspect that's just fine. Particular FDWs might wish to try to be
smart about what they emit based on knowledge of what the remote
side's optimizer is likely to do, and that's fine. If the remote side
is PostgreSQL, it shouldn't matter much.

Sorry for my response late. It was not easy to code during business trip.

The attached patch adds a hook for FDW/CSP to replace entire join-subtree
by a foreign/custom-scan, according to the discussion upthread.

GetForeignJoinPaths handler of FDW is simplified as follows:
typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
RelOptInfo *joinrel);

It takes PlannerInfo and RelOptInfo of the join-relation to be replaced
if available. RelOptInfo contains 'relids' bitmap, so FDW driver will be
able to know the relations to be involved and construct a remote join query.
However, it is not obvious with RelOptInfo to know how relations are joined.

The function below will help implement FDW driver that support remote join.

List *
get_joinrel_broken_down(PlannerInfo *root, RelOptInfo *joinrel,
SpecialJoinInfo **p_sjinfo)

It returns a list of RelOptInfo to be involved in the relations join that
is represented with 'joinrel', and also set a SpecialJoinInfo on the third
argument if not inner join.
In case of inner join, it returns multiple (more than or equal to 2)
relations to be inner-joined. Elsewhere, it returns two relations and
a valid SpecialJoinInfo.

The #if 0 ... #endif block is just for confirmation purpose to show
how hook is invoked and the joinrel is broken down with above service
routine.

postgres=# select * from t0 left join t1 on t1.aid=bid
left join t2 on t1.aid=cid
left join t3 on t1.aid=did
left join t4 on t1.aid=eid;
INFO: LEFT JOIN: t0, t1
INFO: LEFT JOIN: (t0, t1), t2
INFO: LEFT JOIN: (t0, t1), t3
INFO: LEFT JOIN: (t0, t1), t4
INFO: LEFT JOIN: (t0, t1, t3), t2
INFO: LEFT JOIN: (t0, t1, t4), t2
INFO: LEFT JOIN: (t0, t1, t4), t3
INFO: LEFT JOIN: (t0, t1, t3, t4), t2

In this case, joinrel is broken down into (t0, t1, t3, t4) and t2.
The earlier one is also joinrel, so it expects FDW driver will make
the get_joinrel_broken_down() call recurdively.

postgres=# explain select * from t0 natural join t1
natural join t2
natural join t3
natural join t4;
INFO: INNER JOIN: t0, t1
INFO: INNER JOIN: t0, t2
INFO: INNER JOIN: t0, t3
INFO: INNER JOIN: t0, t4
INFO: INNER JOIN: t0, t1, t2
INFO: INNER JOIN: t0, t1, t3
INFO: INNER JOIN: t0, t1, t4
INFO: INNER JOIN: t0, t2, t3
INFO: INNER JOIN: t0, t2, t4
INFO: INNER JOIN: t0, t3, t4
INFO: INNER JOIN: t0, t1, t2, t3
INFO: INNER JOIN: t0, t1, t2, t4
INFO: INNER JOIN: t0, t1, t3, t4
INFO: INNER JOIN: t0, t2, t3, t4
INFO: INNER JOIN: t0, t1, t2, t3, t4

In this case, joinrel is consist of inner join, so get_joinrel_broken_down()
returns a list that contains RelOptInfo of 6 base relations.

postgres=# explain select * from t0 natural join t1
left join t2 on t1.aid=t2.bid
natural join t3
natural join t4;
INFO: INNER JOIN: t0, t1
INFO: INNER JOIN: t0, t3
INFO: INNER JOIN: t0, t4
INFO: LEFT JOIN: t1, t2
INFO: INNER JOIN: (t1, t2), t0
INFO: INNER JOIN: t0, t1, t3
INFO: INNER JOIN: t0, t1, t4
INFO: INNER JOIN: t0, t3, t4
INFO: INNER JOIN: (t1, t2), t0, t3
INFO: INNER JOIN: (t1, t2), t0, t4
INFO: INNER JOIN: t0, t1, t3, t4
INFO: INNER JOIN: (t1, t2), t0, t3, t4

In mixture case, it keeps restriction of join legality (t1 and t2 must
be left joined) during its broken down.

At this moment, I'm not 100% certain about its logic. Especially, I didn't
test SEMI- and ANTI- join cases yet.
However, time is money - I want people to check overall design first, rather
than detailed debugging. Please tell me if I misunderstood the logic to break
down join relations.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.5-custom-join.v10.patchapplication/octet-stream; name=pgsql-v9.5-custom-join.v10.patchDownload+772-49
#2Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#1)

2015/03/23 9:12、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール:

Sorry for my response late. It was not easy to code during business trip.

The attached patch adds a hook for FDW/CSP to replace entire join-subtree
by a foreign/custom-scan, according to the discussion upthread.

GetForeignJoinPaths handler of FDW is simplified as follows:
typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
RelOptInfo *joinrel);

It’s not a critical issue but I’d like to propose to rename add_joinrel_extra_paths() to add_extra_paths_to_joinrel(), because the latter would make it more clear that it does extra work in addition to add_paths_to_joinrel().

It takes PlannerInfo and RelOptInfo of the join-relation to be replaced
if available. RelOptInfo contains 'relids' bitmap, so FDW driver will be
able to know the relations to be involved and construct a remote join query.
However, it is not obvious with RelOptInfo to know how relations are joined.

The function below will help implement FDW driver that support remote join.

List *
get_joinrel_broken_down(PlannerInfo *root, RelOptInfo *joinrel,
SpecialJoinInfo **p_sjinfo)

It returns a list of RelOptInfo to be involved in the relations join that
is represented with 'joinrel', and also set a SpecialJoinInfo on the third
argument if not inner join.
In case of inner join, it returns multiple (more than or equal to 2)
relations to be inner-joined. Elsewhere, it returns two relations and
a valid SpecialJoinInfo.

As far as I tested, it works fine for SEMI and ANTI.
# I want dump function of BitmapSet for debugging, as Node has nodeToString()...

At this moment, I'm not 100% certain about its logic. Especially, I didn't
test SEMI- and ANTI- join cases yet.
However, time is money - I want people to check overall design first, rather
than detailed debugging. Please tell me if I misunderstood the logic to break
down join relations.

With applying your patch, regression tests of “updatable view” failed. regression.diff contains some errors like this:
! ERROR: could not find RelOptInfo for given relids

Could you check that?


Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#2)

At this moment, I'm not 100% certain about its logic. Especially, I didn't
test SEMI- and ANTI- join cases yet.
However, time is money - I want people to check overall design first, rather
than detailed debugging. Please tell me if I misunderstood the logic to break
down join relations.

With applying your patch, regression tests of “updatable view” failed.
regression.diff contains some errors like this:
! ERROR: could not find RelOptInfo for given relids

Could you check that?

It is a bug around the logic to find out two RelOptInfo that can construct
another RelOptInfo of joinrel.
Even though I'm now working to correct the logic, it is not obvious to
identify two relids that satisfy joinrel->relids.
(Yep, law of entropy enhancement...)

On the other hands, we may have a solution that does not need a complicated
reconstruction process. The original concern was, FDW driver may add paths
that will replace entire join subtree by foreign-scan on remote join multiple
times, repeatedly, but these paths shall be identical.

If we put a hook for FDW/CSP on bottom of build_join_rel(), we may be able
to solve the problem more straight-forward and simply way.
Because build_join_rel() finds a cache on root->join_rel_hash then returns
immediately if required joinrelids already has its RelOptInfo, bottom of
this function never called twice on a particular set of joinrelids.
Once FDW/CSP constructs a path that replaces entire join subtree towards
the joinrel just after construction, it shall be kept until cheaper built-in
paths are added (if exists).

This idea has one other positive side-effect. We expect remote-join is cheaper
than local join with two remote scan in most cases. Once a much cheaper path
is added prior to local join consideration, add_path_precheck() breaks path
consideration earlier.

Please comment on.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: Shigeru HANADA [mailto:shigeru.hanada@gmail.com]
Sent: Tuesday, March 24, 2015 7:36 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Robert Haas; Tom Lane; Ashutosh Bapat; Thom Brown;
pgsql-hackers@postgreSQL.org
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015/03/23 9:12、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール:

Sorry for my response late. It was not easy to code during business trip.

The attached patch adds a hook for FDW/CSP to replace entire join-subtree
by a foreign/custom-scan, according to the discussion upthread.

GetForeignJoinPaths handler of FDW is simplified as follows:
typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
RelOptInfo *joinrel);

It’s not a critical issue but I’d like to propose to rename
add_joinrel_extra_paths() to add_extra_paths_to_joinrel(), because the latter
would make it more clear that it does extra work in addition to
add_paths_to_joinrel().

It takes PlannerInfo and RelOptInfo of the join-relation to be replaced
if available. RelOptInfo contains 'relids' bitmap, so FDW driver will be
able to know the relations to be involved and construct a remote join query.
However, it is not obvious with RelOptInfo to know how relations are joined.

The function below will help implement FDW driver that support remote join.

List *
get_joinrel_broken_down(PlannerInfo *root, RelOptInfo *joinrel,
SpecialJoinInfo **p_sjinfo)

It returns a list of RelOptInfo to be involved in the relations join that
is represented with 'joinrel', and also set a SpecialJoinInfo on the third
argument if not inner join.
In case of inner join, it returns multiple (more than or equal to 2)
relations to be inner-joined. Elsewhere, it returns two relations and
a valid SpecialJoinInfo.

As far as I tested, it works fine for SEMI and ANTI.
# I want dump function of BitmapSet for debugging, as Node has nodeToString()...

At this moment, I'm not 100% certain about its logic. Especially, I didn't
test SEMI- and ANTI- join cases yet.
However, time is money - I want people to check overall design first, rather
than detailed debugging. Please tell me if I misunderstood the logic to break
down join relations.

With applying your patch, regression tests of “updatable view” failed.
regression.diff contains some errors like this:
! ERROR: could not find RelOptInfo for given relids

Could you check that?


Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: KaiGai Kohei (#3)
Re: Custom/Foreign-Join-APIs

Hello, I had a look on this.

At Wed, 25 Mar 2015 03:59:28 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F8010C6819@BPXM15GP.gisp.nec.co.jp>

At this moment, I'm not 100% certain about its logic. Especially, I didn't
test SEMI- and ANTI- join cases yet.
However, time is money - I want people to check overall design first, rather
than detailed debugging. Please tell me if I misunderstood the logic to break
down join relations.

With applying your patch, regression tests of “updatable view” failed.
regression.diff contains some errors like this:
! ERROR: could not find RelOptInfo for given relids

Could you check that?

It is a bug around the logic to find out two RelOptInfo that can construct
another RelOptInfo of joinrel.

It is caused by split (or multilevel) joinlist. Setting
join_collapse_limit to 10 makes the query to go well.

I suppose that get_joinrel_broken_down should give up returning
result when given joinrel spans over multiple join subproblems,
becuase they cannot be merged by FDW anyway even if they
comformed the basic requirements for merging.

Even though I'm now working to correct the logic, it is not obvious to
identify two relids that satisfy joinrel->relids.
(Yep, law of entropy enhancement...)

On the other hands, we may have a solution that does not need a complicated
reconstruction process. The original concern was, FDW driver may add paths
that will replace entire join subtree by foreign-scan on remote join multiple
times, repeatedly, but these paths shall be identical.

If we put a hook for FDW/CSP on bottom of build_join_rel(), we may be able
to solve the problem more straight-forward and simply way.
Because build_join_rel() finds a cache on root->join_rel_hash then returns
immediately if required joinrelids already has its RelOptInfo, bottom of
this function never called twice on a particular set of joinrelids.
Once FDW/CSP constructs a path that replaces entire join subtree towards
the joinrel just after construction, it shall be kept until cheaper built-in
paths are added (if exists).

This idea has one other positive side-effect. We expect remote-join is cheaper
than local join with two remote scan in most cases. Once a much cheaper path
is added prior to local join consideration, add_path_precheck() breaks path
consideration earlier.

+1 as a whole.

regards,

--
堀口恭太郎

日本電信電話株式会社 NTTオープンソースソフトウェアセンタ
Phone: 03-5860-5115 / Fax: 03-5463-5490

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#3)

2015/03/25 12:59、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール:

At this moment, I'm not 100% certain about its logic. Especially, I didn't
test SEMI- and ANTI- join cases yet.
However, time is money - I want people to check overall design first, rather
than detailed debugging. Please tell me if I misunderstood the logic to break
down join relations.

With applying your patch, regression tests of “updatable view” failed.
regression.diff contains some errors like this:
! ERROR: could not find RelOptInfo for given relids

Could you check that?

It is a bug around the logic to find out two RelOptInfo that can construct
another RelOptInfo of joinrel.
Even though I'm now working to correct the logic, it is not obvious to
identify two relids that satisfy joinrel->relids.
(Yep, law of entropy enhancement…)

IIUC, this problem is in only non-INNER JOINs because we can treat relations joined with only INNER JOIN in arbitrary order. But supporting OUTER JOINs would be necessary even for the first cut.

On the other hands, we may have a solution that does not need a complicated
reconstruction process. The original concern was, FDW driver may add paths
that will replace entire join subtree by foreign-scan on remote join multiple
times, repeatedly, but these paths shall be identical.

If we put a hook for FDW/CSP on bottom of build_join_rel(), we may be able
to solve the problem more straight-forward and simply way.
Because build_join_rel() finds a cache on root->join_rel_hash then returns
immediately if required joinrelids already has its RelOptInfo, bottom of
this function never called twice on a particular set of joinrelids.
Once FDW/CSP constructs a path that replaces entire join subtree towards
the joinrel just after construction, it shall be kept until cheaper built-in
paths are added (if exists).

This idea has one other positive side-effect. We expect remote-join is cheaper
than local join with two remote scan in most cases. Once a much cheaper path
is added prior to local join consideration, add_path_precheck() breaks path
consideration earlier.

Please comment on.

Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just building (or searching from a list) a RelOptInfo for given relids. After that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join type to generate actual Paths implements the join. make_join_rel() is called only once for particular relid combination, and there SpecialJoinInfo and restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising for FDW cases.

Though I’m not sure that it also fits custom join provider’s requirements.


Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Shigeru Hanada (#5)

On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com>
wrote:

Or bottom of make_join_rel(). IMO build_join_rel() is responsible for
just building (or searching from a list) a RelOptInfo for given relids.
After that make_join_rel() calls add_paths_to_joinrel() with appropriate
arguments per join type to generate actual Paths implements the join.
make_join_rel() is called only once for particular relid combination, and
there SpecialJoinInfo and restrictlist (conditions specified in JOIN-ON and
WHERE), so it seems promising for FDW cases.

I like that idea, but I think we will have complex hook signature, it won't
remain as simple as hook (root, joinrel).

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#7KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Ashutosh Bapat (#6)

On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
Or bottom of make_join_rel(). IMO build_join_rel() is responsible for
just building (or searching from a list) a RelOptInfo for given relids. After
that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per
join type to generate actual Paths implements the join. make_join_rel() is
called only once for particular relid combination, and there SpecialJoinInfo and
restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
for FDW cases.

I like that idea, but I think we will have complex hook signature, it won't remain
as simple as hook (root, joinrel).

In this case, GetForeignJoinPaths() will take root, joinrel, rel1, rel2,
sjinfo and restrictlist.
It is not too simple, but not complicated signature.

Even if we reconstruct rel1 and rel2 using sjinfo, we also need to compute
restrictlist using build_joinrel_restrictlist() again. It is a static function
in relnode.c. So, I don't think either of them has definitive advantage from
the standpoint of simplicity.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#7)

2015/03/25 19:09、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール:

On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
Or bottom of make_join_rel(). IMO build_join_rel() is responsible for
just building (or searching from a list) a RelOptInfo for given relids. After
that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per
join type to generate actual Paths implements the join. make_join_rel() is
called only once for particular relid combination, and there SpecialJoinInfo and
restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
for FDW cases.

I like that idea, but I think we will have complex hook signature, it won't remain
as simple as hook (root, joinrel).

In this case, GetForeignJoinPaths() will take root, joinrel, rel1, rel2,
sjinfo and restrictlist.
It is not too simple, but not complicated signature.

Even if we reconstruct rel1 and rel2 using sjinfo, we also need to compute
restrictlist using build_joinrel_restrictlist() again. It is a static function
in relnode.c. So, I don't think either of them has definitive advantage from
the standpoint of simplicity.

The bottom of make_join_rel() seems good from the viewpoint of information, but it is called multiple times for join combinations which are essentially identical, for INNER JOIN case like this:

fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid;
INFO: postgresGetForeignJoinPaths() 1x2
INFO: postgresGetForeignJoinPaths() 1x4
INFO: postgresGetForeignJoinPaths() 2x4
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: postgresGetForeignJoinPaths() 0x4
INFO: postgresGetForeignJoinPaths() 0x2
INFO: postgresGetForeignJoinPaths() 0x1
INFO: standard_join_search() old hook point
QUERY PLAN
---------------------------------------------------------
Foreign Scan (cost=100.00..102.11 rows=211 width=1068)
(1 row)

Here I’ve put probe point in the beginnig of GetForeignJoinPaths handler and just before set_cheapest() call in standard_join_search() as “old hook point”. In this example 1, 2, and 4 are base relations, and in the join level 3 planner calls GetForeignJoinPaths() three times for the combinations:

1) (1x2)x4
2) (1x4)x2
3) (2x4)x1

Tom’s suggestion is aiming at providing a chance to consider join push-down in more abstract level, IIUC. So it would be good to call handler only once for that case, for flattened combination (1x2x3).

Hum, how about skipping calling handler (or hook) if the joinrel was found by find_join_rel()? At least it suppress redundant call for different join orders, and handler can determine whether the combination can be flattened by checking that all RelOptInfo with RELOPT_JOINREL under joinrel has JOIN_INNER as jointype.


Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#5)

2015/03/25 12:59、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール:

At this moment, I'm not 100% certain about its logic. Especially, I didn't
test SEMI- and ANTI- join cases yet.
However, time is money - I want people to check overall design first, rather
than detailed debugging. Please tell me if I misunderstood the logic to break
down join relations.

With applying your patch, regression tests of “updatable view” failed.
regression.diff contains some errors like this:
! ERROR: could not find RelOptInfo for given relids

Could you check that?

It is a bug around the logic to find out two RelOptInfo that can construct
another RelOptInfo of joinrel.
Even though I'm now working to correct the logic, it is not obvious to
identify two relids that satisfy joinrel->relids.
(Yep, law of entropy enhancement…)

IIUC, this problem is in only non-INNER JOINs because we can treat relations joined
with only INNER JOIN in arbitrary order. But supporting OUTER JOINs would be
necessary even for the first cut.

Yep. In case when joinrel contains all inner-joined relations managed by same
FDW driver, job of get_joinrel_broken_down() is quite simple.
However, people want to support outer-join also, doesn't it?

On the other hands, we may have a solution that does not need a complicated
reconstruction process. The original concern was, FDW driver may add paths
that will replace entire join subtree by foreign-scan on remote join multiple
times, repeatedly, but these paths shall be identical.

If we put a hook for FDW/CSP on bottom of build_join_rel(), we may be able
to solve the problem more straight-forward and simply way.
Because build_join_rel() finds a cache on root->join_rel_hash then returns
immediately if required joinrelids already has its RelOptInfo, bottom of
this function never called twice on a particular set of joinrelids.
Once FDW/CSP constructs a path that replaces entire join subtree towards
the joinrel just after construction, it shall be kept until cheaper built-in
paths are added (if exists).

This idea has one other positive side-effect. We expect remote-join is cheaper
than local join with two remote scan in most cases. Once a much cheaper path
is added prior to local join consideration, add_path_precheck() breaks path
consideration earlier.

Please comment on.

Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just
building (or searching from a list) a RelOptInfo for given relids. After that
make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join
type to generate actual Paths implements the join. make_join_rel() is called
only once for particular relid combination, and there SpecialJoinInfo and
restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
for FDW cases.

As long as caller can know whether build_join_rel() actually construct a new
RelOptInfo object, or not, I think it makes more sense than putting a hook
within make_join_rel().

Though I’m not sure that it also fits custom join provider’s requirements.

Join replaced by CSP has two scenarios. First one implements just an alternative
logic of built-in join, will takes underlying inner/outer node, so its hook
is located on add_paths_to_joinrel() as like built-in join logics.
Second one tries to replace entire join sub-tree by materialized view (for
example), like FDW remote join cases. So, it has to be hooked nearby the
location of GetForeignJoinPaths().
In case of the second scenario, CSP does not have private field in RelOptInfo,
so it may not obvious to check whether the given joinrel exactly matches with
a particular materialized-view or other caches.

At this moment, what I'm interested in is the first scenario, so priority of
the second case is not significant for me, at least.

Thanks.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#8)

2015/03/25 19:09、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール:

On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com>

wrote:

Or bottom of make_join_rel(). IMO build_join_rel() is responsible for
just building (or searching from a list) a RelOptInfo for given relids. After
that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments

per

join type to generate actual Paths implements the join. make_join_rel() is
called only once for particular relid combination, and there SpecialJoinInfo

and

restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
for FDW cases.

I like that idea, but I think we will have complex hook signature, it won't

remain

as simple as hook (root, joinrel).

In this case, GetForeignJoinPaths() will take root, joinrel, rel1, rel2,
sjinfo and restrictlist.
It is not too simple, but not complicated signature.

Even if we reconstruct rel1 and rel2 using sjinfo, we also need to compute
restrictlist using build_joinrel_restrictlist() again. It is a static function
in relnode.c. So, I don't think either of them has definitive advantage from
the standpoint of simplicity.

The bottom of make_join_rel() seems good from the viewpoint of information, but
it is called multiple times for join combinations which are essentially identical,
for INNER JOIN case like this:

fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid
= b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid;
INFO: postgresGetForeignJoinPaths() 1x2
INFO: postgresGetForeignJoinPaths() 1x4
INFO: postgresGetForeignJoinPaths() 2x4
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: postgresGetForeignJoinPaths() 0x4
INFO: postgresGetForeignJoinPaths() 0x2
INFO: postgresGetForeignJoinPaths() 0x1
INFO: standard_join_search() old hook point
QUERY PLAN
---------------------------------------------------------
Foreign Scan (cost=100.00..102.11 rows=211 width=1068)
(1 row)

Here I’ve put probe point in the beginnig of GetForeignJoinPaths handler and just
before set_cheapest() call in standard_join_search() as “old hook point”. In
this example 1, 2, and 4 are base relations, and in the join level 3 planner calls
GetForeignJoinPaths() three times for the combinations:

1) (1x2)x4
2) (1x4)x2
3) (2x4)x1

Tom’s suggestion is aiming at providing a chance to consider join push-down in
more abstract level, IIUC. So it would be good to call handler only once for
that case, for flattened combination (1x2x3).

Hum, how about skipping calling handler (or hook) if the joinrel was found by
find_join_rel()? At least it suppress redundant call for different join orders,
and handler can determine whether the combination can be flattened by checking
that all RelOptInfo with RELOPT_JOINREL under joinrel has JOIN_INNER as jointype.

The reason why FDW handler was called multiple times on your example is,
your modified make_join_rel() does not check whether build_join_rel()
actually build a new RelOptInfo, or just a cache reference, doesn't it?

If so, I'm inclined to your proposition.
A new "bool *found" argument of build_join_rel() makes reduce number of
FDW handler call, with keeping reasonable information to build remote-
join query.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Ashutosh Bapat (#6)

2015/03/25 18:53、Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> のメール:

On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:

Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just building (or searching from a list) a RelOptInfo for given relids. After that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join type to generate actual Paths implements the join. make_join_rel() is called only once for particular relid combination, and there SpecialJoinInfo and restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising for FDW cases.

I like that idea, but I think we will have complex hook signature, it won't remain as simple as hook (root, joinrel).

Signature of the hook (or the FDW API handler) would be like this:

typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outerrel,
RelOptInfo *innerrel,
JoinType jointype,
SpecialJoinInfo *sjinfo,
List *restrictlist);

This is very similar to add_paths_to_joinrel(), but lacks semifactors and extra_lateral_rels. semifactors can be obtained with compute_semi_anti_join_factors(), and extra_lateral_rels can be constructed from root->placeholder_list as add_paths_to_joinrel() does.

From the viewpoint of postgres_fdw, jointype and restrictlist is necessary to generate SELECT statement, so it would require most work done in make_join_rel again if the signature was hook(root, joinrel). sjinfo will be necessary for supporting SEMI/ANTI joins, but currently it is not in the scope of postgres_fdw.

I guess that other FDWs require at least jointype and restrictlist.


Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#10)

2015/03/25 19:47、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール:

The reason why FDW handler was called multiple times on your example is,
your modified make_join_rel() does not check whether build_join_rel()
actually build a new RelOptInfo, or just a cache reference, doesn't it?

Yep. After that change calling count looks like this:

fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid;
INFO: postgresGetForeignJoinPaths() 1x2
INFO: postgresGetForeignJoinPaths() 1x4
INFO: postgresGetForeignJoinPaths() 2x4
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: postgresGetForeignJoinPaths() 0x4
INFO: standard_join_search() old hook point
QUERY PLAN
---------------------------------------------------------
Foreign Scan (cost=100.00..102.11 rows=211 width=1068)
(1 row)

fdw=#

If so, I'm inclined to your proposition.
A new "bool *found" argument of build_join_rel() makes reduce number of
FDW handler call, with keeping reasonable information to build remote-
join query.

Another idea is to pass “found” as parameter to FDW handler, and let FDW to decide to skip or not. Some of FDWs (and some of CSP?) might want to be conscious of join combination.


Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#12)

2015/03/25 19:47、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール:

The reason why FDW handler was called multiple times on your example is,
your modified make_join_rel() does not check whether build_join_rel()
actually build a new RelOptInfo, or just a cache reference, doesn't it?

Yep. After that change calling count looks like this:

fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid
= b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid;
INFO: postgresGetForeignJoinPaths() 1x2
INFO: postgresGetForeignJoinPaths() 1x4
INFO: postgresGetForeignJoinPaths() 2x4
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: postgresGetForeignJoinPaths() 0x4
INFO: standard_join_search() old hook point
QUERY PLAN
---------------------------------------------------------
Foreign Scan (cost=100.00..102.11 rows=211 width=1068)
(1 row)

fdw=#

If so, I'm inclined to your proposition.
A new "bool *found" argument of build_join_rel() makes reduce number of
FDW handler call, with keeping reasonable information to build remote-
join query.

Another idea is to pass “found” as parameter to FDW handler, and let FDW to decide
to skip or not. Some of FDWs (and some of CSP?) might want to be conscious of
join combination.

I think it does not match the concept we stand on.
Unlike CSP, FDW intends to replace an entire join sub-tree that is
represented with a particular joinrel, regardless of the sequence
to construct a joinrel from multiple baserels.
So, it is sufficient to call GetForeignJoinPaths() once a joinrel
is constructed, isn't it?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#11)

Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just

building (or searching from a list) a RelOptInfo for given relids. After that
make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join
type to generate actual Paths implements the join. make_join_rel() is called
only once for particular relid combination, and there SpecialJoinInfo and
restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
for FDW cases.

I like that idea, but I think we will have complex hook signature, it won't

remain as simple as hook (root, joinrel).

Signature of the hook (or the FDW API handler) would be like this:

typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,
RelOptInfo *joinrel,
RelOptInfo *outerrel,
RelOptInfo *innerrel,
JoinType jointype,
SpecialJoinInfo *sjinfo,
List *restrictlist);

This is very similar to add_paths_to_joinrel(), but lacks semifactors and
extra_lateral_rels. semifactors can be obtained with
compute_semi_anti_join_factors(), and extra_lateral_rels can be constructed
from root->placeholder_list as add_paths_to_joinrel() does.

From the viewpoint of postgres_fdw, jointype and restrictlist is necessary to
generate SELECT statement, so it would require most work done in make_join_rel
again if the signature was hook(root, joinrel). sjinfo will be necessary for
supporting SEMI/ANTI joins, but currently it is not in the scope of postgres_fdw.

I guess that other FDWs require at least jointype and restrictlist.

The attached patch adds GetForeignJoinPaths call on make_join_rel() only when
'joinrel' is actually built and both of child relations are managed by same
FDW driver, prior to any other built-in join paths.
I adjusted the hook definition a little bit, because jointype can be reproduced
using SpecialJoinInfo. Right?

Probably, it will solve the original concern towards multiple calls of FDW
handler in case when it tries to replace an entire join subtree with a foreign-
scan on the result of remote join query.

How about your opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.5-custom-join.v11.patchapplication/octet-stream; name=pgsql-v9.5-custom-join.v11.patchDownload+499-52
#15Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#14)

2015/03/26 10:51、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール:

The attached patch adds GetForeignJoinPaths call on make_join_rel() only when
'joinrel' is actually built and both of child relations are managed by same
FDW driver, prior to any other built-in join paths.
I adjusted the hook definition a little bit, because jointype can be reproduced
using SpecialJoinInfo. Right?

OK.

Probably, it will solve the original concern towards multiple calls of FDW
handler in case when it tries to replace an entire join subtree with a foreign-
scan on the result of remote join query.

How about your opinion?

Seems fine. I’ve fixed my postgres_fdw code to fit the new version, and am working on handling a whole-join-tree.

It would be difficult in the 9.5 cycle, but a hook point where we can handle whole joinrel might allow us to optimize a query which accesses multiple parent tables, each is inherited by foreign tables and partitioned with identical join key, by building a path tree which joins sharded tables first, and then union those results.

--
Shigeru HANADA
shigeru.hanada@gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Shigeru Hanada (#15)

Attached is the patch which adds join push-down support to postgres_fdw
(v7). It supports SELECT statements with JOIN, but has some more possible
enhancements (see below). I'd like to share the WIP patch here to get
comments about new FDW API design provided by KaiGai-san's v11 patch.

To make reviewing easier, I summarized changes against Custom/Foreign join
v11 patch.

Changes for Join push-down support
==================================
- Add FDW API GetForeignJoinPaths(). It generates a ForeignPath which
represents a scan against pseudo join relation represented by given
RelOptInfo.
- Expand deparsing module to handle multi-relation queries. Steps of
deparsing a join query:

1) Optimizer calls postgresGetForeignPaths() for each BASEREL. Here
postgres_fdw does the same things as before, except adding column aliases
in SELECT clause.
2) Optimizer calls postgresGetForeignJoinPaths() for each JOINREL.
Optimizer calls once per RelOptInfo with reloptkind == RELOPT_JOINREL, so
postgres_fdw should consider both A JOIN B and B JOIN A in one call.

postgres_fdw checks whether the join can be pushed down.

a) Both outer and inner relations can be pushed down (NULL in
RelOptInfo#fdw_private indicates such situation)
b) Outmost command is a SELECT (this can be relaxed in the future)
c) Join type is inner or one of outer
d) Server of all relations in the join are identical
e) Effective user id for all relations in the join are identical (they
might be different some were accessed via views)
f) No local filters (this can be relaxed if inner && non-volatile)
g) Join conditions doesn't contain any "unsafe" expression
h) Remote filter doesn't contain any "unsafe" expression

If all criteria passed, postgres_fdw makes ForeignPath for the join and
store these information in its fdw_private.

a) ForeignPath of outer relation, first non-parameterized one
b) ForeignPath of outer relation, first non-parameterized one
c) join type (as integer)
d) join conditions (as List of Expr)
e) other conditions (as List of Expr)

As of now the costs of the path is not so accurate, this is a possible
enhancement.

2) Optimizer calls postgresGetForeignPlan() for the cheapest topmost Path.
If foreign join is the cheapest way to execute the query, optimizer calls
postgresGetForeignPlan for the topmost path generated by
postgresGetForeignJoinPaths. As Robert and Tom mentioned in the thread,
large_table JOIN huge_table might be removed even (large_table JOIN
huge_table) JOIN small_table is the cheapest in the join level 3, so
postgres_fdw can't assume that paths in lower level survived planning.

To cope with the situation, I'm trying to avoid calling create_plan_recurse()
for underlying paths by putting necessary information into
PgFdwRelationInfo and link it to appropriate RelOptInfo.

Anyway in current version query string is built up from bottom (BASEREL) to
upper recursively. For a join, unerlying outer/inner query are put into
FROM clause with wrapping with parenthesis and aliasing. For example:

select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;

is transformed to a query like this:

SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT bid a9,
bbalance a10, filler a11 FROM public.pgbench_branches) l (a1, a2, a3) INNER
JOIN (SELECT tid a9, bid a10, balance a11, filler a12 FROM
public.pgbench_tellers)
r (a1, a2, a3, a4) ON ((l.a1 = r.a2));

As in the remote query, column aliasing uses attnum-based numbering with
shifted by FirstLowInvalidHeapAttributeNumber to make all attnum positive.
For instance, this system uses alias "a9" for the first user column. For
readability of code around this, I introduced TO_RELATEVE() macro which
converts absolute attnum (-8~) to relative ones (0~). Current deparser can
also handle whole-row references (attnum == 0) correctly.

3) Executor calls BeginForeignScan to initialize a scan. Here TupleDesc is
taken from the slot, not Relation.

Possible enhancement
====================
- Make deparseSelectSql() more general, thus it can handle both simple
SELECT and join SELECT by calling itself recursively. This would avoid
assuming that underlying ForeignPath remains in RelOptInfo. (WIP)
- Move appendConditions() calls into deparse.c, to clarify responsibility
of modules.
- more accurate estimation
- more detailed information for error location (currently "foreign table"
is used as relation name always)

Attachments:

foreign_join_v7.patchapplication/octet-stream; name=foreign_join_v7.patchDownload+1724-746
#17KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#16)

Hanada-san,

Thanks for your dedicated efforts for remote-join feature.
Below are the comments from my side.

* Bug - mixture of ctid system column and whole row-reference
In case when "ctid" system column is required, deparseSelectSql()
adds ctid reference on the base relation scan level.
On the other hands, whole-row reference is transformed to
a reference to the underlying relation. It will work fine if
system column is not specified. However, system column reference
breaks tuple layout from the expected one.
Eventually it leads an error.

postgres=# select ft1.ctid,ft1 from ft1,ft2 where a=b;
ERROR: malformed record literal: "(2,2,bbb,"(0,2)")"
DETAIL: Too many columns.
CONTEXT: column "" of foreign table "foreign join"
STATEMENT: select ft1.ctid,ft1 from ft1,ft2 where a=b;

postgres=# explain verbose
select ft1.ctid,ft1 from ft1,ft2 where a=b;
QUERY PLAN
--------------------------------------------------------------------------------
Foreign Scan (cost=200.00..208.35 rows=835 width=70)
Output: ft1.ctid, ft1.*
Remote SQL: SELECT l.a1, l.a2 FROM (SELECT l.a7, l, l.a10 FROM (SELECT id a9,
a a10, atext a11, ctid a7 FROM public.t1) l) l (a1, a2, a3) INNER JOIN (SELECT
b a10 FROM public.t2) r (a1) ON ((l.a3 = r.a1))

"l" of the first SELECT represents a whole-row reference.
However, underlying SELECT contains system columns in its target-
list.

Is it available to construct such a query?
SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ...
^^^^^^^^^^
Probably, it is a job of deparseProjectionSql().

* postgresGetForeignJoinPaths()
It walks on the root->simple_rel_array to check whether
all the relations involved are manged by same foreign
server with same credential.
We may have more graceful way for this. Pay attention on
the fdw_private field of innerrel/outerrel. If they have
a valid fdw_private, it means FDW driver (postgres_fdw)
considers all the underlying relations scan/join are
available to run the remote-server.
So, all we need to check is whether server-id and user-id
of both relations are identical or not.

* merge_fpinfo()
It seems to me fpinfo->rows should be joinrel->rows, and
fpinfo->width also should be joinrel->width.
No need to have special intelligence here, isn't it?

* explain output

EXPLAIN output may be a bit insufficient to know what does it
actually try to do.

postgres=# explain select * from ft1,ft2 where a = b;
QUERY PLAN
--------------------------------------------------------
Foreign Scan (cost=200.00..212.80 rows=1280 width=80)
(1 row)

Even though it is not an immediate request, it seems to me
better to show users joined relations and remote ON/WHERE
clause here.

Please don't hesitate to consult me, if you have any questions.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Shigeru Hanada
Sent: Friday, April 03, 2015 7:32 PM
To: Kaigai Kouhei(海外 浩平)
Cc: Ashutosh Bapat; Robert Haas; Tom Lane; Thom Brown;
pgsql-hackers@postgreSQL.org
Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Attached is the patch which adds join push-down support to postgres_fdw (v7).
It supports SELECT statements with JOIN, but has some more possible enhancements
(see below). I'd like to share the WIP patch here to get comments about new FDW
API design provided by KaiGai-san's v11 patch.

To make reviewing easier, I summarized changes against Custom/Foreign join v11
patch.

Changes for Join push-down support
==================================
- Add FDW API GetForeignJoinPaths(). It generates a ForeignPath which represents
a scan against pseudo join relation represented by given RelOptInfo.
- Expand deparsing module to handle multi-relation queries. Steps of deparsing
a join query:

1) Optimizer calls postgresGetForeignPaths() for each BASEREL. Here
postgres_fdw does the same things as before, except adding column aliases in SELECT
clause.
2) Optimizer calls postgresGetForeignJoinPaths() for each JOINREL. Optimizer
calls once per RelOptInfo with reloptkind == RELOPT_JOINREL, so postgres_fdw
should consider both A JOIN B and B JOIN A in one call.

postgres_fdw checks whether the join can be pushed down.

a) Both outer and inner relations can be pushed down (NULL in
RelOptInfo#fdw_private indicates such situation)
b) Outmost command is a SELECT (this can be relaxed in the future)
c) Join type is inner or one of outer
d) Server of all relations in the join are identical
e) Effective user id for all relations in the join are identical (they might be
different some were accessed via views)
f) No local filters (this can be relaxed if inner && non-volatile)
g) Join conditions doesn't contain any "unsafe" expression
h) Remote filter doesn't contain any "unsafe" expression

If all criteria passed, postgres_fdw makes ForeignPath for the join and store
these information in its fdw_private.

a) ForeignPath of outer relation, first non-parameterized one
b) ForeignPath of outer relation, first non-parameterized one
c) join type (as integer)
d) join conditions (as List of Expr)
e) other conditions (as List of Expr)

As of now the costs of the path is not so accurate, this is a possible enhancement.

2) Optimizer calls postgresGetForeignPlan() for the cheapest topmost Path. If
foreign join is the cheapest way to execute the query, optimizer calls
postgresGetForeignPlan for the topmost path generated by
postgresGetForeignJoinPaths. As Robert and Tom mentioned in the thread,
large_table JOIN huge_table might be removed even (large_table JOIN huge_table)
JOIN small_table is the cheapest in the join level 3, so postgres_fdw can't assume
that paths in lower level survived planning.

To cope with the situation, I'm trying to avoid calling create_plan_recurse()
for underlying paths by putting necessary information into PgFdwRelationInfo and
link it to appropriate RelOptInfo.

Anyway in current version query string is built up from bottom (BASEREL) to upper
recursively. For a join, unerlying outer/inner query are put into FROM clause
with wrapping with parenthesis and aliasing. For example:

select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;

is transformed to a query like this:

SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT bid a9, bbalance
a10, filler a11 FROM public.pgbench_branches) l (a1, a2, a3) INNER JOIN (SELECT
tid a9, bid a10, balance a11, filler a12 FROM public.pgbench_tellers) r (a1, a2,
a3, a4) ON ((l.a1 = r.a2));

As in the remote query, column aliasing uses attnum-based numbering with shifted
by FirstLowInvalidHeapAttributeNumber to make all attnum positive. For instance,
this system uses alias "a9" for the first user column. For readability of code
around this, I introduced TO_RELATEVE() macro which converts absolute attnum
(-8~) to relative ones (0~). Current deparser can also handle whole-row
references (attnum == 0) correctly.

3) Executor calls BeginForeignScan to initialize a scan. Here TupleDesc is taken
from the slot, not Relation.

Possible enhancement
====================
- Make deparseSelectSql() more general, thus it can handle both simple SELECT
and join SELECT by calling itself recursively. This would avoid assuming that
underlying ForeignPath remains in RelOptInfo. (WIP)
- Move appendConditions() calls into deparse.c, to clarify responsibility of
modules.
- more accurate estimation
- more detailed information for error location (currently "foreign table" is used
as relation name always)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#17)

Hi KaiGai-san,

Thanks for the review. Attached is the v8 patch of foreign join support for postgres_fdw.

In addition to your comments, I removed useless code that retrieves ForeignPath from outer/inner RelOptInfo and store them into ForeignPath#fdw_private. Now postgres_fdw’s join pushd-down is free from existence of ForeignPath under the join relation. This means that we can support the case Robert mentioned before, that whole "(huge JOIN large) JOIN small” can be pushed down even if “(huge JOIN large)” is dominated by another join path.

2015-04-07 13:46 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Thanks for your dedicated efforts for remote-join feature.
Below are the comments from my side.

* Bug - mixture of ctid system column and whole row-reference
In case when "ctid" system column is required, deparseSelectSql()
adds ctid reference on the base relation scan level.
On the other hands, whole-row reference is transformed to
a reference to the underlying relation. It will work fine if
system column is not specified. However, system column reference
breaks tuple layout from the expected one.
Eventually it leads an error.

I too found the bug. As you suggested, deparseProjectionSql() is the place to fix.

postgres=# select ft1.ctid,ft1 from ft1,ft2 where a=b;
ERROR: malformed record literal: "(2,2,bbb,"(0,2)")"
DETAIL: Too many columns.
CONTEXT: column "" of foreign table "foreign join"
STATEMENT: select ft1.ctid,ft1 from ft1,ft2 where a=b;

postgres=# explain verbose
select ft1.ctid,ft1 from ft1,ft2 where a=b;
QUERY PLAN
--------------------------------------------------------------------------------
Foreign Scan (cost=200.00..208.35 rows=835 width=70)
Output: ft1.ctid, ft1.*
Remote SQL: SELECT l.a1, l.a2 FROM (SELECT l.a7, l, l.a10 FROM (SELECT id a9,
a a10, atext a11, ctid a7 FROM public.t1) l) l (a1, a2, a3) INNER JOIN (SELECT
b a10 FROM public.t2) r (a1) ON ((l.a3 = r.a1))

"l" of the first SELECT represents a whole-row reference.
However, underlying SELECT contains system columns in its target-
list.

Is it available to construct such a query?
SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ...
^^^^^^^^^^

Simple relation reference such as "l" is not sufficient for the purpose, yes. But putting columns in parentheses would not work when a user column is referenced in original query.

I implemented deparseProjectionSql to use ROW(...) expression for a whole-row reference in the target list, in addition to ordinary column references for actually used columns and ctid.

Please see the test case for mixed use of ctid and whole-row reference to postgres_fdw’s regression tests. Now a whole-row reference in the remote query looks like this:

-- ctid with whole-row reference
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.ctid, t1, t2 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit
Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
-> Sort
Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
Sort Key: t1.c3, t1.c1
-> Foreign Scan
Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT l.a7, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a12, l.a10 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a1
(8 rows)

In fact l.a12 and l.a10, for t1.c3 and t1.c1, are redundant in transferred data, but IMO this would simplify the code for deparsing.

* postgresGetForeignJoinPaths()
It walks on the root->simple_rel_array to check whether
all the relations involved are manged by same foreign
server with same credential.
We may have more graceful way for this. Pay attention on
the fdw_private field of innerrel/outerrel. If they have
a valid fdw_private, it means FDW driver (postgres_fdw)
considers all the underlying relations scan/join are
available to run the remote-server.
So, all we need to check is whether server-id and user-id
of both relations are identical or not.

Exactly. I fixed the code not to loop around.

* merge_fpinfo()
It seems to me fpinfo->rows should be joinrel->rows, and
fpinfo->width also should be joinrel->width.
No need to have special intelligence here, isn't it?

Oops. They are vestige of my struggle which disabled SELECT clause optimization (omit unused columns). Now width and rows are inherited from joinrel. Besides that, fdw_startup_cost and fdw_tuple_cost seem wrong, so I fixed them to use simple summary, not average.

* explain output

EXPLAIN output may be a bit insufficient to know what does it
actually try to do.

postgres=# explain select * from ft1,ft2 where a = b;
QUERY PLAN
--------------------------------------------------------
Foreign Scan (cost=200.00..212.80 rows=1280 width=80)
(1 row)

Even though it is not an immediate request, it seems to me
better to show users joined relations and remote ON/WHERE
clause here.

Like this?

Foreign Scan on ft1 INNER JOIN ft2 ON ft1.a = ft2.b (cost=200.00..212.80 rows=1280 width=80)

It might produce a very long line in a case of joining many tables because it contains most of remote query other than SELECT clause, but I prefer detailed. Another idea is to print “Join Cond” and “Remote Filter” as separated EXPLAIN items.

Note that v8 patch doesn’t contain this change yet!

--
Shigeru HANADA

Attachments:

foreign_join_v8.patchapplication/octet-stream; name=foreign_join_v8.patchDownload+1845-824
#19Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Shigeru Hanada (#18)

Sorry , the document portion was not in the v8 patch. Please use v9
patch instead.

2015-04-07 15:53 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:

Hi KaiGai-san,

Thanks for the review. Attached is the v8 patch of foreign join support for postgres_fdw.

In addition to your comments, I removed useless code that retrieves ForeignPath from outer/inner RelOptInfo and store them into ForeignPath#fdw_private. Now postgres_fdw’s join pushd-down is free from existence of ForeignPath under the join relation. This means that we can support the case Robert mentioned before, that whole "(huge JOIN large) JOIN small” can be pushed down even if “(huge JOIN large)” is dominated by another join path.

2015-04-07 13:46 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Thanks for your dedicated efforts for remote-join feature.
Below are the comments from my side.

* Bug - mixture of ctid system column and whole row-reference
In case when "ctid" system column is required, deparseSelectSql()
adds ctid reference on the base relation scan level.
On the other hands, whole-row reference is transformed to
a reference to the underlying relation. It will work fine if
system column is not specified. However, system column reference
breaks tuple layout from the expected one.
Eventually it leads an error.

I too found the bug. As you suggested, deparseProjectionSql() is the place to fix.

postgres=# select ft1.ctid,ft1 from ft1,ft2 where a=b;
ERROR: malformed record literal: "(2,2,bbb,"(0,2)")"
DETAIL: Too many columns.
CONTEXT: column "" of foreign table "foreign join"
STATEMENT: select ft1.ctid,ft1 from ft1,ft2 where a=b;

postgres=# explain verbose
select ft1.ctid,ft1 from ft1,ft2 where a=b;
QUERY PLAN
--------------------------------------------------------------------------------
Foreign Scan (cost=200.00..208.35 rows=835 width=70)
Output: ft1.ctid, ft1.*
Remote SQL: SELECT l.a1, l.a2 FROM (SELECT l.a7, l, l.a10 FROM (SELECT id a9,
a a10, atext a11, ctid a7 FROM public.t1) l) l (a1, a2, a3) INNER JOIN (SELECT
b a10 FROM public.t2) r (a1) ON ((l.a3 = r.a1))

"l" of the first SELECT represents a whole-row reference.
However, underlying SELECT contains system columns in its target-
list.

Is it available to construct such a query?
SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ...
^^^^^^^^^^

Simple relation reference such as "l" is not sufficient for the purpose, yes. But putting columns in parentheses would not work when a user column is referenced in original query.

I implemented deparseProjectionSql to use ROW(...) expression for a whole-row reference in the target list, in addition to ordinary column references for actually used columns and ctid.

Please see the test case for mixed use of ctid and whole-row reference to postgres_fdw’s regression tests. Now a whole-row reference in the remote query looks like this:

-- ctid with whole-row reference
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.ctid, t1, t2 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit
Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
-> Sort
Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
Sort Key: t1.c3, t1.c1
-> Foreign Scan
Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT l.a7, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a12, l.a10 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a1
(8 rows)

In fact l.a12 and l.a10, for t1.c3 and t1.c1, are redundant in transferred data, but IMO this would simplify the code for deparsing.

* postgresGetForeignJoinPaths()
It walks on the root->simple_rel_array to check whether
all the relations involved are manged by same foreign
server with same credential.
We may have more graceful way for this. Pay attention on
the fdw_private field of innerrel/outerrel. If they have
a valid fdw_private, it means FDW driver (postgres_fdw)
considers all the underlying relations scan/join are
available to run the remote-server.
So, all we need to check is whether server-id and user-id
of both relations are identical or not.

Exactly. I fixed the code not to loop around.

* merge_fpinfo()
It seems to me fpinfo->rows should be joinrel->rows, and
fpinfo->width also should be joinrel->width.
No need to have special intelligence here, isn't it?

Oops. They are vestige of my struggle which disabled SELECT clause optimization (omit unused columns). Now width and rows are inherited from joinrel. Besides that, fdw_startup_cost and fdw_tuple_cost seem wrong, so I fixed them to use simple summary, not average.

* explain output

EXPLAIN output may be a bit insufficient to know what does it
actually try to do.

postgres=# explain select * from ft1,ft2 where a = b;
QUERY PLAN
--------------------------------------------------------
Foreign Scan (cost=200.00..212.80 rows=1280 width=80)
(1 row)

Even though it is not an immediate request, it seems to me
better to show users joined relations and remote ON/WHERE
clause here.

Like this?

Foreign Scan on ft1 INNER JOIN ft2 ON ft1.a = ft2.b (cost=200.00..212.80 rows=1280 width=80)

It might produce a very long line in a case of joining many tables because it contains most of remote query other than SELECT clause, but I prefer detailed. Another idea is to print “Join Cond” and “Remote Filter” as separated EXPLAIN items.

Note that v8 patch doesn’t contain this change yet!

--
Shigeru HANADA

--
Shigeru HANADA

Attachments:

foreign_join_v9.patchapplication/octet-stream; name=foreign_join_v9.patchDownload+1866-829
#20KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#19)

Hanada-san,

In addition to your comments, I removed useless code that retrieves ForeignPath
from outer/inner RelOptInfo and store them into ForeignPath#fdw_private. Now
postgres_fdw’s join pushd-down is free from existence of ForeignPath under the
join relation. This means that we can support the case Robert mentioned before,
that whole "(huge JOIN large) JOIN small” can be pushed down even if “(huge JOIN
large)” is dominated by another join path.

Yes, it's definitely reasonable design, and fits intention of the interface.
I should point out it from the beginning. :-)

"l" of the first SELECT represents a whole-row reference.
However, underlying SELECT contains system columns in its target-
list.

Is it available to construct such a query?
SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ...
^^^^^^^^^^

Simple relation reference such as "l" is not sufficient for the purpose, yes.
But putting columns in parentheses would not work when a user column is referenced
in original query.

I implemented deparseProjectionSql to use ROW(...) expression for a whole-row
reference in the target list, in addition to ordinary column references for
actually used columns and ctid.

Please see the test case for mixed use of ctid and whole-row reference to
postgres_fdw’s regression tests. Now a whole-row reference in the remote query
looks like this:

It seems to me that deparseProjectionSql() works properly.

-- ctid with whole-row reference
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.ctid, t1, t2 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3,
t1.c1 OFFSET 100 LIMIT 10;

----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------------------------------------------------------
Limit
Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
-> Sort
Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
Sort Key: t1.c3, t1.c1
-> Foreign Scan
Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT l.a7,
ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a12, l.a10 FROM
(SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a1
(8 rows)

In fact l.a12 and l.a10, for t1.c3 and t1.c1, are redundant in transferred data,
but IMO this would simplify the code for deparsing.

I agree. Even if we can reduce networking cost a little, tuple
construction takes CPU cycles. Your decision is fair enough for
me.

* merge_fpinfo()
It seems to me fpinfo->rows should be joinrel->rows, and
fpinfo->width also should be joinrel->width.
No need to have special intelligence here, isn't it?

Oops. They are vestige of my struggle which disabled SELECT clause optimization
(omit unused columns). Now width and rows are inherited from joinrel. Besides
that, fdw_startup_cost and fdw_tuple_cost seem wrong, so I fixed them to use simple
summary, not average.

Does fpinfo->fdw_startup_cost represent a cost to open connection to remote
PostgreSQL, doesn't it?

postgres_fdw.c:1757 says as follows:

/*
* Add some additional cost factors to account for connection overhead
* (fdw_startup_cost), transferring data across the network
* (fdw_tuple_cost per retrieved row), and local manipulation of the data
* (cpu_tuple_cost per retrieved row).
*/

If so, does a ForeignScan that involves 100 underlying relation takes 100
times heavy network operations on startup? Probably, no.
I think, average is better than sum, and max of them will reflect the cost
more correctly.
Also, fdw_tuple_cost introduce the cost of data transfer over the network.
I thinks, weighted average is the best strategy, like:
fpinfo->fdw_tuple_cost =
(fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width) * fpinfo_o->fdw_tuple_cost +
(fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width) * fpinfo_i->fdw_tuple_cost;

That's just my suggestion. Please apply the best way you thought.

* explain output

EXPLAIN output may be a bit insufficient to know what does it
actually try to do.

postgres=# explain select * from ft1,ft2 where a = b;
QUERY PLAN
--------------------------------------------------------
Foreign Scan (cost=200.00..212.80 rows=1280 width=80)
(1 row)

Even though it is not an immediate request, it seems to me
better to show users joined relations and remote ON/WHERE
clause here.

Like this?

Foreign Scan on ft1 INNER JOIN ft2 ON ft1.a = ft2.b (cost=200.00..212.80
rows=1280 width=80)

No. This line is produced by ExplainScanTarget(), so it requires the
backend knowledge to individual FDW.
Rather than the backend, postgresExplainForeignScan() shall produce
the output.

It might produce a very long line in a case of joining many tables because it
contains most of remote query other than SELECT clause, but I prefer detailed.
Another idea is to print “Join Cond” and “Remote Filter” as separated EXPLAIN
items.

It is good, if postgres_fdw can generate relations name involved in
the join for each level, and join cond/remote filter individually.

Note that v8 patch doesn’t contain this change yet!

It is a "nice to have" feature. So, I don't think the first commit needs
to support this. Just a suggestion in the next step.

* implementation suggestion

At the deparseJoinSql(),

+   /* print SELECT clause of the join scan */
+   initStringInfo(&selbuf);
+   i = 0;
+   foreach(lc, baserel->reltargetlist)
+   {
+       Var        *var = (Var *) lfirst(lc);
+       TargetEntry *tle;
+
+       if (i > 0)
+           appendStringInfoString(&selbuf, ", ");
+       deparseJoinVar(var, &context);
+
+       tle = makeTargetEntry((Expr *) copyObject(var),
+                             i + 1, pstrdup(""), false);
+       if (fdw_ps_tlist)
+           *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle));
+
+       *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+
+       i++;
+   }

The tle is a copy of the original target-entry, and var-node is also
copied one. Why is the tle copied on lappend() again?
Also, NULL as acceptable as 3rd argument of makeTargetEntry.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#20)
#22KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#21)
#23Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#22)
#24KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#23)
#25Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#24)
#26KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#25)
#27Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#26)
#28KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#27)
#29Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#28)
#30KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#29)
#31Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#14)
#32KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#31)
#33Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Shigeru Hanada (#29)
#34Robert Haas
robertmhaas@gmail.com
In reply to: KaiGai Kohei (#14)
#35Robert Haas
robertmhaas@gmail.com
In reply to: KaiGai Kohei (#32)
#36Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Ashutosh Bapat (#33)
#37Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Ashutosh Bapat (#33)
#38Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Shigeru Hanada (#36)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Shigeru Hanada (#37)