[idea] more aggressive join pushdown on postgres_fdw
Hi,
Yesterday, JPUG held an unconference event at Tokyo, and
Hanada-san had a talk about join-pushdown feature of
postgres_fdw.
At this talk, someone proposed an interesting idea to
make join pushdown more aggressive/effective.
Let me share it with pgsql-hackers.
He said, we may have a workload to join a large foreign-
scan and a small local-scan regardless of the plan type.
For example:
joinrel (expected nrows = 5)
+ outerrel ForeignScan (expected nrows = 1000000)
+ innerrel LocalScan (expected nrows = 5)
In this case, we may be able to run the entire joinrel
on the remote side then fetch just 5 rows, if fdw-driver
construct VALUES() clause according to the contents of
LocalScan then makes an entire join query with another
one kept in ForeignScan.
If above ForeignScan have the following remote query,
SELECT a, b, c FROM t0 WHERE d < 1000000
we may be able to construct the query below to run remote
join with local (small) relation.
SELECT a, b, c, x, y FROM
(SELECT a, b, c FROM t0 WHERE d < 1000000) AS ft
JOIN
(VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'),
(4,'ddd'), (5,'eee')) AS lt (x, y)
ON ft.a = lt.x
The VALUES clauses can be mechanically constructed according
to the result set of LocalScan, and it is not difficult to
make such a remote query on top of the existing ForeignScan.
In the result, it will reduce amount of network traffic and
CPU cycles to form/deform tuples dramatically.
I don't intend to implement this idea urgently (of course,
join pushdown for both ForeignScan case has higher priority),
however, it makes sense to keep the future direction in mind.
Also, as an aside, even though Hanada-san mentioned ForeignScan
does not need an infrastructure to initialize child path nodes,
this idea may require ForeignScan to have local child path.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Yesterday, JPUG held an unconference event at Tokyo, and
Hanada-san had a talk about join-pushdown feature of
postgres_fdw.
At this talk, someone proposed an interesting idea to
make join pushdown more aggressive/effective.
Let me share it with pgsql-hackers.He said, we may have a workload to join a large foreign-
scan and a small local-scan regardless of the plan type.For example: joinrel (expected nrows = 5) + outerrel ForeignScan (expected nrows = 1000000) + innerrel LocalScan (expected nrows = 5)In this case, we may be able to run the entire joinrel
on the remote side then fetch just 5 rows, if fdw-driver
construct VALUES() clause according to the contents of
LocalScan then makes an entire join query with another
one kept in ForeignScan.If above ForeignScan have the following remote query,
SELECT a, b, c FROM t0 WHERE d < 1000000
we may be able to construct the query below to run remote
join with local (small) relation.SELECT a, b, c, x, y FROM
(SELECT a, b, c FROM t0 WHERE d < 1000000) AS ft
JOIN
(VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'),
(4,'ddd'), (5,'eee')) AS lt (x, y)
ON ft.a = lt.xThe VALUES clauses can be mechanically constructed according
to the result set of LocalScan, and it is not difficult to
make such a remote query on top of the existing ForeignScan.
In the result, it will reduce amount of network traffic and
CPU cycles to form/deform tuples dramatically.I don't intend to implement this idea urgently (of course,
join pushdown for both ForeignScan case has higher priority),
however, it makes sense to keep the future direction in mind.Also, as an aside, even though Hanada-san mentioned ForeignScan
does not need an infrastructure to initialize child path nodes,
this idea may require ForeignScan to have local child path.
Neat idea. This ties into something I've thought about and mentioned
before: what if the innerrel is local, but there's a replicated copy
on the remote server? Perhaps both cases are worth thinking about at
some point.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Yesterday, JPUG held an unconference event at Tokyo, and
Hanada-san had a talk about join-pushdown feature of
postgres_fdw.
At this talk, someone proposed an interesting idea to
make join pushdown more aggressive/effective.
Let me share it with pgsql-hackers.He said, we may have a workload to join a large foreign-
scan and a small local-scan regardless of the plan type.For example: joinrel (expected nrows = 5) + outerrel ForeignScan (expected nrows = 1000000) + innerrel LocalScan (expected nrows = 5)In this case, we may be able to run the entire joinrel
on the remote side then fetch just 5 rows, if fdw-driver
construct VALUES() clause according to the contents of
LocalScan then makes an entire join query with another
one kept in ForeignScan.If above ForeignScan have the following remote query,
SELECT a, b, c FROM t0 WHERE d < 1000000
we may be able to construct the query below to run remote
join with local (small) relation.SELECT a, b, c, x, y FROM
(SELECT a, b, c FROM t0 WHERE d < 1000000) AS ft
JOIN
(VALUES (1,'aaa'), (2,'bbb'), (3,'ccc'),
(4,'ddd'), (5,'eee')) AS lt (x, y)
ON ft.a = lt.xThe VALUES clauses can be mechanically constructed according
to the result set of LocalScan, and it is not difficult to
make such a remote query on top of the existing ForeignScan.
In the result, it will reduce amount of network traffic and
CPU cycles to form/deform tuples dramatically.I don't intend to implement this idea urgently (of course,
join pushdown for both ForeignScan case has higher priority),
however, it makes sense to keep the future direction in mind.Also, as an aside, even though Hanada-san mentioned ForeignScan
does not need an infrastructure to initialize child path nodes,
this idea may require ForeignScan to have local child path.Neat idea. This ties into something I've thought about and mentioned
before: what if the innerrel is local, but there's a replicated copy
on the remote server? Perhaps both cases are worth thinking about at
some point.
I think, here is both merit and de-merit for each. It implies either of
them never always-better-strategy.
* Push out local table as VALUES(...) clause
Good: No restriction to functions/operators in the local scan or
underlying plan node.
Bad: High cost for data format modification (HeapTupleSlot =>
VALUES(...) clause in text), and 2-way data transfer.
* Remote join between foreign table and replicated table
Good: Data already exists on remote side, no need to kick out
contents of local relation (and no need to consume CPU
cycle to make VALUES() clause).
Bad: Functions/operators are restricted as existing postgres_fdw
is doing. Only immutable and built-in ones are available to
run on the remote side.
BTW, do we need either of tables being foreign table, if entire database
is (synchronously) replicated?
Also, loopback server may be a candidate even if not replicated (although
it may be an entrance of deadlock heaven).
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Import Notes
Resolved by subject fallback
On Thu, Jun 4, 2015 at 9:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Neat idea. This ties into something I've thought about and mentioned
before: what if the innerrel is local, but there's a replicated copy
on the remote server? Perhaps both cases are worth thinking about at
some point.I think, here is both merit and de-merit for each. It implies either of
them never always-better-strategy.* Push out local table as VALUES(...) clause
Good: No restriction to functions/operators in the local scan or
underlying plan node.
Bad: High cost for data format modification (HeapTupleSlot =>
VALUES(...) clause in text), and 2-way data transfer.* Remote join between foreign table and replicated table
Good: Data already exists on remote side, no need to kick out
contents of local relation (and no need to consume CPU
cycle to make VALUES() clause).
Bad: Functions/operators are restricted as existing postgres_fdw
is doing. Only immutable and built-in ones are available to
run on the remote side.
Sure.
BTW, do we need either of tables being foreign table, if entire database
is (synchronously) replicated?
Also, loopback server may be a candidate even if not replicated (although
it may be an entrance of deadlock heaven).
I suppose it's possible that this sort of thing could work out to a
win, but I think it's much less likely to work out than pushing down a
foreign/local join using either the VALUES trick or a replicated copy.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jun 4, 2015 at 9:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Neat idea. This ties into something I've thought about and mentioned
before: what if the innerrel is local, but there's a replicated copy
on the remote server? Perhaps both cases are worth thinking about at
some point.I think, here is both merit and de-merit for each. It implies either of
them never always-better-strategy.* Push out local table as VALUES(...) clause
Good: No restriction to functions/operators in the local scan or
underlying plan node.
Bad: High cost for data format modification (HeapTupleSlot =>
VALUES(...) clause in text), and 2-way data transfer.* Remote join between foreign table and replicated table
Good: Data already exists on remote side, no need to kick out
contents of local relation (and no need to consume CPU
cycle to make VALUES() clause).
Bad: Functions/operators are restricted as existing postgres_fdw
is doing. Only immutable and built-in ones are available to
run on the remote side.Sure.
BTW, do we need either of tables being foreign table, if entire database
is (synchronously) replicated?
Also, loopback server may be a candidate even if not replicated (although
it may be an entrance of deadlock heaven).I suppose it's possible that this sort of thing could work out to a
win, but I think it's much less likely to work out than pushing down a
foreign/local join using either the VALUES trick or a replicated copy.
Hmm, it might be too aggressive approach.
If we would try to implement, postgres_fdw will need to add so many junk
paths (expensive than usual local ones) to consider remote join between
replicated local tables.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Import Notes
Resolved by subject fallback
2015/06/05 6:43、Robert Haas <robertmhaas@gmail.com> のメール:
On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Neat idea. This ties into something I've thought about and mentioned
before: what if the innerrel is local, but there's a replicated copy
on the remote server? Perhaps both cases are worth thinking about at
some point.
Interesting, but I’m not sure that I understood the situation.
Here which kind of replication method do you mean? I guess you assume some kind of per-table replication such as Slony-I or materialized views with postgres_fdw or dblink, in postgres_fdw case. If this assumption is correct, we need a mapping between a local ordinary table and a foreign table which points remote replicated table.
--
Shigeru HANADA
shigeru.hanada@gmail.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 5, 2015 at 5:51 AM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
2015/06/05 6:43、Robert Haas <robertmhaas@gmail.com> のメール:
On Sat, May 30, 2015 at 9:03 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Neat idea. This ties into something I've thought about and mentioned
before: what if the innerrel is local, but there's a replicated copy
on the remote server? Perhaps both cases are worth thinking about at
some point.Interesting, but I’m not sure that I understood the situation.
Here which kind of replication method do you mean? I guess you assume some kind of per-table replication such as Slony-I or materialized views with postgres_fdw or dblink, in postgres_fdw case. If this assumption is correct, we need a mapping between a local ordinary table and a foreign table which points remote replicated table.
Right. I was thinking of BDR, in particular, or some future future
in-core feature which might be similar, but Slony could do the same
thing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers