Foreign join pushdown vs EvalPlanQual
Hi,
While reviewing the foreign join pushdown core patch, I noticed that the
patch doesn't perform an EvalPlanQual recheck properly. The example
that crashes the server will be shown below (it uses the postgres_fdw
patch [1]/messages/by-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com). I think the reason for that is because the ForeignScan node
performing the foreign join remotely has scanrelid = 0 while
ExecScanFetch assumes that its scan node has scanrelid > 0.
I think this is a bug. I've not figured out how to fix this yet, but I
thought we would also need another plan that evaluates the join locally
for the test tuples for EvalPlanQual. Though I'm missing something though.
Create an environment:
postgres=# create table tab (a int, b int);
CREATE TABLE
postgres=# create foreign table foo (a int) server myserver options
(table_name 'foo');
CREATE FOREIGN TABLE
postgres=# create foreign table bar (a int) server myserver options
(table_name 'bar');
CREATE FOREIGN TABLE
postgres=# insert into tab values (1, 1);
INSERT 0 1
postgres=# insert into foo values (1);
INSERT 0 1
postgres=# insert into bar values (1);
INSERT 0 1
postgres=# analyze tab;
ANALYZE
postgres=# analyze foo;
ANALYZE
postgres=# analyze bar;
ANALYZE
Run the example:
[Terminal 1]
postgres=# begin;
BEGIN
postgres=# update tab set b = b + 1 where a = 1;
UPDATE 1
[Terminal 2]
postgres=# explain verbose select tab.* from tab, foo, bar where tab.a =
foo.a and foo.a = bar.a for update;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
LockRows (cost=100.00..101.18 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
-> Nested Loop (cost=100.00..101.14 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
Join Filter: (foo.a = tab.a)
-> Seq Scan on public.tab (cost=0.00..1.01 rows=1 width=14)
Output: tab.a, tab.b, tab.ctid
-> Foreign Scan (cost=100.00..100.08 rows=4 width=64)
Output: foo.*, foo.a, bar.*, bar.a
Relations: (public.foo) INNER JOIN (public.bar)
Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT
ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1,
a2) INNER
JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR
UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))
(11 rows)
postgres=# select tab.* from tab, foo, bar where tab.a = foo.a and foo.a
= bar.a for update;
[Terminal 1]
postgres=# commit;
COMMIT
[Terminal 2]
(After the commit in Terminal 1, Terminal 2 will show the following.)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
Best regards,
Etsuro Fujita
[1]: /messages/by-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
/messages/by-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Does it make sense to put the result tuple of remote join on evety
estate->es_epqTupleSet[] slot represented by this ForeignScan if
scanrelid==0?
It allows to recheck qualifier for each LockRow that intends to lock
base foreign table underlying the remote join.
ForeignScan->fdw_relids tells us which rtindexes are represented
by this ForeignScan, so infrastructure side may be able to handle.
Thanks,
2015-06-24 11:40 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
Hi,
While reviewing the foreign join pushdown core patch, I noticed that the
patch doesn't perform an EvalPlanQual recheck properly. The example
that crashes the server will be shown below (it uses the postgres_fdw
patch [1]). I think the reason for that is because the ForeignScan node
performing the foreign join remotely has scanrelid = 0 while
ExecScanFetch assumes that its scan node has scanrelid > 0.I think this is a bug. I've not figured out how to fix this yet, but I
thought we would also need another plan that evaluates the join locally
for the test tuples for EvalPlanQual. Though I'm missing something though.Create an environment:
postgres=# create table tab (a int, b int);
CREATE TABLE
postgres=# create foreign table foo (a int) server myserver options
(table_name 'foo');
CREATE FOREIGN TABLE
postgres=# create foreign table bar (a int) server myserver options
(table_name 'bar');
CREATE FOREIGN TABLE
postgres=# insert into tab values (1, 1);
INSERT 0 1
postgres=# insert into foo values (1);
INSERT 0 1
postgres=# insert into bar values (1);
INSERT 0 1
postgres=# analyze tab;
ANALYZE
postgres=# analyze foo;
ANALYZE
postgres=# analyze bar;
ANALYZERun the example:
[Terminal 1]
postgres=# begin;
BEGIN
postgres=# update tab set b = b + 1 where a = 1;
UPDATE 1[Terminal 2]
postgres=# explain verbose select tab.* from tab, foo, bar where tab.a =
foo.a and foo.a = bar.a for update;QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------
LockRows (cost=100.00..101.18 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
-> Nested Loop (cost=100.00..101.14 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
Join Filter: (foo.a = tab.a)
-> Seq Scan on public.tab (cost=0.00..1.01 rows=1 width=14)
Output: tab.a, tab.b, tab.ctid
-> Foreign Scan (cost=100.00..100.08 rows=4 width=64)
Output: foo.*, foo.a, bar.*, bar.a
Relations: (public.foo) INNER JOIN (public.bar)
Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT
ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1,
a2) INNER
JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR
UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))
(11 rows)postgres=# select tab.* from tab, foo, bar where tab.a = foo.a and foo.a
= bar.a for update;[Terminal 1]
postgres=# commit;
COMMIT[Terminal 2]
(After the commit in Terminal 1, Terminal 2 will show the following.)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>Best regards,
Etsuro Fujita[1]
/messages/by-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Fujita-san,
Does it make sense to put the result tuple of remote join on evety
estate->es_epqTupleSet[] slot represented by this ForeignScan if
scanrelid==0?
Sorry, I misunderstood behavior of the es_epqTupleSet[].
I'd like to suggest a solution that re-construct remote tuple according
to the fdw_scan_tlist on ExecScanFetch, if given scanrelid == 0.
It enables to run local qualifier associated with the ForeignScan node,
and it will also work for the case when tuple in es_epqTupleSet[] was
local heap.
For details:
The es_epqTuple[] is set by EvalPlanQualSetTuple(). It put a tuple
exactly reflects a particular base relation (that has positive rtindex).
Even if it is a foreign-table, ExecLockRows() put a tuple dynamically
constructed via whole-row-reference at EvalPlanQualFetchRowMarks().
So, regardless of copy or reference to heap, we can expect es_epqTuple[]
keeps tuples of the base relations for each.
On the other hands, ForeignScan that replaced local join by remote
join has a valid fdw_scan_tlist list. It contains expression node
to construct individual attribute of the pseudo scan target-list.
So, all we need to do is, (1) if scanrelid == 0 on ExecScanFetch(),
(2) it should be ForeignScan or CustomScan, with *_scan_tlist.
(3) then, we reconstruct a tuple of the pseudo scan based on the
*_scan_tlist, instead of simple reference to es_epqTupleSet[],
(4) and, evaluate local qualifiers of the node.
How about your thought?
BTW, if you try newer version of postgres_fdw foreign join patch,
please provide me to reproduce the problem/
Also, as an aside, postgres_fdw does not implement RefetchForeignRow()
at this moment. Does it make a problem?
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
Sent: Wednesday, June 24, 2015 10:02 PM
To: Etsuro Fujita
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQualDoes it make sense to put the result tuple of remote join on evety
estate->es_epqTupleSet[] slot represented by this ForeignScan if
scanrelid==0?It allows to recheck qualifier for each LockRow that intends to lock
base foreign table underlying the remote join.
ForeignScan->fdw_relids tells us which rtindexes are represented
by this ForeignScan, so infrastructure side may be able to handle.Thanks,
2015-06-24 11:40 GMT+09:00 Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>:
Hi,
While reviewing the foreign join pushdown core patch, I noticed that the
patch doesn't perform an EvalPlanQual recheck properly. The example
that crashes the server will be shown below (it uses the postgres_fdw
patch [1]). I think the reason for that is because the ForeignScan node
performing the foreign join remotely has scanrelid = 0 while
ExecScanFetch assumes that its scan node has scanrelid > 0.I think this is a bug. I've not figured out how to fix this yet, but I
thought we would also need another plan that evaluates the join locally
for the test tuples for EvalPlanQual. Though I'm missing something though.Create an environment:
postgres=# create table tab (a int, b int);
CREATE TABLE
postgres=# create foreign table foo (a int) server myserver options
(table_name 'foo');
CREATE FOREIGN TABLE
postgres=# create foreign table bar (a int) server myserver options
(table_name 'bar');
CREATE FOREIGN TABLE
postgres=# insert into tab values (1, 1);
INSERT 0 1
postgres=# insert into foo values (1);
INSERT 0 1
postgres=# insert into bar values (1);
INSERT 0 1
postgres=# analyze tab;
ANALYZE
postgres=# analyze foo;
ANALYZE
postgres=# analyze bar;
ANALYZERun the example:
[Terminal 1]
postgres=# begin;
BEGIN
postgres=# update tab set b = b + 1 where a = 1;
UPDATE 1[Terminal 2]
postgres=# explain verbose select tab.* from tab, foo, bar where tab.a =
foo.a and foo.a = bar.a for update;QUERY PLAN
----------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------LockRows (cost=100.00..101.18 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
-> Nested Loop (cost=100.00..101.14 rows=4 width=70)
Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
Join Filter: (foo.a = tab.a)
-> Seq Scan on public.tab (cost=0.00..1.01 rows=1 width=14)
Output: tab.a, tab.b, tab.ctid
-> Foreign Scan (cost=100.00..100.08 rows=4 width=64)
Output: foo.*, foo.a, bar.*, bar.a
Relations: (public.foo) INNER JOIN (public.bar)
Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT
ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1,
a2) INNER
JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR
UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))
(11 rows)postgres=# select tab.* from tab, foo, bar where tab.a = foo.a and foo.a
= bar.a for update;[Terminal 1]
postgres=# commit;
COMMIT[Terminal 2]
(After the commit in Terminal 1, Terminal 2 will show the following.)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>Best regards,
Etsuro Fujita[1]
/messages/by-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj
8wTze+CYJUHg@mail.gmail.com--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi KaiGai-san,
I'd like to work on this issue with you!
On 2015/06/25 10:48, Kouhei Kaigai wrote:
I'd like to suggest a solution that re-construct remote tuple according
to the fdw_scan_tlist on ExecScanFetch, if given scanrelid == 0.
It enables to run local qualifier associated with the ForeignScan node,
and it will also work for the case when tuple in es_epqTupleSet[] was
local heap.
Maybe I'm missing something, but I don't think your proposal works
properly because we don't have any component ForeignScan state node or
subsidiary join state node once we've replaced the entire join with the
ForeignScan performing the join remotely, IIUC. So, my image was to
have another subplan for EvalPlanQual as well as the ForeignScan, to do
the entire join locally for the component test tuples if we are inside
an EvalPlanQual recheck.
BTW, if you try newer version of postgres_fdw foreign join patch,
please provide me to reproduce the problem/
OK
Also, as an aside, postgres_fdw does not implement RefetchForeignRow()
at this moment. Does it make a problem?
I don't think so, though I think it would be better to test that the
foreign join pushdown API patch also allows late row locking using the
postgres_fdw.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Fujita-san,
BTW, if you try newer version of postgres_fdw foreign join patch,
please provide me to reproduce the problem/OK
Did you forget to attach the patch, or v17 is in use?
I'd like to suggest a solution that re-construct remote tuple according
to the fdw_scan_tlist on ExecScanFetch, if given scanrelid == 0.
It enables to run local qualifier associated with the ForeignScan node,
and it will also work for the case when tuple in es_epqTupleSet[] was
local heap.Maybe I'm missing something, but I don't think your proposal works
properly because we don't have any component ForeignScan state node or
subsidiary join state node once we've replaced the entire join with the
ForeignScan performing the join remotely, IIUC. So, my image was to
have another subplan for EvalPlanQual as well as the ForeignScan, to do
the entire join locally for the component test tuples if we are inside
an EvalPlanQual recheck.
Hmm... Probably, we have two standpoints to tackle the problem.
The first standpoint tries to handle the base foreign table as
a prime relation for locking. Thus, we have to provide a way to
fetch a remote tuple identified with the supplied ctid.
The advantage of this approach is the way to fetch tuples from
base relation is quite similar to the existing form, however,
its disadvantage is another side of the same coin, because the
ForeignScan node with scanrelid==0 (that represents remote join
query) may have local qualifiers which shall run on the tuple
according to fdw_scan_tlist.
One other standpoint tries to handle a bunch of base foreign
tables as a unit. That means, if any of base foreign table is
the target of locking, it prompts FDW driver to fetch the latest
"joined" tuple identified by "ctid", even if this join contains
multiple base relations to be locked.
The advantage of this approach is that we can use qualifiers of
the ForeignScan node with scanrelid==0 and no need to pay attention
of remote qualifier and/or join condition individually.
Its disadvantage is, we may extend EState structure to keep the
"joined" tuples, in addition to es_epqTupleSet[].
I'm inclined to think the later standpoint works well, because
it does not need to reproduce an alternative execution path in
local side again, even if a ForeignScan node represents much
complicated remote query.
If we would fetch tuples of individual base relations, we need
to reconstruct identical join path to be executed on remote-
side, don't it?
IIUC, the purpose of EvalPlanQual() is to ensure the tuples to
be locked is still visible, so it is not an essential condition
to fetch base tuples individually.
Just an aside, please tell me if someone know, does EvalPlanQual
logic work correctly even if the tuple to be locked located in
the right tree of HashJoin?
In this case, it seems to me ExecHashJoin does not refresh Hash
table again even if ExecProcNode() is invoked with es_epqTupleSet[],
thus, old tuple is already visible and checked, isn't it?
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Etsuro Fujita
Sent: Thursday, June 25, 2015 3:12 PM
To: Kaigai Kouhei(海外 浩平)
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQualHi KaiGai-san,
I'd like to work on this issue with you!
On 2015/06/25 10:48, Kouhei Kaigai wrote:
I'd like to suggest a solution that re-construct remote tuple according
to the fdw_scan_tlist on ExecScanFetch, if given scanrelid == 0.
It enables to run local qualifier associated with the ForeignScan node,
and it will also work for the case when tuple in es_epqTupleSet[] was
local heap.Maybe I'm missing something, but I don't think your proposal works
properly because we don't have any component ForeignScan state node or
subsidiary join state node once we've replaced the entire join with the
ForeignScan performing the join remotely, IIUC. So, my image was to
have another subplan for EvalPlanQual as well as the ForeignScan, to do
the entire join locally for the component test tuples if we are inside
an EvalPlanQual recheck.BTW, if you try newer version of postgres_fdw foreign join patch,
please provide me to reproduce the problem/OK
Also, as an aside, postgres_fdw does not implement RefetchForeignRow()
at this moment. Does it make a problem?I don't think so, though I think it would be better to test that the
foreign join pushdown API patch also allows late row locking using the
postgres_fdw.Best regards,
Etsuro Fujita--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi KaiGai-san,
On 2015/06/27 21:09, Kouhei Kaigai wrote:
BTW, if you try newer version of postgres_fdw foreign join patch,
please provide me to reproduce the problem/
OK
Did you forget to attach the patch, or v17 is in use?
Sorry, I made a mistake. The problem was produced using v16 [1]/messages/by-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com.
I'd like to suggest a solution that re-construct remote tuple according
to the fdw_scan_tlist on ExecScanFetch, if given scanrelid == 0.
It enables to run local qualifier associated with the ForeignScan node,
and it will also work for the case when tuple in es_epqTupleSet[] was
local heap.
Maybe I'm missing something, but I don't think your proposal works
properly because we don't have any component ForeignScan state node or
subsidiary join state node once we've replaced the entire join with the
ForeignScan performing the join remotely, IIUC. So, my image was to
have another subplan for EvalPlanQual as well as the ForeignScan, to do
the entire join locally for the component test tuples if we are inside
an EvalPlanQual recheck.
Hmm... Probably, we have two standpoints to tackle the problem.
The first standpoint tries to handle the base foreign table as
a prime relation for locking. Thus, we have to provide a way to
fetch a remote tuple identified with the supplied ctid.
The advantage of this approach is the way to fetch tuples from
base relation is quite similar to the existing form, however,
its disadvantage is another side of the same coin, because the
ForeignScan node with scanrelid==0 (that represents remote join
query) may have local qualifiers which shall run on the tuple
according to fdw_scan_tlist.
IIUC, I think this approach would also need to evaluate join conditions
and remote qualifiers in addition to local qualifiers in the local, for
component tuples that were re-fetched from the remote (and remaining
component tuples that were copied from whole-row vars, if any), in cases
where the re-fetched tuples were updated versions of those tuples rather
than the same versions priviously obtained.
One other standpoint tries to handle a bunch of base foreign
tables as a unit. That means, if any of base foreign table is
the target of locking, it prompts FDW driver to fetch the latest
"joined" tuple identified by "ctid", even if this join contains
multiple base relations to be locked.
The advantage of this approach is that we can use qualifiers of
the ForeignScan node with scanrelid==0 and no need to pay attention
of remote qualifier and/or join condition individually.
Its disadvantage is, we may extend EState structure to keep the
"joined" tuples, in addition to es_epqTupleSet[].
That is an idea. However, ISTM there is another disadvantage; that is
not efficient because that would need to perform another remote join
query having such additional conditions during an EvalPlanQual check, as
you proposed.
I'm inclined to think the later standpoint works well, because
it does not need to reproduce an alternative execution path in
local side again, even if a ForeignScan node represents much
complicated remote query.
If we would fetch tuples of individual base relations, we need
to reconstruct identical join path to be executed on remote-
side, don't it?
Yeah, that was my image for fixing this issue.
IIUC, the purpose of EvalPlanQual() is to ensure the tuples to
be locked is still visible, so it is not an essential condition
to fetch base tuples individually.
I think so too, but taking the similarity and/or efficiency of
processing into consideration, I would vote for the idea of having an
alternative execution path in the local. That would also allow FDW
authors to write the foreign join pushdown functionality in their FDWs
by smaller efforts.
Best regards,
Etsuro Fujita
[1]: /messages/by-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
/messages/by-id/CAEZqfEe9KGy=1_waGh2rgZPg0o4pqgD+iauYaj8wTze+CYJUHg@mail.gmail.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Fujita-san,
Sorry for my late.
On 2015/06/27 21:09, Kouhei Kaigai wrote:
BTW, if you try newer version of postgres_fdw foreign join patch,
please provide me to reproduce the problem/OK
Did you forget to attach the patch, or v17 is in use?
Sorry, I made a mistake. The problem was produced using v16 [1].
I'd like to suggest a solution that re-construct remote tuple according
to the fdw_scan_tlist on ExecScanFetch, if given scanrelid == 0.
It enables to run local qualifier associated with the ForeignScan node,
and it will also work for the case when tuple in es_epqTupleSet[] was
local heap.Maybe I'm missing something, but I don't think your proposal works
properly because we don't have any component ForeignScan state node or
subsidiary join state node once we've replaced the entire join with the
ForeignScan performing the join remotely, IIUC. So, my image was to
have another subplan for EvalPlanQual as well as the ForeignScan, to do
the entire join locally for the component test tuples if we are inside
an EvalPlanQual recheck.Hmm... Probably, we have two standpoints to tackle the problem.
The first standpoint tries to handle the base foreign table as
a prime relation for locking. Thus, we have to provide a way to
fetch a remote tuple identified with the supplied ctid.
The advantage of this approach is the way to fetch tuples from
base relation is quite similar to the existing form, however,
its disadvantage is another side of the same coin, because the
ForeignScan node with scanrelid==0 (that represents remote join
query) may have local qualifiers which shall run on the tuple
according to fdw_scan_tlist.IIUC, I think this approach would also need to evaluate join conditions
and remote qualifiers in addition to local qualifiers in the local, for
component tuples that were re-fetched from the remote (and remaining
component tuples that were copied from whole-row vars, if any), in cases
where the re-fetched tuples were updated versions of those tuples rather
than the same versions priviously obtained.One other standpoint tries to handle a bunch of base foreign
tables as a unit. That means, if any of base foreign table is
the target of locking, it prompts FDW driver to fetch the latest
"joined" tuple identified by "ctid", even if this join contains
multiple base relations to be locked.
The advantage of this approach is that we can use qualifiers of
the ForeignScan node with scanrelid==0 and no need to pay attention
of remote qualifier and/or join condition individually.
Its disadvantage is, we may extend EState structure to keep the
"joined" tuples, in addition to es_epqTupleSet[].That is an idea. However, ISTM there is another disadvantage; that is
not efficient because that would need to perform another remote join
query having such additional conditions during an EvalPlanQual check, as
you proposed.I'm inclined to think the later standpoint works well, because
it does not need to reproduce an alternative execution path in
local side again, even if a ForeignScan node represents much
complicated remote query.
If we would fetch tuples of individual base relations, we need
to reconstruct identical join path to be executed on remote-
side, don't it?Yeah, that was my image for fixing this issue.
IIUC, the purpose of EvalPlanQual() is to ensure the tuples to
be locked is still visible, so it is not an essential condition
to fetch base tuples individually.I think so too, but taking the similarity and/or efficiency of
processing into consideration, I would vote for the idea of having an
alternative execution path in the local. That would also allow FDW
authors to write the foreign join pushdown functionality in their FDWs
by smaller efforts.
Even though I'd like to see committer's opinion, I could not come up
with the idea better than what you proposed; foreign-/custom-scan
has alternative plan if scanrelid==0.
Let me introduce a few cases we should pay attention.
Foreign/CustomScan node may stack; that means a Foreign/CustomScan node
may have child node that includes another Foreign/CustomScan node with
scanrelid==0.
(At this moment, ForeignScan cannot have child node, however, more
aggressive push-down [1]/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp will need same feature to fetch tuples from
local relation and construct VALUES() clause.)
In this case, the highest Foreign/CustomScan node (that is also nearest
to LockRows or ModifyTuples) run the alternative sub-plan that includes
scan/join plans dominated by fdw_relids or custom_relids.
For example:
LockRows
-> HashJoin
-> CustomScan (AliceJoin)
-> SeqScan on t1
-> CustomScan (CarolJoin)
-> SeqScan on t2
-> SeqScan on t3
-> Hash
-> CustomScan (BobJoin)
-> SeqScan on t4
-> ForeignScan (remote join involves ft5, ft6)
In this case, AliceJoin will have alternative sub-plan to join t1, t2
and t3, then it shall be used on EvalPlanQual(). Also, BobJoin will
have alternative sub-plan to join t4, ft5 and ft6. CarolJoin and the
ForeignScan will also have alternative sub-plan, however, these are
not used in this case.
Probably, it works fine.
Do we have potential scenario if foreign-/custom-join is located over
LockRows node. (Subquery expansion may give such a case?)
Anyway, doesn't it make a problem, does it?
On the next step, how do we implement this design?
I guess that planner needs to keep a path that contains neither
foreign-join nor custom-join with scanrelid==0.
Probably, "cheapest_builtin_path" of RelOptInfo is needed that
never contains these remote/custom join logic, as a seed of
alternative sub-plan.
create_foreignscan_plan() or create_customscan_plan() will be
able to construct these alternative plan, regardless of the
extensions. So, individual FDW/CSP don't need to care about
this alternative sub-plan, do it?
After that, once ExecScanFetch() is called under EvalPlanQual(),
these Foreign/CustomScan with scanrelid==0 runs the alternative
sub-plan, to validate the latest tuple.
Hmm... It looks to me a workable approach.
Fujita-san, are you available to make a patch with this approach?
If so, I'd like to volunteer its reviewing.
[1]: /messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi KaiGai-san,
On 2015/07/02 18:31, Kouhei Kaigai wrote:
Even though I'd like to see committer's opinion, I could not come up
with the idea better than what you proposed; foreign-/custom-scan
has alternative plan if scanrelid==0.
I'd also like to hear the opinion!
Let me introduce a few cases we should pay attention.
Foreign/CustomScan node may stack; that means a Foreign/CustomScan node
may have child node that includes another Foreign/CustomScan node with
scanrelid==0.
(At this moment, ForeignScan cannot have child node, however, more
aggressive push-down [1] will need same feature to fetch tuples from
local relation and construct VALUES() clause.)
In this case, the highest Foreign/CustomScan node (that is also nearest
to LockRows or ModifyTuples) run the alternative sub-plan that includes
scan/join plans dominated by fdw_relids or custom_relids.For example:
LockRows
-> HashJoin
-> CustomScan (AliceJoin)
-> SeqScan on t1
-> CustomScan (CarolJoin)
-> SeqScan on t2
-> SeqScan on t3
-> Hash
-> CustomScan (BobJoin)
-> SeqScan on t4
-> ForeignScan (remote join involves ft5, ft6)In this case, AliceJoin will have alternative sub-plan to join t1, t2
and t3, then it shall be used on EvalPlanQual(). Also, BobJoin will
have alternative sub-plan to join t4, ft5 and ft6. CarolJoin and the
ForeignScan will also have alternative sub-plan, however, these are
not used in this case.
Probably, it works fine.
Yeah, I think so too.
Do we have potential scenario if foreign-/custom-join is located over
LockRows node. (Subquery expansion may give such a case?)
Anyway, doesn't it make a problem, does it?
IIUC, I don't think we have such a case.
On the next step, how do we implement this design?
I guess that planner needs to keep a path that contains neither
foreign-join nor custom-join with scanrelid==0.
Probably, "cheapest_builtin_path" of RelOptInfo is needed that
never contains these remote/custom join logic, as a seed of
alternative sub-plan.
Yeah, I think so too, but I've not fugiured out how to implement this yet.
To be honest, ISTM that it's difficult to do that simply and efficiently
for the foreign/custom-join-pushdown API that we have for 9.5. It's a
little late, but what I started thinking is to redesign that API so that
that API is called at standard_join_search, as discussed in [2]/messages/by-id/5451.1426271510@sss.pgh.pa.us; (1) to
place that API call *after* the set_cheapest call and (2) to place
another set_cheapest call after that API call for each joinrel. By the
first set_cheapest call, I think we could probably save an alternative
path that we need in "cheapest_builtin_path". By the second
set_cheapest call following that API call, we could consider
foreign/custom-join-pushdown paths also. What do you think about this idea?
create_foreignscan_plan() or create_customscan_plan() will be
able to construct these alternative plan, regardless of the
extensions. So, individual FDW/CSP don't need to care about
this alternative sub-plan, do it?After that, once ExecScanFetch() is called under EvalPlanQual(),
these Foreign/CustomScan with scanrelid==0 runs the alternative
sub-plan, to validate the latest tuple.Hmm... It looks to me a workable approach.
Year, I think so too.
Fujita-san, are you available to make a patch with this approach?
If so, I'd like to volunteer its reviewing.
Yeah, I'm willing to make a patch if we obtain the consensus! And I'd
be happy if you help me doing the work!
Best regards,
Etsuro Fujita
[2]: /messages/by-id/5451.1426271510@sss.pgh.pa.us
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Let me introduce a few cases we should pay attention.
Foreign/CustomScan node may stack; that means a Foreign/CustomScan node
may have child node that includes another Foreign/CustomScan node with
scanrelid==0.
(At this moment, ForeignScan cannot have child node, however, more
aggressive push-down [1] will need same feature to fetch tuples from
local relation and construct VALUES() clause.)
In this case, the highest Foreign/CustomScan node (that is also nearest
to LockRows or ModifyTuples) run the alternative sub-plan that includes
scan/join plans dominated by fdw_relids or custom_relids.For example:
LockRows
-> HashJoin
-> CustomScan (AliceJoin)
-> SeqScan on t1
-> CustomScan (CarolJoin)
-> SeqScan on t2
-> SeqScan on t3
-> Hash
-> CustomScan (BobJoin)
-> SeqScan on t4
-> ForeignScan (remote join involves ft5, ft6)In this case, AliceJoin will have alternative sub-plan to join t1, t2
and t3, then it shall be used on EvalPlanQual(). Also, BobJoin will
have alternative sub-plan to join t4, ft5 and ft6. CarolJoin and the
ForeignScan will also have alternative sub-plan, however, these are
not used in this case.
Probably, it works fine.Yeah, I think so too.
Sorry, I need to adjust my explanation above a bit:
In this case, AliceJoin will have alternative sub-plan to join t1 and
CarolJoin, then CarolJoin will have alternative sub-plan to join t2 and
t3. Also, BobJoin will have alternative sub-plan to join t4 and the
ForeignScan with remote join, and this ForeignScan node will have
alternative sub-plan to join ft5 and ft6.
Why this recursive design is better? Because it makes planner enhancement
much simple than overall approach. Please see my explanation in the
section below.
On the next step, how do we implement this design?
I guess that planner needs to keep a path that contains neither
foreign-join nor custom-join with scanrelid==0.
Probably, "cheapest_builtin_path" of RelOptInfo is needed that
never contains these remote/custom join logic, as a seed of
alternative sub-plan.Yeah, I think so too, but I've not fugiured out how to implement this yet.
To be honest, ISTM that it's difficult to do that simply and efficiently
for the foreign/custom-join-pushdown API that we have for 9.5. It's a
little late, but what I started thinking is to redesign that API so that
that API is called at standard_join_search, as discussed in [2]; (1) to
place that API call *after* the set_cheapest call and (2) to place
another set_cheapest call after that API call for each joinrel. By the
first set_cheapest call, I think we could probably save an alternative
path that we need in "cheapest_builtin_path". By the second
set_cheapest call following that API call, we could consider
foreign/custom-join-pushdown paths also. What do you think about this idea?
Disadvantage is larger than advantage, sorry.
The reason why we put foreign/custom-join hook on add_paths_to_joinrel()
is that the source relations (inner/outer) were not obvious, thus,
we cannot reproduce which relations are the source of this join.
So, I had to throw a spoon when I tried this approach before.
My idea is that we save the cheapest_total_path of RelOptInfo onto the
new cheapest_builtin_path just before the GetForeignJoinPaths() hook.
Why? It should be a built-in join logic, never be a foreign/custom-join,
because of the hook location; only built-in logic shall be added here.
Even if either/both of join sub-trees contains foreign/custom-join,
these path have own alternative sub-plan at their level, no need to
care about at current level. (It is the reason why I adjust my explanation
above.)
Once this built-in path is kept and foreign/custom-join get chosen by
set_cheapest(), it is easy to attach this sub-plan to ForeignScan or
CustomScan node.
I don't find any significant down-side in this approach.
How about your opinion?
Regarding to the development timeline, I prefer to put something
workaround not to kick Assert() on ExecScanFetch().
We may add a warning in the documentation not to replace built-in
join if either/both of sub-trees are target of UPDATE/DELETE or
FOR SHARE/UPDATE.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015/07/02 23:13, Kouhei Kaigai wrote:
To be honest, ISTM that it's difficult to do that simply and efficiently
for the foreign/custom-join-pushdown API that we have for 9.5. It's a
little late, but what I started thinking is to redesign that API so that
that API is called at standard_join_search, as discussed in [2]; (1) to
place that API call *after* the set_cheapest call and (2) to place
another set_cheapest call after that API call for each joinrel. By the
first set_cheapest call, I think we could probably save an alternative
path that we need in "cheapest_builtin_path". By the second
set_cheapest call following that API call, we could consider
foreign/custom-join-pushdown paths also. What do you think about this idea?
Disadvantage is larger than advantage, sorry.
The reason why we put foreign/custom-join hook on add_paths_to_joinrel()
is that the source relations (inner/outer) were not obvious, thus,
we cannot reproduce which relations are the source of this join.
So, I had to throw a spoon when I tried this approach before.
Maybe I'm missing something, but my image about this approach is that if
base relations for a given joinrel are all foreign tables and belong to
the same foreign server, then by calling that API there, we consider the
remote join over all the foreign tables, and that if not, we give up to
consider the remote join.
My idea is that we save the cheapest_total_path of RelOptInfo onto the
new cheapest_builtin_path just before the GetForeignJoinPaths() hook.
Why? It should be a built-in join logic, never be a foreign/custom-join,
because of the hook location; only built-in logic shall be added here.
My concern about your idea is that since that (a) add_paths_to_joinrel
is called multiple times per joinrel and that (b) repetitive add_path
calls through GetForeignJoinPaths in add_paths_to_joinrel might remove
old paths that are builtin, it's possible to save a path that is not
builtin onto the cheapest_total_path and thus to save that path wrongly
onto the cheapest_builtin_path. There might be a good way to cope with
that, though.
Regarding to the development timeline, I prefer to put something
workaround not to kick Assert() on ExecScanFetch().
We may add a warning in the documentation not to replace built-in
join if either/both of sub-trees are target of UPDATE/DELETE or
FOR SHARE/UPDATE.
I'm not sure that that is a good idea, but anyway, I think we need to
hurry fixing this issue.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015/07/02 23:13, Kouhei Kaigai wrote:
To be honest, ISTM that it's difficult to do that simply and efficiently
for the foreign/custom-join-pushdown API that we have for 9.5. It's a
little late, but what I started thinking is to redesign that API so that
that API is called at standard_join_search, as discussed in [2]; (1) to
place that API call *after* the set_cheapest call and (2) to place
another set_cheapest call after that API call for each joinrel. By the
first set_cheapest call, I think we could probably save an alternative
path that we need in "cheapest_builtin_path". By the second
set_cheapest call following that API call, we could consider
foreign/custom-join-pushdown paths also. What do you think about this idea?Disadvantage is larger than advantage, sorry.
The reason why we put foreign/custom-join hook on add_paths_to_joinrel()
is that the source relations (inner/outer) were not obvious, thus,
we cannot reproduce which relations are the source of this join.
So, I had to throw a spoon when I tried this approach before.Maybe I'm missing something, but my image about this approach is that if
base relations for a given joinrel are all foreign tables and belong to
the same foreign server, then by calling that API there, we consider the
remote join over all the foreign tables, and that if not, we give up to
consider the remote join.
Your understanding is correct, but missing a point. Once foreign tables
to be joined are informed as a bitmap (joinrel->relids), it is not obvious
for extensions which relations are joined with INNER JOIN, and which ones
are joined with OUTER JOIN.
I tried to implement a common utility function under the v9.5 cycle,
however, it was suspicious whether we can make a reliable logic.
Also, I don't want to stick on the assumption that relations involved in
remote join are all managed by same foreign-server no longer.
The following two ideas introduce possible enhancement of remote join
feature that involved local relations; replicated table or transformed
to VALUES() clause.
/messages/by-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=Xw@mail.gmail.com
/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp
Once we have to pay attention to the case of local/foreign relations
mixed, we have to care about the path of underlying local or foreign
relations managed by other foreign server.
I think add_paths_to_joinrel() is the best location for foreign-join,
not only custom-join. Relocation to standard_join_search() has larger
disadvantage than its advantage.
My idea is that we save the cheapest_total_path of RelOptInfo onto the
new cheapest_builtin_path just before the GetForeignJoinPaths() hook.
Why? It should be a built-in join logic, never be a foreign/custom-join,
because of the hook location; only built-in logic shall be added here.My concern about your idea is that since that (a) add_paths_to_joinrel
is called multiple times per joinrel and that (b) repetitive add_path
calls through GetForeignJoinPaths in add_paths_to_joinrel might remove
old paths that are builtin, it's possible to save a path that is not
builtin onto the cheapest_total_path and thus to save that path wrongly
onto the cheapest_builtin_path. There might be a good way to cope with
that, though.
For the concern (a), FDW driver can reference RelOptInfo->fdw_private
that shall be initialized to NULL, then FDW driver will set valid data
if it preliminary adds something. IIRC, postgres_fdw also skips to
add same path multiple times.
For the concern (b), yep, we may enhance add_path() to retain built-in
path, instead of the add_paths_to_joinrel().
Let's adjust the logic a bit. The add_path() can know whether the given
path is usual or exceptional (ForeignPath/CustomPath towards none base
relation) one. If path is exceptional, the cheapest_builtin_path shall
be retained unconditionally. Elsewhere, the cheapest one replace here,
then the cheapest built-in path will survive.
Is it still problematic?
Regarding to the development timeline, I prefer to put something
workaround not to kick Assert() on ExecScanFetch().
We may add a warning in the documentation not to replace built-in
join if either/both of sub-trees are target of UPDATE/DELETE or
FOR SHARE/UPDATE.I'm not sure that that is a good idea, but anyway, I think we need to
hurry fixing this issue.
My approach is not fix, but avoid. :-)
It may be an idea to implement the above fixup even though it may be
too large/late to apply v9.5 features, but we can understand how many
changes are needed to fixup this problem.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015/07/03 15:32, Kouhei Kaigai wrote:
On 2015/07/02 23:13, Kouhei Kaigai wrote:
To be honest, ISTM that it's difficult to do that simply and efficiently
for the foreign/custom-join-pushdown API that we have for 9.5. It's a
little late, but what I started thinking is to redesign that API so that
that API is called at standard_join_search, as discussed in [2];
Disadvantage is larger than advantage, sorry.
The reason why we put foreign/custom-join hook on add_paths_to_joinrel()
is that the source relations (inner/outer) were not obvious, thus,
we cannot reproduce which relations are the source of this join.
So, I had to throw a spoon when I tried this approach before.
Maybe I'm missing something, but my image about this approach is that if
base relations for a given joinrel are all foreign tables and belong to
the same foreign server, then by calling that API there, we consider the
remote join over all the foreign tables, and that if not, we give up to
consider the remote join.
Your understanding is correct, but missing a point. Once foreign tables
to be joined are informed as a bitmap (joinrel->relids), it is not obvious
for extensions which relations are joined with INNER JOIN, and which ones
are joined with OUTER JOIN.
Can't FDWs get the join information through the root, which I think we
would pass to the API as the argument?
Also, I don't want to stick on the assumption that relations involved in
remote join are all managed by same foreign-server no longer.
The following two ideas introduce possible enhancement of remote join
feature that involved local relations; replicated table or transformed
to VALUES() clause./messages/by-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SCvL-HvOpBR=Xw@mail.gmail.com
/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010F20AD@BPXM15GP.gisp.nec.co.jp
Interesting!
I think add_paths_to_joinrel() is the best location for foreign-join,
not only custom-join. Relocation to standard_join_search() has larger
disadvantage than its advantage.
I agree with you that it's important to ensure the expandability, and my
question is, is it possible that the API call from standard_join_search
also realize those idea if FDWs can get the join information through the
root or something like that?
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Also, I don't want to stick on the assumption that relations involved in
remote join are all managed by same foreign-server no longer.
The following two ideas introduce possible enhancement of remote join
feature that involved local relations; replicated table or transformed
to VALUES() clause./messages/by-id/CA+Tgmoai_VUF5h6qVLNLU+FKp0aeBCbnnMT3SC
vL-HvOpBR=Xw@mail.gmail.com/messages/by-id/9A28C8860F777E439AA12E8AEA7694F8010F20A
D@BPXM15GP.gisp.nec.co.jpInteresting!
I think add_paths_to_joinrel() is the best location for foreign-join,
not only custom-join. Relocation to standard_join_search() has larger
disadvantage than its advantage.I agree with you that it's important to ensure the expandability, and my
question is, is it possible that the API call from standard_join_search
also realize those idea if FDWs can get the join information through the
root or something like that?
I don't deny its possibility, even though I once gave up to implement to
reproduce join information - which relations and other ones are joined in
this level - using PlannerInfo and RelOptInfo.
However, we need to pay attention on advantages towards the alternatives.
Hooks on add_paths_to_joinrel() enables to implement identical stuff, with
less complicated logic to reproduce left / right relations from RelOptInfo
of the joinrel. (Note that RelOptInfo->fdw_private enables to avoid path-
construction multiple times.)
I'm uncertain why this API change is necessary to fix up the problem
around EvalPlanQual.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015/07/06 9:42, Kouhei Kaigai wrote:
Also, I don't want to stick on the assumption that relations involved in
remote join are all managed by same foreign-server no longer.
The following two ideas introduce possible enhancement of remote join
feature that involved local relations; replicated table or transformed
to VALUES() clause.
I think add_paths_to_joinrel() is the best location for foreign-join,
not only custom-join. Relocation to standard_join_search() has larger
disadvantage than its advantage.
I agree with you that it's important to ensure the expandability, and my
question is, is it possible that the API call from standard_join_search
also realize those idea if FDWs can get the join information through the
root or something like that?
I don't deny its possibility, even though I once gave up to implement to
reproduce join information - which relations and other ones are joined in
this level - using PlannerInfo and RelOptInfo.
OK
However, we need to pay attention on advantages towards the alternatives.
Hooks on add_paths_to_joinrel() enables to implement identical stuff, with
less complicated logic to reproduce left / right relations from RelOptInfo
of the joinrel. (Note that RelOptInfo->fdw_private enables to avoid path-
construction multiple times.)
I'm uncertain why this API change is necessary to fix up the problem
around EvalPlanQual.
Yeah, maybe we wouldn't need any API change. I think we would be able
to fix this by complicating add_path as you pointed out upthread. I'm
not sure that complicating it is a good idea, though. I think that it
might be possible that the callback in standard_join_search would allow
us to fix this without complicating the core path-cost-comparison stuff
such as add_path. I noticed that what I proposed upthread doesn't work
properly, though.
Actually, I have another concern about the callback location that you
proposed; that might meaninglessly increase planning time in the
postgres_fdw case when using remote estimates, which the proposed
postgres_fdw patch doesn't support currently IIUC, but I think it should
support that. Let me explain about that. If you have A JOIN B JOIN C
all on the same foreign server, for example, we'll have only to perform
a remote EXPLAIN for A-B-C for the estimates (when adopting a strategy
that we push down a join as large as possible into the remote server).
However, ISTM that the callback in add_paths_to_joinrel would perform
remote EXPLAINs not only for A-B-C but for A-B, A-C and B-C according to
the dynamic programming algorithm. (Duplicated remote EXPLAINs for
A-B-C can be eliminated using a way you proposed.) Thus the remote
EXPLAINs for A-B, A-C and B-C seem to me meaningless while incurring
performance degradation in query planning. Maybe I'm missing something,
though.
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015/07/07 19:15, Etsuro Fujita wrote:
On 2015/07/06 9:42, Kouhei Kaigai wrote:
However, we need to pay attention on advantages towards the alternatives.
Hooks on add_paths_to_joinrel() enables to implement identical stuff,
with
less complicated logic to reproduce left / right relations from
RelOptInfo
of the joinrel. (Note that RelOptInfo->fdw_private enables to avoid path-
construction multiple times.)
I'm uncertain why this API change is necessary to fix up the problem
around EvalPlanQual.Yeah, maybe we wouldn't need any API change. I think we would be able
to fix this by complicating add_path as you pointed out upthread. I'm
not sure that complicating it is a good idea, though. I think that it
might be possible that the callback in standard_join_search would allow
us to fix this without complicating the core path-cost-comparison stuff
such as add_path. I noticed that what I proposed upthread doesn't work
properly, though.
To resolve this issue, I tried to make the core create an alternative
plan that will be used in an EvalPlanQual recheck, instead of a foreign
scan that performs a foreign join remotely (ie, scanrelid = 0). But I
changed that idea. Instead, I'd like to propose that it's the FDW's
responsibility to provide such a plan. Specifically, I'd propose that
(1) we add a new Path field, say subpath, to the ForeignPath data
structure and that (2) when generating a ForeignPath node for a foreign
join, an FDW must provide the subpath Path node by itself. As before,
it'd be recommended to use
ForeignPath *
create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
double rows, Cost startup_cost, Cost total_cost,
List *pathkeys,
Relids required_outer,
Path *subpath,
List *fdw_private)
where subpath is the subpath Path node that has the pathkeys and the
required_outer rels. (subpath is NULL if scanning a base relation.)
Also, it'd be recommended that an FDW generates such ForeignPath nodes
by considering, for each of paths in the rel's pathlist, whether to push
down that path (to generate a ForeignPath node for a foreign join that
has the same pathkeys and parameterization as that path). So, if
generating the ForeignPath node, that path could be used as the subpath
Path node.
(I think the current postgres_fdw patch only considers an unsorted,
unparameterized path for performing a foreign join remotely, but I think
we should also consider presorted and/or parameterized paths.)
I think this idea would apply to the API location that you proposed.
However, ISTM that this idea would work better for the API call from
standard_join_search because the rel's pathlist at that point has more
paths worthy of consideration, in view of not only costs and sizes but
pathkeys and parameterization.
I think the subplan created from the subpath Path node could be used in
an EvalPlanQual recheck, instead of a foreign scan that performs a
foreign join remotely, as discussed previously.
Comments welcome!
Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jul 3, 2015 at 6:25 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
Can't FDWs get the join information through the root, which I think we would
pass to the API as the argument?
This is exactly what Tom suggested originally, and it has some appeal,
but neither KaiGai nor I could see how to make it work . Do you have
an idea? It's not too late to go back and change the API.
The problem that was bothering us (or at least what was bothering me)
is that the PlannerInfo provides only a list of SpecialJoinInfo
structures, which don't directly give you the original join order. In
fact, min_righthand and min_lefthand are intended to constraint the
*possible* join orders, and are deliberately designed *not* to specify
a single join order. If you're sending a query to a remote PostgreSQL
node, you don't want to know what all the possible join orders are;
it's the remote side's job to plan the query. You do, however, need
an easy way to identify one join order that you can use to construct a
query. It didn't seem easy to do that without duplicating
make_join_rel(), which seemed like a bad idea.
But maybe there's a good way to do it. Tom wasn't crazy about this
hook both because of the frequency of calls and also because of the
long argument list. I think those concerns are legitimate; I just
couldn't see how to make the other way work.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jul 3, 2015 at 6:25 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:Can't FDWs get the join information through the root, which I think we would
pass to the API as the argument?This is exactly what Tom suggested originally, and it has some appeal,
but neither KaiGai nor I could see how to make it work . Do you have
an idea? It's not too late to go back and change the API.The problem that was bothering us (or at least what was bothering me)
is that the PlannerInfo provides only a list of SpecialJoinInfo
structures, which don't directly give you the original join order. In
fact, min_righthand and min_lefthand are intended to constraint the
*possible* join orders, and are deliberately designed *not* to specify
a single join order. If you're sending a query to a remote PostgreSQL
node, you don't want to know what all the possible join orders are;
it's the remote side's job to plan the query. You do, however, need
an easy way to identify one join order that you can use to construct a
query. It didn't seem easy to do that without duplicating
make_join_rel(), which seemed like a bad idea.But maybe there's a good way to do it. Tom wasn't crazy about this
hook both because of the frequency of calls and also because of the
long argument list. I think those concerns are legitimate; I just
couldn't see how to make the other way work.
I could have a discussion with Fujita-san about this topic.
He has a little bit tricky, but I didn't have a clear reason to deny,
idea to tackle this matter.
At the line just above set_cheapest() of the standard_join_search(),
at least one built-in join logic are already added to the RelOptInfo,
thus, FDW driver can reference the cheapest path by built-in logic
and its lefttree and righttree that construct a joinrel.
Its assumption is, the best paths by built-in logic are at least
enough reasonable join order than other potential ones.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
The problem that was bothering us (or at least what was bothering me)
is that the PlannerInfo provides only a list of SpecialJoinInfo
structures, which don't directly give you the original join order. In
fact, min_righthand and min_lefthand are intended to constraint the
*possible* join orders, and are deliberately designed *not* to specify
a single join order. If you're sending a query to a remote PostgreSQL
node, you don't want to know what all the possible join orders are;
it's the remote side's job to plan the query. You do, however, need
an easy way to identify one join order that you can use to construct a
query. It didn't seem easy to do that without duplicating
make_join_rel(), which seemed like a bad idea.
In principle it seems like you could traverse root->parse->jointree
as a guide to reconstructing the original syntactic structure; though
I'm not sure how hard it would be to ignore the parts of that tree
that correspond to relations you're not shipping.
But maybe there's a good way to do it. Tom wasn't crazy about this
hook both because of the frequency of calls and also because of the
long argument list. I think those concerns are legitimate; I just
couldn't see how to make the other way work.
In my vision you probably really only want one call per build_join_rel
event (that is, per construction of a new RelOptInfo), not per
make_join_rel event.
It's possible that an FDW that wants to handle joins but is not talking to
a remote query planner would need to grovel through all the join ordering
possibilities individually, and then maybe hooking at make_join_rel is
sensible rather than having to reinvent that logic. But I'd want to see a
concrete use-case first, and I certainly don't think that that's the main
case to design the API around.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I could have a discussion with Fujita-san about this topic.
Also, let me share with the discussion towards entire solution.
The primitive reason of this problem is, Scan node with scanrelid==0
represents a relation join that can involve multiple relations, thus,
its TupleDesc of the records will not fit base relations, however,
ExecScanFetch() was not updated when scanrelid==0 gets supported.
FDW/CSP on behalf of the Scan node with scanrelid==0 are responsible
to generate records according to the fdw_/custom_scan_tlist that
reflects the definition of relation join, and only FDW/CSP know how
to combine these base relations.
In addition, host-side expressions (like Plan->qual) are initialized
to reference the records generated by FDW/CSP, so the least invasive
approach is to allow FDW/CSP to have own logic to recheck, I think.
Below is the structure of ExecScanFetch().
ExecScanFetch(ScanState *node,
ExecScanAccessMtd accessMtd,
ExecScanRecheckMtd recheckMtd)
{
EState *estate = node->ps.state;
if (estate->es_epqTuple != NULL)
{
/*
* We are inside an EvalPlanQual recheck. Return the test tuple if
* one is available, after rechecking any access-method-specific
* conditions.
*/
Index scanrelid = ((Scan *) node->ps.plan)->scanrelid;
Assert(scanrelid > 0);
if (estate->es_epqTupleSet[scanrelid - 1])
{
TupleTableSlot *slot = node->ss_ScanTupleSlot;
:
return slot;
}
}
return (*accessMtd) (node);
}
When we are inside of EPQ, it fetches a tuple in es_epqTuple[] array and
checks its visibility (ForeignRecheck() always say 'yep, it is visible'),
then ExecScan() applies its qualifiers by ExecQual().
So, as long as FDW/CSP can return a record that satisfies the TupleDesc
of this relation, made by the tuples in es_epqTuple[] array, rest of the
code paths are common.
I have an idea to solve the problem.
It adds recheckMtd() call if scanrelid==0 just before the assertion above,
and add a callback of FDW on ForeignRecheck().
The role of this new callback is to set up the supplied TupleTableSlot
and check its visibility, but does not define how to do this.
It is arbitrarily by FDW driver, like invocation of alternative plan
consists of only built-in logic.
Invocation of alternative plan is one of the most feasible way to
implement EPQ logic on FDW, so I think FDW also needs a mechanism
that takes child path-nodes like custom_paths in CustomPath node.
Once a valid path node is linked to this list, createplan.c transform
them to relevant plan node, then FDW can initialize and invoke this
plan node during execution, like ForeignRecheck().
This design can solve another problem Fujita-san has also mentioned.
If scan qualifier is pushed-down to the remote query and its expression
node is saved in the private area of ForeignScan, the callback on
ForeignRecheck() can evaluate the qualifier by itself. (Note that only
FDW driver can know where and how expression node being pushed-down
is saved in the private area.)
In the summary, the following three enhancements are a straightforward
way to fix up the problem he reported.
1. Add a special path to call recheckMtd in ExecScanFetch if scanrelid==0
2. Add a callback of FDW in ForeignRecheck() - to construct a record
according to the fdw_scan_tlist definition and to evaluate its
visibility, or to evaluate qualifier pushed-down if base relation.
3. Add List *fdw_paths in ForeignPath like custom_paths of CustomPaths,
to construct plan nodes for EPQ evaluation.
On the other hands, we also need to pay attention the development
timeline. It is a really problem of v9.5, however, it looks to me
the straight forward solution needs enhancement of FDW APIs.
I'd like to see people's comment.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
Sent: Saturday, August 01, 2015 10:35 PM
To: Robert Haas; Etsuro Fujita
Cc: PostgreSQL-development; 花田茂
Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQualOn Fri, Jul 3, 2015 at 6:25 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:Can't FDWs get the join information through the root, which I think we would
pass to the API as the argument?This is exactly what Tom suggested originally, and it has some appeal,
but neither KaiGai nor I could see how to make it work . Do you have
an idea? It's not too late to go back and change the API.The problem that was bothering us (or at least what was bothering me)
is that the PlannerInfo provides only a list of SpecialJoinInfo
structures, which don't directly give you the original join order. In
fact, min_righthand and min_lefthand are intended to constraint the
*possible* join orders, and are deliberately designed *not* to specify
a single join order. If you're sending a query to a remote PostgreSQL
node, you don't want to know what all the possible join orders are;
it's the remote side's job to plan the query. You do, however, need
an easy way to identify one join order that you can use to construct a
query. It didn't seem easy to do that without duplicating
make_join_rel(), which seemed like a bad idea.But maybe there's a good way to do it. Tom wasn't crazy about this
hook both because of the frequency of calls and also because of the
long argument list. I think those concerns are legitimate; I just
couldn't see how to make the other way work.I could have a discussion with Fujita-san about this topic.
He has a little bit tricky, but I didn't have a clear reason to deny,
idea to tackle this matter.
At the line just above set_cheapest() of the standard_join_search(),
at least one built-in join logic are already added to the RelOptInfo,
thus, FDW driver can reference the cheapest path by built-in logic
and its lefttree and righttree that construct a joinrel.
Its assumption is, the best paths by built-in logic are at least
enough reasonable join order than other potential ones.Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Aug 7, 2015 at 3:37 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
I could have a discussion with Fujita-san about this topic.
Also, let me share with the discussion towards entire solution.
The primitive reason of this problem is, Scan node with scanrelid==0
represents a relation join that can involve multiple relations, thus,
its TupleDesc of the records will not fit base relations, however,
ExecScanFetch() was not updated when scanrelid==0 gets supported.FDW/CSP on behalf of the Scan node with scanrelid==0 are responsible
to generate records according to the fdw_/custom_scan_tlist that
reflects the definition of relation join, and only FDW/CSP know how
to combine these base relations.
In addition, host-side expressions (like Plan->qual) are initialized
to reference the records generated by FDW/CSP, so the least invasive
approach is to allow FDW/CSP to have own logic to recheck, I think.Below is the structure of ExecScanFetch().
ExecScanFetch(ScanState *node,
ExecScanAccessMtd accessMtd,
ExecScanRecheckMtd recheckMtd)
{
EState *estate = node->ps.state;if (estate->es_epqTuple != NULL)
{
/*
* We are inside an EvalPlanQual recheck. Return the test tuple if
* one is available, after rechecking any access-method-specific
* conditions.
*/
Index scanrelid = ((Scan *) node->ps.plan)->scanrelid;Assert(scanrelid > 0);
if (estate->es_epqTupleSet[scanrelid - 1])
{
TupleTableSlot *slot = node->ss_ScanTupleSlot;
:
return slot;
}
}
return (*accessMtd) (node);
}When we are inside of EPQ, it fetches a tuple in es_epqTuple[] array and
checks its visibility (ForeignRecheck() always say 'yep, it is visible'),
then ExecScan() applies its qualifiers by ExecQual().
So, as long as FDW/CSP can return a record that satisfies the TupleDesc
of this relation, made by the tuples in es_epqTuple[] array, rest of the
code paths are common.I have an idea to solve the problem.
It adds recheckMtd() call if scanrelid==0 just before the assertion above,
and add a callback of FDW on ForeignRecheck().
The role of this new callback is to set up the supplied TupleTableSlot
and check its visibility, but does not define how to do this.
It is arbitrarily by FDW driver, like invocation of alternative plan
consists of only built-in logic.Invocation of alternative plan is one of the most feasible way to
implement EPQ logic on FDW, so I think FDW also needs a mechanism
that takes child path-nodes like custom_paths in CustomPath node.
Once a valid path node is linked to this list, createplan.c transform
them to relevant plan node, then FDW can initialize and invoke this
plan node during execution, like ForeignRecheck().This design can solve another problem Fujita-san has also mentioned.
If scan qualifier is pushed-down to the remote query and its expression
node is saved in the private area of ForeignScan, the callback on
ForeignRecheck() can evaluate the qualifier by itself. (Note that only
FDW driver can know where and how expression node being pushed-down
is saved in the private area.)In the summary, the following three enhancements are a straightforward
way to fix up the problem he reported.
1. Add a special path to call recheckMtd in ExecScanFetch if scanrelid==0
2. Add a callback of FDW in ForeignRecheck() - to construct a record
according to the fdw_scan_tlist definition and to evaluate its
visibility, or to evaluate qualifier pushed-down if base relation.
3. Add List *fdw_paths in ForeignPath like custom_paths of CustomPaths,
to construct plan nodes for EPQ evaluation.On the other hands, we also need to pay attention the development
timeline. It is a really problem of v9.5, however, it looks to me
the straight forward solution needs enhancement of FDW APIs.I'd like to see people's comment.
I'm not an expert in this area, but this plan does not seem unreasonable to me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers