Custom Scan APIs (Re: Custom Plan node)
The attached patches provide a feature to implement custom scan node
that allows extension to replace a part of plan tree with its own code
instead of the built-in logic.
In addition to the previous proposition, it enables us to integrate custom
scan as a part of candidate paths to be chosen by optimizer.
Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
a set of API stuff and a simple demonstration module that implement
regular table scan using inequality operator on ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote join capability.
Below is an example to show how does custom-scan work.
We usually run sequential scan even if clause has inequality operator
that references ctid system column.
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)
An extension that performs as custom-scan provider suggests
an alternative path, and its cost was less than sequential scan,
thus optimizer choose it.
postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)
Of course, more cost effective plan will win if exists.
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)
One other worthwhile example is remote-join enhancement on the
postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
managed by same foreign server.
postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y
like '%aaa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)
---------------------------
How does it works
---------------------------
This patch adds two hooks (for base and join relations) around allpaths.c
and joinpaths.c. It allows extensions to add alternative paths to handle
scanning on the base relation or join of two relations.
Its callback routine can add CustomPath using add_path() to inform
optimizer this alternative scan path. Every custom-scan provider is
identified by its name being registered preliminary using the following
function.
void register_custom_provider(const CustomProvider *provider);
CustomProvider is a set of name string and function pointers of callbacks.
Once CustomPath got chosen, create_scan_plan() construct a custom-
scan plan and calls back extension to initialize the node.
Rest of portions are similar to foreign scan, however, some of detailed
portions are different. For example, foreign scan is assumed to return
a tuple being formed according to table definition. On the other hand,
custom-scan does not have such assumption, so extension needs to
set tuple-descriptor on the scan tuple slot of ScanState structure by
itself.
In case of join, custom-scan performs as like a regular scan but it
returns tuples being already joined on underlying relations.
The patched postgres_fdw utilizes a hook at joinpaths.c to run
remote join.
------------
Issues
------------
I'm not 100% certain whether arguments of add_join_path_hook is
reasonable. I guess the first 7 arguments are minimum necessity.
The mergeclause_list and semifactors might be useful if someone
tries to implement its own mergejoin or semijoin. Also, I'm not
good at usage of path parameterization, but the last two arguments
are related to. Where is the best code to learn about its usage?
+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+ RelOptInfo *joinrel,
+ RelOptInfo *outerrel,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo,
+ List *restrictlist,
+ List *mergeclause_list,
+ SemiAntiJoinFactors *semifactors,
+ Relids param_source_rels,
+ Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
When we replace a join by a custom scan, where is the best target
for Var node that referenced relations under the join?
Usually, Var->varno is given as rtindex of tables being joined, then,
it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
It eventually determines the slot to be fetched on ExecEvalScalarVar().
On the other hand, we want Var-node to reference scan-tuple-slot
neither outer-slot nor inner-slot, if we replaced a join.
I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
Probably, it is a straightforward way to run remote join as like a scan,
but I'm not certain whether it is the best way.
I was concerned about FDW callback of postgres_fdw is designed to
take ForeignState argument. Because of this, remote join code did
not available to call these routines, even though most of custom-join
portions are similar.
So, I'd like to rework postgres_fdw first to put a common routine that
can be called from FDW portion and remote join portions.
However, I thought it makes reviewing hard due to the large scale of
changeset. So, I'd like to have a code reworking first.
----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* Probably, a wikipage may help people to understand it well.
* Postgres_fdw needs reworking to share common code for both of
FDW and remote join portions.
Thanks,
2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
2013/10/3 Robert Haas <robertmhaas@gmail.com>:
Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.Thanks, it makes me clear what we should target on v9.4 development.
Towards the next commitfest, I'm planning to develop the following
features:
* CustomScan node that can run custom code instead of built-in
scan nodes.
* Join-pushdown of postgres_fdw using the hook to be located on
the add_paths_to_joinrel(), for demonstration purpose.
* Something new way to scan a relation; probably, your suggested
ctid scan with less or bigger qualifier is a good example, also for
demonstration purpose.Probably, above set of jobs will be the first chunk of this feature.
Then, let's do other stuff like Append, Sort, Aggregate and so on
later. It seems to me a reasonable strategy.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi,
I tried to write up a wikipage to introduce how custom-scan works.
https://wiki.postgresql.org/wiki/CustomScanAPI
Any comments please.
2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:
The attached patches provide a feature to implement custom scan node
that allows extension to replace a part of plan tree with its own code
instead of the built-in logic.
In addition to the previous proposition, it enables us to integrate custom
scan as a part of candidate paths to be chosen by optimizer.
Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
a set of API stuff and a simple demonstration module that implement
regular table scan using inequality operator on ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote join capability.Below is an example to show how does custom-scan work.
We usually run sequential scan even if clause has inequality operator
that references ctid system column.postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)An extension that performs as custom-scan provider suggests
an alternative path, and its cost was less than sequential scan,
thus optimizer choose it.postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)Of course, more cost effective plan will win if exists.
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)One other worthwhile example is remote-join enhancement on the
postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
managed by same foreign server.postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y
like '%aaa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)---------------------------
How does it works
---------------------------
This patch adds two hooks (for base and join relations) around allpaths.c
and joinpaths.c. It allows extensions to add alternative paths to handle
scanning on the base relation or join of two relations.Its callback routine can add CustomPath using add_path() to inform
optimizer this alternative scan path. Every custom-scan provider is
identified by its name being registered preliminary using the following
function.void register_custom_provider(const CustomProvider *provider);
CustomProvider is a set of name string and function pointers of callbacks.
Once CustomPath got chosen, create_scan_plan() construct a custom-
scan plan and calls back extension to initialize the node.
Rest of portions are similar to foreign scan, however, some of detailed
portions are different. For example, foreign scan is assumed to return
a tuple being formed according to table definition. On the other hand,
custom-scan does not have such assumption, so extension needs to
set tuple-descriptor on the scan tuple slot of ScanState structure by
itself.In case of join, custom-scan performs as like a regular scan but it
returns tuples being already joined on underlying relations.
The patched postgres_fdw utilizes a hook at joinpaths.c to run
remote join.------------
Issues
------------
I'm not 100% certain whether arguments of add_join_path_hook is
reasonable. I guess the first 7 arguments are minimum necessity.
The mergeclause_list and semifactors might be useful if someone
tries to implement its own mergejoin or semijoin. Also, I'm not
good at usage of path parameterization, but the last two arguments
are related to. Where is the best code to learn about its usage?+/* Hook for plugins to add custom join path, in addition to default ones */ +typedef void (*add_join_path_hook_type)(PlannerInfo *root, + RelOptInfo *joinrel, + RelOptInfo *outerrel, + RelOptInfo *innerrel, + JoinType jointype, + SpecialJoinInfo *sjinfo, + List *restrictlist, + List *mergeclause_list, + SemiAntiJoinFactors *semifactors, + Relids param_source_rels, + Relids extra_lateral_rels); +extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;When we replace a join by a custom scan, where is the best target
for Var node that referenced relations under the join?
Usually, Var->varno is given as rtindex of tables being joined, then,
it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
It eventually determines the slot to be fetched on ExecEvalScalarVar().
On the other hand, we want Var-node to reference scan-tuple-slot
neither outer-slot nor inner-slot, if we replaced a join.
I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
Probably, it is a straightforward way to run remote join as like a scan,
but I'm not certain whether it is the best way.I was concerned about FDW callback of postgres_fdw is designed to
take ForeignState argument. Because of this, remote join code did
not available to call these routines, even though most of custom-join
portions are similar.
So, I'd like to rework postgres_fdw first to put a common routine that
can be called from FDW portion and remote join portions.
However, I thought it makes reviewing hard due to the large scale of
changeset. So, I'd like to have a code reworking first.----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* Probably, a wikipage may help people to understand it well.
* Postgres_fdw needs reworking to share common code for both of
FDW and remote join portions.Thanks,
2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
2013/10/3 Robert Haas <robertmhaas@gmail.com>:
Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.Thanks, it makes me clear what we should target on v9.4 development.
Towards the next commitfest, I'm planning to develop the following
features:
* CustomScan node that can run custom code instead of built-in
scan nodes.
* Join-pushdown of postgres_fdw using the hook to be located on
the add_paths_to_joinrel(), for demonstration purpose.
* Something new way to scan a relation; probably, your suggested
ctid scan with less or bigger qualifier is a good example, also for
demonstration purpose.Probably, above set of jobs will be the first chunk of this feature.
Then, let's do other stuff like Append, Sort, Aggregate and so on
later. It seems to me a reasonable strategy.--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
The attached patches are the revised custom-scan APIs.
- Custom-scan.sgml was added to introduce the way to write custom-scan
provider in the official documentation.
- Much code duplication in postgres_fdw.c was eliminated. I split some fdw-
handlers into two parts; common portion and fdw specific one.
Executor callbacks of custom-scan code utilizes the common portion above
because most of its implementations are equivalent.
I'd like to see comments regarding to the way to handle Var reference onto
a custom-scan that replaced relations join.
A varno of Var that references a join relation is rtindex of either
right or left
relation, then setrefs.c adjust it well; INNER_VAR or OUTER_VAR shall be
set instead.
However, it does not work well if a custom-scan that just references result
of remote join query was chosen instead of local join, because its result
shall be usually set in the ps_ResultTupleSlot of PlanState, thus
ExecEvalScalarVar does not reference neither inner nor outer slot.
Instead of existing solution, I added one more special varno; CUSTOM_VARNO
that just references result-tuple-slot of the target relation.
If CUSTOM_VARNO is given, EXPLAIN(verbose) generates column name from
the TupleDesc of underlying ps_ResultTupleSlot.
I'm not 100% certain whether it is the best approach for us, but it works well.
Also, I'm uncertain for usage of param_info in Path structure, even though
I followed the manner in other portion. So, please point out if my usage
was not applicable well.
Thanks,
2013/11/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:
Hi,
I tried to write up a wikipage to introduce how custom-scan works.
https://wiki.postgresql.org/wiki/CustomScanAPI
Any comments please.
2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:
The attached patches provide a feature to implement custom scan node
that allows extension to replace a part of plan tree with its own code
instead of the built-in logic.
In addition to the previous proposition, it enables us to integrate custom
scan as a part of candidate paths to be chosen by optimizer.
Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
a set of API stuff and a simple demonstration module that implement
regular table scan using inequality operator on ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote join capability.Below is an example to show how does custom-scan work.
We usually run sequential scan even if clause has inequality operator
that references ctid system column.postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)An extension that performs as custom-scan provider suggests
an alternative path, and its cost was less than sequential scan,
thus optimizer choose it.postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)Of course, more cost effective plan will win if exists.
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)One other worthwhile example is remote-join enhancement on the
postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
managed by same foreign server.postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y
like '%aaa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)---------------------------
How does it works
---------------------------
This patch adds two hooks (for base and join relations) around allpaths.c
and joinpaths.c. It allows extensions to add alternative paths to handle
scanning on the base relation or join of two relations.Its callback routine can add CustomPath using add_path() to inform
optimizer this alternative scan path. Every custom-scan provider is
identified by its name being registered preliminary using the following
function.void register_custom_provider(const CustomProvider *provider);
CustomProvider is a set of name string and function pointers of callbacks.
Once CustomPath got chosen, create_scan_plan() construct a custom-
scan plan and calls back extension to initialize the node.
Rest of portions are similar to foreign scan, however, some of detailed
portions are different. For example, foreign scan is assumed to return
a tuple being formed according to table definition. On the other hand,
custom-scan does not have such assumption, so extension needs to
set tuple-descriptor on the scan tuple slot of ScanState structure by
itself.In case of join, custom-scan performs as like a regular scan but it
returns tuples being already joined on underlying relations.
The patched postgres_fdw utilizes a hook at joinpaths.c to run
remote join.------------
Issues
------------
I'm not 100% certain whether arguments of add_join_path_hook is
reasonable. I guess the first 7 arguments are minimum necessity.
The mergeclause_list and semifactors might be useful if someone
tries to implement its own mergejoin or semijoin. Also, I'm not
good at usage of path parameterization, but the last two arguments
are related to. Where is the best code to learn about its usage?+/* Hook for plugins to add custom join path, in addition to default ones */ +typedef void (*add_join_path_hook_type)(PlannerInfo *root, + RelOptInfo *joinrel, + RelOptInfo *outerrel, + RelOptInfo *innerrel, + JoinType jointype, + SpecialJoinInfo *sjinfo, + List *restrictlist, + List *mergeclause_list, + SemiAntiJoinFactors *semifactors, + Relids param_source_rels, + Relids extra_lateral_rels); +extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;When we replace a join by a custom scan, where is the best target
for Var node that referenced relations under the join?
Usually, Var->varno is given as rtindex of tables being joined, then,
it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
It eventually determines the slot to be fetched on ExecEvalScalarVar().
On the other hand, we want Var-node to reference scan-tuple-slot
neither outer-slot nor inner-slot, if we replaced a join.
I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
Probably, it is a straightforward way to run remote join as like a scan,
but I'm not certain whether it is the best way.I was concerned about FDW callback of postgres_fdw is designed to
take ForeignState argument. Because of this, remote join code did
not available to call these routines, even though most of custom-join
portions are similar.
So, I'd like to rework postgres_fdw first to put a common routine that
can be called from FDW portion and remote join portions.
However, I thought it makes reviewing hard due to the large scale of
changeset. So, I'd like to have a code reworking first.----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* Probably, a wikipage may help people to understand it well.
* Postgres_fdw needs reworking to share common code for both of
FDW and remote join portions.Thanks,
2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
2013/10/3 Robert Haas <robertmhaas@gmail.com>:
Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.Thanks, it makes me clear what we should target on v9.4 development.
Towards the next commitfest, I'm planning to develop the following
features:
* CustomScan node that can run custom code instead of built-in
scan nodes.
* Join-pushdown of postgres_fdw using the hook to be located on
the add_paths_to_joinrel(), for demonstration purpose.
* Something new way to scan a relation; probably, your suggested
ctid scan with less or bigger qualifier is a good example, also for
demonstration purpose.Probably, above set of jobs will be the first chunk of this feature.
Then, let's do other stuff like Append, Sort, Aggregate and so on
later. It seems to me a reasonable strategy.--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Mon, Nov 18, 2013 at 7:25 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
The attached patches are the revised custom-scan APIs.
My initial review on this feature:
- The patches apply and build, but it produces a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]
I'd recommend that you split the part1 patch containing the ctidscan
contrib into its own patch. It is more than half of the patch and its
certainly stands on its own. IMO, I think ctidscan fits a very specific use
case and would be better off being an extension instead of in contrib.
Show quoted text
- Custom-scan.sgml was added to introduce the way to write custom-scan
provider in the official documentation.
- Much code duplication in postgres_fdw.c was eliminated. I split some fdw-
handlers into two parts; common portion and fdw specific one.
Executor callbacks of custom-scan code utilizes the common portion above
because most of its implementations are equivalent.I'd like to see comments regarding to the way to handle Var reference onto
a custom-scan that replaced relations join.
A varno of Var that references a join relation is rtindex of either
right or left
relation, then setrefs.c adjust it well; INNER_VAR or OUTER_VAR shall be
set instead.
However, it does not work well if a custom-scan that just references result
of remote join query was chosen instead of local join, because its result
shall be usually set in the ps_ResultTupleSlot of PlanState, thus
ExecEvalScalarVar does not reference neither inner nor outer slot.
Instead of existing solution, I added one more special varno; CUSTOM_VARNO
that just references result-tuple-slot of the target relation.
If CUSTOM_VARNO is given, EXPLAIN(verbose) generates column name from
the TupleDesc of underlying ps_ResultTupleSlot.
I'm not 100% certain whether it is the best approach for us, but it works
well.Also, I'm uncertain for usage of param_info in Path structure, even though
I followed the manner in other portion. So, please point out if my usage
was not applicable well.Thanks,
2013/11/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:
Hi,
I tried to write up a wikipage to introduce how custom-scan works.
https://wiki.postgresql.org/wiki/CustomScanAPI
Any comments please.
2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:
The attached patches provide a feature to implement custom scan node
that allows extension to replace a part of plan tree with its own code
instead of the built-in logic.
In addition to the previous proposition, it enables us to integratecustom
scan as a part of candidate paths to be chosen by optimizer.
Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
a set of API stuff and a simple demonstration module that implement
regular table scan using inequality operator on ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote join capability.Below is an example to show how does custom-scan work.
We usually run sequential scan even if clause has inequality operator
that references ctid system column.postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)An extension that performs as custom-scan provider suggests
an alternative path, and its cost was less than sequential scan,
thus optimizer choose it.postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)Of course, more cost effective plan will win if exists.
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND
a = 200;
QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)One other worthwhile example is remote-join enhancement on the
postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
managed by same foreign server.postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y
like '%aaa%';
QUERYPLAN
------------------------------------------------------------------------------------------------------
Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)---------------------------
How does it works
---------------------------
This patch adds two hooks (for base and join relations) aroundallpaths.c
and joinpaths.c. It allows extensions to add alternative paths to handle
scanning on the base relation or join of two relations.Its callback routine can add CustomPath using add_path() to inform
optimizer this alternative scan path. Every custom-scan provider is
identified by its name being registered preliminary using the following
function.void register_custom_provider(const CustomProvider *provider);
CustomProvider is a set of name string and function pointers of
callbacks.
Once CustomPath got chosen, create_scan_plan() construct a custom-
scan plan and calls back extension to initialize the node.
Rest of portions are similar to foreign scan, however, some of detailed
portions are different. For example, foreign scan is assumed to return
a tuple being formed according to table definition. On the other hand,
custom-scan does not have such assumption, so extension needs to
set tuple-descriptor on the scan tuple slot of ScanState structure by
itself.In case of join, custom-scan performs as like a regular scan but it
returns tuples being already joined on underlying relations.
The patched postgres_fdw utilizes a hook at joinpaths.c to run
remote join.------------
Issues
------------
I'm not 100% certain whether arguments of add_join_path_hook is
reasonable. I guess the first 7 arguments are minimum necessity.
The mergeclause_list and semifactors might be useful if someone
tries to implement its own mergejoin or semijoin. Also, I'm not
good at usage of path parameterization, but the last two arguments
are related to. Where is the best code to learn about its usage?+/* Hook for plugins to add custom join path, in addition to default
ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root, + RelOptInfo *joinrel, + RelOptInfo *outerrel, + RelOptInfo *innerrel, + JoinType jointype, + SpecialJoinInfo *sjinfo, + List *restrictlist, + List *mergeclause_list, + SemiAntiJoinFactors*semifactors,
+ Relids param_source_rels, + Relids extra_lateral_rels); +extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;When we replace a join by a custom scan, where is the best target
for Var node that referenced relations under the join?
Usually, Var->varno is given as rtindex of tables being joined, then,
it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
It eventually determines the slot to be fetched on ExecEvalScalarVar().
On the other hand, we want Var-node to reference scan-tuple-slot
neither outer-slot nor inner-slot, if we replaced a join.
I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
Probably, it is a straightforward way to run remote join as like a scan,
but I'm not certain whether it is the best way.I was concerned about FDW callback of postgres_fdw is designed to
take ForeignState argument. Because of this, remote join code did
not available to call these routines, even though most of custom-join
portions are similar.
So, I'd like to rework postgres_fdw first to put a common routine that
can be called from FDW portion and remote join portions.
However, I thought it makes reviewing hard due to the large scale of
changeset. So, I'd like to have a code reworking first.----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* Probably, a wikipage may help people to understand it well.
* Postgres_fdw needs reworking to share common code for both of
FDW and remote join portions.Thanks,
2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
2013/10/3 Robert Haas <robertmhaas@gmail.com>:
Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.Thanks, it makes me clear what we should target on v9.4 development.
Towards the next commitfest, I'm planning to develop the following
features:
* CustomScan node that can run custom code instead of built-in
scan nodes.
* Join-pushdown of postgres_fdw using the hook to be located on
the add_paths_to_joinrel(), for demonstration purpose.
* Something new way to scan a relation; probably, your suggested
ctid scan with less or bigger qualifier is a good example, also for
demonstration purpose.Probably, above set of jobs will be the first chunk of this feature.
Then, let's do other stuff like Append, Sort, Aggregate and so on
later. It seems to me a reasonable strategy.--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for your review.
2013/11/19 Jim Mlodgenski <jimmy76@gmail.com>:
My initial review on this feature:
- The patches apply and build, but it produces a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]
This variable was only used in Assert() macro, so it causes a warning if you
don't put --enable-cassert on the configure script.
Anyway, I adjusted the code to check relid of RelOptInfo directly.
I'd recommend that you split the part1 patch containing the ctidscan contrib
into its own patch. It is more than half of the patch and its certainly
stands on its own. IMO, I think ctidscan fits a very specific use case and
would be better off being an extension instead of in contrib.
OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
ctidscan portion, and the part-3 is remote join on postgres_fdw.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.4-custom-scan-part3.v4.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-part3.v4.patchDownload+1303-169
pgsql-v9.4-custom-scan-part2.v4.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-part2.v4.patchDownload+1247-9
pgsql-v9.4-custom-scan-part1.v4.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-part1.v4.patchDownload+1182-15
KaiGai
On Tue, Nov 19, 2013 at 9:41 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
Thanks for your review.
2013/11/19 Jim Mlodgenski <jimmy76@gmail.com>:
My initial review on this feature:
- The patches apply and build, but it produces a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’[-Wunused-variable]
This variable was only used in Assert() macro, so it causes a warning if
you
don't put --enable-cassert on the configure script.
Anyway, I adjusted the code to check relid of RelOptInfo directly.
The warning is now gone.
I'd recommend that you split the part1 patch containing the ctidscan
contrib
into its own patch. It is more than half of the patch and its certainly
stands on its own. IMO, I think ctidscan fits a very specific use caseand
would be better off being an extension instead of in contrib.
OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
ctidscan portion, and the part-3 is remote join on postgres_fdw.
Attached is a patch for the documentation. I think the documentation still
needs a little more work, but it is pretty close. I can add some more
detail to it once finish adapting the hadoop_fdw to using the custom scan
api and have a better understanding of all of the calls.
Show quoted text
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
custom-scan.sgml.patchtext/x-patch; charset=US-ASCII; name=custom-scan.sgml.patchDownload+167-167
contrib/ctidscan/ctidscan.c:44: indent with spaces.
contrib/ctidscan/ctidscan.c:250: indent with spaces.
contrib/ctidscan/ctidscan.c:266: trailing whitespace.
contrib/postgres_fdw/deparse.c:1044: indent with spaces.
contrib/postgres_fdw/postgres_fdw.c:940: indent with spaces.
src/backend/commands/explain.c:2097: indent with spaces.
src/backend/optimizer/plan/createplan.c:2097: trailing whitespace.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/11/19 Kohei KaiGai <kaigai@kaigai.gr.jp>:
OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
ctidscan portion, and the part-3 is remote join on postgres_fdw.
These three patches can be applied with no conflict onto 2013-11-27
HEAD, but some fixes are necessary to build because commit
784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21)
allows FunctionScan node to have multiple function expression, so Node
* funcexpr in CustomScan should be List *funcitons now.
I'll continue to review by applying patches onto 2013-11-19 HEAD.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the series of checks.
The attached ones are the revised patches.
I merged all the propositions from Jim. Thanks, it made the documentation
quality better. Also, I fixed up cosmetic stuff around whitespace <-> tab.
An actual code changes are to follow the changes in FunctionScan when
CustomScan replaces a FunctionScan. It puts a List * object instead of
a single expression tree, to have multiple functions.
Nothing were changed from the previous version.
Best regards,
2013/11/27 Shigeru Hanada <shigeru.hanada@gmail.com>:
2013/11/19 Kohei KaiGai <kaigai@kaigai.gr.jp>:
OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
ctidscan portion, and the part-3 is remote join on postgres_fdw.These three patches can be applied with no conflict onto 2013-11-27
HEAD, but some fixes are necessary to build because commit
784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21)
allows FunctionScan node to have multiple function expression, so Node
* funcexpr in CustomScan should be List *funcitons now.I'll continue to review by applying patches onto 2013-11-19 HEAD.
--
Shigeru HANADA
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.4-custom-scan.part-3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-3.patchDownload+1303-169
pgsql-v9.4-custom-scan.part-2.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-2.patchDownload+1247-9
pgsql-v9.4-custom-scan.part-1.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-1.patchDownload+1198-15
Hi KaiGai-san,
2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:
The attached ones are the revised patches.
I merged all the propositions from Jim. Thanks, it made the documentation
quality better. Also, I fixed up cosmetic stuff around whitespace <-> tab.An actual code changes are to follow the changes in FunctionScan when
CustomScan replaces a FunctionScan. It puts a List * object instead of
a single expression tree, to have multiple functions.Nothing were changed from the previous version.
I first reviewed postgres_fdw portion of the patches to learn the
outline of Custom Plan. Wiki page is also a good textbook of the
feature. I have some random comments about the basic design of Custom
Plan:
(1) IIUC add_join_path and add_scan_path are added to allow extensions
to plug their code into planner.
(2) FDW framework has executor callbacks based on existing executor
nodes. Is there any plan to integrate them into one way, or wrap on
by another? I'm not sure that we should have two similar framework
side by side.
# I'm sorry if I've missed the past discussion about this issue.
(3) Internal routines such as is_self_managed_relation and
has_wholerow_reference seem to be useful for other FDWs. Is it able
to move them into core?
(4) postgres_fdw estimates costs of join by calculating local numbers.
How about to support remote estimation by throwing EXPLALAIN query
when use_remote_estimates = true.
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:
I merged all the propositions from Jim. Thanks, it made the documentation
quality better. Also, I fixed up cosmetic stuff around whitespace <-> tab.
I found some typos in documents and comments. Please see attached
patch for detail.
--
Shigeru HANADA
Attachments:
fix_typo.patchapplication/octet-stream; name=fix_typo.patchDownload+18-16
Hanada-san,
Thanks for your reviewing,
2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:
I first reviewed postgres_fdw portion of the patches to learn the
outline of Custom Plan. Wiki page is also a good textbook of the
feature. I have some random comments about the basic design of Custom
Plan:(1) IIUC add_join_path and add_scan_path are added to allow extensions
to plug their code into planner.
Almost yes. For more correctness, these hooks allows extensions to
plug paths they can provide into a particular join or scan. Then planner
will choose the cheapest one according to the cost value.
(2) FDW framework has executor callbacks based on existing executor
nodes. Is there any plan to integrate them into one way, or wrap on
by another? I'm not sure that we should have two similar framework
side by side.
# I'm sorry if I've missed the past discussion about this issue.
Probably, FDW has different role from the CustomScan API.
As literal, FDW performs as a bridge between a relation form and
an opaque external data source, to intermediate two different world
on behalf of a foreign table.
On the other hand, CustomScan allows to provide alternative logic
to scan or join particular relations, in addition to the built-in ones,
but does not perform on behalf of foreign tables.
Existing FDW is designed to implement a scan on an intangible
relation, thus it can assume some things; like a tuple returned
from FDW has equivalent TupleDesc as table definition, or it can
always use ExecScan() for all the cases.
So, I don't think these two frameworks should be consolidated
because it makes confusion on the existing extensions that
assumes FDW callbacks always has a particular foreign table
definition.
(3) Internal routines such as is_self_managed_relation and
has_wholerow_reference seem to be useful for other FDWs. Is it able
to move them into core?
Probably, src/backend/foreign/foreign.c is a good host for them.
(4) postgres_fdw estimates costs of join by calculating local numbers.
How about to support remote estimation by throwing EXPLALAIN query
when use_remote_estimates = true.
I'm uncertain whether the cost value from remote EXPLAIN represents
right difficulty on the local side, because it indeed represents the
difficulty to join two relations on the remote side, however, does not
represents local job; that just fetches tuples from the result set of
remote query with table joining.
How about your opinion? Is the remote cost estimation value comparable
with local value?
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for fixing many my carelessness.
I didn't know "seek" was an irregular verb...
Best regards,
2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:
2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:
I merged all the propositions from Jim. Thanks, it made the documentation
quality better. Also, I fixed up cosmetic stuff around whitespace <-> tab.I found some typos in documents and comments. Please see attached
patch for detail.--
Shigeru HANADA
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
The attached patches include documentation fixup by Hanada-san,
and relocation of is_managed_relation (the portion to check whether
the relation is a foreign table managed by a particular FDW) and
has_wholerow_reference.
I didn't touch the EXPLAIN logic because I'm uncertain whether the
cost of remote join is reasonable towards the cost as an alternative
path to local joins.
Please check it. Thanks,
2013/12/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
Hanada-san,
Thanks for your reviewing,
2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:
I first reviewed postgres_fdw portion of the patches to learn the
outline of Custom Plan. Wiki page is also a good textbook of the
feature. I have some random comments about the basic design of Custom
Plan:(1) IIUC add_join_path and add_scan_path are added to allow extensions
to plug their code into planner.Almost yes. For more correctness, these hooks allows extensions to
plug paths they can provide into a particular join or scan. Then planner
will choose the cheapest one according to the cost value.(2) FDW framework has executor callbacks based on existing executor
nodes. Is there any plan to integrate them into one way, or wrap on
by another? I'm not sure that we should have two similar framework
side by side.
# I'm sorry if I've missed the past discussion about this issue.Probably, FDW has different role from the CustomScan API.
As literal, FDW performs as a bridge between a relation form and
an opaque external data source, to intermediate two different world
on behalf of a foreign table.
On the other hand, CustomScan allows to provide alternative logic
to scan or join particular relations, in addition to the built-in ones,
but does not perform on behalf of foreign tables.Existing FDW is designed to implement a scan on an intangible
relation, thus it can assume some things; like a tuple returned
from FDW has equivalent TupleDesc as table definition, or it can
always use ExecScan() for all the cases.
So, I don't think these two frameworks should be consolidated
because it makes confusion on the existing extensions that
assumes FDW callbacks always has a particular foreign table
definition.(3) Internal routines such as is_self_managed_relation and
has_wholerow_reference seem to be useful for other FDWs. Is it able
to move them into core?Probably, src/backend/foreign/foreign.c is a good host for them.
(4) postgres_fdw estimates costs of join by calculating local numbers.
How about to support remote estimation by throwing EXPLALAIN query
when use_remote_estimates = true.I'm uncertain whether the cost value from remote EXPLAIN represents
right difficulty on the local side, because it indeed represents the
difficulty to join two relations on the remote side, however, does not
represents local job; that just fetches tuples from the result set of
remote query with table joining.
How about your opinion? Is the remote cost estimation value comparable
with local value?Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.4-custom-scan.part-3.v2.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-3.v2.patchDownload+1367-185
pgsql-v9.4-custom-scan.part-2.v2.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-2.v2.patchDownload+1252-14
pgsql-v9.4-custom-scan.part-1.v2.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-1.v2.patchDownload+1198-15
Hi KaiGai-san,
2013/12/8 Kohei KaiGai <kaigai@kaigai.gr.jp>:
The attached patches include documentation fixup by Hanada-san,
and relocation of is_managed_relation (the portion to check whether
the relation is a foreign table managed by a particular FDW) and
has_wholerow_reference.
I didn't touch the EXPLAIN logic because I'm uncertain whether the
cost of remote join is reasonable towards the cost as an alternative
path to local joins.Please check it. Thanks,
The patches could be applied cleanly, but I saw a compiler warning
about get_rel_relkind() in foreign.c, but it's minor issue. Please
just add #include of utils/lsyscache.h there.
I have some more random comments about EXPLAIN.
1) You use "Operation" as the label of Custom Scan nodes in non-text
format, but it seems to me rather "provider name". What is the string
shown there?
2) It would be nice if we can see the information about what the
Custom Scan node replaced in EXPLAIN output (even only in verbose
mode). I know that we can't show plan tree below custom scan nodes,
because CS Provider can't obtain other candidates. But even only
relation names used in the join or the scan would help users to
understand what is going on in Custom Scan.
Regards,
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hanada-san,
Thanks for your reviewing.
2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:
Hi KaiGai-san,
2013/12/8 Kohei KaiGai <kaigai@kaigai.gr.jp>:
The attached patches include documentation fixup by Hanada-san,
and relocation of is_managed_relation (the portion to check whether
the relation is a foreign table managed by a particular FDW) and
has_wholerow_reference.
I didn't touch the EXPLAIN logic because I'm uncertain whether the
cost of remote join is reasonable towards the cost as an alternative
path to local joins.Please check it. Thanks,
The patches could be applied cleanly, but I saw a compiler warning
about get_rel_relkind() in foreign.c, but it's minor issue. Please
just add #include of utils/lsyscache.h there.
Fixed,
I have some more random comments about EXPLAIN.
1) You use "Operation" as the label of Custom Scan nodes in non-text
format, but it seems to me rather "provider name". What is the string
shown there?
I tried free-riding on the existing properties, but it does not make sense
indeed, as you pointed out.
I adjusted the explain.c to show "Custom-Provider" property for Custom-
Scan node, as follows.
postgres=# explain(format xml) select * from t1 where ctid > '(4,0)'::tid;
QUERY PLAN
----------------------------------------------------------
<explain xmlns="http://www.postgresql.org/2009/explain">+
<Query> +
<Plan> +
<Node-Type>Custom Scan</Node-Type> +
<Custom-Provider>ctidscan</Custom-Provider> +
<Relation-Name>t1</Relation-Name> +
<Alias>t1</Alias> +
<Startup-Cost>0.00</Startup-Cost> +
<Total-Cost>12.30</Total-Cost> +
<Plan-Rows>410</Plan-Rows> +
<Plan-Width>36</Plan-Width> +
<Filter>(ctid > '(4,0)'::tid)</Filter> +
</Plan> +
</Query> +
</explain>
(1 row)
2) It would be nice if we can see the information about what the
Custom Scan node replaced in EXPLAIN output (even only in verbose
mode). I know that we can't show plan tree below custom scan nodes,
because CS Provider can't obtain other candidates. But even only
relation names used in the join or the scan would help users to
understand what is going on in Custom Scan.
Even though I agree that it helps users to understand the plan,
it also has a headache to implement because CustomScan node
(and its super class) does not have an information which relations
are underlying. Probably, this functionality needs to show
the underlying relations on ExplainTargetRel() if CustomScan node
represents a scan instead of join. What data source can produce
the list of underlying relations here?
So, if it is not a significant restriction for users, I'd like to work on this
feature later.
The attached patch fixes up a minor warning around get_rel_relkind
and name of the property for custom-provider. Please check it.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.4-custom-scan.part-3.v3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-3.v3.patchDownload+1350-169
pgsql-v9.4-custom-scan.part-2.v3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-2.v3.patchDownload+1252-14
pgsql-v9.4-custom-scan.part-1.v3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-1.v3.patchDownload+1201-15
Hi Kaigai-san,
2013/12/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:
2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:
The patches could be applied cleanly, but I saw a compiler warning
about get_rel_relkind() in foreign.c, but it's minor issue. Please
just add #include of utils/lsyscache.h there.Fixed,
Check.
I have some more random comments about EXPLAIN.
1) You use "Operation" as the label of Custom Scan nodes in non-text
format, but it seems to me rather "provider name". What is the string
shown there?I tried free-riding on the existing properties, but it does not make sense
indeed, as you pointed out.
I adjusted the explain.c to show "Custom-Provider" property for Custom-
Scan node, as follows.
New name seems better, it is what the node express.
2) It would be nice if we can see the information about what the
Custom Scan node replaced in EXPLAIN output (even only in verbose
mode). I know that we can't show plan tree below custom scan nodes,
because CS Provider can't obtain other candidates. But even only
relation names used in the join or the scan would help users to
understand what is going on in Custom Scan.Even though I agree that it helps users to understand the plan,
it also has a headache to implement because CustomScan node
(and its super class) does not have an information which relations
are underlying. Probably, this functionality needs to show
the underlying relations on ExplainTargetRel() if CustomScan node
represents a scan instead of join. What data source can produce
the list of underlying relations here?
So, if it is not a significant restriction for users, I'd like to work on this
feature later.
Agreed. It would be enough that Custom Scan Providers can add
arbitrary information, such as "Remote SQL" of postgres_fdw, to
EXPLAIN result via core API. Some kind of framework which helps
authors of Custom Scan Providers, but it should be considered after
the first cut.
The attached patch fixes up a minor warning around get_rel_relkind
and name of the property for custom-provider. Please check it.
The patch can be applied onto 2013-12-16 HEAD cleanly, and gives no
unexpected error/warinig.
I'm sorry to post separately, but I have some comments on document.
(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.
(2) postgres_fdw
JOIN push--down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.
Codes for core and contrib seem fine, so I'll mark the patches "Ready
for committer" after the document enhancement.
Regards,
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Hanada-san,
(2013/12/16 14:15), Shigeru Hanada wrote:
I'm sorry to post separately, but I have some comments on document.
(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.
It shall be available; nothing different from the two parameters that
we have supported for long time. Sorry, I missed the new feature to
mention about.
(2) postgres_fdw
JOIN push--down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.
I added an explanation about remote join execution on the section.
Probably, it help users understand why Custom Scan node is here
instead of Join node. Thanks for your suggestion.
Best regards,
--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachments:
pgsql-v9.4-custom-scan.part-1.v4.patchtext/plain; charset=Shift_JIS; name=pgsql-v9.4-custom-scan.part-1.v4.patchDownload+1201-15
pgsql-v9.4-custom-scan.part-2.v4.patchtext/plain; charset=Shift_JIS; name=pgsql-v9.4-custom-scan.part-2.v4.patchDownload+1253-14
pgsql-v9.4-custom-scan.part-3.v4.patchtext/plain; charset=Shift_JIS; name=pgsql-v9.4-custom-scan.part-3.v4.patchDownload+1360-169
KaiGai-san,
2013/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:
(2013/12/16 14:15), Shigeru Hanada wrote:
(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.It shall be available; nothing different from the two parameters that
we have supported for long time. Sorry, I missed the new feature to
mention about.
Check.
(2) postgres_fdw
JOIN push--down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.I added an explanation about remote join execution on the section.
Probably, it help users understand why Custom Scan node is here
instead of Join node. Thanks for your suggestion.
Check.
I think that these patches are enough considered to mark as "Ready for
Committer".
Regards,
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
The attached patches are the ones rebased to the latest git tree, but
no functional
changes from the previous revision on the commit-fest:Nov.
Hanada-san volunteered to review the series of patches, including the
portion for
postgres_fdw, then marked it as "ready for committer" on the last commit fest.
So, I hope someone of committer also volunteer to review the patches for final
checking.
* Part-1 - CustomScan APIs
This patch provides a set of interfaces to interact query-optimizer
and -executor
for extensions. The new add_scan_path_hook or add_join_path_hook allows to
offer alternative ways to scan a particular relation or to join a
particular relations.
Then, once the alternative ways are chosen by the optimizer,
associated callbacks
shall be kicked from the executor. In this case, extension has responsibility to
return a slot that hold a tuple (or empty for end of scan) being
scanned from the
underlying relation.
* Part-2 - contrib/ctidscan
This patch provides a simple example implementation of CustomScan API.
It enables to skip pages when inequality operators are given on ctid system
columns. That is, at least, better than sequential full-scan, so it usually wins
to SeqScan, but Index-scan is much better.
* Part-3 - remote join implementation
This patch provides an example to replace a join by a custom scan node that
runs on a result set of remote join query, on top of existing postgres_fdw
extension. The idea is, a result set of remote query looks like a relation but
intangible, thus, it is feasible to replace a local join by a scan on the result
set of a query executed on the remote host, if both of the relation to be joined
belongs to the identical foreign server.
This patch gives postgres_fdw a capability to run a join on the remote host.
Thanks,
2013/12/16 Shigeru Hanada <shigeru.hanada@gmail.com>:
KaiGai-san,
2013/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:
(2013/12/16 14:15), Shigeru Hanada wrote:
(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.It shall be available; nothing different from the two parameters that
we have supported for long time. Sorry, I missed the new feature to
mention about.Check.
(2) postgres_fdw
JOIN push--down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.I added an explanation about remote join execution on the section.
Probably, it help users understand why Custom Scan node is here
instead of Join node. Thanks for your suggestion.Check.
I think that these patches are enough considered to mark as "Ready for
Committer".Regards,
--
Shigeru HANADA
--
KaiGai Kohei <kaigai@kaigai.gr.jp>