Custom Scan APIs (Re: Custom Plan node)

Started by KaiGai Koheiover 12 years ago110 messageshackers
Jump to latest
#1KaiGai Kohei
kaigai@ak.jp.nec.com

The attached patches provide a feature to implement custom scan node
that allows extension to replace a part of plan tree with its own code
instead of the built-in logic.
In addition to the previous proposition, it enables us to integrate custom
scan as a part of candidate paths to be chosen by optimizer.
Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
a set of API stuff and a simple demonstration module that implement
regular table scan using inequality operator on ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote join capability.

Below is an example to show how does custom-scan work.

We usually run sequential scan even if clause has inequality operator
that references ctid system column.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

An extension that performs as custom-scan provider suggests
an alternative path, and its cost was less than sequential scan,
thus optimizer choose it.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

Of course, more cost effective plan will win if exists.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)

One other worthwhile example is remote-join enhancement on the
postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
managed by same foreign server.

postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y
like '%aaa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)

---------------------------
How does it works
---------------------------
This patch adds two hooks (for base and join relations) around allpaths.c
and joinpaths.c. It allows extensions to add alternative paths to handle
scanning on the base relation or join of two relations.

Its callback routine can add CustomPath using add_path() to inform
optimizer this alternative scan path. Every custom-scan provider is
identified by its name being registered preliminary using the following
function.

void register_custom_provider(const CustomProvider *provider);

CustomProvider is a set of name string and function pointers of callbacks.

Once CustomPath got chosen, create_scan_plan() construct a custom-
scan plan and calls back extension to initialize the node.
Rest of portions are similar to foreign scan, however, some of detailed
portions are different. For example, foreign scan is assumed to return
a tuple being formed according to table definition. On the other hand,
custom-scan does not have such assumption, so extension needs to
set tuple-descriptor on the scan tuple slot of ScanState structure by
itself.

In case of join, custom-scan performs as like a regular scan but it
returns tuples being already joined on underlying relations.
The patched postgres_fdw utilizes a hook at joinpaths.c to run
remote join.

------------
Issues
------------
I'm not 100% certain whether arguments of add_join_path_hook is
reasonable. I guess the first 7 arguments are minimum necessity.
The mergeclause_list and semifactors might be useful if someone
tries to implement its own mergejoin or semijoin. Also, I'm not
good at usage of path parameterization, but the last two arguments
are related to. Where is the best code to learn about its usage?

+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                       RelOptInfo *joinrel,
+                                       RelOptInfo *outerrel,
+                                       RelOptInfo *innerrel,
+                                       JoinType jointype,
+                                       SpecialJoinInfo *sjinfo,
+                                       List *restrictlist,
+                                       List *mergeclause_list,
+                                       SemiAntiJoinFactors *semifactors,
+                                       Relids param_source_rels,
+                                       Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

When we replace a join by a custom scan, where is the best target
for Var node that referenced relations under the join?
Usually, Var->varno is given as rtindex of tables being joined, then,
it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
It eventually determines the slot to be fetched on ExecEvalScalarVar().
On the other hand, we want Var-node to reference scan-tuple-slot
neither outer-slot nor inner-slot, if we replaced a join.
I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
Probably, it is a straightforward way to run remote join as like a scan,
but I'm not certain whether it is the best way.

I was concerned about FDW callback of postgres_fdw is designed to
take ForeignState argument. Because of this, remote join code did
not available to call these routines, even though most of custom-join
portions are similar.
So, I'd like to rework postgres_fdw first to put a common routine that
can be called from FDW portion and remote join portions.
However, I thought it makes reviewing hard due to the large scale of
changeset. So, I'd like to have a code reworking first.

----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* Probably, a wikipage may help people to understand it well.
* Postgres_fdw needs reworking to share common code for both of
FDW and remote join portions.

Thanks,

2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/10/3 Robert Haas <robertmhaas@gmail.com>:

Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.

Thanks, it makes me clear what we should target on v9.4 development.
Towards the next commitfest, I'm planning to develop the following
features:
* CustomScan node that can run custom code instead of built-in
scan nodes.
* Join-pushdown of postgres_fdw using the hook to be located on
the add_paths_to_joinrel(), for demonstration purpose.
* Something new way to scan a relation; probably, your suggested
ctid scan with less or bigger qualifier is a good example, also for
demonstration purpose.

Probably, above set of jobs will be the first chunk of this feature.
Then, let's do other stuff like Append, Sort, Aggregate and so on
later. It seems to me a reasonable strategy.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan-remote-join.v1.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-remote-join.v1.patchDownload+1264-104
pgsql-v9.4-custom-scan-apis.v1.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-apis.v1.patchDownload+2194-24
#2KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#1)
Re: Custom Scan APIs (Re: Custom Plan node)

Hi,

I tried to write up a wikipage to introduce how custom-scan works.

https://wiki.postgresql.org/wiki/CustomScanAPI

Any comments please.

2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached patches provide a feature to implement custom scan node
that allows extension to replace a part of plan tree with its own code
instead of the built-in logic.
In addition to the previous proposition, it enables us to integrate custom
scan as a part of candidate paths to be chosen by optimizer.
Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
a set of API stuff and a simple demonstration module that implement
regular table scan using inequality operator on ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote join capability.

Below is an example to show how does custom-scan work.

We usually run sequential scan even if clause has inequality operator
that references ctid system column.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

An extension that performs as custom-scan provider suggests
an alternative path, and its cost was less than sequential scan,
thus optimizer choose it.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

Of course, more cost effective plan will win if exists.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)

One other worthwhile example is remote-join enhancement on the
postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
managed by same foreign server.

postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y
like '%aaa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)

---------------------------
How does it works
---------------------------
This patch adds two hooks (for base and join relations) around allpaths.c
and joinpaths.c. It allows extensions to add alternative paths to handle
scanning on the base relation or join of two relations.

Its callback routine can add CustomPath using add_path() to inform
optimizer this alternative scan path. Every custom-scan provider is
identified by its name being registered preliminary using the following
function.

void register_custom_provider(const CustomProvider *provider);

CustomProvider is a set of name string and function pointers of callbacks.

Once CustomPath got chosen, create_scan_plan() construct a custom-
scan plan and calls back extension to initialize the node.
Rest of portions are similar to foreign scan, however, some of detailed
portions are different. For example, foreign scan is assumed to return
a tuple being formed according to table definition. On the other hand,
custom-scan does not have such assumption, so extension needs to
set tuple-descriptor on the scan tuple slot of ScanState structure by
itself.

In case of join, custom-scan performs as like a regular scan but it
returns tuples being already joined on underlying relations.
The patched postgres_fdw utilizes a hook at joinpaths.c to run
remote join.

------------
Issues
------------
I'm not 100% certain whether arguments of add_join_path_hook is
reasonable. I guess the first 7 arguments are minimum necessity.
The mergeclause_list and semifactors might be useful if someone
tries to implement its own mergejoin or semijoin. Also, I'm not
good at usage of path parameterization, but the last two arguments
are related to. Where is the best code to learn about its usage?

+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                       RelOptInfo *joinrel,
+                                       RelOptInfo *outerrel,
+                                       RelOptInfo *innerrel,
+                                       JoinType jointype,
+                                       SpecialJoinInfo *sjinfo,
+                                       List *restrictlist,
+                                       List *mergeclause_list,
+                                       SemiAntiJoinFactors *semifactors,
+                                       Relids param_source_rels,
+                                       Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

When we replace a join by a custom scan, where is the best target
for Var node that referenced relations under the join?
Usually, Var->varno is given as rtindex of tables being joined, then,
it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
It eventually determines the slot to be fetched on ExecEvalScalarVar().
On the other hand, we want Var-node to reference scan-tuple-slot
neither outer-slot nor inner-slot, if we replaced a join.
I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
Probably, it is a straightforward way to run remote join as like a scan,
but I'm not certain whether it is the best way.

I was concerned about FDW callback of postgres_fdw is designed to
take ForeignState argument. Because of this, remote join code did
not available to call these routines, even though most of custom-join
portions are similar.
So, I'd like to rework postgres_fdw first to put a common routine that
can be called from FDW portion and remote join portions.
However, I thought it makes reviewing hard due to the large scale of
changeset. So, I'd like to have a code reworking first.

----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* Probably, a wikipage may help people to understand it well.
* Postgres_fdw needs reworking to share common code for both of
FDW and remote join portions.

Thanks,

2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/10/3 Robert Haas <robertmhaas@gmail.com>:

Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.

Thanks, it makes me clear what we should target on v9.4 development.
Towards the next commitfest, I'm planning to develop the following
features:
* CustomScan node that can run custom code instead of built-in
scan nodes.
* Join-pushdown of postgres_fdw using the hook to be located on
the add_paths_to_joinrel(), for demonstration purpose.
* Something new way to scan a relation; probably, your suggested
ctid scan with less or bigger qualifier is a good example, also for
demonstration purpose.

Probably, above set of jobs will be the first chunk of this feature.
Then, let's do other stuff like Append, Sort, Aggregate and so on
later. It seems to me a reasonable strategy.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#2)
Re: Custom Scan APIs (Re: Custom Plan node)

The attached patches are the revised custom-scan APIs.
- Custom-scan.sgml was added to introduce the way to write custom-scan
provider in the official documentation.
- Much code duplication in postgres_fdw.c was eliminated. I split some fdw-
handlers into two parts; common portion and fdw specific one.
Executor callbacks of custom-scan code utilizes the common portion above
because most of its implementations are equivalent.

I'd like to see comments regarding to the way to handle Var reference onto
a custom-scan that replaced relations join.
A varno of Var that references a join relation is rtindex of either
right or left
relation, then setrefs.c adjust it well; INNER_VAR or OUTER_VAR shall be
set instead.
However, it does not work well if a custom-scan that just references result
of remote join query was chosen instead of local join, because its result
shall be usually set in the ps_ResultTupleSlot of PlanState, thus
ExecEvalScalarVar does not reference neither inner nor outer slot.
Instead of existing solution, I added one more special varno; CUSTOM_VARNO
that just references result-tuple-slot of the target relation.
If CUSTOM_VARNO is given, EXPLAIN(verbose) generates column name from
the TupleDesc of underlying ps_ResultTupleSlot.
I'm not 100% certain whether it is the best approach for us, but it works well.

Also, I'm uncertain for usage of param_info in Path structure, even though
I followed the manner in other portion. So, please point out if my usage
was not applicable well.

Thanks,

2013/11/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:

Hi,

I tried to write up a wikipage to introduce how custom-scan works.

https://wiki.postgresql.org/wiki/CustomScanAPI

Any comments please.

2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached patches provide a feature to implement custom scan node
that allows extension to replace a part of plan tree with its own code
instead of the built-in logic.
In addition to the previous proposition, it enables us to integrate custom
scan as a part of candidate paths to be chosen by optimizer.
Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
a set of API stuff and a simple demonstration module that implement
regular table scan using inequality operator on ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote join capability.

Below is an example to show how does custom-scan work.

We usually run sequential scan even if clause has inequality operator
that references ctid system column.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

An extension that performs as custom-scan provider suggests
an alternative path, and its cost was less than sequential scan,
thus optimizer choose it.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

Of course, more cost effective plan will win if exists.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)

One other worthwhile example is remote-join enhancement on the
postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
managed by same foreign server.

postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y
like '%aaa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)

---------------------------
How does it works
---------------------------
This patch adds two hooks (for base and join relations) around allpaths.c
and joinpaths.c. It allows extensions to add alternative paths to handle
scanning on the base relation or join of two relations.

Its callback routine can add CustomPath using add_path() to inform
optimizer this alternative scan path. Every custom-scan provider is
identified by its name being registered preliminary using the following
function.

void register_custom_provider(const CustomProvider *provider);

CustomProvider is a set of name string and function pointers of callbacks.

Once CustomPath got chosen, create_scan_plan() construct a custom-
scan plan and calls back extension to initialize the node.
Rest of portions are similar to foreign scan, however, some of detailed
portions are different. For example, foreign scan is assumed to return
a tuple being formed according to table definition. On the other hand,
custom-scan does not have such assumption, so extension needs to
set tuple-descriptor on the scan tuple slot of ScanState structure by
itself.

In case of join, custom-scan performs as like a regular scan but it
returns tuples being already joined on underlying relations.
The patched postgres_fdw utilizes a hook at joinpaths.c to run
remote join.

------------
Issues
------------
I'm not 100% certain whether arguments of add_join_path_hook is
reasonable. I guess the first 7 arguments are minimum necessity.
The mergeclause_list and semifactors might be useful if someone
tries to implement its own mergejoin or semijoin. Also, I'm not
good at usage of path parameterization, but the last two arguments
are related to. Where is the best code to learn about its usage?

+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                       RelOptInfo *joinrel,
+                                       RelOptInfo *outerrel,
+                                       RelOptInfo *innerrel,
+                                       JoinType jointype,
+                                       SpecialJoinInfo *sjinfo,
+                                       List *restrictlist,
+                                       List *mergeclause_list,
+                                       SemiAntiJoinFactors *semifactors,
+                                       Relids param_source_rels,
+                                       Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

When we replace a join by a custom scan, where is the best target
for Var node that referenced relations under the join?
Usually, Var->varno is given as rtindex of tables being joined, then,
it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
It eventually determines the slot to be fetched on ExecEvalScalarVar().
On the other hand, we want Var-node to reference scan-tuple-slot
neither outer-slot nor inner-slot, if we replaced a join.
I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
Probably, it is a straightforward way to run remote join as like a scan,
but I'm not certain whether it is the best way.

I was concerned about FDW callback of postgres_fdw is designed to
take ForeignState argument. Because of this, remote join code did
not available to call these routines, even though most of custom-join
portions are similar.
So, I'd like to rework postgres_fdw first to put a common routine that
can be called from FDW portion and remote join portions.
However, I thought it makes reviewing hard due to the large scale of
changeset. So, I'd like to have a code reworking first.

----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* Probably, a wikipage may help people to understand it well.
* Postgres_fdw needs reworking to share common code for both of
FDW and remote join portions.

Thanks,

2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/10/3 Robert Haas <robertmhaas@gmail.com>:

Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.

Thanks, it makes me clear what we should target on v9.4 development.
Towards the next commitfest, I'm planning to develop the following
features:
* CustomScan node that can run custom code instead of built-in
scan nodes.
* Join-pushdown of postgres_fdw using the hook to be located on
the add_paths_to_joinrel(), for demonstration purpose.
* Something new way to scan a relation; probably, your suggested
ctid scan with less or bigger qualifier is a good example, also for
demonstration purpose.

Probably, above set of jobs will be the first chunk of this feature.
Then, let's do other stuff like Append, Sort, Aggregate and so on
later. It seems to me a reasonable strategy.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan-part2.v3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-part2.v3.patchDownload+1303-169
pgsql-v9.4-custom-scan-part1.v3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-part1.v3.patchDownload+2430-24
#4Jim Mlodgenski
jimmy76@gmail.com
In reply to: KaiGai Kohei (#3)
Re: Custom Scan APIs (Re: Custom Plan node)

On Mon, Nov 18, 2013 at 7:25 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

The attached patches are the revised custom-scan APIs.

My initial review on this feature:
- The patches apply and build, but it produces a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]

I'd recommend that you split the part1 patch containing the ctidscan
contrib into its own patch. It is more than half of the patch and its
certainly stands on its own. IMO, I think ctidscan fits a very specific use
case and would be better off being an extension instead of in contrib.

Show quoted text

- Custom-scan.sgml was added to introduce the way to write custom-scan
provider in the official documentation.
- Much code duplication in postgres_fdw.c was eliminated. I split some fdw-
handlers into two parts; common portion and fdw specific one.
Executor callbacks of custom-scan code utilizes the common portion above
because most of its implementations are equivalent.

I'd like to see comments regarding to the way to handle Var reference onto
a custom-scan that replaced relations join.
A varno of Var that references a join relation is rtindex of either
right or left
relation, then setrefs.c adjust it well; INNER_VAR or OUTER_VAR shall be
set instead.
However, it does not work well if a custom-scan that just references result
of remote join query was chosen instead of local join, because its result
shall be usually set in the ps_ResultTupleSlot of PlanState, thus
ExecEvalScalarVar does not reference neither inner nor outer slot.
Instead of existing solution, I added one more special varno; CUSTOM_VARNO
that just references result-tuple-slot of the target relation.
If CUSTOM_VARNO is given, EXPLAIN(verbose) generates column name from
the TupleDesc of underlying ps_ResultTupleSlot.
I'm not 100% certain whether it is the best approach for us, but it works
well.

Also, I'm uncertain for usage of param_info in Path structure, even though
I followed the manner in other portion. So, please point out if my usage
was not applicable well.

Thanks,

2013/11/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:

Hi,

I tried to write up a wikipage to introduce how custom-scan works.

https://wiki.postgresql.org/wiki/CustomScanAPI

Any comments please.

2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached patches provide a feature to implement custom scan node
that allows extension to replace a part of plan tree with its own code
instead of the built-in logic.
In addition to the previous proposition, it enables us to integrate

custom

scan as a part of candidate paths to be chosen by optimizer.
Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
a set of API stuff and a simple demonstration module that implement
regular table scan using inequality operator on ctid system column.
The second one (pgsql-v9.4-custom-scan-remote-join) enhances
postgres_fdw to support remote join capability.

Below is an example to show how does custom-scan work.

We usually run sequential scan even if clause has inequality operator
that references ctid system column.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

An extension that performs as custom-scan provider suggests
an alternative path, and its cost was less than sequential scan,
thus optimizer choose it.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------
Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
Filter: (ctid > '(10,0)'::tid)
(2 rows)

Of course, more cost effective plan will win if exists.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND

a = 200;

QUERY PLAN
-------------------------------------------------------------------
Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
Index Cond: (a = 200)
Filter: (ctid > '(10,0)'::tid)
(3 rows)

One other worthwhile example is remote-join enhancement on the
postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
managed by same foreign server.

postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
WHERE f_leak(b) AND y
like '%aaa%';
QUERY

PLAN

------------------------------------------------------------------------------------------------------

Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
Output: a, b, x, y
Filter: f_leak(b)
Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)

---------------------------
How does it works
---------------------------
This patch adds two hooks (for base and join relations) around

allpaths.c

and joinpaths.c. It allows extensions to add alternative paths to handle
scanning on the base relation or join of two relations.

Its callback routine can add CustomPath using add_path() to inform
optimizer this alternative scan path. Every custom-scan provider is
identified by its name being registered preliminary using the following
function.

void register_custom_provider(const CustomProvider *provider);

CustomProvider is a set of name string and function pointers of

callbacks.

Once CustomPath got chosen, create_scan_plan() construct a custom-
scan plan and calls back extension to initialize the node.
Rest of portions are similar to foreign scan, however, some of detailed
portions are different. For example, foreign scan is assumed to return
a tuple being formed according to table definition. On the other hand,
custom-scan does not have such assumption, so extension needs to
set tuple-descriptor on the scan tuple slot of ScanState structure by
itself.

In case of join, custom-scan performs as like a regular scan but it
returns tuples being already joined on underlying relations.
The patched postgres_fdw utilizes a hook at joinpaths.c to run
remote join.

------------
Issues
------------
I'm not 100% certain whether arguments of add_join_path_hook is
reasonable. I guess the first 7 arguments are minimum necessity.
The mergeclause_list and semifactors might be useful if someone
tries to implement its own mergejoin or semijoin. Also, I'm not
good at usage of path parameterization, but the last two arguments
are related to. Where is the best code to learn about its usage?

+/* Hook for plugins to add custom join path, in addition to default

ones */

+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                       RelOptInfo *joinrel,
+                                       RelOptInfo *outerrel,
+                                       RelOptInfo *innerrel,
+                                       JoinType jointype,
+                                       SpecialJoinInfo *sjinfo,
+                                       List *restrictlist,
+                                       List *mergeclause_list,
+                                       SemiAntiJoinFactors

*semifactors,

+                                       Relids param_source_rels,
+                                       Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

When we replace a join by a custom scan, where is the best target
for Var node that referenced relations under the join?
Usually, Var->varno is given as rtindex of tables being joined, then,
it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
It eventually determines the slot to be fetched on ExecEvalScalarVar().
On the other hand, we want Var-node to reference scan-tuple-slot
neither outer-slot nor inner-slot, if we replaced a join.
I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
Probably, it is a straightforward way to run remote join as like a scan,
but I'm not certain whether it is the best way.

I was concerned about FDW callback of postgres_fdw is designed to
take ForeignState argument. Because of this, remote join code did
not available to call these routines, even though most of custom-join
portions are similar.
So, I'd like to rework postgres_fdw first to put a common routine that
can be called from FDW portion and remote join portions.
However, I thought it makes reviewing hard due to the large scale of
changeset. So, I'd like to have a code reworking first.

----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* Probably, a wikipage may help people to understand it well.
* Postgres_fdw needs reworking to share common code for both of
FDW and remote join portions.

Thanks,

2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/10/3 Robert Haas <robertmhaas@gmail.com>:

Well, there were a lot of problems with your demonstration, which have
already been pointed out upthread. I'm skeptical about the idea of
simply replacing planner nodes wholesale, and Tom is outright opposed.
I think you'll do better to focus on a narrower case - I'd suggest
custom scan nodes - and leave the rest as a project for another time.

Thanks, it makes me clear what we should target on v9.4 development.
Towards the next commitfest, I'm planning to develop the following
features:
* CustomScan node that can run custom code instead of built-in
scan nodes.
* Join-pushdown of postgres_fdw using the hook to be located on
the add_paths_to_joinrel(), for demonstration purpose.
* Something new way to scan a relation; probably, your suggested
ctid scan with less or bigger qualifier is a good example, also for
demonstration purpose.

Probably, above set of jobs will be the first chunk of this feature.
Then, let's do other stuff like Append, Sort, Aggregate and so on
later. It seems to me a reasonable strategy.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Jim Mlodgenski (#4)
Re: Custom Scan APIs (Re: Custom Plan node)

Thanks for your review.

2013/11/19 Jim Mlodgenski <jimmy76@gmail.com>:

My initial review on this feature:
- The patches apply and build, but it produces a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]

This variable was only used in Assert() macro, so it causes a warning if you
don't put --enable-cassert on the configure script.
Anyway, I adjusted the code to check relid of RelOptInfo directly.

I'd recommend that you split the part1 patch containing the ctidscan contrib
into its own patch. It is more than half of the patch and its certainly
stands on its own. IMO, I think ctidscan fits a very specific use case and
would be better off being an extension instead of in contrib.

OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
ctidscan portion, and the part-3 is remote join on postgres_fdw.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan-part3.v4.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-part3.v4.patchDownload+1303-169
pgsql-v9.4-custom-scan-part2.v4.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-part2.v4.patchDownload+1247-9
pgsql-v9.4-custom-scan-part1.v4.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan-part1.v4.patchDownload+1182-15
#6Jim Mlodgenski
jimmy76@gmail.com
In reply to: KaiGai Kohei (#5)
Re: Custom Scan APIs (Re: Custom Plan node)

KaiGai

On Tue, Nov 19, 2013 at 9:41 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Thanks for your review.

2013/11/19 Jim Mlodgenski <jimmy76@gmail.com>:

My initial review on this feature:
- The patches apply and build, but it produces a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’

[-Wunused-variable]

This variable was only used in Assert() macro, so it causes a warning if
you
don't put --enable-cassert on the configure script.
Anyway, I adjusted the code to check relid of RelOptInfo directly.

The warning is now gone.

I'd recommend that you split the part1 patch containing the ctidscan

contrib

into its own patch. It is more than half of the patch and its certainly
stands on its own. IMO, I think ctidscan fits a very specific use case

and

would be better off being an extension instead of in contrib.

OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
ctidscan portion, and the part-3 is remote join on postgres_fdw.

Attached is a patch for the documentation. I think the documentation still
needs a little more work, but it is pretty close. I can add some more
detail to it once finish adapting the hadoop_fdw to using the custom scan
api and have a better understanding of all of the calls.

Show quoted text

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

custom-scan.sgml.patchtext/x-patch; charset=US-ASCII; name=custom-scan.sgml.patchDownload+167-167
#7Peter Eisentraut
peter_e@gmx.net
In reply to: KaiGai Kohei (#5)
Re: Custom Scan APIs (Re: Custom Plan node)

contrib/ctidscan/ctidscan.c:44: indent with spaces.
contrib/ctidscan/ctidscan.c:250: indent with spaces.
contrib/ctidscan/ctidscan.c:266: trailing whitespace.
contrib/postgres_fdw/deparse.c:1044: indent with spaces.
contrib/postgres_fdw/postgres_fdw.c:940: indent with spaces.
src/backend/commands/explain.c:2097: indent with spaces.
src/backend/optimizer/plan/createplan.c:2097: trailing whitespace.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#5)
Re: Custom Scan APIs (Re: Custom Plan node)

2013/11/19 Kohei KaiGai <kaigai@kaigai.gr.jp>:

OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
ctidscan portion, and the part-3 is remote join on postgres_fdw.

These three patches can be applied with no conflict onto 2013-11-27
HEAD, but some fixes are necessary to build because commit
784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21)
allows FunctionScan node to have multiple function expression, so Node
* funcexpr in CustomScan should be List *funcitons now.

I'll continue to review by applying patches onto 2013-11-19 HEAD.

--
Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#8)
Re: Custom Scan APIs (Re: Custom Plan node)

Thanks for the series of checks.

The attached ones are the revised patches.

I merged all the propositions from Jim. Thanks, it made the documentation
quality better. Also, I fixed up cosmetic stuff around whitespace <-> tab.

An actual code changes are to follow the changes in FunctionScan when
CustomScan replaces a FunctionScan. It puts a List * object instead of
a single expression tree, to have multiple functions.

Nothing were changed from the previous version.

Best regards,

2013/11/27 Shigeru Hanada <shigeru.hanada@gmail.com>:

2013/11/19 Kohei KaiGai <kaigai@kaigai.gr.jp>:

OK, I split them off. The part-1 is custom-scan API itself, the part-2 is
ctidscan portion, and the part-3 is remote join on postgres_fdw.

These three patches can be applied with no conflict onto 2013-11-27
HEAD, but some fixes are necessary to build because commit
784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21)
allows FunctionScan node to have multiple function expression, so Node
* funcexpr in CustomScan should be List *funcitons now.

I'll continue to review by applying patches onto 2013-11-19 HEAD.

--
Shigeru HANADA

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-3.patchDownload+1303-169
pgsql-v9.4-custom-scan.part-2.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-2.patchDownload+1247-9
pgsql-v9.4-custom-scan.part-1.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-1.patchDownload+1198-15
#10Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#9)
Re: Custom Scan APIs (Re: Custom Plan node)

Hi KaiGai-san,

2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached ones are the revised patches.

I merged all the propositions from Jim. Thanks, it made the documentation
quality better. Also, I fixed up cosmetic stuff around whitespace <-> tab.

An actual code changes are to follow the changes in FunctionScan when
CustomScan replaces a FunctionScan. It puts a List * object instead of
a single expression tree, to have multiple functions.

Nothing were changed from the previous version.

I first reviewed postgres_fdw portion of the patches to learn the
outline of Custom Plan. Wiki page is also a good textbook of the
feature. I have some random comments about the basic design of Custom
Plan:

(1) IIUC add_join_path and add_scan_path are added to allow extensions
to plug their code into planner.

(2) FDW framework has executor callbacks based on existing executor
nodes. Is there any plan to integrate them into one way, or wrap on
by another? I'm not sure that we should have two similar framework
side by side.
# I'm sorry if I've missed the past discussion about this issue.

(3) Internal routines such as is_self_managed_relation and
has_wholerow_reference seem to be useful for other FDWs. Is it able
to move them into core?

(4) postgres_fdw estimates costs of join by calculating local numbers.
How about to support remote estimation by throwing EXPLALAIN query
when use_remote_estimates = true.

--
Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#9)
Re: Custom Scan APIs (Re: Custom Plan node)

2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:

I merged all the propositions from Jim. Thanks, it made the documentation
quality better. Also, I fixed up cosmetic stuff around whitespace <-> tab.

I found some typos in documents and comments. Please see attached
patch for detail.

--
Shigeru HANADA

Attachments:

fix_typo.patchapplication/octet-stream; name=fix_typo.patchDownload+18-16
#12KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#10)
Re: Custom Scan APIs (Re: Custom Plan node)

Hanada-san,

Thanks for your reviewing,

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:

I first reviewed postgres_fdw portion of the patches to learn the
outline of Custom Plan. Wiki page is also a good textbook of the
feature. I have some random comments about the basic design of Custom
Plan:

(1) IIUC add_join_path and add_scan_path are added to allow extensions
to plug their code into planner.

Almost yes. For more correctness, these hooks allows extensions to
plug paths they can provide into a particular join or scan. Then planner
will choose the cheapest one according to the cost value.

(2) FDW framework has executor callbacks based on existing executor
nodes. Is there any plan to integrate them into one way, or wrap on
by another? I'm not sure that we should have two similar framework
side by side.
# I'm sorry if I've missed the past discussion about this issue.

Probably, FDW has different role from the CustomScan API.
As literal, FDW performs as a bridge between a relation form and
an opaque external data source, to intermediate two different world
on behalf of a foreign table.
On the other hand, CustomScan allows to provide alternative logic
to scan or join particular relations, in addition to the built-in ones,
but does not perform on behalf of foreign tables.

Existing FDW is designed to implement a scan on an intangible
relation, thus it can assume some things; like a tuple returned
from FDW has equivalent TupleDesc as table definition, or it can
always use ExecScan() for all the cases.
So, I don't think these two frameworks should be consolidated
because it makes confusion on the existing extensions that
assumes FDW callbacks always has a particular foreign table
definition.

(3) Internal routines such as is_self_managed_relation and
has_wholerow_reference seem to be useful for other FDWs. Is it able
to move them into core?

Probably, src/backend/foreign/foreign.c is a good host for them.

(4) postgres_fdw estimates costs of join by calculating local numbers.
How about to support remote estimation by throwing EXPLALAIN query
when use_remote_estimates = true.

I'm uncertain whether the cost value from remote EXPLAIN represents
right difficulty on the local side, because it indeed represents the
difficulty to join two relations on the remote side, however, does not
represents local job; that just fetches tuples from the result set of
remote query with table joining.
How about your opinion? Is the remote cost estimation value comparable
with local value?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#11)
Re: Custom Scan APIs (Re: Custom Plan node)

Thanks for fixing many my carelessness.
I didn't know "seek" was an irregular verb...

Best regards,

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:

2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:

I merged all the propositions from Jim. Thanks, it made the documentation
quality better. Also, I fixed up cosmetic stuff around whitespace <-> tab.

I found some typos in documents and comments. Please see attached
patch for detail.

--
Shigeru HANADA

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#12)
Re: Custom Scan APIs (Re: Custom Plan node)

The attached patches include documentation fixup by Hanada-san,
and relocation of is_managed_relation (the portion to check whether
the relation is a foreign table managed by a particular FDW) and
has_wholerow_reference.
I didn't touch the EXPLAIN logic because I'm uncertain whether the
cost of remote join is reasonable towards the cost as an alternative
path to local joins.

Please check it. Thanks,

2013/12/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:

Hanada-san,

Thanks for your reviewing,

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:

I first reviewed postgres_fdw portion of the patches to learn the
outline of Custom Plan. Wiki page is also a good textbook of the
feature. I have some random comments about the basic design of Custom
Plan:

(1) IIUC add_join_path and add_scan_path are added to allow extensions
to plug their code into planner.

Almost yes. For more correctness, these hooks allows extensions to
plug paths they can provide into a particular join or scan. Then planner
will choose the cheapest one according to the cost value.

(2) FDW framework has executor callbacks based on existing executor
nodes. Is there any plan to integrate them into one way, or wrap on
by another? I'm not sure that we should have two similar framework
side by side.
# I'm sorry if I've missed the past discussion about this issue.

Probably, FDW has different role from the CustomScan API.
As literal, FDW performs as a bridge between a relation form and
an opaque external data source, to intermediate two different world
on behalf of a foreign table.
On the other hand, CustomScan allows to provide alternative logic
to scan or join particular relations, in addition to the built-in ones,
but does not perform on behalf of foreign tables.

Existing FDW is designed to implement a scan on an intangible
relation, thus it can assume some things; like a tuple returned
from FDW has equivalent TupleDesc as table definition, or it can
always use ExecScan() for all the cases.
So, I don't think these two frameworks should be consolidated
because it makes confusion on the existing extensions that
assumes FDW callbacks always has a particular foreign table
definition.

(3) Internal routines such as is_self_managed_relation and
has_wholerow_reference seem to be useful for other FDWs. Is it able
to move them into core?

Probably, src/backend/foreign/foreign.c is a good host for them.

(4) postgres_fdw estimates costs of join by calculating local numbers.
How about to support remote estimation by throwing EXPLALAIN query
when use_remote_estimates = true.

I'm uncertain whether the cost value from remote EXPLAIN represents
right difficulty on the local side, because it indeed represents the
difficulty to join two relations on the remote side, however, does not
represents local job; that just fetches tuples from the result set of
remote query with table joining.
How about your opinion? Is the remote cost estimation value comparable
with local value?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-3.v2.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-3.v2.patchDownload+1367-185
pgsql-v9.4-custom-scan.part-2.v2.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-2.v2.patchDownload+1252-14
pgsql-v9.4-custom-scan.part-1.v2.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-1.v2.patchDownload+1198-15
#15Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#14)
Re: Custom Scan APIs (Re: Custom Plan node)

Hi KaiGai-san,

2013/12/8 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached patches include documentation fixup by Hanada-san,
and relocation of is_managed_relation (the portion to check whether
the relation is a foreign table managed by a particular FDW) and
has_wholerow_reference.
I didn't touch the EXPLAIN logic because I'm uncertain whether the
cost of remote join is reasonable towards the cost as an alternative
path to local joins.

Please check it. Thanks,

The patches could be applied cleanly, but I saw a compiler warning
about get_rel_relkind() in foreign.c, but it's minor issue. Please
just add #include of utils/lsyscache.h there.

I have some more random comments about EXPLAIN.

1) You use "Operation" as the label of Custom Scan nodes in non-text
format, but it seems to me rather "provider name". What is the string
shown there?

2) It would be nice if we can see the information about what the
Custom Scan node replaced in EXPLAIN output (even only in verbose
mode). I know that we can't show plan tree below custom scan nodes,
because CS Provider can't obtain other candidates. But even only
relation names used in the join or the scan would help users to
understand what is going on in Custom Scan.

Regards,
--
Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#15)
Re: Custom Scan APIs (Re: Custom Plan node)

Hanada-san,

Thanks for your reviewing.

2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:

Hi KaiGai-san,

2013/12/8 Kohei KaiGai <kaigai@kaigai.gr.jp>:

The attached patches include documentation fixup by Hanada-san,
and relocation of is_managed_relation (the portion to check whether
the relation is a foreign table managed by a particular FDW) and
has_wholerow_reference.
I didn't touch the EXPLAIN logic because I'm uncertain whether the
cost of remote join is reasonable towards the cost as an alternative
path to local joins.

Please check it. Thanks,

The patches could be applied cleanly, but I saw a compiler warning
about get_rel_relkind() in foreign.c, but it's minor issue. Please
just add #include of utils/lsyscache.h there.

Fixed,

I have some more random comments about EXPLAIN.

1) You use "Operation" as the label of Custom Scan nodes in non-text
format, but it seems to me rather "provider name". What is the string
shown there?

I tried free-riding on the existing properties, but it does not make sense
indeed, as you pointed out.
I adjusted the explain.c to show "Custom-Provider" property for Custom-
Scan node, as follows.

postgres=# explain(format xml) select * from t1 where ctid > '(4,0)'::tid;
QUERY PLAN
----------------------------------------------------------
<explain xmlns="http://www.postgresql.org/2009/explain&quot;&gt;+
<Query> +
<Plan> +
<Node-Type>Custom Scan</Node-Type> +
<Custom-Provider>ctidscan</Custom-Provider> +
<Relation-Name>t1</Relation-Name> +
<Alias>t1</Alias> +
<Startup-Cost>0.00</Startup-Cost> +
<Total-Cost>12.30</Total-Cost> +
<Plan-Rows>410</Plan-Rows> +
<Plan-Width>36</Plan-Width> +
<Filter>(ctid &gt; '(4,0)'::tid)</Filter> +
</Plan> +
</Query> +
</explain>
(1 row)

2) It would be nice if we can see the information about what the
Custom Scan node replaced in EXPLAIN output (even only in verbose
mode). I know that we can't show plan tree below custom scan nodes,
because CS Provider can't obtain other candidates. But even only
relation names used in the join or the scan would help users to
understand what is going on in Custom Scan.

Even though I agree that it helps users to understand the plan,
it also has a headache to implement because CustomScan node
(and its super class) does not have an information which relations
are underlying. Probably, this functionality needs to show
the underlying relations on ExplainTargetRel() if CustomScan node
represents a scan instead of join. What data source can produce
the list of underlying relations here?
So, if it is not a significant restriction for users, I'd like to work on this
feature later.

The attached patch fixes up a minor warning around get_rel_relkind
and name of the property for custom-provider. Please check it.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-3.v3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-3.v3.patchDownload+1350-169
pgsql-v9.4-custom-scan.part-2.v3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-2.v3.patchDownload+1252-14
pgsql-v9.4-custom-scan.part-1.v3.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-1.v3.patchDownload+1201-15
#17Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#16)
Re: Custom Scan APIs (Re: Custom Plan node)

Hi Kaigai-san,

2013/12/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:

2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:

The patches could be applied cleanly, but I saw a compiler warning
about get_rel_relkind() in foreign.c, but it's minor issue. Please
just add #include of utils/lsyscache.h there.

Fixed,

Check.

I have some more random comments about EXPLAIN.

1) You use "Operation" as the label of Custom Scan nodes in non-text
format, but it seems to me rather "provider name". What is the string
shown there?

I tried free-riding on the existing properties, but it does not make sense
indeed, as you pointed out.
I adjusted the explain.c to show "Custom-Provider" property for Custom-
Scan node, as follows.

New name seems better, it is what the node express.

2) It would be nice if we can see the information about what the
Custom Scan node replaced in EXPLAIN output (even only in verbose
mode). I know that we can't show plan tree below custom scan nodes,
because CS Provider can't obtain other candidates. But even only
relation names used in the join or the scan would help users to
understand what is going on in Custom Scan.

Even though I agree that it helps users to understand the plan,
it also has a headache to implement because CustomScan node
(and its super class) does not have an information which relations
are underlying. Probably, this functionality needs to show
the underlying relations on ExplainTargetRel() if CustomScan node
represents a scan instead of join. What data source can produce
the list of underlying relations here?
So, if it is not a significant restriction for users, I'd like to work on this
feature later.

Agreed. It would be enough that Custom Scan Providers can add
arbitrary information, such as "Remote SQL" of postgres_fdw, to
EXPLAIN result via core API. Some kind of framework which helps
authors of Custom Scan Providers, but it should be considered after
the first cut.

The attached patch fixes up a minor warning around get_rel_relkind
and name of the property for custom-provider. Please check it.

The patch can be applied onto 2013-12-16 HEAD cleanly, and gives no
unexpected error/warinig.

I'm sorry to post separately, but I have some comments on document.

(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.

(2) postgres_fdw
JOIN push--down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.

Codes for core and contrib seem fine, so I'll mark the patches "Ready
for committer" after the document enhancement.

Regards,
--
Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#17)
Re: Custom Scan APIs (Re: Custom Plan node)

Hi Hanada-san,

(2013/12/16 14:15), Shigeru Hanada wrote:

I'm sorry to post separately, but I have some comments on document.

(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.

It shall be available; nothing different from the two parameters that
we have supported for long time. Sorry, I missed the new feature to
mention about.

(2) postgres_fdw
JOIN push--down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.

I added an explanation about remote join execution on the section.
Probably, it help users understand why Custom Scan node is here
instead of Join node. Thanks for your suggestion.

Best regards,
--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

pgsql-v9.4-custom-scan.part-1.v4.patchtext/plain; charset=Shift_JIS; name=pgsql-v9.4-custom-scan.part-1.v4.patchDownload+1201-15
pgsql-v9.4-custom-scan.part-2.v4.patchtext/plain; charset=Shift_JIS; name=pgsql-v9.4-custom-scan.part-2.v4.patchDownload+1253-14
pgsql-v9.4-custom-scan.part-3.v4.patchtext/plain; charset=Shift_JIS; name=pgsql-v9.4-custom-scan.part-3.v4.patchDownload+1360-169
#19Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#18)
Re: Custom Scan APIs (Re: Custom Plan node)

KaiGai-san,

2013/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2013/12/16 14:15), Shigeru Hanada wrote:

(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.

It shall be available; nothing different from the two parameters that
we have supported for long time. Sorry, I missed the new feature to
mention about.

Check.

(2) postgres_fdw
JOIN push--down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.

I added an explanation about remote join execution on the section.
Probably, it help users understand why Custom Scan node is here
instead of Join node. Thanks for your suggestion.

Check.

I think that these patches are enough considered to mark as "Ready for
Committer".

Regards,
--
Shigeru HANADA

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#19)
Re: Custom Scan APIs (Re: Custom Plan node)

Hello,

The attached patches are the ones rebased to the latest git tree, but
no functional
changes from the previous revision on the commit-fest:Nov.
Hanada-san volunteered to review the series of patches, including the
portion for
postgres_fdw, then marked it as "ready for committer" on the last commit fest.
So, I hope someone of committer also volunteer to review the patches for final
checking.

* Part-1 - CustomScan APIs
This patch provides a set of interfaces to interact query-optimizer
and -executor
for extensions. The new add_scan_path_hook or add_join_path_hook allows to
offer alternative ways to scan a particular relation or to join a
particular relations.
Then, once the alternative ways are chosen by the optimizer,
associated callbacks
shall be kicked from the executor. In this case, extension has responsibility to
return a slot that hold a tuple (or empty for end of scan) being
scanned from the
underlying relation.

* Part-2 - contrib/ctidscan
This patch provides a simple example implementation of CustomScan API.
It enables to skip pages when inequality operators are given on ctid system
columns. That is, at least, better than sequential full-scan, so it usually wins
to SeqScan, but Index-scan is much better.

* Part-3 - remote join implementation
This patch provides an example to replace a join by a custom scan node that
runs on a result set of remote join query, on top of existing postgres_fdw
extension. The idea is, a result set of remote query looks like a relation but
intangible, thus, it is feasible to replace a local join by a scan on the result
set of a query executed on the remote host, if both of the relation to be joined
belongs to the identical foreign server.
This patch gives postgres_fdw a capability to run a join on the remote host.

Thanks,

2013/12/16 Shigeru Hanada <shigeru.hanada@gmail.com>:

KaiGai-san,

2013/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:

(2013/12/16 14:15), Shigeru Hanada wrote:

(1) ctidscan
Is session_preload_libraries available to enable the feature, like
shared_*** and local_***? According to my trial it works fine like
two similar GUCs.

It shall be available; nothing different from the two parameters that
we have supported for long time. Sorry, I missed the new feature to
mention about.

Check.

(2) postgres_fdw
JOIN push--down is a killer application of Custom Scan Provider
feature, so I think it's good to mention it in the "Remote Query
Optimization" section.

I added an explanation about remote join execution on the section.
Probably, it help users understand why Custom Scan node is here
instead of Join node. Thanks for your suggestion.

Check.

I think that these patches are enough considered to mark as "Ready for
Committer".

Regards,
--
Shigeru HANADA

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

pgsql-v9.4-custom-scan.part-1.v5.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-1.v5.patchDownload+1201-15
pgsql-v9.4-custom-scan.part-2.v5.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-2.v5.patchDownload+1253-14
pgsql-v9.4-custom-scan.part-3.v5.patchapplication/octet-stream; name=pgsql-v9.4-custom-scan.part-3.v5.patchDownload+1360-169
#21KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#20)
#22Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#21)
#23KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#22)
#24Christian Convey
christian.convey@gmail.com
In reply to: KaiGai Kohei (#23)
#25KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Christian Convey (#24)
#26KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#23)
#27Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#26)
#28Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#26)
#29KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Shigeru Hanada (#27)
#30Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: KaiGai Kohei (#26)
#31KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Ashutosh Bapat (#30)
#32Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: KaiGai Kohei (#31)
#33KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Ashutosh Bapat (#32)
#34Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#29)
#35Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#26)
#36Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#31)
#37Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#23)
#38KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#36)
#39Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#38)
#40KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#37)
#41Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#40)
#42Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#40)
#43Stephen Frost
sfrost@snowman.net
In reply to: Shigeru Hanada (#41)
#44KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#39)
#45KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#42)
#46Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: KaiGai Kohei (#44)
#47Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#44)
#48Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#45)
#49KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#47)
#50KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#48)
#51Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#50)
#52KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#51)
#53Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#52)
#54KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#53)
#55Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#54)
#56KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#55)
#57Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#56)
#58Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#42)
#59Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#47)
#60Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#51)
#61Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#58)
#62Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#61)
#63KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#57)
#64Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#62)
#65Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#60)
#66KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#59)
#67Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#64)
#68Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#67)
#69Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#68)
#70Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#69)
#71Stephen Frost
sfrost@snowman.net
In reply to: Stephen Frost (#55)
#72KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#71)
#73Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Stephen Frost (#68)
#74KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#40)
#75Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#71)
#76Stephen Frost
sfrost@snowman.net
In reply to: KaiGai Kohei (#72)
#77Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Robert Haas (#75)
#78Stephen Frost
sfrost@snowman.net
In reply to: Ashutosh Bapat (#73)
#79KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#75)
#80Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#75)
#81Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#78)
#82KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#76)
#83Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#81)
#84Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#75)
#85Tom Lane
tgl@sss.pgh.pa.us
In reply to: KaiGai Kohei (#82)
#86Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#84)
#87Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#86)
#88KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Stephen Frost (#87)
#89KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#85)
#90Tom Lane
tgl@sss.pgh.pa.us
In reply to: KaiGai Kohei (#89)
#91KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#90)
#92Tom Lane
tgl@sss.pgh.pa.us
In reply to: KaiGai Kohei (#91)
#93KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#92)
#94KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#92)
#95KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#94)
#96KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#95)
#97KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#45)
#98Simon Riggs
simon@2ndQuadrant.com
In reply to: KaiGai Kohei (#97)
#99Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#98)
#100KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Simon Riggs (#98)
#101KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#99)
#102Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#99)
#103Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#102)
#104Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#103)
#105Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#103)
#106Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#105)
#107Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#106)
#108KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#107)
#109KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Tom Lane (#107)
#110KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: KaiGai Kohei (#109)