Re: [v9.5] Custom Plan API
According to the discussion upthread, I revised the custom-plan patch
to focus on regular relation scan but no join support right now, and to
support DDL command to define custom-plan providers.
Planner integration with custom logic to scan a particular relation is
enough simple, unlike various join cases. It's almost similar to what
built-in logic are doing now - custom-plan provider adds a path node
with its cost estimation if it can offer alternative way to scan referenced
relation. (in case of no idea, it does not need to add any paths)
A new DDL syntax I'd like to propose is below:
CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
<name> is as literal, put a unique identifier.
<class> is workload type to be offered by this custom-plan provider.
"scan" is the only option right now, that means base relation scan.
<function_name> is also as literal; it shall perform custom-plan provider.
A custom-plan provider function is assumed to take an argument of
"internal" type to deliver a set of planner information that is needed to
construct custom-plan pathnode.
In case of "scan" class, pointer towards an customScanArg object
shall be delivered on invocation of custom-plan provider.
typedef struct {
uint32 custom_class;
PlannerInfo *root;
RelOptInfo *baserel;
RangeTblEntry *rte;
} customScanArg;
In case when the custom-plan provider function being invoked thought
it can offer an alternative scan path on the relation of "baserel", things
to do is (1) construct a CustomPath (or its inherited data type) object
with a table of callback function pointers (2) put its own cost estimation,
and (3) call add_path() to register this path as an alternative one.
Once the custom-path was chosen by query planner, its CreateCustomPlan
callback is called to populate CustomPlan node based on the pathnode.
It also has a table of callback function pointers to handle various planner's
job in setrefs.c and so on.
Similarly, its CreateCustomPlanState callback is called to populate
CustomPlanState node based on the plannode. It also has a table of
callback function pointers to handle various executor's job during quey
execution.
Most of callback designs are not changed from the prior proposition in
v9.4 development cycle, however, here is a few changes.
* CustomPlan became to inherit Scan, and CustomPlanState became to
inherit ScanState. Because some useful routines to implement scan-
logic, like ExecScan, expects state-node has ScanState as its base
type, it's more kindness for extension side. (I'd like to avoid each
extension reinvent ExecScan by copy & paste!)
I'm not sure whether it should be a union of Join in the future, however,
it is a reasonable choice to have compatible layout with Scan/ScanState
to implement alternative "scan" logic.
* Exporting static functions - I still don't have a graceful answer here.
However, it is quite natural that extensions to follow up interface updates
on the future version up of PostgreSQL.
Probably, it shall become clear what class of functions shall be
exported and what class of functions shall be re-implemented within
extension side in the later discussion.
Right now, I exported minimum ones that are needed to implement
alternative scan method - contrib/ctidscan module.
Items to be discussed later:
* planner integration for relations join - probably, we may define new
custom-plan classes as alternative of hash-join, merge-join and
nest-loop. If core can know this custom-plan is alternative of hash-
join, we can utilize core code to check legality of join.
* generic key-value style options in custom-plan definition - Hanada
san proposed me off-list - like foreign data wrapper. It may enable
to configure multiple behavior on a binary.
* ownership and access control of custom-plan. right now, only
superuser can create/drop custom-plan provider definition, thus,
it has no explicit ownership and access control. It seems to me
a reasonable assumption, however, we may have a usecase that
needs custom-plan by unprivileged users.
Thanks,
2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We're past the prototyping stage and into productionising what we
know works, AFAIK. If that point is not clear, then we need to
discuss that first.OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
it's all speculation that certain hooks might be useful, and
speculation that's not supported by a lot of evidence. If you think
this isn't prototyping, I wonder what you think *is* prototyping.My research contacts advise me of this recent work
http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
and also that they expect a prototype to be ready by October, which I have
been told will be open source.So there are at least two groups looking at this as a serious option for
Postgres (not including the above paper's authors).That isn't *now*, but it is at least a time scale that fits with acting
on this in the next release, if we can separate out the various ideas and
agree we wish to proceed.I'll submerge again...
Through the discussion last week, our minimum consensus are:
1. Deregulated enhancement of FDW is not a way to go
2. Custom-path that can replace built-in scan makes sense as a first step
towards the future enhancement. Its planner integration is enough simple
to do.
3. Custom-path that can replace built-in join takes investigation how to
integrate existing planner structure, to avoid (3a) reinvention of
whole of join handling in extension side, and (3b) unnecessary extension
calls towards the case obviously unsupported.So, I'd like to start the (2) portion towards the upcoming 1st commit-fest
on the v9.5 development cycle. Also, we will be able to have discussion
for the (3) portion concurrently, probably, towards 2nd commit-fest.Unfortunately, I cannot participate PGcon/Ottawa this year. Please share
us the face-to-face discussion here.Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.5-custom-plan.v1.patchapplication/octet-stream; name=pgsql-v9.5-custom-plan.v1.patchDownload+2856-36
Kaigai-san,
I've just applied v1 patch, and tried build and install, but I found two issues:
1) The contrib/ctidscan is not automatically built/installed because
it's not described in contrib/Makefile. Is this expected behavior?
2) I got an error message below when building document.
$ cd doc/src/sgml
$ make
openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
-d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
"SQL-CREATECUSTOMPLAN"
make: *** [HTML.index] Error 1
make: *** Deleting file `HTML.index'
I'll review another part of the patch, including the design.
2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
According to the discussion upthread, I revised the custom-plan patch
to focus on regular relation scan but no join support right now, and to
support DDL command to define custom-plan providers.Planner integration with custom logic to scan a particular relation is
enough simple, unlike various join cases. It's almost similar to what
built-in logic are doing now - custom-plan provider adds a path node
with its cost estimation if it can offer alternative way to scan referenced
relation. (in case of no idea, it does not need to add any paths)A new DDL syntax I'd like to propose is below:
CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
<name> is as literal, put a unique identifier.
<class> is workload type to be offered by this custom-plan provider.
"scan" is the only option right now, that means base relation scan.
<function_name> is also as literal; it shall perform custom-plan provider.A custom-plan provider function is assumed to take an argument of
"internal" type to deliver a set of planner information that is needed to
construct custom-plan pathnode.
In case of "scan" class, pointer towards an customScanArg object
shall be delivered on invocation of custom-plan provider.typedef struct {
uint32 custom_class;
PlannerInfo *root;
RelOptInfo *baserel;
RangeTblEntry *rte;
} customScanArg;In case when the custom-plan provider function being invoked thought
it can offer an alternative scan path on the relation of "baserel", things
to do is (1) construct a CustomPath (or its inherited data type) object
with a table of callback function pointers (2) put its own cost estimation,
and (3) call add_path() to register this path as an alternative one.Once the custom-path was chosen by query planner, its CreateCustomPlan
callback is called to populate CustomPlan node based on the pathnode.
It also has a table of callback function pointers to handle various planner's
job in setrefs.c and so on.Similarly, its CreateCustomPlanState callback is called to populate
CustomPlanState node based on the plannode. It also has a table of
callback function pointers to handle various executor's job during quey
execution.Most of callback designs are not changed from the prior proposition in
v9.4 development cycle, however, here is a few changes.* CustomPlan became to inherit Scan, and CustomPlanState became to
inherit ScanState. Because some useful routines to implement scan-
logic, like ExecScan, expects state-node has ScanState as its base
type, it's more kindness for extension side. (I'd like to avoid each
extension reinvent ExecScan by copy & paste!)
I'm not sure whether it should be a union of Join in the future, however,
it is a reasonable choice to have compatible layout with Scan/ScanState
to implement alternative "scan" logic.* Exporting static functions - I still don't have a graceful answer here.
However, it is quite natural that extensions to follow up interface updates
on the future version up of PostgreSQL.
Probably, it shall become clear what class of functions shall be
exported and what class of functions shall be re-implemented within
extension side in the later discussion.
Right now, I exported minimum ones that are needed to implement
alternative scan method - contrib/ctidscan module.Items to be discussed later:
* planner integration for relations join - probably, we may define new
custom-plan classes as alternative of hash-join, merge-join and
nest-loop. If core can know this custom-plan is alternative of hash-
join, we can utilize core code to check legality of join.
* generic key-value style options in custom-plan definition - Hanada
san proposed me off-list - like foreign data wrapper. It may enable
to configure multiple behavior on a binary.
* ownership and access control of custom-plan. right now, only
superuser can create/drop custom-plan provider definition, thus,
it has no explicit ownership and access control. It seems to me
a reasonable assumption, however, we may have a usecase that
needs custom-plan by unprivileged users.Thanks,
2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We're past the prototyping stage and into productionising what we
know works, AFAIK. If that point is not clear, then we need to
discuss that first.OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
it's all speculation that certain hooks might be useful, and
speculation that's not supported by a lot of evidence. If you think
this isn't prototyping, I wonder what you think *is* prototyping.My research contacts advise me of this recent work
http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
and also that they expect a prototype to be ready by October, which I have
been told will be open source.So there are at least two groups looking at this as a serious option for
Postgres (not including the above paper's authors).That isn't *now*, but it is at least a time scale that fits with acting
on this in the next release, if we can separate out the various ideas and
agree we wish to proceed.I'll submerge again...
Through the discussion last week, our minimum consensus are:
1. Deregulated enhancement of FDW is not a way to go
2. Custom-path that can replace built-in scan makes sense as a first step
towards the future enhancement. Its planner integration is enough simple
to do.
3. Custom-path that can replace built-in join takes investigation how to
integrate existing planner structure, to avoid (3a) reinvention of
whole of join handling in extension side, and (3b) unnecessary extension
calls towards the case obviously unsupported.So, I'd like to start the (2) portion towards the upcoming 1st commit-fest
on the v9.5 development cycle. Also, we will be able to have discussion
for the (3) portion concurrently, probably, towards 2nd commit-fest.Unfortunately, I cannot participate PGcon/Ottawa this year. Please share
us the face-to-face discussion here.Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hanada-san,
Thanks for your checks. I oversight the points when I submit the patch, sorry.
The attached one is revised one on documentation stuff and contrib/Makefile.
Thanks,
2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
Kaigai-san,
I've just applied v1 patch, and tried build and install, but I found two issues:
1) The contrib/ctidscan is not automatically built/installed because
it's not described in contrib/Makefile. Is this expected behavior?
2) I got an error message below when building document.$ cd doc/src/sgml
$ make
openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
-d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
"SQL-CREATECUSTOMPLAN"
make: *** [HTML.index] Error 1
make: *** Deleting file `HTML.index'I'll review another part of the patch, including the design.
2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
According to the discussion upthread, I revised the custom-plan patch
to focus on regular relation scan but no join support right now, and to
support DDL command to define custom-plan providers.Planner integration with custom logic to scan a particular relation is
enough simple, unlike various join cases. It's almost similar to what
built-in logic are doing now - custom-plan provider adds a path node
with its cost estimation if it can offer alternative way to scan referenced
relation. (in case of no idea, it does not need to add any paths)A new DDL syntax I'd like to propose is below:
CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
<name> is as literal, put a unique identifier.
<class> is workload type to be offered by this custom-plan provider.
"scan" is the only option right now, that means base relation scan.
<function_name> is also as literal; it shall perform custom-plan provider.A custom-plan provider function is assumed to take an argument of
"internal" type to deliver a set of planner information that is needed to
construct custom-plan pathnode.
In case of "scan" class, pointer towards an customScanArg object
shall be delivered on invocation of custom-plan provider.typedef struct {
uint32 custom_class;
PlannerInfo *root;
RelOptInfo *baserel;
RangeTblEntry *rte;
} customScanArg;In case when the custom-plan provider function being invoked thought
it can offer an alternative scan path on the relation of "baserel", things
to do is (1) construct a CustomPath (or its inherited data type) object
with a table of callback function pointers (2) put its own cost estimation,
and (3) call add_path() to register this path as an alternative one.Once the custom-path was chosen by query planner, its CreateCustomPlan
callback is called to populate CustomPlan node based on the pathnode.
It also has a table of callback function pointers to handle various planner's
job in setrefs.c and so on.Similarly, its CreateCustomPlanState callback is called to populate
CustomPlanState node based on the plannode. It also has a table of
callback function pointers to handle various executor's job during quey
execution.Most of callback designs are not changed from the prior proposition in
v9.4 development cycle, however, here is a few changes.* CustomPlan became to inherit Scan, and CustomPlanState became to
inherit ScanState. Because some useful routines to implement scan-
logic, like ExecScan, expects state-node has ScanState as its base
type, it's more kindness for extension side. (I'd like to avoid each
extension reinvent ExecScan by copy & paste!)
I'm not sure whether it should be a union of Join in the future, however,
it is a reasonable choice to have compatible layout with Scan/ScanState
to implement alternative "scan" logic.* Exporting static functions - I still don't have a graceful answer here.
However, it is quite natural that extensions to follow up interface updates
on the future version up of PostgreSQL.
Probably, it shall become clear what class of functions shall be
exported and what class of functions shall be re-implemented within
extension side in the later discussion.
Right now, I exported minimum ones that are needed to implement
alternative scan method - contrib/ctidscan module.Items to be discussed later:
* planner integration for relations join - probably, we may define new
custom-plan classes as alternative of hash-join, merge-join and
nest-loop. If core can know this custom-plan is alternative of hash-
join, we can utilize core code to check legality of join.
* generic key-value style options in custom-plan definition - Hanada
san proposed me off-list - like foreign data wrapper. It may enable
to configure multiple behavior on a binary.
* ownership and access control of custom-plan. right now, only
superuser can create/drop custom-plan provider definition, thus,
it has no explicit ownership and access control. It seems to me
a reasonable assumption, however, we may have a usecase that
needs custom-plan by unprivileged users.Thanks,
2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We're past the prototyping stage and into productionising what we
know works, AFAIK. If that point is not clear, then we need to
discuss that first.OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
it's all speculation that certain hooks might be useful, and
speculation that's not supported by a lot of evidence. If you think
this isn't prototyping, I wonder what you think *is* prototyping.My research contacts advise me of this recent work
http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
and also that they expect a prototype to be ready by October, which I have
been told will be open source.So there are at least two groups looking at this as a serious option for
Postgres (not including the above paper's authors).That isn't *now*, but it is at least a time scale that fits with acting
on this in the next release, if we can separate out the various ideas and
agree we wish to proceed.I'll submerge again...
Through the discussion last week, our minimum consensus are:
1. Deregulated enhancement of FDW is not a way to go
2. Custom-path that can replace built-in scan makes sense as a first step
towards the future enhancement. Its planner integration is enough simple
to do.
3. Custom-path that can replace built-in join takes investigation how to
integrate existing planner structure, to avoid (3a) reinvention of
whole of join handling in extension side, and (3b) unnecessary extension
calls towards the case obviously unsupported.So, I'd like to start the (2) portion towards the upcoming 1st commit-fest
on the v9.5 development cycle. Also, we will be able to have discussion
for the (3) portion concurrently, probably, towards 2nd commit-fest.Unfortunately, I cannot participate PGcon/Ottawa this year. Please share
us the face-to-face discussion here.Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
Shigeru HANADA
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.5-custom-plan.v2.patchapplication/octet-stream; name=pgsql-v9.5-custom-plan.v2.patchDownload+2859-36
Kaigai-san,
Sorry for lagged response.
Here are my random thoughts about the patch. I couldn't understand
the patch fully, because some of APIs are not used by ctidscan. If
Custom Scan patch v2 review
* Custom plan class comparison
In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
with 's'. Do you plan to use custclass as bit field? If so, values
for custom plan class should not be a character. Otherwise, custclass
should be compared by == operator.
* Purpose of GetSpecialCustomVar()
The reason why FinalizeCustomPlan callback is necessary is not clear to me.
Could you show a case that the API would be useful?
* Purpose of FinalizeCustomPlan()
The reason why FinalizeCustomPlan callback is necessary is not clear
to me, because ctidscan just calls finalize_primnode() and
bms_add_members() with given information. Could you show a case that
the API would be useful?
* Is it ok to call set_cheapest() for all relkind?
Now set_cheapest() is called not for only relation and foreign table
but also custom plan, and other relations such as subquery, function,
and value. Calling call_custom_scan_provider() and set_cheapest() in
the case of RTE_RELATION seems similar to the old construct, how do
you think about this?
* Is it hard to get rid of CopyCustomPlan()?
Copying ForeignScan node doesn't need per-FDW copy function by
limiting fdw_private to have only copy-able objects. Can't we use the
same way for CustomPlan? Letting authors call NodeSetTag or
copyObject() sounds uncomfortable to me.
This would be able to apply to TextOutCustomPlan() and TextOutCustomPath() too.
* MultiExec support is appropriate for the first version?
The cases need MultiExec seems little complex for the first version of
custom scan. What kind of plan do you image for this feature?
* Does SupportBackwardScan() have enough information?
Other scans check target list with TargetListSupportsBackwardScan().
Isn't it necessary to check it for CustomPlan too in
ExecSupportsBackwardScan()?
* Place to call custom plan provider
Is it necessary to call provider when relkind != RELKIND_RELATION? If
yes, isn't it necessary to call for append relation?
I know that we concentrate to replacing scan in the initial version,
so it would not be a serious problem, but it would be good to consider
extensible design.
* Custom Plan Provider is "addpath"?
Passing addpath handler as only one attribute of CUSTOM PLAN PROVIDER
seems little odd.
Using handler like FDW makes the design too complex and/or messy?
* superclass of CustomPlanState
CustomPlanState derives ScanState, instead of deriving PlanState
directly. I worry the case of non-heap-scan custom plan, but it might
be ok to postpone consideration about that at the first cut.
* Naming and granularity of objects related to custom plan
I'm not sure the current naming is appropriate, especially difference
between "custom plan" and "provider" and "handler". In the context of
CREATE CUSTOM PLAN statement, what the term "custom plan" means? My
impression is that "custom plan" is an alternative plan type, e.g.
ctidscan or pg_strom_scan. Then what the term "provider" means? My
impression for that is extension, such as ctidscan and pg_strom. The
grammar allows users to pass function via PROVIDER clause of CREATE
CUSTOM SCAN, so the function would be the provider of the custom plan
created by the statement.
* enable_customscan
GUC parameter enable_customscan would be useful for users who want to
disable custom plan feature temporarily. In the case of pg_strom,
using GPU for limited sessions for analytic or batch applications
seems handy.
* Adding pg_custom_plan catalog
Using "cust" as prefix for pg_custom_plan causes ambiguousness which
makes it difficult to choose catalog prefix for a feature named
"Custom Foo" in future. How about using "cusp" (CUStom Plan)?
Or is it better to use pg_custom_plan_provider as catalog relation
name, as the document says that "CREATE CUSTOM PLAN defines custom
plan provider". Then prefix could be "cpp" (Custom Plan Provider).
This seems to match the wording used for pg_foreign_data_wrapper.
* CREATE CUSTOM PLAN statement
This is just a question: We need to emit CREATE CUSTOM PLAN if we want to use
I wonder how it is extended when supporting join as custom class.
* New operators about TID comparison
IMO this portion should be a separated patch, because it adds OID
definition of existing operators such as tidgt and tidle. Is there
any (explicit or implicit) rule about defining macro for oid of an
operator?
* Prototype of get_custom_plan_oid()
custname (or cppname if use the rule I proposed above) seems
appropriate as the parameter name of get_custom_plan_oid() because
similar funcitons use catalog column names in such case.
* Coding conventions
Some lines are indented with white space. Future pgindent run will
fix this issue?
* Unnecessary struct forward declaration
Forward declarations of CustomPathMethods, Plan, and CustomPlan in
includes/nodes/relation.h seem unncecessary. Other headers might have
same issue.
* Unnecessary externing of replace_nestloop_params()
replace_nestloop_params() is extern-ed but it's never called outside
createplan.c.
* Externing fix_scan_expr()
If it's necessary for all custom plan providers to call fix_scan_expr
(via fix_scan_list macro), isn't it able to do it in set_plan_refs()
before calling SetCustomPlanRef()?
* What does T_CustomPlanMarkPos mean?
It's not clear to me when CustomPlanMarkPos works. Is it for a custom
plan provider which supports marking position and rewind to the
position, and ctidscan just lacks capability to do that, so it is not
used anywhere?
* Unnecessary changes in allpaths.c
some comment about Subquery and CTE are changed (perhaps) accidentally.
* Typos
* planenr -> planner, implements -> implement in create_custom_plan.sgml
* CustomScan in nodeCustom.h should be CustomPlan?
* delivered -> derived, in src/backend/optimizer/util/pathnode.c
* Document "Writing Custom Plan Provider" is not provided
Custom Plan Provider author would (and I DO!) hope documents about
writing a custom plan provider.
Regards,
2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
Hanada-san,
Thanks for your checks. I oversight the points when I submit the patch, sorry.
The attached one is revised one on documentation stuff and contrib/Makefile.Thanks,
2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
Kaigai-san,
I've just applied v1 patch, and tried build and install, but I found two issues:
1) The contrib/ctidscan is not automatically built/installed because
it's not described in contrib/Makefile. Is this expected behavior?
2) I got an error message below when building document.$ cd doc/src/sgml
$ make
openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
-d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
"SQL-CREATECUSTOMPLAN"
make: *** [HTML.index] Error 1
make: *** Deleting file `HTML.index'I'll review another part of the patch, including the design.
2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
According to the discussion upthread, I revised the custom-plan patch
to focus on regular relation scan but no join support right now, and to
support DDL command to define custom-plan providers.Planner integration with custom logic to scan a particular relation is
enough simple, unlike various join cases. It's almost similar to what
built-in logic are doing now - custom-plan provider adds a path node
with its cost estimation if it can offer alternative way to scan referenced
relation. (in case of no idea, it does not need to add any paths)A new DDL syntax I'd like to propose is below:
CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
<name> is as literal, put a unique identifier.
<class> is workload type to be offered by this custom-plan provider.
"scan" is the only option right now, that means base relation scan.
<function_name> is also as literal; it shall perform custom-plan provider.A custom-plan provider function is assumed to take an argument of
"internal" type to deliver a set of planner information that is needed to
construct custom-plan pathnode.
In case of "scan" class, pointer towards an customScanArg object
shall be delivered on invocation of custom-plan provider.typedef struct {
uint32 custom_class;
PlannerInfo *root;
RelOptInfo *baserel;
RangeTblEntry *rte;
} customScanArg;In case when the custom-plan provider function being invoked thought
it can offer an alternative scan path on the relation of "baserel", things
to do is (1) construct a CustomPath (or its inherited data type) object
with a table of callback function pointers (2) put its own cost estimation,
and (3) call add_path() to register this path as an alternative one.Once the custom-path was chosen by query planner, its CreateCustomPlan
callback is called to populate CustomPlan node based on the pathnode.
It also has a table of callback function pointers to handle various planner's
job in setrefs.c and so on.Similarly, its CreateCustomPlanState callback is called to populate
CustomPlanState node based on the plannode. It also has a table of
callback function pointers to handle various executor's job during quey
execution.Most of callback designs are not changed from the prior proposition in
v9.4 development cycle, however, here is a few changes.* CustomPlan became to inherit Scan, and CustomPlanState became to
inherit ScanState. Because some useful routines to implement scan-
logic, like ExecScan, expects state-node has ScanState as its base
type, it's more kindness for extension side. (I'd like to avoid each
extension reinvent ExecScan by copy & paste!)
I'm not sure whether it should be a union of Join in the future, however,
it is a reasonable choice to have compatible layout with Scan/ScanState
to implement alternative "scan" logic.* Exporting static functions - I still don't have a graceful answer here.
However, it is quite natural that extensions to follow up interface updates
on the future version up of PostgreSQL.
Probably, it shall become clear what class of functions shall be
exported and what class of functions shall be re-implemented within
extension side in the later discussion.
Right now, I exported minimum ones that are needed to implement
alternative scan method - contrib/ctidscan module.Items to be discussed later:
* planner integration for relations join - probably, we may define new
custom-plan classes as alternative of hash-join, merge-join and
nest-loop. If core can know this custom-plan is alternative of hash-
join, we can utilize core code to check legality of join.
* generic key-value style options in custom-plan definition - Hanada
san proposed me off-list - like foreign data wrapper. It may enable
to configure multiple behavior on a binary.
* ownership and access control of custom-plan. right now, only
superuser can create/drop custom-plan provider definition, thus,
it has no explicit ownership and access control. It seems to me
a reasonable assumption, however, we may have a usecase that
needs custom-plan by unprivileged users.Thanks,
2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We're past the prototyping stage and into productionising what we
know works, AFAIK. If that point is not clear, then we need to
discuss that first.OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
it's all speculation that certain hooks might be useful, and
speculation that's not supported by a lot of evidence. If you think
this isn't prototyping, I wonder what you think *is* prototyping.My research contacts advise me of this recent work
http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
and also that they expect a prototype to be ready by October, which I have
been told will be open source.So there are at least two groups looking at this as a serious option for
Postgres (not including the above paper's authors).That isn't *now*, but it is at least a time scale that fits with acting
on this in the next release, if we can separate out the various ideas and
agree we wish to proceed.I'll submerge again...
Through the discussion last week, our minimum consensus are:
1. Deregulated enhancement of FDW is not a way to go
2. Custom-path that can replace built-in scan makes sense as a first step
towards the future enhancement. Its planner integration is enough simple
to do.
3. Custom-path that can replace built-in join takes investigation how to
integrate existing planner structure, to avoid (3a) reinvention of
whole of join handling in extension side, and (3b) unnecessary extension
calls towards the case obviously unsupported.So, I'd like to start the (2) portion towards the upcoming 1st commit-fest
on the v9.5 development cycle. Also, we will be able to have discussion
for the (3) portion concurrently, probably, towards 2nd commit-fest.Unfortunately, I cannot participate PGcon/Ottawa this year. Please share
us the face-to-face discussion here.Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
Shigeru HANADA--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hanada-san,
Thanks for your dedicated reviewing.
It's a very long message. So, let me summarize the things
I shall do in the next patch.
* fix bug: custom-plan class comparison
* fix up naming convention and syntax
CREATE CUSTOM PLAN PROVIDER, rather than
CREATE CUSTOM PLAN. Prefix shall be "cpp_".
* fix up: definition of get_custom_plan_oid()
* fix up: unexpected white spaces, to be tabs.
* fix up: remove unnecessary forward declarations.
* fix up: revert replace_nestloop_params() to static
* make SetCustomPlanRef an optional callback
* fix up: typos in various points
* add documentation to explain custom-plan interface.
Also, I want committer's opinion about the issues below
* whether set_cheapest() is called for all relkind?
* how argument of add_path handler shall be derivered?
Individual comments are put below:
Kaigai-san,
Sorry for lagged response.
Here are my random thoughts about the patch. I couldn't understand the
patch fully, because some of APIs are not used by ctidscan. IfCustom Scan patch v2 review
* Custom plan class comparison
In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
with 's'. Do you plan to use custclass as bit field? If so, values for
custom plan class should not be a character. Otherwise, custclass should
be compared by == operator.
Sorry, it is a bug that come from the previous design.
I had an idea that allows a custom plan provider to support multiple kind
of exec nodes, however, I concluded it does not make sense so much. (we
can define multiple CPP for each)
* Purpose of GetSpecialCustomVar()
The reason why FinalizeCustomPlan callback is necessary is not clear to
me.
Could you show a case that the API would be useful?
It is needed feature to replace a built-in join by custom scan, however,
it might be unclear on the scan workloads.
Let me explain why join replacement needed. A join node has two input
slot (inner and outer), its expression node including Var node reference
either of slot according to its varno (INNER_VAR or OUTER_VAR).
In case when a CPP replaced a join, it has to generate an equivalent result
but it may not be a best choice to use two input streams.
(Please remind when we construct remote join on postgres_fdw, all the
materialization was done on remote side, thus we had one input stream to
generate local join equivalent view.)
On the other hands, EXPLAIN command has to understand what column is the
source of varnodes in targetlist of custom-node even if it is rewritten
to use just one slot. For example, which label shall be shown in case when
3rd item of targetlist is originally come from 2nd item of inner slot but
all the materialized result is stored into outer slot.
Only CPP can track its relationship between the original and the newer one.
This interface provides a way to solve a varnode that actually references.
* Purpose of FinalizeCustomPlan()
The reason why FinalizeCustomPlan callback is necessary is not clear to
me, because ctidscan just calls finalize_primnode() and
bms_add_members() with given information. Could you show a case that the
API would be useful?
The main purpose of this callback gives an extension chance to apply
finalize_primenode() if custom-node hold expression tree on its private
fields.
In case when CPP picked up a part of clauses to run its own way, it shall
be attached on neither plan->targetlist nor plan->qual, only CPP knows
where does it attached. So, these orphan expression nodes have to be
treated by CPP.
* Is it ok to call set_cheapest() for all relkind?
Now set_cheapest() is called not for only relation and foreign table but
also custom plan, and other relations such as subquery, function, and value.
Calling call_custom_scan_provider() and set_cheapest() in the case of
RTE_RELATION seems similar to the old construct, how do you think about
this?
I don't think we may be actually able to have some useful custom scan logic
on these special relation forms, however, I also didn't have a special reason
why custom-plan does not need to support these special relations.
I'd like to see committer's opinion here.
* Is it hard to get rid of CopyCustomPlan()?
Copying ForeignScan node doesn't need per-FDW copy function by limiting
fdw_private to have only copy-able objects. Can't we use the same way for
CustomPlan? Letting authors call NodeSetTag or
copyObject() sounds uncomfortable to me.This would be able to apply to TextOutCustomPlan() and TextOutCustomPath()
too.
FDW-like design was the original one, but the latest design was suggestion
by Tom on the v9.4 development cycle, because some data types are not
complianced to copyObject; like Bitmapset.
* MultiExec support is appropriate for the first version?
The cases need MultiExec seems little complex for the first version of custom
scan. What kind of plan do you image for this feature?
It is definitely necessary to exchange multiple rows with custom-format with
upper level if both of nodes are managed by same CPP.
I plan to use this interface for bulk-loading that makes much faster data
loading to GPUs.
* Does SupportBackwardScan() have enough information?
Other scans check target list with TargetListSupportsBackwardScan().
Isn't it necessary to check it for CustomPlan too in
ExecSupportsBackwardScan()?
It derivers CustomPlan node itself that includes Plan node.
If CPP thought it is necessary, it can run equivalent checks here.
* Place to call custom plan provider
Is it necessary to call provider when relkind != RELKIND_RELATION? If yes,
isn't it necessary to call for append relation?I know that we concentrate to replacing scan in the initial version, so
it would not be a serious problem, but it would be good to consider extensible
design.
Regarding of the child relation scan, set_append_rel_pathlist() calls
set_rel_pathlist() that is entry point of custom-scan paths.
If you mention about alternative-path of Append node, yes, it is not
a feature being supported in the first commit.
* Custom Plan Provider is "addpath"?
Passing addpath handler as only one attribute of CUSTOM PLAN PROVIDER seems
little odd.
Using handler like FDW makes the design too complex and/or messy?
This design allows to pass a set of information needed according to the
workload; like join not only scan. If we need to extend customXXXXArg in
the future, all we need to extend is data structure definition, not
function prototype itself.
Anyway, I'd like to make a decision for this on committer review stage.
* superclass of CustomPlanState
CustomPlanState derives ScanState, instead of deriving PlanState directly.
I worry the case of non-heap-scan custom plan, but it might be ok to postpone
consideration about that at the first cut.
We have some useful routines to implement custom-scan logic, but they takes
ScanState argument, like ExecScan().
Even though we can copy it and paste to extension code, it is not a good manner.
It takes three pointer variables in addition to PlanState. If CPP does not
take care about regular heap scan, keep them unused. It is quite helpful if
CPP implements some original logic on top of existing heap scan.
* Naming and granularity of objects related to custom plan I'm not sure
the current naming is appropriate, especially difference between "custom
plan" and "provider" and "handler". In the context of CREATE CUSTOM PLAN
statement, what the term "custom plan" means? My impression is that "custom
plan" is an alternative plan type, e.g.
ctidscan or pg_strom_scan. Then what the term "provider" means? My
impression for that is extension, such as ctidscan and pg_strom. The
grammar allows users to pass function via PROVIDER clause of CREATE CUSTOM
SCAN, so the function would be the provider of the custom plan created by
the statement.
Hmm... What you want to say is, CREATE X statement is expected to create X.
On the other hand, "custom-plan" is actually created by custom-plan provider,
not this DDL statement. The DDL statement defined custom-plan "provider".
I also think the suggestion is reasonable.
How about the statement below instead?
CREATE CUSTOM PLAN PROVIDER cpp_name FOR cpp_kind HANDLER cpp_function;
cpp_kind := SCAN (other types shall be supported later)
* enable_customscan
GUC parameter enable_customscan would be useful for users who want to
disable custom plan feature temporarily. In the case of pg_strom, using
GPU for limited sessions for analytic or batch applications seems handy.
It should be done by extension side individually.
Please imagine a user who install custom-GPU-scan, custom-matview-redirect
and custom-cache-only-scan. Purpose of each CPP are quite individually,
so I don't think enable_customscan makes sense.
* Adding pg_custom_plan catalog
Using "cust" as prefix for pg_custom_plan causes ambiguousness which makes
it difficult to choose catalog prefix for a feature named "Custom Foo" in
future. How about using "cusp" (CUStom Plan)?Or is it better to use pg_custom_plan_provider as catalog relation name,
as the document says that "CREATE CUSTOM PLAN defines custom plan provider".
Then prefix could be "cpp" (Custom Plan Provider).
This seems to match the wording used for pg_foreign_data_wrapper.
My preference "cpp" as a shorten of custom plan provider.
* CREATE CUSTOM PLAN statement
This is just a question: We need to emit CREATE CUSTOM PLAN if we want
to use I wonder how it is extended when supporting join as custom class.
In case of join, I'll extend the syntax as follows:
CREATE CUSTOM PLAN cppname
FOR [HASH JOIN|MERGE JOIN|NEST LOOP]
PROVIDER provider_func;
Like customScanArg, we will define an argument type for each join methods
then provider_func shall be called with this argument.
I think it is well flexible and extendable approach.
* New operators about TID comparison
IMO this portion should be a separated patch, because it adds OID definition
of existing operators such as tidgt and tidle. Is there any (explicit or
implicit) rule about defining macro for oid of an operator?
I don't know the general rules to define static OID definition.
Probably, these are added on demand.
* Prototype of get_custom_plan_oid()
custname (or cppname if use the rule I proposed above) seems appropriate
as the parameter name of get_custom_plan_oid() because similar funcitons
use catalog column names in such case.
I'll rename it as follows:
extern Oid get_custom_plan_provider_oid(const char *cpp_name, bool missing_ok);
* Coding conventions
Some lines are indented with white space. Future pgindent run will fix
this issue?
It's my oversight, to be fixed.
* Unnecessary struct forward declaration Forward declarations of
CustomPathMethods, Plan, and CustomPlan in includes/nodes/relation.h seem
unncecessary. Other headers might have same issue.
I'll check it. I had try & error during the development. It might leave
a dead code here.
* Unnecessary externing of replace_nestloop_params()
replace_nestloop_params() is extern-ed but it's never called outside
createplan.c.
Indeed, it's not needed until we support custom join logic.
* Externing fix_scan_expr()
If it's necessary for all custom plan providers to call fix_scan_expr (via
fix_scan_list macro), isn't it able to do it in set_plan_refs() before
calling SetCustomPlanRef()?
One alternative idea is:
if scanrelid of custom-plan is valid (scanrelid > 0) and custom-node
has no private expression tree to be fixed up, CPP can have no
SetCustomPlanRef callback. In this case, core backend applies
fix_scan_list on the targetlist and qual, then adjust scanrelid.
It was what I did in the previous revision, that was concerned by Tom
because it assumes too much things to the custom-node. (It is useful
to only custom "scan" node)
* What does T_CustomPlanMarkPos mean?
It's not clear to me when CustomPlanMarkPos works. Is it for a custom plan
provider which supports marking position and rewind to the position, and
ctidscan just lacks capability to do that, so it is not used anywhere?
Its previous design had a flag whether it allows backward scan, in the body
of CustomPlan structure. However, it makes a problem on ExecSupportsMarkRestore()
that takes only node-tag to determine whether the supplied node support
backward scan or not.
Once I tried to change ExecSupportsMarkRestore() to accept node body, then
Tom suggested to use a separated node tag instead.
* Unnecessary changes in allpaths.c
some comment about Subquery and CTE are changed (perhaps) accidentally.
No, it is intentional because set_cheapest() was consolidated.
* Typos
* planenr -> planner, implements -> implement in create_custom_plan.sgml
* CustomScan in nodeCustom.h should be CustomPlan?
* delivered -> derived, in src/backend/optimizer/util/pathnode.c
OK, I'll fix them.
* Document "Writing Custom Plan Provider" is not provided Custom Plan
Provider author would (and I DO!) hope documents about writing a custom
plan provider.
A documentation like fdwhandler.sgml, isn't it?
OK, I'll make it up.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
Hanada-san,
Thanks for your checks. I oversight the points when I submit the patch,
sorry.
The attached one is revised one on documentation stuff and
contrib/Makefile.
Thanks,
2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
Kaigai-san,
I've just applied v1 patch, and tried build and install, but I found
two issues:
1) The contrib/ctidscan is not automatically built/installed because
it's not described in contrib/Makefile. Is this expected behavior?
2) I got an error message below when building document.$ cd doc/src/sgml
$ make
openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
-d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
"SQL-CREATECUSTOMPLAN"
make: *** [HTML.index] Error 1
make: *** Deleting file `HTML.index'I'll review another part of the patch, including the design.
2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
According to the discussion upthread, I revised the custom-plan
patch to focus on regular relation scan but no join support right
now, and to support DDL command to define custom-plan providers.Planner integration with custom logic to scan a particular relation
is enough simple, unlike various join cases. It's almost similar to
what built-in logic are doing now - custom-plan provider adds a path
node with its cost estimation if it can offer alternative way to
scan referenced relation. (in case of no idea, it does not need to
add any paths)A new DDL syntax I'd like to propose is below:
CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
<name> is as literal, put a unique identifier.
<class> is workload type to be offered by this custom-plan provider.
"scan" is the only option right now, that means base relation scan.
<function_name> is also as literal; it shall perform custom-planprovider.
A custom-plan provider function is assumed to take an argument of
"internal" type to deliver a set of planner information that is
needed to construct custom-plan pathnode.
In case of "scan" class, pointer towards an customScanArg object
shall be delivered on invocation of custom-plan provider.typedef struct {
uint32 custom_class;
PlannerInfo *root;
RelOptInfo *baserel;
RangeTblEntry *rte;
} customScanArg;In case when the custom-plan provider function being invoked thought
it can offer an alternative scan path on the relation of "baserel",
things to do is (1) construct a CustomPath (or its inherited data
type) object with a table of callback function pointers (2) put its
own cost estimation, and (3) call add_path() to register this path asan alternative one.
Once the custom-path was chosen by query planner, its
CreateCustomPlan callback is called to populate CustomPlan node basedon the pathnode.
It also has a table of callback function pointers to handle various
planner's job in setrefs.c and so on.Similarly, its CreateCustomPlanState callback is called to populate
CustomPlanState node based on the plannode. It also has a table of
callback function pointers to handle various executor's job during
quey execution.Most of callback designs are not changed from the prior proposition
in
v9.4 development cycle, however, here is a few changes.* CustomPlan became to inherit Scan, and CustomPlanState became to
inherit ScanState. Because some useful routines to implement scan-
logic, like ExecScan, expects state-node has ScanState as its base
type, it's more kindness for extension side. (I'd like to avoid each
extension reinvent ExecScan by copy & paste!)
I'm not sure whether it should be a union of Join in the future,however,
it is a reasonable choice to have compatible layout with
Scan/ScanState
to implement alternative "scan" logic.
* Exporting static functions - I still don't have a graceful answer
here.
However, it is quite natural that extensions to follow up interface
updates
on the future version up of PostgreSQL.
Probably, it shall become clear what class of functions shall be
exported and what class of functions shall be re-implemented within
extension side in the later discussion.
Right now, I exported minimum ones that are needed to implement
alternative scan method - contrib/ctidscan module.Items to be discussed later:
* planner integration for relations join - probably, we may define new
custom-plan classes as alternative of hash-join, merge-join and
nest-loop. If core can know this custom-plan is alternative of hash-
join, we can utilize core code to check legality of join.
* generic key-value style options in custom-plan definition - Hanada
san proposed me off-list - like foreign data wrapper. It may enable
to configure multiple behavior on a binary.
* ownership and access control of custom-plan. right now, only
superuser can create/drop custom-plan provider definition, thus,
it has no explicit ownership and access control. It seems to me
a reasonable assumption, however, we may have a usecase that
needs custom-plan by unprivileged users.Thanks,
2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We're past the prototyping stage and into productionising what
we know works, AFAIK. If that point is not clear, then we need
to discuss that first.OK, I'll bite: what here do we know works? Not a damn thing
AFAICS; it's all speculation that certain hooks might be useful,
and speculation that's not supported by a lot of evidence. If
you think this isn't prototyping, I wonder what you think *is*prototyping.
My research contacts advise me of this recent work
http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
and also that they expect a prototype to be ready by October,
which I have been told will be open source.So there are at least two groups looking at this as a serious
option for Postgres (not including the above paper's authors).That isn't *now*, but it is at least a time scale that fits with
acting on this in the next release, if we can separate out the
various ideas and agree we wish to proceed.I'll submerge again...
Through the discussion last week, our minimum consensus are:
1. Deregulated enhancement of FDW is not a way to go 2. Custom-path
that can replace built-in scan makes sense as a first step
towards the future enhancement. Its planner integration is enoughsimple
to do.
3. Custom-path that can replace built-in join takes investigation howto
integrate existing planner structure, to avoid (3a) reinvention
of
whole of join handling in extension side, and (3b) unnecessary
extension
calls towards the case obviously unsupported.
So, I'd like to start the (2) portion towards the upcoming 1st
commit-fest on the v9.5 development cycle. Also, we will be able to
have discussion for the (3) portion concurrently, probably, towards2nd commit-fest.
Unfortunately, I cannot participate PGcon/Ottawa this year. Please
share us the face-to-face discussion here.Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
Shigeru HANADA--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Import Notes
Resolved by subject fallback
Hanada-san,
The attached patch is revised one.
Updates from the previous version are below:
* System catalog name was changed to pg_custom_plan_provider;
that reflects role of the object being defined.
* Also, prefix of its variable names are changed to "cpp"; that
means custom-plan-provider.
* Syntax also reflects what the command does more. New syntax to
define custom plan provider is:
CREATE CUSTOM PLAN PROVIDER <cpp_name>
FOR <cpp_class> HANDLER <cpp_function>;
* Add custom-plan.sgml to introduce interface functions defined
for path/plan/exec methods.
* FinalizeCustomPlan() callback was simplified to support scan
(and join in the future) at the starting point. As long as
scan/join requirement, no need to control paramids bitmap in
arbitrary way.
* Unnecessary forward declaration in relation.h and plannode.h
were removed, but a few structures still needs to have
forward declarations.
* Fix typos being pointed out.
I'd like to see committer's suggestion regarding to the design
issues below:
* whether set_cheapest() is called for all relkind?
->according to the discussion in v9.4 cycle, I consolidated
set_cheapest() in allpaths.c to set_rel_pathlist().
Hanada-san wonder whether it is necessary to have custom-
plan on none base relations; like sub-query or values-scan.
I don't have reason why not to run custom-plan on these
non usual relations.
* how argument of add_path handler shall be derivered?
-> custom-plan handler function takes an argument with
internal data type; that is a pointer of customScanArg
if custom-plan class is "scan". (It shall be
customHashJoinArg if "hash join" for example).
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Show quoted text
-----Original Message-----
From: Kaigai Kouhei(海外 浩平)
Sent: Friday, July 04, 2014 1:23 PM
To: 'Shigeru Hanada'; Kohei KaiGai
Cc: Simon Riggs; Tom Lane; Stephen Frost; Robert Haas; Andres Freund;
PgHacker; Jim Mlodgenski; Peter Eisentraut
Subject: Re: [HACKERS] [v9.5] Custom Plan APIHanada-san,
Thanks for your dedicated reviewing.
It's a very long message. So, let me summarize the things I shall do in
the next patch.* fix bug: custom-plan class comparison
* fix up naming convention and syntax
CREATE CUSTOM PLAN PROVIDER, rather than
CREATE CUSTOM PLAN. Prefix shall be "cpp_".
* fix up: definition of get_custom_plan_oid()
* fix up: unexpected white spaces, to be tabs.
* fix up: remove unnecessary forward declarations.
* fix up: revert replace_nestloop_params() to static
* make SetCustomPlanRef an optional callback
* fix up: typos in various points
* add documentation to explain custom-plan interface.Also, I want committer's opinion about the issues below
* whether set_cheapest() is called for all relkind?
* how argument of add_path handler shall be derivered?Individual comments are put below:
Kaigai-san,
Sorry for lagged response.
Here are my random thoughts about the patch. I couldn't understand
the patch fully, because some of APIs are not used by ctidscan. IfCustom Scan patch v2 review
* Custom plan class comparison
In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
with 's'. Do you plan to use custclass as bit field? If so, values
for custom plan class should not be a character. Otherwise, custclass
should be compared by == operator.Sorry, it is a bug that come from the previous design.
I had an idea that allows a custom plan provider to support multiple kind
of exec nodes, however, I concluded it does not make sense so much. (we
can define multiple CPP for each)* Purpose of GetSpecialCustomVar()
The reason why FinalizeCustomPlan callback is necessary is not clear
to me.
Could you show a case that the API would be useful?It is needed feature to replace a built-in join by custom scan, however,
it might be unclear on the scan workloads.Let me explain why join replacement needed. A join node has two input slot
(inner and outer), its expression node including Var node reference either
of slot according to its varno (INNER_VAR or OUTER_VAR).
In case when a CPP replaced a join, it has to generate an equivalent result
but it may not be a best choice to use two input streams.
(Please remind when we construct remote join on postgres_fdw, all the
materialization was done on remote side, thus we had one input stream to
generate local join equivalent view.) On the other hands, EXPLAIN command
has to understand what column is the source of varnodes in targetlist of
custom-node even if it is rewritten to use just one slot. For example, which
label shall be shown in case when 3rd item of targetlist is originally come
from 2nd item of inner slot but all the materialized result is stored into
outer slot.
Only CPP can track its relationship between the original and the newer one.
This interface provides a way to solve a varnode that actually references.* Purpose of FinalizeCustomPlan()
The reason why FinalizeCustomPlan callback is necessary is not clear
to me, because ctidscan just calls finalize_primnode() and
bms_add_members() with given information. Could you show a case that
the API would be useful?The main purpose of this callback gives an extension chance to apply
finalize_primenode() if custom-node hold expression tree on its private
fields.
In case when CPP picked up a part of clauses to run its own way, it shall
be attached on neither plan->targetlist nor plan->qual, only CPP knows where
does it attached. So, these orphan expression nodes have to be treated by
CPP.* Is it ok to call set_cheapest() for all relkind?
Now set_cheapest() is called not for only relation and foreign table
but also custom plan, and other relations such as subquery, function,and value.
Calling call_custom_scan_provider() and set_cheapest() in the case of
RTE_RELATION seems similar to the old construct, how do you think
about this?I don't think we may be actually able to have some useful custom scan logic
on these special relation forms, however, I also didn't have a special reason
why custom-plan does not need to support these special relations.
I'd like to see committer's opinion here.* Is it hard to get rid of CopyCustomPlan()?
Copying ForeignScan node doesn't need per-FDW copy function by
limiting fdw_private to have only copy-able objects. Can't we use the
same way for CustomPlan? Letting authors call NodeSetTag or
copyObject() sounds uncomfortable to me.This would be able to apply to TextOutCustomPlan() and
TextOutCustomPath() too.FDW-like design was the original one, but the latest design was suggestion
by Tom on the v9.4 development cycle, because some data types are not
complianced to copyObject; like Bitmapset.* MultiExec support is appropriate for the first version?
The cases need MultiExec seems little complex for the first version of
custom scan. What kind of plan do you image for this feature?It is definitely necessary to exchange multiple rows with custom-format
with upper level if both of nodes are managed by same CPP.
I plan to use this interface for bulk-loading that makes much faster data
loading to GPUs.* Does SupportBackwardScan() have enough information?
Other scans check target list with TargetListSupportsBackwardScan().
Isn't it necessary to check it for CustomPlan too in
ExecSupportsBackwardScan()?It derivers CustomPlan node itself that includes Plan node.
If CPP thought it is necessary, it can run equivalent checks here.* Place to call custom plan provider
Is it necessary to call provider when relkind != RELKIND_RELATION? If
yes, isn't it necessary to call for append relation?I know that we concentrate to replacing scan in the initial version,
so it would not be a serious problem, but it would be good to consider
extensible design.Regarding of the child relation scan, set_append_rel_pathlist() calls
set_rel_pathlist() that is entry point of custom-scan paths.
If you mention about alternative-path of Append node, yes, it is not a
feature being supported in the first commit.* Custom Plan Provider is "addpath"?
Passing addpath handler as only one attribute of CUSTOM PLAN PROVIDER
seems little odd.
Using handler like FDW makes the design too complex and/or messy?This design allows to pass a set of information needed according to the
workload; like join not only scan. If we need to extend customXXXXArg in
the future, all we need to extend is data structure definition, not function
prototype itself.
Anyway, I'd like to make a decision for this on committer review stage.* superclass of CustomPlanState
CustomPlanState derives ScanState, instead of deriving PlanStatedirectly.
I worry the case of non-heap-scan custom plan, but it might be ok to
postpone consideration about that at the first cut.We have some useful routines to implement custom-scan logic, but they takes
ScanState argument, like ExecScan().
Even though we can copy it and paste to extension code, it is not a good
manner.
It takes three pointer variables in addition to PlanState. If CPP does not
take care about regular heap scan, keep them unused. It is quite helpful
if CPP implements some original logic on top of existing heap scan.* Naming and granularity of objects related to custom plan I'm not
sure the current naming is appropriate, especially difference between
"custom plan" and "provider" and "handler". In the context of CREATE
CUSTOM PLAN statement, what the term "custom plan" means? My
impression is that "custom plan" is an alternative plan type, e.g.
ctidscan or pg_strom_scan. Then what the term "provider" means? My
impression for that is extension, such as ctidscan and pg_strom. The
grammar allows users to pass function via PROVIDER clause of CREATE
CUSTOM SCAN, so the function would be the provider of the custom plan
created by the statement.Hmm... What you want to say is, CREATE X statement is expected to create
X.
On the other hand, "custom-plan" is actually created by custom-plan provider,
not this DDL statement. The DDL statement defined custom-plan "provider".
I also think the suggestion is reasonable.How about the statement below instead?
CREATE CUSTOM PLAN PROVIDER cpp_name FOR cpp_kind HANDLER cpp_function;
cpp_kind := SCAN (other types shall be supported later)* enable_customscan
GUC parameter enable_customscan would be useful for users who want to
disable custom plan feature temporarily. In the case of pg_strom,
using GPU for limited sessions for analytic or batch applications seemshandy.
It should be done by extension side individually.
Please imagine a user who install custom-GPU-scan, custom-matview-redirect
and custom-cache-only-scan. Purpose of each CPP are quite individually,
so I don't think enable_customscan makes sense.* Adding pg_custom_plan catalog
Using "cust" as prefix for pg_custom_plan causes ambiguousness which
makes it difficult to choose catalog prefix for a feature named
"Custom Foo" in future. How about using "cusp" (CUStom Plan)?Or is it better to use pg_custom_plan_provider as catalog relation
name, as the document says that "CREATE CUSTOM PLAN defines custom planprovider".
Then prefix could be "cpp" (Custom Plan Provider).
This seems to match the wording used for pg_foreign_data_wrapper.My preference "cpp" as a shorten of custom plan provider.
* CREATE CUSTOM PLAN statement
This is just a question: We need to emit CREATE CUSTOM PLAN if we
want to use I wonder how it is extended when supporting join as customclass.
In case of join, I'll extend the syntax as follows:
CREATE CUSTOM PLAN cppname
FOR [HASH JOIN|MERGE JOIN|NEST LOOP]
PROVIDER provider_func;Like customScanArg, we will define an argument type for each join methods
then provider_func shall be called with this argument.
I think it is well flexible and extendable approach.* New operators about TID comparison
IMO this portion should be a separated patch, because it adds OID
definition of existing operators such as tidgt and tidle. Is there
any (explicit or
implicit) rule about defining macro for oid of an operator?I don't know the general rules to define static OID definition.
Probably, these are added on demand.* Prototype of get_custom_plan_oid()
custname (or cppname if use the rule I proposed above) seems
appropriate as the parameter name of get_custom_plan_oid() because
similar funcitons use catalog column names in such case.I'll rename it as follows:
extern Oid get_custom_plan_provider_oid(const char *cpp_name, bool
missing_ok);* Coding conventions
Some lines are indented with white space. Future pgindent run will
fix this issue?It's my oversight, to be fixed.
* Unnecessary struct forward declaration Forward declarations of
CustomPathMethods, Plan, and CustomPlan in includes/nodes/relation.h
seem unncecessary. Other headers might have same issue.I'll check it. I had try & error during the development. It might leave
a dead code here.* Unnecessary externing of replace_nestloop_params()
replace_nestloop_params() is extern-ed but it's never called outside
createplan.c.Indeed, it's not needed until we support custom join logic.
* Externing fix_scan_expr()
If it's necessary for all custom plan providers to call fix_scan_expr
(via fix_scan_list macro), isn't it able to do it in set_plan_refs()
before calling SetCustomPlanRef()?One alternative idea is:
if scanrelid of custom-plan is valid (scanrelid > 0) and custom-node
has no private expression tree to be fixed up, CPP can have no
SetCustomPlanRef callback. In this case, core backend applies
fix_scan_list on the targetlist and qual, then adjust scanrelid.It was what I did in the previous revision, that was concerned by Tom because
it assumes too much things to the custom-node. (It is useful to only custom
"scan" node)* What does T_CustomPlanMarkPos mean?
It's not clear to me when CustomPlanMarkPos works. Is it for a custom
plan provider which supports marking position and rewind to the
position, and ctidscan just lacks capability to do that, so it is notused anywhere?
Its previous design had a flag whether it allows backward scan, in the body
of CustomPlan structure. However, it makes a problem on
ExecSupportsMarkRestore() that takes only node-tag to determine whether
the supplied node support backward scan or not.
Once I tried to change ExecSupportsMarkRestore() to accept node body, then
Tom suggested to use a separated node tag instead.* Unnecessary changes in allpaths.c
some comment about Subquery and CTE are changed (perhaps) accidentally.No, it is intentional because set_cheapest() was consolidated.
* Typos
* planenr -> planner, implements -> implement increate_custom_plan.sgml
* CustomScan in nodeCustom.h should be CustomPlan?
* delivered -> derived, in src/backend/optimizer/util/pathnode.cOK, I'll fix them.
* Document "Writing Custom Plan Provider" is not provided Custom Plan
Provider author would (and I DO!) hope documents about writing a
custom plan provider.A documentation like fdwhandler.sgml, isn't it?
OK, I'll make it up.Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
Hanada-san,
Thanks for your checks. I oversight the points when I submit the
patch,sorry.
The attached one is revised one on documentation stuff and
contrib/Makefile.
Thanks,
2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
Kaigai-san,
I've just applied v1 patch, and tried build and install, but I
foundtwo issues:
1) The contrib/ctidscan is not automatically built/installed
because it's not described in contrib/Makefile. Is this expectedbehavior?
2) I got an error message below when building document.
$ cd doc/src/sgml
$ make
openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
-d stylesheet.dsl -t sgml -i output-html -V html-index
postgres.sgml
openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
"SQL-CREATECUSTOMPLAN"
make: *** [HTML.index] Error 1
make: *** Deleting file `HTML.index'I'll review another part of the patch, including the design.
2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
According to the discussion upthread, I revised the custom-plan
patch to focus on regular relation scan but no join support right
now, and to support DDL command to define custom-plan providers.Planner integration with custom logic to scan a particular
relation is enough simple, unlike various join cases. It's almost
similar to what built-in logic are doing now - custom-plan
provider adds a path node with its cost estimation if it can offer
alternative way to scan referenced relation. (in case of no idea,
it does not need to add any paths)A new DDL syntax I'd like to propose is below:
CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
<name> is as literal, put a unique identifier.
<class> is workload type to be offered by this custom-plan provider.
"scan" is the only option right now, that means base relation scan.
<function_name> is also as literal; it shall perform custom-planprovider.
A custom-plan provider function is assumed to take an argument of
"internal" type to deliver a set of planner information that is
needed to construct custom-plan pathnode.
In case of "scan" class, pointer towards an customScanArg object
shall be delivered on invocation of custom-plan provider.typedef struct {
uint32 custom_class;
PlannerInfo *root;
RelOptInfo *baserel;
RangeTblEntry *rte;
} customScanArg;In case when the custom-plan provider function being invoked
thought it can offer an alternative scan path on the relation of
"baserel", things to do is (1) construct a CustomPath (or its
inherited data
type) object with a table of callback function pointers (2) put
its own cost estimation, and (3) call add_path() to register this
path asan alternative one.
Once the custom-path was chosen by query planner, its
CreateCustomPlan callback is called to populate CustomPlan node
basedon the pathnode.
It also has a table of callback function pointers to handle
various planner's job in setrefs.c and so on.Similarly, its CreateCustomPlanState callback is called to
populate CustomPlanState node based on the plannode. It also has a
table of callback function pointers to handle various executor's
job during quey execution.Most of callback designs are not changed from the prior
proposition in
v9.4 development cycle, however, here is a few changes.* CustomPlan became to inherit Scan, and CustomPlanState became to
inherit ScanState. Because some useful routines to implement scan-
logic, like ExecScan, expects state-node has ScanState as its base
type, it's more kindness for extension side. (I'd like to avoideach
extension reinvent ExecScan by copy & paste!)
I'm not sure whether it should be a union of Join in the future,however,
it is a reasonable choice to have compatible layout with
Scan/ScanState
to implement alternative "scan" logic.
* Exporting static functions - I still don't have a graceful
answerhere.
However, it is quite natural that extensions to follow up
interfaceupdates
on the future version up of PostgreSQL.
Probably, it shall become clear what class of functions shall be
exported and what class of functions shall be re-implemented within
extension side in the later discussion.
Right now, I exported minimum ones that are needed to implement
alternative scan method - contrib/ctidscan module.Items to be discussed later:
* planner integration for relations join - probably, we may definenew
custom-plan classes as alternative of hash-join, merge-join and
nest-loop. If core can know this custom-plan is alternative of hash-
join, we can utilize core code to check legality of join.
* generic key-value style options in custom-plan definition - Hanada
san proposed me off-list - like foreign data wrapper. It may enable
to configure multiple behavior on a binary.
* ownership and access control of custom-plan. right now, only
superuser can create/drop custom-plan provider definition, thus,
it has no explicit ownership and access control. It seems to me
a reasonable assumption, however, we may have a usecase that
needs custom-plan by unprivileged users.Thanks,
2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We're past the prototyping stage and into productionising
what we know works, AFAIK. If that point is not clear, then
we need to discuss that first.OK, I'll bite: what here do we know works? Not a damn thing
AFAICS; it's all speculation that certain hooks might be
useful, and speculation that's not supported by a lot of
evidence. If you think this isn't prototyping, I wonder what
you think *is*prototyping.
My research contacts advise me of this recent work
http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
and also that they expect a prototype to be ready by October,
which I have been told will be open source.So there are at least two groups looking at this as a serious
option for Postgres (not including the above paper's authors).That isn't *now*, but it is at least a time scale that fits with
acting on this in the next release, if we can separate out the
various ideas and agree we wish to proceed.I'll submerge again...
Through the discussion last week, our minimum consensus are:
1. Deregulated enhancement of FDW is not a way to go 2.
Custom-path that can replace built-in scan makes sense as a firststep
towards the future enhancement. Its planner integration is
enoughsimple
to do.
3. Custom-path that can replace built-in join takes investigation
howto
integrate existing planner structure, to avoid (3a)
reinventionof
whole of join handling in extension side, and (3b) unnecessary
extension
calls towards the case obviously unsupported.
So, I'd like to start the (2) portion towards the upcoming 1st
commit-fest on the v9.5 development cycle. Also, we will be able
to have discussion for the (3) portion concurrently, probably,
towards2nd commit-fest.
Unfortunately, I cannot participate PGcon/Ottawa this year.
Please share us the face-to-face discussion here.Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
Shigeru HANADA--
KaiGai Kohei <kaigai@kaigai.gr.jp>--
Shigeru HANADA
Attachments:
pgsql-v9.5-custom-plan.v3.patchapplication/octet-stream; name=pgsql-v9.5-custom-plan.v3.patchDownload+3374-36
Import Notes
Resolved by subject fallback
Kaigai-san,
The v3 patch had conflict in src/backend/tcop/utility.c for newly
added IMPORT FOREIGN SCHEMA statement, but it was trivial.
2014-07-08 20:55 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
* System catalog name was changed to pg_custom_plan_provider;
that reflects role of the object being defined.
ISTM that doc/src/sgml/custom-plan.sgml should be also renamed to
custom-plan-provider.sgml.
* Also, prefix of its variable names are changed to "cpp"; that
means custom-plan-provider.
A "custclass" remains in a comment in
src/include/catalog/pg_custom_plan_provider.h.
* Syntax also reflects what the command does more. New syntax to
define custom plan provider is:
CREATE CUSTOM PLAN PROVIDER <cpp_name>
FOR <cpp_class> HANDLER <cpp_function>;
* Add custom-plan.sgml to introduce interface functions defined
for path/plan/exec methods.
* FinalizeCustomPlan() callback was simplified to support scan
(and join in the future) at the starting point. As long as
scan/join requirement, no need to control paramids bitmap in
arbitrary way.
* Unnecessary forward declaration in relation.h and plannode.h
were removed, but a few structures still needs to have
forward declarations.
* Fix typos being pointed out.
Check. I found some typos and a wording "datatype" which is not used
in any other place. Please refer the attached patch.
--
Shigeru HANADA
Attachments:
fix_typo_in_v3.patchapplication/octet-stream; name=fix_typo_in_v3.patchDownload+10-10
Hanada-san,
Thanks for your checking. The attached v4 patch is rebased one on the
latest master branch. Indeed, merge conflict was trivial.
Updates from the v3 are below:
- custom-plan.sgml was renamed to custom-plan-provider.sgml
- fix up the comments in pg_custom_plan_provider.h that mentioned
about old field name.
- applied your patch to fix up typos. (thanks so much!)
- put "costs off" on the EXPLAIN command in the regression test of
ctidscan extension.
Nothing to comment on the design and implementation from your
viewpoint any more?
Thanks,
2014-07-14 19:07 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
Kaigai-san,
The v3 patch had conflict in src/backend/tcop/utility.c for newly
added IMPORT FOREIGN SCHEMA statement, but it was trivial.2014-07-08 20:55 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
* System catalog name was changed to pg_custom_plan_provider;
that reflects role of the object being defined.ISTM that doc/src/sgml/custom-plan.sgml should be also renamed to
custom-plan-provider.sgml.* Also, prefix of its variable names are changed to "cpp"; that
means custom-plan-provider.A "custclass" remains in a comment in
src/include/catalog/pg_custom_plan_provider.h.* Syntax also reflects what the command does more. New syntax to
define custom plan provider is:
CREATE CUSTOM PLAN PROVIDER <cpp_name>
FOR <cpp_class> HANDLER <cpp_function>;
* Add custom-plan.sgml to introduce interface functions defined
for path/plan/exec methods.
* FinalizeCustomPlan() callback was simplified to support scan
(and join in the future) at the starting point. As long as
scan/join requirement, no need to control paramids bitmap in
arbitrary way.
* Unnecessary forward declaration in relation.h and plannode.h
were removed, but a few structures still needs to have
forward declarations.
* Fix typos being pointed out.Check. I found some typos and a wording "datatype" which is not used
in any other place. Please refer the attached patch.--
Shigeru HANADA
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.5-custom-plan.v4.patchapplication/octet-stream; name=pgsql-v9.5-custom-plan.v4.patchDownload+3374-36
Kaigai-san,
2014-07-14 22:18 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
Hanada-san,
Thanks for your checking. The attached v4 patch is rebased one on the
latest master branch. Indeed, merge conflict was trivial.Updates from the v3 are below:
- custom-plan.sgml was renamed to custom-plan-provider.sgml
- fix up the comments in pg_custom_plan_provider.h that mentioned
about old field name.
- applied your patch to fix up typos. (thanks so much!)
- put "costs off" on the EXPLAIN command in the regression test of
ctidscan extension.
Checked, but the patch fails sanity-check test, you need to modify
expected file of the test.
Nothing to comment on the design and implementation from your
viewpoint any more?
As much as I can tell, the design seems reasonable. After fix for the
small issue above, I'll move the patch status to "Ready for
committer".
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2014-07-14 22:18 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
Hanada-san,
Thanks for your checking. The attached v4 patch is rebased one on the
latest master branch. Indeed, merge conflict was trivial.Updates from the v3 are below:
- custom-plan.sgml was renamed to custom-plan-provider.sgml
- fix up the comments in pg_custom_plan_provider.h that mentioned
about old field name.
- applied your patch to fix up typos. (thanks so much!)
- put "costs off" on the EXPLAIN command in the regression test of
ctidscan extension.Checked, but the patch fails sanity-check test, you need to modify expected
file of the test.
Sorry, expected result of sanity-check test was not updated on
renaming to pg_custom_plan_provider.
The attached patch fixed up this point.
Nothing to comment on the design and implementation from your
viewpoint any more?As much as I can tell, the design seems reasonable. After fix for the small
issue above, I'll move the patch status to "Ready for committer".
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachments:
pgsql-v9.5-custom-plan.v5.patchapplication/octet-stream; name=pgsql-v9.5-custom-plan.v5.patchDownload+3374-36
Kaigai-san,
2014-07-15 21:37 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Sorry, expected result of sanity-check test was not updated on
renaming to pg_custom_plan_provider.
The attached patch fixed up this point.
I confirmed that all regression tests passed, so I marked the patch as
"Ready for committer".
--
Shigeru HANADA
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I haven't followed this at all, but I just skimmed over it and noticed
the CustomPlanMarkPos thingy; apologies if this has been discussed
before. It seems a bit odd to me; why isn't it sufficient to have a
boolean flag in regular CustomPlan to indicate that it supports
mark/restore?
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2014-07-16 10:43:08 +0900, Shigeru Hanada wrote:
Kaigai-san,
2014-07-15 21:37 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Sorry, expected result of sanity-check test was not updated on
renaming to pg_custom_plan_provider.
The attached patch fixed up this point.I confirmed that all regression tests passed, so I marked the patch as
"Ready for committer".
I personally don't see how this patch is 'ready for committer'. I
realize that that state is sometimes used to denote that review needs to
be "escalated", but it still seemspremature.
Unless I miss something there hasn't been any API level review of this?
Also, aren't there several open items?
Perhaps there needs to be a stage between 'needs review' and 'ready for
committer'?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I haven't followed this at all, but I just skimmed over it and noticed
the CustomPlanMarkPos thingy; apologies if this has been discussed
before. It seems a bit odd to me; why isn't it sufficient to have a
boolean flag in regular CustomPlan to indicate that it supports
mark/restore?
Yeah, I thought that was pretty bogus too, but it's well down the
list of issues that were there last time I looked at this ...
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I personally don't see how this patch is 'ready for committer'. I realize
that that state is sometimes used to denote that review needs to be
"escalated", but it still seemspremature.Unless I miss something there hasn't been any API level review of this?
Also, aren't there several open items?
Even though some interface specifications are revised according to the
comment from Tom on the last development cycle, the current set of
interfaces are not reviewed by committers. I really want this.
Here are two open items that we want to wait for committers comments.
* Whether set_cheapest() is called for all relkind?
This pactch moved set_cheapest() to the end of set_rel_pathlist(),
to consolidate entrypoint of custom-plan-provider handler function.
It also implies CPP can provider alternative paths towards non-regular
relations (like sub-queries, functions, ...).
Hanada-san wonder whether we really have a case to run alternative
sub-query code. Even though I don't have usecases for alternative
sub-query execution logic, but we also don't have a reason why not
to restrict it.
* How argument of add_path handler shall be derivered?
The handler function (that adds custom-path to the required relation
scan if it can provide) is declared with an argument with INTERNAL
data type. Extension needs to have type-cast on the supplied pointer
to customScanArg data-type (or potentially customHashJoinArg and
so on...) according to the custom plan class.
I think it is well extendable design than strict argument definitions,
but Hanada-san wonder whether it is the best design.
Perhaps there needs to be a stage between 'needs review' and 'ready for
committer'?
It needs clarification of 'ready for committer'. I think interface
specification is a kind of task to be discussed with committers
because preference/viewpoint of rr-reviewer are not always same
opinion with them.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
-----Original Message-----
From: Andres Freund [mailto:andres@2ndquadrant.com]
Sent: Friday, July 18, 2014 3:12 AM
To: Shigeru Hanada
Cc: Kaigai Kouhei(海外 浩平); Kohei KaiGai; Simon Riggs; Tom Lane; Stephen
Frost; Robert Haas; PgHacker; Jim Mlodgenski; Peter Eisentraut
Subject: Re: [HACKERS] [v9.5] Custom Plan APIOn 2014-07-16 10:43:08 +0900, Shigeru Hanada wrote:
Kaigai-san,
2014-07-15 21:37 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Sorry, expected result of sanity-check test was not updated on
renaming to pg_custom_plan_provider.
The attached patch fixed up this point.I confirmed that all regression tests passed, so I marked the patch as
"Ready for committer".I personally don't see how this patch is 'ready for committer'. I realize
that that state is sometimes used to denote that review needs to be
"escalated", but it still seemspremature.Unless I miss something there hasn't been any API level review of this?
Also, aren't there several open items?Perhaps there needs to be a stage between 'needs review' and 'ready for
committer'?Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I haven't followed this at all, but I just skimmed over it and noticed
the CustomPlanMarkPos thingy; apologies if this has been discussed
before. It seems a bit odd to me; why isn't it sufficient to have a
boolean flag in regular CustomPlan to indicate that it supports
mark/restore?Yeah, I thought that was pretty bogus too, but it's well down the list of
issues that were there last time I looked at this ...
IIRC, CustomPlanMarkPos was suggested to keep the interface of
ExecSupportsMarkRestore() that takes plannode tag to determine
whether it support Mark/Restore.
As my original proposition did, it seems to me a flag field in
CustomPlan structure is straightforward, if we don't hesitate to
change ExecSupportsMarkRestore().
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2014-07-18 10:28 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I haven't followed this at all, but I just skimmed over it and noticed
the CustomPlanMarkPos thingy; apologies if this has been discussed
before. It seems a bit odd to me; why isn't it sufficient to have a
boolean flag in regular CustomPlan to indicate that it supports
mark/restore?Yeah, I thought that was pretty bogus too, but it's well down the list of
issues that were there last time I looked at this ...IIRC, CustomPlanMarkPos was suggested to keep the interface of
ExecSupportsMarkRestore() that takes plannode tag to determine
whether it support Mark/Restore.
As my original proposition did, it seems to me a flag field in
CustomPlan structure is straightforward, if we don't hesitate to
change ExecSupportsMarkRestore().
The attached patch revised the above point.
It eliminates CustomPlanMarkPos, and adds flags field on CustomXXX
structure to inform the backend whether the custom plan provider can
support mark-restore position and backward scan.
This change requires ExecSupportsMarkRestore() to reference
contents of Path node, not only node-tag, so its declaration was also
changed to take a pointer to Path node.
The only caller of this function is final_cost_mergejoin() right now.
It just gives pathtype field of Path node on its invocation, so this change
does not lead significant degradation.
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.5-custom-plan.v6.patchapplication/octet-stream; name=pgsql-v9.5-custom-plan.v6.patchDownload+3360-40
The attached patch is the rebased custom-plan api, without any
functional changes
from the latest version; that added a flag field to custom-plan node
to show whether
it support mark/restore or backward-scan.
Towards the upcoming commit-fest, let me summarize the brief overview
of this patch.
The purpose of custom-plan interface, implemented with this patch, is to allow
extensions to provide alternative way to scan (and potentially joining
and so on)
relation, in addition to the built-in logic.
If one or more extensions are installed as custom-plan provider, it
can tell the planner
alternative way to scan a relation using CustomPath node with cost estimation.
As usual manner, the planner will chose a particular path based on the cost.
If custom one would not be chosen, it's gone and nothing different.
Once a custom-plan gets chosen, the custom-plan provider that performs on behalf
of the custom-plan node shall be invoked during query execution. It is
responsible to
scan the relation by its own way.
One expected usage of this interface is GPU acceleration that I'm working also.
The custom-plan provider shall be invoked via the function being
installed as custom-
plan provider, with an argument that packs all the necessary
information to construct
a custom-path node. In case of relation scan, customScanArg that contains
PlannerInfo, RelOptInfo and RangeTblEntry shall be informed.
The function is registered using a new command:
CREATE CUSTOM PLAN PROVIDER <name>
FOR SCAN HANDLER <handler_funtion>;
According to the discussion before, CustomXXX node is designed to have private
fields of extension like a manner of object oriented language.
CustomXXX node has a few common and minimum required fields, but no private
pointer. Extension declares its own Path/Plan/PlanState structure that inherits
CustomXXX node on the head of structure declaration, but not all. It can have
private fields on the later half of the structure.
The contrib/ctidscan is a good example to see how extension can utilize the
interface.
Once a CustomPlan/PlanState node is constructed, the rest of processes are
what other executor-nodes are doing. It shall be invoked at beginning, ending
and running of the executor, then callback function in the table of function
pointers shall be called.
Thanks,
2014-07-23 10:47 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
2014-07-18 10:28 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I haven't followed this at all, but I just skimmed over it and noticed
the CustomPlanMarkPos thingy; apologies if this has been discussed
before. It seems a bit odd to me; why isn't it sufficient to have a
boolean flag in regular CustomPlan to indicate that it supports
mark/restore?Yeah, I thought that was pretty bogus too, but it's well down the list of
issues that were there last time I looked at this ...IIRC, CustomPlanMarkPos was suggested to keep the interface of
ExecSupportsMarkRestore() that takes plannode tag to determine
whether it support Mark/Restore.
As my original proposition did, it seems to me a flag field in
CustomPlan structure is straightforward, if we don't hesitate to
change ExecSupportsMarkRestore().The attached patch revised the above point.
It eliminates CustomPlanMarkPos, and adds flags field on CustomXXX
structure to inform the backend whether the custom plan provider can
support mark-restore position and backward scan.
This change requires ExecSupportsMarkRestore() to reference
contents of Path node, not only node-tag, so its declaration was also
changed to take a pointer to Path node.
The only caller of this function is final_cost_mergejoin() right now.
It just gives pathtype field of Path node on its invocation, so this change
does not lead significant degradation.Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
pgsql-v9.5-custom-plan.v7.patchapplication/octet-stream; name=pgsql-v9.5-custom-plan.v7.patchDownload+3374-36
On Thu, Jul 17, 2014 at 3:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I haven't followed this at all, but I just skimmed over it and noticed
the CustomPlanMarkPos thingy; apologies if this has been discussed
before. It seems a bit odd to me; why isn't it sufficient to have a
boolean flag in regular CustomPlan to indicate that it supports
mark/restore?Yeah, I thought that was pretty bogus too, but it's well down the
list of issues that were there last time I looked at this ...
I think the threshold question for this incarnation of the patch is
whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as
a way of installing new plan providers into the database. If we are,
then I can go ahead and enumerate a long list of things that will need
to be fixed to make that code acceptable (such as adding pg_dump
support). But if we're not, there's no point in spending any time on
that part of the patch.
I can see a couple of good reasons to think that this approach might
be reasonable:
- In some ways, a custom plan provider (really, at this point, a
custom scan provider) is very similar to a foreign data wrapper. To
the guts of PostgreSQL, an FDW is a sort of black box that knows how
to scan some data not managed by PostgreSQL. A custom plan provider
is similar, except that it scans data that *is* managed by PostgreSQL.
- There's also some passing resemblance between a custom plan provider
and an access method. Access methods provide a set of APIs for fast
access to data via an index, while custom plan providers provide an
API for fast access to data via magic that the core system doesn't
(and need not) understand. While access methods don't have associated
SQL syntax, they do include catalog structure, so perhaps this should
too, by analogy.
All that having been said, I'm having a hard time mustering up
enthusiasm for this way of doing things. As currently constituted,
the pg_custom_plan_provider catalog contains only a name, a "class"
that is always 's' for scan, and a handler function OID. Quite
frankly, that's a whole lot of nothing. If we got rid of the
pg_catalog structure and just had something like
RegisterCustomPlanProvider(char *name, void (*)(customScanArg *),
which could be invoked from _PG_init(), hundreds and hundreds of lines
of code could go away and we wouldn't lose any actual functionality;
you'd just list your custom plan providers in shared_preload_libraries
or local_preload_libraries instead of listing them in a system
catalog. In fact, you might even have more functionality, because you
could load providers into particular sessions rather than system-wide,
which isn't possible with this design.
I think the underlying issue here really has to do with when custom
plan providers get invoked - what triggers that? For foreign data
wrappers, we have some relations that are plain tables (relkind = 'r')
and no foreign data wrapper code is invoked. We have others that are
flagged as foreign tables (relkind = 'f') and for those we look up the
matching FDW (via ftserver) and run the code. Similarly, for an index
AM, we notice that the relation is an index (relkind = 'r') and then
consult relam to figure out which index AM we should invoke. But as
KaiGai is conceiving this feature, it's quite different. Rather than
applying only to particular relations, and being mutually exclusive
with other options that might apply to those relations, it applies to
*everything* in the database in addition to whatever other options may
be present. The included ctidscan implementation is just as good an
example as PG-Strom: you inspect the query and see, based on the
operators present, whether there's any hope of accelerating things.
In other words, there's no user configuration - and also, not
irrelevantly, no persistent on-disk state the way you have for an
index, or even an FDW, which has on disk state to the extent that
there have to be catalog entries tying a particular FDW to a
particular table.
A lot of the previous discussion of this topic revolves around the
question of whether we can unify the use case that this patch is
targeting with other things - e.g. Citus's desire to store its data
files within the data directory while retaining control over data
access, thus not a perfect fit for FDWs; the desire to push joins down
to foreign servers; more generally, the desire to replace a join with
a custom plan that may or may not use access paths for the underlying
relations as subpaths. I confess I'm not seeing a whole lot of
commonality with anything other than the custom-join-path idea, which
probably shares many of what I believe to be the relevant
characteristics of a custom scan as conceived by KaiGai: namely, that
all of the decisions about whether to inject a custom path in
particular circumstances are left up to the provider itself based on
inspection of the specific query, rather than being a result of user
configuration.
So, I'm tentatively in favor of stripping the DDL support out of this
patch and otherwise trying to boil it down to something that's really
minimal, but I'd like to hear what others think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
I think the threshold question for this incarnation of the patch is
whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as
a way of installing new plan providers into the database.
I tend to agree with your conclusion that that's a whole lot of
infrastructure with very little return. I don't see anything here
we shouldn't do via function hooks instead, and/or a "register" callback
from a dynamically loaded library.
Also, we tend to think (for good reason) that once something is embedded
at the SQL level it's frozen; we are much more willing to redesign C-level
APIs. There is no possible way that it's a good idea for this stuff to
get frozen in its first iteration.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers