add_partial_path() may remove dominated path but still in use
Hello,
I've investigated a crash report of PG-Strom for a few days, then I doubt
add_partial_path() can unexpectedly release dominated old partial path
but still referenced by other Gather node, and it leads unexpected system
crash.
Please check at the gpuscan.c:373
https://github.com/heterodb/pg-strom/blob/master/src/gpuscan.c#L373
The create_gpuscan_path() constructs a partial custom-path, then it is
added to the partial_pathlist of the baserel.
If both of old and new partial paths have no pathkeys and old-path has
larger cost, add_partial_path detaches the old path from the list, then
calls pfree() to release old_path itself.
On the other hands, the baserel may have GatherPath which references
the partial-path on its pathlist. Here is no check whether the old partial-
paths are referenced by others, or not.
To ensure my assumption, I tried to inject elog() before/after the call of
add_partial_path() and just before the pfree(old_path) in add_partial_path().
----------------------------------------------------------------
dbt3=# explain select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < cast(date '1994-01-01' + interval '1 year' as date)
and l_discount between 0.08 - 0.01 and 0.08 + 0.01
and l_quantity < 24 limit 1;
INFO: GpuScan:389 GATHER(0x28f3c30), SUBPATH(0x28f3f88): {GATHERPATH
:pathtype 44 :parent_relids (b 1) :required_outer (b) :parallel_aware
false :parallel_safe false :parallel_workers 0 :rows 151810
:startup_cost 1000.00 :total_cost 341760.73 :pathkeys <> :subpath
{PATH :pathtype 18 :parent_relids (b 1) :required_outer (b)
:parallel_aware true :parallel_safe true :parallel_workers 2 :rows
63254 :startup_cost 0.00 :total_cost 325579.73 :pathkeys <>}
:single_copy false :num_workers 2}
INFO: add_partial_path:830 old_path(0x28f3f88) is removed
WARNING: could not dump unrecognized node type: 2139062143
INFO: GpuScan:401 GATHER(0x28f3c30), SUBPATH(0x28f3f88): {GATHERPATH
:pathtype 44 :parent_relids (b 1) :required_outer (b) :parallel_aware
false :parallel_safe false :parallel_workers 0 :rows 151810
:startup_cost 1000.00 :total_cost 341760.73 :pathkeys <> :subpath
{(HOGE)} :single_copy false :num_workers 2}
----------------------------------------------------------------
At the L389, GatherPath in the baresel->pathlist is healthy. Its
subpath (0x28f3f88) is
a valid T_Scan path node.
Then, gpuscan.c adds a cheaper path-node so add_partial_path()
considers the above
subpath (0x28f3f88) is dominated by the new custom-path, and release it.
So, elog() at L401 says subpath has unrecognized node type: 2139062143
== 0x7f7f7f7f
that implies the memory region was already released by pfree().
Reference counter or other mechanism to tack referenced paths may be an idea
to avoid unintentional release of path-node.
On the other hands, it seems to me the pfree() at add_path /
add_partial_path is not
a serious memory management because other objects referenced by the path-node
are not released here.
It is sufficient if we detach dominated path-node from the pathlist /
partial_pathlist.
How about your opinions?
Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
Kohei KaiGai <kaigai@heterodb.com> writes:
I've investigated a crash report of PG-Strom for a few days, then I doubt
add_partial_path() can unexpectedly release dominated old partial path
but still referenced by other Gather node, and it leads unexpected system
crash.
Hm. This seems comparable to the special case in plain add_path, where it
doesn't attempt to free IndexPaths because of the risk that they're still
referenced. So maybe we should just drop the pfree here.
However, first I'd like to know why this situation is arising in the first
place. To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for. Why is that a good thing to do? It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.
regards, tom lane
2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
Kohei KaiGai <kaigai@heterodb.com> writes:
I've investigated a crash report of PG-Strom for a few days, then I doubt
add_partial_path() can unexpectedly release dominated old partial path
but still referenced by other Gather node, and it leads unexpected system
crash.Hm. This seems comparable to the special case in plain add_path, where it
doesn't attempt to free IndexPaths because of the risk that they're still
referenced. So maybe we should just drop the pfree here.However, first I'd like to know why this situation is arising in the first
place. To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for. Why is that a good thing to do? It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.
Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths(). Even if extension adds some partial paths later,
the first generate_gather_paths() has to generate Gather node based on
incomplete information.
If we could ensure Gather node shall be made after all the partial nodes
are added, it may be a solution for the problem.
Of course, relocation of the hook may have a side-effect. Anyone may
expect the pathlist contains all the path-node including Gather node, for
editorialization of the pathlist.
Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
Kohei KaiGai <kaigai@heterodb.com> writes:
2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
However, first I'd like to know why this situation is arising in the first
place. To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for. Why is that a good thing to do? It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.
Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths().
Hmm. I'm inclined to think that we should have a separate hook
in which extensions are allowed to add partial paths, and that
set_rel_pathlist_hook should only be allowed to add regular paths.
regards, tom lane
2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
Kohei KaiGai <kaigai@heterodb.com> writes:
2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
However, first I'd like to know why this situation is arising in the first
place. To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for. Why is that a good thing to do? It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths().Hmm. I'm inclined to think that we should have a separate hook
in which extensions are allowed to add partial paths, and that
set_rel_pathlist_hook should only be allowed to add regular paths.
I have almost same opinion, but the first hook does not need to be
dedicated for partial paths. As like set_foreign_pathlist() doing, we can
add both of partial and regular paths here, then generate_gather_paths()
may generate a Gather-path on top of the best partial-path.
On the other hands, the later hook must be dedicated to add regular paths,
and also provides a chance for extensions to manipulate pre-built path-list
including Gather-path.
As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
a particular path-node, including Gather-node, by manipulation of the cost
value. Horiguchi-san, is it right?
Likely, this kind of extension needs to use the later hook.
I expect these hooks are located as follows:
set_rel_pathlist(...)
{
:
<snip>
:
/* for partial / regular paths */
if (set_rel_pathlist_hook)
(*set_rel_pathlist_hook) (root, rel, rti, rte);
/* generate Gather-node */
if (rel->reloptkind == RELOPT_BASEREL)
generate_gather_paths(root, rel);
/* for regular paths and manipulation */
if (post_rel_pathlist_hook)
(*post_rel_pathlist_hook) (root, rel, rti, rte);
set_cheapest();
}
Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
Kohei KaiGai <kaigai@heterodb.com> writes:
2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
However, first I'd like to know why this situation is arising in the first
place. To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for. Why is that a good thing to do? It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths().Hmm. I'm inclined to think that we should have a separate hook
in which extensions are allowed to add partial paths, and that
set_rel_pathlist_hook should only be allowed to add regular paths.
+1. This idea sounds sensible to me.
I have almost same opinion, but the first hook does not need to be
dedicated for partial paths. As like set_foreign_pathlist() doing, we can
add both of partial and regular paths here, then generate_gather_paths()
may generate a Gather-path on top of the best partial-path.
Won't it be confusing for users if we allow both partial and full
paths in first hook and only full paths in the second hook?
Basically, in many cases, the second hook won't be of much use. What
advantage you are seeing in allowing both partial and full paths in
the first hook?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
2018年12月31日(月) 13:10 Amit Kapila <amit.kapila16@gmail.com>:
On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
Kohei KaiGai <kaigai@heterodb.com> writes:
2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
However, first I'd like to know why this situation is arising in the first
place. To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for. Why is that a good thing to do? It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths().Hmm. I'm inclined to think that we should have a separate hook
in which extensions are allowed to add partial paths, and that
set_rel_pathlist_hook should only be allowed to add regular paths.+1. This idea sounds sensible to me.
I have almost same opinion, but the first hook does not need to be
dedicated for partial paths. As like set_foreign_pathlist() doing, we can
add both of partial and regular paths here, then generate_gather_paths()
may generate a Gather-path on top of the best partial-path.Won't it be confusing for users if we allow both partial and full
paths in first hook and only full paths in the second hook?
Basically, in many cases, the second hook won't be of much use. What
advantage you are seeing in allowing both partial and full paths in
the first hook?
Two advantages. The first one is, it follows same manner of
set_foreign_pathlist()
which allows to add both of full and partial path if FDW supports parallel-scan.
The second one is practical. During the path construction, extension needs to
check availability to run (e.g, whether operators in WHERE-clause is supported
on GPU device...), calculate its estimated cost and so on. Not a small
portion of
them are common for both of full and partial path. So, if the first
hook accepts to
add both kind of paths at once, extension can share the common properties.
Probably, the second hook is only used for a few corner case where an extension
wants to manipulate path-list already built, like pg_hint_plan.
Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
2018年12月31日(月) 13:10 Amit Kapila <amit.kapila16@gmail.com>:
On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
Kohei KaiGai <kaigai@heterodb.com> writes:
2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
However, first I'd like to know why this situation is arising in the first
place. To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for. Why is that a good thing to do? It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths().Hmm. I'm inclined to think that we should have a separate hook
in which extensions are allowed to add partial paths, and that
set_rel_pathlist_hook should only be allowed to add regular paths.+1. This idea sounds sensible to me.
I have almost same opinion, but the first hook does not need to be
dedicated for partial paths. As like set_foreign_pathlist() doing, we can
add both of partial and regular paths here, then generate_gather_paths()
may generate a Gather-path on top of the best partial-path.Won't it be confusing for users if we allow both partial and full
paths in first hook and only full paths in the second hook?
Basically, in many cases, the second hook won't be of much use. What
advantage you are seeing in allowing both partial and full paths in
the first hook?Two advantages. The first one is, it follows same manner of
set_foreign_pathlist()
which allows to add both of full and partial path if FDW supports parallel-scan.
The second one is practical. During the path construction, extension needs to
check availability to run (e.g, whether operators in WHERE-clause is supported
on GPU device...), calculate its estimated cost and so on. Not a small
portion of
them are common for both of full and partial path. So, if the first
hook accepts to
add both kind of paths at once, extension can share the common properties.
You have a point, though I am not sure how much difference it can
create for cost computation as ideally, both will have different
costing model. I understand there are some savings by avoiding some
common work, is there any way to cache the required information?
Probably, the second hook is only used for a few corner case where an extension
wants to manipulate path-list already built, like pg_hint_plan.
Okay, but it could be some work for extension authors who are using
the current hook, not sure they would like to divide the work between
first and second hook.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
2018年12月31日(月) 22:25 Amit Kapila <amit.kapila16@gmail.com>:
On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
2018年12月31日(月) 13:10 Amit Kapila <amit.kapila16@gmail.com>:
On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
Kohei KaiGai <kaigai@heterodb.com> writes:
2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
However, first I'd like to know why this situation is arising in the first
place. To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for. Why is that a good thing to do? It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths().Hmm. I'm inclined to think that we should have a separate hook
in which extensions are allowed to add partial paths, and that
set_rel_pathlist_hook should only be allowed to add regular paths.+1. This idea sounds sensible to me.
I have almost same opinion, but the first hook does not need to be
dedicated for partial paths. As like set_foreign_pathlist() doing, we can
add both of partial and regular paths here, then generate_gather_paths()
may generate a Gather-path on top of the best partial-path.Won't it be confusing for users if we allow both partial and full
paths in first hook and only full paths in the second hook?
Basically, in many cases, the second hook won't be of much use. What
advantage you are seeing in allowing both partial and full paths in
the first hook?Two advantages. The first one is, it follows same manner of
set_foreign_pathlist()
which allows to add both of full and partial path if FDW supports parallel-scan.
The second one is practical. During the path construction, extension needs to
check availability to run (e.g, whether operators in WHERE-clause is supported
on GPU device...), calculate its estimated cost and so on. Not a small
portion of
them are common for both of full and partial path. So, if the first
hook accepts to
add both kind of paths at once, extension can share the common properties.You have a point, though I am not sure how much difference it can
create for cost computation as ideally, both will have different
costing model. I understand there are some savings by avoiding some
common work, is there any way to cache the required information?
I have no idea for the clean way.
We may be able to have an opaque pointer for extension usage, however,
it may be problematic if multiple extension uses the hook.
Probably, the second hook is only used for a few corner case where an extension
wants to manipulate path-list already built, like pg_hint_plan.Okay, but it could be some work for extension authors who are using
the current hook, not sure they would like to divide the work between
first and second hook.
I guess they don't divide their code, but choose either of them.
In case of PG-Strom, even if there are two hooks around the point, it will use
the first hook only, unless it does not prohibit to call add_path() here.
However, some adjustments are required. Its current implementation makes
GatherPath node with partial CustomScanPath because set_rel_pathlist_hook()
is called after the generate_gather_paths().
Once we could choose the first hook, no need to make a GatherPath by itself,
because PostgreSQL-core will make the path if partial custom-path is enough
reasonable cost. Likely, this adjustment is more preferable one.
Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
I tried to make a patch to have dual hooks at set_rel_pathlist(), and
adjusted PG-Strom for the new design. It stopped to create GatherPath
by itself, just added a partial path for the base relation.
It successfully made a plan using parallel custom-scan node, without
system crash.
As I mentioned above, it does not use the new "post_rel_pathlist_hook"
because we can add both of partial/regular path-node at the first hook
with no particular problems.
Thanks,
dbt3=# explain select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < cast(date '1994-01-01' + interval '1 year' as date)
and l_discount between 0.08 - 0.01 and 0.08 + 0.01
and l_quantity < 24 limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Limit (cost=144332.62..144332.63 rows=1 width=4)
-> Aggregate (cost=144332.62..144332.63 rows=1 width=4)
-> Gather (cost=144285.70..144329.56 rows=408 width=4)
Workers Planned: 2
-> Parallel Custom Scan (GpuPreAgg) on lineitem
(cost=143285.70..143288.76 rows=204 width=4)
Reduction: NoGroup
Outer Scan: lineitem (cost=1666.67..143246.16
rows=63254 width=8)
Outer Scan Filter: ((l_discount >= '0.07'::double
precision) AND
(l_discount <=
'0.09'::double precision) AND
(l_quantity <
'24'::double precision) AND
(l_shipdate <
'1995-01-01'::date) AND
(l_shipdate >=
'1994-01-01'::date))
(8 rows)
Thanks,
2019年1月2日(水) 22:34 Kohei KaiGai <kaigai@heterodb.com>:
2018年12月31日(月) 22:25 Amit Kapila <amit.kapila16@gmail.com>:
On Mon, Dec 31, 2018 at 5:48 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
2018年12月31日(月) 13:10 Amit Kapila <amit.kapila16@gmail.com>:
On Sun, Dec 30, 2018 at 9:01 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
Kohei KaiGai <kaigai@heterodb.com> writes:
2018年12月29日(土) 1:44 Tom Lane <tgl@sss.pgh.pa.us>:
However, first I'd like to know why this situation is arising in the first
place. To have the situation you're describing, we'd have to have
attempted to make some Gather paths before we have all the partial paths
for the relation they're for. Why is that a good thing to do? It seems
like such Gathers are necessarily being made with incomplete information,
and we'd be better off to fix things so that none are made till later.Because of the hook location, Gather-node shall be constructed with built-in
and foreign partial scan node first, then extension gets a chance to add its
custom paths (partial and full).
At the set_rel_pathlist(), set_rel_pathlist_hook() is invoked next to the
generate_gather_paths().Hmm. I'm inclined to think that we should have a separate hook
in which extensions are allowed to add partial paths, and that
set_rel_pathlist_hook should only be allowed to add regular paths.+1. This idea sounds sensible to me.
I have almost same opinion, but the first hook does not need to be
dedicated for partial paths. As like set_foreign_pathlist() doing, we can
add both of partial and regular paths here, then generate_gather_paths()
may generate a Gather-path on top of the best partial-path.Won't it be confusing for users if we allow both partial and full
paths in first hook and only full paths in the second hook?
Basically, in many cases, the second hook won't be of much use. What
advantage you are seeing in allowing both partial and full paths in
the first hook?Two advantages. The first one is, it follows same manner of
set_foreign_pathlist()
which allows to add both of full and partial path if FDW supports parallel-scan.
The second one is practical. During the path construction, extension needs to
check availability to run (e.g, whether operators in WHERE-clause is supported
on GPU device...), calculate its estimated cost and so on. Not a small
portion of
them are common for both of full and partial path. So, if the first
hook accepts to
add both kind of paths at once, extension can share the common properties.You have a point, though I am not sure how much difference it can
create for cost computation as ideally, both will have different
costing model. I understand there are some savings by avoiding some
common work, is there any way to cache the required information?I have no idea for the clean way.
We may be able to have an opaque pointer for extension usage, however,
it may be problematic if multiple extension uses the hook.Probably, the second hook is only used for a few corner case where an extension
wants to manipulate path-list already built, like pg_hint_plan.Okay, but it could be some work for extension authors who are using
the current hook, not sure they would like to divide the work between
first and second hook.I guess they don't divide their code, but choose either of them.
In case of PG-Strom, even if there are two hooks around the point, it will use
the first hook only, unless it does not prohibit to call add_path() here.
However, some adjustments are required. Its current implementation makes
GatherPath node with partial CustomScanPath because set_rel_pathlist_hook()
is called after the generate_gather_paths().
Once we could choose the first hook, no need to make a GatherPath by itself,
because PostgreSQL-core will make the path if partial custom-path is enough
reasonable cost. Likely, this adjustment is more preferable one.Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
Attachments:
pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patchapplication/octet-stream; name=pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patchDownload
src/backend/optimizer/path/allpaths.c | 22 +++++++++++++++++-----
src/include/optimizer/paths.h | 1 +
2 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d9a2f9b..100274e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -62,6 +62,7 @@ int min_parallel_index_scan_size;
/* Hook for plugins to get control in set_rel_pathlist() */
set_rel_pathlist_hook_type set_rel_pathlist_hook = NULL;
+set_rel_pathlist_hook_type post_rel_pathlist_hook = NULL;
/* Hook for plugins to replace standard_join_search() */
join_search_hook_type join_search_hook = NULL;
@@ -479,6 +480,14 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
}
/*
+ * Allow a plugin to editorialize on the set of Paths for this base
+ * relation. It could add new regular paths (such as CustomPaths) by
+ * calling add_path(), partial paths by add_partial_path().
+ */
+ if (set_rel_pathlist_hook)
+ (*set_rel_pathlist_hook) (root, rel, rti, rte);
+
+ /*
* If this is a baserel, consider gathering any partial paths we may have
* created for it. (If we tried to gather inheritance children, we could
* end up with a very large number of gather nodes, each trying to grab
@@ -489,12 +498,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
generate_gather_paths(root, rel);
/*
- * Allow a plugin to editorialize on the set of Paths for this base
- * relation. It could add new paths (such as CustomPaths) by calling
- * add_path(), or delete or modify paths added by the core code.
+ * Allow a plugin to manipulate all the set of Paths for this base
+ * relation. It could delete or modify paths added by the core or
+ * extensions at the prior hook.
+ * Note that we don't recommend to add partial paths here, because
+ * GatherPath is already built above, and alternative cheaper partial
+ * path may release path-node in use.
*/
- if (set_rel_pathlist_hook)
- (*set_rel_pathlist_hook) (root, rel, rti, rte);
+ if (post_rel_pathlist_hook)
+ (*post_rel_pathlist_hook) (root, rel, rti, rte);
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 1d3fb5c..1950775 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -31,6 +31,7 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
Index rti,
RangeTblEntry *rte);
extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+extern PGDLLIMPORT set_rel_pathlist_hook_type post_rel_pathlist_hook;
/* Hook for plugins to get control in add_paths_to_joinrel() */
typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patchapplication/octet-stream; name=pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patchDownload
src/backend/optimizer/path/allpaths.c | 22 +++++++++++++++++-----
src/include/optimizer/paths.h | 1 +
2 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5f74d3b..757249b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -64,6 +64,7 @@ int min_parallel_index_scan_size;
/* Hook for plugins to get control in set_rel_pathlist() */
set_rel_pathlist_hook_type set_rel_pathlist_hook = NULL;
+set_rel_pathlist_hook_type post_rel_pathlist_hook = NULL;
/* Hook for plugins to replace standard_join_search() */
join_search_hook_type join_search_hook = NULL;
@@ -480,6 +481,14 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
}
/*
+ * Allow a plugin to editorialize on the set of Paths for this base
+ * relation. It could add new regular paths (such as CustomPaths) by
+ * calling add_path(), partial paths by add_partial_path().
+ */
+ if (set_rel_pathlist_hook)
+ (*set_rel_pathlist_hook) (root, rel, rti, rte);
+
+ /*
* If this is a baserel, we should normally consider gathering any partial
* paths we may have created for it.
*
@@ -497,12 +506,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
generate_gather_paths(root, rel, false);
/*
- * Allow a plugin to editorialize on the set of Paths for this base
- * relation. It could add new paths (such as CustomPaths) by calling
- * add_path(), or delete or modify paths added by the core code.
+ * Allow a plugin to manipulate all the set of Paths for this base
+ * relation. It could delete or modify paths added by the core or
+ * extensions at the prior hook.
+ * Note that we don't recommend to add partial paths here, because
+ * GatherPath is already built above, and alternative cheaper partial
+ * path may release path-node in use.
*/
- if (set_rel_pathlist_hook)
- (*set_rel_pathlist_hook) (root, rel, rti, rte);
+ if (post_rel_pathlist_hook)
+ (*post_rel_pathlist_hook) (root, rel, rti, rte);
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..74bd9c3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -31,6 +31,7 @@ typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
Index rti,
RangeTblEntry *rte);
extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
+extern PGDLLIMPORT set_rel_pathlist_hook_type post_rel_pathlist_hook;
/* Hook for plugins to get control in add_paths_to_joinrel() */
typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
At Sun, 30 Dec 2018 12:31:22 +0900, Kohei KaiGai <kaigai@heterodb.com> wrote in <CAOP8fzY1Oqf-LGdrZT+TAu+JajwPGn1uYnpWWUPL=2LiattjYA@mail.gmail.com>
2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
On the other hands, the later hook must be dedicated to add regular paths,
and also provides a chance for extensions to manipulate pre-built path-list
including Gather-path.
As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
a particular path-node, including Gather-node, by manipulation of the cost
value. Horiguchi-san, is it right?
Mmm. I haven't expected that it is mentioned here.
Actually in the hook, it changes enable_* planner variables, or
directory manipuraltes path costs or even can clear and
regenerate the path list and gather paths for the parallel
case. It will be happy if we had a chance to manpurate partial
paths before genrating gahter paths.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
2019年1月9日(水) 13:18 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
At Sun, 30 Dec 2018 12:31:22 +0900, Kohei KaiGai <kaigai@heterodb.com> wrote in <CAOP8fzY1Oqf-LGdrZT+TAu+JajwPGn1uYnpWWUPL=2LiattjYA@mail.gmail.com>
2018年12月30日(日) 4:12 Tom Lane <tgl@sss.pgh.pa.us>:
On the other hands, the later hook must be dedicated to add regular paths,
and also provides a chance for extensions to manipulate pre-built path-list
including Gather-path.
As long as I know, pg_hint_plan uses the set_rel_pathlist_hook to enforce
a particular path-node, including Gather-node, by manipulation of the cost
value. Horiguchi-san, is it right?Mmm. I haven't expected that it is mentioned here.
Actually in the hook, it changes enable_* planner variables, or
directory manipuraltes path costs or even can clear and
regenerate the path list and gather paths for the parallel
case. It will be happy if we had a chance to manpurate partial
paths before genrating gahter paths.
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of the generate_gather_paths?
If we have no use case for the second hook, here is little necessity
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)
Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of the generate_gather_paths?
If we have no use case for the second hook, here is little necessity
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)
+1. That seems like the best way to be consistent with the principle
that we need to have all the partial paths before generating any
Gather paths.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of the generate_gather_paths?
If we have no use case for the second hook, here is little necessity
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)+1. That seems like the best way to be consistent with the principle
that we need to have all the partial paths before generating any
Gather paths.
Patch was updated, just for relocation of the set_rel_pathlist_hook.
Please check it.
Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
Attachments:
pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patchapplication/octet-stream; name=pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patchDownload
src/backend/optimizer/path/allpaths.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d9a2f9b..9a44c3b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -479,6 +479,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
}
/*
+ * Allow a plugin to editorialize on the set of Paths for this base
+ * relation. It could add new paths (such as CustomPaths) by calling
+ * add_path(), or add_partial_path() if parallel aware, or delete or
+ * modify paths added by the core code.
+ */
+ if (set_rel_pathlist_hook)
+ (*set_rel_pathlist_hook) (root, rel, rti, rte);
+
+ /*
* If this is a baserel, consider gathering any partial paths we may have
* created for it. (If we tried to gather inheritance children, we could
* end up with a very large number of gather nodes, each trying to grab
@@ -488,14 +497,6 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (rel->reloptkind == RELOPT_BASEREL)
generate_gather_paths(root, rel);
- /*
- * Allow a plugin to editorialize on the set of Paths for this base
- * relation. It could add new paths (such as CustomPaths) by calling
- * add_path(), or delete or modify paths added by the core code.
- */
- if (set_rel_pathlist_hook)
- (*set_rel_pathlist_hook) (root, rel, rti, rte);
-
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patchapplication/octet-stream; name=pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patchDownload
src/backend/optimizer/path/allpaths.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5f74d3b..6f73dcd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -480,6 +480,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
}
/*
+ * Allow a plugin to editorialize on the set of Paths for this base
+ * relation. It could add new paths (such as CustomPaths) by calling
+ * add_path(), or add_partial_path() if parallel aware, or delete or
+ * modify paths added by the core code.
+ */
+ if (set_rel_pathlist_hook)
+ (*set_rel_pathlist_hook) (root, rel, rti, rte);
+
+ /*
* If this is a baserel, we should normally consider gathering any partial
* paths we may have created for it.
*
@@ -496,14 +505,6 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
bms_membership(root->all_baserels) != BMS_SINGLETON)
generate_gather_paths(root, rel, false);
- /*
- * Allow a plugin to editorialize on the set of Paths for this base
- * relation. It could add new paths (such as CustomPaths) by calling
- * add_path(), or delete or modify paths added by the core code.
- */
- if (set_rel_pathlist_hook)
- (*set_rel_pathlist_hook) (root, rel, rti, rte);
-
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of the generate_gather_paths?
If we have no use case for the second hook, here is little necessity
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)+1. That seems like the best way to be consistent with the principle
that we need to have all the partial paths before generating any
Gather paths.Patch was updated, just for relocation of the set_rel_pathlist_hook.
Please check it.
Seems reasonable to me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello, sorry for the absence.
At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoYyxBgkfN_APBdxdutFMukb=P-EgGNY-NbauRcL7mGnmA@mail.gmail.com>
On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of the generate_gather_paths?
If we have no use case for the second hook, here is little necessity
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)+1. That seems like the best way to be consistent with the principle
that we need to have all the partial paths before generating any
Gather paths.Patch was updated, just for relocation of the set_rel_pathlist_hook.
Please check it.Seems reasonable to me.
Also seems reasonable to me. The extension can call
generate_gather_paths redundantly as is but it almost doesn't
harm, so it is acceptable even in a minor release.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Let me remind the thread.
If no more comments, objections, or better ideas, please commit this fix.
Thanks,
2019年1月17日(木) 18:29 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
Hello, sorry for the absence.
At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoYyxBgkfN_APBdxdutFMukb=P-EgGNY-NbauRcL7mGnmA@mail.gmail.com>
On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of the generate_gather_paths?
If we have no use case for the second hook, here is little necessity
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)+1. That seems like the best way to be consistent with the principle
that we need to have all the partial paths before generating any
Gather paths.Patch was updated, just for relocation of the set_rel_pathlist_hook.
Please check it.Seems reasonable to me.
Also seems reasonable to me. The extension can call
generate_gather_paths redundantly as is but it almost doesn't
harm, so it is acceptable even in a minor release.regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
Hello,
Let me remind the thread again.
I'm waiting for the fix getting committed for a month...
2019年1月22日(火) 20:50 Kohei KaiGai <kaigai@heterodb.com>:
Let me remind the thread.
If no more comments, objections, or better ideas, please commit this fix.Thanks,
2019年1月17日(木) 18:29 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
Hello, sorry for the absence.
At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoYyxBgkfN_APBdxdutFMukb=P-EgGNY-NbauRcL7mGnmA@mail.gmail.com>
On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
2019年1月11日(金) 5:52 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
So, is it sufficient if set_rel_pathlist_hook is just relocated in
front of the generate_gather_paths?
If we have no use case for the second hook, here is little necessity
to have the post_rel_pathlist_hook() here.
(At least, PG-Strom will use the first hook only.)+1. That seems like the best way to be consistent with the principle
that we need to have all the partial paths before generating any
Gather paths.Patch was updated, just for relocation of the set_rel_pathlist_hook.
Please check it.Seems reasonable to me.
Also seems reasonable to me. The extension can call
generate_gather_paths redundantly as is but it almost doesn't
harm, so it is acceptable even in a minor release.regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>
Attachments:
pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patchapplication/octet-stream; name=pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patchDownload
src/backend/optimizer/path/allpaths.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5f74d3b..6f73dcd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -480,6 +480,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
}
/*
+ * Allow a plugin to editorialize on the set of Paths for this base
+ * relation. It could add new paths (such as CustomPaths) by calling
+ * add_path(), or add_partial_path() if parallel aware, or delete or
+ * modify paths added by the core code.
+ */
+ if (set_rel_pathlist_hook)
+ (*set_rel_pathlist_hook) (root, rel, rti, rte);
+
+ /*
* If this is a baserel, we should normally consider gathering any partial
* paths we may have created for it.
*
@@ -496,14 +505,6 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
bms_membership(root->all_baserels) != BMS_SINGLETON)
generate_gather_paths(root, rel, false);
- /*
- * Allow a plugin to editorialize on the set of Paths for this base
- * relation. It could add new paths (such as CustomPaths) by calling
- * add_path(), or delete or modify paths added by the core code.
- */
- if (set_rel_pathlist_hook)
- (*set_rel_pathlist_hook) (root, rel, rti, rte);
-
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patchapplication/octet-stream; name=pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patchDownload
src/backend/optimizer/path/allpaths.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d9a2f9b..9a44c3b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -479,6 +479,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
}
/*
+ * Allow a plugin to editorialize on the set of Paths for this base
+ * relation. It could add new paths (such as CustomPaths) by calling
+ * add_path(), or add_partial_path() if parallel aware, or delete or
+ * modify paths added by the core code.
+ */
+ if (set_rel_pathlist_hook)
+ (*set_rel_pathlist_hook) (root, rel, rti, rte);
+
+ /*
* If this is a baserel, consider gathering any partial paths we may have
* created for it. (If we tried to gather inheritance children, we could
* end up with a very large number of gather nodes, each trying to grab
@@ -488,14 +497,6 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (rel->reloptkind == RELOPT_BASEREL)
generate_gather_paths(root, rel);
- /*
- * Allow a plugin to editorialize on the set of Paths for this base
- * relation. It could add new paths (such as CustomPaths) by calling
- * add_path(), or delete or modify paths added by the core code.
- */
- if (set_rel_pathlist_hook)
- (*set_rel_pathlist_hook) (root, rel, rti, rte);
-
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
On Wed, Feb 6, 2019 at 10:35 AM Kohei KaiGai <kaigai@heterodb.com> wrote:
Hello,
Let me remind the thread again.
I'm waiting for the fix getting committed for a month...
It seems you would also like to see this back-patched. I am not sure
if that is a good idea as there is some risk of breaking existing
usage. Tom, do you have any opinion on this patch? It seems to me
you were thinking to have a separate hook for partial paths, but the
patch has solved the problem by moving the hook location. I think
whatever is the case we should try to reach some consensus and move
forward with this patch as KaiGai-San is waiting from quite some time.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes:
It seems you would also like to see this back-patched. I am not sure
if that is a good idea as there is some risk of breaking existing
usage. Tom, do you have any opinion on this patch? It seems to me
you were thinking to have a separate hook for partial paths, but the
patch has solved the problem by moving the hook location.
I was expecting Haas to take point on this, but since he doesn't seem
to be doing so, I'll push it. I don't think there's any material
risk of breaking things --- the only functionality lost is the ability to
remove or modify baserel Gather paths, which I doubt anybody is interested
in doing. Certainly that's way less useful than the ability to add
partial paths and have them be included in Gather-building.
In a green field I'd rather have had a separate hook for adding partial
paths, but it's not clear that that really buys much of anything except
logical cleanliness ... against which it adds cost since the using
extension(s) have to figure out what's going on twice.
Also this way does have the advantage that it retroactively fixes things
for extensions that may be trying to make partial paths today.
regards, tom lane