Question about some changes in 11.3

Started by Mat Aryeover 6 years ago6 messages
#1Mat Arye
mat@timescale.com

Hi,

11.3 included some change to partition table planning. Namely
commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is
known empty.") seems to redo all paths for partitioned tables
in apply_scanjoin_target_to_paths. It clears the paths in:

```
if (rel_is_partitioned)
rel->pathlist = NIL
```

Then the code rebuild the paths. However, the rebuilt append path never
gets the
set_rel_pathlist_hook called. Thus, the work that hook did previously gets
thrown away and the rebuilt append path can never be influenced by this
hook. Is this intended behavior? Am I missing something?

Thanks,
Mat
TimescaleDB

#2Mat Arye
mat@timescale.com
In reply to: Mat Arye (#1)
1 attachment(s)
Re: Question about some changes in 11.3

On Fri, May 24, 2019 at 5:05 PM Mat Arye <mat@timescale.com> wrote:

Hi,

11.3 included some change to partition table planning. Namely
commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is
known empty.") seems to redo all paths for partitioned tables
in apply_scanjoin_target_to_paths. It clears the paths in:

```
if (rel_is_partitioned)
rel->pathlist = NIL
```

Then the code rebuild the paths. However, the rebuilt append path never
gets the
set_rel_pathlist_hook called. Thus, the work that hook did previously gets
thrown away and the rebuilt append path can never be influenced by this
hook. Is this intended behavior? Am I missing something?

Thanks,
Mat
TimescaleDB

I've attached a small patch to address this discrepancy for when the
set_rel_pathlist_hook is called so that's it is called for actual paths
used for partitioned rels. Please let me know if I am misunderstanding how
this should be handled.

Attachments:

call_set_rel_pathlist_hook_on_part_rels.patchapplication/octet-stream; name=call_set_rel_pathlist_hook_on_part_rels.patchDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 56ccde977c..b06cbc22ee 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -414,6 +414,25 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 	Assert(rel->rows > 0 || IS_DUMMY_REL(rel));
 }
 
+/*
+ * call_pathlist_hook
+ *     Call the hook that allows extensions to editorialize pathlists on a base rel.
+ *
+ */
+void
+call_pathlist_hook(PlannerInfo *root, RelOptInfo *rel,
+				 Index rti, RangeTblEntry *rte)
+{
+	/*
+	 * Allow a plugin to editorialize on the set of Paths for this base
+	 * relation.  It could add new paths (such as CustomPaths) by calling
+	 * add_path(), or add_partial_path() if parallel aware.  It could also
+	 * delete or modify paths added by the core code.
+	 */
+	if (set_rel_pathlist_hook)
+		(*set_rel_pathlist_hook) (root, rel, rti, rte);
+}
+
 /*
  * set_rel_pathlist
  *	  Build access paths for a base relation
@@ -479,14 +498,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		}
 	}
 
-	/*
-	 * Allow a plugin to editorialize on the set of Paths for this base
-	 * relation.  It could add new paths (such as CustomPaths) by calling
-	 * add_path(), or add_partial_path() if parallel aware.  It could also
-	 * delete or modify paths added by the core code.
-	 */
-	if (set_rel_pathlist_hook)
-		(*set_rel_pathlist_hook) (root, rel, rti, rte);
+	call_pathlist_hook(root, rel, rti, rte);
 
 	/*
 	 * If this is a baserel, we should normally consider gathering any partial
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f46914e54..3bed90e20a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7008,6 +7008,8 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 
 		/* Build new paths for this relation by appending child paths. */
 		add_paths_to_append_rel(root, rel, live_children);
+
+		call_pathlist_hook(root, rel, rel->relid, root->simple_rte_array[rel->relid]);
 	}
 
 	/*
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde307ad..a21c316dca 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,4 +240,7 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 						List *live_childrels);
 
+extern void call_pathlist_hook(PlannerInfo *root, RelOptInfo *rel,
+				 Index rti, RangeTblEntry *rte);
+
 #endif							/* PATHS_H */
#3Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Mat Arye (#1)
Re: Question about some changes in 11.3

Hi Mat,

On 2019/05/25 6:05, Mat Arye wrote:

Hi,

11.3 included some change to partition table planning. Namely
commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is
known empty.") seems to redo all paths for partitioned tables
in apply_scanjoin_target_to_paths. It clears the paths in:

```
if (rel_is_partitioned)
rel->pathlist = NIL
```

Then the code rebuild the paths. However, the rebuilt append path never
gets the
set_rel_pathlist_hook called. Thus, the work that hook did previously gets
thrown away and the rebuilt append path can never be influenced by this
hook.

By dropping the old paths like done here, the core code is simply
forgetting that set_rel_pathlist_hook may have editorialized over them,
which seems like an oversight of that commit.

Your proposal to call set_rel_pathlist_hook() after
add_paths_to_append_rel() to rebuild the Append paths sounds fine to me.

Thanks,
Amit

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mat Arye (#2)
Re: Question about some changes in 11.3

Mat Arye <mat@timescale.com> writes:

On Fri, May 24, 2019 at 5:05 PM Mat Arye <mat@timescale.com> wrote:

11.3 included some change to partition table planning. Namely
commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is
known empty.") seems to redo all paths for partitioned tables
in apply_scanjoin_target_to_paths. It clears the paths in:

```
if (rel_is_partitioned)
rel->pathlist = NIL
```

Then the code rebuild the paths. However, the rebuilt append path never
gets the
set_rel_pathlist_hook called. Thus, the work that hook did previously gets
thrown away and the rebuilt append path can never be influenced by this
hook. Is this intended behavior? Am I missing something?

Hm. I'd say this was already broken by the invention of
apply_scanjoin_target_to_paths; perhaps 11-before-11.3 managed to
still work for you, but it's not hard to envision applications of
set_rel_pathlist_hook for which it would not have worked. The contract
for set_rel_pathlist_hook is supposed to be that it gets to editorialize
on all normal (non-Gather) paths created by the core code, and that's
no longer the case now that apply_scanjoin_target_to_paths can add more.

I've attached a small patch to address this discrepancy for when the
set_rel_pathlist_hook is called so that's it is called for actual paths
used for partitioned rels. Please let me know if I am misunderstanding how
this should be handled.

I'm not very happy with this patch either, as it makes the situation
even more confused, not less so. The best-case scenario is that the
set_rel_pathlist_hook runs twice and does useless work; the worst case
is that it gets confused completely by being called twice for the same
rel. I think we need to maintain the invariant that that hook is
called exactly once per baserel.

I wonder whether we could fix matters by postponing the
set_rel_pathlist_hook call till later in the same cases where
we postpone generate_gather_paths, ie, it's the only baserel.

That would make its name pretty misleading, though. Maybe we
should leave it alone and invent a separate hook to be called
by/after apply_scanjoin_target_to_paths? Although I don't
know if it'd be useful to add a new hook to v11 at this point.
Extensions would have a hard time knowing if they could use it.

regards, tom lane

#5Mat Arye
mat@timescale.com
In reply to: Tom Lane (#4)
Re: Question about some changes in 11.3

Thanks for taking a look at this Tom.

On Mon, Jun 3, 2019 at 4:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mat Arye <mat@timescale.com> writes:

On Fri, May 24, 2019 at 5:05 PM Mat Arye <mat@timescale.com> wrote:

11.3 included some change to partition table planning. Namely
commit 925f46f ("Fix handling of targetlist SRFs when scan/join

relation is

known empty.") seems to redo all paths for partitioned tables
in apply_scanjoin_target_to_paths. It clears the paths in:

```
if (rel_is_partitioned)
rel->pathlist = NIL
```

Then the code rebuild the paths. However, the rebuilt append path never
gets the
set_rel_pathlist_hook called. Thus, the work that hook did previously

gets

thrown away and the rebuilt append path can never be influenced by this
hook. Is this intended behavior? Am I missing something?

Hm. I'd say this was already broken by the invention of
apply_scanjoin_target_to_paths; perhaps 11-before-11.3 managed to
still work for you, but it's not hard to envision applications of
set_rel_pathlist_hook for which it would not have worked. The contract
for set_rel_pathlist_hook is supposed to be that it gets to editorialize
on all normal (non-Gather) paths created by the core code, and that's
no longer the case now that apply_scanjoin_target_to_paths can add more.

Yeah it worked for our cases because (I guess) out paths turned out to be
lower cost,
but I see your point.

I've attached a small patch to address this discrepancy for when the
set_rel_pathlist_hook is called so that's it is called for actual paths
used for partitioned rels. Please let me know if I am misunderstanding

how

this should be handled.

I'm not very happy with this patch either, as it makes the situation
even more confused, not less so. The best-case scenario is that the
set_rel_pathlist_hook runs twice and does useless work; the worst case
is that it gets confused completely by being called twice for the same
rel. I think we need to maintain the invariant that that hook is
called exactly once per baserel.

Yeah getting called once per baserel is a nice invariant to have.

I wonder whether we could fix matters by postponing the
set_rel_pathlist_hook call till later in the same cases where
we postpone generate_gather_paths, ie, it's the only baserel.

That would make its name pretty misleading, though.

How would simply delaying the hook make the name misleading? I am also
wondering if
using the condition `rel->reloptkind == RELOPT_BASEREL &&
bms_membership(root->all_baserels) != BMS_SINGLETON` is sufficient.
Is it really guaranteed that `apply_scanjoin_target_to_paths` will not be
called in other cases?

Maybe we
should leave it alone and invent a separate hook to be called
by/after apply_scanjoin_target_to_paths? Although I don't
know if it'd be useful to add a new hook to v11 at this point.
Extensions would have a hard time knowing if they could use it.

I think for us, either approach would work. We just need a place to
add/modify
some paths. FWIW, I think delaying the hook is easier to deal with on our
end if it could work
since we don't have to deal with two different code paths but either is
workable.

Certainly if we go with the new hook approach I think it should be added to
v11 as well.
That way extensions that need the functionality can hook into it and deal
with patch level
differences instead of having no way at all to get at this functionality.

I am more than happy to work on a new patch once we settle on an approach.

Show quoted text

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mat Arye (#5)
Re: Question about some changes in 11.3

Mat Arye <mat@timescale.com> writes:

On Mon, Jun 3, 2019 at 4:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm. I'd say this was already broken by the invention of
apply_scanjoin_target_to_paths; perhaps 11-before-11.3 managed to
still work for you, but it's not hard to envision applications of
set_rel_pathlist_hook for which it would not have worked. The contract
for set_rel_pathlist_hook is supposed to be that it gets to editorialize
on all normal (non-Gather) paths created by the core code, and that's
no longer the case now that apply_scanjoin_target_to_paths can add more.
...
I wonder whether we could fix matters by postponing the
set_rel_pathlist_hook call till later in the same cases where
we postpone generate_gather_paths, ie, it's the only baserel.

Is it really guaranteed that `apply_scanjoin_target_to_paths` will not be
called in other cases?

Well, apply_scanjoin_target_to_paths is called in *all* cases. It only
throws away the original paths for partitioned rels, though.

I spent some more time looking at this, and I am afraid that my idea
of postponing set_rel_pathlist_hook into apply_scanjoin_target_to_paths
isn't going to work: there is not anyplace in that function where we
could call the hook without the API being noticeably different from
what it is at the current call site. In particular, if we try to call
it down near the end so that it still has the property of being able
to remove any core-generated path, then there's a *big* difference for
queries involving SRFs: we've already plastered ProjectSetPath(s) atop
the original paths, and any user of the hook would have to know to do
likewise for freshly-generated paths. That would certainly break
existing hook users.

I'm inclined to think that the safest course is to leave
set_rel_pathlist_hook as it stands, and invent a new hook that is called
in apply_scanjoin_target_to_paths just before the generate_gather_paths
call. (Or, perhaps, just after that --- but the precedent of
set_rel_pathlist_hook suggests that "before" is more useful.)
For your use-case you'd have to get into both hooks, and they'd both have
to know that if they're dealing with a partitioned baserel that is the
only baserel in the query, the new hook is where to generate paths
rather than the old hook. Maybe it'd be worth having the core code
export some simple test function for that, rather than having the details
of those semantics be wired into various extensions.

I think it'd be all right to put a patch done that way into the v11
branch. It would not make anything any worse for code that uses
set_rel_pathlist_hook and is OK with the v11 behavior. Code that
needs to use the new hook would fail to load into 11-before-11.whatever,
but that's probably better than loading and then doing the wrong thing.

regards, tom lane