add_path() for Path without InitPlan: cost comparison vs. Paths that require one
Hi folks,
I’ve run into a planning conundrum with my query rewriting extension for MVs when attempting to rewrite a RECURSIVE CTE.
RECURSIVE CTEs are expensive — and presumably tricky to optimise — and so a good use case for query rewrite against an MV; all the more so if Yugo’s Incremental View Maintenance concept gets traction.
I want to add an alternative Path for the UPPERREL_FINAL of the CTE root, but my new MV scan path (which is actually a thin CustomScan atop a scan of the MV) is rejected in favour of the existing paths.
This seems to be because my Path is more expensive than the Rel’s existing Paths when considered alone. (The CTE’s final scan is actually a scan Path over a worktable, so it really is much lighter.)
However, if I factor back in the cost of the InitPlan, things net out much more in favour of a scan against the MV. Of course, the add_path() comparison logic doesn’t include the InitPlan cost, so the point is moot.
I’m wondering how I should approach this problem. First pass, I can’t see how to achieve an amicable solution with existing infrastructure.
I have a few possible solutions. Do any of the following make sense?
1. Override the add_path() logic to force my Path to win? This was initially my least favourite approach, but perhaps it’s actually the most pragmatic. Advantage is I think I could do this entirely in my EXTENSION.
2. Make a new version of add_path() which is more aware of dependencies.
Seems #2 could have utility in PG generally. If I’m not wrong, my guess is that one of the reasons for the >=2-references-for-materialising-a-CTE;1-for-inlining policy is that we don’t have the planner logic to trade off materialisation versus inlining.
Also, I am wondering if my MV rewrite logic misses cases where the planner decides to materialise an intermediate result as an InitPlan for later processing.
3. I considered creating a new root PlannerInfo structure, and burying the existing one another level down, alongside my MV scan, in a Gather-like arrangement. That coverts the costing conundrum to a choice between roots. Obviously that will include the InitPlan costs. I figured I could eliminate one sub-root much as Path elimination works. But on reflection, I’m not sure PG has enough flexibility in the Path concept to support this route forward.
I’d welcome any view, ideas or advice.
d.
Dent John <denty@qqdd.eu> writes:
However, if I factor back in the cost of the InitPlan, things net out much more in favour of a scan against the MV. Of course, the add_path() comparison logic doesn’t include the InitPlan cost, so the point is moot.
Please explain yourself. InitPlans will, as a rule, get stuck into the
same place in the plan tree regardless of which paths are chosen; that's
why we need not consider them in path cost comparisons. Moreover, once
the initplan's own subplan is determined, it's going to be the same
regardless of the shape of the outer query --- so if we did factor it
in, it'd contribute the same cost to every outer path, and thus it
still wouldn't change any decisions. So I don't follow what you're
on about here.
regards, tom lane
Hi Tom,
On 25 Jul 2019, at 14:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Please explain yourself. InitPlans will, as a rule, get stuck into the
same place in the plan tree regardless of which paths are chosen; that's
why we need not consider them in path cost comparisons.
Ah that’s true. I didn’t realise that at the time I wrote.
But I think my problem is still real...
Moreover, once
the initplan's own subplan is determined, it's going to be the same
regardless of the shape of the outer query ---
Yes that’s true too.
so if we did factor it
in, it'd contribute the same cost to every outer path, and thus it
still wouldn't change any decisions.
I think I’m exposed to the problem because I’m changing how certain queries are fulfilled.
And in the case of a RECURSIVE CTE, the plan ends up being an InitPlan that materializes the CTE, and then a scan of that materialized result.
The problem is that I can fulfil the entire query with a scan against an MV table. Point is it’s an alternative that achieves both the InitPlan (because it’s unnecessary) and the final scan.
But the cost comparison during add_path() is only taking into account the cost of the final scan, which is so cheap that it is preferable even to a simple scan of an MV.
So I don't follow what you're
on about here.
Hmm. Having written the above, I realise I’m not clear on why my extension isn’t offered the opportunity to materialise the work table for the InitPlan.
Sorry. I should have thought about that question first. It might just be an error in my code. I’ll follow up with an answer.
Show quoted text
regards, tom lane