Optimization issue of branching UNION ALL
Hi,
Client report on a corner case have shown up possible minor
non-optimality in procedure of transformation of simple UNION ALL
statement tree.
Complaint is about auto-generated query with 1E4 simple union all's (see
t.sh to generate a demo script). The reason: in REL_11_STABLE it is
planned and executed in a second, but REL_12_STABLE and beyond makes
matters worse: planning of such a query needs tons of gigabytes of RAM.
Superficial study revealed possibly unnecessary operations that could be
avoided:
1. Walking across a query by calling substitute_phv_relids() even if
lastPHId shows that no one phv is presented.
2. Iterative passes along the append_rel_list for replacing vars in the
translated_vars field. I can't grasp real necessity of passing all the
append_rel_list during flattening of an union all leaf subquery. No one
can reference this leaf, isn't it?
In attachment you can see some sketch that reduces a number of planner
cycles/copyings.
--
Regards
Andrey Lepikhov
Postgres Professional
Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
Complaint is about auto-generated query with 1E4 simple union all's (see
t.sh to generate a demo script). The reason: in REL_11_STABLE it is
planned and executed in a second, but REL_12_STABLE and beyond makes
matters worse: planning of such a query needs tons of gigabytes of RAM.
v11 (and prior versions) sucks just as badly. In this example it
accidentally escapes trouble because it doesn't know how to pull up
a subquery with empty FROM. But if you make the query look like
SELECT 1,1 FROM dual
UNION ALL
SELECT 2,2 FROM dual
UNION ALL
SELECT 3,3 FROM dual
...
then v11 chokes as well. (Seems like we've overlooked the need
for check_stack_depth() and CHECK_FOR_INTERRUPTS() here ...)
Superficial study revealed possibly unnecessary operations that could be
avoided:
1. Walking across a query by calling substitute_phv_relids() even if
lastPHId shows that no one phv is presented.
Yeah, we could do that, and it'd help some.
2. Iterative passes along the append_rel_list for replacing vars in the
translated_vars field. I can't grasp real necessity of passing all the
append_rel_list during flattening of an union all leaf subquery. No one
can reference this leaf, isn't it?
After thinking about that for awhile, I believe we can go further:
the containing_appendrel is actually the *only* part of the upper
query that needs to be adjusted. So we could do something like
the attached.
This passes check-world, but I don't have quite enough confidence
in it to just commit it.
regards, tom lane
Attachments:
union_all_optimization-v2.patchtext/x-diff; charset=us-ascii; name=union_all_optimization-v2.patchDownload+42-19
On Thu, Dec 22, 2022 at 9:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
Superficial study revealed possibly unnecessary operations that could be
avoided:
1. Walking across a query by calling substitute_phv_relids() even if
lastPHId shows that no one phv is presented.Yeah, we could do that, and it'd help some.
I noticed we also check 'parse->hasSubLinks' when we fix PHVs and
AppendRelInfos in pull_up_simple_subquery. I'm not sure why we have
this check. It seems not necessary.
In remove_result_refs, I don't think we need to check 'lastPHId' again
before calling substitute_phv_relids, since it has been checked a few
lines earlier.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
I noticed we also check 'parse->hasSubLinks' when we fix PHVs and
AppendRelInfos in pull_up_simple_subquery. I'm not sure why we have
this check. It seems not necessary.
Yeah, I was wondering about that too ... maybe it was important
in some previous state of the code? I didn't do any archeology
though.
In remove_result_refs, I don't think we need to check 'lastPHId' again
before calling substitute_phv_relids, since it has been checked a few
lines earlier.
Oh, duh ...
regards, tom lane
I wrote:
Richard Guo <guofenglinux@gmail.com> writes:
I noticed we also check 'parse->hasSubLinks' when we fix PHVs and
AppendRelInfos in pull_up_simple_subquery. I'm not sure why we have
this check. It seems not necessary.
Yeah, I was wondering about that too ... maybe it was important
in some previous state of the code? I didn't do any archeology
though.
After a bit of "git blame"-ing, it appears that that hasSubLinks
check was introduced in e006a24ad, which added a FlattenedSubLink
node type and needed to fix them up here:
+ * We also have to fix the relid sets of any FlattenedSubLink nodes in
+ * the parent query. (This could perhaps be done by ResolveNew, but it
Then when I got rid of FlattenedSubLink in e549722a8, I neglected
to remove that check. So I think maybe we don't need it, but I've
not tested.
regards, tom lane
On 22/12/2022 06:50, Tom Lane wrote:
2. Iterative passes along the append_rel_list for replacing vars in the
translated_vars field. I can't grasp real necessity of passing all the
append_rel_list during flattening of an union all leaf subquery. No one
can reference this leaf, isn't it?After thinking about that for awhile, I believe we can go further:
the containing_appendrel is actually the *only* part of the upper
query that needs to be adjusted. So we could do something like
the attached.This passes check-world, but I don't have quite enough confidence
in it to just commit it.
Thanks, I have written the letter because of some doubts too. But only
one weak point I could imagine - if someday sql standard will be changed.
Your code looks better, than previous attempt.
--
regards,
Andrey Lepikhov
Postgres Professional
Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
Thanks, I have written the letter because of some doubts too. But only
one weak point I could imagine - if someday sql standard will be changed.
Yeah, if they ever decide that LATERAL should be allowed to reference a
previous sub-query of UNION ALL, that'd probably break this. But it'd
break a lot of other code too, so I'm not going to worry about it.
I pushed the main fix to HEAD only, and the recursion checks to
all branches.
regards, tom lane