Optimization issue of branching UNION ALL

Started by Andrei Lepikhovover 3 years ago7 messageshackers
Jump to latest
#1Andrei Lepikhov
lepihov@gmail.com

Hi,

Client report on a corner case have shown up possible minor
non-optimality in procedure of transformation of simple UNION ALL
statement tree.
Complaint is about auto-generated query with 1E4 simple union all's (see
t.sh to generate a demo script). The reason: in REL_11_STABLE it is
planned and executed in a second, but REL_12_STABLE and beyond makes
matters worse: planning of such a query needs tons of gigabytes of RAM.

Superficial study revealed possibly unnecessary operations that could be
avoided:
1. Walking across a query by calling substitute_phv_relids() even if
lastPHId shows that no one phv is presented.
2. Iterative passes along the append_rel_list for replacing vars in the
translated_vars field. I can't grasp real necessity of passing all the
append_rel_list during flattening of an union all leaf subquery. No one
can reference this leaf, isn't it?

In attachment you can see some sketch that reduces a number of planner
cycles/copyings.

--
Regards
Andrey Lepikhov
Postgres Professional

Attachments:

t.shapplication/x-shellscript; name=t.shDownload
union_all_optimization.patchtext/x-patch; charset=UTF-8; name=union_all_optimization.patchDownload+25-15
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrei Lepikhov (#1)
Re: Optimization issue of branching UNION ALL

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

Complaint is about auto-generated query with 1E4 simple union all's (see
t.sh to generate a demo script). The reason: in REL_11_STABLE it is
planned and executed in a second, but REL_12_STABLE and beyond makes
matters worse: planning of such a query needs tons of gigabytes of RAM.

v11 (and prior versions) sucks just as badly. In this example it
accidentally escapes trouble because it doesn't know how to pull up
a subquery with empty FROM. But if you make the query look like

SELECT 1,1 FROM dual
UNION ALL
SELECT 2,2 FROM dual
UNION ALL
SELECT 3,3 FROM dual
...

then v11 chokes as well. (Seems like we've overlooked the need
for check_stack_depth() and CHECK_FOR_INTERRUPTS() here ...)

Superficial study revealed possibly unnecessary operations that could be
avoided:
1. Walking across a query by calling substitute_phv_relids() even if
lastPHId shows that no one phv is presented.

Yeah, we could do that, and it'd help some.

2. Iterative passes along the append_rel_list for replacing vars in the
translated_vars field. I can't grasp real necessity of passing all the
append_rel_list during flattening of an union all leaf subquery. No one
can reference this leaf, isn't it?

After thinking about that for awhile, I believe we can go further:
the containing_appendrel is actually the *only* part of the upper
query that needs to be adjusted. So we could do something like
the attached.

This passes check-world, but I don't have quite enough confidence
in it to just commit it.

regards, tom lane

Attachments:

union_all_optimization-v2.patchtext/x-diff; charset=us-ascii; name=union_all_optimization-v2.patchDownload+42-19
#3Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#2)
Re: Optimization issue of branching UNION ALL

On Thu, Dec 22, 2022 at 9:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

Superficial study revealed possibly unnecessary operations that could be
avoided:
1. Walking across a query by calling substitute_phv_relids() even if
lastPHId shows that no one phv is presented.

Yeah, we could do that, and it'd help some.

I noticed we also check 'parse->hasSubLinks' when we fix PHVs and
AppendRelInfos in pull_up_simple_subquery. I'm not sure why we have
this check. It seems not necessary.

In remove_result_refs, I don't think we need to check 'lastPHId' again
before calling substitute_phv_relids, since it has been checked a few
lines earlier.

Thanks
Richard

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#3)
Re: Optimization issue of branching UNION ALL

Richard Guo <guofenglinux@gmail.com> writes:

I noticed we also check 'parse->hasSubLinks' when we fix PHVs and
AppendRelInfos in pull_up_simple_subquery. I'm not sure why we have
this check. It seems not necessary.

Yeah, I was wondering about that too ... maybe it was important
in some previous state of the code? I didn't do any archeology
though.

In remove_result_refs, I don't think we need to check 'lastPHId' again
before calling substitute_phv_relids, since it has been checked a few
lines earlier.

Oh, duh ...

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
Re: Optimization issue of branching UNION ALL

I wrote:

Richard Guo <guofenglinux@gmail.com> writes:

I noticed we also check 'parse->hasSubLinks' when we fix PHVs and
AppendRelInfos in pull_up_simple_subquery. I'm not sure why we have
this check. It seems not necessary.

Yeah, I was wondering about that too ... maybe it was important
in some previous state of the code? I didn't do any archeology
though.

After a bit of "git blame"-ing, it appears that that hasSubLinks
check was introduced in e006a24ad, which added a FlattenedSubLink
node type and needed to fix them up here:

+	 * We also have to fix the relid sets of any FlattenedSubLink nodes in
+	 * the parent query.  (This could perhaps be done by ResolveNew, but it

Then when I got rid of FlattenedSubLink in e549722a8, I neglected
to remove that check. So I think maybe we don't need it, but I've
not tested.

regards, tom lane

#6Andrei Lepikhov
lepihov@gmail.com
In reply to: Tom Lane (#2)
Re: Optimization issue of branching UNION ALL

On 22/12/2022 06:50, Tom Lane wrote:

2. Iterative passes along the append_rel_list for replacing vars in the
translated_vars field. I can't grasp real necessity of passing all the
append_rel_list during flattening of an union all leaf subquery. No one
can reference this leaf, isn't it?

After thinking about that for awhile, I believe we can go further:
the containing_appendrel is actually the *only* part of the upper
query that needs to be adjusted. So we could do something like
the attached.

This passes check-world, but I don't have quite enough confidence
in it to just commit it.

Thanks, I have written the letter because of some doubts too. But only
one weak point I could imagine - if someday sql standard will be changed.
Your code looks better, than previous attempt.

--
regards,
Andrey Lepikhov
Postgres Professional

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrei Lepikhov (#6)
Re: Optimization issue of branching UNION ALL

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

Thanks, I have written the letter because of some doubts too. But only
one weak point I could imagine - if someday sql standard will be changed.

Yeah, if they ever decide that LATERAL should be allowed to reference a
previous sub-query of UNION ALL, that'd probably break this. But it'd
break a lot of other code too, so I'm not going to worry about it.

I pushed the main fix to HEAD only, and the recursion checks to
all branches.

regards, tom lane