Optimization of NestLoop join in the case of guaranteed empty inner subtree

Started by Andrei Lepikhov over 6 years ago · 3 messages · hackers
#1 Andrei Lepikhov
lepihov@gmail.com

During NestLoop execution we have a bad corner case: if the outer subtree
contains tuples, the join node will scan the inner subtree even if it does
not return any tuples.

To reproduce the problem, see 'problem.sql' in the attachments.
The EXPLAIN ANALYZE output is in 'problem_explain.txt'.

As you can see, the executor scans each of the 1e5 outer tuples despite the
fact that the inner subtree can't return any tuples.
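
The corner case can be illustrated with a toy model of a nested loop join. This is only a minimal Python sketch, not PostgreSQL executor code: `naive_nested_loop` and the `stats` counters are hypothetical names, and the 100,000 outer rows simply mirror the 1e5 figure mentioned above.

```python
def naive_nested_loop(outer_rows, rescan_inner, stats):
    """Toy inner join: for every outer row, rescan the inner side.

    rescan_inner is a callable returning an iterable of inner rows;
    stats counts how much work each side of the join does.
    """
    for outer in outer_rows:
        stats["outer_fetched"] += 1
        stats["inner_rescans"] += 1
        for inner in rescan_inner():
            yield (outer, inner)


stats = {"outer_fetched": 0, "inner_rescans": 0}
# The inner side is always empty, yet the loop still walks the whole outer side.
result = list(naive_nested_loop(range(100_000), lambda: [], stats))
```

In this model the join produces no rows, but both counters end up equal to the number of outer rows, which is the wasted work the proposal targets.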

Teodor Sigaev and I developed a patch to solve this problem. The resulting
EXPLAIN ANALYZE output can be found in 'optimized_execution.txt'.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachments:

problem.sql (application/sql)
problem_explain.txt (text/plain; charset=UTF-8)
0001-Skip-scan-of-outer-subtree-if-inner-of-the-NestedLoo.patch (text/x-patch; charset=UTF-8; +24 -5)
optimized_execution.txt (text/plain; charset=UTF-8)

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrei Lepikhov (#1)
Re: Optimization of NestLoop join in the case of guaranteed empty inner subtree

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

> During NestLoop execution we have a bad corner case: if the outer subtree
> contains tuples, the join node will scan the inner subtree even if it does
> not return any tuples.

So the first question about corner-case optimizations like this is always
"how much overhead does it add in the normal case where it fails to gain
anything?". I see no performance numbers in your proposal.

I do not much like anything about the code, either: as written it's
only helpful for an especially narrow corner case (so narrow that
I wonder if it really ever helps at all: surely calling a nodeMaterial
whose tuplestore is empty doesn't cost much). But that doesn't stop it
from adding a bool to the generic PlanState struct, with global
implications. What I'd expected from your text description is that
nodeNestLoop would remember whether its inner child had returned zero rows
the first time, and assume that subsequent executions could be skipped
unless the inner child's parameters change.
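
The approach described in this paragraph could look roughly like the following toy model. It is a hedged sketch in Python rather than executor C code: `NestLoopSketch`, `inner_factory`, and `params_of` are hypothetical names, the early exit assumes inner-join semantics (an outer join would still have to emit its outer rows), and `None` stands for "the inner side takes no parameters".

```python
class NestLoopSketch:
    """Toy nested loop that remembers an empty inner side.

    inner_factory(params) returns an iterable of inner rows for a parameter
    value; params_of(outer_row) extracts the value the inner side depends on
    (None when the inner side is unparameterized).
    """

    def __init__(self, outer_rows, inner_factory, params_of=lambda row: None):
        self.outer_rows = outer_rows
        self.inner_factory = inner_factory
        self.params_of = params_of
        self.inner_known_empty = False
        self.last_params = None
        self.outer_fetched = 0
        self.inner_scans = 0

    def run(self):
        for outer in self.outer_rows:
            self.outer_fetched += 1
            params = self.params_of(outer)
            if self.inner_known_empty and params == self.last_params:
                continue  # same parameters as the last empty scan: skip the rescan
            self.last_params = params
            self.inner_scans += 1
            matched = False
            for inner in self.inner_factory(params):
                matched = True
                yield (outer, inner)
            self.inner_known_empty = not matched
            if self.inner_known_empty and params is None:
                return  # unparameterized and empty: stop reading the outer side


# Unparameterized, empty inner side: one outer fetch and one inner scan suffice.
empty_join = NestLoopSketch(range(100_000), lambda params: [])
empty_rows = list(empty_join.run())

# Parameterized inner side: the rescan is skipped only while the parameter repeats.
param_join = NestLoopSketch(
    [1, 1, 2, 2],
    lambda p: [p] if p == 2 else [],
    params_of=lambda row: row,
)
param_rows = list(param_join.run())
```

The state lives entirely in the join node of the sketch, which is the point of the suggestion: nothing has to be added to a structure shared by every plan node.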

regards, tom lane

#3 Andrei Lepikhov
lepihov@gmail.com
In reply to: Tom Lane (#2)
Re: Optimization of NestLoop join in the case of guaranteed empty inner subtree

On 12/11/19 8:49 PM, Tom Lane wrote:

> Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
>
>> During NestLoop execution we have a bad corner case: if the outer subtree
>> contains tuples, the join node will scan the inner subtree even if it does
>> not return any tuples.
>
> So the first question about corner-case optimizations like this is always
> "how much overhead does it add in the normal case where it fails to gain
> anything?". I see no performance numbers in your proposal.

I thought it was trivial. A quick study shows no visible difference.

> I do not much like anything about the code, either: as written it's
> only helpful for an especially narrow corner case (so narrow that
> I wonder if it really ever helps at all: surely calling a nodeMaterial
> whose tuplestore is empty doesn't cost much).

Scanning a large outer relation can be very costly. If you experiment with
analytical queries, you can find cases where a nested loop materializes
zero tuples on its inner side. One such case is finding data gaps.
Also, this optimization already exists in the hash join logic.
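
For comparison, the hash join behaviour referred to here can be modelled as follows. This is a simplified Python sketch based on my understanding that an inner hash join can finish without reading its outer side when the hash table built from the inner side is empty; `hash_join_sketch`, `key`, and `hj_stats` are hypothetical names, not PostgreSQL identifiers.

```python
from collections import defaultdict


def hash_join_sketch(outer_rows, inner_rows, key, stats):
    """Toy inner hash join with an early exit for an empty inner side."""
    table = defaultdict(list)
    for inner in inner_rows:              # build phase
        table[key(inner)].append(inner)
    if not table:
        return                            # nothing can match: never touch the outer side
    for outer in outer_rows:              # probe phase
        stats["outer_fetched"] += 1
        for inner in table.get(key(outer), []):
            yield (outer, inner)


hj_stats = {"outer_fetched": 0}
hj_rows = list(hash_join_sketch(range(100_000), [], lambda row: row, hj_stats))
```

With an empty inner input the sketch returns before the probe loop, so the outer side is never read.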

> But that doesn't stop it
> from adding a bool to the generic PlanState struct, with global
> implications. What I'd expected from your text description is that
> nodeNestLoop would remember whether its inner child had returned zero rows
> the first time, and assume that subsequent executions could be skipped
> unless the inner child's parameters change.

This is the note I was waiting for. I agree with you that adding a bool
variable to PlanState is excessive. See the attachment for another version
of the optimization.


Attachments:

v2-0001-Skip-scan-of-outer-subtree-if-inner-of-the-NestedLoo.patch (text/x-patch; charset=UTF-8; +13 -5)