Possible documentation inaccuracy in optimizer README

Started by Zeyuan Huabout 1 year ago7 messagesdocs
Jump to latest
#1Zeyuan Hu
ferrishu3886@gmail.com

Hello,

In https://github.com/postgres/postgres/tree/master/src/backend/optimizer,
there are two examples on the dynamic programming (DP) algorithm used in
the optimizer, which I think
have some inaccuracy:

SELECT *
FROM tab1, tab2, tab3, tab4
WHERE tab1.col = tab2.col AND
tab2.col = tab3.col AND
tab3.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{2 3},{3 4}
{1 2 3},{2 3 4}
{1 2 3 4}
(other possibilities will be excluded for lack of join clauses)

SELECT *
FROM tab1, tab2, tab3, tab4
WHERE tab1.col = tab2.col AND
tab1.col = tab3.col AND
tab1.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{1 3},{1 4}
{1 2 3},{1 3 4},{1 2 4}
{1 2 3 4}

Both examples seem to ignore the transitivity imposed by the join clauses,
which creates a gap between the documentation here and what the actual
optimizer does.
For the first example, in the actual optimizer, {1,3},{1,4},{2,4} are also
considered. For the second example, {2,3}, {2,4}, {3,4} are also
considered.
The reason is that, for example 1, tab1.col = tab2.col AND tab2.col =
tab3.col imply that tab1.col = tab3.col due to transitivity, which leads to
the consideration of {1,3}.
The rest of the missing entries follow the same reasoning.

best
regards,

Zeyuan

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeyuan Hu (#1)
Re: Possible documentation inaccuracy in optimizer README

Zeyuan Hu <ferrishu3886@gmail.com> writes:

In https://github.com/postgres/postgres/tree/master/src/backend/optimizer,
there are two examples on the dynamic programming (DP) algorithm used in
the optimizer, which I think
have some inaccuracy:

You're right that these examples do not consider the effects of
clauses generated by the EquivalenceClass machinery. But I don't
think the exposition would be improved by mentioning that here.
The point of these examples is that we don't consider joining
rels that have no linking clauses at all.

We could possibly avoid the inaccuracy by making the examples use
some other operators that are not equijoins. But I wonder if that
would not be more confusing rather than less so.

regards, tom lane

#3Zeyuan Hu
ferrishu3886@gmail.com
In reply to: Tom Lane (#2)
Re: Possible documentation inaccuracy in optimizer README

Hello Tom,

Thanks for the clarification. I now understand the main goal of the
examples. I was confused by the remarks in the example "(other
possibilities will be excluded for lack of join clauses)" as this is not a
true statement: some possibilities are not shown for lack of join clauses
while others are not shown for the sake of simplicity. I think it would be
nice to add what you explained somewhere in the text to indicate this is a
partial example with the main goal of illustrating join rels that have no
linking clauses are not considered by the optimizer; I got the impression
that these two examples are to illustrate how DP works in the optimizer.

best
regards,

Zeyuan

On Mon, Apr 7, 2025 at 9:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Zeyuan Hu <ferrishu3886@gmail.com> writes:

In

https://github.com/postgres/postgres/tree/master/src/backend/optimizer,

there are two examples on the dynamic programming (DP) algorithm used in
the optimizer, which I think
have some inaccuracy:

You're right that these examples do not consider the effects of
clauses generated by the EquivalenceClass machinery. But I don't
think the exposition would be improved by mentioning that here.
The point of these examples is that we don't consider joining
rels that have no linking clauses at all.

We could possibly avoid the inaccuracy by making the examples use
some other operators that are not equijoins. But I wonder if that
would not be more confusing rather than less so.

regards, tom lane

#4David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#2)
Re: Possible documentation inaccuracy in optimizer README

On Tue, 8 Apr 2025 at 16:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

We could possibly avoid the inaccuracy by making the examples use
some other operators that are not equijoins. But I wonder if that
would not be more confusing rather than less so.

I don't think it'd hurt to mention that we're just ignoring the
existence of ECs for this example. Something like:

--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -143,7 +143,10 @@ For example:
     {1 2},{2 3},{3 4}
     {1 2 3},{2 3 4}
     {1 2 3 4}
-    (other possibilities will be excluded for lack of join clauses)
+    (other possibilities will be excluded for lack of join clauses
+    (technically, EquivalenceClasses do allow us to determine derived join
+    clauses for this case, but we ignore that for the simplicity of this
+    example))

SELECT *
FROM tab1, tab2, tab3, tab4

If it'll stop a future question or someone from being confused, it
seems worthwhile.

David

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#4)
Re: Possible documentation inaccuracy in optimizer README

David Rowley <dgrowleyml@gmail.com> writes:

I don't think it'd hurt to mention that we're just ignoring the
existence of ECs for this example.

Seems like a reasonable approach.

-    (other possibilities will be excluded for lack of join clauses)
+    (other possibilities will be excluded for lack of join clauses
+    (technically, EquivalenceClasses do allow us to determine derived join
+    clauses for this case, but we ignore that for the simplicity of this
+    example))

Maybe better:

Other possibilities will be excluded for lack of join clauses.
(In reality, use of EquivalenceClasses would allow us to
deduce additional join clauses that allow more join
combinations, but here we ignore that to preserve the
simplicity of this example.)

regards, tom lane

#6David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#5)
Re: Possible documentation inaccuracy in optimizer README

On Wed, 9 Apr 2025 at 14:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe better:

Other possibilities will be excluded for lack of join clauses.
(In reality, use of EquivalenceClasses would allow us to
deduce additional join clauses that allow more join
combinations, but here we ignore that to preserve the
simplicity of this example.)

Looks good to me.

David

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#6)
Re: Possible documentation inaccuracy in optimizer README

David Rowley <dgrowleyml@gmail.com> writes:

On Wed, 9 Apr 2025 at 14:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe better:

Other possibilities will be excluded for lack of join clauses.
(In reality, use of EquivalenceClasses would allow us to
deduce additional join clauses that allow more join
combinations, but here we ignore that to preserve the
simplicity of this example.)

Looks good to me.

OK, done. I moved the text after noticing that it really applies
to both of the examples here.

regards, tom lane