From d9a633243aa7a2c3dc4e6b4b9c75a93823730d0d Mon Sep 17 00:00:00 2001 From: "David G. Johnston" Date: Thu, 9 Jun 2022 15:14:20 +0000 Subject: [PATCH] doc: Make selectivity example match text; rewrite proof in code --- doc/src/sgml/planstats.sgml | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml index df85ea5eea..bdd2f4afd0 100644 --- a/doc/src/sgml/planstats.sgml +++ b/doc/src/sgml/planstats.sgml @@ -391,18 +391,20 @@ tablename | null_frac | n_distinct | most_common_vals In this case there is no MCV information for - unique2 because all the values appear to be - unique, so we use an algorithm that relies only on the number of - distinct values for both relations together with their null fractions: + unique2 and all the values appear to be + unique (n_distinct = -1), so we use an algorithm that relies on the row + count estimates for both relations (num_rows, not shown, but "tenk") + together with the column null fractions (zero for both): -selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2) +selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1, num_rows2) = (1 - 0) * (1 - 0) / max(10000, 10000) = 0.0001 This is, subtract the null fraction from one for each of the relations, - and divide by the maximum of the numbers of distinct values. + and divide by the row count of the larger relation (this value does get + scaled in the non-unique case). The number of rows that the join is likely to emit is calculated as the cardinality of the Cartesian product of the two inputs, multiplied by the -- 2.25.1