ERROR: corrupt MVNDistinct entry
I ran into this error in estimate_multivariate_ndistinct, and it can
be reproduced with the query below.
create table t (a int, b int);
insert into t select 1, 1;
create statistics s (ndistinct) on a, b from t;
analyze;
explain select 1 from t t1
left join (select a c1, coalesce(a) c2 from t t2) s on true
group by s.c1, s.c2;
ERROR: corrupt MVNDistinct entry
And the first bad commit is:
2489d76c4906f4461a364ca8ad7e0751ead8aa0d is the first bad commit
commit 2489d76c4906f4461a364ca8ad7e0751ead8aa0d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon Jan 30 13:16:20 2023 -0500
Make Vars be outer-join-aware.
So in this query, there are two grouping expressions: s.c1 is Var t2.a
with nullingrels set to {3}; s.c2 is a PHV with nullingrels also being
{3}, and its contained expression is Var t2.a with empty nullingrels.
This eventually leads to estimate_num_groups creating two separate
GroupVarInfos for Var t2.a: one with nullingrels {3}, and another with
empty nullingrels. As a result, estimate_multivariate_ndistinct
incorrectly assumes there are two matching expressions. When it later
fails to find the exact match for the combination, it mistakenly
concludes that there is a corrupt MVNDistinct entry.
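The double-counting can be illustrated with a toy model (the ToyVar struct and helpers below are hypothetical stand-ins, not the planner's real Var and GroupVarInfo nodes; only the fields relevant to this bug are modeled):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Var: just enough fields to show the problem. */
typedef struct ToyVar
{
    int      varno;        /* which relation the Var comes from */
    int      varattno;     /* which column */
    uint64_t nullingrels;  /* bitmask of outer joins that can null it */
} ToyVar;

/* Like the planner's equal(): compares all fields, nullingrels included. */
static bool toy_equal(const ToyVar *a, const ToyVar *b)
{
    return a->varno == b->varno &&
           a->varattno == b->varattno &&
           a->nullingrels == b->nullingrels;
}

/* Count unique vars the way add_unique_group_var does: skip a var only
 * if an equal one is already in the list.  Optionally strip nullingrels
 * first, as the proposed fix does. */
static int count_unique(ToyVar *vars, int n, bool strip_nullingrels)
{
    ToyVar seen[16];
    int    nseen = 0;

    for (int i = 0; i < n; i++)
    {
        ToyVar v = vars[i];
        bool   dup = false;

        if (strip_nullingrels)
            v.nullingrels = 0;  /* the remove_nulling_relids step */
        for (int j = 0; j < nseen; j++)
            if (toy_equal(&seen[j], &v))
            {
                dup = true;
                break;
            }
        if (!dup)
            seen[nseen++] = v;
    }
    return nseen;
}
```

With t2.a appearing once with nullingrels {3} and once with empty nullingrels, the naive comparison yields two "distinct" vars; stripping nullingrels first collapses them into one.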
It seems to me that when estimating the number of groups, we do not
need to concern ourselves with the outer joins that could null the
Vars/PHVs contained in the grouping expressions, and we should not
count the same Var more than once.
So I wonder if we can fix this issue by removing the nullingrels
within the grouping expressions first in estimate_num_groups. Such as:
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -121,6 +121,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
#include "statistics/statistics.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
@@ -3446,6 +3447,10 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
if (groupExprs == NIL || (pgset && *pgset == NIL))
return 1.0;
+ groupExprs = (List *) remove_nulling_relids((Node *) groupExprs,
+ root->outer_join_rels,
+ NULL);
+
/*
* Count groups derived from boolean grouping expressions. For other
* expressions, find the unique Vars used, treating an expression as a Var
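For reference, the effect of remove_nulling_relids can be sketched with a toy tree walker (hypothetical ToyNode struct; the real function is a mutator over planner Nodes that clears varnullingrels/phnullingrels bits belonging to the given relid set):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy expression node: a nullingrels bitmask plus child pointers. */
typedef struct ToyNode
{
    uint64_t        nullingrels;  /* outer-join bits marked on this node */
    struct ToyNode *left;
    struct ToyNode *right;
} ToyNode;

/* Recursively clear the requested nullingrels bits, leaving others. */
static void toy_remove_nulling_relids(ToyNode *node, uint64_t removable)
{
    if (node == NULL)
        return;
    node->nullingrels &= ~removable;  /* strip only the requested bits */
    toy_remove_nulling_relids(node->left, removable);
    toy_remove_nulling_relids(node->right, removable);
}
```

Passing root->outer_join_rels as the removable set, as in the snippet above, strips every outer-join marking from the grouping expressions before they are examined for statistics.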
Any thoughts?
Thanks
Richard
On 12/24/24 15:00, Richard Guo wrote:
Any thoughts?
I have a couple of notes.
1. The nulling_relids provides us with sensible information about possible
nulls inside the input. We are not using it to estimate the number of
such nulls for now. Does your idea consist of obtaining 'clean'
statistics and reusing the nulling_relids knowledge somewhere later?
2. It is ok for Vars. But what about expressions? We use equal() in
distinct, MCV and dependencies modules. Do we need to remove nulls
before using extended statistics as a general rule?
--
regards, Andrei Lepikhov
On Tue, Dec 24, 2024 at 5:00 PM Richard Guo <guofenglinux@gmail.com> wrote:
It seems to me that when estimating the number of groups, we do not
need to concern ourselves with the outer joins that could null the
Vars/PHVs contained in the grouping expressions, and we should not
count the same Var more than once.
Furthermore, accounting for nullingrels in the grouping expressions
could prevent us from matching a grouping expression to expressional
index columns or to the expressions in extended statistics, as these
expressions are not decorated with any nullingrels bits. As an
example, consider
create table t (a int, b int);
insert into t select i%10, i%10 from generate_series(1,1000)i;
create statistics s (ndistinct) on (a+b), (a-b) from t;
analyze;
-- after v16
explain (costs on)
select count(*) from t t1 left join t t2 on t1.a = t2.a group by t2.a+t2.b;
QUERY PLAN
--------------------------------------------------------------------------
HashAggregate (cost=1920.00..1921.25 rows=100 width=12)
Group Key: (t2.a + t2.b)
-> Hash Left Join (cost=27.50..1420.00 rows=100000 width=4)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=4)
-> Hash (cost=15.00..15.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(7 rows)
-- before v16
explain (costs on)
select count(*) from t t1 left join t t2 on t1.a = t2.a group by t2.a+t2.b;
QUERY PLAN
--------------------------------------------------------------------------
HashAggregate (cost=1920.00..1920.12 rows=10 width=12)
Group Key: (t2.a + t2.b)
-> Hash Left Join (cost=27.50..1420.00 rows=100000 width=4)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=4)
-> Hash (cost=15.00..15.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(7 rows)
Attached is a more formal patch to strip out all the nullingrels from
the grouping expressions before we estimate the number of groups.
Thanks
Richard
Attachments:
v1-0001-Do-not-account-for-nullingrels-when-estimating-number-of-groups.patch
From d144852e78aae6cff4dde85ac5e21cf92c5f8c2d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 25 Dec 2024 12:34:18 +0900
Subject: [PATCH v1] Do not account for nullingrels when estimating number of
groups
When estimating number of groups, we do not need to concern ourselves
with the outer joins that could null the Vars/PHVs contained in the
grouping expressions. Accounting for nullingrels in the grouping
expressions could cause estimate_num_groups to count the same Var
multiple times if it's marked with different nullingrels. This is
incorrect, and could lead to "ERROR: corrupt MVNDistinct entry" when
searching for multivariate n-distinct.
Furthermore, accounting for nullingrels in the grouping expressions
could prevent us from matching a grouping expression to expressional
index columns or to the expressions in extended statistics, leading to
inaccurate estimates.
To fix, strip out all the nullingrels from the grouping expressions
before we estimate number of groups.
---
src/backend/utils/adt/selfuncs.c | 12 ++++++++++++
src/test/regress/expected/join.out | 21 +++++++++++++++++++++
src/test/regress/sql/join.sql | 13 +++++++++++++
3 files changed, 46 insertions(+)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 08fa6774d9..57c61963d2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -121,6 +121,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
#include "statistics/statistics.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
@@ -3446,6 +3447,17 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
if (groupExprs == NIL || (pgset && *pgset == NIL))
return 1.0;
+ /*
+ * Strip out all the nullingrels bits from the grouping expressions.
+ * These nullingrels bits could cause the same Var to be counted multiple
+ * times if it's marked with different nullingrels. They could also
+ * prevent us from matching a grouping expression to expressional index
+ * columns or to the expressions in extended statistics.
+ */
+ groupExprs = (List *) remove_nulling_relids((Node *) copyObject(groupExprs),
+ root->outer_join_rels,
+ NULL);
+
/*
* Count groups derived from boolean grouping expressions. For other
* expressions, find the unique Vars used, treating an expression as a Var
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 0c9b312eaf..ef737a4946 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -8217,3 +8217,24 @@ SELECT * FROM rescan_bhs t1 LEFT JOIN rescan_bhs t2 ON t1.a IN
RESET enable_seqscan;
RESET enable_indexscan;
+-- Test that we do not account for nullingrels when estimating number of groups
+CREATE TABLE group_tbl (a INT, b INT);
+INSERT INTO group_tbl SELECT 1, 1;
+CREATE STATISTICS group_tbl_stat (ndistinct) ON a, b FROM group_tbl;
+ANALYZE group_tbl;
+EXPLAIN (COSTS OFF)
+SELECT 1 FROM group_tbl t1
+ LEFT JOIN (SELECT a c1, COALESCE(a) c2 FROM group_tbl t2) s ON TRUE
+GROUP BY s.c1, s.c2;
+ QUERY PLAN
+--------------------------------------------
+ Group
+ Group Key: t2.a, (COALESCE(t2.a))
+ -> Sort
+ Sort Key: t2.a, (COALESCE(t2.a))
+ -> Nested Loop Left Join
+ -> Seq Scan on group_tbl t1
+ -> Seq Scan on group_tbl t2
+(7 rows)
+
+DROP TABLE group_tbl;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8cfc1053cb..ff6ca94aa2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -3033,3 +3033,16 @@ SELECT * FROM rescan_bhs t1 LEFT JOIN rescan_bhs t2 ON t1.a IN
RESET enable_seqscan;
RESET enable_indexscan;
+
+-- Test that we do not account for nullingrels when estimating number of groups
+CREATE TABLE group_tbl (a INT, b INT);
+INSERT INTO group_tbl SELECT 1, 1;
+CREATE STATISTICS group_tbl_stat (ndistinct) ON a, b FROM group_tbl;
+ANALYZE group_tbl;
+
+EXPLAIN (COSTS OFF)
+SELECT 1 FROM group_tbl t1
+ LEFT JOIN (SELECT a c1, COALESCE(a) c2 FROM group_tbl t2) s ON TRUE
+GROUP BY s.c1, s.c2;
+
+DROP TABLE group_tbl;
--
2.43.0
On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
I have a couple of notes.
1. The nulling_relids provides us with sensible information about possible
nulls inside the input. We are not using it to estimate the number of
such nulls for now. Does your idea consist of obtaining 'clean'
statistics and reusing the nulling_relids knowledge somewhere later?
Are you referring to the nullfrac estimates? A RelOptInfo's
nulling_relids records all the outer joins above this rel that can null
it. However, I cannot see how that helps with nullfrac.
2. It is ok for Vars. But what about expressions? We use equal() in
distinct, MCV and dependencies modules. Do we need to remove nulls
before using extended statistics as a general rule?
AFAIU, the expressions in extended statistics are not decorated with
any nullingrels bits, are they?
Thanks
Richard
On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
2. It is ok for Vars. But what about expressions? We use equal() in
distinct, MCV and dependencies modules. Do we need to remove nulls
before using extended statistics as a general rule?
AFAIU, the expressions in extended statistics are not decorated with
any nullingrels bits, are they?
I've just realized that there may be other places with similar issues,
not just in estimate_num_groups. For instance,
-- after v16
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
QUERY PLAN
--------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..15032.50 rows=5000 width=16)
Filter: ((t2.a + t2.b) IS NULL)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8)
-> Materialize (cost=0.00..20.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(5 rows)
-- before v16
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
QUERY PLAN
--------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..15032.50 rows=1 width=16)
Filter: ((t2.a + t2.b) IS NULL)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8)
-> Materialize (cost=0.00..20.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(5 rows)
In v16 and later, the nullingrels within the expression "t2.a + t2.b"
prevent it from being matched to the corresponding expression in
extended statistics, forcing us to use DEFAULT_UNK_SEL (0.005).
It seems that we need to strip out the nullingrels bits from
expressions before matching them to extended statistics or
expressional index columns in more places.
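The rows=5000 in the v16 plan above is exactly that fallback at work: the join's input is 1000 x 1000 = 1,000,000 rows, and the unmatched clause gets DEFAULT_UNK_SEL. A minimal arithmetic sketch (the constant value is assumed to match the one in selfuncs.h):

```c
#include <assert.h>

/* Fallback selectivity used when no statistics match (selfuncs.h). */
#define DEFAULT_UNK_SEL 0.005

/* Row estimate for a filter: input rows times clause selectivity. */
static double filtered_rows(double input_rows, double selectivity)
{
    return input_rows * selectivity;
}
```

With the extended statistics matched instead, the planner can use the actual n-distinct data for (a+b), which is how the pre-v16 plan arrives at rows=1.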
Thanks
Richard
On 25/12/2024 16:36, Richard Guo wrote:
On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
2. It is ok for Vars. But what about expressions? We use equal() in
distinct, MCV and dependencies modules. Do we need to remove nulls
before using extended statistics as a general rule?
AFAIU, the expressions in extended statistics are not decorated with
any nullingrels bits, are they?
I've just realized that there may be other places with similar issues,
not just in estimate_num_groups. For instance,
I'm pleased to see that you've grasped my initially unclear idea. Yeah,
it seems that all types of statistics may be lost because of varnullingrels.
In v16 and later, the nullingrels within the expression "t2.a + t2.b"
prevent it from being matched to the corresponding expression in
extended statistics, forcing us to use DEFAULT_UNK_SEL (0.005).
It seems that we need to strip out the nullingrels bits from
expressions before matching them to extended statistics or
expressional index columns in more places.
I think Tomas Vondra may have a decisive opinion here: we have
already briefly discussed some approaches to estimating the NULLs
generated by the RHS of a left join.
Maybe we can commit a quick cure like the one provided in your patch,
but we should remember this example: it is not apparent to me how to
estimate a group of clauses when some of the Vars have varnullingrels
and some of them don't.
Also, I think this is a good example of why an EXPLAIN ANALYZE summary
could include some sort of extended-statistics usage report. It could
help to clearly identify cases where extended statistics should apply
but don't, similar to what is already implemented in SQL Server.
--
regards, Andrei Lepikhov
On Wed, Dec 25, 2024 at 6:36 PM Richard Guo <guofenglinux@gmail.com> wrote:
In v16 and later, the nullingrels within the expression "t2.a + t2.b"
prevent it from being matched to the corresponding expression in
extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).
Furthermore, even without extended statistics or expressional index
columns, the nullingrels can still cause issues with selectivity
estimates. They may cause examine_variable to fail to identify a
valid RelOptInfo for an expression, making the relation size
unavailable, which is required in many cases. For instance,
create table t (a int, b int);
insert into t select i, i from generate_series(1,10)i;
analyze t;
-- in v16 and later
explain (costs on)
select * from t t1
left join t t2 on true
left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
QUERY PLAN
-----------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..4.94 rows=1 width=24)
Join Filter: (t1.a = t3.a)
-> Nested Loop Left Join (cost=0.00..3.73 rows=1 width=16)
Filter: (t1.a = COALESCE(t2.a))
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=8)
-> Materialize (cost=0.00..1.15 rows=10 width=8)
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=8)
-> Seq Scan on t t3 (cost=0.00..1.12 rows=7 width=8)
Filter: (a < 8)
(9 rows)
-- in v15 and before
explain (costs on)
select * from t t1
left join t t2 on true
left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
QUERY PLAN
-----------------------------------------------------------------------
Hash Left Join (cost=1.21..5.04 rows=10 width=24)
Hash Cond: (t1.a = t3.a)
-> Nested Loop Left Join (cost=0.00..3.73 rows=10 width=16)
Filter: (t1.a = COALESCE(t2.a))
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=8)
-> Materialize (cost=0.00..1.15 rows=10 width=8)
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=8)
-> Hash (cost=1.12..1.12 rows=7 width=8)
-> Seq Scan on t t3 (cost=0.00..1.12 rows=7 width=8)
Filter: (a < 8)
(10 rows)
In v16 and later, when calculating the join selectivity for "t1.a =
coalesce(t2.a)", eqjoinsel sets nd2 — the number of distinct values of
coalesce(t2.a) — to DEFAULT_NUM_DISTINCT (200) because the
corresponding RelOptInfo is not identifiable. This results in very
inaccurate join selectivity.
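The two row estimates can be reconstructed from a simplified no-MCV branch of eqjoinsel (a sketch under stated assumptions; the real function also folds in null fractions and MCV lists): selectivity is 1/max(nd1, nd2), and row estimates are clamped to a minimum of one row.

```c
#include <assert.h>

/* Default used when the number of distinct values is unknown. */
#define DEFAULT_NUM_DISTINCT 200

/* Simplified no-MCV eqjoinsel: selectivity = 1 / max(nd1, nd2). */
static double eqjoinsel_nomcv(double nd1, double nd2)
{
    double ndmax = nd1 > nd2 ? nd1 : nd2;

    return 1.0 / ndmax;
}

/* The planner never estimates fewer than one row (clamp_row_est). */
static double clamp_rows(double rows)
{
    return rows < 1.0 ? 1.0 : rows;
}
```

With 10 x 10 = 100 input rows, nd1 = 10 for t1.a and nd2 falling back to 200 gives 100/200 = 0.5, clamped to the rows=1 seen in the v16 plan; with the correct nd2 = 10 the estimate is 100/10 = 10 rows, as in the pre-v16 plan.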
I'm wondering if we also need to strip out the nullingrels from the
expression in examine_variable(). I tried doing so and noticed a plan
diff in regression test join.sql.
@@ -2573,10 +2573,11 @@
-> Materialize
-> Seq Scan on int4_tbl t2
Filter: (f1 > 1)
- -> Seq Scan on int4_tbl t3
+ -> Materialize
+ -> Seq Scan on int4_tbl t3
-> Materialize
-> Seq Scan on int4_tbl t4
-(13 rows)
+(14 rows)
Thanks
Richard
On Fri, Dec 27, 2024 at 1:16 PM Richard Guo <guofenglinux@gmail.com> wrote:
I'm wondering if we also need to strip out the nullingrels from the
expression in examine_variable(). I tried doing so and noticed a plan
diff in regression test join.sql.
Here is an updated patch that implements this change, and also moves
the removal of the nullingrels within grouping expressions from
estimate_num_groups to add_unique_group_var.
I'm not quite sure how this change affects the exprs_known_equal call
in add_unique_group_var, possibly making it even less precise (it is
already somewhat fuzzy, as we do not provide a valid opfamily).
However, I think it should be fine for estimation purposes. Moreover,
in many cases, such as with groupClause, distinctClause, and
WindowClause's PARTITION BY clauses, we have already removed any items
that can be proven redundant.
Regarding the back-patch, this patch is a bug fix, which suggests it
should be back-patched. However, it also changes some plans by fixing
the cost estimation. Does anyone know what our usual approach is in
this situation?
Thanks
Richard
Attachments:
v2-0001-Ignore-nullingrels-when-looking-up-statistics.patch
From 1a8786a75666094d015b33abca4dbb95a0ad72a8 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 31 Dec 2024 12:29:36 +0900
Subject: [PATCH v2] Ignore nullingrels when looking up statistics
When looking up statistical data about an expression, we do not need
to concern ourselves with the outer joins that could null the
Vars/PHVs contained in the expression. Accounting for nullingrels in
the expression could cause estimate_num_groups to count the same Var
multiple times if it's marked with different nullingrels. This is
incorrect, and could lead to "ERROR: corrupt MVNDistinct entry" when
searching for multivariate n-distinct.
Furthermore, the nullingrels could prevent us from matching an
expression to expressional index columns or to the expressions in
extended statistics, leading to inaccurate estimates.
To fix, strip out all the nullingrels from the expression before we
look up statistical data about it.
---
src/backend/utils/adt/selfuncs.c | 30 +++++++++++++++++++++++++++---
src/test/regress/expected/join.out | 26 ++++++++++++++++++++++++--
src/test/regress/sql/join.sql | 13 +++++++++++++
3 files changed, 64 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 08fa6774d9..aa38b60da7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -121,6 +121,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
#include "statistics/statistics.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
@@ -3306,6 +3307,15 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
ndistinct = get_variable_numdistinct(vardata, &isdefault);
+ /*
+ * The nullingrels bits within the var could cause the same var to be
+ * counted multiple times if it is marked with different nullingrels.
+ * They could also prevent us from matching the var to the expressions in
+ * extended statistics (see estimate_multivariate_ndistinct). So strip
+ * them out first.
+ */
+ var = remove_nulling_relids(var, root->outer_join_rels, NULL);
+
foreach(lc, varinfos)
{
varinfo = (GroupVarInfo *) lfirst(lc);
@@ -5077,7 +5087,13 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
{
int relid;
- if (bms_get_singleton_member(varnos, &relid))
+ /*
+ * Check if the expression is in vars of a single base relation. We
+ * need to exclude the OJs that might be present in the expression.
+ */
+ if (bms_get_singleton_member(bms_difference(varnos,
+ root->outer_join_rels),
+ &relid))
{
if (varRelid == 0 || varRelid == relid)
{
@@ -5107,8 +5123,6 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
}
}
- bms_free(varnos);
-
vardata->var = node;
vardata->atttype = exprType(node);
vardata->atttypmod = exprTypmod(node);
@@ -5132,6 +5146,14 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
ListCell *slist;
Oid userid;
+ /*
+ * The nullingrels bits within the expression could prevent us from
+ * matching it to expressional index columns or to the expressions in
+ * extended statistics. So strip them out first.
+ */
+ if (bms_overlap(varnos, root->outer_join_rels))
+ node = remove_nulling_relids(node, root->outer_join_rels, NULL);
+
/*
* Determine the user ID to use for privilege checks: either
* onerel->userid if it's set (e.g., in case we're accessing the table
@@ -5402,6 +5424,8 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
}
}
}
+
+ bms_free(varnos);
}
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 0c9b312eaf..079fcf46f0 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2573,10 +2573,11 @@ where t1.f1 = coalesce(t2.f1, 1);
-> Materialize
-> Seq Scan on int4_tbl t2
Filter: (f1 > 1)
- -> Seq Scan on int4_tbl t3
+ -> Materialize
+ -> Seq Scan on int4_tbl t3
-> Materialize
-> Seq Scan on int4_tbl t4
-(13 rows)
+(14 rows)
explain (costs off)
select * from int4_tbl t1
@@ -8217,3 +8218,24 @@ SELECT * FROM rescan_bhs t1 LEFT JOIN rescan_bhs t2 ON t1.a IN
RESET enable_seqscan;
RESET enable_indexscan;
+-- Test that we do not account for nullingrels when looking up statistics
+CREATE TABLE group_tbl (a INT, b INT);
+INSERT INTO group_tbl SELECT 1, 1;
+CREATE STATISTICS group_tbl_stat (ndistinct) ON a, b FROM group_tbl;
+ANALYZE group_tbl;
+EXPLAIN (COSTS OFF)
+SELECT 1 FROM group_tbl t1
+ LEFT JOIN (SELECT a c1, COALESCE(a) c2 FROM group_tbl t2) s ON TRUE
+GROUP BY s.c1, s.c2;
+ QUERY PLAN
+--------------------------------------------
+ Group
+ Group Key: t2.a, (COALESCE(t2.a))
+ -> Sort
+ Sort Key: t2.a, (COALESCE(t2.a))
+ -> Nested Loop Left Join
+ -> Seq Scan on group_tbl t1
+ -> Seq Scan on group_tbl t2
+(7 rows)
+
+DROP TABLE group_tbl;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8cfc1053cb..779d56cb30 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -3033,3 +3033,16 @@ SELECT * FROM rescan_bhs t1 LEFT JOIN rescan_bhs t2 ON t1.a IN
RESET enable_seqscan;
RESET enable_indexscan;
+
+-- Test that we do not account for nullingrels when looking up statistics
+CREATE TABLE group_tbl (a INT, b INT);
+INSERT INTO group_tbl SELECT 1, 1;
+CREATE STATISTICS group_tbl_stat (ndistinct) ON a, b FROM group_tbl;
+ANALYZE group_tbl;
+
+EXPLAIN (COSTS OFF)
+SELECT 1 FROM group_tbl t1
+ LEFT JOIN (SELECT a c1, COALESCE(a) c2 FROM group_tbl t2) s ON TRUE
+GROUP BY s.c1, s.c2;
+
+DROP TABLE group_tbl;
--
2.43.0
On Tue, Dec 31, 2024 at 5:40 PM Richard Guo <guofenglinux@gmail.com> wrote:
Regarding the back-patch, this patch is a bug fix, which suggests it
should be back-patched. However, it also changes some plans by fixing
the cost estimation. Does anyone know what our usual approach is in
this situation?
Although this patch could result in plan changes, it fixes an actual
bug, so I've back-patched it to v16.
Thanks
Richard