Optimize cardinality estimation when unique keys are fully covered
Hi,
Recently, I ran into a cardinality estimation problem where the query
predicates fully cover a primary key or unique index. The issue can be
reproduced with the following case:
```sql
create table t1(a int, b int, primary key(a, b));
DO $$
DECLARE
i integer;
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i, 0);
END LOOP;
END;
$$ LANGUAGE plpgsql;
DO $$
DECLARE
i integer;
BEGIN
FOR j IN 1..10 LOOP
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i%10+10*(j-1), i);
END LOOP;
END LOOP;
END;
$$ LANGUAGE plpgsql;
create table t2(a int);
insert into t2 select 1;
analyze t1, t2;
postgres=# explain analyze select * from t1 where a = 1 and b = 0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)
Recheck Cond: ((a = 1) AND (b = 0))
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on t1_pkey (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)
Index Cond: ((a = 1) AND (b = 0))
Index Searches: 1
Buffers: shared hit=3
Planning Time: 0.146 ms
Execution Time: 0.105 ms
(10 rows)
postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)
Buffers: shared hit=5
-> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)
Buffers: shared hit=1
-> Index Only Scan using t1_pkey on t1 (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
Index Cond: ((a = t2.a) AND (b = 0))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=4
Planning Time: 0.257 ms
Execution Time: 0.114 ms
(11 rows)
```
It can be observed that the Index Cond fully covers the primary key index.
Naturally, the correct rows estimate should be 1, but the optimizer
estimates the two conditions independently, resulting in an overestimated
row count. This overestimation has a greater impact in scenarios with
partitioned tables and multi-table joins, making the planner more inclined
to choose hash or merge joins.
We may consider checking in cardinality estimation whether the restrictlist
fully covers a unique index. If so, we can directly set the estimated rows
to 1. The attached patch provides a very early demo of this approach.
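The coverage test boils down to a subset check over attribute numbers, analogous to what the patch does with pull_eq_clauses_varattnos and bms_is_subset. A minimal sketch with hypothetical names:

```python
def clauses_cover_unique_index(eq_clause_attnos, index_key_attnos):
    """True when every key column of a unique/primary-key index has an
    equality clause, i.e. the clauses identify at most one row."""
    return set(index_key_attnos) <= set(eq_clause_attnos)

# t1(a, b) with primary key (a, b); attribute numbers: a=1, b=2
print(clauses_cover_unique_index({1, 2}, (1, 2)))  # a = 1 AND b = 0 -> True
print(clauses_cover_unique_index({1}, (1, 2)))     # a = 1 alone     -> False
```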
Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.
Attachment: optimize-cardinality-fully-covered-uk.patch (application/octet-stream)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8335cf5b5c5..92d65b40050 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5358,16 +5358,28 @@ void
set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
{
double nrows;
+ bool coveringUniquekey = false;
/* Should only be applied to base relations */
Assert(rel->relid > 0);
- nrows = rel->tuples *
- clauselist_selectivity(root,
- rel->baserestrictinfo,
- 0,
- JOIN_INNER,
- NULL);
+ if (rel->rtekind == RTE_RELATION)
+ {
+ RangeTblEntry *rte = rt_fetch(rel->relid, root->parse->rtable);
+
+ coveringUniquekey = clauses_covering_uniquekey(rte->relid, rel->relid,
+ rel->baserestrictinfo);
+ }
+
+ if (coveringUniquekey)
+ nrows = 1;
+ else
+ nrows = rel->tuples *
+ clauselist_selectivity(root,
+ rel->baserestrictinfo,
+ 0,
+ JOIN_INNER,
+ NULL);
rel->rows = clamp_row_est(nrows);
@@ -5390,6 +5402,7 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
{
List *allclauses;
double nrows;
+ bool coveringUniquekey = false;
/*
* Estimate the number of rows returned by the parameterized scan, knowing
@@ -5398,12 +5411,32 @@ get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
* non-join clauses during selectivity estimation.
*/
allclauses = list_concat_copy(param_clauses, rel->baserestrictinfo);
- nrows = rel->tuples *
- clauselist_selectivity(root,
- allclauses,
- rel->relid, /* do not use 0! */
- JOIN_INNER,
- NULL);
+
+ if (rel->rtekind == RTE_RELATION)
+ {
+ RangeTblEntry *rte = rt_fetch(rel->relid, root->parse->rtable);
+
+ coveringUniquekey = clauses_covering_uniquekey(rte->relid, rel->relid,
+ allclauses);
+ }
+
+ if (coveringUniquekey)
+ {
+ nrows = 1;
+
+ if (!rel->adjust_param_clauses)
+ {
+ rel->adjust_rows = nrows;
+ rel->adjust_param_clauses = allclauses;
+ }
+ }
+ else
+ nrows = rel->tuples *
+ clauselist_selectivity(root,
+ allclauses,
+ rel->relid, /* do not use 0! */
+ JOIN_INNER,
+ NULL);
nrows = clamp_row_est(nrows);
/* For safety, make sure result is not more than the base estimate */
if (nrows > rel->rows)
@@ -5440,14 +5473,41 @@ set_joinrel_size_estimates(PlannerInfo *root, RelOptInfo *rel,
SpecialJoinInfo *sjinfo,
List *restrictlist)
{
+ Cardinality outer_rows = outer_rel->rows;
+ Cardinality inner_rows = inner_rel->rows;
+ List *join_clauses = list_copy(restrictlist);
+ List *remove_clauses = NIL;
+ ListCell *l1;
+ ListCell *l2;
+
+ if (outer_rel->adjust_param_clauses)
+ {
+ remove_clauses = outer_rel->adjust_param_clauses;
+ outer_rows = outer_rel->adjust_rows;
+ }
+ else if (inner_rel->adjust_param_clauses)
+ {
+ remove_clauses = inner_rel->adjust_param_clauses;
+ inner_rows = inner_rel->adjust_rows;
+ }
+
+ foreach(l1, remove_clauses)
+ {
+ foreach(l2, join_clauses)
+ {
+ if (lfirst(l1) == lfirst(l2))
+ join_clauses = foreach_delete_current(join_clauses, l2);
+ }
+ }
+
rel->rows = calc_joinrel_size_estimate(root,
rel,
outer_rel,
inner_rel,
- outer_rel->rows,
- inner_rel->rows,
+ outer_rows,
+ inner_rows,
sjinfo,
- restrictlist);
+ join_clauses);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 1158bc194c3..1dbaa34f803 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -304,6 +304,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->all_partrels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->adjust_param_clauses = NIL;
+ rel->adjust_rows = 0;
/*
* Pass assorted information down the inheritance hierarchy.
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 540aa9628d7..cfb4a3fbf81 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -269,6 +269,8 @@ static bool get_actual_variable_endpoint(Relation heapRel,
static RelOptInfo *find_join_input_rel(PlannerInfo *root, Relids relids);
static double btcost_correlation(IndexOptInfo *index,
VariableStatData *vardata);
+static bool clauses_varattnos_covering_index(Oid indexOid, Bitmapset *varattnos);
+static Bitmapset *pull_eq_clauses_varattnos(List *clauses, Index rtIndex);
/* Define support routines for MCV hash tables */
#define SH_PREFIX MCVHashTable
@@ -7333,6 +7335,7 @@ genericcostestimate(PlannerInfo *root,
double qual_arg_cost;
List *selectivityQuals;
ListCell *l;
+ Bitmapset *varattnos;
/*
* If the index is partial, AND the index predicate with the explicitly
@@ -7365,11 +7368,15 @@ genericcostestimate(PlannerInfo *root,
}
}
- /* Estimate the fraction of main-table tuples that will be visited */
- indexSelectivity = clauselist_selectivity(root, selectivityQuals,
- index->rel->relid,
- JOIN_INNER,
- NULL);
+ varattnos = pull_eq_clauses_varattnos(selectivityQuals, index->rel->relid);
+ if (clauses_varattnos_covering_index(index->indexoid, varattnos))
+ indexSelectivity = 1.0 / index->rel->tuples;
+ else
+ /* Estimate the fraction of main-table tuples that will be visited */
+ indexSelectivity = clauselist_selectivity(root, selectivityQuals,
+ index->rel->relid,
+ JOIN_INNER,
+ NULL);
/*
* If caller didn't give us an estimate, estimate the number of index
@@ -9123,3 +9130,123 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*indexPages = index->pages;
}
+
+bool
+clauses_covering_uniquekey(Oid relid, Index rtIndex, List *clauses)
+{
+ ListCell *l;
+ Bitmapset *varattnos;
+ Relation relation;
+ List *indexOids;
+ bool covered = false;
+
+ varattnos = pull_eq_clauses_varattnos(clauses, rtIndex);
+ if (!varattnos)
+ return false;
+
+ relation = table_open(relid, AccessShareLock);
+ indexOids = RelationGetIndexList(relation);
+
+ foreach(l, indexOids)
+ {
+ if (clauses_varattnos_covering_index(lfirst_oid(l), varattnos))
+ {
+ covered = true;
+ break;
+ }
+ }
+
+ list_free(indexOids);
+ table_close(relation, AccessShareLock);
+
+ return covered;
+}
+
+static bool
+clauses_varattnos_covering_index(Oid indexOid, Bitmapset *varattnos)
+{
+ Relation indexRelation;
+ Bitmapset *index_attrs = NULL;
+ int i;
+ bool covered = false;
+
+ indexRelation = index_open(indexOid, AccessShareLock);
+
+ /* Must be a primary key or unique index */
+ if (!indexRelation->rd_index->indisprimary && !indexRelation->rd_index->indisunique)
+ {
+ index_close(indexRelation, AccessShareLock);
+ return false;
+ }
+
+ for (i = 0; i < indexRelation->rd_index->indnkeyatts; i++)
+ {
+ int attrnum = indexRelation->rd_index->indkey.values[i];
+
+ /*
+ * Collect all key attribute numbers from this index. Skip expression
+ * indexes (attrnum == 0) as they cannot be checked for coverage by
+ * simple attribute numbers.
+ */
+ if (attrnum != 0)
+ index_attrs = bms_add_member(index_attrs, attrnum);
+ }
+
+ /*
+ * Check if varattnos fully covers all attributes of this index. If so,
+ * this means the constraints are sufficient to uniquely identify rows.
+ */
+ if (bms_is_subset(index_attrs, varattnos))
+ covered = true;
+
+ bms_free(index_attrs);
+ index_close(indexRelation, AccessShareLock);
+
+ return covered;
+}
+
+
+static Bitmapset *
+pull_eq_clauses_varattnos(List *clauses, Index rtIndex)
+{
+ ListCell *l;
+ Bitmapset *varattnos = NULL;
+
+ foreach(l, clauses)
+ {
+ Node *clause = (Node *) lfirst(l);
+ RestrictInfo *rinfo;
+
+ if (!IsA(clause, RestrictInfo))
+ continue;
+
+ rinfo = (RestrictInfo *) clause;
+ clause = (Node *) rinfo->clause;
+
+ if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *expr = (OpExpr *) clause;
+
+ if (strcmp(get_opname(expr->opno), "=") != 0)
+ continue;
+
+ if (IsA(linitial(expr->args), Var))
+ {
+ Var *var = (Var *) linitial(expr->args);
+
+ if (var->varlevelsup == 0 && var->varno == rtIndex)
+ varattnos = bms_add_member(varattnos, var->varattno);
+ }
+
+ if (IsA(lsecond(expr->args), Var))
+ {
+ Var *var = (Var *) lsecond(expr->args);
+
+ if (var->varlevelsup == 0 && var->varno == rtIndex)
+ varattnos = bms_add_member(varattnos, var->varattno);
+ }
+ }
+ }
+
+ return varattnos;
+}
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 92fe2f531f7..74285e3709b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -769,8 +769,8 @@ cleanup_directories_atexit(void)
if (made_new_pgdata)
{
pg_log_info("removing data directory \"%s\"", pg_data);
- if (!rmtree(pg_data, true))
- pg_log_error("failed to remove data directory");
+ // if (!rmtree(pg_data, true))
+ // pg_log_error("failed to remove data directory");
}
else if (found_existing_pgdata)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 30d889b54c5..dba1064adc3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1122,6 +1122,9 @@ typedef struct RelOptInfo
/* extension state */
void **extension_state pg_node_attr(read_write_ignore);
int extension_state_allocated;
+
+ Cardinality adjust_rows;
+ List *adjust_param_clauses;
} RelOptInfo;
/*
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index fb4fa53363d..b6c0406e5ef 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -240,6 +240,7 @@ extern List *add_predicate_to_index_quals(IndexOptInfo *index,
extern void genericcostestimate(PlannerInfo *root, IndexPath *path,
double loop_count,
GenericCosts *costs);
+extern bool clauses_covering_uniquekey(Oid relid, Index rtIndex, List *clauses);
/* Functions in array_selfuncs.c */
On Nov 22, 2025, at 00:54, feichanghong <feichanghong@qq.com> wrote:
Apologies for the garbled text due to my email client. Please find the
reproduction SQL below again:
```sql
create table t1(a int, b int, primary key(a, b));
DO $$
DECLARE
i integer;
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i, 0);
END LOOP;
END;
$$ LANGUAGE plpgsql;
DO $$
DECLARE
i integer;
BEGIN
FOR j IN 1..10 LOOP
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i%10+10*(j-1), i);
END LOOP;
END LOOP;
END;
$$ LANGUAGE plpgsql;
create table t2(a int);
insert into t2 select 1;
analyze t1, t2;
postgres=# explain analyze select * from t1 where a = 1 and b = 0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)
Recheck Cond: ((a = 1) AND (b = 0))
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on t1_pkey (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)
Index Cond: ((a = 1) AND (b = 0))
Index Searches: 1
Buffers: shared hit=3
Planning Time: 0.146 ms
Execution Time: 0.105 ms
(10 rows)
postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)
Buffers: shared hit=5
-> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)
Buffers: shared hit=1
-> Index Only Scan using t1_pkey on t1 (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
Index Cond: ((a = t2.a) AND (b = 0))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=4
Planning Time: 0.257 ms
Execution Time: 0.114 ms
(11 rows)
```
Best Regards,
Fei Changhong
"feichanghong" <feichanghong@qq.com> writes:
We may consider checking in cardinality estimation whether the restrictlist
fully covers a unique index. If so, we can directly set the estimated rows
to 1. The attached patch provides a very early demo of this approach.
I think this is far harder than you believe; a simplistic approach
like this will mainly result in breaking things. The reason is that
we need to have the same estimate of the size of a join relation
regardless of how it is implemented (and, indeed, that size estimate
is made before we ever consider individual join paths). Otherwise the
fundamental model of generating different join paths and comparing
them for merit breaks down, either at this join level or higher ones.
But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
t1's rowcount is 1 only applies if the join is implemented as an inner
indexscan. If we choose some other method, say a hash join based on
a seqscan of t1, having forced that rowcount to 1 would produce
completely false estimates.
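The distinction can be sketched with toy numbers (purely illustrative): the per-loop estimate of a parameterized inner scan is a different quantity from the base rowcount that a seqscan-fed hash join needs.

```python
t1_rows = 1_100_000
sel_b0 = 100_000 / t1_rows  # selectivity of t1.b = 0

# Parameterized inner indexscan: rows *per outer row*; here "1" is correct.
rows_per_loop = 1

# Hash join over a seqscan of t1: the scan really does return ~100000 rows.
seqscan_rows = t1_rows * sel_b0
print(rows_per_loop, round(seqscan_rows))  # 1 100000

# Forcing t1's base rowcount to 1 would make the hash join look absurdly cheap.
```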
Now, what you are proposing would make the EXPLAIN output cosmetically
better, in that instead of
Nested Loop (cost=0.43..1.57 rows=32 width=12)
-> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4)
-> Index Only Scan using t1_pkey on t1 (cost=0.43..4.76 rows=32 width=8)
Index Cond: ((a = t2.a) AND (b = 0))
we would get a rowcount estimate of "1" for the parameterized t1 scan.
But the estimate for the output of the nestloop would have to remain
the same. And the t1 scan is not where the problem is: it already
made the right choice there. To the extent that this EXPLAIN output
is problematic, it's because if this join is part of a bigger query
then we may make poor choices at upper join levels due to having a bad
estimate of this join's output size. I don't see how this line of
work can fix that. (Yes, I see the hack you put into
set_joinrel_size_estimates. It's a hack not a workable solution,
because it will distort the size estimates in many cases that are
not quite what you have here.)
regards, tom lane
On Fri, Nov 21, 2025 at 01:52:07PM -0500, Tom Lane wrote:
But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
t1's rowcount is 1 only applies if the join is implemented as an inner
indexscan. If we choose some other method, say a hash join based on
a seqscan of t1, having forced that rowcount to 1 would produce
completely false estimates.
But wouldn't knowing that for an inner indexscan the cardinality is one
then drive the optimizer to choose inner indexscan over other methods?
Nico
On Nov 22, 2025, at 02:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"=?utf-8?B?ZmVpY2hhbmdob25n?=" <feichanghong@qq.com> writes:
We may consider checking in cardinality estimation whether the restrictlist
fully covers a unique index. If so, we can directly set the estimated rows
to 1. The attached patch provides a very early demo of this approach.
I think this is far harder than you believe; a simplistic approach
like this will mainly result in breaking things. The reason is that
we need to have the same estimate of the size of a join relation
regardless of how it is implemented (and, indeed, that size estimate
is made before we ever consider individual join paths). Otherwise the
fundamental model of generating different join paths and comparing
them for merit breaks down, either at this join level or higher ones.
But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
t1's rowcount is 1 only applies if the join is implemented as an inner
indexscan. If we choose some other method, say a hash join based on
a seqscan of t1, having forced that rowcount to 1 would produce
completely false estimates.
It should be noted that the row estimate for a base relation will only be
adjusted to "1" when the baserestrictinfo fully covers a unique index,
for example "WHERE a = 1 AND b = 0". See set_baserel_size_estimates for
details. In contrast, "WHERE t1.a = t2.a AND t1.b = 0" will only affect
the ppi_rows of a parameterized path; see get_parameterized_baserel_size
for details.
In addition, for indexscan and bitmapscan, I also adjusted the
indexSelectivity in genericcostestimate.
For example, in the following case: in the first SQL, the
baserestrictinfo covers the primary key, so its "rows=1". In the second
SQL, it does not, and the planner chooses a hash join with "rows=98045"
(note that we still adjusted the join row estimate through the
parameterized path).
```sql
postgres=# explain analyze select * from t1 where a = 1 and b = 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on t1 (cost=0.00..21367.77 rows=1 width=8) (actual time=0.027..260.106 rows=1.00 loops=1)
Filter: ((a = 1) AND (b = 0))
Rows Removed by Filter: 1099999
Buffers: shared hit=4868
Planning Time: 0.111 ms
Execution Time: 260.136 ms
(6 rows)
postgres=# set enable_indexscan to off;
SET
postgres=# set enable_bitmapscan to off;
SET
postgres=# set max_parallel_workers_per_gather to 0;
SET
postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.02..18986.82 rows=1 width=12) (actual time=0.042..280.086 rows=1.00 loops=1)
Hash Cond: (t1.a = t2.a)
Buffers: shared hit=4869
-> Seq Scan on t1 (cost=0.00..18617.81 rows=98045 width=8) (actual time=0.025..264.100 rows=100000.00 loops=1)
Filter: (b = 0)
Rows Removed by Filter: 1000000
Buffers: shared hit=4868
-> Hash (cost=1.01..1.01 rows=1 width=4) (actual time=0.010..0.012 rows=1.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.006..0.007 rows=1.00 loops=1)
Buffers: shared hit=1
Planning Time: 0.210 ms
Execution Time: 280.137 ms
(14 rows)
```
Now, what you are proposing would make the EXPLAIN output cosmetically
better, in that instead of
Nested Loop (cost=0.43..1.57 rows=32 width=12)
-> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4)
-> Index Only Scan using t1_pkey on t1 (cost=0.43..4.76 rows=32 width=8)
Index Cond: ((a = t2.a) AND (b = 0))
we would get a rowcount estimate of "1" for the parameterized t1 scan.
But the estimate for the output of the nestloop would have to remain
the same. And the t1 scan is not where the problem is: it already
made the right choice there. To the extent that this EXPLAIN output
is problematic, it's because if this join is part of a bigger query
then we may make poor choices at upper join levels due to having a bad
estimate of this join's output size. I don't see how this line of
work can fix that. (Yes, I see the hack you put into
set_joinrel_size_estimates. It's a hack not a workable solution,
because it will distort the size estimates in many cases that are
not quite what you have here.)
For join cardinality estimation, we still follow the principle that the
row estimate must remain consistent regardless of which path is chosen;
that invariant is maintained in set_joinrel_size_estimates. The only
adjustment is: if we know that pushing some join clauses down to a
lower-level node yields a more accurate row estimate (for example, the
clause "t1.a = t2.a" in "t1.a = t2.a AND t1.b = 0"), then we use the
post-pushdown row estimate for the lower node and remove those join
clauses when computing the join selectivity.
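A minimal sketch of that adjustment (hypothetical names; the real code works with RestrictInfo lists): clauses already absorbed by the unique-key lookup are dropped before computing join selectivity, and the adjusted inner row count is used instead.

```python
def adjusted_join_rows(outer_rows, inner_rows_adjusted, restrictlist,
                       absorbed_clauses, clause_sel):
    # Drop join clauses the parameterized unique-key scan already accounted for.
    remaining = [c for c in restrictlist if c not in absorbed_clauses]
    sel = 1.0
    for c in remaining:
        sel *= clause_sel[c]
    return max(1, round(outer_rows * inner_rows_adjusted * sel))

# "t1.a = t2.a" was absorbed by the covered-unique-key scan of t1 (rows = 1):
rows = adjusted_join_rows(outer_rows=1, inner_rows_adjusted=1,
                          restrictlist=["t1.a = t2.a"],
                          absorbed_clauses=["t1.a = t2.a"],
                          clause_sel={"t1.a = t2.a": 0.01})
print(rows)  # 1
```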
Of course, this is just a demo of an initial idea, and there are many
aspects that need improvement. Many thanks for your suggestions and
comments.
Best Regards,
Fei Changhong