Adjust tuples estimate for appendrels
(This was briefly discussed in [1], which primarily focuses on the
incremental sort regression. So I'm starting a new thread for this topic.)
In set_append_rel_size(), we currently set rel->tuples to rel->rows
for an appendrel. Generally, rel->tuples is the raw number of tuples
in the relation and rel->rows is the estimated number of tuples after
the relation's restriction clauses have been applied. Although an
appendrel itself doesn't directly enforce any quals today, its child
relations may. Therefore, setting rel->tuples equal to rel->rows for
an appendrel isn't always appropriate.
AFAICS, doing so can lead to issues in cost estimates. For instance,
when estimating the number of distinct values from an appendrel, we
would not be able to adjust the estimate based on the restriction
selectivity (see estimate_num_groups()).
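As I read estimate_num_groups() in src/backend/utils/adt/selfuncs.c, the adjustment in question is the random-sampling formula d * (1 - ((N - n)/N)^(N/d)), where N is rel->tuples and n is rel->rows; when the two are equal the multiplier is 1 and the clamp becomes a no-op. A rough Python sketch, using round numbers from the example below (the function and variable names are mine, not PostgreSQL's):

```python
import math

def adjusted_ndistinct(reldistinct, reltuples, rows):
    """Sketch of the clamping step in estimate_num_groups(): given
    reldistinct distinct values among reltuples raw tuples, estimate how
    many distinct values survive a filter that keeps `rows` tuples,
    assuming tuples are removed at random."""
    if reldistinct <= 0 or reltuples <= 0 or rows >= reltuples:
        return reldistinct
    return reldistinct * (1.0 - math.pow((reltuples - rows) / reltuples,
                                         reltuples / reldistinct))

# Pre-patch appendrel: rel->tuples was set to rel->rows, so the base of
# the power is 0, the multiplier is 1, and no adjustment happens.
print(adjusted_ndistinct(91716, 100000, 100000))       # 91716.0

# Post-patch: with the raw tuple count (1M), the estimate is scaled
# down, landing near the planner's 62422 rows from the example below.
print(round(adjusted_ndistinct(91716, 1000000, 100000)))
```

This is only an illustration of why rel->tuples > rel->rows matters for the clamp, not a reimplementation of the planner's full group-estimation logic.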
Attached is a patch that sets an appendrel's tuples to the total
number of tuples accumulated from each live child, which I believe
aligns better with reality.
Here's a simple example that demonstrates how this change improves
cost estimates in certain cases.
create table p (a int, b int, c float) partition by range(a);
create table p1 partition of p for values from (0) to (1000);
create table p2 partition of p for values from (1000) to (2000);
insert into p select i%2000, random(1, 100000), random(1, 100000) from
generate_series(1, 1000000)i;
analyze p;
explain analyze select b from p where c < 10000 group by b;
-- without this patch
HashAggregate (cost=18651.38..19568.54 rows=91716 width=4)
(actual time=467.859..487.227 rows=63346 loops=1)
-- with this patch
HashAggregate (cost=18651.38..19275.60 rows=62422 width=4)
(actual time=447.383..466.351 rows=63346 loops=1)
Unfortunately, I failed to come up with a stable test case that shows
a plan diff with this change. So the attached patch does not include
a test case for now.
BTW, the comment within set_append_rel_size() says that:
/*
* Set "raw tuples" count equal to "rows" for the appendrel; needed
* because some places assume rel->tuples is valid for any baserel.
*/
I wonder if this assumption still holds today. If we create an empty
table, analyze it, and then use it in a query, the table will have
rel->tuples set to zero and rel->rows set to one, which doesn't cause
any issues today.
[1]: /messages/by-id/CAMbWs4-ocromEKMtVDH3RBMuAJQaQDK0qi4k6zOuvpOnMWZauw@mail.gmail.com
Thanks
Richard
Attachments:
v1-0001-Adjust-tuples-estimate-for-appendrels.patch
From 3bdf95317a5dc082f71b3d40d5700dffce618317 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 24 Jan 2025 16:07:22 +0900
Subject: [PATCH v1] Adjust tuples estimate for appendrels
In set_append_rel_size(), we currently set rel->tuples to rel->rows
for an appendrel. Generally, rel->tuples is the raw number of tuples
in the relation and rel->rows is the estimated number of tuples after
the relation's restriction clauses have been applied. Although an
appendrel itself doesn't directly enforce any quals today, its child
relations may. Therefore, setting rel->tuples equal to rel->rows for
an appendrel isn't always appropriate.
Doing so can lead to issues in cost estimates in some cases. For
instance, when estimating the number of distinct values from an
appendrel, we would not be able to adjust the estimate based on the
restriction selectivity.
This patch addresses this by setting an appendrel's tuples to the
total number of tuples accumulated from each live child, which better
aligns with reality.
---
src/backend/optimizer/path/allpaths.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1115ebeee2..a13331c269 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -958,6 +958,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
{
int parentRTindex = rti;
bool has_live_children;
+ double parent_tuples;
double parent_rows;
double parent_size;
double *parent_attrsizes;
@@ -995,6 +996,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
* have zero rows and/or width, if they were excluded by constraints.
*/
has_live_children = false;
+ parent_tuples = 0;
parent_rows = 0;
parent_size = 0;
nattrs = rel->max_attr - rel->min_attr + 1;
@@ -1161,6 +1163,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
*/
Assert(childrel->rows > 0);
+ parent_tuples += childrel->tuples;
parent_rows += childrel->rows;
parent_size += childrel->reltarget->width * childrel->rows;
@@ -1207,17 +1210,12 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
int i;
Assert(parent_rows > 0);
+ rel->tuples = parent_tuples;
rel->rows = parent_rows;
rel->reltarget->width = rint(parent_size / parent_rows);
for (i = 0; i < nattrs; i++)
rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
- /*
- * Set "raw tuples" count equal to "rows" for the appendrel; needed
- * because some places assume rel->tuples is valid for any baserel.
- */
- rel->tuples = parent_rows;
-
/*
* Note that we leave rel->pages as zero; this is important to avoid
* double-counting the appendrel tree in total_table_pages.
--
2.43.0
On Mon, Jan 27, 2025 at 16:06, Richard Guo <guofenglinux@gmail.com> wrote:
I basically agree with you. I just have one piece of advice: can we add
some explanation of why we make this adjustment? For example, by
including the sentence you wrote above:
"doing so can lead to issues in cost estimates. For instance,
when estimating the number of distinct values from an appendrel, we
would not be able to adjust the estimate based on the restriction
selectivity (see estimate_num_groups())."
Otherwise, this looks good to me.
Yeah, it doesn't seem easy to write a test case that reflects this
change, since we usually set costs off in EXPLAIN. I think it's OK
without a test case.
--
Thanks,
Tender Wang
Hi!
On 27.01.2025 11:05, Richard Guo wrote:
I looked at it and agree with your solution.
I created a stable test:
create table p (a int, b int, c float) partition by range(a);
create table p1 partition of p for values from (0) to (100);
create table p2 partition of p for values from (100) to (1000);
insert into p select i%200, i%300, i%400 from
generate_series(1, 1000)i;
analyze p;
SELECT * FROM check_estimated_rows('select b from p where c < 10 group by b');
 estimated | actual
-----------+--------
        27 |     29
(1 row)
drop table p;
I added it in the diff file.
--
Regards,
Alena Rybakina
Postgres Professional
Attachments:
test.diff
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index f0707e7f7ea..f01d0d41bfd 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4468,4 +4468,17 @@ explain (costs off) select * from hp_contradict_test where a === 1 and b === 1 a
drop table hp_contradict_test;
drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
+create table p (a int, b int, c float) partition by range(a);
+create table p1 partition of p for values from (0) to (100);
+create table p2 partition of p for values from (100) to (1000);
+insert into p select i%200, i%300, i%400 from
+generate_series(1, 1000)i;
+analyze p;
+SELECT * FROM check_estimated_rows('select b from p where c < 10 group by b');
+ estimated | actual
+-----------+--------
+ 27 | 29
+(1 row)
+
+drop table p;
drop function explain_analyze(text);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index ea9a4fe4a23..0e7a775e469 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1353,4 +1353,15 @@ drop table hp_contradict_test;
drop operator class part_test_int4_ops2 using hash;
drop operator ===(int4, int4);
+create table p (a int, b int, c float) partition by range(a);
+create table p1 partition of p for values from (0) to (100);
+create table p2 partition of p for values from (100) to (1000);
+
+insert into p select i%200, i%300, i%400 from
+generate_series(1, 1000)i;
+
+analyze p;
+SELECT * FROM check_estimated_rows('select b from p where c < 10 group by b');
+
+drop table p;
drop function explain_analyze(text);
Thanks for all the reviews. Attached is an updated patch that
addresses the review comments.
Thanks
Richard
Attachments:
v2-0001-Adjust-tuples-estimate-for-appendrels.patch
From e42d7482f2174991219d6ad886bb5e4c3b1e288d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 24 Jan 2025 16:07:22 +0900
Subject: [PATCH v2] Adjust tuples estimate for appendrels
In set_append_rel_size(), we currently set rel->tuples to rel->rows
for an appendrel. Generally, rel->tuples is the raw number of tuples
in the relation and rel->rows is the estimated number of tuples after
the relation's restriction clauses have been applied. Although an
appendrel itself doesn't directly enforce any quals today, its child
relations may. Therefore, setting rel->tuples equal to rel->rows for
an appendrel isn't always appropriate.
Doing so can lead to issues in cost estimates in some cases. For
instance, when estimating the number of distinct values from an
appendrel, we would not be able to adjust the estimate based on the
restriction selectivity.
This patch addresses this by setting an appendrel's tuples to the
total number of tuples accumulated from each live child, which better
aligns with reality.
This is arguably a bug, but nobody has complained about that until
now, so no back-patch.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAMbWs4_TG_+kVn6fjG-5GYzzukrNK57=g9eUo4gsrUG26OFawg@mail.gmail.com
---
src/backend/optimizer/path/allpaths.c | 19 +++++++++++------
src/test/regress/expected/inherit.out | 30 +++++++++++++++++++++++++++
src/test/regress/sql/inherit.sql | 21 +++++++++++++++++++
3 files changed, 64 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1115ebeee2..b5bc9b602e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -958,6 +958,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
{
int parentRTindex = rti;
bool has_live_children;
+ double parent_tuples;
double parent_rows;
double parent_size;
double *parent_attrsizes;
@@ -983,6 +984,15 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/*
* Initialize to compute size estimates for whole append relation.
*
+ * We handle tuples estimates by setting "tuples" to the total number of
+ * tuples accumulated from each live child, rather than using "rows".
+ * Although an appendrel itself doesn't directly enforce any quals, its
+ * child relations may. Therefore, setting "tuples" equal to "rows" for
+ * an appendrel isn't always appropriate, and can lead to inaccurate cost
+ * estimates. For example, when estimating the number of distinct values
+ * from an appendrel, we would be unable to adjust the estimate based on
+ * the restriction selectivity (see estimate_num_groups).
+ *
* We handle width estimates by weighting the widths of different child
* rels proportionally to their number of rows. This is sensible because
* the use of width estimates is mainly to compute the total relation
@@ -995,6 +1005,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
* have zero rows and/or width, if they were excluded by constraints.
*/
has_live_children = false;
+ parent_tuples = 0;
parent_rows = 0;
parent_size = 0;
nattrs = rel->max_attr - rel->min_attr + 1;
@@ -1161,6 +1172,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
*/
Assert(childrel->rows > 0);
+ parent_tuples += childrel->tuples;
parent_rows += childrel->rows;
parent_size += childrel->reltarget->width * childrel->rows;
@@ -1207,17 +1219,12 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
int i;
Assert(parent_rows > 0);
+ rel->tuples = parent_tuples;
rel->rows = parent_rows;
rel->reltarget->width = rint(parent_size / parent_rows);
for (i = 0; i < nattrs; i++)
rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
- /*
- * Set "raw tuples" count equal to "rows" for the appendrel; needed
- * because some places assume rel->tuples is valid for any baserel.
- */
- rel->tuples = parent_rows;
-
/*
* Note that we leave rel->pages as zero; this is important to avoid
* double-counting the appendrel tree in total_table_pages.
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index dbf3835cb1..420b6ae599 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -3666,3 +3666,33 @@ UPDATE errtst_parent SET partid = 30, data = data + 10 WHERE partid = 20;
ERROR: no partition of relation "errtst_parent" found for row
DETAIL: Partition key of the failing row contains (partid) = (30).
DROP TABLE errtst_parent;
+-- Check that we have the correct tuples estimate for an appendrel
+create table tuplesest_parted (a int, b int, c float) partition by range(a);
+create table tuplesest_parted1 partition of tuplesest_parted for values from (0) to (100);
+create table tuplesest_parted2 partition of tuplesest_parted for values from (100) to (200);
+create table tuplesest_tab (a int, b int);
+insert into tuplesest_parted select i%200, i%300, i%400 from generate_series(1, 1000)i;
+insert into tuplesest_tab select i, i from generate_series(1, 100)i;
+analyze tuplesest_parted;
+analyze tuplesest_tab;
+explain (costs off)
+select * from tuplesest_tab join
+ (select b from tuplesest_parted where c < 100 group by b) sub
+ on tuplesest_tab.a = sub.b;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Hash Join
+ Hash Cond: (tuplesest_parted.b = tuplesest_tab.a)
+ -> HashAggregate
+ Group Key: tuplesest_parted.b
+ -> Append
+ -> Seq Scan on tuplesest_parted1 tuplesest_parted_1
+ Filter: (c < '100'::double precision)
+ -> Seq Scan on tuplesest_parted2 tuplesest_parted_2
+ Filter: (c < '100'::double precision)
+ -> Hash
+ -> Seq Scan on tuplesest_tab
+(11 rows)
+
+drop table tuplesest_parted;
+drop table tuplesest_tab;
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 49aae426f3..30fba16231 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -1483,3 +1483,24 @@ UPDATE errtst_parent SET partid = 0, data = data + 10 WHERE partid = 20;
UPDATE errtst_parent SET partid = 30, data = data + 10 WHERE partid = 20;
DROP TABLE errtst_parent;
+
+-- Check that we have the correct tuples estimate for an appendrel
+create table tuplesest_parted (a int, b int, c float) partition by range(a);
+create table tuplesest_parted1 partition of tuplesest_parted for values from (0) to (100);
+create table tuplesest_parted2 partition of tuplesest_parted for values from (100) to (200);
+
+create table tuplesest_tab (a int, b int);
+
+insert into tuplesest_parted select i%200, i%300, i%400 from generate_series(1, 1000)i;
+insert into tuplesest_tab select i, i from generate_series(1, 100)i;
+
+analyze tuplesest_parted;
+analyze tuplesest_tab;
+
+explain (costs off)
+select * from tuplesest_tab join
+ (select b from tuplesest_parted where c < 100 group by b) sub
+ on tuplesest_tab.a = sub.b;
+
+drop table tuplesest_parted;
+drop table tuplesest_tab;
--
2.43.0