planner chooses incremental sort but not the best plan
Dear Hackers,
I've come across a behaviour of the planner I can't explain.
After a migration from 11 to 15 (on RDS) we noticed a degradation in
response time on a query: it went from a few seconds to ten minutes.
A VACUUM (ANALYZE) has been performed to be sure that all is clean.
The 'explain analyze' shows us a change of plan: PostgreSQL 15 chooses
`incremental sort` with an index corresponding to the ORDER BY clause
(on the created_at column). The previous v11 plan used a more efficient
index.
By deactivating incremental sort, response times in v15 are equal to the
v11 ones.
Here is the query:
SELECT inputdocum0_.id AS col_0_0_
FROM document_management_services.input_document inputdocum0_
WHERE (inputdocum0_.indexation_domain_id in
('2d29daf6-e151-479a-a52a-78b08bb3009d'))
AND (inputdocum0_.indexation_subsidiary_id in
('9f9df402-f70b-40d9-b283-a3c35232469a'))
AND (inputdocum0_.locked_at IS NULL)
AND (inputdocum0_.locked_by_app IS NULL)
AND (inputdocum0_.locked_by_user IS NULL)
AND (inputdocum0_.lock_time_out IS NULL)
AND inputdocum0_.archiving_state<> 'DESTROYED'
AND (inputdocum0_.creation_state in ('READY'))
AND inputdocum0_.active_content=true
AND (inputdocum0_.processing_state in ('PENDING_INDEXATION'))
ORDER BY inputdocum0_.created_at ASC,
inputdocum0_.reception_id ASC,
inputdocum0_.reception_order ASC
LIMIT 50 ;
Some details: the table `input_document` is partitioned by hash into
20 partitions and has a lot of indexes.
Indexes:
"input_document_pkey" PRIMARY KEY, btree (id)
"input_document_api_version_idx" btree (api_version) INVALID
"input_document_created_at_idx" btree (created_at)
"input_document_created_by_user_profile_idx" btree
(created_by_user_profile)
"input_document_dashboard_idx" btree (processing_state,
indexation_family_id, indexation_group_id, reception_id) INCLUDE
(active_content, archiving_state, creation_state) WHERE active_content =
true AND archiving_state <> 'DESTROYED'::text AND creation_state <>
'PENDING'::text
"input_document_fts_description_idx" gin
(to_tsvector('simple'::regconfig, description))
"input_document_fts_insured_firstname_idx" gin
(to_tsvector('simple'::regconfig, indexation_insured_firstname))
"input_document_fts_insured_lastname_idx" gin
(to_tsvector('simple'::regconfig, indexation_insured_lastname))
"input_document_indexation_activity_id_idx" btree
(indexation_activity_id)
"input_document_indexation_agency_id_idx" btree (indexation_agency_id)
"input_document_indexation_distributor_id_idx" btree
(indexation_distributor_id)
"input_document_indexation_domain_id_idx" btree (indexation_domain_id)
"input_document_indexation_family_id_idx" btree (indexation_family_id)
"input_document_indexation_group_id_idx" btree (indexation_group_id)
"input_document_indexation_insurer_id_idx" btree
(indexation_insurer_id)
"input_document_indexation_nature_id_idx" btree (indexation_nature_id)
"input_document_indexation_reference_idx" btree (indexation_reference)
"input_document_indexation_subsidiary_id_idx" btree
(indexation_subsidiary_id)
"input_document_indexation_warranty_id_idx" btree
(indexation_warranty_id)
"input_document_locked_by_user_idx" btree (locked_by_user)
"input_document_modified_at_idx" btree (modified_at)
"input_document_modified_by_user_profile_idx" btree
(modified_by_user_profile)
"input_document_processing_state_idx" btree (processing_state)
"input_document_stock_idx" btree (active_content, archiving_state,
creation_state, processing_state) WHERE active_content AND
archiving_state <> 'DESTROYED'::text AND creation_state <>
'PENDING'::text AND (processing_state = ANY
('{PENDING_PROCESSING,PENDING_INDEXATION,READY}'::text[]))
"input_dom_act_pi_idx" btree (indexation_activity_id,
indexation_domain_id) WHERE processing_state = 'PENDING_INDEXATION'::text
"input_dom_act_pp_idx" btree (indexation_activity_id,
indexation_domain_id) WHERE processing_state = 'PENDING_PROCESSING'::text
"input_dom_act_sub_idx" btree (indexation_activity_id,
indexation_domain_id, indexation_subsidiary_id)
"input_reception_id_created_at_idx" btree (reception_id, created_at)
"input_reception_id_reception_order_idx" btree (reception_id,
reception_order)
"operational_perimeter_view_idx" btree (processing_state,
indexation_distributor_id) WHERE processing_state =
'PENDING_PROCESSING'::text
Please find attached the 3 plans:
explain_analyse_incremental_off.txt, with enable_incremental_sort set to off
explain_analyse_incremental_on.txt, with enable_incremental_sort set to on
explain_analyse_incremental_on_limit5000, with enable_incremental_sort set
to on but with the LIMIT increased to 5000; in this case the chosen plan
does not use `Incremental Sort`.
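For clarity, the only session-level change between those runs was the
following setting (the third run additionally raised the LIMIT in the
query itself to 5000):
SET enable_incremental_sort TO off;  -- explain_analyse_incremental_off.txt
SET enable_incremental_sort TO on;   -- the two other runs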
The point I don't understand in the plan (with incremental sort on) is
the top-level node: the Limit cost doesn't seem right.
Limit (cost=324.05..16073.82 rows=50 width=44) (actual
time=1663688.290..1663696.151 rows=50 loops=1)
Buffers: shared hit=114672881 read=5725197 dirtied=38564 written=24394
I/O Timings: shared/local read=1481378.069 write=313.574
-> Incremental Sort (cost=324.05..27838050.13 rows=88375 width=44)
(actual time=1663688.289..1663696.144 rows=50 loops=1)
Do you have an explanation for this behaviour?
Best regards
--
Nicolas Lutic
On Tue, Dec 12, 2023 at 4:40 PM Nicolas Lutic <n.lutic@loxodata.com> wrote:
I've come across a behaviour of the planner I can't explain.
After a migration from 11 to 15 (on RDS) we noticed a degradation in
response time on a query, it went from a few seconds to ten minutes.
A vacuum(analyze) has been realized to be sure that all is clean.
The 'explain analyze' shows us a change of plan. Postgresql 15 chooses
`incremental sort` with an index corresponding to the ORDER BY clause
(on the created_at column). The previous v11 plan used a more efficient
index. By deactivating incremental sort, response times in v15 are equal
to the v11 ones.
I think this issue is caused by under-estimating the startup cost of
incremental sort, which in turn is caused by over-estimating the number
of groups with equal presorted keys by estimate_num_groups().
We can simulate the same issue with the query below.
create table t (a int, b int, c int, d int) partition by range (a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);
insert into t select i%2000, i%1000, i%300, i from
generate_series(1,1000000)i;
create index on t(b);
create index on t(c);
analyze t;
-- by default incremental sort is chosen
explain analyze select * from t where b = 3 order by c, d limit 10;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=143.03..570.89 rows=10 width=16) (actual
time=375.399..375.402 rows=10 loops=1)
-> Incremental Sort (cost=143.03..42671.85 rows=994 width=16) (actual
time=375.397..375.399 rows=10 loops=1)
Sort Key: t.c, t.d
Presorted Key: t.c
Full-sort Groups: 1 Sort Method: top-N heapsort Average Memory:
25kB Peak Memory: 25kB
Pre-sorted Groups: 1 Sort Method: top-N heapsort Average Memory:
25kB Peak Memory: 25kB
-> Merge Append (cost=0.85..42644.84 rows=994 width=16) (actual
time=11.415..375.289 rows=335 loops=1)
Sort Key: t.c
-> Index Scan using tp1_c_idx on tp1 t_1
(cost=0.42..21318.39 rows=498 width=16) (actual time=6.666..189.356
rows=168 loops=1)
Filter: (b = 3)
Rows Removed by Filter: 171537
-> Index Scan using tp2_c_idx on tp2 t_2
(cost=0.42..21316.50 rows=496 width=16) (actual time=4.745..185.870
rows=168 loops=1)
Filter: (b = 3)
Rows Removed by Filter: 171534
Planning Time: 0.501 ms
Execution Time: 375.477 ms
(16 rows)
-- disable incremental sort
set enable_incremental_sort to off;
SET
explain analyze select * from t where b = 3 order by c, d limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2577.51..2577.53 rows=10 width=16) (actual time=2.928..2.930
rows=10 loops=1)
-> Sort (cost=2577.51..2579.99 rows=994 width=16) (actual
time=2.925..2.927 rows=10 loops=1)
Sort Key: t.c, t.d
Sort Method: top-N heapsort Memory: 25kB
-> Append (cost=8.28..2556.03 rows=994 width=16) (actual
time=0.627..2.670 rows=1000 loops=1)
-> Bitmap Heap Scan on tp1 t_1 (cost=8.28..1276.62
rows=498 width=16) (actual time=0.625..1.688 rows=500 loops=1)
Recheck Cond: (b = 3)
Heap Blocks: exact=500
-> Bitmap Index Scan on tp1_b_idx (cost=0.00..8.16
rows=498 width=0) (actual time=0.330..0.330 rows=500 loops=1)
Index Cond: (b = 3)
-> Bitmap Heap Scan on tp2 t_2 (cost=8.27..1274.43
rows=496 width=16) (actual time=0.178..0.876 rows=500 loops=1)
Recheck Cond: (b = 3)
Heap Blocks: exact=500
-> Bitmap Index Scan on tp2_b_idx (cost=0.00..8.14
rows=496 width=0) (actual time=0.093..0.093 rows=500 loops=1)
Index Cond: (b = 3)
Planning Time: 0.481 ms
Execution Time: 3.031 ms
(17 rows)
As we can see the execution time is 375.477 ms by default and 3.031 ms
if we disable incremental sort.
When we calculate the cost of incremental sort, we need to estimate the
number of groups into which the relation is divided by the presorted
keys, and then calculate the cost of sorting a single group. If we
over-estimate the number of groups, the startup cost of incremental sort
would be under-estimated. In the first plan above, the number of groups
with equal 't.c' is estimated to be 300 by estimate_num_groups(), but is
actually 3 after applying the restriction 'b = 3'. As a result, the
startup cost of the incremental sort is estimated to be 143.03, but is
actually 14222.68. That's why the planner mistakenly thinks the
incremental sort path is the cheaper one.
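Just to make the gap visible in this data set (the per-column statistics
see ~300 distinct values of 'c', while only 3 remain under 'b = 3'):
select n_distinct from t_pg_stats_check; -- placeholder removed, see below
select n_distinct from pg_stats where tablename = 't' and attname = 'c';
-- ~300
select count(distinct c) from t where b = 3;
-- 3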
It seems that we need to improve estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account.
In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.
Any thoughts on how to improve this?
Thanks
Richard
On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux@gmail.com> wrote:
It seems that we need to improve estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account.
In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.
I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached? I believe this aligns more closely with reality.
And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3. As I experimented, this can improve
the estimate for appendrels. For instance,
create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);
insert into t select i%2000, (100000 * random())::int, random() from
generate_series(1,1000000) i;
analyze t;
explain analyze select b from t where c < 0.1 group by b;
-- on master
HashAggregate (cost=18659.28..19598.74 rows=93946 width=4)
(actual time=220.760..234.439 rows=63224 loops=1)
-- on patched
HashAggregate (cost=18659.28..19294.25 rows=63497 width=4)
(actual time=235.161..250.023 rows=63224 loops=1)
With the patch the estimate for the number of distinct 'b' values is
more accurate.
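For reference, the actual number of groups can be checked directly (it
matches the rows=63224 reported by both plans above; the data is random,
so the exact value varies between runs):
select count(distinct b) from t where c < 0.1;
-- 63224 in the run above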
BTW, this patch does not change any existing regression test results. I
attempted to devise a regression test that shows how this change can
improve query plans, but failed. Should I try harder to find such a
test case?
Thanks
Richard
Attachments:
v1-0001-Adjust-tuples-estimate-for-appendrel.patch
From 63f7586af41225d3abf4612dc74c6fcbd5f78cc7 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 15 Dec 2023 15:07:08 +0800
Subject: [PATCH v1] Adjust tuples estimate for appendrel
---
src/backend/optimizer/path/allpaths.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 67921a0826..290e0344a6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -949,6 +949,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
{
int parentRTindex = rti;
bool has_live_children;
+ double parent_tuples;
double parent_rows;
double parent_size;
double *parent_attrsizes;
@@ -986,6 +987,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
* have zero rows and/or width, if they were excluded by constraints.
*/
has_live_children = false;
+ parent_tuples = 0;
parent_rows = 0;
parent_size = 0;
nattrs = rel->max_attr - rel->min_attr + 1;
@@ -1152,6 +1154,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
*/
Assert(childrel->rows > 0);
+ parent_tuples += childrel->tuples;
parent_rows += childrel->rows;
parent_size += childrel->reltarget->width * childrel->rows;
@@ -1204,10 +1207,10 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
/*
- * Set "raw tuples" count equal to "rows" for the appendrel; needed
- * because some places assume rel->tuples is valid for any baserel.
+ * Set "raw tuples" count for the appendrel; needed because some places
+ * assume rel->tuples is valid for any baserel.
*/
- rel->tuples = parent_rows;
+ rel->tuples = parent_tuples;
/*
* Note that we leave rel->pages as zero; this is important to avoid
--
2.31.0
On 12/15/23 09:58, Richard Guo wrote:
On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux@gmail.com> wrote:
It seems that we need to improve estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account. In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.
I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached? I believe this aligns more closely with reality.
Yeah, seems like that's the right thing to do. FWIW I've been often
confused by these fields, because we use tuples and rows as synonyms,
but in this particular case that's not the same. I wonder if this is
just a manifestation of this confusion.
And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3.
I don't follow. Why wouldn't it be using the new formula even without
your patch? (using the "wrong" value, ofc, but the same formula).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
Yeah, seems like that's the right thing to do. FWIW I've been often
confused by these fields, because we use tuples and rows as synonyms,
but in this particular case that's not the same. I wonder if this is
just a manifestation of this confusion.
For tables, one is the raw number of rows on-disk and the other is the
number of rows predicted to pass the relation's quals. For something
like an appendrel that doesn't enforce any quals (today anyway), they
should probably be the same; but you need to be sure you're adding
up the right numbers from the inputs.
regards, tom lane
On 12/14/23 11:02, Richard Guo wrote:
On Tue, Dec 12, 2023 at 4:40 PM Nicolas Lutic <n.lutic@loxodata.com> wrote:
I've come across a behaviour of the planner I can't explain.
After a migration from 11 to 15 (on RDS) we noticed a degradation in
response time on a query, it went from a few seconds to ten minutes.
A vacuum(analyze) has been realized to be sure that all is clean.
The 'explain analyze' shows us a change of plan. Postgresql 15 chooses
`incremental sort` with an index corresponding to the ORDER BY clause
(on the created_at column). The previous v11 plan used a more efficient
index. By deactivating incremental sort, response times in v15 are equal
to the v11 ones.
I think this issue is caused by under-estimating the startup cost of
incremental sort, which in turn is caused by over-estimating the number
of groups with equal presorted keys by estimate_num_groups(). We can
simulate the same issue with the query below.
create table t (a int, b int, c int, d int) partition by range (a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);insert into t select i%2000, i%1000, i%300, i from
generate_series(1,1000000)i;create index on t(b);
create index on t(c);analyze t;
-- by default incremental sort is chosen
explain analyze select * from t where b = 3 order by c, d limit 10;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=143.03..570.89 rows=10 width=16) (actual
time=375.399..375.402 rows=10 loops=1)
-> Incremental Sort (cost=143.03..42671.85 rows=994 width=16)
(actual time=375.397..375.399 rows=10 loops=1)
Sort Key: t.c, t.d
Presorted Key: t.c
Full-sort Groups: 1 Sort Method: top-N heapsort Average
Memory: 25kB Peak Memory: 25kB
Pre-sorted Groups: 1 Sort Method: top-N heapsort Average
Memory: 25kB Peak Memory: 25kB
-> Merge Append (cost=0.85..42644.84 rows=994 width=16)
(actual time=11.415..375.289 rows=335 loops=1)
Sort Key: t.c
-> Index Scan using tp1_c_idx on tp1 t_1
(cost=0.42..21318.39 rows=498 width=16) (actual time=6.666..189.356
rows=168 loops=1)
Filter: (b = 3)
Rows Removed by Filter: 171537
-> Index Scan using tp2_c_idx on tp2 t_2
(cost=0.42..21316.50 rows=496 width=16) (actual time=4.745..185.870
rows=168 loops=1)
Filter: (b = 3)
Rows Removed by Filter: 171534
Planning Time: 0.501 ms
Execution Time: 375.477 ms
(16 rows)
-- disable incremental sort
set enable_incremental_sort to off;
SET
explain analyze select * from t where b = 3 order by c, d limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2577.51..2577.53 rows=10 width=16) (actual
time=2.928..2.930 rows=10 loops=1)
-> Sort (cost=2577.51..2579.99 rows=994 width=16) (actual
time=2.925..2.927 rows=10 loops=1)
Sort Key: t.c, t.d
Sort Method: top-N heapsort Memory: 25kB
-> Append (cost=8.28..2556.03 rows=994 width=16) (actual
time=0.627..2.670 rows=1000 loops=1)
-> Bitmap Heap Scan on tp1 t_1 (cost=8.28..1276.62
rows=498 width=16) (actual time=0.625..1.688 rows=500 loops=1)
Recheck Cond: (b = 3)
Heap Blocks: exact=500
-> Bitmap Index Scan on tp1_b_idx
(cost=0.00..8.16 rows=498 width=0) (actual time=0.330..0.330 rows=500
loops=1)
Index Cond: (b = 3)
-> Bitmap Heap Scan on tp2 t_2 (cost=8.27..1274.43
rows=496 width=16) (actual time=0.178..0.876 rows=500 loops=1)
Recheck Cond: (b = 3)
Heap Blocks: exact=500
-> Bitmap Index Scan on tp2_b_idx
(cost=0.00..8.14 rows=496 width=0) (actual time=0.093..0.093 rows=500
loops=1)
Index Cond: (b = 3)
Planning Time: 0.481 ms
Execution Time: 3.031 ms
(17 rows)
As we can see the execution time is 375.477 ms by default and 3.031 ms
if we disable incremental sort.
When we calculate the cost of incremental sort, we need to estimate the
number of groups into which the relation is divided by the presorted
keys, and then calculate the cost of sorting a single group. If we
over-estimate the number of groups, the startup cost of incremental sort
would be under-estimated. In the first plan above, the number of groups
with equal 't.c' is estimated to be 300 by estimate_num_groups(), but is
actually 3 after applying the restriction 'b = 3'. As a result, the
startup cost of the incremental sort is estimated to be 143.03, but is
actually 14222.68. That's why the planner mistakenly thinks the
incremental sort path is the cheaper one.
I haven't done any debugging on this, but this seems plausible. In a
way, it seems like a combination of three issues - the assumption that
"LIMIT n" has cost proportional to n/N, the group-by estimate for
correlated columns, and the assumption of independence.
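As a quick sanity check of the first point against the originally reported
plan (just arithmetic on the numbers shown in the plan, not a claim about
the exact code path), the Limit's total cost looks like a linear
interpolation of the Incremental Sort's costs by the 50/88375 row fraction:
select 324.05 + (50.0 / 88375) * (27838050.13 - 324.05) as limit_total_cost;
-- ~16073.8, matching the reported Limit (cost=324.05..16073.82 ...)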
It seems that we need to improve estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account. In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.
Any thoughts on how to improve this?
Oh! Now I see what you meant by using the new formula in 84f9a35e3
depending on how we sum tuples. I agree that seems like the right thing.
I'm not sure it'll actually help with the issue, though - if I apply the
patch, the plan does not actually change (and the cost changes just a
little bit).
I looked at this only very briefly, but I believe it's due to the
assumption of independence I mentioned earlier - we end up using the new
formula introduced in 84f9a35e3, but it assumes the selectivity and
number of groups are independent. But that's not the
case here, because the groups are very clearly correlated (with the
condition on "b").
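To put rough numbers on that (this is only an informal restatement of the
84f9a35e3 adjustment, roughly d * (1 - ((N - n)/N)^(N/d)), not a quote of
the code, with the repro's numbers plugged in: N = 1,000,000 tuples,
n = 994 estimated rows, d = 300 distinct values of "c"):
select round(300 * (1 - power((1000000 - 994.0) / 1000000,
                              1000000 / 300.0))) as adjusted_ndistinct;
-- ~289, still nowhere near the 3 groups that actually remain once the
-- correlated restriction on "b" is applied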
If that's the case, I'm not sure how to fix this :-(
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Dec 18, 2023 at 7:31 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
Oh! Now I see what you meant by using the new formula in 84f9a35e3
depending on how we sum tuples. I agree that seems like the right thing.
I'm not sure it'll actually help with the issue, though - if I apply the
patch, the plan does not actually change (and the cost changes just a
little bit).
I looked at this only very briefly, but I believe it's due to the
assumption of independence I mentioned earlier - we end up using the new
formula introduced in 84f9a35e3, but it assumes the selectivity and
number of groups are independent. But that's not the
case here, because the groups are very clearly correlated (with the
condition on "b").
You're right. The patch allows us to adjust the estimate of distinct
values for appendrels using the new formula introduced in 84f9a35e3.
But if the restrictions are correlated with the grouping expressions,
the new formula does not behave well. That's why the patch does not
help in case [1], where 'b' and 'c' are correlated.
OTOH, if the restrictions are not correlated with the grouping
expressions, the new formula would perform quite well. And in this case
the patch would help a lot, as shown in [2] where estimate_num_groups()
gives a much more accurate estimate with the help of this patch.
So this patch could be useful in certain situations. I'm wondering if
we should at least have this patch (if it is right).
If that's the case, I'm not sure how to fix this :-(
The commit message of 84f9a35e3 says
This could possibly be improved upon in the future by identifying
correlated restrictions and using a hybrid of the old and new
formulae.
Maybe this is something we can consider trying. But anyhow this is not
an easy task I suppose.
[1]: /messages/by-id/CAMbWs4-FtsJ0dQUv6g==XR_gsq=Fj9oiydW6gbqwQ_wrbU0osw@mail.gmail.com
[2]: /messages/by-id/CAMbWs4-ocromEKMtVDH3RBMuAJQaQDK0qi4k6zOuvpOnMWZauw@mail.gmail.com
Thanks
Richard
On 12/18/23 11:40, Richard Guo wrote:
On Mon, Dec 18, 2023 at 7:31 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
Oh! Now I see what you meant by using the new formula in 84f9a35e3
depending on how we sum tuples. I agree that seems like the right thing.
I'm not sure it'll actually help with the issue, though - if I apply the
patch, the plan does not actually change (and the cost changes just a
little bit).
I looked at this only very briefly, but I believe it's due to the
assumption of independence I mentioned earlier - we end up using the new
formula introduced in 84f9a35e3, but it assumes the selectivity and
number of groups are independent. But that's not the
case here, because the groups are very clearly correlated (with the
condition on "b").
You're right. The patch allows us to adjust the estimate of distinct
values for appendrels using the new formula introduced in 84f9a35e3.
But if the restrictions are correlated with the grouping expressions,
the new formula does not behave well. That's why the patch does not
help in case [1], where 'b' and 'c' are correlated.
OTOH, if the restrictions are not correlated with the grouping
expressions, the new formula would perform quite well. And in this case
the patch would help a lot, as shown in [2] where estimate_num_groups()
gives a much more accurate estimate with the help of this patch.
So this patch could be useful in certain situations. I'm wondering if
we should at least have this patch (if it is right).
I do agree the patch seems to do the right thing, and it's worth pushing
on its own.
If that's the case, I'm not sure how to fix this :-(
The commit message of 84f9a35e3 says
This could possibly be improved upon in the future by identifying
correlated restrictions and using a hybrid of the old and new
formulae.
Maybe this is something we can consider trying. But anyhow this is not
an easy task I suppose.
Yeah, if it was easy, it'd have been done in 84f9a35e3 already ;-)
The challenge is where to get usable information about correlation
between columns. I only have a couple of very rough ideas of what we might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
ndistinct(b,c,d) / ndistinct(b,c)
If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as
1 * ndistinct(b,c)
I'm well aware this is only a very trivial example, and for more complex
examples it's likely way more complicated. But hopefully it illustrates
the general idea.
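(Just to spell out what I mean by multi-column ndistinct statistics - a
sketch on the repro table from upthread, with a made-up statistics name:)
create statistics t_b_c_d_ndistinct (ndistinct) on b, c, d from t;
analyze t;
-- the per-combination estimates then show up in pg_stats_ext.n_distinct
select statistics_name, n_distinct from pg_stats_ext
where statistics_name = 't_b_c_d_ndistinct';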
The other idea would be to maybe look at multi-column MCV, and try using
it to deduce cross-column correlation too (it could be more accurate for
arbitrary predicates).
And finally, we might try being a bit more pessimistic and look at what
the "worst case" behavior could be. So for example instead of trying to
estimate the real number of groups, we'd ask "What's the minimum number
of groups we're likely to get?". And use that to cost the incremental
sort. But I don't think we do this elsewhere, and I'm not sure we want
to start with this risk-based approach here. It might be OK, because we
usually expect the incremental sort to be much cheaper, ...
Is this something you would be interested in exploring? I don't have the
capacity to work on this myself, but I'd be available for reviews,
feedback and so on.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
The one scenario I can come up with a possible solution for so far is when
the query's predicate columns and grouping columns all belong to one table.
For a query that contains a WHERE clause, perhaps num_groups could be
estimated according to the following formula:
num_groups = ndistinct(pred_col_1, ..., pred_col_n) with WHERE clause
             * ndistinct(pred_col_1, ..., pred_col_n, sort_col_1, ..., sort_col_m)
             / ndistinct(pred_col_1, ..., pred_col_n)
where
ndistinct(pred_col_1, ..., pred_col_n) with WHERE clause =
    ndistinct(pred_col_1, ..., pred_col_n) * selectivity of rel.
Here pred_col_1 ... pred_col_n are the columns involved in the WHERE clause
and sort_col_1 ... sort_col_m are the columns involved in the GROUP BY
clause. The reason for multiplying by the selectivity of rel directly is
that the selectivity of rel depends only on pred_col, not sort_col. So the
above formula can be simplified as follows:
num_groups = ndistinct(pred_col_1, ..., pred_col_n, sort_col_1, ..., sort_col_m)
             * selectivity of rel.
The correctness of the above formula depends on the following conditions.
- The ndistinct(pred_col_1, ..., pred_col_n) and
  ndistinct(pred_col_1, ..., pred_col_n, sort_col_1, ..., sort_col_m)
  statistics already exist, and they need to be accurate.
- Both pred_col_n and sort_col_m are uniformly distributed; if not,
  statistics such as MCV lists are needed for correction.
- The tuples of rel are the total number of tuples of the table, not the
  number of filtered tuples.
After experimentation in the scenario mentioned earlier in this thread,
the estimated num_groups is 3. The accuracy of the result strongly relies
on the uniform distribution of b, which lets ndistinct(pred_col_1, ...,
pred_col_n) with the WHERE clause be estimated accurately.
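To make that concrete with the repro upthread (b = i%1000 and c = i%300,
so ndistinct(b, c) = 3000 and the selectivity of b = 3 is about 1/1000;
the query below just spells out that arithmetic):
-- num_groups = ndistinct(b, c) * selectivity = 3000 * 0.001 = 3
select count(distinct (b, c)) as ndistinct_b_c,
       count(distinct c) filter (where b = 3) as actual_groups
from t;
-- ndistinct_b_c = 3000, actual_groups = 3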
I'd like to hear your opinions.
Regards.
ywgrit.
On 15/12/2023 09:58, Richard Guo wrote:
On Thu, Dec 14, 2023 at 6:02 PM Richard Guo <guofenglinux@gmail.com> wrote:
It seems that we need to improve estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account. In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.
I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached? I believe this aligns more closely with reality.
And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3. As I experimented, this can improve
the estimate for appendrels. For instance,
create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);
insert into t select i%2000, (100000 * random())::int, random() from
generate_series(1,1000000) i;
analyze t;
explain analyze select b from t where c < 0.1 group by b;
-- on master
HashAggregate (cost=18659.28..19598.74 rows=93946 width=4)
(actual time=220.760..234.439 rows=63224 loops=1)
-- on patched
HashAggregate (cost=18659.28..19294.25 rows=63497 width=4)
(actual time=235.161..250.023 rows=63224 loops=1)
With the patch the estimate for the number of distinct 'b' values is
more accurate.
BTW, this patch does not change any existing regression test results. I
attempted to devise a regression test that shows how this change can
improve query plans, but failed. Should I try harder to find such a
test case?
Hi,
Thank you for the patch; I've tried it and it works with the scenario
you provided.
As Nicolas's co-worker, I've been involved in this case, but
unfortunately we're not able to test the patch with the actual data for
the moment; I'll ask the real owner for a dump.
About the regression test, I don't know how to implement it either.
best regards,
--
Sébastien
Hi Tomas,
Recently, I looked at papers related to estimation of cardinality with
selection. I lean towards the scheme provided by the paper
"Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries
and Event Reports". This paper uses distinct sampling as opposed to the
current uniform sampling, and the main differences between the two are as
follows.
1) It not only counts the distinct values of each column or combination of
columns, but also saves some rows corresponding to each distinct value,
i.e., it keeps part of the rows of the original relation as samples. The
purpose of saving these rows is to accommodate restrictions on the
queries, such as WHERE clauses.
2) The queries are executed on the samples, and the result of the execution
is used as the statistical estimate of cardinality.
The advantages of this approach over existing practice are as follows.
1) The samples collected can be applied to arbitrary predicates, e.g.
predicates that are correlated with the columns of GROUP BY clauses.
2) The accuracy is very high; in some scenarios the statistical error can
be hundreds of times smaller than with uniform sampling.
However, the scheme provided in this paper also has some drawbacks: as
mentioned above, it relies on the collected samples, which leads to a
significant increase in the storage overhead of the statistics.
I'd like to hear your opinions.
ywgrit.
On 18/12/2023 19:53, Tomas Vondra wrote:
On 12/18/23 11:40, Richard Guo wrote:
The challenge is where to get usable information about correlation
between columns. I only have a couple of very rough ideas of what we might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
ndistinct(b,c,d) / ndistinct(b,c)
If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as
1 * ndistinct(b,c)
Did you mean here ndistinct(c,d) and the formula:
ndistinct(b,c,d) / ndistinct(c,d) ?
Do you implicitly bear in mind here the necessity of tracking clauses
that were applied to the data up to the moment of grouping?
--
regards,
Andrei Lepikhov
Postgres Professional
On 15/12/2023 15:58, Richard Guo wrote:
With the patch the estimate for the number of distinct 'b' values is
more accurate.
+1 to commit this patch.
It looks good and resolves what is arguably a bug in the code.
BTW, this patch does not change any existing regression test results. I
attempted to devise a regression test that shows how this change can
improve query plans, but failed. Should I try harder to find such a
test case?
The test that was changed refers to different features. Its behaviour
could change in the future and mask testing of this code. IMO, you should
add a test directly checking the appendrel->tuples correction.
--
regards,
Andrei Lepikhov
Postgres Professional
On 2/15/24 07:50, Andrei Lepikhov wrote:
On 18/12/2023 19:53, Tomas Vondra wrote:
On 12/18/23 11:40, Richard Guo wrote:
The challenge is where to get usable information about correlation
between columns. I only have a couple of very rough ideas of what we might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
ndistinct(b,c,d) / ndistinct(b,c)
If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as
1 * ndistinct(b,c)
Did you mean here ndistinct(c,d) and the formula:
ndistinct(b,c,d) / ndistinct(c,d) ?
Yes, I think that's probably a more correct ... Essentially, the idea is
to estimate the change in number of distinct groups after adding a
column (or restricting it in some way).
Do you implicitly bear in mind here the necessity of tracking clauses
that were applied to the data up to the moment of grouping?
I don't recall what exactly I considered two months ago when writing the
message, but I don't see why we would need to track that beyond what we
already have. Shouldn't it be enough for the grouping to simply inspect
the conditions on the lower levels?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 15/2/2024 18:10, Tomas Vondra wrote:
On 2/15/24 07:50, Andrei Lepikhov wrote:
On 18/12/2023 19:53, Tomas Vondra wrote:
On 12/18/23 11:40, Richard Guo wrote:
The challenge is where to get usable information about correlation
between columns. I only have a couple of very rough ideas of what we might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
ndistinct(b,c,d) / ndistinct(b,c)
If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as
1 * ndistinct(b,c)
Did you mean here ndistinct(c,d) and the formula:
ndistinct(b,c,d) / ndistinct(c,d) ?
Yes, I think that's probably a more correct ... Essentially, the idea is
to estimate the change in number of distinct groups after adding a
column (or restricting it in some way).
Thanks, I got it. I'm just thinking about how to implement such techniques
in an extension, just to test the idea in action. In the case of GROUP BY
we can use a path hook, of course. But what about inventing a hook on
clauselist estimation?
Do you implicitly bear in mind here the necessity of tracking clauses
that were applied to the data up to the moment of grouping?
I don't recall what exactly I considered two months ago when writing the
message, but I don't see why we would need to track that beyond what we
already have. Shouldn't it be enough for the grouping to simply inspect
the conditions on the lower levels?
Yes, exactly. I've thought about looking into baserestrictinfos and, if
the GROUP BY references a subquery targetlist, into subqueries too.
--
regards,
Andrei Lepikhov
Postgres Professional
On 2/15/24 13:45, Andrei Lepikhov wrote:
On 15/2/2024 18:10, Tomas Vondra wrote:
On 2/15/24 07:50, Andrei Lepikhov wrote:
On 18/12/2023 19:53, Tomas Vondra wrote:
On 12/18/23 11:40, Richard Guo wrote:
The challenge is where to get usable information about correlation
between columns. I only have a couple of very rough ideas of what we might
try. For example, if we have multi-column ndistinct statistics, we might
look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
ndistinct(b,c,d) / ndistinct(b,c)
If we know how many distinct values we have for the predicate column, we
could then estimate the number of groups. I mean, we know that for the
restriction "WHERE b = 3" we only have 1 distinct value, so we could
estimate the number of groups as
1 * ndistinct(b,c)
Did you mean here ndistinct(c,d) and the formula:
ndistinct(b,c,d) / ndistinct(c,d) ?
Yes, I think that's probably a more correct ... Essentially, the idea is
to estimate the change in number of distinct groups after adding a
column (or restricting it in some way).
Thanks, I got it. I'm just thinking about how to implement such techniques
in an extension, just to test the idea in action. In the case of GROUP BY
we can use a path hook, of course. But what about inventing a hook on
clauselist estimation?
Maybe.
I have thought about introducing such a hook to alter the estimation of
clauses, so I'm not opposed to it. Ofc, it depends on where the hook
would be, what it would be allowed to do, etc. And as it doesn't exist
yet, it'd be more of a "local" improvement to separate the changes into
an extension.
Do you implicitly bear in mind here the necessity of tracking clauses
that were applied to the data up to the moment of grouping?
I don't recall what exactly I considered two months ago when writing the
message, but I don't see why we would need to track that beyond what we
already have. Shouldn't it be enough for the grouping to simply inspect
the conditions on the lower levels?
Yes, exactly. I've thought about looking into baserestrictinfos and, if
the GROUP BY references a subquery targetlist, into subqueries too.
True. Something like that.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company