MergeJoin beats HashJoin in the case of multiple hash clauses
Hi, all.
Some of my clients use JOINs with three or four clauses. Quite
frequently, I see complaints about an unreasonable switch of the JOIN
algorithm to Merge Join instead of Hash Join. Quick research has shown one
weak place - the estimation of the average bucket size in
final_cost_hashjoin (see q2.sql in the attachment), which uses a very
conservative strategy.
Unlike the estimation of groups, here we use the bucket size derived from
a single clause's ndistinct (the smallest bucket fraction across all
clauses) instead of multiplying them (or attempting a multivariate
analysis).
It works fine for the case of one clause. But if we have many clauses,
each with a high ndistinct value, we overestimate the average bucket size
and, as a result, prefer Merge Join. As the example in the attachment
shows, this leads to a worse plan than possible, sometimes drastically
worse.
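To illustrate with made-up numbers (not the ones from q2.sql): suppose the
inner side has 100000 rows, the hash table has 131072 virtual buckets and
the join has three clauses whose inner columns each have ndistinct = 1000.
Then:
  current estimate:    innerbucketsize = min(1/1000, 1/1000, 1/1000) = 1/1000,
                       i.e. about 100000/1000 = 100 tuples per bucket;
  multiplied estimate: innerbucketsize = (1/1000)^3 = 1e-9, which in practice
                       cannot drop below 1/131072, i.e. about 1 tuple per bucket.
The first estimate makes every probe look expensive and pushes the planner
towards Merge Join; the second matches reality when the columns are
independent.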
I assume this is done out of fear of functional dependencies between hash
clause components. But in my opinion, we should go the same way here as in
the estimation of groups.
The attached patch shows a sketch of the solution.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
fix_bucketsize.diff (text/plain)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a1..26f26d6a40 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4033,11 +4033,12 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
thismcvfreq = restrictinfo->left_mcvfreq;
}
- if (innerbucketsize > thisbucketsize)
- innerbucketsize = thisbucketsize;
- if (innermcvfreq > thismcvfreq)
- innermcvfreq = thismcvfreq;
+ innerbucketsize *= thisbucketsize;
+ innermcvfreq *= thismcvfreq;
}
+
+ if (innerbucketsize > virtualbuckets)
+ innerbucketsize = 1.0 / virtualbuckets;
}
/*
Hi!
On 15.06.2023 11:30, Andrey Lepikhov wrote:
Hi, all.
Some of my clients use JOIN's with three - four clauses. Quite
frequently, I see complaints on unreasonable switch of JOIN algorithm
to Merge Join instead of Hash Join. Quick research have shown one weak
place - estimation of an average bucket size in final_cost_hashjoin
(see q2.sql in attachment) with very conservative strategy.
Unlike estimation of groups, here we use smallest ndistinct value
across all buckets instead of multiplying them (or trying to make
multivariate analysis).
It works fine for the case of one clause. But if we have many clauses,
and if each has high value of ndistinct, we will overestimate average
size of a bucket and, as a result, prefer to use Merge Join. As the
example in attachment shows, it leads to worse plan than possible,
sometimes drastically worse.
I assume, this is done with fear of functional dependencies between
hash clause components. But as for me, here we should go the same way,
as estimation of groups.
The attached patch shows a sketch of the solution.
This problem is very important.
Honestly, I'm still learning your code and looking for cases where your
patch changes things for the worse or for the better. But I have already
found something that seemed interesting to me: several cases where your
patch helps to choose a more correct plan, with a focus on memory
consumption.
To make it easier to evaluate, I added a GUC to your patch that makes it
easy to switch between your way and the original way of estimating the
bucket size (diff_estimate.diff).
Here are other cases where your fix improves the query plan.
First of all, I changed the way the tables are created, to look at the
behavior of the query plan in terms of planning and execution time:
DROP TABLE IF EXISTS a,b CASCADE;
CREATE TABLE a AS
SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
FROM generate_series(1,1e5) AS gs;
CREATE TABLE b AS
SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
FROM generate_series(1,1e5) AS gs;
ANALYZE a,b;
SET enable_cost_size = 'on';
EXPLAIN ANALYZE
SELECT * FROM a,b
WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
SET enable_cost_size = 'off';
EXPLAIN ANALYZE
SELECT * FROM a,b
WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
QUERY PLAN
---------------------------------------------------------------------------
Hash Join (actual time=200.872..200.879 rows=0 loops=1)
Hash Cond: ((b.x = a.x) AND (b.y = a.y) AND (b.z = a.z))
-> Seq Scan on b (actual time=0.029..15.946 rows=100000 loops=1)
-> Hash (actual time=97.645..97.649 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 5612kB
-> Seq Scan on a (actual time=0.024..17.153 rows=100000 loops=1)
Planning Time: 2.910 ms
Execution Time: 201.949 ms
(8 rows)
SET
QUERY PLAN
---------------------------------------------------------------------------
Merge Join (actual time=687.415..687.416 rows=0 loops=1)
Merge Cond: ((b.y = a.y) AND (b.x = a.x) AND (b.z = a.z))
-> Sort (actual time=462.022..536.716 rows=100000 loops=1)
Sort Key: b.y, b.x, b.z
Sort Method: external merge Disk: 3328kB
-> Seq Scan on b (actual time=0.017..12.326 rows=100000 loops=1)
-> Sort (actual time=111.295..113.196 rows=16001 loops=1)
Sort Key: a.y, a.x, a.z
Sort Method: external sort Disk: 2840kB
-> Seq Scan on a (actual time=0.020..10.129 rows=100000 loops=1)
Planning Time: 0.752 ms
Execution Time: 688.829 ms
(12 rows)
Secondly, I found another case that is not about the planner preferring
merge join over hash join, but where we can see that the plan has become
better due to lower memory consumption, and planning also takes less time.
Here, with the same query, the planning time was reduced by a factor of 5
and the number of buckets by a factor of 128, so memory consumption also
decreased:
DROP TABLE IF EXISTS a,b CASCADE;
CREATE TABLE a AS
SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
FROM generate_series(1,600) AS gs;
CREATE TABLE b AS
SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
FROM generate_series(1,1e5) AS gs;
ANALYZE a,b;
SET enable_cost_size = 'on';
EXPLAIN ANALYZE
SELECT * FROM a,b
WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
SET enable_cost_size = 'off';
EXPLAIN ANALYZE
SELECT * FROM a,b
WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Hash Join (cost=20.50..3157.58 rows=8 width=32) (actual
time=95.648..95.651 rows=0 loops=1)
Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND
(b.z = (a.z)::numeric))
-> Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20) (actual
time=0.027..17.980 rows=100000 loops=1)
-> Hash (cost=10.00..10.00 rows=600 width=12) (actual
time=2.046..2.047 rows=600 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 34kB
-> Seq Scan on a (cost=0.00..10.00 rows=600 width=12)
(actual time=0.022..0.315 rows=600 loops=1)
Planning Time: 0.631 ms
Execution Time: 95.730 ms
(8 rows)
SET
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3387.00..8621.58 rows=8 width=32) (actual
time=102.873..102.877 rows=0 loops=1)
Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND
((a.z)::numeric = b.z))
-> Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual
time=0.014..0.131 rows=600 loops=1)
-> Hash (cost=1637.00..1637.00 rows=100000 width=20) (actual
time=101.920..101.921 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 6474kB
-> Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20)
(actual time=0.013..16.349 rows=100000 loops=1)
Planning Time: 0.153 ms
Execution Time: 103.518 ms
(8 rows)
I also see an improvement for left outer and right joins:
DROP TABLE IF EXISTS a,b CASCADE;
CREATE TABLE a AS
SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
FROM generate_series(1,600) AS gs;
CREATE TABLE b AS
SELECT gs % 90 AS x, gs % 49 AS y, gs %100 AS z, 'abc' || gs AS payload
FROM generate_series(1,1e5) AS gs;
ANALYZE a,b;
SET enable_cost_size = 'on';
EXPLAIN ANALYZE
SELECT * FROM a right join b
on a.x=b.x AND a.y=b.y AND a.z=b.z;
SET enable_cost_size = 'off';
EXPLAIN ANALYZE
SELECT * FROM a right join b
on a.x=b.x AND a.y=b.y AND a.z=b.z;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=20.50..3157.58 rows=100000 width=32) (actual
time=1.846..102.264 rows=100000 loops=1)
Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND
(b.z = (a.z)::numeric))
-> Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20) (actual
time=0.041..15.328 rows=100000 loops=1)
-> Hash (cost=10.00..10.00 rows=600 width=12) (actual
time=1.780..1.781 rows=600 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 34kB
-> Seq Scan on a (cost=0.00..10.00 rows=600 width=12)
(actual time=0.031..0.252 rows=600 loops=1)
Planning Time: 0.492 ms
Execution Time: 107.609 ms
(8 rows)
SET
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=3387.00..8500.08 rows=100000 width=32) (actual
time=80.919..101.613 rows=100000 loops=1)
Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND
((a.z)::numeric = b.z))
-> Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual
time=0.017..0.084 rows=600 loops=1)
-> Hash (cost=1637.00..1637.00 rows=100000 width=20) (actual
time=80.122..80.123 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 6474kB
-> Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20)
(actual time=0.015..11.819 rows=100000 loops=1)
Planning Time: 0.194 ms
Execution Time: 104.662 ms
(8 rows)
--
Regards,
Alena Rybakina
Postgres Professional
Attachments:
diff_estimate.diff (text/x-patch)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a18..31771dfba46 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -153,6 +153,7 @@ bool enable_parallel_hash = true;
bool enable_partition_pruning = true;
bool enable_presorted_aggregate = true;
bool enable_async_append = true;
+bool enable_cost_size = true;
typedef struct
{
@@ -4033,11 +4034,22 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
thismcvfreq = restrictinfo->left_mcvfreq;
}
+ if (enable_cost_size)
+ {
+ innerbucketsize *= thisbucketsize;
+ innermcvfreq *= thismcvfreq;
+ }
+ else
+ {
if (innerbucketsize > thisbucketsize)
innerbucketsize = thisbucketsize;
if (innermcvfreq > thismcvfreq)
innermcvfreq = thismcvfreq;
+ }
}
+
+ if (enable_cost_size && innerbucketsize > virtualbuckets)
+ innerbucketsize = 1.0 / virtualbuckets;
}
/*
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 71e27f8eb05..ded9ba3b7a9 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1007,6 +1007,19 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_cost_size", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("set the optimizer coefficient"
+ "so that custom or generic plan is selected more often. "
+ "by default, the value is set to 1, which means that "
+ "the choice of using both depends on the calculated cost"),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_cost_size,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_async_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of async append plans."),
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 6cf49705d3a..c79ec12e6d5 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -71,6 +71,7 @@ extern PGDLLIMPORT bool enable_partition_pruning;
extern PGDLLIMPORT bool enable_presorted_aggregate;
extern PGDLLIMPORT bool enable_async_append;
extern PGDLLIMPORT int constraint_exclusion;
+extern PGDLLIMPORT bool enable_cost_size;
extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
Does anyone else have an opinion on this patch? It looks promising.
---------------------------------------------------------------------------
On Wed, Jun 28, 2023 at 04:53:06PM +0300, Alena Rybakina wrote:
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
Hi,
On Thu, Jun 15, 2023 at 4:30 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru>
wrote:
Hi, all.
Some of my clients use JOIN's with three - four clauses. Quite
frequently, I see complaints on unreasonable switch of JOIN algorithm to
Merge Join instead of Hash Join. Quick research have shown one weak
place - estimation of an average bucket size in final_cost_hashjoin (see
q2.sql in attachment) with very conservative strategy.
Unlike estimation of groups, here we use smallest ndistinct value across
all buckets instead of multiplying them (or trying to make multivariate
analysis).
It works fine for the case of one clause. But if we have many clauses,
and if each has high value of ndistinct, we will overestimate average
size of a bucket and, as a result, prefer to use Merge Join. As the
example in attachment shows, it leads to worse plan than possible,
sometimes drastically worse.
I assume, this is done with fear of functional dependencies between hash
clause components. But as for me, here we should go the same way, as
estimation of groups.
I can reproduce the situation you want to improve and verify that the patch
fixes it as expected. I think this is the right thing to do.
The attached patch shows a sketch of the solution.
I understand that this is a sketch of the solution, but the changes below
still confuse me.
+ if (innerbucketsize > virtualbuckets)
+ innerbucketsize = 1.0 / virtualbuckets;
innerbucketsize is a fraction of all the rows, so it is between 0.0
and 1.0, and virtualbuckets is the total number of buckets (taking the
multiple batches into account). How is it possible for
'innerbucketsize > virtualbuckets' to be true? Am I missing something?
--
Best Regards
Andy Fan
On Mon, Sep 11, 2023, at 11:51 AM, Andy Fan wrote:
Hi,
On Thu, Jun 15, 2023 at 4:30 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:Hi, all.
Some of my clients use JOIN's with three - four clauses. Quite
frequently, I see complaints on unreasonable switch of JOIN algorithm to
Merge Join instead of Hash Join. Quick research have shown one weak
place - estimation of an average bucket size in final_cost_hashjoin (see
q2.sql in attachment) with very conservative strategy.
Unlike estimation of groups, here we use smallest ndistinct value across
all buckets instead of multiplying them (or trying to make multivariate
analysis).
It works fine for the case of one clause. But if we have many clauses,
and if each has high value of ndistinct, we will overestimate average
size of a bucket and, as a result, prefer to use Merge Join. As the
example in attachment shows, it leads to worse plan than possible,
sometimes drastically worse.
I assume, this is done with fear of functional dependencies between hash
clause components. But as for me, here we should go the same way, as
estimation of groups.
I can reproduce the situation you want to improve and verify that the patch
fixes it as expected. I think this is the right thing to do.
The attached patch shows a sketch of the solution.
I understand that this is a sketch of the solution, but the changes below
still confuse me.
+ if (innerbucketsize > virtualbuckets)
+ innerbucketsize = 1.0 / virtualbuckets;
innerbucketsize is a fraction of all the rows, so it is between
0.0 and 1.0, and virtualbuckets is the total number of buckets (taking
the multiple batches into account). How is it possible for
'innerbucketsize > virtualbuckets' to be true? Am I missing something?
You are right; I made a mistake there. A corrected diff is attached.
--
Regards,
Andrei Lepikhov
Attachments:
fix_bucketsize_v2.diff (application/octet-stream)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index d6ceafd51c..5c5b24a843 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4274,11 +4274,12 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
thismcvfreq = restrictinfo->left_mcvfreq;
}
- if (innerbucketsize > thisbucketsize)
- innerbucketsize = thisbucketsize;
- if (innermcvfreq > thismcvfreq)
- innermcvfreq = thismcvfreq;
+ innerbucketsize *= thisbucketsize;
+ innermcvfreq *= thismcvfreq;
}
+
+ if (innerbucketsize < 1.0 / virtualbuckets)
+ innerbucketsize = 1.0 / virtualbuckets;
}
/*
On 9/11/23 10:04, Lepikhov Andrei wrote:
On Mon, Sep 11, 2023, at 11:51 AM, Andy Fan wrote:
Hi,
On Thu, Jun 15, 2023 at 4:30 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:Hi, all.
Some of my clients use JOIN's with three - four clauses. Quite
frequently, I see complaints on unreasonable switch of JOIN algorithm to
Merge Join instead of Hash Join. Quick research have shown one weak
place - estimation of an average bucket size in final_cost_hashjoin (see
q2.sql in attachment) with very conservative strategy.
Unlike estimation of groups, here we use smallest ndistinct value across
all buckets instead of multiplying them (or trying to make multivariate
analysis).
It works fine for the case of one clause. But if we have many clauses,
and if each has high value of ndistinct, we will overestimate average
size of a bucket and, as a result, prefer to use Merge Join. As the
example in attachment shows, it leads to worse plan than possible,
sometimes drastically worse.
I assume, this is done with fear of functional dependencies between hash
clause components. But as for me, here we should go the same way, as
estimation of groups.
Yes, this analysis is correct - final_cost_hashjoin assumes the clauses
may be correlated (not necessarily by functional dependencies, just that
the overall ndistinct is not a simple product of per-column ndistincts).
And it even says so in the comment before calculating bucket size:
* Determine bucketsize fraction and MCV frequency for the inner
* relation. We use the smallest bucketsize or MCV frequency estimated
* for any individual hashclause; this is undoubtedly conservative.
I'm sure this may lead to inflated cost for "good" cases (where the
actual bucket size really is a product), which may push the optimizer to
use the less efficient/slower join method.
Unfortunately, AFAICS the patch simply assumes the extreme in the
opposite direction - it assumes each clause splits the bucket for each
distinct value in the column. Which works great when it's true, but
surely it'd have issues when the columns are correlated?
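For example (made-up numbers): if the three inner columns each have
ndistinct = 100 but are perfectly correlated, the multiplied estimate gives
a bucket fraction of (1/100)^3 = 0.000001, while the real fraction is
1/100 = 0.01 - an underestimate of four orders of magnitude, which makes
hash join look far cheaper than it actually is.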
I think this deserves more discussion, i.e. what happens if the
assumptions do not hold? We know what happens for the conservative
approach, but what's the worst thing that would happen for the
optimistic one?
I doubt we can simply switch from the conservative approach to the
optimistic one. Yes, it'll make some queries faster, but for other
queries it likely causes problems and slowdowns.
IMHO the only principled way forward is to get a better ndistinct
estimate (which this implicitly does), perhaps by using extended
statistics. I haven't tried, but I guess it'd need to extract the
clauses for the inner side, and call estimate_num_groups() on it.
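To sketch what I mean in SQL terms (using table b from Alena's example
upthread; the statistics object name is made up), the estimator would need
a multivariate ndistinct statistic on the inner join keys, e.g.:

CREATE STATISTICS b_xyz_nd (ndistinct) ON x, y, z FROM b;
ANALYZE b;

-- what the improved estimator would have to work with
SELECT s.stxname, d.stxdndistinct
  FROM pg_statistic_ext s
  JOIN pg_statistic_ext_data d ON d.stxoid = s.oid
 WHERE s.stxname = 'b_xyz_nd';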
This however reminds me we don't use extended statistics for join
clauses at all. Which means that even with accurate extended statistics,
we can still get stuff like this for multiple join clauses:
Hash Join (cost=1317.00..2386.00 rows=200 width=24)
(actual time=85.781..8574.784 rows=8000000 loops=1)
This is unrelated to the issue discussed here, of course, as it won't
affect join method selection for that join. But it certainly will affect
all estimates/costs above that join, which can be pretty disastrous.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 3/11/2023 23:43, Tomas Vondra wrote:
On 9/11/23 10:04, Lepikhov Andrei wrote:
* Determine bucketsize fraction and MCV frequency for the inner
* relation. We use the smallest bucketsize or MCV frequency estimated
* for any individual hashclause; this is undoubtedly conservative.
I'm sure this may lead to inflated cost for "good" cases (where the
actual bucket size really is a product), which may push the optimizer to
use the less efficient/slower join method.
Yes, it was a contradictory idea, though.
IMHO the only principled way forward is to get a better ndistinct
estimate (which this implicitly does), perhaps by using extended
statistics. I haven't tried, but I guess it'd need to extract the
clauses for the inner side, and call estimate_num_groups() on it.
And I've done it. Sorry for the long delay in responding. This patch
employs extended statistics for the estimation of the HashJoin bucket
size. In addition, I describe the idea in a more convenient form here [1].
Note that it needs only the ndistinct kind of extended statistics to make
a prediction, which keeps the computational cost of these statistics low.
[1]: https://open.substack.com/pub/danolivo/p/why-postgresql-prefers-mergejoin?r=34q1yy&utm_campaign=post&utm_medium=web
--
regards, Andrei Lepikhov
Attachments:
0001-Use-extended-statistics-for-precise-estimation-of-bu.patch (text/plain)
From b4b007970b1d9b99602b8422f2122c8e5738828e Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 13 May 2024 16:15:02 +0700
Subject: [PATCH] Use extended statistics for precise estimation of bucket size
in hash join.
Recognizing the real-life complexity where columns in a table often have
functional dependencies, PostgreSQL's estimation of the number of distinct
values over a set of columns can be underestimated (or, much more rarely,
overestimated) when dealing with a multi-clause JOIN.
In the case of hash join, this can end up with a small number of predicted
distinct hash keys (i.e. overly large buckets) and, as a result, picking a
non-optimal merge join.
To improve the situation, we introduce one additional stage of bucket size
estimation: with two or more join clauses, the estimator looks up extended
statistics and uses them for a multicolumn estimation.
Clauses are grouped into lists, each containing expressions referencing the
same relation. The result of the multicolumn estimation made over such a
list is combined with the others according to the caller's logic.
Clauses that are not estimated this way are returned to the caller for
further estimation.
---
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/utils/adt/selfuncs.c | 171 ++++++++++++++++++++++++++
src/include/utils/selfuncs.h | 4 +
3 files changed, 186 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ee23ed7835..eafde90d4c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4250,9 +4250,19 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
}
else
{
+ List *otherclauses;
+
innerbucketsize = 1.0;
innermcvfreq = 1.0;
- foreach(hcl, hashclauses)
+
+ /* At first, try to estimate bucket size using extended statistics. */
+ otherclauses = estimate_multivariate_bucketsize(root,
+ inner_path->parent,
+ hashclauses,
+ &innerbucketsize);
+
+ /* Pass through the remaining clauses */
+ foreach(hcl, otherclauses)
{
RestrictInfo *restrictinfo = lfirst_node(RestrictInfo, hcl);
Selectivity thisbucketsize;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5f5d7959d8..89ed2d136a 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3751,6 +3751,177 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
return numdistinct;
}
+/*
+ * Try to estimate the bucket size of the hash join inner side when the join
+ * condition contains two or more clauses by employing extended statistics.
+ *
+ * The main idea of this approach is that the distinct value generated by
+ * multivariate estimation on two or more columns would provide less bucket size
+ * than estimation on one separate column.
+ *
+ * IMPORTANT: It is crucial to synchronise the approach of combining different
+ * estimations with the caller's method.
+ * Return a list of clauses that didn't fetch any extended statistics.
+ */
+List *
+estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize)
+{
+ List *clauses = list_copy(hashclauses);
+ List *otherclauses = NIL;
+ double ndistinct = 1.0;
+
+ if (list_length(hashclauses) <= 1)
+ /*
+ * Nothing to do for a single clause. Could we employ univariate
+ * extended stat here?
+ */
+ return hashclauses;
+
+ while (clauses != NIL)
+ {
+ ListCell *lc;
+ int relid = -1;
+ List *varinfos = NIL;
+ List *saveList = NIL;
+ double mvndistinct;
+ List *origin_varinfos;
+ int group_relid = -1;
+ RelOptInfo *group_rel;
+ ListCell *lc1, *lc2;
+
+ /*
+ * Find clauses, referencing the same single base relation and try to
+ * estimate such a group with extended statistics.
+ * Create varinfo for an approved clause, push it to otherclauses, if it
+ * can't be estimated here or ignore to process at the next iteration.
+ */
+ foreach(lc, clauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Node *expr;
+ Relids relids;
+ GroupVarInfo *varinfo;
+
+ /*
+ * Find the inner side of the join which we need to estimate the
+ * number of buckets. Use outer_is_left because the
+ * clause_sides_match_join routine has called on hash clauses.
+ */
+ relids = rinfo->outer_is_left ?
+ rinfo->right_relids : rinfo->left_relids;
+ expr = rinfo->outer_is_left ?
+ get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+
+ if (bms_get_singleton_member(relids, &relid) &&
+ root->simple_rel_array[relid]->statlist != NIL)
+ {
+ /*
+ * This inner-side expression references only one relation.
+ * Extended statistics on this clause can exist.
+ */
+
+ if (group_relid < 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid];
+
+ if (!rte || (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW &&
+ rte->relkind != RELKIND_FOREIGN_TABLE &&
+ rte->relkind != RELKIND_PARTITIONED_TABLE))
+ {
+ /* Extended statistics can't exist in principle */
+ otherclauses = lappend(otherclauses, rinfo);
+ clauses = foreach_delete_current(clauses, lc);
+ continue;
+ }
+
+ group_relid = relid;
+ group_rel = root->simple_rel_array[relid];
+ Assert(group_rel != NULL);
+ }
+ else if (group_relid != relid)
+ /*
+ * Being in the group forming state we don't
+ * need other clauses.
+ */
+ continue;
+
+ varinfo = (GroupVarInfo *) palloc(sizeof(GroupVarInfo));
+ varinfo->var = expr;
+ varinfo->rel = root->simple_rel_array[relid];
+ varinfo->ndistinct = 0.0;
+ varinfo->isdefault = false;
+ varinfos = lappend(varinfos, varinfo);
+
+ /*
+ * Remember the link to RestrictInfo for the case the clause is
+ * failed to be estimated.
+ */
+ saveList = lappend(saveList, rinfo);
+ }
+ else
+ /* This clause can't be estimated with extended statistics */
+ otherclauses = lappend(otherclauses, rinfo);
+
+ clauses = foreach_delete_current(clauses, lc);
+ }
+
+ if (list_length(varinfos) < 2)
+ {
+ /*
+ * Multivariate statistics doesn't apply to single column except
+ * expressions, but it has not implemented yet.
+ */
+ otherclauses = list_concat(otherclauses, saveList);
+ list_free_deep(varinfos);
+ list_free(saveList);
+ continue;
+ }
+
+ /* Let's employ extended statistics. */
+ origin_varinfos = varinfos;
+ for (;;)
+ {
+ bool estimated = estimate_multivariate_ndistinct(root,
+ group_rel,
+ &varinfos,
+ &mvndistinct);
+ if (!estimated)
+ break;
+
+ /*
+ * We've got an estimation. Use ndistinct value in consistent way -
+ * according to the caller's logic (see final_cost_hashjoin).
+ */
+ if (ndistinct < mvndistinct)
+ ndistinct = mvndistinct;
+ Assert(ndistinct >= 1.0);
+ }
+
+ Assert(list_length(origin_varinfos) == list_length(saveList));
+
+ /*
+ * Remove already estimated clauses from the saveList
+ */
+ forboth(lc1, origin_varinfos, lc2, saveList)
+ {
+ GroupVarInfo *vinfo = lfirst(lc1);
+
+ if (!list_member_ptr(varinfos, vinfo))
+ /* Already estimated */
+ continue;
+
+ /* Can't be estimated here - push to the returning list */
+ otherclauses = lappend(otherclauses, lfirst(lc2));
+ }
+ }
+
+ *innerbucketsize = 1.0 / ndistinct;
+ return otherclauses;
+}
+
/*
* Estimate hash bucket statistics when the specified expression is used
* as a hash key for the given number of buckets.
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f2563ad1cb..ac62d2fa0e 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -217,6 +217,10 @@ extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
double input_rows, List **pgset,
EstimationInfo *estinfo);
+extern List *estimate_multivariate_bucketsize(PlannerInfo *root,
+ RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize);
extern void estimate_hash_bucket_stats(PlannerInfo *root,
Node *hashkey, double nbuckets,
Selectivity *mcv_freq,
--
2.45.2
On 7/8/24 19:45, Andrei Lepikhov wrote:
On 3/11/2023 23:43, Tomas Vondra wrote:
On 9/11/23 10:04, Lepikhov Andrei wrote:
* Determine bucketsize fraction and MCV frequency for the inner
* relation. We use the smallest bucketsize or MCV frequency estimated
* for any individual hashclause; this is undoubtedly conservative.
I'm sure this may lead to inflated cost for "good" cases (where the
actual bucket size really is a product), which may push the optimizer to
use the less efficient/slower join method.
Yes, it was a contradictory idea, though.
IMHO the only principled way forward is to get a better ndistinct
estimate (which this implicitly does), perhaps by using extended
statistics. I haven't tried, but I guess it'd need to extract the
clauses for the inner side, and call estimate_num_groups() on it.
And I've done it. Sorry for the long delay in responding. This patch
employs extended statistics for the estimation of the HashJoin bucket
size. In addition, I describe the idea in a more convenient form here [1].
Note that it needs only the ndistinct kind of extended statistics to make
a prediction, which keeps the computational cost of these statistics low.
Minor change to make cfbot happier.
--
regards, Andrei Lepikhov
Attachments:
v1-0001-Use-extended-statistics-for-precise-estimation-of.patch (text/x-patch)
From 73e4d16812431b91fb760d994d0198baab2fe034 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 13 May 2024 16:15:02 +0700
Subject: [PATCH v1] Use extended statistics for precise estimation of bucket
size in hash join.
Recognizing the real-life complexity where columns in a table often have
functional dependencies, PostgreSQL's estimation of the number of distinct
values over a set of columns can be underestimated (or, much more rarely,
overestimated) when dealing with a multi-clause JOIN.
In the case of hash join, this can end up with a small number of predicted
distinct hash keys (i.e. overly large buckets) and, as a result, picking a
non-optimal merge join.
To improve the situation, we introduce one additional stage of bucket size
estimation: with two or more join clauses, the estimator looks up extended
statistics and uses them for a multicolumn estimation.
Clauses are grouped into lists, each containing expressions referencing the
same relation. The result of the multicolumn estimation made over such a
list is combined with the others according to the caller's logic.
Clauses that are not estimated this way are returned to the caller for
further estimation.
---
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/utils/adt/selfuncs.c | 172 ++++++++++++++++++++++++++
src/include/utils/selfuncs.h | 4 +
3 files changed, 187 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e1523d15df..af1cb8ba78 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4294,9 +4294,19 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
}
else
{
+ List *otherclauses;
+
innerbucketsize = 1.0;
innermcvfreq = 1.0;
- foreach(hcl, hashclauses)
+
+ /* At first, try to estimate bucket size using extended statistics. */
+ otherclauses = estimate_multivariate_bucketsize(root,
+ inner_path->parent,
+ hashclauses,
+ &innerbucketsize);
+
+ /* Pass through the remaining clauses */
+ foreach(hcl, otherclauses)
{
RestrictInfo *restrictinfo = lfirst_node(RestrictInfo, hcl);
Selectivity thisbucketsize;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 03d7fb5f48..7a57f444c8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3752,6 +3752,178 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
return numdistinct;
}
+/*
+ * Try to estimate the bucket size of the hash join inner side when the join
+ * condition contains two or more clauses by employing extended statistics.
+ *
+ * The main idea of this approach is that the distinct value generated by
+ * multivariate estimation on two or more columns would provide less bucket size
+ * than estimation on one separate column.
+ *
+ * IMPORTANT: It is crucial to synchronise the approach of combining different
+ * estimations with the caller's method.
+ * Return a list of clauses that didn't fetch any extended statistics.
+ */
+List *
+estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize)
+{
+ List *clauses = list_copy(hashclauses);
+ List *otherclauses = NIL;
+ double ndistinct = 1.0;
+
+ if (list_length(hashclauses) <= 1)
+ /*
+ * Nothing to do for a single clause. Could we employ univariate
+ * extended stat here?
+ */
+ return hashclauses;
+
+ while (clauses != NIL)
+ {
+ ListCell *lc;
+ int relid = -1;
+ List *varinfos = NIL;
+ List *saveList = NIL;
+ double mvndistinct;
+ List *origin_varinfos;
+ int group_relid = -1;
+ RelOptInfo *group_rel = NULL;
+ ListCell *lc1, *lc2;
+
+ /*
+ * Find clauses, referencing the same single base relation and try to
+ * estimate such a group with extended statistics.
+ * Create varinfo for an approved clause, push it to otherclauses, if it
+ * can't be estimated here or ignore to process at the next iteration.
+ */
+ foreach(lc, clauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Node *expr;
+ Relids relids;
+ GroupVarInfo *varinfo;
+
+ /*
+ * Find the inner side of the join which we need to estimate the
+ * number of buckets. Use outer_is_left because the
+ * clause_sides_match_join routine has called on hash clauses.
+ */
+ relids = rinfo->outer_is_left ?
+ rinfo->right_relids : rinfo->left_relids;
+ expr = rinfo->outer_is_left ?
+ get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+
+ if (bms_get_singleton_member(relids, &relid) &&
+ root->simple_rel_array[relid]->statlist != NIL)
+ {
+ /*
+ * This inner-side expression references only one relation.
+ * Extended statistics on this clause can exist.
+ */
+
+ if (group_relid < 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid];
+
+ if (!rte || (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW &&
+ rte->relkind != RELKIND_FOREIGN_TABLE &&
+ rte->relkind != RELKIND_PARTITIONED_TABLE))
+ {
+ /* Extended statistics can't exist in principle */
+ otherclauses = lappend(otherclauses, rinfo);
+ clauses = foreach_delete_current(clauses, lc);
+ continue;
+ }
+
+ group_relid = relid;
+ group_rel = root->simple_rel_array[relid];
+ }
+ else if (group_relid != relid)
+ /*
+ * Being in the group forming state we don't
+ * need other clauses.
+ */
+ continue;
+
+ varinfo = (GroupVarInfo *) palloc(sizeof(GroupVarInfo));
+ varinfo->var = expr;
+ varinfo->rel = root->simple_rel_array[relid];
+ varinfo->ndistinct = 0.0;
+ varinfo->isdefault = false;
+ varinfos = lappend(varinfos, varinfo);
+
+ /*
+ * Remember the link to RestrictInfo for the case the clause is
+ * failed to be estimated.
+ */
+ saveList = lappend(saveList, rinfo);
+ }
+ else
+ /* This clause can't be estimated with extended statistics */
+ otherclauses = lappend(otherclauses, rinfo);
+
+ clauses = foreach_delete_current(clauses, lc);
+ }
+
+ if (list_length(varinfos) < 2)
+ {
+ /*
+ * Multivariate statistics doesn't apply to single column except
+ * expressions, but it has not implemented yet.
+ */
+ otherclauses = list_concat(otherclauses, saveList);
+ list_free_deep(varinfos);
+ list_free(saveList);
+ continue;
+ }
+
+ Assert(group_rel != NULL);
+
+ /* Let's employ extended statistics. */
+ origin_varinfos = varinfos;
+ for (;;)
+ {
+ bool estimated = estimate_multivariate_ndistinct(root,
+ group_rel,
+ &varinfos,
+ &mvndistinct);
+ if (!estimated)
+ break;
+
+ /*
+ * We've got an estimation. Use ndistinct value in consistent way -
+ * according to the caller's logic (see final_cost_hashjoin).
+ */
+ if (ndistinct < mvndistinct)
+ ndistinct = mvndistinct;
+ Assert(ndistinct >= 1.0);
+ }
+
+ Assert(list_length(origin_varinfos) == list_length(saveList));
+
+ /*
+ * Remove already estimated clauses from the saveList
+ */
+ forboth(lc1, origin_varinfos, lc2, saveList)
+ {
+ GroupVarInfo *vinfo = lfirst(lc1);
+
+ if (!list_member_ptr(varinfos, vinfo))
+ /* Already estimated */
+ continue;
+
+ /* Can't be estimated here - push to the returning list */
+ otherclauses = lappend(otherclauses, lfirst(lc2));
+ }
+ }
+
+ *innerbucketsize = 1.0 / ndistinct;
+ return otherclauses;
+}
+
/*
* Estimate hash bucket statistics when the specified expression is used
* as a hash key for the given number of buckets.
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index f2563ad1cb..ac62d2fa0e 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -217,6 +217,10 @@ extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
double input_rows, List **pgset,
EstimationInfo *estinfo);
+extern List *estimate_multivariate_bucketsize(PlannerInfo *root,
+ RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize);
extern void estimate_hash_bucket_stats(PlannerInfo *root,
Node *hashkey, double nbuckets,
Selectivity *mcv_freq,
--
2.39.5
Hi, Andrei!
On Tue, Oct 8, 2024 at 8:00 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/8/24 19:45, Andrei Lepikhov wrote:
On 3/11/2023 23:43, Tomas Vondra wrote:
On 9/11/23 10:04, Lepikhov Andrei wrote:
* Determine bucketsize fraction and MCV frequency for the inner
* relation. We use the smallest bucketsize or MCV frequency estimated
* for any individual hashclause; this is undoubtedly conservative.
I'm sure this may lead to inflated cost for "good" cases (where the
actual bucket size really is a product), which may push the optimizer to
use the less efficient/slower join method.
Yes, it was a contradictory idea, though.
IMHO the only principled way forward is to get a better ndistinct
estimate (which this implicitly does), perhaps by using extended
statistics. I haven't tried, but I guess it'd need to extract the
clauses for the inner side, and call estimate_num_groups() on it.
And I've done it. Sorry for the long delay in responding. This patch
employs extended statistics for the estimation of the HashJoin bucket
size. In addition, I describe the idea in a more convenient form here [1].
Note that it needs only the ndistinct kind of extended statistics to make
a prediction, which keeps the computational cost of these statistics low.
Minor change to make cfbot happier.
Thank you for your work on this subject. I agree with the general
direction. While everyone has used conservative estimates for a long
time, it's better to change them only when we're sure about it.
However, I'm still not sure I get the conservatism.
if (innerbucketsize > thisbucketsize)
innerbucketsize = thisbucketsize;
if (innermcvfreq > thismcvfreq)
innermcvfreq = thismcvfreq;
AFAICS, even in the worst case (all columns are totally correlated),
the overall bucket size should be the smallest bucket size among
clauses (not the largest). And the same is true of MCV. As a mental
experiment, we can add a new clause to hash join, which is always true
because columns on both sides have the same value. In fact, it would
have almost no influence except for the cost of extracting additional
columns and the cost of executing additional operators. But in the
current model, this additional clause would completely ruin
thisbucketsize and thismcvfreq, making hash join extremely
unappealing. Should we still revise this to calculate minimum instead
of maximum?
I've slightly revised the patch. I've run pgindent and renamed
s/saveList/origin_rinfos/g for better readability.
Also, the patch badly needs regression test coverage. We can't
include costs in expected outputs, but we could use some plans
that previously were reliably merge joins and now reliably become
hash joins.
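A rough, unverified sketch of what such a test could look like (table and
statistics names are made up; the expected Hash Join assumes the patch
picks up the multivariate ndistinct statistics on the inner side):

CREATE TABLE ht1 AS
  SELECT i % 100 AS a, (i / 100) % 100 AS b
  FROM generate_series(0, 9999) AS i;
CREATE TABLE ht2 AS
  SELECT i % 100 AS a, (i / 100) % 100 AS b
  FROM generate_series(0, 9999) AS i;
CREATE STATISTICS ht2_ab (ndistinct) ON a, b FROM ht2;
ANALYZE ht1, ht2;

-- ndistinct on ht2(a, b) is ~10000, far above the per-column 100, so the
-- extended-statistics path should make the hash join look as cheap as it is
EXPLAIN (COSTS OFF)
SELECT * FROM ht1 JOIN ht2 ON ht1.a = ht2.a AND ht1.b = ht2.b;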
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v2-0001-Use-extended-statistics-for-precise-estimation-of.patch (application/octet-stream)
From 05aade32c8c50fcb748cc86ddb3bb25b73787c8c Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 13 May 2024 16:15:02 +0700
Subject: [PATCH v2] Use extended statistics for precise estimation of bucket
size in hash join.
Recognizing the real-life complexity where columns in a table often have
functional dependencies, PostgreSQL's estimation of the number of distinct
values over a set of columns can be underestimated (or, much more rarely,
overestimated) when dealing with a multi-clause JOIN.
In the case of hash join, this can end up with a small number of predicted
distinct hash keys (i.e. overly large buckets) and, as a result, picking a
non-optimal merge join.
To improve the situation, we introduce one additional stage of bucket size
estimation: with two or more join clauses, the estimator looks up extended
statistics and uses them for a multicolumn estimation.
Clauses are grouped into lists, each containing expressions referencing the
same relation. The result of the multicolumn estimation made over such a
list is combined with the others according to the caller's logic.
Clauses that are not estimated this way are returned to the caller for
further estimation.
---
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/utils/adt/selfuncs.c | 176 ++++++++++++++++++++++++++
src/include/utils/selfuncs.h | 4 +
3 files changed, 191 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 73d78617009..256568d05a2 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4339,9 +4339,19 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
}
else
{
+ List *otherclauses;
+
innerbucketsize = 1.0;
innermcvfreq = 1.0;
- foreach(hcl, hashclauses)
+
+ /* At first, try to estimate bucket size using extended statistics. */
+ otherclauses = estimate_multivariate_bucketsize(root,
+ inner_path->parent,
+ hashclauses,
+ &innerbucketsize);
+
+ /* Pass through the remaining clauses */
+ foreach(hcl, otherclauses)
{
RestrictInfo *restrictinfo = lfirst_node(RestrictInfo, hcl);
Selectivity thisbucketsize;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d3d1e485bb2..b826730acbe 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3765,6 +3765,182 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
return numdistinct;
}
+/*
+ * Try to estimate the bucket size of the hash join inner side when the join
+ * condition contains two or more clauses by employing extended statistics.
+ *
+ * The main idea of this approach is that the distinct value generated by
+ * multivariate estimation on two or more columns would provide less bucket size
+ * than estimation on one separate column.
+ *
+ * IMPORTANT: It is crucial to synchronise the approach of combining different
+ * estimations with the caller's method.
+ * Return a list of clauses that didn't fetch any extended statistics.
+ */
+List *
+estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize)
+{
+ List *clauses = list_copy(hashclauses);
+ List *otherclauses = NIL;
+ double ndistinct = 1.0;
+
+ if (list_length(hashclauses) <= 1)
+
+ /*
+ * Nothing to do for a single clause. Could we employ univariate
+ * extended stat here?
+ */
+ return hashclauses;
+
+ while (clauses != NIL)
+ {
+ ListCell *lc;
+ int relid = -1;
+ List *varinfos = NIL;
+ List *origin_rinfos = NIL;
+ double mvndistinct;
+ List *origin_varinfos;
+ int group_relid = -1;
+ RelOptInfo *group_rel = NULL;
+ ListCell *lc1,
+ *lc2;
+
+ /*
+ * Find clauses, referencing the same single base relation and try to
+ * estimate such a group with extended statistics. Create varinfo for
+ * an approved clause, push it to otherclauses, if it can't be
+ * estimated here or ignore to process at the next iteration.
+ */
+ foreach(lc, clauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Node *expr;
+ Relids relids;
+ GroupVarInfo *varinfo;
+
+ /*
+ * Find the inner side of the join which we need to estimate the
+ * number of buckets. Use outer_is_left because the
+ * clause_sides_match_join routine has called on hash clauses.
+ */
+ relids = rinfo->outer_is_left ?
+ rinfo->right_relids : rinfo->left_relids;
+ expr = rinfo->outer_is_left ?
+ get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+
+ if (bms_get_singleton_member(relids, &relid) &&
+ root->simple_rel_array[relid]->statlist != NIL)
+ {
+ /*
+ * This inner-side expression references only one relation.
+ * Extended statistics on this clause can exist.
+ */
+
+ if (group_relid < 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid];
+
+ if (!rte || (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW &&
+ rte->relkind != RELKIND_FOREIGN_TABLE &&
+ rte->relkind != RELKIND_PARTITIONED_TABLE))
+ {
+ /* Extended statistics can't exist in principle */
+ otherclauses = lappend(otherclauses, rinfo);
+ clauses = foreach_delete_current(clauses, lc);
+ continue;
+ }
+
+ group_relid = relid;
+ group_rel = root->simple_rel_array[relid];
+ }
+ else if (group_relid != relid)
+
+ /*
+ * Being in the group forming state we don't need other
+ * clauses.
+ */
+ continue;
+
+ varinfo = (GroupVarInfo *) palloc(sizeof(GroupVarInfo));
+ varinfo->var = expr;
+ varinfo->rel = root->simple_rel_array[relid];
+ varinfo->ndistinct = 0.0;
+ varinfo->isdefault = false;
+ varinfos = lappend(varinfos, varinfo);
+
+ /*
+ * Remember the link to RestrictInfo for the case the clause
+ * is failed to be estimated.
+ */
+ origin_rinfos = lappend(origin_rinfos, rinfo);
+ }
+ else
+ /* This clause can't be estimated with extended statistics */
+ otherclauses = lappend(otherclauses, rinfo);
+
+ clauses = foreach_delete_current(clauses, lc);
+ }
+
+ if (list_length(varinfos) < 2)
+ {
+ /*
+ * Multivariate statistics doesn't apply to single column except
+ * expressions, but it has not implemented yet.
+ */
+ otherclauses = list_concat(otherclauses, origin_rinfos);
+ list_free_deep(varinfos);
+ list_free(origin_rinfos);
+ continue;
+ }
+
+ Assert(group_rel != NULL);
+
+ /* Let's employ extended statistics. */
+ origin_varinfos = varinfos;
+ for (;;)
+ {
+ bool estimated = estimate_multivariate_ndistinct(root,
+ group_rel,
+ &varinfos,
+ &mvndistinct);
+
+ if (!estimated)
+ break;
+
+ /*
+ * We've got an estimation. Use ndistinct value in consistent way
+ * - according to the caller's logic (see final_cost_hashjoin).
+ */
+ if (ndistinct < mvndistinct)
+ ndistinct = mvndistinct;
+ Assert(ndistinct >= 1.0);
+ }
+
+ Assert(list_length(origin_varinfos) == list_length(origin_rinfos));
+
+ /*
+ * Remove already estimated clauses from the origin_rinfos
+ */
+ forboth(lc1, origin_varinfos, lc2, origin_rinfos)
+ {
+ GroupVarInfo *vinfo = lfirst(lc1);
+
+ if (!list_member_ptr(varinfos, vinfo))
+ /* Already estimated */
+ continue;
+
+ /* Can't be estimated here - push to the returning list */
+ otherclauses = lappend(otherclauses, lfirst(lc2));
+ }
+ }
+
+ *innerbucketsize = 1.0 / ndistinct;
+ return otherclauses;
+}
+
/*
* Estimate hash bucket statistics when the specified expression is used
* as a hash key for the given number of buckets.
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index b5e95c0ff61..c845da4c7cc 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -217,6 +217,10 @@ extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
double input_rows, List **pgset,
EstimationInfo *estinfo);
+extern List *estimate_multivariate_bucketsize(PlannerInfo *root,
+ RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize);
extern void estimate_hash_bucket_stats(PlannerInfo *root,
Node *hashkey, double nbuckets,
Selectivity *mcv_freq,
--
2.39.5 (Apple Git-154)
On 17/2/2025 01:34, Alexander Korotkov wrote:
Hi, Andrei!
On Tue, Oct 8, 2024 at 8:00 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
Thank you for your work on this subject. I agree with the general
direction. While everyone has used conservative estimates for a long
time, it's better to change them only when we're sure about it.
However, I'm still not sure I get the conservatism.

if (innerbucketsize > thisbucketsize)
    innerbucketsize = thisbucketsize;
if (innermcvfreq > thismcvfreq)
    innermcvfreq = thismcvfreq;

AFAICS, even in the worst case (all columns are totally correlated),
the overall bucket size should be the smallest bucket size among
clauses (not the largest). And the same is true of MCV. As a mental
experiment, we can add a new clause to hash join, which is always true
because columns on both sides have the same value. In fact, it would
have almost no influence except for the cost of extracting additional
columns and the cost of executing additional operators. But in the
current model, this additional clause would completely ruin
thisbucketsize and thismcvfreq, making hash join extremely
unappealing. Should we still revise this to calculate minimum instead
of maximum?
I agree with your point. But I think the code works precisely the way
you have described.
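As an aside, that mental experiment is easy to try in SQL. The sketch below is purely illustrative: the table names and data are made up, and the resulting plans depend on statistics and cost settings. The point is that with the min-based combination, the extra clause on the constant pad columns, which is always true, should not degrade the estimated bucket size.

-- Rough sketch of the "always true extra clause" experiment (illustrative only)
CREATE TABLE exp_t1 AS
  SELECT g % 100 AS a, 0 AS pad FROM generate_series(1, 10000) AS g;
CREATE TABLE exp_t2 AS
  SELECT g % 100 AS a, 0 AS pad FROM generate_series(1, 10000) AS g;
ANALYZE exp_t1, exp_t2;

EXPLAIN (COSTS OFF)  -- baseline two-column-free join on a alone
SELECT * FROM exp_t1 t1 JOIN exp_t2 t2 ON t1.a = t2.a;

EXPLAIN (COSTS OFF)  -- same join plus a clause that is always true
SELECT * FROM exp_t1 t1 JOIN exp_t2 t2 ON t1.a = t2.a AND t1.pad = t2.pad;

DROP TABLE exp_t1, exp_t2;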
I've slightly revised the patch. I've run pgindent and renamed
s/saveList/origin_rinfos/g for better readability.
Thank You!
Also, the patch badly needs regression test coverage. We can't
include costs in expected outputs. But that could be some plans,
which previously were reliably merge joins but now reliably become
hash joins.
I added one test here. Writing more tests on this feature is hard, but
feature [1] (Showing applied extended statistics in explain, part 2) may
provide us with additional tools to reveal extended stat internals. I have
also thought about injection points, but that seems an over-complication.
[1]: Showing applied extended statistics in explain Part 2 -
/messages/by-id/TYYPR01MB82310B308BA8770838F681619E5E2@TYYPR01MB8231.jpnprd01.prod.outlook.com
--
regards, Andrei Lepikhov
Attachments:
v3-0001-Use-extended-statistics-for-precise-estimation-of.patch (text/plain; charset=UTF-8)
From 8132a7ef772751c5b54fc905571a876f2d3b9132 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 13 May 2024 16:15:02 +0700
Subject: [PATCH v3] Use extended statistics for precise estimation of bucket
size in hash join.
Recognizing the real-life complexity where columns in the table often have
functional dependencies, PostgreSQL's estimation of the number of distinct
values over a set of columns can be underestimated (or much rarely,
overestimated) when dealing with multi-clause JOIN.
In the case of hash join, it can end up with a small number of predicted hash
buckets and, as a result, picking non-optimal merge join.
To improve the situation, we introduce one additional stage of bucket size
estimation - having two or more join clauses estimator lookup for extended
statistics and use it for multicolumn estimation.
Clauses are grouped into lists, each containing expressions referencing the
same relation. The result of the multicolumn estimation made over such a list
is combined with others according to the caller's logic.
Clauses that are not estimated are returned to the caller for further
estimation.
---
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/utils/adt/selfuncs.c | 181 +++++++++++++++++++++++-
src/include/utils/selfuncs.h | 4 +
src/test/regress/expected/stats_ext.out | 43 ++++++
src/test/regress/sql/stats_ext.sql | 28 ++++
5 files changed, 265 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 73d78617009..256568d05a2 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4339,9 +4339,19 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
}
else
{
+ List *otherclauses;
+
innerbucketsize = 1.0;
innermcvfreq = 1.0;
- foreach(hcl, hashclauses)
+
+ /* At first, try to estimate bucket size using extended statistics. */
+ otherclauses = estimate_multivariate_bucketsize(root,
+ inner_path->parent,
+ hashclauses,
+ &innerbucketsize);
+
+ /* Pass through the remaining clauses */
+ foreach(hcl, otherclauses)
{
RestrictInfo *restrictinfo = lfirst_node(RestrictInfo, hcl);
Selectivity thisbucketsize;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c2918c9c831..4f74618affd 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3765,6 +3765,183 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
return numdistinct;
}
+/*
+ * Try to estimate the bucket size of the hash join inner side when the join
+ * condition contains two or more clauses by employing extended statistics.
+ *
+ * The main idea of this approach is that the distinct value generated by
+ * multivariate estimation on two or more columns would provide less bucket size
+ * than estimation on one separate column.
+ *
+ * IMPORTANT: It is crucial to synchronise the approach of combining different
+ * estimations with the caller's method.
+ * Return a list of clauses that didn't fetch any extended statistics.
+ */
+List *
+estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize)
+{
+ List *clauses = list_copy(hashclauses);
+ List *otherclauses = NIL;
+ double ndistinct = 1.0;
+
+ if (list_length(hashclauses) <= 1)
+
+ /*
+ * Nothing to do for a single clause. Could we employ univariate
+ * extended stat here?
+ */
+ return hashclauses;
+
+ while (clauses != NIL)
+ {
+ ListCell *lc;
+ int relid = -1;
+ List *varinfos = NIL;
+ List *origin_rinfos = NIL;
+ double mvndistinct;
+ List *origin_varinfos;
+ int group_relid = -1;
+ RelOptInfo *group_rel = NULL;
+ ListCell *lc1,
+ *lc2;
+
+ /*
+ * Find clauses, referencing the same single base relation and try to
+ * estimate such a group with extended statistics. Create varinfo for
+ * an approved clause, push it to otherclauses, if it can't be
+ * estimated here or ignore to process at the next iteration.
+ */
+ foreach(lc, clauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Node *expr;
+ Relids relids;
+ GroupVarInfo *varinfo;
+
+ /*
+ * Find the inner side of the join which we need to estimate the
+ * number of buckets. Use outer_is_left because the
+ * clause_sides_match_join routine has called on hash clauses.
+ */
+ relids = rinfo->outer_is_left ?
+ rinfo->right_relids : rinfo->left_relids;
+ expr = rinfo->outer_is_left ?
+ get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+
+ if (bms_get_singleton_member(relids, &relid) &&
+ root->simple_rel_array[relid]->statlist != NIL)
+ {
+ /*
+ * This inner-side expression references only one relation.
+ * Extended statistics on this clause can exist.
+ */
+
+ if (group_relid < 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid];
+
+ if (!rte || (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW &&
+ rte->relkind != RELKIND_FOREIGN_TABLE &&
+ rte->relkind != RELKIND_PARTITIONED_TABLE))
+ {
+ /* Extended statistics can't exist in principle */
+ otherclauses = lappend(otherclauses, rinfo);
+ clauses = foreach_delete_current(clauses, lc);
+ continue;
+ }
+
+ group_relid = relid;
+ group_rel = root->simple_rel_array[relid];
+ }
+ else if (group_relid != relid)
+
+ /*
+ * Being in the group forming state we don't need other
+ * clauses.
+ */
+ continue;
+
+ varinfo = (GroupVarInfo *) palloc(sizeof(GroupVarInfo));
+ varinfo->var = expr;
+ varinfo->rel = root->simple_rel_array[relid];
+ varinfo->ndistinct = 0.0;
+ varinfo->isdefault = false;
+ varinfos = lappend(varinfos, varinfo);
+
+ /*
+ * Remember the link to RestrictInfo for the case of the clause
+ * estimation failure.
+ */
+ origin_rinfos = lappend(origin_rinfos, rinfo);
+ }
+ else
+ /* This clause can't be estimated with extended statistics */
+ otherclauses = lappend(otherclauses, rinfo);
+
+ clauses = foreach_delete_current(clauses, lc);
+ }
+
+ if (list_length(varinfos) < 2)
+ {
+ /*
+ * Multivariate statistics doesn't apply to single column except
+ * expressions, but it has not implemented yet.
+ */
+ otherclauses = list_concat(otherclauses, origin_rinfos);
+ list_free_deep(varinfos);
+ list_free(origin_rinfos);
+ continue;
+ }
+
+ Assert(group_rel != NULL);
+
+ /* Let's employ extended statistics. */
+ origin_varinfos = varinfos;
+ for (;;)
+ {
+ bool estimated = estimate_multivariate_ndistinct(root,
+ group_rel,
+ &varinfos,
+ &mvndistinct);
+
+ if (!estimated)
+ break;
+
+ /*
+ * We've got an estimation. Use ndistinct value in consistent way
+ * - according to the caller's logic (see final_cost_hashjoin).
+ */
+ if (ndistinct < mvndistinct)
+ ndistinct = mvndistinct;
+ Assert(ndistinct >= 1.0);
+ }
+
+ Assert(list_length(origin_varinfos) == list_length(origin_rinfos));
+
+ /*
+ * Identify clauses estimated through extended statistics and assemble
+ * a list of unestimated clauses.
+ */
+ forboth(lc1, origin_varinfos, lc2, origin_rinfos)
+ {
+ GroupVarInfo *vinfo = lfirst(lc1);
+
+ if (!list_member_ptr(varinfos, vinfo))
+ /* Already estimated */
+ continue;
+
+ /* unestimated clause found - push to the returning list */
+ otherclauses = lappend(otherclauses, lfirst(lc2));
+ }
+ }
+
+ *innerbucketsize = 1.0 / ndistinct;
+ return otherclauses;
+}
+
/*
* Estimate hash bucket statistics when the specified expression is used
* as a hash key for the given number of buckets.
@@ -3957,8 +4134,8 @@ estimate_hashagg_tablesize(PlannerInfo *root, Path *path,
/*
* Find applicable ndistinct statistics for the given list of VarInfos (which
* must all belong to the given rel), and update *ndistinct to the estimate of
- * the MVNDistinctItem that best matches. If a match it found, *varinfos is
- * updated to remove the list of matched varinfos.
+ * the MVNDistinctItem that best matches. If a match is found, *varinfos is
+ * a new list of not matched varinfos.
*
* Varinfos that aren't for simple Vars are ignored.
*
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index d35939651f3..82ac8c6d9da 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -218,6 +218,10 @@ extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
double input_rows, List **pgset,
EstimationInfo *estinfo);
+extern List *estimate_multivariate_bucketsize(PlannerInfo *root,
+ RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize);
extern void estimate_hash_bucket_stats(PlannerInfo *root,
Node *hashkey, double nbuckets,
Selectivity *mcv_freq,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 904d3e623f5..ecba76d8d6b 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -3383,3 +3383,46 @@ SELECT * FROM check_estimated_rows('
(1 row)
DROP TABLE grouping_unique;
+--
+-- Extended statistics on sb_2(x,y,z) improves a bucket size estimation and
+-- optimiser may choose hash join.
+--
+CREATE TABLE sb_1 AS
+ SELECT gs%10 AS x, gs%10 AS y, gs%10 AS z
+ FROM generate_series(1,1e4) AS gs;
+CREATE TABLE sb_2 AS
+ SELECT gs % 49 AS x, gs % 51 AS y, gs %73 AS z, 'abc' || gs AS payload
+ FROM generate_series(1,1e4) AS gs;
+ANALYZE sb_1,sb_2;
+-- During hash join estimation, the number of distinct values on each column
+-- is calculated, and choosing the smallest one as a hash bucket size optimiser
+-- decides that it is quite big --> lots of correlations.
+EXPLAIN (COSTS OFF) -- Choose merge join
+SELECT * FROM sb_1 a, sb_2 b WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
+ QUERY PLAN
+-------------------------------------------------------------
+ Merge Join
+ Merge Cond: ((a.z = b.z) AND (a.x = b.x) AND (a.y = b.y))
+ -> Sort
+ Sort Key: a.z, a.x, a.y
+ -> Seq Scan on sb_1 a
+ -> Sort
+ Sort Key: b.z, b.x, b.y
+ -> Seq Scan on sb_2 b
+(8 rows)
+
+-- ndistinct on x,y,z provides more reliable value of bucket size
+CREATE STATISTICS extstat_sb_2 (ndistinct) ON x,y,z FROM sb_2;
+ANALYZE sb_2;
+EXPLAIN (COSTS OFF) -- Choose hash join
+SELECT * FROM sb_1 a,sb_2 b WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
+ QUERY PLAN
+------------------------------------------------------------
+ Hash Join
+ Hash Cond: ((a.x = b.x) AND (a.y = b.y) AND (a.z = b.z))
+ -> Seq Scan on sb_1 a
+ -> Hash
+ -> Seq Scan on sb_2 b
+(5 rows)
+
+DROP TABLE sb_1,sb_2 CASCADE;
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 88b33ccaef8..3e845764ba3 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -1719,3 +1719,31 @@ SELECT * FROM check_estimated_rows('
ON t1.t1 = q1.x;
');
DROP TABLE grouping_unique;
+
+--
+-- Extended statistics on sb_2(x,y,z) improves a bucket size estimation and
+-- optimiser may choose hash join.
+--
+
+CREATE TABLE sb_1 AS
+ SELECT gs%10 AS x, gs%10 AS y, gs%10 AS z
+ FROM generate_series(1,1e4) AS gs;
+CREATE TABLE sb_2 AS
+ SELECT gs % 49 AS x, gs % 51 AS y, gs %73 AS z, 'abc' || gs AS payload
+ FROM generate_series(1,1e4) AS gs;
+ANALYZE sb_1,sb_2;
+
+-- During hash join estimation, the number of distinct values on each column
+-- is calculated, and choosing the smallest one as a hash bucket size optimiser
+-- decides that it is quite big --> lots of correlations.
+EXPLAIN (COSTS OFF) -- Choose merge join
+SELECT * FROM sb_1 a, sb_2 b WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
+
+-- ndistinct on x,y,z provides more reliable value of bucket size
+CREATE STATISTICS extstat_sb_2 (ndistinct) ON x,y,z FROM sb_2;
+ANALYZE sb_2;
+
+EXPLAIN (COSTS OFF) -- Choose hash join
+SELECT * FROM sb_1 a,sb_2 b WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
+
+DROP TABLE sb_1,sb_2 CASCADE;
--
2.48.1
On Mon, Mar 3, 2025 at 10:24 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 17/2/2025 01:34, Alexander Korotkov wrote:
Hi, Andrei!
On Tue, Oct 8, 2024 at 8:00 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
Thank you for your work on this subject. I agree with the general
direction. While everyone has used conservative estimates for a long
time, it's better to change them only when we're sure about it.
However, I'm still not sure I get the conservatism.

if (innerbucketsize > thisbucketsize)
    innerbucketsize = thisbucketsize;
if (innermcvfreq > thismcvfreq)
    innermcvfreq = thismcvfreq;

AFAICS, even in the worst case (all columns are totally correlated),
the overall bucket size should be the smallest bucket size among
clauses (not the largest). And the same is true of MCV. As a mental
experiment, we can add a new clause to hash join, which is always true
because columns on both sides have the same value. In fact, it would
have almost no influence except for the cost of extracting additional
columns and the cost of executing additional operators. But in the
current model, this additional clause would completely ruin
thisbucketsize and thismcvfreq, making hash join extremely
unappealing. Should we still revise this to calculate minimum instead
of maximum?

I agree with your point. But I think the code works precisely the way
you have described.
You're right. I just messed up with the sides of comparison operator.
------
Regards,
Alexander Korotkov
Supabase
On Wed, Mar 5, 2025 at 4:43 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Mar 3, 2025 at 10:24 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 17/2/2025 01:34, Alexander Korotkov wrote:
Hi, Andrei!
On Tue, Oct 8, 2024 at 8:00 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
Thank you for your work on this subject. I agree with the general
direction. While everyone has used conservative estimates for a long
time, it's better to change them only when we're sure about it.
However, I'm still not sure I get the conservatism.

if (innerbucketsize > thisbucketsize)
    innerbucketsize = thisbucketsize;
if (innermcvfreq > thismcvfreq)
    innermcvfreq = thismcvfreq;

AFAICS, even in the worst case (all columns are totally correlated),
the overall bucket size should be the smallest bucket size among
clauses (not the largest). And the same is true of MCV. As a mental
experiment, we can add a new clause to hash join, which is always true
because columns on both sides have the same value. In fact, it would
have almost no influence except for the cost of extracting additional
columns and the cost of executing additional operators. But in the
current model, this additional clause would completely ruin
thisbucketsize and thismcvfreq, making hash join extremely
unappealing. Should we still revise this to calculate minimum instead
of maximum?

I agree with your point. But I think the code works precisely the way
you have described.

You're right. I just messed up with the sides of comparison operator.
I've revised commit message, comments, formatting etc.
I'm going to push this if no objections.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v3-0001-Use-extended-stats-for-precise-estimation-of-buck.patch (application/octet-stream)
From cac95c0aed845f5f1d5358bdccbaa5e6252bd2de Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 9 Mar 2025 14:09:38 +0200
Subject: [PATCH v3] Use extended stats for precise estimation of bucket size
in hash join
Recognizing the real-life complexity where columns in the table often have
functional dependencies, PostgreSQL's estimation of the number of distinct
values over a set of columns can be underestimated (or much rarely,
overestimated) when dealing with multi-clause JOIN. In the case of hash
join, it can end up with a small number of predicted hash buckets and, as
a result, picking non-optimal merge join.
To improve the situation, we introduce one additional stage of bucket size
estimation - having two or more join clauses estimator lookup for extended
statistics and use it for multicolumn estimation. Clauses are grouped into
lists, each containing expressions referencing the same relation. The result
of the multicolumn estimation made over such a list is combined with others
according to the caller's logic. Clauses that are not estimated are returned
to the caller for further estimation.
Discussion: https://postgr.es/m/52257607-57f6-850d-399a-ec33a654457b%40postgrespro.ru
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Andy Fan <zhihui.fan1213@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
---
src/backend/optimizer/path/costsize.c | 12 +-
src/backend/utils/adt/selfuncs.c | 177 ++++++++++++++++++++++++
src/include/utils/selfuncs.h | 4 +
src/test/regress/expected/stats_ext.out | 45 ++++++
src/test/regress/sql/stats_ext.sql | 29 ++++
5 files changed, 266 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 73d78617009..256568d05a2 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4339,9 +4339,19 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
}
else
{
+ List *otherclauses;
+
innerbucketsize = 1.0;
innermcvfreq = 1.0;
- foreach(hcl, hashclauses)
+
+ /* At first, try to estimate bucket size using extended statistics. */
+ otherclauses = estimate_multivariate_bucketsize(root,
+ inner_path->parent,
+ hashclauses,
+ &innerbucketsize);
+
+ /* Pass through the remaining clauses */
+ foreach(hcl, otherclauses)
{
RestrictInfo *restrictinfo = lfirst_node(RestrictInfo, hcl);
Selectivity thisbucketsize;
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c2918c9c831..fb75b1ec46d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3765,6 +3765,183 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
return numdistinct;
}
+/*
+ * Try to estimate the bucket size of the hash join inner side when the join
+ * condition contains two or more clauses by employing extended statistics.
+ *
+ * The main idea of this approach is that the distinct value generated by
+ * multivariate estimation on two or more columns would provide less bucket size
+ * than estimation on one separate column.
+ *
+ * IMPORTANT: It is crucial to synchronize the approach of combining different
+ * estimations with the caller's method.
+ *
+ * Return a list of clauses that didn't fetch any extended statistics.
+ */
+List *
+estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize)
+{
+ List *clauses = list_copy(hashclauses);
+ List *otherclauses = NIL;
+ double ndistinct = 1.0;
+
+ if (list_length(hashclauses) <= 1)
+
+ /*
+ * Nothing to do for a single clause. Could we employ univariate
+ * extended stat here?
+ */
+ return hashclauses;
+
+ while (clauses != NIL)
+ {
+ ListCell *lc;
+ int relid = -1;
+ List *varinfos = NIL;
+ List *origin_rinfos = NIL;
+ double mvndistinct;
+ List *origin_varinfos;
+ int group_relid = -1;
+ RelOptInfo *group_rel = NULL;
+ ListCell *lc1,
+ *lc2;
+
+ /*
+ * Find clauses, referencing the same single base relation and try to
+ * estimate such a group with extended statistics. Create varinfo for
+ * an approved clause, push it to otherclauses, if it can't be
+ * estimated here or ignore to process at the next iteration.
+ */
+ foreach(lc, clauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Node *expr;
+ Relids relids;
+ GroupVarInfo *varinfo;
+
+ /*
+ * Find the inner side of the join, which we need to estimate the
+ * number of buckets. Use outer_is_left because the
+ * clause_sides_match_join routine has called on hash clauses.
+ */
+ relids = rinfo->outer_is_left ?
+ rinfo->right_relids : rinfo->left_relids;
+ expr = rinfo->outer_is_left ?
+ get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+
+ if (bms_get_singleton_member(relids, &relid) &&
+ root->simple_rel_array[relid]->statlist != NIL)
+ {
+ /*
+ * This inner-side expression references only one relation.
+ * Extended statistics on this clause can exist.
+ */
+ if (group_relid < 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid];
+
+ if (!rte || (rte->relkind != RELKIND_RELATION &&
+ rte->relkind != RELKIND_MATVIEW &&
+ rte->relkind != RELKIND_FOREIGN_TABLE &&
+ rte->relkind != RELKIND_PARTITIONED_TABLE))
+ {
+ /* Extended statistics can't exist in principle */
+ otherclauses = lappend(otherclauses, rinfo);
+ clauses = foreach_delete_current(clauses, lc);
+ continue;
+ }
+
+ group_relid = relid;
+ group_rel = root->simple_rel_array[relid];
+ }
+ else if (group_relid != relid)
+
+ /*
+ * Being in the group forming state we don't need other
+ * clauses.
+ */
+ continue;
+
+ varinfo = (GroupVarInfo *) palloc(sizeof(GroupVarInfo));
+ varinfo->var = expr;
+ varinfo->rel = root->simple_rel_array[relid];
+ varinfo->ndistinct = 0.0;
+ varinfo->isdefault = false;
+ varinfos = lappend(varinfos, varinfo);
+
+ /*
+ * Remember the link to RestrictInfo for the case the clause
+ * is failed to be estimated.
+ */
+ origin_rinfos = lappend(origin_rinfos, rinfo);
+ }
+ else
+ /* This clause can't be estimated with extended statistics */
+ otherclauses = lappend(otherclauses, rinfo);
+
+ clauses = foreach_delete_current(clauses, lc);
+ }
+
+ if (list_length(varinfos) < 2)
+ {
+ /*
+ * Multivariate statistics doesn't apply to single columns except
+ * for expressions, but it has not been implemented yet.
+ */
+ otherclauses = list_concat(otherclauses, origin_rinfos);
+ list_free_deep(varinfos);
+ list_free(origin_rinfos);
+ continue;
+ }
+
+ Assert(group_rel != NULL);
+
+ /* Let's employ extended statistics. */
+ origin_varinfos = varinfos;
+ for (;;)
+ {
+ bool estimated = estimate_multivariate_ndistinct(root,
+ group_rel,
+ &varinfos,
+ &mvndistinct);
+
+ if (!estimated)
+ break;
+
+ /*
+ * We've got an estimation. Use ndistinct value in a consistent
+ * way - according to the caller's logic (see
+ * final_cost_hashjoin).
+ */
+ if (ndistinct < mvndistinct)
+ ndistinct = mvndistinct;
+ Assert(ndistinct >= 1.0);
+ }
+
+ Assert(list_length(origin_varinfos) == list_length(origin_rinfos));
+
+ /*
+ * Remove already estimated clauses from the origin_rinfos
+ */
+ forboth(lc1, origin_varinfos, lc2, origin_rinfos)
+ {
+ GroupVarInfo *vinfo = lfirst(lc1);
+
+ if (!list_member_ptr(varinfos, vinfo))
+ /* Already estimated */
+ continue;
+
+ /* Can't be estimated here - push to the returning list */
+ otherclauses = lappend(otherclauses, lfirst(lc2));
+ }
+ }
+
+ *innerbucketsize = 1.0 / ndistinct;
+ return otherclauses;
+}
+
/*
* Estimate hash bucket statistics when the specified expression is used
* as a hash key for the given number of buckets.
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index d35939651f3..82ac8c6d9da 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -218,6 +218,10 @@ extern double estimate_num_groups(PlannerInfo *root, List *groupExprs,
double input_rows, List **pgset,
EstimationInfo *estinfo);
+extern List *estimate_multivariate_bucketsize(PlannerInfo *root,
+ RelOptInfo *inner,
+ List *hashclauses,
+ Selectivity *innerbucketsize);
extern void estimate_hash_bucket_stats(PlannerInfo *root,
Node *hashkey, double nbuckets,
Selectivity *mcv_freq,
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 904d3e623f5..686d8c93aa8 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -3383,3 +3383,48 @@ SELECT * FROM check_estimated_rows('
(1 row)
DROP TABLE grouping_unique;
+--
+-- Extended statistics on sb_2 (x, y, z) improve a bucket size estimation,
+-- and the optimizer may choose hash join.
+--
+CREATE TABLE sb_1 AS
+ SELECT gs % 10 AS x, gs % 10 AS y, gs % 10 AS z
+ FROM generate_series(1, 1e4) AS gs;
+CREATE TABLE sb_2 AS
+ SELECT gs % 49 AS x, gs % 51 AS y, gs % 73 AS z, 'abc' || gs AS payload
+ FROM generate_series(1, 1e4) AS gs;
+ANALYZE sb_1, sb_2;
+-- During hash join estimation, the number of distinct values on each column
+-- is calculated. The optimizer selects the smallest number of distinct values
+-- and the largest hash bucket size. The optimizer decides that the hash
+-- bucket size is quite big because there are possibly many correlations.
+EXPLAIN (COSTS OFF) -- Choose merge join
+SELECT * FROM sb_1 a, sb_2 b WHERE a.x = b.x AND a.y = b.y AND a.z = b.z;
+ QUERY PLAN
+-------------------------------------------------------------
+ Merge Join
+ Merge Cond: ((a.z = b.z) AND (a.x = b.x) AND (a.y = b.y))
+ -> Sort
+ Sort Key: a.z, a.x, a.y
+ -> Seq Scan on sb_1 a
+ -> Sort
+ Sort Key: b.z, b.x, b.y
+ -> Seq Scan on sb_2 b
+(8 rows)
+
+-- The ndistinct extended statistics on (x, y, z) provides more reliable value
+-- of bucket size.
+CREATE STATISTICS extstat_sb_2 (ndistinct) ON x, y, z FROM sb_2;
+ANALYZE sb_2;
+EXPLAIN (COSTS OFF) -- Choose hash join
+SELECT * FROM sb_1 a, sb_2 b WHERE a.x = b.x AND a.y = b.y AND a.z = b.z;
+ QUERY PLAN
+------------------------------------------------------------
+ Hash Join
+ Hash Cond: ((a.x = b.x) AND (a.y = b.y) AND (a.z = b.z))
+ -> Seq Scan on sb_1 a
+ -> Hash
+ -> Seq Scan on sb_2 b
+(5 rows)
+
+DROP TABLE sb_1, sb_2 CASCADE;
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 88b33ccaef8..b71a6cd089f 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -1719,3 +1719,32 @@ SELECT * FROM check_estimated_rows('
ON t1.t1 = q1.x;
');
DROP TABLE grouping_unique;
+
+--
+-- Extended statistics on sb_2 (x, y, z) improve a bucket size estimation,
+-- and the optimizer may choose hash join.
+--
+CREATE TABLE sb_1 AS
+ SELECT gs % 10 AS x, gs % 10 AS y, gs % 10 AS z
+ FROM generate_series(1, 1e4) AS gs;
+CREATE TABLE sb_2 AS
+ SELECT gs % 49 AS x, gs % 51 AS y, gs % 73 AS z, 'abc' || gs AS payload
+ FROM generate_series(1, 1e4) AS gs;
+ANALYZE sb_1, sb_2;
+
+-- During hash join estimation, the number of distinct values on each column
+-- is calculated. The optimizer selects the smallest number of distinct values
+-- and the largest hash bucket size. The optimizer decides that the hash
+-- bucket size is quite big because there are possibly many correlations.
+EXPLAIN (COSTS OFF) -- Choose merge join
+SELECT * FROM sb_1 a, sb_2 b WHERE a.x = b.x AND a.y = b.y AND a.z = b.z;
+
+-- The ndistinct extended statistics on (x, y, z) provides more reliable value
+-- of bucket size.
+CREATE STATISTICS extstat_sb_2 (ndistinct) ON x, y, z FROM sb_2;
+ANALYZE sb_2;
+
+EXPLAIN (COSTS OFF) -- Choose hash join
+SELECT * FROM sb_1 a, sb_2 b WHERE a.x = b.x AND a.y = b.y AND a.z = b.z;
+
+DROP TABLE sb_1, sb_2 CASCADE;
--
2.39.5 (Apple Git-154)
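As a side note for anyone reproducing the regression test above: the multivariate ndistinct data that the improved bucket-size estimation relies on can be inspected through the standard pg_stats_ext view. A minimal sketch (the exact n_distinct output depends on the ANALYZE sample):

-- Run after the CREATE STATISTICS extstat_sb_2 ... and ANALYZE sb_2 steps above
SELECT statistics_name, attnames, n_distinct
FROM pg_stats_ext
WHERE tablename = 'sb_2';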
Hi,
On 2025-03-09 14:13:52 +0200, Alexander Korotkov wrote:
I've revised commit message, comments, formatting etc.
I'm going to push this if no objections.
I'm rather confused as to why this is a thing to push at this point? This
doesn't seem to be a bugfix and it's post feature freeze.
Andres
On Fri, 11 Apr 2025 at 00:27, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2025-03-09 14:13:52 +0200, Alexander Korotkov wrote:
I've revised commit message, comments, formatting etc.
I'm going to push this if no objections.

I'm rather confused as to why this is a thing to push at this point? This
doesn't seem to be a bugfix and it's post feature freeze.
I think the patch from that mail got committed as 6bb6a62f about a
month ago, which was shortly after Alexander's message. Did you get
confused about the month of his message, or by the incorrect state of
the CF entry?
Kind regards,
Matthias van de Meent
Hi,
On 2025-04-11 00:47:19 +0200, Matthias van de Meent wrote:
On Fri, 11 Apr 2025 at 00:27, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2025-03-09 14:13:52 +0200, Alexander Korotkov wrote:
I've revised commit message, comments, formatting etc.
I'm going to push this if no objections.

I'm rather confused as to why this is a thing to push at this point? This
doesn't seem to be a bugfix and it's post feature freeze.

I think the patch from that mail got committed as 6bb6a62f about a
month ago, which was shortly after Alexander's message. Did you get
confused about the month of his message, or by the incorrect state of
the CF entry?
Sorry for that Alexander - for some reason the email just showed up as new in
my inbox and I only looked at the date, not the month :(
Greetings,
Andres Freund
On Fri, Apr 11, 2025 at 5:06 AM Andres Freund <andres@anarazel.de> wrote:
On 2025-04-11 00:47:19 +0200, Matthias van de Meent wrote:
On Fri, 11 Apr 2025 at 00:27, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2025-03-09 14:13:52 +0200, Alexander Korotkov wrote:
I've revised commit message, comments, formatting etc.
I'm going to push this if no objections.

I'm rather confused as to why this is a thing to push at this point? This
doesn't seem to be a bugfix and it's post feature freeze.

I think the patch from that mail got committed as 6bb6a62f about a
month ago, which was shortly after Alexander's message. Did you get
confused about the month of his message, or by the incorrect state of
the CF entry?

Sorry for that Alexander - for some reason the email just showed up as new in
my inbox and I only looked at the date, not the month :(
Not a problem at all!
------
Regards,
Alexander Korotkov
Supabase
Hi,
While debugging the hash join code, I found that in
estimate_multivariate_bucketsize() the list_copy(hashclauses) call below is
unnecessary if we have a single join clause.
List *clauses = list_copy(hashclauses);
...
I adjusted the placement of the list_copy() call in the attached patch.
This can save some overhead of function calls and memory copies.
Any thoughts?
--
Thanks, Tender Wang
Attachments:
0001-Adjust-the-place-of-list_copy-in-estimate_multivaria.patch (text/plain; charset=US-ASCII)
From d4a29af25ef276a145b46ecc16d631909fb1891a Mon Sep 17 00:00:00 2001
From: Tender Wang <tndrwang@gmail.com>
Date: Mon, 14 Apr 2025 13:53:15 +0800
Subject: [PATCH] Adjust the place of list_copy() in
estimate_multivariate_bucketsize.
If we have a single hashclause, list_copy(hashclauses) is unnecessary.
This adjustment can save the overhead of function calls and memory
copies.
---
src/backend/utils/adt/selfuncs.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 588d991fa57..786fea16bc5 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3799,18 +3799,18 @@ estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
List *hashclauses,
Selectivity *innerbucketsize)
{
- List *clauses = list_copy(hashclauses);
+ List *clauses = NIL;
List *otherclauses = NIL;
double ndistinct = 1.0;
+ /*
+ * Nothing to do for a single clause. Could we employ univariate
+ * extended stat here?
+ */
if (list_length(hashclauses) <= 1)
-
- /*
- * Nothing to do for a single clause. Could we employ univariate
- * extended stat here?
- */
return hashclauses;
+ clauses = list_copy(hashclauses);
while (clauses != NIL)
{
ListCell *lc;
--
2.34.1
On Mon, Apr 14, 2025 at 2:17 PM Tender Wang <tndrwang@gmail.com> wrote:
Hi,
While debugging the hash join code, I found that in
estimate_multivariate_bucketsize() the list_copy(hashclauses) call below is
unnecessary if we have a single join clause.

List *clauses = list_copy(hashclauses);
...

I adjusted the placement of the list_copy() call in the attached patch.
This can save some overhead of function calls and memory copies.

Any thoughts?
Hi Alexander,
In my previous message, I described a minor optimization for the code in
estimate_multivariate_bucketsize().
Adjusting the placement of the list_copy() call at the start of
estimate_multivariate_bucketsize() lets us avoid unnecessarily creating and
copying a new list when there is only a single hash clause.
Do you think it's worth doing this?
--
Thanks,
Tender Wang
On Thu, Apr 24, 2025 at 10:07 PM Tender Wang <tndrwang@gmail.com> wrote:
On Mon, Apr 14, 2025 at 2:17 PM Tender Wang <tndrwang@gmail.com> wrote:
Hi,
While debugging the hash join code, I found that in
estimate_multivariate_bucketsize() the list_copy(hashclauses) call below is
unnecessary if we have a single join clause.

List *clauses = list_copy(hashclauses);
...

I adjusted the placement of the list_copy() call in the attached patch.
This can save some overhead of function calls and memory copies.

Any thoughts?
Hi Alexander,
In my previous message, I described a minor optimization for the code in
estimate_multivariate_bucketsize().
Adjusting the placement of the list_copy() call at the start of
estimate_multivariate_bucketsize() lets us avoid unnecessarily creating and
copying a new list when there is only a single hash clause.

Do you think it's worth doing this?
Hi all,
I have added this patch to commitfest [1]. I'm hoping someone can review it
for me.
[1]: https://commitfest.postgresql.org/patch/5704/
--
Thanks,
Tender Wang
On Wed, Jul 2, 2025 at 10:29 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 30/6/2025 04:38, Tender Wang wrote:
Do you think it's worth doing this?
Hi all,
I have added this patch to commitfest[1]. I'm hoping someone can review
it for me.

It makes sense to apply. If you return the comment to its place, you may
reduce the patch size even more.
Thanks for reviewing. I returned the comment to its place. Please review
the attached patch.
--
Thanks,
Tender Wang
Attachments:
v2-0001-If-we-have-a-single-hashclause-list_copy-hashclau.patch (application/octet-stream)
From c0e81860dfdcd4285256c7915439d687126fc5d3 Mon Sep 17 00:00:00 2001
From: Tender Wang <tndrwang@gmail.com>
Date: Thu, 3 Jul 2025 09:56:02 +0800
Subject: [PATCH v2] If we have a single hashclause, list_copy(hashclauses) is
unnecessary. This adjustment can save the overhead of function calls and
memory copies.
---
src/backend/utils/adt/selfuncs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ce6a626eba2..bb52bd28266 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3798,7 +3798,7 @@ estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
List *hashclauses,
Selectivity *innerbucketsize)
{
- List *clauses = list_copy(hashclauses);
+ List *clauses = NIL;
List *otherclauses = NIL;
double ndistinct = 1.0;
@@ -3810,6 +3810,7 @@ estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
*/
return hashclauses;
+ clauses = list_copy(hashclauses);
while (clauses != NIL)
{
ListCell *lc;
--
2.34.1
On 3/7/2025 04:02, Tender Wang wrote:
On Wed, Jul 2, 2025 at 10:29 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 30/6/2025 04:38, Tender Wang wrote:
Do you think it's worth doing this?
Hi all,
I have added this patch to commitfest[1]. I'm hoping someone can
review
it for me.
It makes sense to apply. If you return the comment to its place, you
may
reduce the patch size even more.

Thanks for reviewing. I returned the comment to its place. Please review
the attached patch.

Do you really need to initialise clauses with the NIL value? I guess it
may be avoided, because later you unconditionally init it with a copy
of the hash clauses.
--
regards, Andrei Lepikhov
On Thu, Jul 3, 2025 at 5:23 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 3/7/2025 04:02, Tender Wang wrote:
On Wed, Jul 2, 2025 at 10:29 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 30/6/2025 04:38, Tender Wang wrote:
Do you think it's worth doing this?
Hi all,
I have added this patch to commitfest[1]. I'm hoping someone can
review
it for me.
It makes sense to apply. If you return the comment to its place, you
may
reduce the patch size even more.

Thanks for reviewing. I returned the comment to its place. Please review
the attached patch.

Do you really need to initialise clauses with the NIL value? I guess it
may be avoided, because later you unconditionally init it with a copy
of the hash clauses.
Yeah, no need to initialize clauses to NIL.
--
Thanks,
Tender Wang
On Thu, Jul 3, 2025 at 5:48 PM Tender Wang <tndrwang@gmail.com> wrote:
On Thu, Jul 3, 2025 at 5:23 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 3/7/2025 04:02, Tender Wang wrote:
On Wed, Jul 2, 2025 at 10:29 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 30/6/2025 04:38, Tender Wang wrote:
Do you think it's worth doing this?
Hi all,
I have added this patch to commitfest[1]. I'm hoping someone can
review
it for me.
It makes sense to apply. If you return the comment to its place, you
may
reduce the patch size even more.

Thanks for reviewing. I returned the comment to its place. Please review
the attached patch.

Do you really need to initialise clauses with the NIL value? I guess it
may be avoided, because later you unconditionally init it with a copy
of the hash clauses.

Yeah, no need to initialize clauses to NIL.
Please take a look at the attached v3 patch.
And I have changed the status to Ready for Committer in the CF entry [1].
[1]: https://commitfest.postgresql.org/patch/5704/
--
Thanks,
Tender Wang
Attachments:
v3-0001-If-we-have-a-single-hashclause-list_copy-hashclau.patch (application/octet-stream)
From fded17402724ff1095be5dca434efd34edfde187 Mon Sep 17 00:00:00 2001
From: Tender Wang <tndrwang@gmail.com>
Date: Thu, 3 Jul 2025 09:56:02 +0800
Subject: [PATCH v3] If we have a single hashclause, list_copy(hashclauses) is
unnecessary. This adjustment can save the overhead of function calls and
memory operation.
---
src/backend/utils/adt/selfuncs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ce6a626eba2..b57e7262596 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3798,7 +3798,7 @@ estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
List *hashclauses,
Selectivity *innerbucketsize)
{
- List *clauses = list_copy(hashclauses);
+ List *clauses;
List *otherclauses = NIL;
double ndistinct = 1.0;
@@ -3810,6 +3810,7 @@ estimate_multivariate_bucketsize(PlannerInfo *root, RelOptInfo *inner,
*/
return hashclauses;
+ clauses = list_copy(hashclauses);
while (clauses != NIL)
{
ListCell *lc;
--
2.34.1
Tender Wang <tndrwang@gmail.com> writes:
Please take a look at the attached v3 patch.
This is of course not ever going to make any visible performance
difference. Still, the code is arguably clearer like this, and
I added some comments in hopes of improving it some more.
Pushed.
regards, tom lane
On Sun, Jul 20, 2025 at 2:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tender Wang <tndrwang@gmail.com> writes:
Please take a look at the attached v3 patch.
This is of course not ever going to make any visible performance
difference. Still, the code is arguably clearer like this, and
I added some comments in hopes of improving it some more.
Pushed.

regards, tom lane
Thanks for pushing.
--
Thanks,
Tender Wang