Consider the number of columns in the sort cost model

Started by Andrei Lepikhov, over 1 year ago, 14 messages
#1 Andrei Lepikhov
lepihov@gmail.com
1 attachment(s)

Hi,

I would like to propose a small elaboration of the sort cost model.
In practice, we frequently see the planner choose suboptimal sort
orders, which can degrade performance by 10-50%.

The key reason is the optimiser's blindness to the fact that sorting
calls a comparison operator for each attribute of each pair of tuples
compared. For example:

SET work_mem = '128MB';
CREATE TABLE test (
x1 int,x2 int,x3 int,x4 int,x5 int,x6 int,x7 int,x8 int,payload text);
INSERT INTO test (x1,x2,x3,x4,x5,x6,x7,x8,payload) (
SELECT 1,2,3,4,5,6,7,(1E5-x), 'abc'||x
FROM generate_series(1,1E5) AS x);
VACUUM ANALYZE test;

Let's execute the following three queries:

EXPLAIN (ANALYZE)
SELECT * FROM test WHERE x1<9E4 ORDER BY x8;
EXPLAIN (ANALYZE)
SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;
EXPLAIN (ANALYZE)
SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x4,x5,x6,x7,x8;

My laptop executes these three queries in 100ms, 300ms, and 560ms,
respectively, yet the costs of the three query plans are almost
identical. This corner case shows that the number of sort columns
matters, and here the dependence on the number of comparisons looks
close to linear.

The patch in the attachment is a simplistic attempt to differentiate
these sorts by the number of columns. It is not ideal, because the
number of comparisons also depends on the duplicates in each column,
but that may be the subject of further work.

Even such a trivial model allows us to resolve a case like the one below:
CREATE INDEX ON test (x1,x2,x3,x8);
EXPLAIN (ANALYZE) SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;

The current master chooses SeqScan for the four-column ORDER BY just as
it does for a single column. With this model, the optimiser understands
that sorting four columns costs more than sorting one and switches to
IndexScan.

--
regards, Andrei Lepikhov

Attachments:

0001-Introduce-the-number-of-columns-in-the-cost-sort-mod.patch (text/plain; charset=UTF-8)
From 5f047613befb2fb2c20c2b7173dd3a172f1cfcb8 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Wed, 21 Aug 2024 18:06:41 +0200
Subject: [PATCH] Introduce the number of columns in the cost-sort model.

The sort node calls the comparison operator for each pair of attributes of
each pair of tuples. However, the current cost model uses a flat
'2.0*cpu_operator_cost' multiplier, which performs a kind of averaging.
This can lead to incorrect estimations when sorting on three, four, or more
columns, which is quite common in practice.
Moreover, further elaboration of the optimiser imposes stricter requirements
on the balance between sort strategies, as exercised by IncrementalSort,
MergeAppend, and MergeJoin.
In this patch, the multiplier is a linear function of the number of columns.
The constant 1.0 term smooths the fact that the dependence on the number of
columns is weaker than linear.
This is an upper-bound formula: the number of comparisons also depends on
the distinct values in each column. As a TODO, we can naturally elaborate
this model in a next step, involving distinct statistics to make estimations
more precise.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 15 +++--
 contrib/postgres_fdw/postgres_fdw.c           |  2 +-
 src/backend/optimizer/path/costsize.c         | 34 ++++++----
 src/backend/optimizer/plan/createplan.c       |  2 +-
 src/backend/optimizer/plan/planner.c          |  2 +-
 src/backend/optimizer/prep/prepunion.c        |  2 +-
 src/backend/optimizer/util/pathnode.c         |  8 +--
 src/include/optimizer/cost.h                  |  2 +-
 src/test/regress/expected/aggregates.out      | 13 ++--
 src/test/regress/expected/join.out            | 20 +++---
 src/test/regress/expected/partition_join.out  |  8 ++-
 src/test/regress/expected/union.out           | 66 +++++++++----------
 12 files changed, 95 insertions(+), 79 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f3eb055e2c..06590faf9c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9995,13 +9995,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
 -- left outer join + nullable clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
-                                                                                                                     QUERY PLAN                                                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+                                                                                    QUERY PLAN                                                                                     
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: t1.a, fprt2.b, fprt2.c
-   Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
-   Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.a, fprt2.b, fprt2.c
+   ->  Foreign Scan
+         Output: t1.a, fprt2.b, fprt2.c
+         Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+         Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10))
+(7 rows)
 
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
  a | b |  c   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index adc62576d1..f9a2e88a17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3667,7 +3667,7 @@ adjust_foreign_grouping_path_cost(PlannerInfo *root,
 
 		cost_sort(&sort_path,
 				  root,
-				  pathkeys,
+				  list_length(pathkeys),
 				  0,
 				  *p_startup_cost + *p_run_cost,
 				  retrieved_rows,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e1523d15df..a4d5e94b65 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -493,6 +493,10 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			npathkeys = list_length(((Path *) path)->pathkeys);
+	int			cmpMultiplier = npathkeys;
+
+	Assert(npathkeys > 0);
 
 	/* Mark the path with the correct row estimate */
 	if (rows)
@@ -512,7 +516,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -1896,7 +1900,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
  */
 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-			   double tuples, int width,
+			   double tuples, int width, int cmpMultiplier,
 			   Cost comparison_cost, int sort_mem,
 			   double limit_tuples)
 {
@@ -1913,7 +1917,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		tuples = 2.0;
 
 	/* Include the default cost-per-comparison */
-	comparison_cost += 2.0 * cpu_operator_cost;
+	comparison_cost += (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Do we have a useful LIMIT? */
 	if (limit_tuples > 0 && limit_tuples < tuples)
@@ -2086,7 +2090,9 @@ cost_incremental_sort(Path *path,
 	 * are equal.
 	 */
 	cost_tuplesort(&group_startup_cost, &group_run_cost,
-				   group_tuples, width, comparison_cost, sort_mem,
+				   group_tuples, width,
+				   list_length(pathkeys),
+				   comparison_cost, sort_mem,
 				   limit_tuples);
 
 	/*
@@ -2110,7 +2116,7 @@ cost_incremental_sort(Path *path,
 	 * detect the sort groups. This is roughly equal to one extra copy and
 	 * comparison per tuple.
 	 */
-	run_cost += (cpu_tuple_cost + comparison_cost) * input_tuples;
+	run_cost += (cpu_tuple_cost + (list_length(pathkeys) + 1) * comparison_cost) * input_tuples;
 
 	/*
 	 * Additionally, we charge double cpu_tuple_cost for each input group to
@@ -2142,7 +2148,7 @@ cost_incremental_sort(Path *path,
  */
 void
 cost_sort(Path *path, PlannerInfo *root,
-		  List *pathkeys, int input_disabled_nodes,
+		  int npathkeys, int input_disabled_nodes,
 		  Cost input_cost, double tuples, int width,
 		  Cost comparison_cost, int sort_mem,
 		  double limit_tuples)
@@ -2150,9 +2156,12 @@ cost_sort(Path *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
+	int			cmpMultiplier = npathkeys;
+
+	Assert(npathkeys > 0);
 
 	cost_tuplesort(&startup_cost, &run_cost,
-				   tuples, width,
+				   tuples, width, cmpMultiplier,
 				   comparison_cost, sort_mem,
 				   limit_tuples);
 
@@ -2321,7 +2330,7 @@ cost_append(AppendPath *apath)
 					 */
 					cost_sort(&sort_path,
 							  NULL, /* doesn't currently need root */
-							  pathkeys,
+							  list_length(pathkeys),
 							  subpath->disabled_nodes,
 							  subpath->total_cost,
 							  subpath->rows,
@@ -2440,6 +2449,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			cmpMultiplier = list_length(pathkeys);
+
+	Assert(pathkeys != NIL);
 
 	/*
 	 * Avoid log(0)...
@@ -2448,7 +2460,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -3684,7 +3696,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 	{
 		cost_sort(&sort_path,
 				  root,
-				  outersortkeys,
+				  list_length(outersortkeys),
 				  outer_path->disabled_nodes,
 				  outer_path->total_cost,
 				  outer_path_rows,
@@ -3713,7 +3725,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 	{
 		cost_sort(&sort_path,
 				  root,
-				  innersortkeys,
+				  list_length(innersortkeys),
 				  inner_path->disabled_nodes,
 				  inner_path->total_cost,
 				  inner_path_rows,
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8e0e5977a9..87b7277e26 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5453,7 +5453,7 @@ label_sort_with_costsize(PlannerInfo *root, Sort *plan, double limit_tuples)
 	 */
 	Assert(IsA(plan, Sort));
 
-	cost_sort(&sort_path, root, NIL,
+	cost_sort(&sort_path, root, list_length(lefttree->targetlist),
 			  lefttree->total_cost,
 			  plan->plan.disabled_nodes,
 			  lefttree->plan_rows,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b5827d3980..14930c8216 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6747,7 +6747,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of seq scan + sort */
 	seqScanPath = create_seqscan_path(root, rel, NULL, 0);
-	cost_sort(&seqScanAndSortPath, root, NIL,
+	cost_sort(&seqScanAndSortPath, root, rel->max_attr,
 			  seqScanPath->disabled_nodes,
 			  seqScanPath->total_cost, rel->tuples, rel->reltarget->width,
 			  comparisonCost, maintenance_work_mem, -1.0);
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a0baf6d4a1..0d51e1f612 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1358,7 +1358,7 @@ choose_hashed_setop(PlannerInfo *root, List *groupClauses,
 	sorted_p.startup_cost = input_path->startup_cost;
 	sorted_p.total_cost = input_path->total_cost;
 	/* XXX cost_sort doesn't actually look at pathkeys, so just pass NIL */
-	cost_sort(&sorted_p, root, NIL, sorted_p.disabled_nodes,
+	cost_sort(&sorted_p, root, list_length(input_path->pathtarget->exprs), sorted_p.disabled_nodes,
 			  sorted_p.total_cost,
 			  input_path->rows, input_path->pathtarget->width,
 			  0.0, work_mem, -1.0);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..393f993176 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1537,7 +1537,7 @@ create_merge_append_path(PlannerInfo *root,
 
 			cost_sort(&sort_path,
 					  root,
-					  pathkeys,
+					  list_length(pathkeys),
 					  subpath->disabled_nodes,
 					  subpath->total_cost,
 					  subpath->rows,
@@ -1866,7 +1866,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		/*
 		 * Estimate cost for sort+unique implementation
 		 */
-		cost_sort(&sort_path, root, NIL,
+		cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs),
 				  subpath->disabled_nodes,
 				  subpath->total_cost,
 				  rel->rows,
@@ -3101,7 +3101,7 @@ create_sort_path(PlannerInfo *root,
 
 	pathnode->subpath = subpath;
 
-	cost_sort(&pathnode->path, root, pathkeys,
+	cost_sort(&pathnode->path, root, list_length(pathkeys),
 			  subpath->disabled_nodes,
 			  subpath->total_cost,
 			  subpath->rows,
@@ -3436,7 +3436,7 @@ create_groupingsets_path(PlannerInfo *root,
 			else
 			{
 				/* Account for cost of sort, but don't charge input cost again */
-				cost_sort(&sort_path, root, NIL, 0,
+				cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs), 0,
 						  0.0,
 						  subpath->rows,
 						  subpath->pathtarget->width,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 854a782944..b8113b8748 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -108,7 +108,7 @@ extern void cost_resultscan(Path *path, PlannerInfo *root,
 							RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_recursive_union(Path *runion, Path *nrterm, Path *rterm);
 extern void cost_sort(Path *path, PlannerInfo *root,
-					  List *pathkeys, int disabled_nodes,
+					  int npathkeys, int disabled_nodes,
 					  Cost input_cost, double tuples, int width,
 					  Cost comparison_cost, int sort_mem,
 					  double limit_tuples);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 8ac13b562c..2503ad03fb 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3016,17 +3016,18 @@ ANALYZE agg_sort_order;
 EXPLAIN (COSTS OFF)
 SELECT array_agg(c1 ORDER BY c2),c2
 FROM agg_sort_order WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
-                                 QUERY PLAN                                 
-----------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: c2
    ->  GroupAggregate
          Group Key: c1
-         ->  Sort
+         ->  Incremental Sort
                Sort Key: c1, c2
-               ->  Index Scan using agg_sort_order_c2_idx on agg_sort_order
-                     Index Cond: (c2 < 100)
-(8 rows)
+               Presorted Key: c1
+               ->  Index Scan using agg_sort_order_pkey on agg_sort_order
+                     Filter: (c2 < 100)
+(9 rows)
 
 DROP TABLE agg_sort_order CASCADE;
 DROP TABLE btg;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 31fb7d142e..efbd5e0182 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5726,18 +5726,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Left Join
+   Merge Cond: (d.a = s.id)
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+   ->  Sort
+         Sort Key: s.id
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.id, b.c_id
+                     ->  Seq Scan on b
+(11 rows)
 
 -- join removal is not possible here
 explain (costs off)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 108f9ecb44..886dc559a5 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1274,9 +1274,11 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
                                  QUERY PLAN                                 
 ----------------------------------------------------------------------------
- Sort
+ Incremental Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Append
+   Presorted Key: t1.a
+   ->  Merge Append
+         Sort Key: t1.a
          ->  Merge Left Join
                Merge Cond: (t1_1.a = t2_1.b)
                ->  Sort
@@ -1325,7 +1327,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                ->  Sort
                      Sort Key: t2_3.b
                      ->  Seq Scan on prt2_p3 t2_3
-(51 rows)
+(53 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 0456d48c93..ec52215ae4 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1225,18 +1225,17 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x < 4
 ORDER BY x;
-                    QUERY PLAN                    
---------------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Sort
    Sort Key: (2)
-   ->  Unique
-         ->  Sort
-               Sort Key: (1), (2)
-               ->  Append
-                     ->  Result
-                     ->  Result
-                           One-Time Filter: false
-(9 rows)
+   ->  HashAggregate
+         Group Key: (1), (2)
+         ->  Append
+               ->  Result
+               ->  Result
+                     One-Time Filter: false
+(8 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, 2 AS x
@@ -1290,19 +1289,18 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x > 3
 ORDER BY x;
-                                     QUERY PLAN                                     
-------------------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Sort
    Sort Key: ss.x
    ->  Subquery Scan on ss
          Filter: (ss.x > 3)
-         ->  Unique
-               ->  Sort
-                     Sort Key: (1), (((random() * '3'::double precision))::integer)
-                     ->  Append
-                           ->  Result
-                           ->  Result
-(10 rows)
+         ->  HashAggregate
+               Group Key: (1), (((random() * '3'::double precision))::integer)
+               ->  Append
+                     ->  Result
+                     ->  Result
+(9 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, (random()*3)::int AS x
@@ -1323,24 +1321,22 @@ select distinct q1 from
    union all
    select distinct * from int8_tbl i82) ss
 where q2 = q2;
-                        QUERY PLAN                        
-----------------------------------------------------------
- Unique
-   ->  Merge Append
-         Sort Key: "*SELECT* 1".q1
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: "*SELECT* 1".q1
+   ->  Append
          ->  Subquery Scan on "*SELECT* 1"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i81.q1, i81.q2
-                           ->  Seq Scan on int8_tbl i81
-                                 Filter: (q2 IS NOT NULL)
+               ->  HashAggregate
+                     Group Key: i81.q1, i81.q2
+                     ->  Seq Scan on int8_tbl i81
+                           Filter: (q2 IS NOT NULL)
          ->  Subquery Scan on "*SELECT* 2"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i82.q1, i82.q2
-                           ->  Seq Scan on int8_tbl i82
-                                 Filter: (q2 IS NOT NULL)
-(15 rows)
+               ->  HashAggregate
+                     Group Key: i82.q1, i82.q2
+                     ->  Seq Scan on int8_tbl i82
+                           Filter: (q2 IS NOT NULL)
+(13 rows)
 
 select distinct q1 from
   (select distinct * from int8_tbl i81
-- 
2.46.0

#2 Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#1)
1 attachment(s)
Re: Consider the number of columns in the sort cost model

On 8/23/24 01:46, Andrei Lepikhov wrote:

Just a rebase onto the current master to keep the cf-bot happy.

--
regards, Andrei Lepikhov

Attachments:

v2-0001-Introduce-the-number-of-columns-in-the-cost-sort-.patch (text/x-patch; charset=UTF-8)
From 73472897442516f0df4ed945cda28d125df08197 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 8 Oct 2024 11:20:46 +0700
Subject: [PATCH v2] Introduce the number of columns in the cost-sort model.

The sort node calls the comparison operator for each pair of attributes of
each pair of tuples. However, the current cost model uses a flat
'2.0*cpu_operator_cost' multiplier, which performs a kind of averaging.
This can lead to incorrect estimations when sorting on three, four, or more
columns, which is quite common in practice.
Moreover, further elaboration of the optimiser imposes stricter requirements
on the balance between sort strategies, as exercised by IncrementalSort,
MergeAppend, and MergeJoin.
In this patch, the multiplier is a linear function of the number of columns.
The constant 1.0 term smooths the fact that the dependence on the number of
columns is weaker than linear.
This is an upper-bound formula: the number of comparisons also depends on
the distinct values in each column. As a TODO, we can naturally elaborate
this model in a next step, involving distinct statistics to make estimations
more precise.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 15 +++--
 contrib/postgres_fdw/postgres_fdw.c           |  2 +-
 src/backend/optimizer/path/costsize.c         | 34 ++++++----
 src/backend/optimizer/plan/createplan.c       |  2 +-
 src/backend/optimizer/plan/planner.c          |  2 +-
 src/backend/optimizer/prep/prepunion.c        |  2 +-
 src/backend/optimizer/util/pathnode.c         |  8 +--
 src/include/optimizer/cost.h                  |  2 +-
 src/test/regress/expected/aggregates.out      | 13 ++--
 src/test/regress/expected/join.out            | 20 +++---
 src/test/regress/expected/partition_join.out  |  8 ++-
 src/test/regress/expected/union.out           | 66 +++++++++----------
 12 files changed, 95 insertions(+), 79 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f2bcd6aa98..5da33ab69e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9995,13 +9995,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
 -- left outer join + nullable clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
-                                                                                                                     QUERY PLAN                                                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+                                                                                    QUERY PLAN                                                                                     
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: t1.a, fprt2.b, fprt2.c
-   Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
-   Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.a, fprt2.b, fprt2.c
+   ->  Foreign Scan
+         Output: t1.a, fprt2.b, fprt2.c
+         Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+         Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10))
+(7 rows)
 
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
  a | b |  c   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index adc62576d1..f9a2e88a17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3667,7 +3667,7 @@ adjust_foreign_grouping_path_cost(PlannerInfo *root,
 
 		cost_sort(&sort_path,
 				  root,
-				  pathkeys,
+				  list_length(pathkeys),
 				  0,
 				  *p_startup_cost + *p_run_cost,
 				  retrieved_rows,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e1523d15df..a4d5e94b65 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -493,6 +493,10 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			npathkeys = list_length(((Path *) path)->pathkeys);
+	int			cmpMultiplier = npathkeys;
+
+	Assert(npathkeys > 0);
 
 	/* Mark the path with the correct row estimate */
 	if (rows)
@@ -512,7 +516,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -1896,7 +1900,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
  */
 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-			   double tuples, int width,
+			   double tuples, int width, int cmpMultiplier,
 			   Cost comparison_cost, int sort_mem,
 			   double limit_tuples)
 {
@@ -1913,7 +1917,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		tuples = 2.0;
 
 	/* Include the default cost-per-comparison */
-	comparison_cost += 2.0 * cpu_operator_cost;
+	comparison_cost += (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Do we have a useful LIMIT? */
 	if (limit_tuples > 0 && limit_tuples < tuples)
@@ -2086,7 +2090,9 @@ cost_incremental_sort(Path *path,
 	 * are equal.
 	 */
 	cost_tuplesort(&group_startup_cost, &group_run_cost,
-				   group_tuples, width, comparison_cost, sort_mem,
+				   group_tuples, width,
+				   list_length(pathkeys),
+				   comparison_cost, sort_mem,
 				   limit_tuples);
 
 	/*
@@ -2110,7 +2116,7 @@ cost_incremental_sort(Path *path,
 	 * detect the sort groups. This is roughly equal to one extra copy and
 	 * comparison per tuple.
 	 */
-	run_cost += (cpu_tuple_cost + comparison_cost) * input_tuples;
+	run_cost += (cpu_tuple_cost + (list_length(pathkeys) + 1) * comparison_cost) * input_tuples;
 
 	/*
 	 * Additionally, we charge double cpu_tuple_cost for each input group to
@@ -2142,7 +2148,7 @@ cost_incremental_sort(Path *path,
  */
 void
 cost_sort(Path *path, PlannerInfo *root,
-		  List *pathkeys, int input_disabled_nodes,
+		  int npathkeys, int input_disabled_nodes,
 		  Cost input_cost, double tuples, int width,
 		  Cost comparison_cost, int sort_mem,
 		  double limit_tuples)
@@ -2150,9 +2156,12 @@ cost_sort(Path *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
+	int			cmpMultiplier = npathkeys;
+
+	Assert(npathkeys > 0);
 
 	cost_tuplesort(&startup_cost, &run_cost,
-				   tuples, width,
+				   tuples, width, cmpMultiplier,
 				   comparison_cost, sort_mem,
 				   limit_tuples);
 
@@ -2321,7 +2330,7 @@ cost_append(AppendPath *apath)
 					 */
 					cost_sort(&sort_path,
 							  NULL, /* doesn't currently need root */
-							  pathkeys,
+							  list_length(pathkeys),
 							  subpath->disabled_nodes,
 							  subpath->total_cost,
 							  subpath->rows,
@@ -2440,6 +2449,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			cmpMultiplier = list_length(pathkeys);
+
+	Assert(pathkeys != NIL);
 
 	/*
 	 * Avoid log(0)...
@@ -2448,7 +2460,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -3684,7 +3696,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 	{
 		cost_sort(&sort_path,
 				  root,
-				  outersortkeys,
+				  list_length(outersortkeys),
 				  outer_path->disabled_nodes,
 				  outer_path->total_cost,
 				  outer_path_rows,
@@ -3713,7 +3725,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 	{
 		cost_sort(&sort_path,
 				  root,
-				  innersortkeys,
+				  list_length(innersortkeys),
 				  inner_path->disabled_nodes,
 				  inner_path->total_cost,
 				  inner_path_rows,
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bb45ef318f..381fe0ed45 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5453,7 +5453,7 @@ label_sort_with_costsize(PlannerInfo *root, Sort *plan, double limit_tuples)
 	 */
 	Assert(IsA(plan, Sort));
 
-	cost_sort(&sort_path, root, NIL,
+	cost_sort(&sort_path, root, list_length(lefttree->targetlist),
 			  plan->plan.disabled_nodes,
 			  lefttree->total_cost,
 			  lefttree->plan_rows,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d92d43a17e..728aecfd6a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6854,7 +6854,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of seq scan + sort */
 	seqScanPath = create_seqscan_path(root, rel, NULL, 0);
-	cost_sort(&seqScanAndSortPath, root, NIL,
+	cost_sort(&seqScanAndSortPath, root, rel->max_attr,
 			  seqScanPath->disabled_nodes,
 			  seqScanPath->total_cost, rel->tuples, rel->reltarget->width,
 			  comparisonCost, maintenance_work_mem, -1.0);
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a0baf6d4a1..0d51e1f612 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1358,7 +1358,7 @@ choose_hashed_setop(PlannerInfo *root, List *groupClauses,
 	sorted_p.startup_cost = input_path->startup_cost;
 	sorted_p.total_cost = input_path->total_cost;
 	/* XXX cost_sort doesn't actually look at pathkeys, so just pass NIL */
-	cost_sort(&sorted_p, root, NIL, sorted_p.disabled_nodes,
+	cost_sort(&sorted_p, root, list_length(input_path->pathtarget->exprs), sorted_p.disabled_nodes,
 			  sorted_p.total_cost,
 			  input_path->rows, input_path->pathtarget->width,
 			  0.0, work_mem, -1.0);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..393f993176 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1537,7 +1537,7 @@ create_merge_append_path(PlannerInfo *root,
 
 			cost_sort(&sort_path,
 					  root,
-					  pathkeys,
+					  list_length(pathkeys),
 					  subpath->disabled_nodes,
 					  subpath->total_cost,
 					  subpath->rows,
@@ -1866,7 +1866,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		/*
 		 * Estimate cost for sort+unique implementation
 		 */
-		cost_sort(&sort_path, root, NIL,
+		cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs),
 				  subpath->disabled_nodes,
 				  subpath->total_cost,
 				  rel->rows,
@@ -3101,7 +3101,7 @@ create_sort_path(PlannerInfo *root,
 
 	pathnode->subpath = subpath;
 
-	cost_sort(&pathnode->path, root, pathkeys,
+	cost_sort(&pathnode->path, root, list_length(pathkeys),
 			  subpath->disabled_nodes,
 			  subpath->total_cost,
 			  subpath->rows,
@@ -3436,7 +3436,7 @@ create_groupingsets_path(PlannerInfo *root,
 			else
 			{
 				/* Account for cost of sort, but don't charge input cost again */
-				cost_sort(&sort_path, root, NIL, 0,
+				cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs), 0,
 						  0.0,
 						  subpath->rows,
 						  subpath->pathtarget->width,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 854a782944..b8113b8748 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -108,7 +108,7 @@ extern void cost_resultscan(Path *path, PlannerInfo *root,
 							RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_recursive_union(Path *runion, Path *nrterm, Path *rterm);
 extern void cost_sort(Path *path, PlannerInfo *root,
-					  List *pathkeys, int disabled_nodes,
+					  int npathkeys, int disabled_nodes,
 					  Cost input_cost, double tuples, int width,
 					  Cost comparison_cost, int sort_mem,
 					  double limit_tuples);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 8ac13b562c..2503ad03fb 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3016,17 +3016,18 @@ ANALYZE agg_sort_order;
 EXPLAIN (COSTS OFF)
 SELECT array_agg(c1 ORDER BY c2),c2
 FROM agg_sort_order WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
-                                 QUERY PLAN                                 
-----------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: c2
    ->  GroupAggregate
          Group Key: c1
-         ->  Sort
+         ->  Incremental Sort
                Sort Key: c1, c2
-               ->  Index Scan using agg_sort_order_c2_idx on agg_sort_order
-                     Index Cond: (c2 < 100)
-(8 rows)
+               Presorted Key: c1
+               ->  Index Scan using agg_sort_order_pkey on agg_sort_order
+                     Filter: (c2 < 100)
+(9 rows)
 
 DROP TABLE agg_sort_order CASCADE;
 DROP TABLE btg;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 12abd3a0e7..f6a29905ca 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5736,18 +5736,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Left Join
+   Merge Cond: (d.a = s.id)
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+   ->  Sort
+         Sort Key: s.id
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.id, b.c_id
+                     ->  Seq Scan on b
+(11 rows)
 
 -- join removal is not possible here
 explain (costs off)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 108f9ecb44..886dc559a5 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1274,9 +1274,11 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
                                  QUERY PLAN                                 
 ----------------------------------------------------------------------------
- Sort
+ Incremental Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Append
+   Presorted Key: t1.a
+   ->  Merge Append
+         Sort Key: t1.a
          ->  Merge Left Join
                Merge Cond: (t1_1.a = t2_1.b)
                ->  Sort
@@ -1325,7 +1327,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                ->  Sort
                      Sort Key: t2_3.b
                      ->  Seq Scan on prt2_p3 t2_3
-(51 rows)
+(53 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 0456d48c93..ec52215ae4 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1225,18 +1225,17 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x < 4
 ORDER BY x;
-                    QUERY PLAN                    
---------------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Sort
    Sort Key: (2)
-   ->  Unique
-         ->  Sort
-               Sort Key: (1), (2)
-               ->  Append
-                     ->  Result
-                     ->  Result
-                           One-Time Filter: false
-(9 rows)
+   ->  HashAggregate
+         Group Key: (1), (2)
+         ->  Append
+               ->  Result
+               ->  Result
+                     One-Time Filter: false
+(8 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, 2 AS x
@@ -1290,19 +1289,18 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x > 3
 ORDER BY x;
-                                     QUERY PLAN                                     
-------------------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Sort
    Sort Key: ss.x
    ->  Subquery Scan on ss
          Filter: (ss.x > 3)
-         ->  Unique
-               ->  Sort
-                     Sort Key: (1), (((random() * '3'::double precision))::integer)
-                     ->  Append
-                           ->  Result
-                           ->  Result
-(10 rows)
+         ->  HashAggregate
+               Group Key: (1), (((random() * '3'::double precision))::integer)
+               ->  Append
+                     ->  Result
+                     ->  Result
+(9 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, (random()*3)::int AS x
@@ -1323,24 +1321,22 @@ select distinct q1 from
    union all
    select distinct * from int8_tbl i82) ss
 where q2 = q2;
-                        QUERY PLAN                        
-----------------------------------------------------------
- Unique
-   ->  Merge Append
-         Sort Key: "*SELECT* 1".q1
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: "*SELECT* 1".q1
+   ->  Append
          ->  Subquery Scan on "*SELECT* 1"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i81.q1, i81.q2
-                           ->  Seq Scan on int8_tbl i81
-                                 Filter: (q2 IS NOT NULL)
+               ->  HashAggregate
+                     Group Key: i81.q1, i81.q2
+                     ->  Seq Scan on int8_tbl i81
+                           Filter: (q2 IS NOT NULL)
          ->  Subquery Scan on "*SELECT* 2"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i82.q1, i82.q2
-                           ->  Seq Scan on int8_tbl i82
-                                 Filter: (q2 IS NOT NULL)
-(15 rows)
+               ->  HashAggregate
+                     Group Key: i82.q1, i82.q2
+                     ->  Seq Scan on int8_tbl i82
+                           Filter: (q2 IS NOT NULL)
+(13 rows)
 
 select distinct q1 from
   (select distinct * from int8_tbl i81
-- 
2.39.5

#3Kirill Reshke
reshkekirill@gmail.com
In reply to: Andrei Lepikhov (#1)
Re: Consider the number of columns in the sort cost model

Hi!

On Thu, 22 Aug 2024 at 23:46, Andrei Lepikhov <lepihov@gmail.com> wrote:

Hi,

I would like to propose a slight elaboration of the sort cost model.
In practice, we frequently see the choice of suboptimal sortings, which
slump performance by 10-50%.

The key reason here is the optimiser's blindness to the fact that
sorting calls a comparison operator for each pair of attributes in the
tuple. Just for example:

SET work_mem = '128MB';
CREATE TABLE test (
x1 int,x2 int,x3 int,x4 int,x5 int,x6 int,x7 int,x8 int,payload text);
INSERT INTO test (x1,x2,x3,x4,x5,x6,x7,x8,payload) (
SELECT 1,2,3,4,5,6,7,(1E5-x), 'abc'||x
FROM generate_series(1,1E5) AS x);
VACUUM ANALYZE test;

Let's execute the following three queries:

EXPLAIN (ANALYZE)
SELECT * FROM test WHERE x1<9E4 ORDER BY x8;
EXPLAIN (ANALYZE)
SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;
EXPLAIN (ANALYZE)
SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x4,x5,x6,x7,x8;

My laptop executes these three queries in 100ms, 300ms, and 560ms,
respectively. At the same time, the costs of these query plans are
similar. This corner case shows that sorting matters and in the corner
case, dependence on the number of comparisons looks close to linear.

Indeed. My numbers are 44.167, 88.759, 136.864 ms:

The patch in the attachment is a simplistic attempt to differentiate
these sortings by the number of columns. It is not ideal because
comparisons depend on the duplicates in each column, but it may be the
subject of further work.

Even such a trivial model allows us to resolve the case like the below:
CREATE INDEX ON test (x1,x2,x3,x8);
EXPLAIN (ANALYZE) SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;

The current master will choose SeqScan for four columns as well as for a
single column. With the model, the optimiser will understand that
sorting four columns costs more than sorting a single column and switch
to IndexScan.

--
regards, Andrei Lepikhov

I confirm: before applying your patch, Index Scan is not chosen,
whereas after applying it, it is.

Two questions:
1) Why do we change the `cost_sort` definition? We can calculate the
length of `pathkeys` inside cost_sort if we want, just like we do
inside cost_gather_merge. No need to make extension developers suffer
through an API change, is there?

2) I'm not quite sure about the (1.0 + cmpMultiplier) coefficient. I
understand that we need to multiply by something that is not 0 (and
that's why we add 1), and this is strictly better than multiplying by
2, but why exactly this formula, and not (1.0 + cmpMultiplier
^ 2 / 10), for example? Maybe we need a more sophisticated approach?
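For concreteness, here is a quick numeric sketch of how the three candidate
multipliers diverge with the number of sort columns (assuming the default
cpu_operator_cost = 0.0025; the names below are mine, not from the patch):

```python
# Compare per-tuple comparison-cost formulas for the sort cost model:
# the current constant on master, the linear one from the patch, and the
# quadratic alternative suggested above.
cpu_operator_cost = 0.0025  # PostgreSQL default

def current_cost(ncols):
    # master: constant, ignores the number of sort columns
    return 2.0 * cpu_operator_cost

def linear_cost(ncols):
    # patch: (1.0 + cmpMultiplier) * cpu_operator_cost
    return (1.0 + ncols) * cpu_operator_cost

def quadratic_cost(ncols):
    # hypothetical alternative: (1.0 + cmpMultiplier^2 / 10)
    return (1.0 + ncols ** 2 / 10) * cpu_operator_cost

for n in (1, 4, 8):
    print(f"{n} cols: current={current_cost(n):.5f} "
          f"linear={linear_cost(n):.5f} quadratic={quadratic_cost(n):.5f}")
```

Note that for a single sort column the linear formula reduces to the current
2.0 multiplier, so single-key sorts are costed exactly as before; the
formulas only diverge for multi-column sorts.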

Also, this patch needs to be rebased, again.

--
Best regards,
Kirill Reshke

#4Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Kirill Reshke (#3)
1 attachment(s)
Re: Consider the number of columns in the sort cost model

Hi!

On 14.10.2024 22:59, Kirill Reshke wrote:

The patch in the attachment is a simplistic attempt to differentiate
these sortings by the number of columns. It is not ideal because
comparisons depend on the duplicates in each column, but it may be the
subject of further work.

Even such a trivial model allows us to resolve the case like the below:
CREATE INDEX ON test (x1,x2,x3,x8);
EXPLAIN (ANALYZE) SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;

The current master will choose SeqScan for four columns as well as for a
single column. With the model, the optimiser will understand that
sorting four columns costs more than sorting a single column and switch
to IndexScan.

--
regards, Andrei Lepikhov

Also, this patch needs to be rebased, again.

I also noticed that the code needs to be updated, so I rebased the code
to master during the review and attached a patch.

I have a question about estimating the cost of an Append node with your
sort cost model.

I see that if pathkeys is not a subset of the subpath's pathkeys, then we
calculate the cost taking pathkeys into account. However, I didn't
notice a similar estimation for the opposite situation.
I see that the cost of Append is the sum of the subpath costs, but we
don't take pathkeys into account for the subpaths, right?

--
Regards,
Alena Rybakina
Postgres Professional

Attachments:

0001-Introduce-the-number-of-columns-in-the-cost-sort-mod.patch.no-cfbottext/plain; charset=UTF-8; name=0001-Introduce-the-number-of-columns-in-the-cost-sort-mod.patch.no-cfbotDownload
From 9e9c5ff8aa977ec339c20433a427d98e15f8ed9d Mon Sep 17 00:00:00 2001
From: Alena Rybakina <a.rybakina@postgrespro.ru>
Date: Tue, 15 Oct 2024 01:05:38 +0300
Subject: [PATCH] Introduce the number of columns in the cost-sort model.

The sort node calls the comparison operator for each pair of attributes for
each pair of tuples. However, the current cost model uses
a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging.
This technique can lead to incorrect estimations when sorting on three, four,
or more columns, which is quite common in practice.
Moreover, further elaboration of the optimiser imposes stricter requirements
on the balance of sortings, as caused by IncrementalSort, MergeAppend, and
MergeJoin.
In this patch, the multiplier is a linear function of the number of columns.
The 1.0 term smooths the fact that the dependence on the number of columns is
weaker than linear.
It is a worst-case formula: the number of comparisons depends on the distinct
values in each column. As a TODO, we can naturally elaborate this model in a
next step, involving distinct statistics to make estimations more precise.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 15 +++--
 contrib/postgres_fdw/postgres_fdw.c           |  2 +-
 src/backend/optimizer/path/costsize.c         | 34 ++++++----
 src/backend/optimizer/plan/createplan.c       |  2 +-
 src/backend/optimizer/plan/planner.c          |  2 +-
 src/backend/optimizer/prep/prepunion.c        |  2 +-
 src/backend/optimizer/util/pathnode.c         |  8 +--
 src/include/optimizer/cost.h                  |  2 +-
 src/test/regress/expected/aggregates.out      | 13 ++--
 src/test/regress/expected/join.out            | 20 +++---
 src/test/regress/expected/partition_join.out  |  8 ++-
 src/test/regress/expected/union.out           | 66 +++++++++----------
 12 files changed, 95 insertions(+), 79 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f2bcd6aa98c..5da33ab69ec 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9995,13 +9995,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
 -- left outer join + nullable clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
-                                                                                                                     QUERY PLAN                                                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+                                                                                    QUERY PLAN                                                                                     
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: t1.a, fprt2.b, fprt2.c
-   Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
-   Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.a, fprt2.b, fprt2.c
+   ->  Foreign Scan
+         Output: t1.a, fprt2.b, fprt2.c
+         Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+         Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10))
+(7 rows)
 
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
  a | b |  c   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index adc62576d1f..f9a2e88a17f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3667,7 +3667,7 @@ adjust_foreign_grouping_path_cost(PlannerInfo *root,
 
 		cost_sort(&sort_path,
 				  root,
-				  pathkeys,
+				  list_length(pathkeys),
 				  0,
 				  *p_startup_cost + *p_run_cost,
 				  retrieved_rows,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2bb6db1df77..7c4e39065b7 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -493,6 +493,10 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			npathkeys = list_length(((Path *) path)->pathkeys);
+	int			cmpMultiplier = npathkeys;
+
+	Assert(npathkeys > 0);
 
 	/* Mark the path with the correct row estimate */
 	if (rows)
@@ -512,7 +516,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -1896,7 +1900,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
  */
 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-			   double tuples, int width,
+			   double tuples, int width, int cmpMultiplier,
 			   Cost comparison_cost, int sort_mem,
 			   double limit_tuples)
 {
@@ -1913,7 +1917,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		tuples = 2.0;
 
 	/* Include the default cost-per-comparison */
-	comparison_cost += 2.0 * cpu_operator_cost;
+	comparison_cost += (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Do we have a useful LIMIT? */
 	if (limit_tuples > 0 && limit_tuples < tuples)
@@ -2086,7 +2090,9 @@ cost_incremental_sort(Path *path,
 	 * are equal.
 	 */
 	cost_tuplesort(&group_startup_cost, &group_run_cost,
-				   group_tuples, width, comparison_cost, sort_mem,
+				   group_tuples, width,
+				   list_length(pathkeys),
+				   comparison_cost, sort_mem,
 				   limit_tuples);
 
 	/*
@@ -2110,7 +2116,7 @@ cost_incremental_sort(Path *path,
 	 * detect the sort groups. This is roughly equal to one extra copy and
 	 * comparison per tuple.
 	 */
-	run_cost += (cpu_tuple_cost + comparison_cost) * input_tuples;
+	run_cost += (cpu_tuple_cost + (list_length(pathkeys) + 1) * comparison_cost) * input_tuples;
 
 	/*
 	 * Additionally, we charge double cpu_tuple_cost for each input group to
@@ -2142,7 +2148,7 @@ cost_incremental_sort(Path *path,
  */
 void
 cost_sort(Path *path, PlannerInfo *root,
-		  List *pathkeys, int input_disabled_nodes,
+		  int npathkeys, int input_disabled_nodes,
 		  Cost input_cost, double tuples, int width,
 		  Cost comparison_cost, int sort_mem,
 		  double limit_tuples)
@@ -2150,9 +2156,12 @@ cost_sort(Path *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
+	int			cmpMultiplier = npathkeys;
+
+	Assert(npathkeys > 0);
 
 	cost_tuplesort(&startup_cost, &run_cost,
-				   tuples, width,
+				   tuples, width, cmpMultiplier,
 				   comparison_cost, sort_mem,
 				   limit_tuples);
 
@@ -2321,7 +2330,7 @@ cost_append(AppendPath *apath)
 					 */
 					cost_sort(&sort_path,
 							  NULL, /* doesn't currently need root */
-							  pathkeys,
+							  list_length(pathkeys),
 							  subpath->disabled_nodes,
 							  subpath->total_cost,
 							  subpath->rows,
@@ -2440,6 +2449,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			cmpMultiplier = list_length(pathkeys);
+
+	Assert(pathkeys != NIL);
 
 	/*
 	 * Avoid log(0)...
@@ -2448,7 +2460,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -3708,7 +3720,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 		{
 			cost_sort(&sort_path,
 					  root,
-					  outersortkeys,
+					  list_length(outersortkeys),
 					  outer_path->disabled_nodes,
 					  outer_path->total_cost,
 					  outer_path_rows,
@@ -3758,7 +3770,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 
 		cost_sort(&sort_path,
 				  root,
-				  innersortkeys,
+				  list_length(innersortkeys),
 				  inner_path->disabled_nodes,
 				  inner_path->total_cost,
 				  inner_path_rows,
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2ed0d81f61..fbb64e4cf3e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5496,7 +5496,7 @@ label_sort_with_costsize(PlannerInfo *root, Sort *plan, double limit_tuples)
 
 	Assert(IsA(plan, Sort));
 
-	cost_sort(&sort_path, root, NIL,
+	cost_sort(&sort_path, root, list_length(lefttree->targetlist),
 			  plan->plan.disabled_nodes,
 			  lefttree->total_cost,
 			  lefttree->plan_rows,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f423e96847..f326e1c34b9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6855,7 +6855,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of seq scan + sort */
 	seqScanPath = create_seqscan_path(root, rel, NULL, 0);
-	cost_sort(&seqScanAndSortPath, root, NIL,
+	cost_sort(&seqScanAndSortPath, root, rel->max_attr,
 			  seqScanPath->disabled_nodes,
 			  seqScanPath->total_cost, rel->tuples, rel->reltarget->width,
 			  comparisonCost, maintenance_work_mem, -1.0);
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a0baf6d4a18..0d51e1f6127 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1358,7 +1358,7 @@ choose_hashed_setop(PlannerInfo *root, List *groupClauses,
 	sorted_p.startup_cost = input_path->startup_cost;
 	sorted_p.total_cost = input_path->total_cost;
 	/* XXX cost_sort doesn't actually look at pathkeys, so just pass NIL */
-	cost_sort(&sorted_p, root, NIL, sorted_p.disabled_nodes,
+	cost_sort(&sorted_p, root, list_length(input_path->pathtarget->exprs), sorted_p.disabled_nodes,
 			  sorted_p.total_cost,
 			  input_path->rows, input_path->pathtarget->width,
 			  0.0, work_mem, -1.0);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee26..393f9931767 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1537,7 +1537,7 @@ create_merge_append_path(PlannerInfo *root,
 
 			cost_sort(&sort_path,
 					  root,
-					  pathkeys,
+					  list_length(pathkeys),
 					  subpath->disabled_nodes,
 					  subpath->total_cost,
 					  subpath->rows,
@@ -1866,7 +1866,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		/*
 		 * Estimate cost for sort+unique implementation
 		 */
-		cost_sort(&sort_path, root, NIL,
+		cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs),
 				  subpath->disabled_nodes,
 				  subpath->total_cost,
 				  rel->rows,
@@ -3101,7 +3101,7 @@ create_sort_path(PlannerInfo *root,
 
 	pathnode->subpath = subpath;
 
-	cost_sort(&pathnode->path, root, pathkeys,
+	cost_sort(&pathnode->path, root, list_length(pathkeys),
 			  subpath->disabled_nodes,
 			  subpath->total_cost,
 			  subpath->rows,
@@ -3436,7 +3436,7 @@ create_groupingsets_path(PlannerInfo *root,
 			else
 			{
 				/* Account for cost of sort, but don't charge input cost again */
-				cost_sort(&sort_path, root, NIL, 0,
+				cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs), 0,
 						  0.0,
 						  subpath->rows,
 						  subpath->pathtarget->width,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 854a782944a..b8113b87486 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -108,7 +108,7 @@ extern void cost_resultscan(Path *path, PlannerInfo *root,
 							RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_recursive_union(Path *runion, Path *nrterm, Path *rterm);
 extern void cost_sort(Path *path, PlannerInfo *root,
-					  List *pathkeys, int disabled_nodes,
+					  int npathkeys, int disabled_nodes,
 					  Cost input_cost, double tuples, int width,
 					  Cost comparison_cost, int sort_mem,
 					  double limit_tuples);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1e682565d13..7546c795d0f 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3031,17 +3031,18 @@ ANALYZE agg_sort_order;
 EXPLAIN (COSTS OFF)
 SELECT array_agg(c1 ORDER BY c2),c2
 FROM agg_sort_order WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
-                                 QUERY PLAN                                 
-----------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: c2
    ->  GroupAggregate
          Group Key: c1
-         ->  Sort
+         ->  Incremental Sort
                Sort Key: c1, c2
-               ->  Index Scan using agg_sort_order_c2_idx on agg_sort_order
-                     Index Cond: (c2 < 100)
-(8 rows)
+               Presorted Key: c1
+               ->  Index Scan using agg_sort_order_pkey on agg_sort_order
+                     Filter: (c2 < 100)
+(9 rows)
 
 DROP TABLE agg_sort_order CASCADE;
 DROP TABLE btg;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 756c2e24965..f543e416f3e 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5736,18 +5736,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Left Join
+   Merge Cond: (d.a = s.id)
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+   ->  Sort
+         Sort Key: s.id
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.id, b.c_id
+                     ->  Seq Scan on b
+(11 rows)
 
 -- join removal is not possible here
 explain (costs off)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 108f9ecb445..886dc559a50 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1274,9 +1274,11 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
                                  QUERY PLAN                                 
 ----------------------------------------------------------------------------
- Sort
+ Incremental Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Append
+   Presorted Key: t1.a
+   ->  Merge Append
+         Sort Key: t1.a
          ->  Merge Left Join
                Merge Cond: (t1_1.a = t2_1.b)
                ->  Sort
@@ -1325,7 +1327,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                ->  Sort
                      Sort Key: t2_3.b
                      ->  Seq Scan on prt2_p3 t2_3
-(51 rows)
+(53 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index c73631a9a1d..f1c0af1c8d6 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1225,18 +1225,17 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x < 4
 ORDER BY x;
-                    QUERY PLAN                    
---------------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Sort
    Sort Key: (2)
-   ->  Unique
-         ->  Sort
-               Sort Key: (1), (2)
-               ->  Append
-                     ->  Result
-                     ->  Result
-                           One-Time Filter: false
-(9 rows)
+   ->  HashAggregate
+         Group Key: (1), (2)
+         ->  Append
+               ->  Result
+               ->  Result
+                     One-Time Filter: false
+(8 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, 2 AS x
@@ -1290,19 +1289,18 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x > 3
 ORDER BY x;
-                                     QUERY PLAN                                     
-------------------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Sort
    Sort Key: ss.x
    ->  Subquery Scan on ss
          Filter: (ss.x > 3)
-         ->  Unique
-               ->  Sort
-                     Sort Key: (1), (((random() * '3'::double precision))::integer)
-                     ->  Append
-                           ->  Result
-                           ->  Result
-(10 rows)
+         ->  HashAggregate
+               Group Key: (1), (((random() * '3'::double precision))::integer)
+               ->  Append
+                     ->  Result
+                     ->  Result
+(9 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, (random()*3)::int AS x
@@ -1323,24 +1321,22 @@ select distinct q1 from
    union all
    select distinct * from int8_tbl i82) ss
 where q2 = q2;
-                        QUERY PLAN                        
-----------------------------------------------------------
- Unique
-   ->  Merge Append
-         Sort Key: "*SELECT* 1".q1
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: "*SELECT* 1".q1
+   ->  Append
          ->  Subquery Scan on "*SELECT* 1"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i81.q1, i81.q2
-                           ->  Seq Scan on int8_tbl i81
-                                 Filter: (q2 IS NOT NULL)
+               ->  HashAggregate
+                     Group Key: i81.q1, i81.q2
+                     ->  Seq Scan on int8_tbl i81
+                           Filter: (q2 IS NOT NULL)
          ->  Subquery Scan on "*SELECT* 2"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i82.q1, i82.q2
-                           ->  Seq Scan on int8_tbl i82
-                                 Filter: (q2 IS NOT NULL)
-(15 rows)
+               ->  HashAggregate
+                     Group Key: i82.q1, i82.q2
+                     ->  Seq Scan on int8_tbl i82
+                           Filter: (q2 IS NOT NULL)
+(13 rows)
 
 select distinct q1 from
   (select distinct * from int8_tbl i81
-- 
2.34.1

#5Andrei Lepikhov
lepihov@gmail.com
In reply to: Kirill Reshke (#3)
1 attachment(s)
Re: Consider the number of columns in the sort cost model

On 10/15/24 02:59, Kirill Reshke wrote:

On Thu, 22 Aug 2024 at 23:46, Andrei Lepikhov <lepihov@gmail.com> wrote:
Two questions:

Thank you for your interest in this feature!

1) Why do we change the `cost_sort` definition? We can calculate the
length of `pathkeys` inside cost_sort if we want, just like we do
inside cost_gather_merge.

I am suspicious of that but open to hearing other opinions. The
coefficient incorporates knowledge about how many comparisons will be
made with this sorting operator. The caller can have some additional
knowledge about that. For example, IncrementalSort knows about groups -
each group has the same prefix, which means each tuple comparison will
inevitably cause a comparison for each column from the prefix. Also, it
may be knowledge about the number of distinct values for each column,
etc. I'm not sure we should elaborate cost_sort a lot here.

2) I'm not very sure about the (1.0 + cmpMultiplier) coefficient. I
understand that we need to multiply by something that is not 0 (and
that's why we add 1), and this is strictly better than multiplying by
2, but why exactly is this formula like this, not (1.0 + cmpMultiplier
^ 2 / 10) for example? Maybe we need some more sophisticated approach?

Hmmm, I think you misunderstood the general idea behind this
coefficient. I explained it earlier this year in more detail [1],
showing the direction of future improvement.
The value '1' in this formula stands in for second-order effects that
the formula doesn't model explicitly. Of course, it may be different,
but after testing, I decided to keep it simple.
The cmpMultiplier value reflects the number of comparisons the executor
will make when comparing a pair of tuples. It is expressed as a fraction
of the maximum number of potential per-column comparisons for one tuple
comparison (much like what Hash Agg already employs in its costing).
Right now, we don't use distinct statistics on the sorting columns and
take a conservative approach, which may be improved in the future. But I
don't think it is a real problem here, because we use the same
assumption everywhere and, with this feature, change the balance only
against hashing methods (in favour of their usage). I think it is not
harmful, because this characteristic is a second-order factor and
shouldn't significantly reduce the performance of actual queries.
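For illustration only, here is a rough sketch (in Python, not the actual costsize.c logic) of how the multiplier change affects the N * log2(N) comparison term of cost_tuplesort(); the 90000-row figure matches the example queries from the first message, and everything else (memory, disk, startup effects) is deliberately ignored:

```python
import math

CPU_OPERATOR_COST = 0.0025  # PostgreSQL's default cpu_operator_cost

def comparison_cost(ncols, patched):
    # Master charges a flat 2.0 * cpu_operator_cost per tuple comparison;
    # the patch charges (1.0 + cmpMultiplier) * cpu_operator_cost, where
    # cmpMultiplier is the number of sort columns.
    multiplier = (1.0 + ncols) if patched else 2.0
    return multiplier * CPU_OPERATOR_COST

def sort_cpu_cost(tuples, ncols, patched):
    # Only the N * log2(N) comparison term of the sort cost model;
    # memory and disk effects are ignored in this sketch.
    return comparison_cost(ncols, patched) * tuples * math.log2(tuples)

for ncols in (1, 4, 8):
    print(ncols,
          round(sort_cpu_cost(90000, ncols, patched=False), 1),
          round(sort_cpu_cost(90000, ncols, patched=True), 1))
```

The unpatched estimate is identical for all three ORDER BY lists, while the patched estimate grows with the number of sort columns, which is what lets the optimiser tell apart plans whose measured runtimes differ by a factor of five.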

Also, this patch needs to be rebased, again.

Thanks, done.

[1]: https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort

--
regards, Andrei Lepikhov

Attachments:

v3-0001-Introduce-the-number-of-columns-in-the-cost-sort-.patchtext/x-patch; charset=UTF-8; name=v3-0001-Introduce-the-number-of-columns-in-the-cost-sort-.patchDownload
From 3ba39d245e079235f6352e5cb90224cc9ecee9b6 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 15 Oct 2024 11:33:01 +0700
Subject: [PATCH v3] Introduce the number of columns in the cost-sort model.

The sort node calls the comparison operator for each pair of attributes for
each couple of tuples. However, the current cost model uses
a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging.
This technique can lead to incorrect estimations when sorting on three, four,
or more columns, which is quite common in practice.
Moreover, further elaboration of the optimiser imposes stricter requirements
on the balance between sortings, as introduced by IncrementalSort,
MergeAppend, and MergeJoin.
In this patch, the multiplier is a linear function of the number of columns.
The 1.0 term smooths the fact that the dependence on the number of columns is
weaker than linear.
It is an extreme formula: the number of comparisons actually depends on the
distinct values in each column. As a TODO, we can naturally elaborate this
model in a next step, involving distinct statistics to make estimations more
precise.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 15 +++--
 contrib/postgres_fdw/postgres_fdw.c           |  2 +-
 src/backend/optimizer/path/costsize.c         | 34 ++++++----
 src/backend/optimizer/plan/createplan.c       |  2 +-
 src/backend/optimizer/plan/planner.c          |  2 +-
 src/backend/optimizer/prep/prepunion.c        |  2 +-
 src/backend/optimizer/util/pathnode.c         |  8 +--
 src/include/optimizer/cost.h                  |  2 +-
 src/test/regress/expected/aggregates.out      | 13 ++--
 src/test/regress/expected/join.out            | 20 +++---
 src/test/regress/expected/partition_join.out  |  8 ++-
 src/test/regress/expected/union.out           | 66 +++++++++----------
 12 files changed, 95 insertions(+), 79 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f2bcd6aa98..5da33ab69e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9995,13 +9995,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
 -- left outer join + nullable clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
-                                                                                                                     QUERY PLAN                                                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+                                                                                    QUERY PLAN                                                                                     
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: t1.a, fprt2.b, fprt2.c
-   Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
-   Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.a, fprt2.b, fprt2.c
+   ->  Foreign Scan
+         Output: t1.a, fprt2.b, fprt2.c
+         Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+         Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10))
+(7 rows)
 
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
  a | b |  c   
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index adc62576d1..f9a2e88a17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3667,7 +3667,7 @@ adjust_foreign_grouping_path_cost(PlannerInfo *root,
 
 		cost_sort(&sort_path,
 				  root,
-				  pathkeys,
+				  list_length(pathkeys),
 				  0,
 				  *p_startup_cost + *p_run_cost,
 				  retrieved_rows,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2bb6db1df7..7c4e39065b 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -493,6 +493,10 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			npathkeys = list_length(((Path *) path)->pathkeys);
+	int			cmpMultiplier = npathkeys;
+
+	Assert(npathkeys > 0);
 
 	/* Mark the path with the correct row estimate */
 	if (rows)
@@ -512,7 +516,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -1896,7 +1900,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
  */
 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-			   double tuples, int width,
+			   double tuples, int width, int cmpMultiplier,
 			   Cost comparison_cost, int sort_mem,
 			   double limit_tuples)
 {
@@ -1913,7 +1917,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		tuples = 2.0;
 
 	/* Include the default cost-per-comparison */
-	comparison_cost += 2.0 * cpu_operator_cost;
+	comparison_cost += (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Do we have a useful LIMIT? */
 	if (limit_tuples > 0 && limit_tuples < tuples)
@@ -2086,7 +2090,9 @@ cost_incremental_sort(Path *path,
 	 * are equal.
 	 */
 	cost_tuplesort(&group_startup_cost, &group_run_cost,
-				   group_tuples, width, comparison_cost, sort_mem,
+				   group_tuples, width,
+				   list_length(pathkeys),
+				   comparison_cost, sort_mem,
 				   limit_tuples);
 
 	/*
@@ -2110,7 +2116,7 @@ cost_incremental_sort(Path *path,
 	 * detect the sort groups. This is roughly equal to one extra copy and
 	 * comparison per tuple.
 	 */
-	run_cost += (cpu_tuple_cost + comparison_cost) * input_tuples;
+	run_cost += (cpu_tuple_cost + (list_length(pathkeys) + 1) * comparison_cost) * input_tuples;
 
 	/*
 	 * Additionally, we charge double cpu_tuple_cost for each input group to
@@ -2142,7 +2148,7 @@ cost_incremental_sort(Path *path,
  */
 void
 cost_sort(Path *path, PlannerInfo *root,
-		  List *pathkeys, int input_disabled_nodes,
+		  int npathkeys, int input_disabled_nodes,
 		  Cost input_cost, double tuples, int width,
 		  Cost comparison_cost, int sort_mem,
 		  double limit_tuples)
@@ -2150,9 +2156,12 @@ cost_sort(Path *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
+	int			cmpMultiplier = npathkeys;
+
+	Assert(npathkeys > 0);
 
 	cost_tuplesort(&startup_cost, &run_cost,
-				   tuples, width,
+				   tuples, width, cmpMultiplier,
 				   comparison_cost, sort_mem,
 				   limit_tuples);
 
@@ -2321,7 +2330,7 @@ cost_append(AppendPath *apath)
 					 */
 					cost_sort(&sort_path,
 							  NULL, /* doesn't currently need root */
-							  pathkeys,
+							  list_length(pathkeys),
 							  subpath->disabled_nodes,
 							  subpath->total_cost,
 							  subpath->rows,
@@ -2440,6 +2449,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			cmpMultiplier = list_length(pathkeys);
+
+	Assert(pathkeys != NIL);
 
 	/*
 	 * Avoid log(0)...
@@ -2448,7 +2460,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = (1.0 + cmpMultiplier) * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -3708,7 +3720,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 		{
 			cost_sort(&sort_path,
 					  root,
-					  outersortkeys,
+					  list_length(outersortkeys),
 					  outer_path->disabled_nodes,
 					  outer_path->total_cost,
 					  outer_path_rows,
@@ -3758,7 +3770,7 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
 
 		cost_sort(&sort_path,
 				  root,
-				  innersortkeys,
+				  list_length(innersortkeys),
 				  inner_path->disabled_nodes,
 				  inner_path->total_cost,
 				  inner_path_rows,
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2ed0d81f6..fbb64e4cf3 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -5496,7 +5496,7 @@ label_sort_with_costsize(PlannerInfo *root, Sort *plan, double limit_tuples)
 
 	Assert(IsA(plan, Sort));
 
-	cost_sort(&sort_path, root, NIL,
+	cost_sort(&sort_path, root, list_length(lefttree->targetlist),
 			  plan->plan.disabled_nodes,
 			  lefttree->total_cost,
 			  lefttree->plan_rows,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0f423e9684..f326e1c34b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6855,7 +6855,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 
 	/* Estimate the cost of seq scan + sort */
 	seqScanPath = create_seqscan_path(root, rel, NULL, 0);
-	cost_sort(&seqScanAndSortPath, root, NIL,
+	cost_sort(&seqScanAndSortPath, root, rel->max_attr,
 			  seqScanPath->disabled_nodes,
 			  seqScanPath->total_cost, rel->tuples, rel->reltarget->width,
 			  comparisonCost, maintenance_work_mem, -1.0);
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a0baf6d4a1..0d51e1f612 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1358,7 +1358,7 @@ choose_hashed_setop(PlannerInfo *root, List *groupClauses,
 	sorted_p.startup_cost = input_path->startup_cost;
 	sorted_p.total_cost = input_path->total_cost;
 	/* XXX cost_sort doesn't actually look at pathkeys, so just pass NIL */
-	cost_sort(&sorted_p, root, NIL, sorted_p.disabled_nodes,
+	cost_sort(&sorted_p, root, list_length(input_path->pathtarget->exprs), sorted_p.disabled_nodes,
 			  sorted_p.total_cost,
 			  input_path->rows, input_path->pathtarget->width,
 			  0.0, work_mem, -1.0);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..393f993176 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1537,7 +1537,7 @@ create_merge_append_path(PlannerInfo *root,
 
 			cost_sort(&sort_path,
 					  root,
-					  pathkeys,
+					  list_length(pathkeys),
 					  subpath->disabled_nodes,
 					  subpath->total_cost,
 					  subpath->rows,
@@ -1866,7 +1866,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
 		/*
 		 * Estimate cost for sort+unique implementation
 		 */
-		cost_sort(&sort_path, root, NIL,
+		cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs),
 				  subpath->disabled_nodes,
 				  subpath->total_cost,
 				  rel->rows,
@@ -3101,7 +3101,7 @@ create_sort_path(PlannerInfo *root,
 
 	pathnode->subpath = subpath;
 
-	cost_sort(&pathnode->path, root, pathkeys,
+	cost_sort(&pathnode->path, root, list_length(pathkeys),
 			  subpath->disabled_nodes,
 			  subpath->total_cost,
 			  subpath->rows,
@@ -3436,7 +3436,7 @@ create_groupingsets_path(PlannerInfo *root,
 			else
 			{
 				/* Account for cost of sort, but don't charge input cost again */
-				cost_sort(&sort_path, root, NIL, 0,
+				cost_sort(&sort_path, root, list_length(subpath->pathtarget->exprs), 0,
 						  0.0,
 						  subpath->rows,
 						  subpath->pathtarget->width,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 854a782944..b8113b8748 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -108,7 +108,7 @@ extern void cost_resultscan(Path *path, PlannerInfo *root,
 							RelOptInfo *baserel, ParamPathInfo *param_info);
 extern void cost_recursive_union(Path *runion, Path *nrterm, Path *rterm);
 extern void cost_sort(Path *path, PlannerInfo *root,
-					  List *pathkeys, int disabled_nodes,
+					  int npathkeys, int disabled_nodes,
 					  Cost input_cost, double tuples, int width,
 					  Cost comparison_cost, int sort_mem,
 					  double limit_tuples);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1e682565d1..7546c795d0 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3031,17 +3031,18 @@ ANALYZE agg_sort_order;
 EXPLAIN (COSTS OFF)
 SELECT array_agg(c1 ORDER BY c2),c2
 FROM agg_sort_order WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
-                                 QUERY PLAN                                 
-----------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: c2
    ->  GroupAggregate
          Group Key: c1
-         ->  Sort
+         ->  Incremental Sort
                Sort Key: c1, c2
-               ->  Index Scan using agg_sort_order_c2_idx on agg_sort_order
-                     Index Cond: (c2 < 100)
-(8 rows)
+               Presorted Key: c1
+               ->  Index Scan using agg_sort_order_pkey on agg_sort_order
+                     Filter: (c2 < 100)
+(9 rows)
 
 DROP TABLE agg_sort_order CASCADE;
 DROP TABLE btg;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 756c2e2496..f543e416f3 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5736,18 +5736,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Left Join
+   Merge Cond: (d.a = s.id)
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+   ->  Sort
+         Sort Key: s.id
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.id, b.c_id
+                     ->  Seq Scan on b
+(11 rows)
 
 -- join removal is not possible here
 explain (costs off)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 108f9ecb44..886dc559a5 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1274,9 +1274,11 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
                                  QUERY PLAN                                 
 ----------------------------------------------------------------------------
- Sort
+ Incremental Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Append
+   Presorted Key: t1.a
+   ->  Merge Append
+         Sort Key: t1.a
          ->  Merge Left Join
                Merge Cond: (t1_1.a = t2_1.b)
                ->  Sort
@@ -1325,7 +1327,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                ->  Sort
                      Sort Key: t2_3.b
                      ->  Seq Scan on prt2_p3 t2_3
-(51 rows)
+(53 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index c73631a9a1..f1c0af1c8d 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1225,18 +1225,17 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x < 4
 ORDER BY x;
-                    QUERY PLAN                    
---------------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Sort
    Sort Key: (2)
-   ->  Unique
-         ->  Sort
-               Sort Key: (1), (2)
-               ->  Append
-                     ->  Result
-                     ->  Result
-                           One-Time Filter: false
-(9 rows)
+   ->  HashAggregate
+         Group Key: (1), (2)
+         ->  Append
+               ->  Result
+               ->  Result
+                     One-Time Filter: false
+(8 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, 2 AS x
@@ -1290,19 +1289,18 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x > 3
 ORDER BY x;
-                                     QUERY PLAN                                     
-------------------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Sort
    Sort Key: ss.x
    ->  Subquery Scan on ss
          Filter: (ss.x > 3)
-         ->  Unique
-               ->  Sort
-                     Sort Key: (1), (((random() * '3'::double precision))::integer)
-                     ->  Append
-                           ->  Result
-                           ->  Result
-(10 rows)
+         ->  HashAggregate
+               Group Key: (1), (((random() * '3'::double precision))::integer)
+               ->  Append
+                     ->  Result
+                     ->  Result
+(9 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, (random()*3)::int AS x
@@ -1323,24 +1321,22 @@ select distinct q1 from
    union all
    select distinct * from int8_tbl i82) ss
 where q2 = q2;
-                        QUERY PLAN                        
-----------------------------------------------------------
- Unique
-   ->  Merge Append
-         Sort Key: "*SELECT* 1".q1
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: "*SELECT* 1".q1
+   ->  Append
          ->  Subquery Scan on "*SELECT* 1"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i81.q1, i81.q2
-                           ->  Seq Scan on int8_tbl i81
-                                 Filter: (q2 IS NOT NULL)
+               ->  HashAggregate
+                     Group Key: i81.q1, i81.q2
+                     ->  Seq Scan on int8_tbl i81
+                           Filter: (q2 IS NOT NULL)
          ->  Subquery Scan on "*SELECT* 2"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i82.q1, i82.q2
-                           ->  Seq Scan on int8_tbl i82
-                                 Filter: (q2 IS NOT NULL)
-(15 rows)
+               ->  HashAggregate
+                     Group Key: i82.q1, i82.q2
+                     ->  Seq Scan on int8_tbl i82
+                           Filter: (q2 IS NOT NULL)
+(13 rows)
 
 select distinct q1 from
   (select distinct * from int8_tbl i81
-- 
2.39.5

#6David Rowley
dgrowleyml@gmail.com
In reply to: Andrei Lepikhov (#5)
Re: Consider the number of columns in the sort cost model

On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepihov@gmail.com> wrote:

I am suspicious of that but open to hearing other opinions. The
coefficient incorporates knowledge about how many comparisons will be
made with this sorting operator. The caller can have some additional
knowledge about that. For example, IncrementalSort knows about groups -
each group has the same prefix, which means each tuple comparison will
inevitably cause a comparison for each column from the prefix. Also, it
may be knowledge about the number of distinct values for each column,
etc. I'm not sure we should elaborate cost_sort a lot here.

I agree that cost_sort() likely needs a bit of work to try to bring it
closer to the current reality. There have been plenty of performance
improvements to sort over the years and I don't recall the costing
code ever being touched; likely that's because there are just so many
things that are not costed correctly, or not costed at all.

As for your patch, I'm suspicious that the general idea you're
proposing is an actual improvement. I've only glanced at the patch,
but at least from this fragment here:

@@ -2150,9 +2156,12 @@ cost_sort(Path *path, PlannerInfo *root,
{
Cost startup_cost;
Cost run_cost;
+ int cmpMultiplier = npathkeys;

and:

 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-    double tuples, int width,
+    double tuples, int width, int cmpMultiplier,
     Cost comparison_cost, int sort_mem,
     double limit_tuples)
 {
@@ -1913,7 +1917,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
  tuples = 2.0;
  /* Include the default cost-per-comparison */
- comparison_cost += 2.0 * cpu_operator_cost;
+ comparison_cost += (1.0 + cmpMultiplier) * cpu_operator_cost;

it seems you're just charging cpu_operator_cost * <number of columns
to sort by>. It seems like it won't be very hard to fool that into
doing the wrong thing when the first column to sort by is distinct or
almost distinct. There's going to be far fewer or no tiebreaker
comparisons for that case.
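
As a rough illustration of this point (a simplified model, not anything from the patch, assuming uniform and independent column values with a tie probability of roughly 1/ndistinct per column):

```python
def expected_cols_compared(ndistincts):
    # Expected per-column comparisons for one random pair of tuples:
    # column k+1 is reached only when all earlier columns tie, and the
    # chance of a tie on a column is modelled as 1/ndistinct (a
    # simplification assuming uniform, independent values).
    expected, p_all_tied = 0.0, 1.0
    for nd in ndistincts:
        expected += p_all_tied     # this column gets compared
        p_all_tied *= 1.0 / nd     # probability the tiebreak continues
    return expected

# Nearly unique first column: tiebreakers are almost never consulted.
print(expected_cols_compared([100000, 2, 2, 2]))   # just above 1.0
# Constant prefix, as with x1..x7 in the opening example: fully linear.
print(expected_cols_compared([1, 1, 1, 100000]))   # 4.0
```

With a nearly unique leading column, the expected number of per-column comparisons stays just above 1 no matter how many tiebreaker columns follow, so charging cpu_operator_cost per sort column overestimates that case; only with a low-cardinality or constant prefix does the cost really grow linearly.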

As mentioned by Kirill, I also don't understand the cost_sort
signature change. Why would you do that over just doing
list_length(pathkeys) within cost_sort()? You're throwing away a
parameter that might be the most useful one of the bunch for allowing
better sort cost estimates.

I think maybe what is worth working on is seeing if you can better
estimate the number of tiebreak comparisons by checking the number of
distinct values for each sort key. That might require what we
discussed on another thread about having estimate_num_groups() better
use the n_distinct estimate from the EquivalenceMember with the least
distinct values. It'll be another question if all that can be done
and this all still perform well enough for cost_sort(). You may have
to write it first before we can figure that out. It may be possible
to buy back some of the overheads with some additional caching...
Perhaps that could be done within the EquivalenceClass.

David

#7Andrei Lepikhov
lepihov@gmail.com
In reply to: David Rowley (#6)
Re: Consider the number of columns in the sort cost model

On 10/15/24 12:15, David Rowley wrote:

As for your patch, I'm suspicious that the general idea you're
proposing is an actual improvement.

I didn't intend to treat it as an 'improvement' but as an intermediate
patch. The main purpose here is to debate the approach and to introduce
consideration of the number of columns. A conservative development
approach has been preferred before.

it seems you're just charging cpu_operator_cost * <number of columns
to sort by>. It seems like it won't be very hard to fool that into
doing the wrong thing when the first column to sort by is distinct or
almost distinct. There's going to be far fewer or no tiebreaker
comparisons for that case.

As I've written above, it is just a starting point. It allows us to
analyse how the balance between different types of orderings can change
in extreme cases. I could merge this patch with the follow-up one, which
implements differentiation by distinct values, but then the effect on
the regression tests would be smoothed out.
The primary idea is 1) to elaborate the GROUP-BY optimisation and 2) to
give the optimiser a tool to choose a more optimal sort by comparing
MergeAppend/Sort/IncrementalSort costs.
The whole idea is implemented in the branch [1] and described in the
post [2]. There, of course, we differentiate sortings by the number of
distinct values of the first column (the only trustworthy statistic). It
would not be difficult (though I doubt the real necessity) to use
extended statistics and reflect different combinations of columns in the
cost values.

As mentioned by Kirill, I also don't understand the cost_sort
signature change. Why would you do that over just doing
list_length(pathkeys) within cost_sort()? Your throwing away a
parameter that might be the most useful one of the bunch for allowing
better sort cost estimates.

OK, maybe it is too much for an intermediate patch. We can change the
interface later if necessary.

Perhaps that could be done within the EquivalenceClass.

Thanks for the idea!

[1]: https://github.com/postgrespro/postgres/tree/sort-columnsnum
[2]: https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort

--
regards, Andrei Lepikhov

#8Andrei Lepikhov
lepihov@gmail.com
In reply to: David Rowley (#6)
4 attachment(s)
Re: Consider the number of columns in the sort cost model

On 15/10/2024 12:15, David Rowley wrote:

On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepihov@gmail.com> wrote:
I think maybe what is worth working on is seeing if you can better
estimate the number of tiebreak comparisons by checking the number of
distinct values for each sort key. That might require what we
discussed on another thread about having estimate_num_groups() better
use the n_distinct estimate from the EquivalenceMember with the least
distinct values. It'll be another question if all that can be done
and this all still perform well enough for cost_sort(). You may have
to write it first before we can figure that out. It may be possible
to buy back some of the overheads with some additional caching...
Perhaps that could be done within the EquivalenceClass.

It seems that your idea with the equivalence class works.
The attached patchset shows all the ideas (I initially wanted to discuss them
step by step).
The first patch is about choosing an adequate expression from the list of EC
members.
The second one is the current patch, which considers the number of sort
columns.
The third patch elaborates the second one, using only trustworthy statistics:
the number of distinct values in the first sorted column.
And the last patch is a demo of how I am going to use the previous three
patches and add one more strategy to improve the order of columns in the
GROUP-BY clause.

--
regards, Andrei Lepikhov

Attachments:

0001-Stabilise-incremental-sort-cost-calculation.patchtext/plain; charset=UTF-8; name=0001-Stabilise-incremental-sort-cost-calculation.patchDownload
From b07075d68181df37e5d2d2d8f0435ac4c44ef80b Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 22 Oct 2024 09:02:02 +0400
Subject: [PATCH 1/4] Stabilise incremental sort cost calculation.

Carefully identify a column/expression that can represent the path key in the
cost calculation of a specific sort operator. Columns may have different
numbers of distinct values, so the order of columns in the sort operation may
affect the number of comparison function calls.
Sorting has only pathkeys as input for the cost estimation. Instead of blindly
choosing the first equivalence class member, this patch attempts to find the
expression with the smallest ndistinct value.

TODO: Filtering out EC members external to this sort operator would not be a
big deal, but it would require passing the underlying relids to the cost
calculation routine, causing an API change. So, here we stay as simple as
possible.

Add the number of distinct values, em_ndistinct, into EquivalenceMember.
It may additionally be used later in group-number estimations.
---
 src/backend/optimizer/path/costsize.c         | 72 +++++++++++++++++--
 src/backend/optimizer/path/equivclass.c       |  1 +
 src/include/nodes/pathnodes.h                 |  2 +
 .../regress/expected/incremental_sort.out     | 51 +++++++++++++
 src/test/regress/sql/incremental_sort.sql     | 31 ++++++++
 5 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 79991b1980..325acdacbf 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -192,6 +192,8 @@ static int32 get_expr_width(PlannerInfo *root, const Node *expr);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
+static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
+												 EquivalenceClass *ec);
 
 
 /*
@@ -2016,22 +2018,21 @@ cost_incremental_sort(Path *path,
 	 */
 	foreach(l, pathkeys)
 	{
-		PathKey    *key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
+		PathKey			   *key = (PathKey *) lfirst(l);
+		EquivalenceMember  *em = identify_sort_ecmember(root, key->pk_eclass);
 
 		/*
 		 * Check if the expression contains Var with "varno 0" so that we
 		 * don't call estimate_num_groups in that case.
 		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+		if (bms_is_member(0, pull_varnos(root, (Node *) em->em_expr)))
 		{
 			unknown_varno = true;
 			break;
 		}
 
 		/* expression not containing any Vars with "varno 0" */
-		presortedExprs = lappend(presortedExprs, member->em_expr);
+		presortedExprs = lappend(presortedExprs, em->em_expr);
 
 		if (foreach_current_index(l) + 1 >= presorted_keys)
 			break;
@@ -6491,3 +6492,64 @@ compute_gather_rows(Path *path)
 
 	return clamp_row_est(path->rows * get_parallel_divisor(path));
 }
+
+/*
+ * Find a suitable member of the equivalence class.
+ * Walk through the list of EC members and find the one with the minimum
+ * number of distinct values. Cache the estimated number of distincts in the
+ * em_ndistinct field of each member.
+ */
+static EquivalenceMember *
+identify_sort_ecmember(PlannerInfo *root, EquivalenceClass *ec)
+{
+	EquivalenceMember  *candidate = linitial(ec->ec_members);
+
+	if (root == NULL)
+		/* Fast path */
+		return candidate;
+
+	foreach_node(EquivalenceMember, em, ec->ec_members)
+	{
+		VariableStatData	vardata;
+
+		if (em->em_ndistinct == 0.)
+			/* Nothing helpful */
+			continue;
+
+		if (em->em_is_child || em->em_is_const || bms_is_empty(em->em_relids) ||
+			bms_is_member(0, em->em_relids))
+		{
+			em->em_ndistinct = 0.;
+			continue;
+		}
+
+		if (em->em_ndistinct < 0.0)
+		{
+			bool	isdefault = true;
+			double	ndist;
+
+			/* Let's check candidate's ndistinct value */
+			examine_variable(root, (Node *) em->em_expr, 0, &vardata);
+			if (HeapTupleIsValid(vardata.statsTuple))
+				ndist = get_variable_numdistinct(&vardata, &isdefault);
+			ReleaseVariableStats(vardata);
+
+			if (isdefault)
+			{
+				em->em_ndistinct = 0.;
+				continue;
+			}
+
+			em->em_ndistinct = ndist;
+		}
+
+		Assert(em->em_ndistinct > 0.);
+
+		if (candidate->em_ndistinct == 0. ||
+			em->em_ndistinct < candidate->em_ndistinct)
+			candidate = em;
+	}
+
+	Assert(candidate != NULL);
+	return candidate;
+}
\ No newline at end of file
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 51d806326e..2894919b24 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -526,6 +526,7 @@ add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
 	em->em_datatype = datatype;
 	em->em_jdomain = jdomain;
 	em->em_parent = parent;
+	em->em_ndistinct = -1.0;
 
 	if (bms_is_empty(relids))
 	{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ccfc1ac1..3cb6a8a44f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1439,6 +1439,8 @@ typedef struct EquivalenceMember
 	JoinDomain *em_jdomain;		/* join domain containing the source clause */
 	/* if em_is_child is true, this links to corresponding EM for top parent */
 	struct EquivalenceMember *em_parent pg_node_attr(read_write_ignore);
+
+	double em_ndistinct; /* cached ndistinct: 0 - default or unknown; -1 - not computed yet */
 } EquivalenceMember;
 
 /*
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 5fd54a10b1..77ac213910 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1698,3 +1698,54 @@ explain (costs off) select a, b, a <-> point(5, 5) dist from point_table order b
                Order By: (a <-> '(5,5)'::point)
 (6 rows)
 
+-- Check:
+-- commuting of sides in an expression doesn't influence cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+SET enable_hashjoin = 'off';
+-- Having lots of duplicates after the join it is more effective to use plain
+-- Sort instead of incremental sort: with small number of groups we do the same
+-- stuff like Sort but with extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index ab471bdfff..1334371e88 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -292,3 +292,34 @@ create index point_table_a_idx on point_table using gist(a);
 -- Ensure we get an incremental sort plan for both of the following queries
 explain (costs off) select a, b, a <-> point(5, 5) dist from point_table order by dist, b limit 1;
 explain (costs off) select a, b, a <-> point(5, 5) dist from point_table order by dist, b desc limit 1;
+
+-- Check:
+-- commuting of sides in an expression doesn't influence cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+
+SET enable_hashjoin = 'off';
+
+-- Having lots of duplicates after the join it is more effective to use plain
+-- Sort instead of incremental sort: with small number of groups we do the same
+-- stuff like Sort but with extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
-- 
2.47.0

0002-Consider-the-number-of-columns-in-the-cost-sort-mode.patchtext/plain; charset=UTF-8; name=0002-Consider-the-number-of-columns-in-the-cost-sort-mode.patchDownload
From eaae44d6e9dda4a295433af4ec6ae18e181046f7 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 21 Oct 2024 15:21:34 +0400
Subject: [PATCH 2/4] Consider the number of columns in the cost-sort model.

The sort node calls the comparison operator for each pair of attributes for
each pair of tuples. However, the current cost model uses
a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging.
This technique can lead to incorrect estimations when sorting on three, four,
or more columns, which is quite common in practice.
Moreover, further elaboration of the optimiser imposes stricter requirements
on the balance of sortings, as caused by IncrementalSort, MergeAppend, and
MergeJoin.
In this patch, the multiplier is a linear function of the number of columns.
The 1.0 term smooths the fact that the dependence on the number of columns is
weaker than linear.
This is an extreme, upper-bound formula: the number of comparisons actually
depends on the distinct values in each column. As a TODO, we can naturally
elaborate this model in a next step, involving distinct statistics to make
estimations more precise.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 15 +++--
 src/backend/optimizer/path/costsize.c         | 22 +++++--
 src/test/regress/expected/aggregates.out      | 13 ++--
 src/test/regress/expected/join.out            | 20 +++---
 src/test/regress/expected/partition_join.out  |  8 ++-
 src/test/regress/expected/union.out           | 66 +++++++++----------
 6 files changed, 78 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 39b2b317e8..89ff865992 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9984,13 +9984,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
 -- left outer join + nullable clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
-                                                                                                                     QUERY PLAN                                                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+                                                                                    QUERY PLAN                                                                                     
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: t1.a, fprt2.b, fprt2.c
-   Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
-   Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.a, fprt2.b, fprt2.c
+   ->  Foreign Scan
+         Output: t1.a, fprt2.b, fprt2.c
+         Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+         Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10))
+(7 rows)
 
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
  a | b |  c   
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 325acdacbf..8d50905ba8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -483,6 +483,8 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			npathkeys = list_length(((Path *) path)->pathkeys);
+	double		cmpfrac = (npathkeys == 0) ? 2.0 : npathkeys + 1.0;
 
 	/* Mark the path with the correct row estimate */
 	if (rows)
@@ -505,7 +507,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = cmpfrac * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -1863,7 +1865,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
  */
 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-			   double tuples, int width,
+			   double tuples, int width, double cmpfrac,
 			   Cost comparison_cost, int sort_mem,
 			   double limit_tuples)
 {
@@ -1880,7 +1882,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		tuples = 2.0;
 
 	/* Include the default cost-per-comparison */
-	comparison_cost += 2.0 * cpu_operator_cost;
+	comparison_cost += cmpfrac * cpu_operator_cost;
 
 	/* Do we have a useful LIMIT? */
 	if (limit_tuples > 0 && limit_tuples < tuples)
@@ -2051,7 +2053,9 @@ cost_incremental_sort(Path *path,
 	 * are equal.
 	 */
 	cost_tuplesort(&group_startup_cost, &group_run_cost,
-				   group_tuples, width, comparison_cost, sort_mem,
+				   group_tuples, width,
+				   list_length(pathkeys) + 1.0,
+				   comparison_cost, sort_mem,
 				   limit_tuples);
 
 	/*
@@ -2075,7 +2079,7 @@ cost_incremental_sort(Path *path,
 	 * detect the sort groups. This is roughly equal to one extra copy and
 	 * comparison per tuple.
 	 */
-	run_cost += (cpu_tuple_cost + comparison_cost) * input_tuples;
+	run_cost += (cpu_tuple_cost + (presorted_keys + 1) * comparison_cost) * input_tuples;
 
 	/*
 	 * Additionally, we charge double cpu_tuple_cost for each input group to
@@ -2109,9 +2113,11 @@ cost_sort(Path *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
+	double		cmpfrac =
+						(pathkeys == NIL) ? 2.0 : list_length(pathkeys) + 1.0;
 
 	cost_tuplesort(&startup_cost, &run_cost,
-				   tuples, width,
+				   tuples, width, cmpfrac,
 				   comparison_cost, sort_mem,
 				   limit_tuples);
 
@@ -2391,6 +2397,8 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	double		cmpfrac =
+						(pathkeys == NIL) ? 2.0 : list_length(pathkeys) + 1.0;
 
 	/*
 	 * Avoid log(0)...
@@ -2399,7 +2407,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = cmpfrac * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index a5596ab210..45e08457df 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3001,17 +3001,18 @@ ANALYZE agg_sort_order;
 EXPLAIN (COSTS OFF)
 SELECT array_agg(c1 ORDER BY c2),c2
 FROM agg_sort_order WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
-                                 QUERY PLAN                                 
-----------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: c2
    ->  GroupAggregate
          Group Key: c1
-         ->  Sort
+         ->  Incremental Sort
                Sort Key: c1, c2
-               ->  Index Scan using agg_sort_order_c2_idx on agg_sort_order
-                     Index Cond: (c2 < 100)
-(8 rows)
+               Presorted Key: c1
+               ->  Index Scan using agg_sort_order_pkey on agg_sort_order
+                     Filter: (c2 < 100)
+(9 rows)
 
 DROP TABLE agg_sort_order CASCADE;
 DROP TABLE btg;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 53f70d72ed..c729b68ad3 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5726,18 +5726,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Left Join
+   Merge Cond: (d.a = s.id)
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+   ->  Sort
+         Sort Key: s.id
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.id, b.c_id
+                     ->  Seq Scan on b
+(11 rows)
 
 -- join removal is not possible here
 explain (costs off)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 53591a4f2d..6b0303ce52 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1235,9 +1235,11 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
                                  QUERY PLAN                                 
 ----------------------------------------------------------------------------
- Sort
+ Incremental Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Append
+   Presorted Key: t1.a
+   ->  Merge Append
+         Sort Key: t1.a
          ->  Merge Left Join
                Merge Cond: (t1_1.a = t2_1.b)
                ->  Sort
@@ -1286,7 +1288,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                ->  Sort
                      Sort Key: t2_3.b
                      ->  Seq Scan on prt2_p3 t2_3
-(51 rows)
+(53 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 0fd0e1c38b..b08cb7f0a8 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1224,18 +1224,17 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x < 4
 ORDER BY x;
-                    QUERY PLAN                    
---------------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Sort
    Sort Key: (2)
-   ->  Unique
-         ->  Sort
-               Sort Key: (1), (2)
-               ->  Append
-                     ->  Result
-                     ->  Result
-                           One-Time Filter: false
-(9 rows)
+   ->  HashAggregate
+         Group Key: (1), (2)
+         ->  Append
+               ->  Result
+               ->  Result
+                     One-Time Filter: false
+(8 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, 2 AS x
@@ -1289,19 +1288,18 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x > 3
 ORDER BY x;
-                                     QUERY PLAN                                     
-------------------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Sort
    Sort Key: ss.x
    ->  Subquery Scan on ss
          Filter: (ss.x > 3)
-         ->  Unique
-               ->  Sort
-                     Sort Key: (1), (((random() * '3'::double precision))::integer)
-                     ->  Append
-                           ->  Result
-                           ->  Result
-(10 rows)
+         ->  HashAggregate
+               Group Key: (1), (((random() * '3'::double precision))::integer)
+               ->  Append
+                     ->  Result
+                     ->  Result
+(9 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, (random()*3)::int AS x
@@ -1322,24 +1320,22 @@ select distinct q1 from
    union all
    select distinct * from int8_tbl i82) ss
 where q2 = q2;
-                        QUERY PLAN                        
-----------------------------------------------------------
- Unique
-   ->  Merge Append
-         Sort Key: "*SELECT* 1".q1
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: "*SELECT* 1".q1
+   ->  Append
          ->  Subquery Scan on "*SELECT* 1"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i81.q1, i81.q2
-                           ->  Seq Scan on int8_tbl i81
-                                 Filter: (q2 IS NOT NULL)
+               ->  HashAggregate
+                     Group Key: i81.q1, i81.q2
+                     ->  Seq Scan on int8_tbl i81
+                           Filter: (q2 IS NOT NULL)
          ->  Subquery Scan on "*SELECT* 2"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i82.q1, i82.q2
-                           ->  Seq Scan on int8_tbl i82
-                                 Filter: (q2 IS NOT NULL)
-(15 rows)
+               ->  HashAggregate
+                     Group Key: i82.q1, i82.q2
+                     ->  Seq Scan on int8_tbl i82
+                           Filter: (q2 IS NOT NULL)
+(13 rows)
 
 select distinct q1 from
   (select distinct * from int8_tbl i81
-- 
2.47.0

0003-Consider-ndistinct-on-the-first-column-in-cost_sort.patchtext/plain; charset=UTF-8; name=0003-Consider-ndistinct-on-the-first-column-in-cost_sort.patchDownload
From cfb5973ac86aa32d12bedf472256ad357eaf5fec Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 22 Oct 2024 16:54:49 +0400
Subject: [PATCH 3/4] Consider ndistinct on the first column in cost_sort().

---
 src/backend/optimizer/path/costsize.c        | 46 ++++++++++++++++++--
 src/test/regress/expected/aggregates.out     | 24 +++++-----
 src/test/regress/expected/partition_join.out | 18 ++++----
 3 files changed, 63 insertions(+), 25 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8d50905ba8..8935185a5e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -194,6 +194,8 @@ static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
 static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
 												 EquivalenceClass *ec);
+static double sort_comparisons_factor(PlannerInfo *root, List *pathkeys,
+									  double ntuples);
 
 
 /*
@@ -2113,8 +2115,7 @@ cost_sort(Path *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
-	double		cmpfrac =
-						(pathkeys == NIL) ? 2.0 : list_length(pathkeys) + 1.0;
+	double		cmpfrac = sort_comparisons_factor(root, pathkeys, tuples);
 
 	cost_tuplesort(&startup_cost, &run_cost,
 				   tuples, width, cmpfrac,
@@ -6560,4 +6561,43 @@ identify_sort_ecmember(PlannerInfo *root, EquivalenceClass *ec)
 
 	Assert(candidate != NULL);
 	return candidate;
-}
\ No newline at end of file
+}
+
+/*
+ * Calculate a multiplier reflecting the number of comparisons the executor
+ * has to perform during a sort with this specific order of columns.
+ *
+ * The comparison factor f = 1.0 + F(pathkeys). Here, the 1.0 encapsulates
+ * second-order effects which the cost function doesn't consider.
+ * F(pathkeys) is the estimated fraction of comparisons in the range [1..N].
+ * F = 1 corresponds to the 'all-unique' first column case. In that case the
+ * sort will call the comparison function only once for each pair of tuples.
+ * F = N represents the case when values in all columns are constant.
+ */
+static double
+sort_comparisons_factor(PlannerInfo *root, List *pathkeys, double ntuples)
+{
+	int		n = list_length(pathkeys);
+	double	cmpfrac = (n == 0) ? 2.0 : n + 1;
+
+	if (root != NULL && ntuples > 1 && n > 1)
+	{
+		PathKey			   *key = linitial_node(PathKey, pathkeys);
+		EquivalenceMember  *em =  identify_sort_ecmember(root, key->pk_eclass);
+
+		Assert(em->em_ndistinct >= 0);
+
+		if (em->em_ndistinct == 0.)
+			/*
+			 * The optimiser has no info on the ndistinct value; return the
+			 * extreme case.
+			 */
+			return cmpfrac;
+
+		if (ntuples >= em->em_ndistinct)
+			cmpfrac =
+				2.0 + ((ntuples - em->em_ndistinct) / (ntuples - 1)) * (n - 1);
+	}
+
+	return cmpfrac;
+}
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 45e08457df..8c05adff86 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2842,19 +2842,18 @@ SELECT count(*)
                                   QUERY PLAN                                   
 -------------------------------------------------------------------------------
  GroupAggregate
-   Group Key: t1.z, t1.w, t1.x, t1.y
-   ->  Incremental Sort
-         Sort Key: t1.z, t1.w, t1.x, t1.y
-         Presorted Key: t1.z, t1.w, t1.x
+   Group Key: t1.x, t1.y, t1.z, t1.w
+   ->  Sort
+         Sort Key: t1.x, t1.y, t1.z, t1.w
          ->  Merge Join
-               Merge Cond: ((t1.z = t2.z) AND (t1.w = t2.w) AND (t1.x = t2.x))
+               Merge Cond: ((t1.w = t2.w) AND (t1.z = t2.z) AND (t1.x = t2.x))
                ->  Sort
-                     Sort Key: t1.z, t1.w, t1.x
+                     Sort Key: t1.w, t1.z, t1.x
                      ->  Index Scan using btg_x_y_idx on btg t1
                ->  Sort
-                     Sort Key: t2.z, t2.w, t2.x
+                     Sort Key: t2.w, t2.z, t2.x
                      ->  Index Scan using btg_x_y_idx on btg t2
-(13 rows)
+(12 rows)
 
 RESET enable_nestloop;
 RESET enable_hashjoin;
@@ -2878,12 +2877,11 @@ SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER BY x*x, z;
  Sort
    Sort Key: ((x * x)), z
    ->  GroupAggregate
-         Group Key: x, y, w, z
-         ->  Incremental Sort
-               Sort Key: x, y, w, z
-               Presorted Key: x, y
+         Group Key: w, x, y, z
+         ->  Sort
+               Sort Key: w, x, y, z
                ->  Index Scan using btg_x_y_idx on btg
-(8 rows)
+(7 rows)
 
 -- Test the case where the number of incoming subtree path keys is more than
 -- the number of grouping keys.
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6b0303ce52..1e6b278d21 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -615,39 +615,39 @@ SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
                ->  Sort
                      Sort Key: (COALESCE(prt1.a, p2.a)), (COALESCE(prt1.b, p2.b))
                      ->  Merge Full Join
-                           Merge Cond: ((prt1.a = p2.a) AND (prt1.b = p2.b))
+                           Merge Cond: ((prt1.b = p2.b) AND (prt1.a = p2.a))
                            Filter: ((COALESCE(prt1.a, p2.a) >= 490) AND (COALESCE(prt1.a, p2.a) <= 510))
                            ->  Sort
-                                 Sort Key: prt1.a, prt1.b
+                                 Sort Key: prt1.b, prt1.a
                                  ->  Seq Scan on prt1_p1 prt1
                            ->  Sort
-                                 Sort Key: p2.a, p2.b
+                                 Sort Key: p2.b, p2.a
                                  ->  Seq Scan on prt2_p1 p2
          ->  Group
                Group Key: (COALESCE(prt1_1.a, p2_1.a)), (COALESCE(prt1_1.b, p2_1.b))
                ->  Sort
                      Sort Key: (COALESCE(prt1_1.a, p2_1.a)), (COALESCE(prt1_1.b, p2_1.b))
                      ->  Merge Full Join
-                           Merge Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+                           Merge Cond: ((prt1_1.b = p2_1.b) AND (prt1_1.a = p2_1.a))
                            Filter: ((COALESCE(prt1_1.a, p2_1.a) >= 490) AND (COALESCE(prt1_1.a, p2_1.a) <= 510))
                            ->  Sort
-                                 Sort Key: prt1_1.a, prt1_1.b
+                                 Sort Key: prt1_1.b, prt1_1.a
                                  ->  Seq Scan on prt1_p2 prt1_1
                            ->  Sort
-                                 Sort Key: p2_1.a, p2_1.b
+                                 Sort Key: p2_1.b, p2_1.a
                                  ->  Seq Scan on prt2_p2 p2_1
          ->  Group
                Group Key: (COALESCE(prt1_2.a, p2_2.a)), (COALESCE(prt1_2.b, p2_2.b))
                ->  Sort
                      Sort Key: (COALESCE(prt1_2.a, p2_2.a)), (COALESCE(prt1_2.b, p2_2.b))
                      ->  Merge Full Join
-                           Merge Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+                           Merge Cond: ((prt1_2.b = p2_2.b) AND (prt1_2.a = p2_2.a))
                            Filter: ((COALESCE(prt1_2.a, p2_2.a) >= 490) AND (COALESCE(prt1_2.a, p2_2.a) <= 510))
                            ->  Sort
-                                 Sort Key: prt1_2.a, prt1_2.b
+                                 Sort Key: prt1_2.b, prt1_2.a
                                  ->  Seq Scan on prt1_p3 prt1_2
                            ->  Sort
-                                 Sort Key: p2_2.a, p2_2.b
+                                 Sort Key: p2_2.b, p2_2.a
                                  ->  Seq Scan on prt2_p3 p2_2
 (43 rows)
 
-- 
2.47.0

0004-Add-GROUP-BY-strategy-put-the-most-distinct-column-a.patch (text/plain)
From caf8eb1706994029308bfd29c8bfce5de17dbff1 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 21 Oct 2024 16:17:01 +0400
Subject: [PATCH 4/4] Add GROUP-BY strategy: put the most distinct column at
 the head.

Allow GROUP BY to utilise the cost_sort feature, which can differentiate
orderings of pathkey lists according to the ndistinct of the first column.
---
 src/backend/optimizer/path/pathkeys.c    | 73 +++++++++++++++++++++++-
 src/test/regress/expected/aggregates.out | 43 +++++++-------
 src/test/regress/sql/aggregates.sql      | 10 ++--
 3 files changed, 96 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index e25798972f..7db855fc39 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -26,6 +26,7 @@
 #include "optimizer/paths.h"
 #include "partitioning/partbounds.h"
 #include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
 
 /* Consider reordering of GROUP BY keys? */
 bool		enable_group_by_reordering = true;
@@ -471,6 +472,10 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 	List	   *pathkeys = root->group_pathkeys;
 	List	   *clauses = root->processed_groupClause;
 
+	double		nd_max = 0;
+	PathKey	   *pk_opt = NULL;
+	ListCell   *lc1, *lc2;
+
 	/* always return at least the original pathkeys/clauses */
 	info = makeNode(GroupByOrdering);
 	info->pathkeys = pathkeys;
@@ -524,9 +529,6 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 		/* Test consistency of info structures */
 		for_each_from(lc, infos, 1)
 		{
-			ListCell   *lc1,
-					   *lc2;
-
 			info = lfirst_node(GroupByOrdering, lc);
 
 			Assert(list_length(info->clauses) == list_length(pinfo->clauses));
@@ -544,6 +546,71 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 		}
 	}
 #endif
+
+	/*
+	 * Let's try the order with the column having max ndistinct value
+	 */
+
+	forboth(lc1, root->group_pathkeys, lc2, root->processed_groupClause)
+	{
+		PathKey			   *pkey = lfirst_node(PathKey, lc1);
+		SortGroupClause	   *gc = (SortGroupClause *) lfirst(lc2);
+		Node			   *node;
+		Bitmapset		   *relids;
+		VariableStatData	vardata;
+		double				nd = -1;
+		bool				isdefault;
+
+		if (foreach_current_index(lc1) >= root->num_groupby_pathkeys)
+			break;
+
+		node = get_sortgroupclause_expr(gc, root->parse->targetList);
+		relids = pull_varnos(root, node);
+
+		if (bms_num_members(relids) != 1 && bms_is_member(0, relids))
+			/*
+			 * Although a functional index can estimate distincts here, the
+			 * chance is too low.
+			 */
+			continue;
+
+		examine_variable(root, node, 0, &vardata);
+		if (!HeapTupleIsValid(vardata.statsTuple))
+			continue;
+		nd = get_variable_numdistinct(&vardata, &isdefault);
+		ReleaseVariableStats(vardata);
+		if (isdefault)
+			continue;
+
+		Assert(nd >= 0);
+		if (nd_max == 0 || nd > nd_max)
+		{
+			nd_max = nd;
+			pk_opt = pkey;
+		}
+	}
+
+	if (pk_opt != NULL)
+	{
+		List   *new_pathkeys = list_make1(pk_opt);
+		int		n;
+
+		new_pathkeys = list_concat_unique_ptr(new_pathkeys, root->group_pathkeys);
+		n = group_keys_reorder_by_pathkeys(new_pathkeys, &pathkeys, &clauses,
+										   root->num_groupby_pathkeys);
+
+		if (n > 0 &&
+			(enable_incremental_sort || n == root->num_groupby_pathkeys) &&
+			compare_pathkeys(pathkeys, root->group_pathkeys) != PATHKEYS_EQUAL)
+		{
+			info = makeNode(GroupByOrdering);
+			info->pathkeys = pathkeys;
+			info->clauses = clauses;
+
+			infos = lappend(infos, info);
+		}
+	}
+
 	return infos;
 }
 
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 8c05adff86..4af82f9600 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2786,13 +2786,13 @@ SELECT balk(hundred) FROM tenk1;
 ROLLBACK;
 -- GROUP BY optimization by reordering GROUP BY clauses
 CREATE TABLE btg AS SELECT
-  i % 10 AS x,
-  i % 10 AS y,
-  'abc' || i % 10 AS z,
+  i % 231 AS x,
+  i % 49 AS y,
+  'abc' || i % 2 AS z,
   i AS w
-FROM generate_series(1, 100) AS i;
+FROM generate_series(1, 1000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
-ANALYZE btg;
+VACUUM ANALYZE btg;
 SET enable_hashagg = off;
 SET enable_seqscan = off;
 -- Utilize the ordering of index scan to avoid a Sort operation
@@ -2839,21 +2839,19 @@ EXPLAIN (COSTS OFF)
 SELECT count(*)
   FROM btg t1 JOIN btg t2 ON t1.z = t2.z AND t1.w = t2.w AND t1.x = t2.x
   GROUP BY t1.x, t1.y, t1.z, t1.w;
-                                  QUERY PLAN                                   
--------------------------------------------------------------------------------
+                           QUERY PLAN                           
+----------------------------------------------------------------
  GroupAggregate
-   Group Key: t1.x, t1.y, t1.z, t1.w
+   Group Key: t1.w, t1.x, t1.y, t1.z
    ->  Sort
-         Sort Key: t1.x, t1.y, t1.z, t1.w
+         Sort Key: t1.w, t1.x, t1.y, t1.z
          ->  Merge Join
-               Merge Cond: ((t1.w = t2.w) AND (t1.z = t2.z) AND (t1.x = t2.x))
-               ->  Sort
-                     Sort Key: t1.w, t1.z, t1.x
-                     ->  Index Scan using btg_x_y_idx on btg t1
-               ->  Sort
-                     Sort Key: t2.w, t2.z, t2.x
+               Merge Cond: (t1.x = t2.x)
+               Join Filter: ((t2.z = t1.z) AND (t2.w = t1.w))
+               ->  Index Scan using btg_x_y_idx on btg t1
+               ->  Materialize
                      ->  Index Scan using btg_x_y_idx on btg t2
-(12 rows)
+(10 rows)
 
 RESET enable_nestloop;
 RESET enable_hashjoin;
@@ -2877,11 +2875,12 @@ SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER BY x*x, z;
  Sort
    Sort Key: ((x * x)), z
    ->  GroupAggregate
-         Group Key: w, x, y, z
-         ->  Sort
-               Sort Key: w, x, y, z
+         Group Key: x, y, w, z
+         ->  Incremental Sort
+               Sort Key: x, y, w, z
+               Presorted Key: x, y
                ->  Index Scan using btg_x_y_idx on btg
-(7 rows)
+(8 rows)
 
 -- Test the case where the number of incoming subtree path keys is more than
 -- the number of grouping keys.
@@ -2918,9 +2917,9 @@ GROUP BY c1.w, c1.z;
                      QUERY PLAN                      
 -----------------------------------------------------
  GroupAggregate
-   Group Key: c1.w, c1.z
+   Group Key: c1.z, c1.w
    ->  Sort
-         Sort Key: c1.w, c1.z, c1.x, c1.y
+         Sort Key: c1.z, c1.w, c1.x, c1.y
          ->  Merge Join
                Merge Cond: (c1.x = c2.x)
                ->  Sort
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ca6d1bcfb7..f1a9a607b7 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1199,13 +1199,13 @@ ROLLBACK;
 
 -- GROUP BY optimization by reordering GROUP BY clauses
 CREATE TABLE btg AS SELECT
-  i % 10 AS x,
-  i % 10 AS y,
-  'abc' || i % 10 AS z,
+  i % 231 AS x,
+  i % 49 AS y,
+  'abc' || i % 2 AS z,
   i AS w
-FROM generate_series(1, 100) AS i;
+FROM generate_series(1, 1000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
-ANALYZE btg;
+VACUUM ANALYZE btg;
 
 SET enable_hashagg = off;
 SET enable_seqscan = off;
-- 
2.47.0

#9Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Andrei Lepikhov (#8)
Re: Consider the number of columns in the sort cost model

Hi! Thank you for your work on this subject!

On 23.10.2024 04:39, Andrei Lepikhov wrote:

On 15/10/2024 12:15, David Rowley wrote:

On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepihov@gmail.com> wrote:
I think maybe what is worth working on is seeing if you can better
estimate the number of tiebreak comparisons by checking the number of
distinct values for each sort key.  That might require what we
discussed on another thread about having estimate_num_groups() better
use the n_distinct estimate from the EquivalenceMember with the least
distinct values.  It'll be another question if all that can be done
and this all still perform well enough for cost_sort(). You may have
to write it first before we can figure that out.  It may be possible
to buy back some of the overheads with some additional caching...
Perhaps that could be done within the EquivalenceClass.

It seems that your idea with an equivalence class works.
In the patchset attached, I show all the ideas (I initially wanted to
discuss it step-by-step).
In the attached, the first patch is about choosing adequate expression
from the list of EC members.
The second one is the current patch, which considers the number of
sort columns.
Third patch - elaboration of the second patch. Use only the trustful
statistics here - the number of ndistinct values on the first sorted
column.
And the last patch is a demo of how I'm going to use the previous
three patches and add one more strategy to improve the order of
columns in the GROUP-BY clause.

I have some questions about your patches.

1. You consider two lists root->group_pathkeys and
root->processed_groupClause.

As far as I understand, this is because the pathkey elements correspond
to the group clauses, and that correspondence is checked precisely by
this condition:

if (foreach_current_index(lc1) >= root->num_groupby_pathkeys)

To be honest, I didn’t find information about this in the code, but did
I understand everything correctly?

2. I noticed that statistics of distinct values are calculated several
times. Maybe I am missing something, but this could be avoided if the
statistics were saved after the first calculation. For example, I saw
that you may compute the same statistics for the same equivalence
members in both cost_incremental_sort and identify_sort_ecmember.
Is it possible to store this information in a structure and reuse it later?

3. I think you should initialize the variable ndist in your patch; I got
a compiler warning while building your code.

costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized
[-Werror=maybe-uninitialized]
 6694 |                         em->em_ndistinct = ndist;
      |                         ~~~~~~~~~~~~~~~~~^~~~~
costsize.c:6680:33: note: ‘ndist’ was declared here
 6680 |                         double  ndist;
      |                                 ^~~
cc1: all warnings being treated as errors
gmake[4]: *** [<builtin>: costsize.o] Error 1

4. Before calling the examine_variable function, I think you should
check that the node is not NULL.

@@ -575,6 +575,8 @@ get_useful_group_keys_orderings(PlannerInfo *root,
Path *path)
              */
             continue;

+        Assert (node != NULL);
+
         examine_variable(root, node, 0, &vardata);
         if (!HeapTupleIsValid(vardata.statsTuple))
             continue;

--
Regards,
Alena Rybakina
Postgres Professional

#10Andrei Lepikhov
lepihov@gmail.com
In reply to: Alena Rybakina (#9)
4 attachment(s)
Re: Consider the number of columns in the sort cost model

On 10/28/24 16:48, Alena Rybakina wrote:

On 23.10.2024 04:39, Andrei Lepikhov wrote:

On 15/10/2024 12:15, David Rowley wrote:
And the last patch is a demo of how I'm going to use the previous
three patches and add one more strategy to improve the order of
columns in the GROUP-BY clause.

To be honest, I didn’t find information about this in the code, but did
I understand everything correctly?

Yes

2. I noticed that statistics of distinct values ​​are calculated several
times. Maybe I miss something, but this can be avoided if these
statistics can be saved after calculation. For example, I saw that it is
possible that you calculate the same statistic information for the same
equivalence members in cost_incremental_sort and identify_sort_ecmember.
Is it possible to store information in a structure and use it later?

Hmm, I don't see multiple calculations. em_ndistinct was added
specifically for this purpose. Can you provide a specific case and the
code lines?

3. I think you should initialize the variable ndist in your patch. I
faced the compile complaining during your code compilation.

costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized
[-Werror=maybe-uninitialized]
 6694 |                         em->em_ndistinct = ndist;
      |                         ~~~~~~~~~~~~~~~~~^~~~~
costsize.c:6680:33: note: ‘ndist’ was declared here
 6680 |                         double  ndist;
      |                                 ^~~
cc1: all warnings being treated as errors
gmake[4]: *** [<builtin>: costsize.o] Error 1

I think you can just update your compiler. But I added the ndist
initialisation to make more compilers happy :).

+        Assert (node != NULL);
+
         examine_variable(root, node, 0, &vardata);
         if (!HeapTupleIsValid(vardata.statsTuple))
             continue;

I don't think so, at least not until you provide a case where the
get_sortgroupclause_expr function returns NULL.
What's more, remember that patch 0004 here is just to show the
perspective and is still under construction.
Anyway, thanks; I found out that the patch set no longer applies cleanly
because of 828e94c, so see the new version in the attachment.

--
regards, Andrei Lepikhov
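
To make the shape of the proposed cost change concrete, here is a rough
back-of-the-envelope sketch of how the per-comparison multiplier from
patch 0002 scales the estimate with the number of sort columns. It is
not the actual cost_tuplesort() code: it ignores disk spilling and LIMIT
handling, and assumes the default cpu_operator_cost of 0.0025.

```python
import math

CPU_OPERATOR_COST = 0.0025  # postgresql.conf default

def comparison_cost(npathkeys, patched):
    # Master charges a flat 2.0 * cpu_operator_cost per tuple comparison;
    # the patch charges (npathkeys + 1) * cpu_operator_cost instead.
    cmpfrac = (npathkeys + 1.0) if (patched and npathkeys > 0) else 2.0
    return cmpfrac * CPU_OPERATOR_COST

def sort_comparison_cost(tuples, npathkeys, patched):
    # Quicksort-style N * log2(N) comparison count, as in cost_tuplesort().
    return comparison_cost(npathkeys, patched) * tuples * math.log2(tuples)

# 9E4 tuples, as in the ORDER BY examples at the top of the thread.
for ncols in (1, 4, 8):
    print(ncols,
          round(sort_comparison_cost(9e4, ncols, patched=False)),
          round(sort_comparison_cost(9e4, ncols, patched=True)))
```

With the flat multiplier, all three ORDER BY variants get the same
comparison cost; the patched multiplier grows roughly linearly with the
column count, which matches the 100ms/300ms/560ms timings reported
earlier in the thread.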

Attachments:

v2-0001-Stabilise-incremental-sort-cost-calculation.patch (text/x-patch)
From 5eb884cbbd9c2e356d5d855da46d7e62d101b8b9 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 29 Oct 2024 08:49:33 +0700
Subject: [PATCH v2 1/4] Stabilise incremental sort cost calculation.

Carefully identify a column/expression that can represent the path key in the
cost calculation of a specific sort operator. Columns may have different
numbers of distinct values, which is why the order of columns in the sort
operation may impact the number of comparison function calls.
Sorting has only pathkeys as input for the cost estimation. Instead of blindly
choosing the first equivalence class member, this patch attempts to find the
member with the smallest ndistinct value.

TODO: Filtering out EC members external to this sort operator is not a big
deal, but it would require passing the underlying relids to the cost
calculation routine, which would change the API. So, here we stay as simple
as possible.

Add the number of distinct values, em_ndistinct, to EquivalenceMember.
It may additionally be used later in group-number estimations.
---
 src/backend/optimizer/path/costsize.c         | 72 +++++++++++++++++--
 src/backend/optimizer/path/equivclass.c       |  1 +
 src/include/nodes/pathnodes.h                 |  2 +
 .../regress/expected/incremental_sort.out     | 51 +++++++++++++
 src/test/regress/sql/incremental_sort.sql     | 31 ++++++++
 5 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2bb6db1df7..686d5883d1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -203,6 +203,8 @@ static int32 get_expr_width(PlannerInfo *root, const Node *expr);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
+static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
+												 EquivalenceClass *ec);
 
 
 /*
@@ -2052,22 +2054,21 @@ cost_incremental_sort(Path *path,
 	 */
 	foreach(l, pathkeys)
 	{
-		PathKey    *key = (PathKey *) lfirst(l);
-		EquivalenceMember *member = (EquivalenceMember *)
-			linitial(key->pk_eclass->ec_members);
+		PathKey			   *key = (PathKey *) lfirst(l);
+		EquivalenceMember  *em = identify_sort_ecmember(root, key->pk_eclass);
 
 		/*
 		 * Check if the expression contains Var with "varno 0" so that we
 		 * don't call estimate_num_groups in that case.
 		 */
-		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
+		if (bms_is_member(0, pull_varnos(root, (Node *) em->em_expr)))
 		{
 			unknown_varno = true;
 			break;
 		}
 
 		/* expression not containing any Vars with "varno 0" */
-		presortedExprs = lappend(presortedExprs, member->em_expr);
+		presortedExprs = lappend(presortedExprs, em->em_expr);
 
 		if (foreach_current_index(l) + 1 >= presorted_keys)
 			break;
@@ -6604,3 +6605,64 @@ compute_gather_rows(Path *path)
 
 	return clamp_row_est(path->rows * get_parallel_divisor(path));
 }
+
+/*
+ * Find suitable member of the equivalence class.
+ * Passing through the list of EC members find the member with minimum of
+ * distinct values. Cache estimated number of distincts in the em_ndistinct
+ * field of each member.
+ */
+static EquivalenceMember *
+identify_sort_ecmember(PlannerInfo *root, EquivalenceClass *ec)
+{
+	EquivalenceMember  *candidate = linitial(ec->ec_members);
+
+	if (root == NULL)
+		/* Fast path */
+		return candidate;
+
+	foreach_node(EquivalenceMember, em, ec->ec_members)
+	{
+		VariableStatData	vardata;
+
+		if (em->em_ndistinct == 0.)
+			/* Nothing helpful */
+			continue;
+
+		if (em->em_is_child || em->em_is_const || bms_is_empty(em->em_relids) ||
+			bms_is_member(0, em->em_relids))
+		{
+			em->em_ndistinct = 0.;
+			continue;
+		}
+
+		if (em->em_ndistinct < 0.)
+		{
+			bool	isdefault = true;
+			double	ndist = 0.;
+
+			/* Let's check candidate's ndistinct value */
+			examine_variable(root, (Node *) em->em_expr, 0, &vardata);
+			if (HeapTupleIsValid(vardata.statsTuple))
+				ndist = get_variable_numdistinct(&vardata, &isdefault);
+			ReleaseVariableStats(vardata);
+
+			if (isdefault)
+			{
+				em->em_ndistinct = 0.;
+				continue;
+			}
+
+			em->em_ndistinct = ndist;
+		}
+
+		Assert(em->em_ndistinct > 0.);
+
+		if (candidate->em_ndistinct == 0. ||
+			em->em_ndistinct < candidate->em_ndistinct)
+			candidate = em;
+	}
+
+	Assert(candidate != NULL);
+	return candidate;
+}
\ No newline at end of file
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index fae137dd82..3552614502 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -525,6 +525,7 @@ add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
 	em->em_datatype = datatype;
 	em->em_jdomain = jdomain;
 	em->em_parent = parent;
+	em->em_ndistinct = -1.0;
 
 	if (bms_is_empty(relids))
 	{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index add0f9e45f..4ef5256a7f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1448,6 +1448,8 @@ typedef struct EquivalenceMember
 	JoinDomain *em_jdomain;		/* join domain containing the source clause */
 	/* if em_is_child is true, this links to corresponding EM for top parent */
 	struct EquivalenceMember *em_parent pg_node_attr(read_write_ignore);
+
+	double em_ndistinct; /* cached ndistinct: 0 - default or unknown; -1 - not defined yet */
 } EquivalenceMember;
 
 /*
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 2df7a5db12..409eb8a4b8 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1722,3 +1722,54 @@ order by t1.four, t1.two limit 1;
                ->  Seq Scan on tenk1 t2
 (12 rows)
 
+-- Check:
+-- commuting of sides in an expression doesn't influence cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+SET enable_hashjoin = 'off';
+-- Having lots of duplicates after the join it is more effective to use plain
+-- Sort instead of incremental sort: with small number of groups we do the same
+-- stuff like Sort but with extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+                             QUERY PLAN                             
+--------------------------------------------------------------------
+ Sort
+   Sort Key: t1.x, t1.y
+   ->  Nested Loop
+         ->  Seq Scan on sort_ndist_t1 t1
+         ->  Memoize
+               Cache Key: t1.x
+               Cache Mode: logical
+               ->  Index Only Scan using t2_idx on sort_ndist_t2 t2
+                     Index Cond: (x = t1.x)
+(9 rows)
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 98b20e17e1..af4d145395 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -298,3 +298,34 @@ explain (costs off)
 select * from
   (select * from tenk1 order by four) t1 join tenk1 t2 on t1.four = t2.four and t1.two = t2.two
 order by t1.four, t1.two limit 1;
+
+-- Check:
+-- commuting of sides in an expression doesn't influence cost estimation
+CREATE TABLE sort_ndist_t1 (x numeric, y numeric);
+CREATE TABLE sort_ndist_t2 (x numeric, y numeric);
+
+INSERT INTO sort_ndist_t1 (x,y)
+  SELECT gs%10, gs%10 FROM generate_series(1,1E4) AS gs;
+INSERT INTO sort_ndist_t2 (x,y)
+  SELECT gs, gs FROM generate_series(1,1E4) AS gs;
+CREATE INDEX t1_idx ON sort_ndist_t1 (x);
+CREATE INDEX t2_idx ON sort_ndist_t2 (x);
+VACUUM ANALYZE sort_ndist_t1, sort_ndist_t2;
+
+SET enable_hashjoin = 'off';
+
+-- Having lots of duplicates after the join it is more effective to use plain
+-- Sort instead of incremental sort: with small number of groups we do the same
+-- stuff like Sort but with extra penalty.
+EXPLAIN (COSTS OFF)
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t1.x=t2.x
+ORDER BY t1.x,t1.y;
+
+EXPLAIN (COSTS OFF) -- the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
-- 
2.39.5

v2-0002-Consider-the-number-of-columns-in-the-cost-sort-m.patch (text/x-patch)
From 519a8933f731d9e43448671228b5a102d6d33eb4 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 21 Oct 2024 15:21:34 +0400
Subject: [PATCH v2 2/4] Consider the number of columns in the cost-sort model.

The sort node calls the comparison operator for each pair of attributes for
each pair of tuples. However, the current cost model uses
a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging.
This can lead to incorrect estimations when sorting on three, four, or more
columns, which is quite common in practice.
Moreover, further elaboration of the optimiser places stricter requirements
on the balance of sortings, as caused by IncrementalSort, MergeAppend, and
MergeJoin.
In this patch, the multiplier is a linear function of the number of columns.
The added 1.0 term smooths the fact that the dependence on the number of
columns is weaker than linear.
This is a rough formula: the real number of comparisons depends on the
distinct values in each column. As a TODO, we can elaborate this model in a
next step, involving distinct statistics to make estimations more precise.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 15 +++--
 src/backend/optimizer/path/costsize.c         | 22 +++++--
 src/test/regress/expected/aggregates.out      | 13 ++--
 src/test/regress/expected/join.out            | 20 +++---
 src/test/regress/expected/partition_join.out  |  8 ++-
 src/test/regress/expected/union.out           | 66 +++++++++----------
 6 files changed, 78 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f2bcd6aa98..5da33ab69e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9995,13 +9995,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
 -- left outer join + nullable clause
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
-                                                                                                                     QUERY PLAN                                                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+                                                                                    QUERY PLAN                                                                                     
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: t1.a, fprt2.b, fprt2.c
-   Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
-   Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10)) ORDER BY r5.a ASC NULLS LAST, r6.b ASC NULLS LAST, r6.c ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.a, fprt2.b, fprt2.c
+   ->  Foreign Scan
+         Output: t1.a, fprt2.b, fprt2.c
+         Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+         Remote SQL: SELECT r5.a, r6.b, r6.c FROM (public.fprt1_p1 r5 LEFT JOIN public.fprt2_p1 r6 ON (((r5.a = r6.b)) AND ((r5.b = r6.a)) AND ((r6.a < 10)))) WHERE ((r5.a < 10))
+(7 rows)
 
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
  a | b |  c   
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 686d5883d1..ca1fd48af8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -495,6 +495,8 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	int			npathkeys = list_length(((Path *) path)->pathkeys);
+	double		cmpfrac = (npathkeys == 0) ? 2.0 : npathkeys + 1.0;
 
 	/* Mark the path with the correct row estimate */
 	if (rows)
@@ -514,7 +516,7 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = cmpfrac * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
@@ -1898,7 +1900,7 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
  */
 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-			   double tuples, int width,
+			   double tuples, int width, double cmpfrac,
 			   Cost comparison_cost, int sort_mem,
 			   double limit_tuples)
 {
@@ -1915,7 +1917,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		tuples = 2.0;
 
 	/* Include the default cost-per-comparison */
-	comparison_cost += 2.0 * cpu_operator_cost;
+	comparison_cost += cmpfrac * cpu_operator_cost;
 
 	/* Do we have a useful LIMIT? */
 	if (limit_tuples > 0 && limit_tuples < tuples)
@@ -2087,7 +2089,9 @@ cost_incremental_sort(Path *path,
 	 * are equal.
 	 */
 	cost_tuplesort(&group_startup_cost, &group_run_cost,
-				   group_tuples, width, comparison_cost, sort_mem,
+				   group_tuples, width,
+				   list_length(pathkeys) + 1.0,
+				   comparison_cost, sort_mem,
 				   limit_tuples);
 
 	/*
@@ -2111,7 +2115,7 @@ cost_incremental_sort(Path *path,
 	 * detect the sort groups. This is roughly equal to one extra copy and
 	 * comparison per tuple.
 	 */
-	run_cost += (cpu_tuple_cost + comparison_cost) * input_tuples;
+	run_cost += (cpu_tuple_cost + (presorted_keys + 1) * comparison_cost) * input_tuples;
 
 	/*
 	 * Additionally, we charge double cpu_tuple_cost for each input group to
@@ -2151,9 +2155,11 @@ cost_sort(Path *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
+	double		cmpfrac =
+						(pathkeys == NIL) ? 2.0 : list_length(pathkeys) + 1.0;
 
 	cost_tuplesort(&startup_cost, &run_cost,
-				   tuples, width,
+				   tuples, width, cmpfrac,
 				   comparison_cost, sort_mem,
 				   limit_tuples);
 
@@ -2441,6 +2447,8 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	Cost		comparison_cost;
 	double		N;
 	double		logN;
+	double		cmpfrac =
+						(pathkeys == NIL) ? 2.0 : list_length(pathkeys) + 1.0;
 
 	/*
 	 * Avoid log(0)...
@@ -2449,7 +2457,7 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	logN = LOG2(N);
 
 	/* Assumed cost per tuple comparison */
-	comparison_cost = 2.0 * cpu_operator_cost;
+	comparison_cost = cmpfrac * cpu_operator_cost;
 
 	/* Heap creation cost */
 	startup_cost += comparison_cost * N * logN;
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1e682565d1..7546c795d0 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3031,17 +3031,18 @@ ANALYZE agg_sort_order;
 EXPLAIN (COSTS OFF)
 SELECT array_agg(c1 ORDER BY c2),c2
 FROM agg_sort_order WHERE c2 < 100 GROUP BY c1 ORDER BY 2;
-                                 QUERY PLAN                                 
-----------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: c2
    ->  GroupAggregate
          Group Key: c1
-         ->  Sort
+         ->  Incremental Sort
                Sort Key: c1, c2
-               ->  Index Scan using agg_sort_order_c2_idx on agg_sort_order
-                     Index Cond: (c2 < 100)
-(8 rows)
+               Presorted Key: c1
+               ->  Index Scan using agg_sort_order_pkey on agg_sort_order
+                     Filter: (c2 < 100)
+(9 rows)
 
 DROP TABLE agg_sort_order CASCADE;
 DROP TABLE btg;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b2973694f..8a12fb582a 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5806,18 +5806,20 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+                 QUERY PLAN                  
+---------------------------------------------
+ Merge Left Join
+   Merge Cond: (d.a = s.id)
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+   ->  Sort
+         Sort Key: s.id
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.id, b.c_id
+                     ->  Seq Scan on b
+(11 rows)
 
 -- join removal is not possible here
 explain (costs off)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 108f9ecb44..886dc559a5 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1274,9 +1274,11 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
                                  QUERY PLAN                                 
 ----------------------------------------------------------------------------
- Sort
+ Incremental Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Append
+   Presorted Key: t1.a
+   ->  Merge Append
+         Sort Key: t1.a
          ->  Merge Left Join
                Merge Cond: (t1_1.a = t2_1.b)
                ->  Sort
@@ -1325,7 +1327,7 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
                ->  Sort
                      Sort Key: t2_3.b
                      ->  Seq Scan on prt2_p3 t2_3
-(51 rows)
+(53 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index c73631a9a1..f1c0af1c8d 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1225,18 +1225,17 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x < 4
 ORDER BY x;
-                    QUERY PLAN                    
---------------------------------------------------
+                 QUERY PLAN                 
+--------------------------------------------
  Sort
    Sort Key: (2)
-   ->  Unique
-         ->  Sort
-               Sort Key: (1), (2)
-               ->  Append
-                     ->  Result
-                     ->  Result
-                           One-Time Filter: false
-(9 rows)
+   ->  HashAggregate
+         Group Key: (1), (2)
+         ->  Append
+               ->  Result
+               ->  Result
+                     One-Time Filter: false
+(8 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, 2 AS x
@@ -1290,19 +1289,18 @@ SELECT * FROM
    SELECT 2 AS t, 4 AS x) ss
 WHERE x > 3
 ORDER BY x;
-                                     QUERY PLAN                                     
-------------------------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Sort
    Sort Key: ss.x
    ->  Subquery Scan on ss
          Filter: (ss.x > 3)
-         ->  Unique
-               ->  Sort
-                     Sort Key: (1), (((random() * '3'::double precision))::integer)
-                     ->  Append
-                           ->  Result
-                           ->  Result
-(10 rows)
+         ->  HashAggregate
+               Group Key: (1), (((random() * '3'::double precision))::integer)
+               ->  Append
+                     ->  Result
+                     ->  Result
+(9 rows)
 
 SELECT * FROM
   (SELECT 1 AS t, (random()*3)::int AS x
@@ -1323,24 +1321,22 @@ select distinct q1 from
    union all
    select distinct * from int8_tbl i82) ss
 where q2 = q2;
-                        QUERY PLAN                        
-----------------------------------------------------------
- Unique
-   ->  Merge Append
-         Sort Key: "*SELECT* 1".q1
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: "*SELECT* 1".q1
+   ->  Append
          ->  Subquery Scan on "*SELECT* 1"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i81.q1, i81.q2
-                           ->  Seq Scan on int8_tbl i81
-                                 Filter: (q2 IS NOT NULL)
+               ->  HashAggregate
+                     Group Key: i81.q1, i81.q2
+                     ->  Seq Scan on int8_tbl i81
+                           Filter: (q2 IS NOT NULL)
          ->  Subquery Scan on "*SELECT* 2"
-               ->  Unique
-                     ->  Sort
-                           Sort Key: i82.q1, i82.q2
-                           ->  Seq Scan on int8_tbl i82
-                                 Filter: (q2 IS NOT NULL)
-(15 rows)
+               ->  HashAggregate
+                     Group Key: i82.q1, i82.q2
+                     ->  Seq Scan on int8_tbl i82
+                           Filter: (q2 IS NOT NULL)
+(13 rows)
 
 select distinct q1 from
   (select distinct * from int8_tbl i81
-- 
2.39.5
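The linear dependence on the number of sort columns that the cmpfrac-based model above encodes can be reproduced outside the server. Here is a small standalone Python sketch (not part of the patch set; the helper name is mine) that counts per-column comparator calls while sorting tuples whose leading columns are constant, mirroring the 8-column test table from the first message:

```python
import functools

def column_comparisons(rows):
    """Sort rows of tuples, counting how many individual column
    comparisons the tuple comparator performs in total."""
    count = [0]

    def cmp_tuples(a, b):
        for x, y in zip(a, b):
            count[0] += 1
            if x != y:
                return -1 if x < y else 1
        return 0

    sorted(rows, key=functools.cmp_to_key(cmp_tuples))
    return count[0]

n = 1000
# Seven constant columns; only the eighth column decides the order.
wide = [(1, 2, 3, 4, 5, 6, 7, n - i) for i in range(n)]
narrow = [(n - i,) for i in range(n)]
print(column_comparisons(wide) / column_comparisons(narrow))  # 8.0
```

Every tuple comparison in the wide case walks all eight columns before reaching the deciding one, so the comparator-call count is exactly eight times higher, which is the near-linear behaviour seen in the EXPLAIN timings quoted at the start of the thread.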

Attachment: v2-0003-Consider-ndistinct-on-the-first-column-in-cost_so.patch (text/x-patch)
From 3c0483df88a950c3109c574a4a50a615ab122a23 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 22 Oct 2024 16:54:49 +0400
Subject: [PATCH v2 3/4] Consider ndistinct on the first column in cost_sort().

---
 src/backend/optimizer/path/costsize.c        | 46 ++++++++++++++++++--
 src/test/regress/expected/aggregates.out     | 22 +++++-----
 src/test/regress/expected/partition_join.out | 18 ++++----
 src/test/regress/sql/aggregates.sql          |  5 ++-
 4 files changed, 66 insertions(+), 25 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ca1fd48af8..20e09a391a 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -205,6 +205,8 @@ static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
 static EquivalenceMember *identify_sort_ecmember(PlannerInfo *root,
 												 EquivalenceClass *ec);
+static double sort_comparisons_factor(PlannerInfo *root, List *pathkeys,
+									  double ntuples);
 
 
 /*
@@ -2155,8 +2157,7 @@ cost_sort(Path *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
-	double		cmpfrac =
-						(pathkeys == NIL) ? 2.0 : list_length(pathkeys) + 1.0;
+	double		cmpfrac = sort_comparisons_factor(root, pathkeys, tuples);
 
 	cost_tuplesort(&startup_cost, &run_cost,
 				   tuples, width, cmpfrac,
@@ -6673,4 +6674,43 @@ identify_sort_ecmember(PlannerInfo *root, EquivalenceClass *ec)
 
 	Assert(candidate != NULL);
 	return candidate;
-}
\ No newline at end of file
+}
+
+/*
+ * Calculate multiplier reflecting the number of comparisons which executor
+ * have to perform during the sort with this specific order of columns.
+ *
+ * The comparison factor f = 1.+F(pathkeys). There 1. incapsulates the
+ * second-order of significance phusics which cost function doesn't consider.
+ * F(pathkeys) is the estimated fraction of comparisons in the range [1..N].
+ * F = 1 corresponds the 'all-unique' first column case. In that case the sort
+ * will call comparison function only once for each couple of tuples.
+ * F = N represents the case, when values in all columns are constant.
+ */
+static double
+sort_comparisons_factor(PlannerInfo *root, List *pathkeys, double ntuples)
+{
+	int		n = list_length(pathkeys);
+	double	cmpfrac = (n == 0) ? 2.0 : n + 1;
+
+	if (root != NULL && ntuples > 1 && n > 1)
+	{
+		PathKey			   *key = linitial_node(PathKey, pathkeys);
+		EquivalenceMember  *em =  identify_sort_ecmember(root, key->pk_eclass);
+
+		Assert(em->em_ndistinct >= 0);
+
+		if (em->em_ndistinct == 0.)
+			/*
+			 * The optimiser has no info on the ndistinct value, so return
+			 * the extreme case.
+			 */
+			return cmpfrac;
+
+		if (ntuples >= em->em_ndistinct)
+			cmpfrac =
+				2.0 + ((ntuples - em->em_ndistinct) / (ntuples - 1)) * (n - 1);
+	}
+
+	return cmpfrac;
+}
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 7546c795d0..b974d365be 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2814,6 +2814,7 @@ ROLLBACK;
 CREATE TABLE btg AS SELECT
   i % 10 AS x,
   i % 10 AS y,
+  i % 10 AS n,
   'abc' || i % 10 AS z,
   i AS w
 FROM generate_series(1, 100) AS i;
@@ -2863,20 +2864,20 @@ SET enable_hashjoin = off;
 SET enable_nestloop = off;
 EXPLAIN (COSTS OFF)
 SELECT count(*)
-  FROM btg t1 JOIN btg t2 ON t1.w = t2.w AND t1.x = t2.x AND t1.z = t2.z
-  GROUP BY t1.w, t1.z, t1.x;
+  FROM btg t1 JOIN btg t2 ON t1.n = t2.n AND t1.x = t2.x AND t1.z = t2.z
+  GROUP BY t1.n, t1.z, t1.x;
                                QUERY PLAN                                
 -------------------------------------------------------------------------
  GroupAggregate
-   Group Key: t1.x, t1.w, t1.z
+   Group Key: t1.x, t1.n, t1.z
    ->  Merge Join
-         Merge Cond: ((t1.x = t2.x) AND (t1.w = t2.w) AND (t1.z = t2.z))
+         Merge Cond: ((t1.x = t2.x) AND (t1.n = t2.n) AND (t1.z = t2.z))
          ->  Incremental Sort
-               Sort Key: t1.x, t1.w, t1.z
+               Sort Key: t1.x, t1.n, t1.z
                Presorted Key: t1.x
                ->  Index Scan using btg_x_y_idx on btg t1
          ->  Sort
-               Sort Key: t2.x, t2.w, t2.z
+               Sort Key: t2.x, t2.n, t2.z
                ->  Index Scan using btg_x_y_idx on btg t2
 (11 rows)
 
@@ -2902,12 +2903,11 @@ SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER BY x*x, z;
  Sort
    Sort Key: ((x * x)), z
    ->  GroupAggregate
-         Group Key: x, y, w, z
-         ->  Incremental Sort
-               Sort Key: x, y, w, z
-               Presorted Key: x, y
+         Group Key: w, x, y, z
+         ->  Sort
+               Sort Key: w, x, y, z
                ->  Index Scan using btg_x_y_idx on btg
-(8 rows)
+(7 rows)
 
 -- Test the case where the number of incoming subtree path keys is more than
 -- the number of grouping keys.
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 886dc559a5..7a30b8bbe9 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -654,39 +654,39 @@ SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
                ->  Sort
                      Sort Key: (COALESCE(prt1.a, p2.a)), (COALESCE(prt1.b, p2.b))
                      ->  Merge Full Join
-                           Merge Cond: ((prt1.a = p2.a) AND (prt1.b = p2.b))
+                           Merge Cond: ((prt1.b = p2.b) AND (prt1.a = p2.a))
                            Filter: ((COALESCE(prt1.a, p2.a) >= 490) AND (COALESCE(prt1.a, p2.a) <= 510))
                            ->  Sort
-                                 Sort Key: prt1.a, prt1.b
+                                 Sort Key: prt1.b, prt1.a
                                  ->  Seq Scan on prt1_p1 prt1
                            ->  Sort
-                                 Sort Key: p2.a, p2.b
+                                 Sort Key: p2.b, p2.a
                                  ->  Seq Scan on prt2_p1 p2
          ->  Group
                Group Key: (COALESCE(prt1_1.a, p2_1.a)), (COALESCE(prt1_1.b, p2_1.b))
                ->  Sort
                      Sort Key: (COALESCE(prt1_1.a, p2_1.a)), (COALESCE(prt1_1.b, p2_1.b))
                      ->  Merge Full Join
-                           Merge Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+                           Merge Cond: ((prt1_1.b = p2_1.b) AND (prt1_1.a = p2_1.a))
                            Filter: ((COALESCE(prt1_1.a, p2_1.a) >= 490) AND (COALESCE(prt1_1.a, p2_1.a) <= 510))
                            ->  Sort
-                                 Sort Key: prt1_1.a, prt1_1.b
+                                 Sort Key: prt1_1.b, prt1_1.a
                                  ->  Seq Scan on prt1_p2 prt1_1
                            ->  Sort
-                                 Sort Key: p2_1.a, p2_1.b
+                                 Sort Key: p2_1.b, p2_1.a
                                  ->  Seq Scan on prt2_p2 p2_1
          ->  Group
                Group Key: (COALESCE(prt1_2.a, p2_2.a)), (COALESCE(prt1_2.b, p2_2.b))
                ->  Sort
                      Sort Key: (COALESCE(prt1_2.a, p2_2.a)), (COALESCE(prt1_2.b, p2_2.b))
                      ->  Merge Full Join
-                           Merge Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+                           Merge Cond: ((prt1_2.b = p2_2.b) AND (prt1_2.a = p2_2.a))
                            Filter: ((COALESCE(prt1_2.a, p2_2.a) >= 490) AND (COALESCE(prt1_2.a, p2_2.a) <= 510))
                            ->  Sort
-                                 Sort Key: prt1_2.a, prt1_2.b
+                                 Sort Key: prt1_2.b, prt1_2.a
                                  ->  Seq Scan on prt1_p3 prt1_2
                            ->  Sort
-                                 Sort Key: p2_2.a, p2_2.b
+                                 Sort Key: p2_2.b, p2_2.a
                                  ->  Seq Scan on prt2_p3 p2_2
 (43 rows)
 
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 4885daffe6..b474ed0cbf 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1210,6 +1210,7 @@ ROLLBACK;
 CREATE TABLE btg AS SELECT
   i % 10 AS x,
   i % 10 AS y,
+  i % 10 AS n,
   'abc' || i % 10 AS z,
   i AS w
 FROM generate_series(1, 100) AS i;
@@ -1237,8 +1238,8 @@ SET enable_hashjoin = off;
 SET enable_nestloop = off;
 EXPLAIN (COSTS OFF)
 SELECT count(*)
-  FROM btg t1 JOIN btg t2 ON t1.w = t2.w AND t1.x = t2.x AND t1.z = t2.z
-  GROUP BY t1.w, t1.z, t1.x;
+  FROM btg t1 JOIN btg t2 ON t1.n = t2.n AND t1.x = t2.x AND t1.z = t2.z
+  GROUP BY t1.n, t1.z, t1.x;
 RESET enable_nestloop;
 RESET enable_hashjoin;
 
-- 
2.39.5
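The ndistinct-aware factor introduced above rewards key orders whose leading column is closer to unique. A standalone Python experiment (independent of the patches; helper names are mine) that counts per-column comparator calls for the two possible key orders shows the same effect:

```python
import functools
import random

def column_comparisons(rows):
    """Sort rows, counting every per-column comparison the tuple
    comparator performs (a stand-in for the cmpfrac estimate)."""
    count = [0]

    def cmp_tuples(a, b):
        for x, y in zip(a, b):
            count[0] += 1
            if x != y:
                return -1 if x < y else 1
        return 0

    sorted(rows, key=functools.cmp_to_key(cmp_tuples))
    return count[0]

random.seed(42)
# Column x has only 2 distinct values; column y is nearly unique.
rows = [(random.randrange(2), random.randrange(10**6)) for _ in range(10000)]
low_distinct_first = column_comparisons(rows)                        # key (x, y)
high_distinct_first = column_comparisons([(y, x) for x, y in rows])  # key (y, x)
print(low_distinct_first > high_distinct_first)  # True
```

With the low-cardinality column first, roughly half of all tuple comparisons tie on the first column and must fall through to the second, so putting the most distinct column at the head saves comparator calls; this is the effect the GROUP-BY reordering strategy aims to exploit.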

Attachment: v2-0004-Add-GROUP-BY-strategy-put-the-most-distinct-colum.patch (text/x-patch)
From a0d0572457815f5f1a039a15d8922cbce947ac10 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 29 Oct 2024 09:29:30 +0700
Subject: [PATCH v2 4/4] Add GROUP-BY strategy: put the most distinct column at
 the head.

Let's allow GROUP-BY to utilize cost_sort feature which can differentiate
orders of pathkeys lists according to the ndistinct of the first column.
---
 src/backend/optimizer/path/pathkeys.c    | 73 +++++++++++++++++++++++-
 src/test/regress/expected/aggregates.out | 43 +++++++-------
 src/test/regress/sql/aggregates.sql      | 10 ++--
 3 files changed, 97 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 7cf15e89b6..5bacf170e0 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -27,6 +27,7 @@
 #include "partitioning/partbounds.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
 
 /* Consider reordering of GROUP BY keys? */
 bool		enable_group_by_reordering = true;
@@ -473,6 +474,10 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 	List	   *pathkeys = root->group_pathkeys;
 	List	   *clauses = root->processed_groupClause;
 
+	double		nd_max = 0;
+	PathKey	   *pk_opt = NULL;
+	ListCell   *lc1, *lc2;
+
 	/* always return at least the original pathkeys/clauses */
 	info = makeNode(GroupByOrdering);
 	info->pathkeys = pathkeys;
@@ -526,9 +531,6 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 		/* Test consistency of info structures */
 		for_each_from(lc, infos, 1)
 		{
-			ListCell   *lc1,
-					   *lc2;
-
 			info = lfirst_node(GroupByOrdering, lc);
 
 			Assert(list_length(info->clauses) == list_length(pinfo->clauses));
@@ -546,6 +548,71 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 		}
 	}
 #endif
+
+	/*
+	 * Let's try the order with the column having max ndistinct value
+	 */
+
+	forboth(lc1, root->group_pathkeys, lc2, root->processed_groupClause)
+	{
+		PathKey			   *pkey = lfirst_node(PathKey, lc1);
+		SortGroupClause	   *gc = (SortGroupClause *) lfirst(lc2);
+		Node			   *node;
+		Bitmapset		   *relids;
+		VariableStatData	vardata;
+		double				nd = -1;
+		bool				isdefault;
+
+		if (foreach_current_index(lc1) >= root->num_groupby_pathkeys)
+			break;
+
+		node = get_sortgroupclause_expr(gc, root->parse->targetList);
+		relids = pull_varnos(root, node);
+
+		if (bms_num_members(relids) != 1 && bms_is_member(0, relids))
+			/*
+			 * Although a functional index could estimate distincts here, the
+			 * chance is too low.
+			 */
+			continue;
+
+		examine_variable(root, node, 0, &vardata);
+		if (!HeapTupleIsValid(vardata.statsTuple))
+			continue;
+		nd = get_variable_numdistinct(&vardata, &isdefault);
+		ReleaseVariableStats(vardata);
+		if (isdefault)
+			continue;
+
+		Assert(nd >= 0);
+		if (nd_max == 0 || nd > nd_max)
+		{
+			nd_max = nd;
+			pk_opt = pkey;
+		}
+	}
+
+	if (pk_opt != NULL)
+	{
+		List   *new_pathkeys = list_make1(pk_opt);
+		int		n;
+
+		new_pathkeys = list_concat_unique_ptr(new_pathkeys, root->group_pathkeys);
+		n = group_keys_reorder_by_pathkeys(new_pathkeys, &pathkeys, &clauses,
+										   root->num_groupby_pathkeys);
+
+		if (n > 0 &&
+			(enable_incremental_sort || n == root->num_groupby_pathkeys) &&
+			compare_pathkeys(pathkeys, root->group_pathkeys) != PATHKEYS_EQUAL)
+		{
+			info = makeNode(GroupByOrdering);
+			info->pathkeys = pathkeys;
+			info->clauses = clauses;
+
+			infos = lappend(infos, info);
+		}
+	}
+
 	return infos;
 }
 
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index b974d365be..d57c4e553f 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2812,14 +2812,14 @@ SELECT balk(hundred) FROM tenk1;
 ROLLBACK;
 -- GROUP BY optimization by reordering GROUP BY clauses
 CREATE TABLE btg AS SELECT
-  i % 10 AS x,
-  i % 10 AS y,
+  i % 231 AS x,
+  i % 49 AS y,
   i % 10 AS n,
-  'abc' || i % 10 AS z,
+  'abc' || i % 2 AS z,
   i AS w
-FROM generate_series(1, 100) AS i;
+FROM generate_series(1, 1000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
-ANALYZE btg;
+VACUUM ANALYZE btg;
 SET enable_hashagg = off;
 SET enable_seqscan = off;
 -- Utilize the ordering of index scan to avoid a Sort operation
@@ -2866,19 +2866,19 @@ EXPLAIN (COSTS OFF)
 SELECT count(*)
   FROM btg t1 JOIN btg t2 ON t1.n = t2.n AND t1.x = t2.x AND t1.z = t2.z
   GROUP BY t1.n, t1.z, t1.x;
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                           QUERY PLAN                           
+----------------------------------------------------------------
  GroupAggregate
    Group Key: t1.x, t1.n, t1.z
-   ->  Merge Join
-         Merge Cond: ((t1.x = t2.x) AND (t1.n = t2.n) AND (t1.z = t2.z))
-         ->  Incremental Sort
-               Sort Key: t1.x, t1.n, t1.z
-               Presorted Key: t1.x
+   ->  Incremental Sort
+         Sort Key: t1.x, t1.n, t1.z
+         Presorted Key: t1.x
+         ->  Merge Join
+               Merge Cond: (t1.x = t2.x)
+               Join Filter: ((t2.n = t1.n) AND (t2.z = t1.z))
                ->  Index Scan using btg_x_y_idx on btg t1
-         ->  Sort
-               Sort Key: t2.x, t2.n, t2.z
-               ->  Index Scan using btg_x_y_idx on btg t2
+               ->  Materialize
+                     ->  Index Scan using btg_x_y_idx on btg t2
 (11 rows)
 
 RESET enable_nestloop;
@@ -2903,11 +2903,12 @@ SELECT count(*) FROM btg GROUP BY w, x, y, z ORDER BY x*x, z;
  Sort
    Sort Key: ((x * x)), z
    ->  GroupAggregate
-         Group Key: w, x, y, z
-         ->  Sort
-               Sort Key: w, x, y, z
+         Group Key: x, y, w, z
+         ->  Incremental Sort
+               Sort Key: x, y, w, z
+               Presorted Key: x, y
                ->  Index Scan using btg_x_y_idx on btg
-(7 rows)
+(8 rows)
 
 -- Test the case where the number of incoming subtree path keys is more than
 -- the number of grouping keys.
@@ -2944,9 +2945,9 @@ GROUP BY c1.w, c1.z;
                      QUERY PLAN                      
 -----------------------------------------------------
  GroupAggregate
-   Group Key: c1.w, c1.z
+   Group Key: c1.z, c1.w
    ->  Sort
-         Sort Key: c1.w, c1.z, c1.x, c1.y
+         Sort Key: c1.z, c1.w, c1.x, c1.y
          ->  Merge Join
                Merge Cond: (c1.x = c2.x)
                ->  Sort
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index b474ed0cbf..557efc1923 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1208,14 +1208,14 @@ ROLLBACK;
 
 -- GROUP BY optimization by reordering GROUP BY clauses
 CREATE TABLE btg AS SELECT
-  i % 10 AS x,
-  i % 10 AS y,
+  i % 231 AS x,
+  i % 49 AS y,
   i % 10 AS n,
-  'abc' || i % 10 AS z,
+  'abc' || i % 2 AS z,
   i AS w
-FROM generate_series(1, 100) AS i;
+FROM generate_series(1, 1000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
-ANALYZE btg;
+VACUUM ANALYZE btg;
 
 SET enable_hashagg = off;
 SET enable_seqscan = off;
-- 
2.39.5

#11Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Andrei Lepikhov (#10)
Re: Consider the number of columns in the sort cost model

On 29.10.2024 05:47, Andrei Lepikhov wrote:

On 10/28/24 16:48, Alena Rybakina wrote:

On 23.10.2024 04:39, Andrei Lepikhov wrote:

On 15/10/2024 12:15, David Rowley wrote:
And the last patch is a demo of how I'm going to use the previous
three patches and add one more strategy to improve the order of
columns in the GROUP-BY clause.

To be honest, I didn’t find information about this in the code, but
did I understand everything correctly?

Yes

Thanks for the confirmation.

2. I noticed that distinct-value statistics are calculated several
times. Maybe I'm missing something, but this could be avoided if these
statistics were saved after calculation. For example, I saw that you may
calculate the same statistics for the same equivalence members in
cost_incremental_sort and identify_sort_ecmember. Is it possible to
store the information in a structure and reuse it later?

Hmm, I don't see multiple calculations; em_ndistinct was introduced
specifically for this purpose. Can you point to a specific case and code
lines?

Once you pointed it out, I saw it right away. You are right: the check
for 0 is what prevents a useless recalculation.

3. I think you should initialize the variable ndist in your patch. I
faced the compile complaining during your code compilation.

costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized
[-Werror=maybe-uninitialized]
  6694 |                         em->em_ndistinct = ndist;
       |                         ~~~~~~~~~~~~~~~~~^~~~~
costsize.c:6680:33: note: ‘ndist’ was declared here
  6680 |                         double  ndist;
       |                                 ^~~
cc1: all warnings being treated as errors
gmake[4]: *** [<builtin>: costsize.o] Error 1

I think you can just update your compiler. But I added the ndist
initialisation to make more compilers happy :).

Thank you :)

+        Assert (node != NULL);
+
          examine_variable(root, node, 0, &vardata);
          if (!HeapTupleIsValid(vardata.statsTuple))
              continue;

I don't think so. At least until you provide the case when the
get_sortgroupclause_expr function returns NULL.
What's more, remember that patch 0004 here is just to show the
perspective and is still under construction.
Anyway, thanks, I found out that the patch set doesn't apply correctly
because of 828e94c. So, see the new version in the attachment.

Yes, you are right: if this case is possible at all, it would indicate
an error in the formation of the target list.

I played around with the examples a bit and couldn't figure something
out. When I added the same values to different columns, firstly in a and
later in b, the order of the columns in the sort operation doesn't
change. Isn't this a mistake?

create table a (x1 int, y1 int);
create table b (x2 int, y2 int);
insert into a values (NULL, NULL);
insert into a values (NULL, 1);
insert into a values (1, 1);
insert into a values (1, NULL);

create index a_x1_idx on a(x1);
create index b_x2_idx on b(x2);
create index a_y1_idx on a(y1);
create index b_y2_idx on b(y2);

insert into b select 1, 2 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 1, 3 from generate_series(3,30) as id;

explain analyze select a.x1, s.x2, s.y2 from a left join (select
distinct * from b) s on a.x1=s.x2;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=44.99..48.15 rows=5 width=12) (actual
time=0.225..0.250 rows=8 loops=1)
   Hash Cond: (b.x2 = a.x1)
   ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8) (actual
time=0.117..0.123 rows=3 loops=1)
         Group Key: b.x2, b.y2
         Batches: 1  Memory Usage: 40kB
         ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=8)
(actual time=0.030..0.044 rows=48 loops=1)
   ->  Hash  (cost=1.04..1.04 rows=4 width=4) (actual time=0.073..0.074
rows=4 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual
time=0.047..0.051 rows=4 loops=1)
 Planning Time: 1.649 ms
 Execution Time: 0.485 ms
(11 rows)

delete from b;
insert into b select 2, 1 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 3, 1 from generate_series(3,30) as id;
vacuum analyze;
explain analyze select a.x1, s.x2, s.y2 from a left join (select
distinct * from b) s on a.x1=s.x2;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.79..2.86 rows=4 width=12) (actual
time=0.083..0.090 rows=4 loops=1)
   Hash Cond: (a.x1 = b.x2)
   ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual
time=0.010..0.011 rows=4 loops=1)
   ->  Hash  (cost=1.75..1.75 rows=3 width=8) (actual time=0.067..0.068
rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  HashAggregate  (cost=1.72..1.75 rows=3 width=8) (actual
time=0.063..0.064 rows=3 loops=1)
               Group Key: b.x2, b.y2
               Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on b  (cost=0.00..1.48 rows=48 width=8)
(actual time=0.006..0.014 rows=48 loops=1)
 Planning Time: 0.391 ms
 Execution Time: 0.151 ms
(11 rows)

I noticed a spelling mistake: "significance phusics".

/*
 * Calculate multiplier reflecting the number of comparisons which executor
 * have to perform during the sort with this specific order of columns.
 *
 * The comparison factor f = 1.+F(pathkeys). There 1. incapsulates the
 * second-order of *significance phusics* which cost function doesn't
consider.
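For clarity, the factor computed by v2-0003's sort_comparisons_factor() can be transcribed into a few lines of Python (this is my reading of the patch, not an authoritative restatement; ndistinct == 0 stands for "no statistics available", as in the C code):

```python
def sort_comparisons_factor(n_keys, ntuples, ndistinct=0):
    """Python transcription of the patch's sort_comparisons_factor().
    n_keys: number of pathkeys; ndistinct == 0 means no stats."""
    cmpfrac = 2.0 if n_keys == 0 else n_keys + 1.0
    if ntuples > 1 and n_keys > 1:
        if ndistinct == 0:
            return cmpfrac  # no stats: assume the extreme case
        if ntuples >= ndistinct:
            cmpfrac = 2.0 + ((ntuples - ndistinct) / (ntuples - 1.0)) * (n_keys - 1)
    return cmpfrac

print(sort_comparisons_factor(4, 100, 100))  # 2.0: all-unique first column
print(sort_comparisons_factor(4, 100, 1))    # 5.0: constant first column
```

So a fully distinct leading column collapses the factor to the old 2.0, while a constant leading column charges the full n_keys + 1.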

--
Regards,
Alena Rybakina
Postgres Professional

#12Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Alena Rybakina (#11)
Re: Consider the number of columns in the sort cost model

I played around with the examples a bit and couldn't figure something
out. When I added the same values to different columns, firstly in a and
later in b, the order of the columns in the sort operation doesn't
change. Isn't this a mistake?

create table a (x1 int, y1 int);
create table b (x2 int, y2 int);
insert into a values (NULL, NULL);
insert into a values (NULL, 1);
insert into a values (1, 1);
insert into a values (1, NULL);

create index a_x1_idx on a(x1);
create index b_x2_idx on b(x2);
create index a_y1_idx on a(y1);
create index b_y2_idx on b(y2);

insert into b select 1, 2 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 1, 3 from generate_series(3,30) as id;

explain analyze select a.x1, s.x2, s.y2 from a left join (select
distinct * from b) s on a.x1=s.x2;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=44.99..48.15 rows=5 width=12) (actual
time=0.225..0.250 rows=8 loops=1)
   Hash Cond: (b.x2 = a.x1)
   ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8) (actual
time=0.117..0.123 rows=3 loops=1)
         Group Key: b.x2, b.y2
         Batches: 1  Memory Usage: 40kB
         ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=8)
(actual time=0.030..0.044 rows=48 loops=1)
   ->  Hash  (cost=1.04..1.04 rows=4 width=4) (actual
time=0.073..0.074 rows=4 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual
time=0.047..0.051 rows=4 loops=1)
 Planning Time: 1.649 ms
 Execution Time: 0.485 ms
(11 rows)

delete from b;
insert into b select 2, 1 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 3, 1 from generate_series(3,30) as id;
vacuum analyze;
explain analyze select a.x1, s.x2, s.y2 from a left join (select
distinct * from b) s on a.x1=s.x2;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.79..2.86 rows=4 width=12) (actual
time=0.083..0.090 rows=4 loops=1)
   Hash Cond: (a.x1 = b.x2)
   ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual
time=0.010..0.011 rows=4 loops=1)
   ->  Hash  (cost=1.75..1.75 rows=3 width=8) (actual
time=0.067..0.068 rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  HashAggregate  (cost=1.72..1.75 rows=3 width=8) (actual
time=0.063..0.064 rows=3 loops=1)
               Group Key: b.x2, b.y2
               Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on b  (cost=0.00..1.48 rows=48 width=8)
(actual time=0.006..0.014 rows=48 loops=1)
 Planning Time: 0.391 ms
 Execution Time: 0.151 ms
(11 rows)

Sorry, I forgot to run vacuum analyze before deleting all the data from
table b, but even after running it I still got the same plan.
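One way to check what the optimiser actually sees here (a standard diagnostic, not specific to the patch) is to look at the per-column statistics that the cost model relies on:

```sql
-- After VACUUM ANALYZE, inspect the distinct-value estimates for b.
-- If n_distinct is the same for x2 and y2, the cost model has no
-- reason to prefer one grouping-key order over the other, which would
-- explain the unchanged plan.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'b';
```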

alena@postgres=# create table a (x1 int, y1 int);
create table b (x2 int, y2 int);
insert into a values (NULL, NULL);
insert into a values (NULL, 1);
insert into a values (1, 1);
insert into a values (1, NULL);
create index a_x1_idx on a(x1);
create index b_x2_idx on b(x2);
create index a_y1_idx on a(y1);
create index b_y2_idx on b(y2);
insert into b select 1, 2 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 1, 3 from generate_series(3,30) as id;

alena@postgres=# vacuum analyze;
VACUUM
alena@postgres=# explain analyze select a.x1, s.x2, s.y2 from a left
join (select distinct * from b) s on a.x1=s.x2;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.79..2.86 rows=4 width=12) (actual
time=0.168..0.185 rows=8 loops=1)
   Hash Cond: (a.x1 = b.x2)
   ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual
time=0.027..0.029 rows=4 loops=1)
   ->  Hash  (cost=1.75..1.75 rows=3 width=8) (actual time=0.129..0.130
rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  HashAggregate  (cost=1.72..1.75 rows=3 width=8) (actual
time=0.119..0.123 rows=3 loops=1)
               Group Key: b.x2, b.y2
               Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on b  (cost=0.00..1.48 rows=48 width=8)
(actual time=0.013..0.029 rows=48 loops=1)
 Planning Time: 1.464 ms
 Execution Time: 0.352 ms
(11 rows)

--
Regards,
Alena Rybakina
Postgres Professional

#13Andrei Lepikhov
lepihov@gmail.com
In reply to: Alena Rybakina (#11)
Re: Consider the number of columns in the sort cost model

On 10/31/24 21:00, Alena Rybakina wrote:

On 29.10.2024 05:47, Andrei Lepikhov wrote:
I played around with the examples a bit and couldn't figure something
out. When I added the same values to different columns - first in a,
later in b - the order of the columns for the sort operation doesn't
change. Isn't this a mistake?

This feature implements a cost-based approach to choosing the grouping
clauses' order. What kind of 'mistake' do you see here? Is there
something wrong with the costs, or do you see a bug in the cost
calculation code? Please clarify your concern.
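For illustration (using the table from the examples above; this is a general way to observe the behaviour, not code from the patch), the cost-based choice can be seen by comparing the plan's key order against the query's textual order:

```sql
-- With cost-based ordering of grouping keys, the "Group Key:" line in
-- the plan may differ from the textual GROUP BY order: the planner can
-- put the column whose comparisons resolve more pairs (e.g. the one
-- with a higher distinct estimate) first, since later keys are only
-- compared on ties of earlier ones.
EXPLAIN (COSTS OFF)
SELECT x2, y2, count(*) FROM b GROUP BY x2, y2;
```

If both columns have identical statistics, the planner has no cost-based reason to reorder them, and the textual order is kept.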

--
regards, Andrei Lepikhov

#14Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andrei Lepikhov (#13)
Re: Consider the number of columns in the sort cost model

Hi folks,

Just wanted to mention that it looks like some CFBot tests are failing,
something around level_tracking in pgss.