MergeAppend could consider sorting cheapest child path

Started by Alexander Pyhalovalmost 2 years ago44 messageshackers
Jump to latest
#1Alexander Pyhalov
a.pyhalov@postgrespro.ru

Hi.

Now when planner finds suitable pathkeys in
generate_orderedappend_paths(), it uses them, even if explicit sort of
the cheapest child path could be more efficient.

We encountered this issue on partitioned table with two indexes, where
one is suitable for sorting, and another is good for selecting data.
MergeAppend was generated
with subpaths doing index scan on suitably ordered index and filtering a
lot of data.
The suggested fix allows MergeAppend to consider sorting on cheapest
childrel total path as an alternative.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachments:

v1-0001-MergeAppend-could-consider-using-sorted-best-path.patchtext/x-diff; name=v1-0001-MergeAppend-could-consider-using-sorted-best-path.patchDownload+141-40
#2Bruce Momjian
bruce@momjian.us
In reply to: Alexander Pyhalov (#1)
Re: MergeAppend could consider sorting cheapest child path

Is this still being considered?

---------------------------------------------------------------------------

On Tue, Jun 18, 2024 at 07:45:09PM +0300, Alexander Pyhalov wrote:

Hi.

Now when planner finds suitable pathkeys in generate_orderedappend_paths(),
it uses them, even if explicit sort of the cheapest child path could be more
efficient.

We encountered this issue on partitioned table with two indexes, where one
is suitable for sorting, and another is good for selecting data. MergeAppend
was generated
with subpaths doing index scan on suitably ordered index and filtering a lot
of data.
The suggested fix allows MergeAppend to consider sorting on cheapest
childrel total path as an alternative.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

From d5eb3d326d83e2ca47c17552fcc6fd27b6b98d4e Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov <a.pyhalov@postgrespro.ru>
Date: Tue, 18 Jun 2024 15:56:18 +0300
Subject: [PATCH] MergeAppend could consider using sorted best path.

This helps when index with suitable pathkeys is not
good for filtering data.
---
.../postgres_fdw/expected/postgres_fdw.out | 6 +-
src/backend/optimizer/path/allpaths.c | 21 +++++
src/test/regress/expected/inherit.out | 45 +++++++++-
src/test/regress/expected/partition_join.out | 87 +++++++++++--------
src/test/regress/expected/union.out | 6 +-
src/test/regress/sql/inherit.sql | 17 ++++
6 files changed, 141 insertions(+), 41 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index ea566d50341..687591e4a97 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10074,13 +10074,15 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a
->  Nested Loop
Join Filter: (t1.a = t2.b)
->  Append
-               ->  Foreign Scan on ftprt1_p1 t1_1
+               ->  Sort
+                     Sort Key: t1_1.a
+                     ->  Foreign Scan on ftprt1_p1 t1_1
->  Foreign Scan on ftprt1_p2 t1_2
->  Materialize
->  Append
->  Foreign Scan on ftprt2_p1 t2_1
->  Foreign Scan on ftprt2_p2 t2_2
-(10 rows)
+(12 rows)
SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1;
a  |  b  
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4895cee9944..827bc469269 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1845,6 +1845,27 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
/* Assert we do have an unparameterized path for this child */
Assert(cheapest_total->param_info == NULL);
}
+			else
+			{
+				/*
+				 * Even if we found necessary pathkeys, using unsorted path
+				 * can be more efficient.
+				 */
+				Path		sort_path;
+
+				cost_sort(&sort_path,
+						  root,
+						  pathkeys,
+						  childrel->cheapest_total_path->total_cost,
+						  childrel->cheapest_total_path->rows,
+						  childrel->cheapest_total_path->pathtarget->width,
+						  0.0,
+						  work_mem,
+						  -1.0 /* need all tuples to sort them */ );
+
+				if (compare_path_costs(&sort_path, cheapest_total, TOTAL_COST) < 0)
+					cheapest_total = childrel->cheapest_total_path;
+			}
/*
* When building a fractional path, determine a cheapest
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index ad732134148..16e78c8d2ff 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1555,6 +1555,7 @@ insert into matest2 (name) values ('Test 4');
insert into matest3 (name) values ('Test 5');
insert into matest3 (name) values ('Test 6');
set enable_indexscan = off;  -- force use of seqscan/sort, so no merge
+set enable_sort = off; -- avoid sorting below MergeAppend
explain (verbose, costs off) select * from matest0 order by 1-id;
QUERY PLAN                         
------------------------------------------------------------
@@ -1608,6 +1609,7 @@ select min(1-id) from matest0;
(1 row)
reset enable_indexscan;
+reset enable_sort;
set enable_seqscan = off;  -- plan with fewest seqscans should be merge
set enable_parallel_append = off; -- Don't let parallel-append interfere
explain (verbose, costs off) select * from matest0 order by 1-id;
@@ -1702,7 +1704,9 @@ order by t1.b limit 10;
Merge Cond: (t1.b = t2.b)
->  Merge Append
Sort Key: t1.b
-               ->  Index Scan using matest0i on matest0 t1_1
+               ->  Sort
+                     Sort Key: t1_1.b
+                     ->  Seq Scan on matest0 t1_1
->  Index Scan using matest1i on matest1 t1_2
->  Materialize
->  Merge Append
@@ -1711,7 +1715,7 @@ order by t1.b limit 10;
Filter: (c = d)
->  Index Scan using matest1i on matest1 t2_2
Filter: (c = d)
-(14 rows)
+(16 rows)

reset enable_nestloop;
drop table matest0 cascade;
@@ -2663,6 +2667,43 @@ explain (costs off) select * from mcrparted where a = 10 order by a, abs(b), c;

reset enable_bitmapscan;
drop table mcrparted;
+-- Check that sort path can be used by MergeAppend even when there are suitable pathkeys
+create table hash_parted (i int, j int, k int) partition by hash(i);
+create table hash_parted_1 partition of hash_parted for values with (modulus 4, remainder 0);
+create table hash_parted_2 partition of hash_parted for values with (modulus 4, remainder 1);
+create table hash_parted_3 partition of hash_parted for values with (modulus 4, remainder 2);
+create table hash_parted_4 partition of hash_parted for values with (modulus 4, remainder 3);
+--create table hash_parted_5 partition of hash_parted for values with (modulus 6, remainder 4);
+--create table hash_parted_6 partition of hash_parted for values with (modulus 6, remainder 5);
+create index on hash_parted(j);
+create index on hash_parted(k);
+insert into hash_parted select i, i, i from generate_series(1,10000) i;
+analyze hash_parted;
+explain (costs off) select * from hash_parted where k<100 order by j limit 100;
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: hash_parted.j
+         ->  Sort
+               Sort Key: hash_parted_1.j
+               ->  Index Scan using hash_parted_1_k_idx on hash_parted_1
+                     Index Cond: (k < 100)
+         ->  Sort
+               Sort Key: hash_parted_2.j
+               ->  Index Scan using hash_parted_2_k_idx on hash_parted_2
+                     Index Cond: (k < 100)
+         ->  Sort
+               Sort Key: hash_parted_3.j
+               ->  Index Scan using hash_parted_3_k_idx on hash_parted_3
+                     Index Cond: (k < 100)
+         ->  Sort
+               Sort Key: hash_parted_4.j
+               ->  Index Scan using hash_parted_4_k_idx on hash_parted_4
+                     Index Cond: (k < 100)
+(19 rows)
+
+drop table hash_parted;
-- Ensure LIST partitions allow an Append to be used instead of a MergeAppend
create table bool_lp (b bool) partition by list(b);
create table bool_lp_true partition of bool_lp for values in(true);
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6d07f86b9bc..80d480d33d5 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1309,28 +1309,32 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
-- This should generate a partitionwise join, but currently fails to
EXPLAIN (COSTS OFF)
SELECT t1.a, t2.b FROM (SELECT * FROM prt1 WHERE a < 450) t1 LEFT JOIN (SELECT * FROM prt2 WHERE b > 250) t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
-                        QUERY PLAN                         
------------------------------------------------------------
- Incremental Sort
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Sort
Sort Key: prt1.a, prt2.b
-   Presorted Key: prt1.a
-   ->  Merge Left Join
-         Merge Cond: (prt1.a = prt2.b)
-         ->  Sort
-               Sort Key: prt1.a
-               ->  Append
-                     ->  Seq Scan on prt1_p1 prt1_1
-                           Filter: ((a < 450) AND (b = 0))
-                     ->  Seq Scan on prt1_p2 prt1_2
-                           Filter: ((a < 450) AND (b = 0))
-         ->  Sort
-               Sort Key: prt2.b
-               ->  Append
+   ->  Merge Right Join
+         Merge Cond: (prt2.b = prt1.a)
+         ->  Append
+               ->  Sort
+                     Sort Key: prt2_1.b
->  Seq Scan on prt2_p2 prt2_1
Filter: (b > 250)
+               ->  Sort
+                     Sort Key: prt2_2.b
->  Seq Scan on prt2_p3 prt2_2
Filter: (b > 250)
-(19 rows)
+         ->  Materialize
+               ->  Append
+                     ->  Sort
+                           Sort Key: prt1_1.a
+                           ->  Seq Scan on prt1_p1 prt1_1
+                                 Filter: ((a < 450) AND (b = 0))
+                     ->  Sort
+                           Sort Key: prt1_2.a
+                           ->  Seq Scan on prt1_p2 prt1_2
+                                 Filter: ((a < 450) AND (b = 0))
+(23 rows)
SELECT t1.a, t2.b FROM (SELECT * FROM prt1 WHERE a < 450) t1 LEFT JOIN (SELECT * FROM prt2 WHERE b > 250) t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
a  |  b  
@@ -1350,25 +1354,33 @@ SELECT t1.a, t2.b FROM (SELECT * FROM prt1 WHERE a < 450) t1 LEFT JOIN (SELECT *
-- partitionwise join does not apply
EXPLAIN (COSTS OFF)
SELECT t1.a, t2.b FROM prt1 t1, prt2 t2 WHERE t1::text = t2::text AND t1.a = t2.b ORDER BY t1.a;
-                                       QUERY PLAN                                        
------------------------------------------------------------------------------------------
+                            QUERY PLAN                            
+------------------------------------------------------------------
Merge Join
-   Merge Cond: ((t1.a = t2.b) AND (((((t1.*)::prt1))::text) = ((((t2.*)::prt2))::text)))
-   ->  Sort
-         Sort Key: t1.a, ((((t1.*)::prt1))::text)
-         ->  Result
-               ->  Append
-                     ->  Seq Scan on prt1_p1 t1_1
-                     ->  Seq Scan on prt1_p2 t1_2
-                     ->  Seq Scan on prt1_p3 t1_3
-   ->  Sort
-         Sort Key: t2.b, ((((t2.*)::prt2))::text)
-         ->  Result
-               ->  Append
+   Merge Cond: (t1.a = t2.b)
+   Join Filter: ((((t2.*)::prt2))::text = (((t1.*)::prt1))::text)
+   ->  Append
+         ->  Sort
+               Sort Key: t1_1.a
+               ->  Seq Scan on prt1_p1 t1_1
+         ->  Sort
+               Sort Key: t1_2.a
+               ->  Seq Scan on prt1_p2 t1_2
+         ->  Sort
+               Sort Key: t1_3.a
+               ->  Seq Scan on prt1_p3 t1_3
+   ->  Materialize
+         ->  Append
+               ->  Sort
+                     Sort Key: t2_1.b
->  Seq Scan on prt2_p1 t2_1
+               ->  Sort
+                     Sort Key: t2_2.b
->  Seq Scan on prt2_p2 t2_2
+               ->  Sort
+                     Sort Key: t2_3.b
->  Seq Scan on prt2_p3 t2_3
-(16 rows)
+(24 rows)
SELECT t1.a, t2.b FROM prt1 t1, prt2 t2 WHERE t1::text = t2::text AND t1.a = t2.b ORDER BY t1.a;
a  | b  
@@ -4906,21 +4918,26 @@ EXPLAIN (COSTS OFF)
SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2.b) WHERE t1.b >= 125 AND t1.b < 225 ORDER BY t1.a, t1.b;
QUERY PLAN                             
--------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a, t1.b
-   ->  Append
+   ->  Sort
+         Sort Key: t1_1.a, t1_1.b
->  Hash Join
Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.b = t2_1.b))
->  Seq Scan on alpha_neg_p1 t1_1
Filter: ((b >= 125) AND (b < 225))
->  Hash
->  Seq Scan on beta_neg_p1 t2_1
+   ->  Sort
+         Sort Key: t1_2.a, t1_2.b
->  Hash Join
Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.b = t1_2.b))
->  Seq Scan on beta_neg_p2 t2_2
->  Hash
->  Seq Scan on alpha_neg_p2 t1_2
Filter: ((b >= 125) AND (b < 225))
+   ->  Sort
+         Sort Key: t1_4.a, t1_4.b
->  Hash Join
Hash Cond: ((t2_4.a = t1_4.a) AND (t2_4.b = t1_4.b))
->  Append
@@ -4935,7 +4952,7 @@ SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2
Filter: ((b >= 125) AND (b < 225))
->  Seq Scan on alpha_pos_p3 t1_6
Filter: ((b >= 125) AND (b < 225))
-(29 rows)
+(34 rows)
SELECT t1.*, t2.* FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a AND t1.b = t2.b) WHERE t1.b >= 125 AND t1.b < 225 ORDER BY t1.a, t1.b;
a  |  b  |  c   | a  |  b  |  c   
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 0fd0e1c38b3..4c1c173d8e6 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -1195,12 +1195,14 @@ select event_id
----------------------------------------------------------
Merge Append
Sort Key: events.event_id
-   ->  Index Scan using events_pkey on events
+   ->  Sort
+         Sort Key: events.event_id
+         ->  Seq Scan on events
->  Sort
Sort Key: events_1.event_id
->  Seq Scan on events_child events_1
->  Index Scan using other_events_pkey on other_events
-(7 rows)
+(9 rows)
drop table events_child, events, other_events;
reset enable_indexonlyscan;
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index e3bcfdb181e..5331e49283f 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -586,11 +586,13 @@ insert into matest3 (name) values ('Test 5');
insert into matest3 (name) values ('Test 6');
set enable_indexscan = off;  -- force use of seqscan/sort, so no merge
+set enable_sort = off; -- avoid sorting below MergeAppend
explain (verbose, costs off) select * from matest0 order by 1-id;
select * from matest0 order by 1-id;
explain (verbose, costs off) select min(1-id) from matest0;
select min(1-id) from matest0;
reset enable_indexscan;
+reset enable_sort;

set enable_seqscan = off; -- plan with fewest seqscans should be merge
set enable_parallel_append = off; -- Don't let parallel-append interfere
@@ -976,6 +978,21 @@ reset enable_bitmapscan;

drop table mcrparted;

+-- Check that sort path can be used by MergeAppend even when there are suitable pathkeys
+create table hash_parted (i int, j int, k int) partition by hash(i);
+create table hash_parted_1 partition of hash_parted for values with (modulus 4, remainder 0);
+create table hash_parted_2 partition of hash_parted for values with (modulus 4, remainder 1);
+create table hash_parted_3 partition of hash_parted for values with (modulus 4, remainder 2);
+create table hash_parted_4 partition of hash_parted for values with (modulus 4, remainder 3);
+--create table hash_parted_5 partition of hash_parted for values with (modulus 6, remainder 4);
+--create table hash_parted_6 partition of hash_parted for values with (modulus 6, remainder 5);
+create index on hash_parted(j);
+create index on hash_parted(k);
+insert into hash_parted select i, i, i from generate_series(1,10000) i;
+analyze hash_parted;
+explain (costs off) select * from hash_parted where k<100 order by j limit 100;
+drop table hash_parted;
+
-- Ensure LIST partitions allow an Append to be used instead of a MergeAppend
create table bool_lp (b bool) partition by list(b);
create table bool_lp_true partition of bool_lp for values in(true);
-- 
2.34.1

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"

#3Andy Fan
zhihui.fan1213@gmail.com
In reply to: Bruce Momjian (#2)
Re: MergeAppend could consider sorting cheapest child path

Bruce Momjian <bruce@momjian.us> writes:

Is this still being considered?

I'd +1 on this feature. I guess this would be more useful on parallel
case, where the Sort can be pushed down to parallel worker, and in the
distributed database case, where the Sort can be pushed down to multiple
nodes, at the result, the leader just do the merge works.

At the high level implementaion, sorting *cheapest* child path looks
doesn't add too much overhead on the planning effort.

--
Best Regards
Andy Fan

#4Nikita Malakhov
hukutoc@gmail.com
In reply to: Andy Fan (#3)
Re: MergeAppend could consider sorting cheapest child path

Hi!

I've checked this thread and examples in it, and do not see stable
improvements
in base tests. Sometimes base tests are considerably slower with patch,
like:

explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.46..19.90 rows=10 width=16) (actual time=0.007..0.008
rows=0 loops=1)
-> Merge Join (cost=0.46..181.24 rows=93 width=16) (actual
time=0.007..0.007 rows=0 loops=1)
Merge Cond: (t1.b = t2.b)
-> Merge Append (cost=0.17..90.44 rows=1851 width=16) (actual
time=0.006..0.007 rows=0 loops=1)
Sort Key: t1.b
-> Sort (cost=0.01..0.02 rows=1 width=16) (actual
time=0.004..0.004 rows=0 loops=1)
Sort Key: t1_1.b
Sort Method: quicksort Memory: 25kB
-> Seq Scan on matest0 t1_1 (cost=0.00..0.00 rows=1
width=16) (actual time=0.002..0.002 rows=0 loops=1)
-> Index Scan using matest1i on matest1 t1_2
(cost=0.15..71.90 rows=1850 width=16) (actual time=0.002..0.002 rows=0
loops=1)
-> Materialize (cost=0.29..84.81 rows=10 width=4) (never
executed)
-> Merge Append (cost=0.29..84.78 rows=10 width=4) (never
executed)
Sort Key: t2.b
-> Index Scan using matest0i on matest0 t2_1
(cost=0.12..8.14 rows=1 width=4) (never executed)
Filter: (c = d)
-> Index Scan using matest1i on matest1 t2_2
(cost=0.15..76.53 rows=9 width=4) (never executed)
Filter: (c = d)
Planning Time: 0.252 ms
Execution Time: 0.048 ms
(19 rows)

explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..20.88 rows=10 width=16) (actual time=0.004..0.004
rows=0 loops=1)
-> Merge Join (cost=0.57..189.37 rows=93 width=16) (actual
time=0.003..0.004 rows=0 loops=1)
Merge Cond: (t1.b = t2.b)
-> Merge Append (cost=0.29..98.56 rows=1851 width=16) (actual
time=0.002..0.003 rows=0 loops=1)
Sort Key: t1.b
-> Index Scan using matest0i on matest0 t1_1
(cost=0.12..8.14 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1)
-> Index Scan using matest1i on matest1 t1_2
(cost=0.15..71.90 rows=1850 width=16) (actual time=0.001..0.001 rows=0
loops=1)
-> Materialize (cost=0.29..84.81 rows=10 width=4) (never
executed)
-> Merge Append (cost=0.29..84.78 rows=10 width=4) (never
executed)
Sort Key: t2.b
-> Index Scan using matest0i on matest0 t2_1
(cost=0.12..8.14 rows=1 width=4) (never executed)
Filter: (c = d)
-> Index Scan using matest1i on matest1 t2_2
(cost=0.15..76.53 rows=9 width=4) (never executed)
Filter: (c = d)
Planning Time: 0.278 ms
Execution Time: 0.025 ms
(16 rows)

(patched)
explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.46..19.90 rows=10 width=16) (actual time=0.007..0.008
rows=0 loops=1)
-> Merge Join (cost=0.46..181.24 rows=93 width=16) (actual
time=0.007..0.007 rows=0 loops=1)
Merge Cond: (t1.b = t2.b)
-> Merge Append (cost=0.17..90.44 rows=1851 width=16) (actual
time=0.006..0.007 rows=0 loops=1)
Sort Key: t1.b
-> Sort (cost=0.01..0.02 rows=1 width=16) (actual
time=0.004..0.004 rows=0 loops=1)
Sort Key: t1_1.b
Sort Method: quicksort Memory: 25kB
-> Seq Scan on matest0 t1_1 (cost=0.00..0.00 rows=1
width=16) (actual time=0.002..0.002 rows=0 loops=1)
-> Index Scan using matest1i on matest1 t1_2
(cost=0.15..71.90 rows=1850 width=16) (actual time=0.002..0.002 rows=0
loops=1)
-> Materialize (cost=0.29..84.81 rows=10 width=4) (never
executed)
-> Merge Append (cost=0.29..84.78 rows=10 width=4) (never
executed)
Sort Key: t2.b
-> Index Scan using matest0i on matest0 t2_1
(cost=0.12..8.14 rows=1 width=4) (never executed)
Filter: (c = d)
-> Index Scan using matest1i on matest1 t2_2
(cost=0.15..76.53 rows=9 width=4) (never executed)
Filter: (c = d)
Planning Time: 0.252 ms
Execution Time: 0.048 ms
(19 rows)

(vanilla)
explain analyze
select t1.* from matest0 t1, matest0 t2
where t1.b = t2.b and t2.c = t2.d
order by t1.b limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..20.88 rows=10 width=16) (actual time=0.004..0.004
rows=0 loops=1)
-> Merge Join (cost=0.57..189.37 rows=93 width=16) (actual
time=0.003..0.004 rows=0 loops=1)
Merge Cond: (t1.b = t2.b)
-> Merge Append (cost=0.29..98.56 rows=1851 width=16) (actual
time=0.002..0.003 rows=0 loops=1)
Sort Key: t1.b
-> Index Scan using matest0i on matest0 t1_1
(cost=0.12..8.14 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1)
-> Index Scan using matest1i on matest1 t1_2
(cost=0.15..71.90 rows=1850 width=16) (actual time=0.001..0.001 rows=0
loops=1)
-> Materialize (cost=0.29..84.81 rows=10 width=4) (never
executed)
-> Merge Append (cost=0.29..84.78 rows=10 width=4) (never
executed)
Sort Key: t2.b
-> Index Scan using matest0i on matest0 t2_1
(cost=0.12..8.14 rows=1 width=4) (never executed)
Filter: (c = d)
-> Index Scan using matest1i on matest1 t2_2
(cost=0.15..76.53 rows=9 width=4) (never executed)
Filter: (c = d)
Planning Time: 0.278 ms
Execution Time: 0.025 ms
(16 rows)

--
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

#5Alexander Pyhalov
a.pyhalov@postgrespro.ru
In reply to: Andy Fan (#3)
Re: MergeAppend could consider sorting cheapest child path

Andy Fan писал(а) 2024-10-17 03:33:

Bruce Momjian <bruce@momjian.us> writes:

Is this still being considered?

I'd +1 on this feature. I guess this would be more useful on parallel
case, where the Sort can be pushed down to parallel worker, and in the
distributed database case, where the Sort can be pushed down to
multiple
nodes, at the result, the leader just do the merge works.

At the high level implementaion, sorting *cheapest* child path looks
doesn't add too much overhead on the planning effort.

Hi.

I've updated patch. One more interesting case which we found - when
fractional path is selected, it still can be more expensive than sorted
cheapest total path (as we look only on indexes whith necessary
pathkeys, not on indexes which allow efficiently fetch data).
So far couldn't find artificial example, but we've seen inadequate index
selection due to this issue - instead of using index suited for filters
in where, index, suitable for sorting was selected as one having the
cheapest fractional cost.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachments:

v2-0001-MergeAppend-could-consider-using-sorted-best-path.patchtext/x-diff; name=v2-0001-MergeAppend-could-consider-using-sorted-best-path.patchDownload+194-64
#6Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Pyhalov (#5)
Re: MergeAppend could consider sorting cheapest child path

On 3/28/25 09:19, Alexander Pyhalov wrote:

Andy Fan писал(а) 2024-10-17 03:33:
I've updated patch. One more interesting case which we found - when
fractional path is selected, it still can be more expensive than sorted
cheapest total path (as we look only on indexes whith necessary
pathkeys, not on indexes which allow efficiently fetch data).
So far couldn't find artificial example, but we've seen inadequate index
selection due to this issue - instead of using index suited for filters
in where, index, suitable for sorting was selected as one having the
cheapest fractional cost.

I think it is necessary to generalise the approach a little.

Each MergeAppend subpath candidate that fits pathkeys should be compared
to the overall-optimal path + explicit Sort node. Let's label this
two-node composition as base_path. There are three cases exist:
startup-optimal, total-optimal and fractional-optimal.
In the startup case, base_path should use cheapest_startup_path, the
total-optimal case - cheapest_total_path and for a fractional case, we
may employ the get_cheapest_fractional_path routine to detect proper
base_path.

It may provide a more effective plan either in full, fractional and
partial scan cases:
1. The Sort node may be pushed down to subpaths under a parallel or
async Append.
2. When a minor set of subpaths doesn't have a proper index, and it is
profitable to sort them instead of switching to plain Append.

In general, analysing the regression tests changed by this code, I see
that the cost model prefers a series of small sortings instead of a
single massive one. May be it will show some profit for execution time.

I am not afraid of any palpable planning overhead here because we just
do cheap cost estimation and comparison operations that don't need
additional memory allocations. The caller is responsible for building a
proper Sort node if this method is chosen as optimal.

In the attachment, see the patch written according to the idea. There
are I introduced two new routines:
get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext

I have designed the code that way to reduce duplicated code in the
generate_orderedappend_paths routine. But the main point is that I
envision these new routines may be reused elsewhere.

This feature looks сlose to the one we discussed before [1]/messages/by-id/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA@mail.gmail.com. So, let me
CC the people from the previous conversation to the discussion.

[1]: /messages/by-id/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA@mail.gmail.com
/messages/by-id/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA@mail.gmail.com

--
regards, Andrei Lepikhov

Attachments:

v0-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchtext/x-patch; charset=UTF-8; name=v0-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchDownload+255-99
#7Alexander Pyhalov
a.pyhalov@postgrespro.ru
In reply to: Andrei Lepikhov (#6)
Re: MergeAppend could consider sorting cheapest child path

Andrei Lepikhov писал(а) 2025-04-24 16:01:

On 3/28/25 09:19, Alexander Pyhalov wrote:

Andy Fan писал(а) 2024-10-17 03:33:
I've updated patch. One more interesting case which we found - when
fractional path is selected, it still can be more expensive than
sorted cheapest total path (as we look only on indexes whith necessary
pathkeys, not on indexes which allow efficiently fetch data).
So far couldn't find artificial example, but we've seen inadequate
index selection due to this issue - instead of using index suited for
filters in where, index, suitable for sorting was selected as one
having the cheapest fractional cost.

I think it is necessary to generalise the approach a little.

Each MergeAppend subpath candidate that fits pathkeys should be
compared to the overall-optimal path + explicit Sort node. Let's label
this two-node composition as base_path. There are three cases exist:
startup-optimal, total-optimal and fractional-optimal.
In the startup case, base_path should use cheapest_startup_path, the
total-optimal case - cheapest_total_path and for a fractional case, we
may employ the get_cheapest_fractional_path routine to detect proper
base_path.

It may provide a more effective plan either in full, fractional and
partial scan cases:
1. The Sort node may be pushed down to subpaths under a parallel or
async Append.
2. When a minor set of subpaths doesn't have a proper index, and it is
profitable to sort them instead of switching to plain Append.

In general, analysing the regression tests changed by this code, I see
that the cost model prefers a series of small sortings instead of a
single massive one. May be it will show some profit for execution time.

I am not afraid of any palpable planning overhead here because we just
do cheap cost estimation and comparison operations that don't need
additional memory allocations. The caller is responsible for building a
proper Sort node if this method is chosen as optimal.

In the attachment, see the patch written according to the idea. There
are I introduced two new routines:
get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext

Hi. I'm a bit confused that
get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting
cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in
STARTUP_COST case looks only on sorting cheapest_startup_path.
Usually, sorted cheapest_total_path will be cheaper than sorted
fractional/startup path at least by startup cost (as after sorting it
includes total_cost of input path). But we ignore this case when
selecting cheapest_startup and cheapest_fractional subpaths. As result
selected cheapest_startup and cheapest_fractional can be not cheapest
for startup or selecting a fraction of rows.

Consider the partition with the following access paths:

1) cheapest_startup without required pathkeys:
startup_cost: 0.42
total_cost: 4004

2) some index_path with required pathkeys:
startup_cost: 6.6
total_cost: 2000

3) cheapest_total_path:
startup_cost: 0.42
total_cost: 3.48

Here, when selecting cheapest startup subpath we'll compare costs of
index path (2) and sorted cheapest_startup (1), but will ignore sorted
cheapest_total_path (3), despite the fact that it really can be the
cheapest startup path, providing required sorting order.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

#8Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Pyhalov (#7)
Re: MergeAppend could consider sorting cheapest child path

On 4/25/25 11:16, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-04-24 16:01:

On 3/28/25 09:19, Alexander Pyhalov wrote:
In the attachment, see the patch written according to the idea. There
are I introduced two new routines:
get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext

Hi. I'm a bit confused that

Thanks for the participation!

get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting
cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in
STARTUP_COST case looks only on sorting cheapest_startup_path.

At first, at the moment, I don't understand why we calculate the
cheapest_startup path at all. In my opinion, after commit 6b94e7a [1,
2], the min-fractional path totally covers the case. I began this
discussion in [3]/messages/by-id/f0206ef2-6b5a-4d07-8770-cfa7cd30f685@gmail.com - maybe we need to scrutinise that issue beforehand.

Looking into the min-fractional-path + Sort, we propose a path for the
case when extracting minor part of tuples with sorting later is cheaper
than doing a massive job of non-selective index scan. You also may
imagine the case of a JOIN as a subpath: non-sorted, highly selective
parameterised NestLoop may be way more optimal than MergeJoin, which
fits the pathkeys.

Usually, sorted cheapest_total_path will be cheaper than sorted
fractional/startup path at least by startup cost (as after sorting it
includes total_cost of input path). But we ignore this case when
selecting cheapest_startup and cheapest_fractional subpaths. As result
selected cheapest_startup and cheapest_fractional can be not cheapest
for startup or selecting a fraction of rows.

I don't know what you mean by that. The cheapest_total_path is
considered when we chose optimal cheapest_total path. The same works for
the fractional path - get_cheapest_fractional_path gives us the most
optimal fractional path and probes cheapest_total_path too.
As above, not sure about min-startup case for now. I can imagine
MergeAppend over sophisticated subquery: non-sorted includes highly
parameterised JOINs and the alternative (with pathkeys) includes
HashJoin, drastically increasing startup cost. It is only a theory, of
course. So, lets discover how min-startup works.

At the end, when the sorted path already estimated, we each time compare
it with previously selected pathkeys-cheapest. So, if the sorted path is
worse, we end up with the original path and don't lose anything.

[1]: /messages/by-id/e8f9ec90-546d-e948-acce-0525f3e92773@enterprisedb.com
/messages/by-id/e8f9ec90-546d-e948-acce-0525f3e92773@enterprisedb.com
[2]: /messages/by-id/1581042da8044e71ada2d6e3a51bf7bb@index.de
/messages/by-id/1581042da8044e71ada2d6e3a51bf7bb@index.de
[3]: /messages/by-id/f0206ef2-6b5a-4d07-8770-cfa7cd30f685@gmail.com
/messages/by-id/f0206ef2-6b5a-4d07-8770-cfa7cd30f685@gmail.com

--
regards, Andrei Lepikhov

#9Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#8)
Re: MergeAppend could consider sorting cheapest child path

On 4/25/25 17:13, Andrei Lepikhov wrote:

On 4/25/25 11:16, Alexander Pyhalov wrote:

Usually, sorted cheapest_total_path will be cheaper than sorted
fractional/startup path at least by startup cost (as after sorting it
includes total_cost of input path). But we ignore this case when
selecting cheapest_startup and cheapest_fractional subpaths. As result
selected cheapest_startup and cheapest_fractional can be not cheapest
for startup or selecting a fraction of rows.

I don't know what you mean by that. The cheapest_total_path is
considered when we chose optimal cheapest_total path. The same works for
the fractional path - get_cheapest_fractional_path gives us the most
optimal fractional path and probes cheapest_total_path too.
As above, not sure about min-startup case for now. I can imagine
MergeAppend over sophisticated subquery: non-sorted includes highly
parameterised JOINs and the alternative (with pathkeys) includes
HashJoin, drastically increasing startup cost. It is only a theory, of
course. So, lets discover how min-startup works.

After a second thought I have caught your idea. I agree that for a
fractional path it have no sense to choose any other path except a
cheapest total one.
There are the modified patch in the attachment.

Also, to be more objective, I propose to use examples in argumentation -
something like in attached test2.sql script.

--
regards, Andrei Lepikhov

Attachments:

v1-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchtext/x-patch; charset=UTF-8; name=v1-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchDownload+241-99
test2.sqlapplication/sql; name=test2.sqlDownload
#10Alexander Pyhalov
a.pyhalov@postgrespro.ru
In reply to: Andrei Lepikhov (#9)
Re: MergeAppend could consider sorting cheapest child path

Andrei Lepikhov писал(а) 2025-04-29 16:52:

On 4/25/25 17:13, Andrei Lepikhov wrote:

On 4/25/25 11:16, Alexander Pyhalov wrote:

Usually, sorted cheapest_total_path will be cheaper than sorted
fractional/startup path at least by startup cost (as after sorting it
includes total_cost of input path). But we ignore this case when
selecting cheapest_startup and cheapest_fractional subpaths. As
result selected cheapest_startup and cheapest_fractional can be not
cheapest for startup or selecting a fraction of rows.

I don't know what you mean by that. The cheapest_total_path is
considered when we chose optimal cheapest_total path. The same works
for the fractional path - get_cheapest_fractional_path gives us the
most optimal fractional path and probes cheapest_total_path too.
As above, not sure about min-startup case for now. I can imagine
MergeAppend over sophisticated subquery: non-sorted includes highly
parameterised JOINs and the alternative (with pathkeys) includes
HashJoin, drastically increasing startup cost. It is only a theory, of
course. So, lets discover how min-startup works.

After a second thought I have caught your idea. I agree that for a
fractional path it have no sense to choose any other path except a
cheapest total one.
There are the modified patch in the attachment.

Also, to be more objective, I propose to use examples in argumentation
- something like in attached test2.sql script.

Hi.
I've looked through new patch and found minor inconsistencies in
get_cheapest_path_for_pathkeys_ext() and
get_cheapest_fractional_path_for_pathkeys_ext().

In get_cheapest_fractional_path_for_pathkeys_ext() we check that
base_path is not NULL
path = get_cheapest_fractional_path_for_pathkeys(rel->pathlist,
pathkeys,

required_outer, fraction);

base_path = rel->cheapest_total_path;

/* Stop here if the path doesn't satisfy necessary conditions */
if (!base_path || !bms_is_subset(PATH_REQ_OUTER(base_path),
required_outer))
return path;

But it seems, base_path can't be NULL (as add_paths_to_append_rel() is
called after set_rel_pathlist() for childrels).
However, path can. Can we do these two functions
get_cheapest_path_for_pathkeys_ext() and
get_cheapest_fractional_path_for_pathkeys_ext()
more similar?

Also we check base_path for required_outer and require_parallel_safe,
but if cheapest path for pathkeys is NULL, these checks are not
performed. Luckily, they seen to be no-op anyway due to
cheapest_total->param_info == NULL and function arguments being NULL
(required_outer) and false (require_parallel_safe). Should we do
something about this? Don't know, perhaps, remove these misleading
arguments?

Now, if we return cheapest_total_path from
get_cheapest_fractional_path_for_pathkeys_ext() if cheapest paths for
pathkeys don't exist, do the following lines

/*
* If we found no path with matching
pathkeys, use the
* cheapest total path instead.
*
* XXX We might consider partially
sorted paths too (with an
* incremental sort on top). But we'd
have to build all the
* incremental paths, do the costing
etc.
*/
if (!cheapest_fractional)
cheapest_fractional =
cheapest_total;

become no-op? And we do return non-null path from
get_cheapest_fractional_path_for_pathkeys_ext(), as it seems we return
either cheapest_total_path or cheapest fractional path from
get_cheapest_fractional_path_for_pathkeys_ext().

--
Best regards,
Alexander Pyhalov,
Postgres Professional

#11Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Pyhalov (#10)
Re: MergeAppend could consider sorting cheapest child path

On 4/29/25 19:25, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-04-29 16:52:
But it seems, base_path can't be NULL

Correct. Fixed.

Also we check base_path for required_outer and require_parallel_safe,
but if cheapest path for pathkeys is NULL, these checks are not
performed.

Thanks. Fixed.

Luckily, they seen to be no-op anyway due to cheapest_total- >
param_info == NULL and function arguments being NULL (required_outer)
and false (require_parallel_safe). Should we do something about this?
Don't know, perhaps, remove these misleading arguments?

The main reason why I introduced these _ext routines was that this
schema may be used somewhere else. And in that case parameterised paths
may exist. Who knows, may be parameterised paths will be introduced for
merge append too?

become no-op? And we do return non-null path from
get_cheapest_fractional_path_for_pathkeys_ext(), as it seems we return
either cheapest_total_path or cheapest fractional path from
get_cheapest_fractional_path_for_pathkeys_ext().

Ok.

Let's check next version of the patch in the attachment.

--
regards, Andrei Lepikhov

Attachments:

v2-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchtext/x-patch; charset=UTF-8; name=v2-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchDownload+246-105
#12Andrei Lepikhov
lepihov@gmail.com
In reply to: Andrei Lepikhov (#11)
Re: MergeAppend could consider sorting cheapest child path

On 5/5/25 13:38, Andrei Lepikhov wrote:

Let's check next version of the patch in the attachment.

Oops, I forgot some tails - see this new version.

--
regards, Andrei Lepikhov

Attachments:

v3-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchtext/x-patch; charset=UTF-8; name=v3-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchDownload+245-105
#13Alexander Pyhalov
a.pyhalov@postgrespro.ru
In reply to: Andrei Lepikhov (#11)
Re: MergeAppend could consider sorting cheapest child path

Andrei Lepikhov писал(а) 2025-05-05 14:38:

On 4/29/25 19:25, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-04-29 16:52:
But it seems, base_path can't be NULL

Correct. Fixed.

Also we check base_path for required_outer and require_parallel_safe,
but if cheapest path for pathkeys is NULL, these checks are not
performed.

Thanks. Fixed.

Luckily, they seen to be no-op anyway due to cheapest_total- >

param_info == NULL and function arguments being NULL (required_outer)

and false (require_parallel_safe). Should we do something about this?
Don't know, perhaps, remove these misleading arguments?

The main reason why I introduced these _ext routines was that this
schema may be used somewhere else. And in that case parameterised paths
may exist. Who knows, may be parameterised paths will be introduced for
merge append too?

become no-op? And we do return non-null path from
get_cheapest_fractional_path_for_pathkeys_ext(), as it seems we return
either cheapest_total_path or cheapest fractional path from
get_cheapest_fractional_path_for_pathkeys_ext().

Ok.

Let's check next version of the patch in the attachment.

Hi.

Both functions are a bit different:

Path *base_path = rel->cheapest_total_path;
Path *path;

path = get_cheapest_path_for_pathkeys(rel->pathlist, pathkeys,

required_outer, cost_criterion,

require_parallel_safe);

/* Stop here if the path doesn't satisfy necessary conditions */
if ((require_parallel_safe && !base_path->parallel_safe) ||
!bms_is_subset(PATH_REQ_OUTER(base_path),
required_outer))
return path;

Here comment speaks about "the path", and check is performed on
base_path, could it be misleading?

In get_cheapest_fractional_path_for_pathkeys_ext():

path = get_cheapest_fractional_path_for_pathkeys(rel->pathlist,
pathkeys,

required_outer, fraction);

base_path = rel->cheapest_total_path;

/* Stop here if the path doesn't satisfy necessary conditions */
if (!bms_is_subset(PATH_REQ_OUTER(base_path), required_outer))
return path;

Here "the path" also refers to base_path, but here at least it's the
last path we've just looked at. Should we make base_path initialization
consistent and fix comment a bit, so that there's no possible ambiguity
and it's evident that it refers to the base_path?

Also logic a bit differs if path is NULL. In
get_cheapest_path_for_pathkeys_ext() we explicitly check for path being
NULL, in get_cheapest_fractional_path_for_pathkeys_ext() only after
calculating sort cost.

I've tried to fix comments a bit and unified functions definitions.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachments:

v4-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchtext/x-diff; name=v4-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchDownload+250-105
#14Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Pyhalov (#13)
Re: MergeAppend could consider sorting cheapest child path

On 5/5/2025 15:56, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-05-05 14:38:
Also logic a bit differs if path is NULL. In
get_cheapest_path_for_pathkeys_ext() we explicitly check for path being
NULL, in get_cheapest_fractional_path_for_pathkeys_ext() only after
calculating sort cost.

I've tried to fix comments a bit and unified functions definitions.

Generally seems ok, I'm not a native speaker to judge the comments. But:
if (base_path && path != base_path)

What is the case in your mind where the base_path pointer still may be
null at that point?

--
regards, Andrei Lepikhov

#15Alexander Pyhalov
a.pyhalov@postgrespro.ru
In reply to: Andrei Lepikhov (#14)
Re: MergeAppend could consider sorting cheapest child path

Andrei Lepikhov писал(а) 2025-05-07 08:02:

On 5/5/2025 15:56, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-05-05 14:38:
Also logic a bit differs if path is NULL. In
get_cheapest_path_for_pathkeys_ext() we explicitly check for path
being NULL, in get_cheapest_fractional_path_for_pathkeys_ext() only
after calculating sort cost.

I've tried to fix comments a bit and unified functions definitions.

Generally seems ok, I'm not a native speaker to judge the comments.
But:
if (base_path && path != base_path)

What is the case in your mind where the base_path pointer still may be
null at that point?

Hi.

It seems if some childrel doesn't have valid pathlist, subpaths_valid
would be false in add_paths_to_append_rel()
and generate_orderedappend_paths() will not be called. So when we
iterate over live_childrels,
all of them will have cheapest_total path. This is why we can assert
that base_path is not NULL.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachments:

v5-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchtext/x-diff; name=v5-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchDownload+264-105
#16Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Pyhalov (#15)
Re: MergeAppend could consider sorting cheapest child path

On 7/5/2025 08:57, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-05-07 08:02:

On 5/5/2025 15:56, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-05-05 14:38:
Also logic a bit differs if path is NULL. In
get_cheapest_path_for_pathkeys_ext() we explicitly check for path
being NULL, in get_cheapest_fractional_path_for_pathkeys_ext() only
after calculating sort cost.

I've tried to fix comments a bit and unified functions definitions.

Generally seems ok, I'm not a native speaker to judge the comments. But:
if (base_path && path != base_path)

What is the case in your mind where the base_path pointer still may be
null at that point?

Hi.

It seems if some childrel doesn't have valid pathlist, subpaths_valid
would be false in add_paths_to_append_rel()
and generate_orderedappend_paths() will not  be called. So when we
iterate over live_childrels,
all of them will have cheapest_total path. This is why we can assert
that base_path is not NULL.

I'm not sure I understand you correctly. Under which conditions will
rel->cheapest_total_path be set to NULL for a childrel? Could you
provide an example?

--
regards, Andrei Lepikhov

#17Alexander Pyhalov
a.pyhalov@postgrespro.ru
In reply to: Andrei Lepikhov (#16)
Re: MergeAppend could consider sorting cheapest child path

Andrei Lepikhov писал(а) 2025-05-07 12:03:

On 7/5/2025 08:57, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-05-07 08:02:

On 5/5/2025 15:56, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-05-05 14:38:
Also logic a bit differs if path is NULL. In
get_cheapest_path_for_pathkeys_ext() we explicitly check for path
being NULL, in get_cheapest_fractional_path_for_pathkeys_ext() only
after calculating sort cost.

I've tried to fix comments a bit and unified functions definitions.

Generally seems ok, I'm not a native speaker to judge the comments.
But:
if (base_path && path != base_path)

What is the case in your mind where the base_path pointer still may
be null at that point?

Hi.

It seems if some childrel doesn't have valid pathlist, subpaths_valid
would be false in add_paths_to_append_rel()
and generate_orderedappend_paths() will not  be called. So when we
iterate over live_childrels,
all of them will have cheapest_total path. This is why we can assert
that base_path is not NULL.

I'm not sure I understand you correctly. Under which conditions will
rel->cheapest_total_path be set to NULL for a childrel? Could you
provide an example?

Sorry, perhaps I was not clear enough. I've stated the opposite - it
seems we can be sure that it's not NULL.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

#18Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Pyhalov (#17)
Re: MergeAppend could consider sorting cheapest child path

Hi, Alexander!

On Wed, May 7, 2025 at 12:06 PM Alexander Pyhalov
<a.pyhalov@postgrespro.ru> wrote:

Andrei Lepikhov писал(а) 2025-05-07 12:03:

On 7/5/2025 08:57, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-05-07 08:02:

On 5/5/2025 15:56, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-05-05 14:38:
Also logic a bit differs if path is NULL. In
get_cheapest_path_for_pathkeys_ext() we explicitly check for path
being NULL, in get_cheapest_fractional_path_for_pathkeys_ext() only
after calculating sort cost.

I've tried to fix comments a bit and unified functions definitions.

Generally seems ok, I'm not a native speaker to judge the comments.
But:
if (base_path && path != base_path)

What is the case in your mind where the base_path pointer still may
be null at that point?

Hi.

It seems if some childrel doesn't have valid pathlist, subpaths_valid
would be false in add_paths_to_append_rel()
and generate_orderedappend_paths() will not be called. So when we
iterate over live_childrels,
all of them will have cheapest_total path. This is why we can assert
that base_path is not NULL.

I'm not sure I understand you correctly. Under which conditions will
rel->cheapest_total_path be set to NULL for a childrel? Could you
provide an example?

Sorry, perhaps I was not clear enough. I've stated the opposite - it
seems we can be sure that it's not NULL.

Thank you for your work on this subject!

I have the following question. I see patch changes some existing
plans from Sort(Append(...)) to MergeAppend(Sort(), ..., Sort(...)) or
even Materialize(MergeAppend(Sort(), ..., Sort(...))). This should be
some problem in cost_sort(). Otherwise, that would mean that Sort
node doesn't know how to do its job: explicit splitting dataset into
pieces then merging sorting result appears to be cheaper, but Sort
node contains merge-sort algorithm inside and it's supposed to be more
efficient. Could you, please, revise the patch to avoid these
unwanted changes?

------
Regards,
Alexander Korotkov
Supabase

#19Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#18)
Re: MergeAppend could consider sorting cheapest child path

On 2/6/2025 20:21, Alexander Korotkov wrote:

I have the following question. I see patch changes some existing
plans from Sort(Append(...)) to MergeAppend(Sort(), ..., Sort(...)) or
even Materialize(MergeAppend(Sort(), ..., Sort(...))). This should be
some problem in cost_sort(). Otherwise, that would mean that Sort
node doesn't know how to do its job: explicit splitting dataset into
pieces then merging sorting result appears to be cheaper, but Sort
node contains merge-sort algorithm inside and it's supposed to be more
efficient. Could you, please, revise the patch to avoid these
unwanted changes?

I think, this issue is related to corner-cases of the
compare_path_costs_fuzzily.

Let's glance into one of the problematic queries:
EXPLAIN (COSTS ON)
SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C"
ORDER BY 1;

if you play with the plan, you can find that total_cost of the
Sort->Append path is cheaper:

Sort (cost=2.40..2.41 rows=4 width=40)
-> Append (cost=1.15..2.36 rows=4 width=40)
Merge Append (cost=2.37..2.42 rows=4 width=40)

But the difference is less than fuzz_factor. In this case, Postgres
probes startup_cost, which is obviously less for the MergeAppend strategy.
This is a good decision, and I think it should stay as is.
What can we do here? We might change the test to increase the cost gap.
However, while designing this patch, I skimmed through each broken query
and didn't find a reason to specifically shift to the Sort->Append
strategy, as it tested things that were not dependent on Append or Sort.

To establish a stable foundation for discussion, I conducted simple
tests - see, for example, a couple of queries in the attachment. As I
see it, Sort->Append works faster: in my test bench, it takes 1250ms on
average versus 1430ms, and it also has lower costs - the same for data
with and without massive numbers of duplicates. Playing with sizes of
inputs, I see the same behaviour.

--
regards, Andrei Lepikhov

Attachments:

merge_sort.sqlapplication/sql; name=merge_sort.sqlDownload
v6-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchtext/plain; charset=UTF-8; name=v6-0001-Consider-explicit-sort-of-the-MergeAppend-subpath.patchDownload+264-105
#20Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#19)
Re: MergeAppend could consider sorting cheapest child path

On Tue, Jun 3, 2025 at 4:23 PM Andrei Lepikhov <lepihov@gmail.com> wrote:

On 2/6/2025 20:21, Alexander Korotkov wrote:

I have the following question. I see patch changes some existing
plans from Sort(Append(...)) to MergeAppend(Sort(), ..., Sort(...)) or
even Materialize(MergeAppend(Sort(), ..., Sort(...))). This should be
some problem in cost_sort(). Otherwise, that would mean that Sort
node doesn't know how to do its job: explicit splitting dataset into
pieces then merging sorting result appears to be cheaper, but Sort
node contains merge-sort algorithm inside and it's supposed to be more
efficient. Could you, please, revise the patch to avoid these
unwanted changes?

I think, this issue is related to corner-cases of the
compare_path_costs_fuzzily.

Let's glance into one of the problematic queries:
EXPLAIN (COSTS ON)
SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C"
ORDER BY 1;

if you play with the plan, you can find that total_cost of the
Sort->Append path is cheaper:

Sort (cost=2.40..2.41 rows=4 width=40)
-> Append (cost=1.15..2.36 rows=4 width=40)
Merge Append (cost=2.37..2.42 rows=4 width=40)

But the difference is less than fuzz_factor. In this case, Postgres
probes startup_cost, which is obviously less for the MergeAppend strategy.
This is a good decision, and I think it should stay as is.
What can we do here? We might change the test to increase the cost gap.
However, while designing this patch, I skimmed through each broken query
and didn't find a reason to specifically shift to the Sort->Append
strategy, as it tested things that were not dependent on Append or Sort.

To establish a stable foundation for discussion, I conducted simple
tests - see, for example, a couple of queries in the attachment. As I
see it, Sort->Append works faster: in my test bench, it takes 1250ms on
average versus 1430ms, and it also has lower costs - the same for data
with and without massive numbers of duplicates. Playing with sizes of
inputs, I see the same behaviour.

I run your tests. For Sort(Append()) case I've got actual
time=811.047..842.473. For MergeAppend case I've got actual time
actual time=723.678..967.004. That looks interesting. At some point
we probably should teach our Sort node to start returning tuple before
finishing the last merge stage.

However, I think costs are not adequate to the timing. Our cost model
predicts that startup cost of MergeAppend is less than startup cost of
Sort(Append()). And that's correct. However, in fast total time of
MergeAppend is bigger than total time of Sort(Append()). The
differences in these two cases are comparable. I think we need to
just our cost_sort() to reflect that.

------
Regards,
Alexander Korotkov
Supabase

#21Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#20)
#22Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#21)
#23Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#22)
#24Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#23)
#25Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#24)
#26Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#24)
#27Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#26)
#28Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#27)
#29Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#28)
#30Richard Guo
guofenglinux@gmail.com
In reply to: Alexander Korotkov (#29)
#31Andrei Lepikhov
lepihov@gmail.com
In reply to: Richard Guo (#30)
#32Andrei Lepikhov
lepihov@gmail.com
In reply to: Alexander Korotkov (#29)
#33Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrei Lepikhov (#32)
#34Richard Guo
guofenglinux@gmail.com
In reply to: Alexander Korotkov (#33)
#35Alexander Korotkov
aekorotkov@gmail.com
In reply to: Richard Guo (#34)
#36Alexander Korotkov
aekorotkov@gmail.com
In reply to: Richard Guo (#34)
#37Alena Rybakina
a.rybakina@postgrespro.ru
In reply to: Alexander Korotkov (#33)
#38Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alena Rybakina (#37)
#39Richard Guo
guofenglinux@gmail.com
In reply to: Alexander Korotkov (#36)
#40Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#39)
#41David Rowley
dgrowleyml@gmail.com
In reply to: Richard Guo (#40)
#42Andrei Lepikhov
lepihov@gmail.com
In reply to: David Rowley (#41)
#43David Rowley
dgrowleyml@gmail.com
In reply to: Andrei Lepikhov (#42)
#44Andrei Lepikhov
lepihov@gmail.com
In reply to: David Rowley (#43)