Todo: Teach planner to evaluate multiple windows in the optimal order
Hi,
While looking at one of the todo item in Window function, namely:
/Teach planner to evaluate multiple windows in the optimal order
Currently windows are always evaluated in the query-specified order./
From threads, relevant points.
Point #1
In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
sorts. We also perform 3.
and Point #2
Teach planner to decide which window to evaluate first based on costs.
Currently the first window in the query is evaluated first, there may
be no
index to help sort the first window, but perhaps there are for other
windows
in the query. This may allow an index scan instead of a seqscan -> sort.
Repro:
select pg_catalog.version();
/version //
//----------------------------------------------------------------------------------------------------//
// PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
12.2.0-3ubuntu1) 12.2.0, 64-bit//
//(1 row)/
create table empsalary(depname text, empno int, salary int);
insert into empsalary values (select substr(md5(random()::text), 0, 25),
generate_series(1,10), generate_series(10000,12000));
explain SELECT depname, SUM(salary) OVER (ORDER BY salary), SUM(salary)
OVER (ORDER BY empno) FROM empsalary ORDER BY salary;
/ QUERY PLAN //
//--------------------------------------------------------------------------------------//
// WindowAgg (cost=289.47..324.48 rows=2001 width=49)//
// -> Sort (cost=289.47..294.47 rows=2001 width=41)//
// Sort Key: salary//
// -> WindowAgg (cost=144.73..179.75 rows=2001 width=41)//
// -> Sort (cost=144.73..149.73 rows=2001 width=33)//
// Sort Key: empno//
// -> Seq Scan on empsalary (cost=0.00..35.01
rows=2001 width=33)//
//(7 rows)/
As it is seen, for case #1, issue looks like resolved and only 2 sorts
are performed.
For #2, index col ordering is changed.
create index idx_emp on empsalary (empno);
explain SELECT depname, SUM(salary) OVER (ORDER BY salary), SUM(salary)
OVER (ORDER BY empno) FROM empsalary ORDER BY salary;
/ QUERY PLAN //
//------------------------------------------------------------------------------------------------//
// WindowAgg (cost=204.03..239.04 rows=2001 width=49)//
// -> Sort (cost=204.03..209.03 rows=2001 width=41)//
// Sort Key: salary//
// -> WindowAgg (cost=0.28..94.31 rows=2001 width=41)//
// -> Index Scan using idx_emp on empsalary
(cost=0.28..64.29 rows=2001 width=33)//
//(5 rows)/
explain SELECT depname, SUM(salary) OVER (ORDER BY empno), SUM(salary)
OVER (ORDER BY salary) FROM empsalary ORDER BY salary;
/ QUERY PLAN //
//------------------------------------------------------------------------------------------------//
// WindowAgg (cost=204.03..239.04 rows=2001 width=49)//
// -> Sort (cost=204.03..209.03 rows=2001 width=41)//
// Sort Key: salary//
// -> WindowAgg (cost=0.28..94.31 rows=2001 width=41)//
// -> Index Scan using idx_emp on empsalary
(cost=0.28..64.29 rows=2001 width=33)//
//(5 rows)/
In both cases, index scan is performed, which means this issue is
resolved as well.
Is this todo still relevant?
Further down the threads:
I do think the patch has probably left some low-hanging fruit on the
simpler end of the difficulty spectrum, namely when the window stuff
requires only one ordering that could be done either explicitly or
by an indexscan. That choice should ideally be done with a proper
cost comparison taking any LIMIT into account. I think right now
the LIMIT might not be accounted for, or might be considered even
when it shouldn't be because another sort is needed anyway.
I am not sure if I understand this fully but does it means proposed todo
(to use indexes) should be refined to
teach planner to take into account of cost model while deciding to use
index or not in window functions? Meaning not always go with index route
(modify point #2)?
--
Regards,
Ankit Kumar Pandey
On Mon, 26 Dec 2022 at 02:04, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Point #1
In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
sorts. We also perform 3.
This shouldn't be too hard to do. See the code in
select_active_windows(). You'll likely want to pay attention to the
DISTINCT pathkeys if they exist and just use the ORDER BY pathkeys if
the query has no DISTINCT clause. DISTINCT is evaluated after Window
and before ORDER BY.
One idea to implement this would be to adjust the loop in
select_active_windows() so that we record any WindowClauses which have
the pathkeys contained in the ORDER BY / DISTINCT pathkeys then record
those separately and append those onto the end of the actives array
after the sort.
I do think you'll likely want to put any WindowClauses which have
pathkeys which are a true subset or true superset of the ORDER BY /
DISTINCT pathkeys last. If they're a superset then we won't need to
perform any additional ordering for the DISTINCT / ORDER BY clause.
If they're a subset then we might be able to perform an Incremental
Sort, which is likely much cheaper than a full sort. The existing
code should handle that part. You just need to make
select_active_windows() more intelligent.
You might also think that we could perform additional optimisations
and also adjust the ORDER BY clause of a WindowClause if it contains
the pathkeys of the DISTINCT / ORDER BY clause. For example:
SELECT *,row_number() over (order by a,b) from tab order by a,b,c;
However, if you were to adjust the WindowClauses ORDER BY to become
a,b,c then you could produce incorrect results for window functions
that change their result based on peer rows.
Note the difference in results from:
create table ab(a int, b int);
insert into ab select x,y from generate_series(1,5) x, generate_Series(1,5)y;
select a,b,count(*) over (order by a) from ab order by a,b;
select a,b,count(*) over (order by a,b) from ab order by a,b;
and Point #2
Teach planner to decide which window to evaluate first based on costs.
Currently the first window in the query is evaluated first, there may be no
index to help sort the first window, but perhaps there are for other windows
in the query. This may allow an index scan instead of a seqscan -> sort.
What Tom wrote about that in the first paragraph of [1]/messages/by-id/11535.1230501658@sss.pgh.pa.us still applies.
The problem is that if the query contains many joins that to properly
find the cheapest way of executing the query we'd have to perform the
join search once for each unique sort order of each WindowClause.
That's just not practical to do from a performance standpoint. The
join search can be very expensive. There may be something that could
be done to better determine the most likely candidate for the first
WindowClause using some heuristics, but I've no idea what those would
be. You should look into point #1 first. Point #2 is significantly
more difficult to solve in a way that would be acceptable to the
project.
David
On 03/01/23 08:21, David Rowley wrote:
On Mon, 26 Dec 2022 at 02:04, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Point #1
In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
sorts. We also perform 3.This shouldn't be too hard to do. See the code in
select_active_windows(). You'll likely want to pay attention to the
DISTINCT pathkeys if they exist and just use the ORDER BY pathkeys if
the query has no DISTINCT clause. DISTINCT is evaluated after Window
and before ORDER BY.One idea to implement this would be to adjust the loop in
select_active_windows() so that we record any WindowClauses which have
the pathkeys contained in the ORDER BY / DISTINCT pathkeys then record
those separately and append those onto the end of the actives array
after the sort.I do think you'll likely want to put any WindowClauses which have
pathkeys which are a true subset or true superset of the ORDER BY /
DISTINCT pathkeys last. If they're a superset then we won't need to
perform any additional ordering for the DISTINCT / ORDER BY clause.
If they're a subset then we might be able to perform an Incremental
Sort, which is likely much cheaper than a full sort. The existing
code should handle that part. You just need to make
select_active_windows() more intelligent.You might also think that we could perform additional optimisations
and also adjust the ORDER BY clause of a WindowClause if it contains
the pathkeys of the DISTINCT / ORDER BY clause. For example:SELECT *,row_number() over (order by a,b) from tab order by a,b,c;
However, if you were to adjust the WindowClauses ORDER BY to become
a,b,c then you could produce incorrect results for window functions
that change their result based on peer rows.Note the difference in results from:
create table ab(a int, b int);
insert into ab select x,y from generate_series(1,5) x, generate_Series(1,5)y;select a,b,count(*) over (order by a) from ab order by a,b;
select a,b,count(*) over (order by a,b) from ab order by a,b;
Thanks, let me try this.
and Point #2
Teach planner to decide which window to evaluate first based on costs.
Currently the first window in the query is evaluated first, there may be no
index to help sort the first window, but perhaps there are for other windows
in the query. This may allow an index scan instead of a seqscan -> sort.What Tom wrote about that in the first paragraph of [1] still applies.
The problem is that if the query contains many joins that to properly
find the cheapest way of executing the query we'd have to perform the
join search once for each unique sort order of each WindowClause.
That's just not practical to do from a performance standpoint. The
join search can be very expensive. There may be something that could
be done to better determine the most likely candidate for the first
WindowClause using some heuristics, but I've no idea what those would
be. You should look into point #1 first. Point #2 is significantly
more difficult to solve in a way that would be acceptable to the
project.
Okay, leaving this out for now.
--
Regards,
Ankit Kumar Pandey
Hi,
On 03/01/23 08:21, David Rowley wrote:
I do think you'll likely want to put any WindowClauses which have
pathkeys which are a true subset or true superset of the ORDER BY /
DISTINCT pathkeys last. If they're a superset then we won't need to
perform any additional ordering for the DISTINCT / ORDER BY clause.
If they're a subset then we might be able to perform an Incremental
Sort, which is likely much cheaper than a full sort. The existing
code should handle that part. You just need to make
select_active_windows() more intelligent.
I think current implementation does exactly this.
#1. If order by clause in the window function is subset of order by in query
create table abcd(a int, b int, c int, d int);
insert into abcd select x,y,z,c from generate_series(1,5) x, generate_Series(1,5)y, generate_Series(1,5) z, generate_Series(1,5) c;
explain analyze select a,row_number() over (order by b),count(*) over (order by a,b) from abcd order by a,b,c;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
--------
Incremental Sort (cost=80.32..114.56 rows=625 width=28) (actual time=1.440..3.311 rows=625 loops=1)
Sort Key: a, b, c
Presorted Key: a, b
Full-sort Groups: 13 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB
-> WindowAgg (cost=79.24..91.74 rows=625 width=28) (actual time=1.272..2.567 rows=625 loops=1)
-> Sort (cost=79.24..80.80 rows=625 width=20) (actual time=1.233..1.296 rows=625 loops=1)
Sort Key: a, b
Sort Method: quicksort Memory: 64kB
-> WindowAgg (cost=39.27..50.21 rows=625 width=20) (actual time=0.304..0.786 rows=625 loops=1)
-> Sort (cost=39.27..40.84 rows=625 width=12) (actual time=0.300..0.354 rows=625 loops=1)
Sort Key: b
Sort Method: quicksort Memory: 54kB
-> Seq Scan on abcd (cost=0.00..10.25 rows=625 width=12) (actual time=0.021..0.161 rows=625 l
oops=1)
Planning Time: 0.068 ms
Execution Time: 3.509 ms
(15 rows)
Here, as window function (row count) has two cols a, b for order by,
incremental sort is performed for remaining col in query,
which makes sense.
#2. If order by clause in the Window function is superset of order by in
query
explain analyze select a,row_number() over (order by a,b,c),count(*) over (order by a,b) from abcd order by a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=39.27..64.27 rows=625 width=28) (actual time=1.089..3.020 rows=625 loops=1)
-> WindowAgg (cost=39.27..53.34 rows=625 width=20) (actual time=1.024..1.635 rows=625 loops=1)
-> Sort (cost=39.27..40.84 rows=625 width=12) (actual time=1.019..1.084 rows=625 loops=1)
Sort Key: a, b, c
Sort Method: quicksort Memory: 54kB
-> Seq Scan on abcd (cost=0.00..10.25 rows=625 width=12) (actual time=0.023..0.265 rows=625 loops=1)
Planning Time: 0.071 ms
Execution Time: 3.156 ms
(8 rows)
No, additional sort is needed to be performed in this case, as you referred.
On 03/01/23 08:21, David Rowley wrote:
If they're a superset then we won't need to perform any additional
ordering for the DISTINCT / ORDER BY clause.
If they're a subset then we might be able to perform an Incremental
Sort, which is likely much cheaper than a full sort.
So question is, how current implementation is different from desired one?
--
Regards,
Ankit Kumar Pandey
On Wed, 4 Jan 2023 at 03:11, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
#2. If order by clause in the Window function is superset of order by in query
explain analyze select a,row_number() over (order by a,b,c),count(*) over (order by a,b) from abcd order by a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=39.27..64.27 rows=625 width=28) (actual time=1.089..3.020 rows=625 loops=1)
-> WindowAgg (cost=39.27..53.34 rows=625 width=20) (actual time=1.024..1.635 rows=625 loops=1)
-> Sort (cost=39.27..40.84 rows=625 width=12) (actual time=1.019..1.084 rows=625 loops=1)
Sort Key: a, b, c
Sort Method: quicksort Memory: 54kB
-> Seq Scan on abcd (cost=0.00..10.25 rows=625 width=12) (actual time=0.023..0.265 rows=625 loops=1)
Planning Time: 0.071 ms
Execution Time: 3.156 ms
(8 rows)No, additional sort is needed to be performed in this case, as you referred.
It looks like that works by accident. I see no mention of this either
in the comments or in [1]/messages/by-id/124A7F69-84CD-435B-BA0E-2695BE21E5C2@yesql.se. What seems to be going on is that
common_prefix_cmp() is coded in such a way that the WindowClauses end
up ordered by the highest tleSortGroupRef first, resulting in the
lowest order tleSortGroupRefs being the last WindowAgg to be
processed. We do transformSortClause() before
transformWindowDefinitions(), this is where the tleSortGroupRef
indexes are assigned, so the ORDER BY clause will have a lower
tleSortGroupRef than the WindowClauses.
If we don't have one already, then we should likely add a regression
test that ensures that this remains true. Since it does not seem to
be documented in the code anywhere, it seems like something that could
easily be overlooked if we were to ever refactor that code.
I just tried moving the calls to transformWindowDefinitions() so that
they come before transformSortClause() and our regression tests still
pass. That's not great.
With that change, the following query has an additional sort for the
ORDER BY clause which previously wasn't done.
explain select a,b,c,row_number() over (order by a) rn1, row_number()
over(partition by b) rn2, row_number() over (order by c) from abc
order by b;
David
[1]: /messages/by-id/124A7F69-84CD-435B-BA0E-2695BE21E5C2@yesql.se
On 04/01/23 09:32, David Rowley wrote:
It looks like that works by accident. I see no mention of this either
in the comments or in [1].
This kind of troubles me because function name
/select_active_windows///doesn't tell me if its only job is
to reorder window clauses for optimizing sort. From code, I don't see it
doing anything else either.
If we don't have one already, then we should likely add a regression
test that ensures that this remains true. Since it does not seem to
be documented in the code anywhere, it seems like something that could
easily be overlooked if we were to ever refactor that code.
I don't see any tests in windows specific to sorting operation (and in
what order). I will add those.
Also, one thing, consider the following query:
explain analyze select row_number() over (order by a,b),count(*) over
(order by a) from abcd order by a,b,c;
In this case, sorting is done on (a,b) followed by incremental sort on c
at final stage.
If we do just one sort: a,b,c at first stage then there won't be need to
do another sort (incremental one).
Now, I am not sure if which one would be faster: sorting (a,b,c) vs
sort(a,b) + incremental sort(c)
because even though datum sort is fast, there can be n number of combos
where we won't be doing that.
I might be looking at extreme corner cases though but still wanted to share.
--
Regards,
Ankit Kumar Pandey
Attaching test cases for this (+ small change in doc).
Tested this in one of WIP branch where I had modified
select_active_windows and it failed
as expected.
Please let me know if something can be improved in this.
Regards,
Ankit Kumar Pandey
Attachments:
v1-0001-add-test-cases-for-optimal-ordering-of-window-functi.patchtext/x-patch; charset=UTF-8; name=v1-0001-add-test-cases-for-optimal-ordering-of-window-functi.patchDownload
From 7647759eb92e1a0560bcff73b4169be8694f83d8 Mon Sep 17 00:00:00 2001
From: Ankit Kumar Pandey <itsankitkp@gmail.com>
Date: Wed, 4 Jan 2023 19:57:57 +0530
Subject: [PATCH] Add test cases for optimal ordering of window function order
by clauses
In case of multiple orderby clauses for window function, it is desired
to have minimum sort operations. This is already implemented in code but
corresponding test cases were missing.
This patch adds test cases covering various cases where order by clause
get optimized and some additional comments in function which implements
this feature.
---
src/backend/optimizer/plan/planner.c | 5 +
src/test/regress/expected/window.out | 140 +++++++++++++++++++++++++++
src/test/regress/sql/window.sql | 51 ++++++++++
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6ba7589f3..e3989a7b44 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5592,6 +5592,11 @@ optimize_window_clauses(PlannerInfo *root, WindowFuncLists *wflists)
* select_active_windows
* Create a list of the "active" window clauses (ie, those referenced
* by non-deleted WindowFuncs) in the order they are to be executed.
+ * Window clauses are ordered by the highest tleSortGroupRef first,
+ * resulting in the lowest order tleSortGroupRefs being the last WindowAgg to be
+ * processed. This reduces requirement for sort in lower order WindowAgg
+ * as it is quite likely that required column is already sorted and if not,
+ * there is possibility of incremental sort.
*/
static List *
select_active_windows(PlannerInfo *root, WindowFuncLists *wflists)
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 5505e9a2da..71d0cff587 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -4733,3 +4733,143 @@ SELECT * FROM pg_temp.f(2);
{5}
(5 rows)
+-- test optimal ordering for minimal sort operations
+CREATE temp TABLE abcd (a int, b int, c int, d int);
+INSERT INTO abcd
+ SELECT p,q,r,s FROM
+ generate_series(1,5) p,
+ generate_Series(1,5) q,
+ generate_Series(1,5) r,
+ generate_Series(1,5) s;
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY b),
+ count(*) OVER (ORDER BY a,b)
+ FROM abcd ORDER BY a,b,c;
+ QUERY PLAN
+------------------------------------------------
+ Incremental Sort
+ Sort Key: a, b, c
+ Presorted Key: a, b
+ -> WindowAgg
+ -> Sort
+ Sort Key: a, b
+ -> WindowAgg
+ -> Sort
+ Sort Key: b
+ -> Seq Scan on abcd
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a, b),
+ count(*) OVER (ORDER BY a)
+ FROM abcd ORDER BY a,b,c;
+ QUERY PLAN
+------------------------------------------
+ Incremental Sort
+ Sort Key: a, b, c
+ Presorted Key: a, b
+ -> WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: a, b
+ -> Seq Scan on abcd
+(8 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,b,c)
+ FROM abcd ORDER BY a,b;
+ QUERY PLAN
+------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: a, b, c
+ -> Seq Scan on abcd
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,c),
+ count(*) OVER (ORDER BY a,b)
+ FROM abcd ORDER BY a,d;
+ QUERY PLAN
+------------------------------------------------------
+ Incremental Sort
+ Sort Key: a, d
+ Presorted Key: a
+ -> WindowAgg
+ -> WindowAgg
+ -> Incremental Sort
+ Sort Key: a, c
+ Presorted Key: a
+ -> WindowAgg
+ -> Sort
+ Sort Key: a, b
+ -> Seq Scan on abcd
+(12 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,b),
+ count(*) OVER (ORDER BY a,c)
+ FROM abcd ORDER BY a,d;
+ QUERY PLAN
+------------------------------------------------------
+ Incremental Sort
+ Sort Key: a, d
+ Presorted Key: a
+ -> WindowAgg
+ -> WindowAgg
+ -> Incremental Sort
+ Sort Key: a, b
+ Presorted Key: a
+ -> WindowAgg
+ -> Sort
+ Sort Key: a, c
+ -> Seq Scan on abcd
+(12 rows)
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,b),
+ count(*) OVER (ORDER BY a,c)
+ FROM abcd ORDER BY a,b,c;
+ QUERY PLAN
+------------------------------------------------------
+ Incremental Sort
+ Sort Key: a, b, c
+ Presorted Key: a, b
+ -> WindowAgg
+ -> WindowAgg
+ -> Incremental Sort
+ Sort Key: a, b
+ Presorted Key: a
+ -> WindowAgg
+ -> Sort
+ Sort Key: a, c
+ -> Seq Scan on abcd
+(12 rows)
+
+EXPLAIN (COSTS OFF) SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,c),
+ count(*) OVER (ORDER BY a,b)
+ FROM abcd ORDER BY a,b,c;
+ QUERY PLAN
+------------------------------------------------------
+ Incremental Sort
+ Sort Key: a, b, c
+ Presorted Key: a, b
+ -> WindowAgg
+ -> WindowAgg
+ -> Incremental Sort
+ Sort Key: a, b
+ Presorted Key: a
+ -> WindowAgg
+ -> Sort
+ Sort Key: a, c
+ -> Seq Scan on abcd
+(12 rows)
+
+-- cleanup
+DROP TABLE abcd;
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index 10f450fee4..f17b8cedfd 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1673,3 +1673,54 @@ $$ LANGUAGE SQL STABLE;
EXPLAIN (costs off) SELECT * FROM pg_temp.f(2);
SELECT * FROM pg_temp.f(2);
+
+-- test optimal ordering for minimal sort operations
+
+CREATE temp TABLE abcd (a int, b int, c int, d int);
+INSERT INTO abcd
+ SELECT p,q,r,s FROM
+ generate_series(1,5) p,
+ generate_Series(1,5) q,
+ generate_Series(1,5) r,
+ generate_Series(1,5) s;
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY b),
+ count(*) OVER (ORDER BY a,b)
+ FROM abcd ORDER BY a,b,c;
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a, b),
+ count(*) OVER (ORDER BY a)
+ FROM abcd ORDER BY a,b,c;
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,b,c)
+ FROM abcd ORDER BY a,b;
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,c),
+ count(*) OVER (ORDER BY a,b)
+ FROM abcd ORDER BY a,d;
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,b),
+ count(*) OVER (ORDER BY a,c)
+ FROM abcd ORDER BY a,d;
+
+EXPLAIN (COSTS OFF)
+ SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,b),
+ count(*) OVER (ORDER BY a,c)
+ FROM abcd ORDER BY a,b,c;
+
+EXPLAIN (COSTS OFF) SELECT row_number() OVER (ORDER BY a),
+ count(*) OVER (ORDER BY a,c),
+ count(*) OVER (ORDER BY a,b)
+ FROM abcd ORDER BY a,b,c;
+
+-- cleanup
+DROP TABLE abcd;
--
2.37.2
On 05/01/23 07:48, Vik Fearing wrote:
On 1/4/23 13:07, Ankit Kumar Pandey wrote:
Also, one thing, consider the following query:
explain analyze select row_number() over (order by a,b),count(*) over
(order by a) from abcd order by a,b,c;In this case, sorting is done on (a,b) followed by incremental sort
on c at final stage.If we do just one sort: a,b,c at first stage then there won't be need
to do another sort (incremental one).This could give incorrect results. Consider the following query:
postgres=# select a, b, c, rank() over (order by a, b)
from (values (1, 2, 1), (1, 2, 2), (1, 2, 1)) as abcd (a, b, c)
order by a, b, c;a | b | c | rank
---+---+---+------
1 | 2 | 1 | 1
1 | 2 | 1 | 1
1 | 2 | 2 | 1
(3 rows)If you change the window's ordering like you suggest, you get this
different result:postgres=# select a, b, c, rank() over (order by a, b, c)
from (values (1, 2, 1), (1, 2, 2), (1, 2, 1)) as abcd (a, b, c)
order by a, b, c;a | b | c | rank
---+---+---+------
1 | 2 | 1 | 1
1 | 2 | 1 | 1
1 | 2 | 2 | 3
(3 rows)
We are already doing something like I mentioned.
Consider this example:
explain SELECT rank() OVER (ORDER BY a), count(*) OVER (ORDER BY a,b)
FROM abcd;
QUERY PLAN
--------------------------------------------------------------------------
WindowAgg (cost=83.80..127.55 rows=1250 width=24)
-> WindowAgg (cost=83.80..108.80 rows=1250 width=16)
-> Sort (cost=83.80..86.92 rows=1250 width=8)
Sort Key: a, b
-> Seq Scan on abcd (cost=0.00..19.50 rows=1250 width=8)
(5 rows)
If it is okay to do extra sort for first window function (rank) here,
why would it be
any different in case which I mentioned?
My suggestion rest on assumption that for a window function, say
rank() OVER (ORDER BY a), ordering of columns (other than column 'a')
shouldn't matter.
--
Regards,
Ankit Kumar Pandey
Import Notes
Reply to msg id not found: 441d135e-1941-c3ef-1649-18c3e8811549@postgresfriends.org
On 1/4/23 13:07, Ankit Kumar Pandey wrote:
Also, one thing, consider the following query:
explain analyze select row_number() over (order by a,b),count(*) over
(order by a) from abcd order by a,b,c;In this case, sorting is done on (a,b) followed by incremental sort on c
at final stage.If we do just one sort: a,b,c at first stage then there won't be need to
do another sort (incremental one).
This could give incorrect results. Consider the following query:
postgres=# select a, b, c, rank() over (order by a, b)
from (values (1, 2, 1), (1, 2, 2), (1, 2, 1)) as abcd (a, b, c)
order by a, b, c;
a | b | c | rank
---+---+---+------
1 | 2 | 1 | 1
1 | 2 | 1 | 1
1 | 2 | 2 | 1
(3 rows)
If you change the window's ordering like you suggest, you get this
different result:
postgres=# select a, b, c, rank() over (order by a, b, c)
from (values (1, 2, 1), (1, 2, 2), (1, 2, 1)) as abcd (a, b, c)
order by a, b, c;
a | b | c | rank
---+---+---+------
1 | 2 | 1 | 1
1 | 2 | 1 | 1
1 | 2 | 2 | 3
(3 rows)
--
Vik Fearing
On Thu, 5 Jan 2023 at 15:18, Vik Fearing <vik@postgresfriends.org> wrote:
On 1/4/23 13:07, Ankit Kumar Pandey wrote:
Also, one thing, consider the following query:
explain analyze select row_number() over (order by a,b),count(*) over
(order by a) from abcd order by a,b,c;In this case, sorting is done on (a,b) followed by incremental sort on c
at final stage.If we do just one sort: a,b,c at first stage then there won't be need to
do another sort (incremental one).This could give incorrect results.
Yeah, this seems to be what I warned against in [1]/messages/by-id/CAApHDvp=r1LnEKCmWCYaruMPL-jP4j_sdc8yeFYwaDT1ac5GsQ@mail.gmail.com.
If we wanted to make that work we'd need to do it without adjusting
the WindowClause's orderClause so that the peer row checks still
worked correctly in nodeWindowAgg.c.
Additionally, it's also not that clear to me that sorting by more
columns in the sort below the WindowAgg would always be a win over
doing the final sort for the ORDER BY. What if the WHERE clause (that
could not be pushed down before a join) filtered out the vast majority
of the rows before the ORDER BY. It might be cheaper to do the sort
then than to sort by the additional columns earlier.
David
[1]: /messages/by-id/CAApHDvp=r1LnEKCmWCYaruMPL-jP4j_sdc8yeFYwaDT1ac5GsQ@mail.gmail.com
Vik Fearing <vik@postgresfriends.org> writes:
On 1/4/23 13:07, Ankit Kumar Pandey wrote:
Also, one thing, consider the following query:
explain analyze select row_number() over (order by a,b),count(*) over
(order by a) from abcd order by a,b,c;
In this case, sorting is done on (a,b) followed by incremental sort on c
at final stage.
If we do just one sort: a,b,c at first stage then there won't be need to
do another sort (incremental one).
This could give incorrect results.
Mmmm ... your counterexample doesn't really prove that. Yes,
the "rank()" step must consider only two ORDER BY columns while
deciding which rows are peers, but I don't see why it wouldn't
be okay if the rows happened to already be sorted by "c" within
those peer groups.
I don't recall the implementation details well enough to be sure
how hard it would be to keep that straight.
regards, tom lane
David Rowley <dgrowleyml@gmail.com> writes:
Additionally, it's also not that clear to me that sorting by more
columns in the sort below the WindowAgg would always be a win over
doing the final sort for the ORDER BY. What if the WHERE clause (that
could not be pushed down before a join) filtered out the vast majority
of the rows before the ORDER BY. It might be cheaper to do the sort
then than to sort by the additional columns earlier.
That's certainly a legitimate question to ask, but I don't quite see
where you figure we'd be sorting more rows? WHERE filtering happens
before window functions, which never eliminate any rows. So it seems
like a sort just before the window functions must sort the same number
of rows as one just after them.
regards, tom lane
On Thu, 5 Jan 2023 at 16:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
Additionally, it's also not that clear to me that sorting by more
columns in the sort below the WindowAgg would always be a win over
doing the final sort for the ORDER BY. What if the WHERE clause (that
could not be pushed down before a join) filtered out the vast majority
of the rows before the ORDER BY. It might be cheaper to do the sort
then than to sort by the additional columns earlier.That's certainly a legitimate question to ask, but I don't quite see
where you figure we'd be sorting more rows? WHERE filtering happens
before window functions, which never eliminate any rows. So it seems
like a sort just before the window functions must sort the same number
of rows as one just after them.
Yeah, I didn't think the WHERE clause thing out carefully enough. I
think it's only the WindowClause's runCondition that could possibly
filter any rows between the Sort below the WindowAgg and before the
ORDER BY is evaluated.
David
On Thu, 5 Jan 2023 at 19:14, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
We are already doing something like I mentioned.
Consider this example:
explain SELECT rank() OVER (ORDER BY a), count(*) OVER (ORDER BY a,b)
FROM abcd;
QUERY PLAN
--------------------------------------------------------------------------
WindowAgg (cost=83.80..127.55 rows=1250 width=24)
-> WindowAgg (cost=83.80..108.80 rows=1250 width=16)
-> Sort (cost=83.80..86.92 rows=1250 width=8)
Sort Key: a, b
-> Seq Scan on abcd (cost=0.00..19.50 rows=1250 width=8)
(5 rows)If it is okay to do extra sort for first window function (rank) here,
why would it beany different in case which I mentioned?
We *can* reuse Sorts where a more strict or equivalent sort order is
available. The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY. One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.
It might be possible to adjust create_one_window_path() so that when
processing the final WindowClause that it looks at the DISTINCT or
ORDER BY clause to see if we can sort on a few extra columns to save
having to do any further sorting. We just *cannot* make any
adjustments to the WindowClause's orderClause.
David
On 05/01/23 12:53, David Rowley wrote:
We *can* reuse Sorts where a more strict or equivalent sort order is
available. The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY. One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.
Okay, now I see issue in my approach.
It might be possible to adjust create_one_window_path() so that when
processing the final WindowClause that it looks at the DISTINCT or
ORDER BY clause to see if we can sort on a few extra columns to save
having to do any further sorting. We just *cannot* make any
adjustments to the WindowClause's orderClause.
This is much better solution. I will check
create_one_window_path for the same.
--
Regards,
Ankit Kumar Pandey
Sorry if multiple mails has been sent for this.
On 05/01/23 12:53, David Rowley wrote:
We *can* reuse Sorts where a more strict or equivalent sort order is
available. The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY. One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.
I attempted this in attached patch.
#1. No op case
------------------------------------------In patched
version-----------------------------------------------
explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER
(ORDER BY a,b,c) FROM abcd order by a;
QUERY PLAN
------------------------------------------
WindowAgg
-> Sort
Sort Key: a, b, c
-> WindowAgg
-> Sort
Sort Key: b
-> Seq Scan on abcd
(7 rows)
explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER
(ORDER BY a,b,c) FROM abcd;
QUERY PLAN
------------------------------------------
WindowAgg
-> Sort
Sort Key: b
-> WindowAgg
-> Sort
Sort Key: a, b, c
-> Seq Scan on abcd
(7 rows)
----------------------------------------------In
master--------------------------------------------------------
explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER
(ORDER BY a,b,c) FROM abcd order by a;
QUERY PLAN
------------------------------------------
WindowAgg
-> Sort
Sort Key: a, b, c
-> WindowAgg
-> Sort
Sort Key: b
-> Seq Scan on abcd
(7 rows)
explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER
(ORDER BY a,b,c) FROM abcd;
QUERY PLAN
------------------------------------------
WindowAgg
-> Sort
Sort Key: b
-> WindowAgg
-> Sort
Sort Key: a, b, c
-> Seq Scan on abcd
(7 rows)
No change between patched version and master.
2. In case where optimization can happen
----------------------------In patched
version-------------------------------------------------------
explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER
(ORDER BY a) FROM abcd order by a,b;
QUERY PLAN
------------------------------------------
WindowAgg
-> Sort
Sort Key: a, b
-> WindowAgg
-> Sort
Sort Key: b
-> Seq Scan on abcd
(7 rows)
explain (costs off) SELECT rank() OVER (ORDER BY a), count(*) OVER
(ORDER BY b), count(*) OVER (PARTITION BY a ORDER BY b) FROM abcd order
by a,b,c,d;
QUERY PLAN
------------------------------------------------
WindowAgg
-> WindowAgg
-> Sort
Sort Key: a, b, c, d
-> WindowAgg
-> Sort
Sort Key: b
-> Seq Scan on abcd
(8 rows)
-------------------------------------------In
master--------------------------------------------------------
explain (costs off) SELECT rank() OVER (ORDER BY b), count(*) OVER
(ORDER BY a) FROM abcd order by a,b;
QUERY PLAN
------------------------------------------------
Incremental Sort
Sort Key: a, b
Presorted Key: a
-> WindowAgg
-> Sort
Sort Key: a
-> WindowAgg
-> Sort
Sort Key: b
-> Seq Scan on abcd
(10 rows)
explain (costs off) SELECT rank() OVER (ORDER BY a), count(*) OVER
(ORDER BY b), count(*) OVER (PARTITION BY a ORDER BY b) FROM abcd order
by a,b,c,d;
QUERY PLAN
------------------------------------------------------
Incremental Sort
Sort Key: a, b, c, d
Presorted Key: a, b
-> WindowAgg
-> WindowAgg
-> Sort
Sort Key: a, b
-> WindowAgg
-> Sort
Sort Key: b
-> Seq Scan on abcd
(11 rows)
Patched version removes few sorts.
Regression tests all passed so it is not breaking anything existing.
We don't have any tests for verifying sorting plan in window functions
(which would have failed, if present).
Please let me know any feedbacks (I have added some my own concerns in
the comments)
Thanks
--
Regards,
Ankit Kumar Pandey
Attachments:
v1-0002-Teach-planner-to-optimize-sorting-operations-for-win.patchtext/x-patch; charset=UTF-8; name=v1-0002-Teach-planner-to-optimize-sorting-operations-for-win.patchDownload
From 8fb4b6188eda111bee2e47d3e85289064b428702 Mon Sep 17 00:00:00 2001
From: Ankit Kumar Pandey <itsankitkp@gmail.com>
Date: Fri, 6 Jan 2023 14:30:38 +0530
Subject: [PATCH] Teach planner to optimize sorting operations for window
function
Additional sort for root's order by can be optimized if sorting for
extra columns are performed while sorting for window functions (if sort
clause of window functions are subset of root's sort clause).
---
src/backend/optimizer/plan/planner.c | 70 ++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6ba7589f3..048651dffe 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4448,11 +4448,81 @@ create_one_window_path(PlannerInfo *root,
int presorted_keys;
bool is_sorted;
bool topwindow;
+ List *window_sortclauses;
window_pathkeys = make_pathkeys_for_window(root,
wc,
root->processed_tlist);
+
+ /*
+ * Add aditional sorting clause from
+ * root's order by to window's order by clause,
+ * if they are compatible. This will save additional
+ * sorting at the end. This should happen only if sort
+ * clause in window function is subset of root's order by.
+ * XXX Ideally this check need to be performed on final
+ * window function but if I add condition here, for final
+ * window function, pathkeys would have been same as root's
+ * order by but for others, it would have been lesser number
+ * (if they are subset). So, system would ignore extra sort
+ * and take it as first step (where final function was expected
+ * to be), do regular sort as in master.
+ * Try adding (foreach_current_index(l) == list_length(activeWindows) - 1)
+ * in if condition which check if window_sortclause is subset lengthwise
+ */
+ window_sortclauses = list_concat_copy(wc->partitionClause, wc->orderClause);
+
+ if(list_length(window_sortclauses) < list_length(root->parse->sortClause))
+ {
+ ListCell *wsc;
+ ListCell *sc;
+ bool should_prepend = true; /* wishful thinking */
+
+ /*
+ * Same as in make_pathkeys_for_window
+ * for sorting, partition clause is taken first
+ */
+ window_sortclauses = list_concat_copy(wc->partitionClause, wc->orderClause);
+
+ /*
+ * Sort clauses of root can be prepended if and only if
+ * sort clause of window function is subset of root's sort clause
+ * XXX is there better way to find this?
+ */
+ forboth(wsc, window_sortclauses, sc, root->parse->sortClause)
+ {
+ SortGroupClause *window_sortclause = (SortGroupClause*) lfirst(wsc);
+ SortGroupClause *sortclause = (SortGroupClause*) lfirst(sc);
+ if (window_sortclause->tleSortGroupRef != sortclause->tleSortGroupRef)
+ {
+ should_prepend = false;
+ break;
+ }
+ }
+
+
+ /*
+ * Override window path key defined by make_pathkeys_for_window
+ * XXX since we are overriding make_pathkeys_for_window, it is
+ * a wasteful call in critical path. Should we call make_pathkeys_for_window
+ * after this function?
+ */
+ if (should_prepend)
+ {
+ /*
+ * add root's sort clauses in the end
+ */
+ window_sortclauses = list_concat_unique(window_sortclauses,
+ root->parse->sortClause);
+ window_pathkeys = make_pathkeys_for_sortclauses(root,
+ window_sortclauses,
+ root->processed_tlist);
+
+ }
+
+ }
+
is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys,
&presorted_keys);
--
2.37.2
On Thu, 5 Jan 2023 at 04:11, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Attaching test cases for this (+ small change in doc).
Tested this in one of WIP branch where I had modified
select_active_windows and it failedas expected.
Please let me know if something can be improved in this.
Thanks for writing that.
I had a look over the patch and ended up making some adjustments to
the tests. Looking back at 728202b63, I think any tests we add here
should be kept alongside the tests added by that commit rather than
tacked on to the end of the test file. It also makes sense to me just
to use the same table as the original tests. I also thought the
comment in select_active_windows should be in the sort comparator
function instead. I think that's a more likely place to capture the
attention of anyone making modifications.
I've now pushed the adjusted patch.
David
On Sat, 7 Jan 2023 at 02:11, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
On 05/01/23 12:53, David Rowley wrote:
We *can* reuse Sorts where a more strict or equivalent sort order is
available. The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY. One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.I attempted this in attached patch.
I had a quick look at this and it's going to need some additional code
to ensure there are no WindowFuncs in the ORDER BY clause. We can't
sort on those before we evaluate them.
Right now you get:
postgres=# explain select *,row_number() over (order by oid) rn1 from
pg_class order by oid,rn1;
ERROR: could not find pathkey item to sort
I also don't think there's any point in adding the additional pathkeys
when the input path is already presorted. Have a look at:
postgres=# set enable_seqscan=0;
SET
postgres=# explain select *,row_number() over (order by oid) rn1 from
pg_class order by oid,relname;
QUERY PLAN
----------------------------------------------------------------------------------------------------
WindowAgg (cost=0.43..85.44 rows=412 width=281)
-> Incremental Sort (cost=0.43..79.26 rows=412 width=273)
Sort Key: oid, relname
Presorted Key: oid
-> Index Scan using pg_class_oid_index on pg_class
(cost=0.27..60.72 rows=412 width=273)
(5 rows)
It would be better to leave this case alone and just do the
incremental sort afterwards.
You also don't seem to be considering the fact that the query might
have a DISTINCT clause. That's evaluated between window functions and
the order by. It would be fairly useless to do a more strict sort when
the sort order is going to be obliterated by a Hash Aggregate. Perhaps
we can just not do this when the query has a DISTINCT clause.
On the other hand, there are also a few reasons why we shouldn't do
this. I mentioned the WindowClause runConditions earlier here.
The patched version produces:
postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
QUERY PLAN
------------------------------------------------------------------------------
WindowAgg (actual time=0.488..0.497 rows=9 loops=1)
Run Condition: (row_number() OVER (?) < 10)
-> Sort (actual time=0.466..0.468 rows=10 loops=1)
Sort Key: pg_class.oid, pg_class.relname
Sort Method: quicksort Memory: 67kB
-> Seq Scan on pg_class (actual time=0.028..0.170 rows=420 loops=1)
Planning Time: 0.214 ms
Execution Time: 0.581 ms
(8 rows)
Whereas master produces:
postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
QUERY PLAN
----------------------------------------------------------------------------------------
Incremental Sort (actual time=0.506..0.508 rows=9 loops=1)
Sort Key: pg_class.oid, pg_class.relname
Presorted Key: pg_class.oid
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 26kB
Peak Memory: 26kB
-> WindowAgg (actual time=0.475..0.483 rows=9 loops=1)
Run Condition: (row_number() OVER (?) < 10)
-> Sort (actual time=0.461..0.461 rows=10 loops=1)
Sort Key: pg_class.oid
Sort Method: quicksort Memory: 67kB
-> Seq Scan on pg_class (actual time=0.022..0.178
rows=420 loops=1)
Planning Time: 0.245 ms
Execution Time: 0.594 ms
(12 rows)
(slightly bad example since oid is unique but...)
It's not too clear to me that the patched version is a better plan.
The bottom level sort, which sorts 420 rows has a more complex
comparison to do. Imagine the 2nd column is a long text string. That
would make the sort much more expensive. The top-level sort has far
fewer rows to sort due to the runCondition filtering out anything that
does not match rn1 < 10. The same can be said for a query with a LIMIT
clause.
I think the patch should also be using pathkeys_contained_in() and
Lists of pathkeys rather than concatenating lists of SortGroupClauses
together. That should allow things to work correctly when a given
pathkey has become redundant due to either duplication or a Const in
the Eclass.
Also, since I seem to be only be able to think of these cases properly
by actually trying them, I ended up with the attached patch. I opted
to not do the optimisation when there are runConditions or a LIMIT
clause. Doing it when there are runConditions really should be a
cost-based decision, but we've about no hope of having any idea about
how many rows will match the runCondition. For the LIMIT case, it's
also difficult as it would be hard to get an idea of how many times
the additional sort columns would need their comparison function
called. That's only required in a tie-break when the leading columns
are all equal.
The attached patch has no tests added. It's going to need some of
those. These tests should go directly after the tests added in
a14a58329 and likely use the same table for consistency.
David
Attachments:
better_windowclause_sorting_dgr.patchtext/plain; charset=US-ASCII; name=better_windowclause_sorting_dgr.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..781916f219 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4423,6 +4423,7 @@ create_one_window_path(PlannerInfo *root,
PathTarget *window_target;
ListCell *l;
List *topqual = NIL;
+ bool has_runcondition = false;
/*
* Since each window clause could require a different sort order, we stack
@@ -4457,6 +4458,42 @@ create_one_window_path(PlannerInfo *root,
path->pathkeys,
&presorted_keys);
+ has_runcondition |= (wc->runCondition != NIL);
+
+ /*
+ * If not sorted already and the query has an ORDER BY clause, then we
+ * try to push the entire ORDER BY below the final WindowAgg. We
+ * don't do this when the query has a DISTINCT clause as the planner
+ * might opt to do a Hash Aggregate and spoil our efforts here. We
+ * also can't do this when the ORDER BY contains any WindowFuncs as we
+ * obviously can't sort something that's yet to be evaluated. We also
+ * don't bother with this when there is a WindowClauses with a
+ * runCondition as those can reduce the number of rows to sort in the
+ * ORDER BY and we don't want add complexity to the WindowAgg sort if
+ * the final ORDER BY sort is going to have far fewer rows to sort.
+ * For the same reason, don't bother if the query has a LIMIT clause.
+ */
+ if (!is_sorted && lnext(activeWindows, l) == NULL &&
+ root->parse->distinctClause == NIL &&
+ root->parse->sortClause != NIL &&
+ !has_runcondition && root->parse->limitCount == NULL &&
+ !contain_window_function((Node *) get_sortgrouplist_exprs(root->parse->sortClause,
+ root->processed_tlist)))
+ {
+ List *orderby_pathkeys;
+
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ root->parse->sortClause,
+ root->processed_tlist);
+
+ if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+ window_pathkeys = orderby_pathkeys;
+
+ is_sorted = pathkeys_count_contained_in(window_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ }
+
/* Sort if necessary */
if (!is_sorted)
{
Thanks for looking into this.
On 07/01/23 09:58, David Rowley wrote:
On Sat, 7 Jan 2023 at 02:11, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
On 05/01/23 12:53, David Rowley wrote:
We *can* reuse Sorts where a more strict or equivalent sort order is
available. The question is how do we get the final WindowClause to do
something slightly more strict to save having to do anything for the
ORDER BY. One way you might think would be to adjust the
WindowClause's orderClause to add the additional clauses, but that
cannot be done because that would cause are_peers() in nodeWindowAgg.c
to not count some rows as peers when they maybe should be given a less
strict orderClause in the WindowClause.I attempted this in attached patch.
I had a quick look at this and it's going to need some additional code
to ensure there are no WindowFuncs in the ORDER BY clause. We can't
sort on those before we evaluate them.
Okay I will add this check.
I also don't think there's any point in adding the additional pathkeys
when the input path is already presorted.It would be better to leave this case alone and just do the
incremental sort afterwards.
So this will be no operation case well.
You also don't seem to be considering the fact that the query might
have a DISTINCT clause.
Major reason for this was that I am not exactly aware of what distinct
clause means (especially in
context of window functions) and how it is different from other
sortClauses (partition, order by, group).
Comments in parsenodes.h didn't help.
That's evaluated between window functions and
the order by. It would be fairly useless to do a more strict sort when
the sort order is going to be obliterated by a Hash Aggregate. Perhaps
we can just not do this when the query has a DISTINCT clause.On the other hand, there are also a few reasons why we shouldn't do
this. I mentioned the WindowClause runConditions earlier here.The patched version produces:
postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
QUERY PLAN
------------------------------------------------------------------------------
WindowAgg (actual time=0.488..0.497 rows=9 loops=1)
Run Condition: (row_number() OVER (?) < 10)
-> Sort (actual time=0.466..0.468 rows=10 loops=1)
Sort Key: pg_class.oid, pg_class.relname
Sort Method: quicksort Memory: 67kB
-> Seq Scan on pg_class (actual time=0.028..0.170 rows=420 loops=1)
Planning Time: 0.214 ms
Execution Time: 0.581 ms
(8 rows)Whereas master produces:
postgres=# explain (analyze, costs off) select * from (select
oid,relname,row_number() over (order by oid) rn1 from pg_class order
by oid,relname) where rn1 < 10;
QUERY PLAN
----------------------------------------------------------------------------------------
Incremental Sort (actual time=0.506..0.508 rows=9 loops=1)
Sort Key: pg_class.oid, pg_class.relname
Presorted Key: pg_class.oid
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 26kB
Peak Memory: 26kB
-> WindowAgg (actual time=0.475..0.483 rows=9 loops=1)
Run Condition: (row_number() OVER (?) < 10)
-> Sort (actual time=0.461..0.461 rows=10 loops=1)
Sort Key: pg_class.oid
Sort Method: quicksort Memory: 67kB
-> Seq Scan on pg_class (actual time=0.022..0.178
rows=420 loops=1)
Planning Time: 0.245 ms
Execution Time: 0.594 ms
(12 rows)(slightly bad example since oid is unique but...)
It's not too clear to me that the patched version is a better plan.
The bottom level sort, which sorts 420 rows has a more complex
comparison to do. Imagine the 2nd column is a long text string. That
would make the sort much more expensive. The top-level sort has far
fewer rows to sort due to the runCondition filtering out anything that
does not match rn1 < 10. The same can be said for a query with a LIMIT
clause.
Yes, this is a fair point. Multiple sort is actually beneficial in cases
like this, perhaps limits clause and runCondition should be no op too?
I think the patch should also be using pathkeys_contained_in() and
Lists of pathkeys rather than concatenating lists of SortGroupClauses
together. That should allow things to work correctly when a given
pathkey has become redundant due to either duplication or a Const in
the Eclass.
Make sense, I actually duplicated that logic from
make_pathkeys_for_window. We should make this changes there as well because
if we have SELECT rank() OVER (PARTITION BY a ORDER BY a)
(weird example but you get the idea), it leads to duplicates in
window_sortclauses.
Also, since I seem to be only be able to think of these cases properly
by actually trying them, I ended up with the attached patch. I opted
to not do the optimisation when there are runConditions or a LIMIT
clause. Doing it when there are runConditions really should be a
cost-based decision, but we've about no hope of having any idea about
how many rows will match the runCondition. For the LIMIT case, it's
also difficult as it would be hard to get an idea of how many times
the additional sort columns would need their comparison function
called. That's only required in a tie-break when the leading columns
are all equal.
Agree with runConditions part but for limit clause, row reduction happens
at the last, so whether we use patched version or master version,
none of sorts would benefit/degrade from that, right?
The attached patch has no tests added. It's going to need some of
those. These tests should go directly after the tests added in
a14a58329 and likely use the same table for consistency.
Thanks for the patch. It looks much neater now. I will add cases for this
(after a14a58329). I do have a very general question though. Is it okay
to add comments in test cases? I don't see it much on existing cases
so kind of reluctant to add but it makes intentions much more clear.
--
Regards,
Ankit Kumar Pandey
On 07/01/23 07:59, David Rowley wrote:
On Thu, 5 Jan 2023 at 04:11, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Attaching test cases for this (+ small change in doc).
Tested this in one of WIP branch where I had modified
select_active_windows and it failedas expected.
Please let me know if something can be improved in this.
Thanks for writing that.
I had a look over the patch and ended up making some adjustments to
the tests. Looking back at 728202b63, I think any tests we add here
should be kept alongside the tests added by that commit rather than
tacked on to the end of the test file. It also makes sense to me just
to use the same table as the original tests. I also thought the
comment in select_active_windows should be in the sort comparator
function instead. I think that's a more likely place to capture the
attention of anyone making modifications.
Thanks, I will look it through.
I've now pushed the adjusted patch.
I can't seem to find updated patch in the attachment, can you please
forward the patch again.
Thanks.
--
Regards,
Ankit Kumar Pandey
Your email client seems to be adding additional vertical space to your
emails. I've removed the additional newlines in the quotes. Are you
able to fix the client so it does not do that?
On Sun, 8 Jan 2023 at 00:10, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
On 07/01/23 09:58, David Rowley wrote:
You also don't seem to be considering the fact that the query might
have a DISTINCT clause.Major reason for this was that I am not exactly aware of what distinct
clause means (especially incontext of window functions) and how it is different from other
sortClauses (partition, order by, group).
I'm talking about the query's DISTINCT clause. i.e SELECT DISTINCT.
If you look in the grouping_planner() function, you'll see that
create_distinct_paths() is called between create_window_paths() and
create_ordered_paths().
Yes, this is a fair point. Multiple sort is actually beneficial in cases
like this, perhaps limits clause and runCondition should be no op too?
I'm not sure what you mean by "no op". Do you mean not apply the optimization?
I think the patch should also be using pathkeys_contained_in() and
Lists of pathkeys rather than concatenating lists of SortGroupClauses
together. That should allow things to work correctly when a given
pathkey has become redundant due to either duplication or a Const in
the Eclass.Make sense, I actually duplicated that logic from
make_pathkeys_for_window. We should make this changes there as well because
if we have SELECT rank() OVER (PARTITION BY a ORDER BY a)
(weird example but you get the idea), it leads to duplicates in
window_sortclauses.
It won't lead to duplicate pathkeys. Look in
make_pathkeys_for_sortclauses() and what pathkey_is_redundant() does.
Notice that it checks if the pathkey already exists in the list before
appending.
Agree with runConditions part but for limit clause, row reduction happens
at the last, so whether we use patched version or master version,
none of sorts would benefit/degrade from that, right?
Maybe you're right. Just be aware that a sort done for a query with an
ORDER BY LIMIT will perform a top-n sort. top-n sorts only need to
store the top-n tuples and that can significantly reduce the memory
required for sorting perhaps resulting in the sort fitting in memory
rather than spilling out to disk.
You might want to test this by having the leading sort column as an
INT, and then the 2nd one as a long text column of maybe around two
kilobytes. Make all the leading column values the same so that the
comparison for the text column is always performed. Make the LIMIT
small compared to the total number of rows, that should test the worse
case. Check the performance with and without the limitCount != NULL
part of the patch that disables the optimization for LIMIT.
Is it okay
to add comments in test cases? I don't see it much on existing cases
so kind of reluctant to add but it makes intentions much more clear.
I think tests should always have a comment to state what they're
testing. Not many people seem to do that, unfortunately. The problem
with not stating what the test is testing is that if, for example, the
test is checking that the EXPLAIN output is showing a Sort, what if at
some point in the future someone adjusts some costing code and the
plan changes to an Index Scan. If there's no comment to state that
we're looking for a Sort plan, then the author of the patch that's
adjusting the costs might just think it's ok to change the expected
plan to an Index Scan. I've seen this problem occur even when the
comments *do* exist. There's just about no hope of such a test
continuing to do what it's meant to if the comments don't exist.
David
On 07/01/23 17:28, David Rowley wrote:
Your email client seems to be adding additional vertical space to your
emails. I've removed the additional newlines in the quotes. Are you
able to fix the client so it does not do that?
I have adjusted my mail client, hope it is better now?
On Sun, 8 Jan 2023 at 00:10, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
On 07/01/23 09:58, David Rowley wrote:
You also don't seem to be considering the fact that the query might
have a DISTINCT clause.Major reason for this was that I am not exactly aware of what distinct
clause means (especially incontext of window functions) and how it is different from other
sortClauses (partition, order by, group).I'm talking about the query's DISTINCT clause. i.e SELECT DISTINCT.
If you look in the grouping_planner() function, you'll see that
create_distinct_paths() is called between create_window_paths() and
create_ordered_paths().
Yes just saw this and got what you meant.
Yes, this is a fair point. Multiple sort is actually beneficial in cases
like this, perhaps limits clause and runCondition should be no op too?I'm not sure what you mean by "no op". Do you mean not apply the optimization?
Yes, no op = no optimization. Sorry I didn't mention it before.
I think the patch should also be using pathkeys_contained_in() and
Lists of pathkeys rather than concatenating lists of SortGroupClauses
together. That should allow things to work correctly when a given
pathkey has become redundant due to either duplication or a Const in
the Eclass.Make sense, I actually duplicated that logic from
make_pathkeys_for_window. We should make this changes there as well because
if we have SELECT rank() OVER (PARTITION BY a ORDER BY a)
(weird example but you get the idea), it leads to duplicates in
window_sortclauses.It won't lead to duplicate pathkeys. Look in
make_pathkeys_for_sortclauses() and what pathkey_is_redundant() does.
Notice that it checks if the pathkey already exists in the list before
appending.
Okay I see this, pathkey_is_redundant is much more smarter as well.
Replacing list_concat_copy with list_concat_unique in
make_pathkeys_for_window
won't be of much benefit.
Agree with runConditions part but for limit clause, row reduction happens
at the last, so whether we use patched version or master version,
none of sorts would benefit/degrade from that, right?Maybe you're right. Just be aware that a sort done for a query with an
ORDER BY LIMIT will perform a top-n sort. top-n sorts only need to
store the top-n tuples and that can significantly reduce the memory
required for sorting perhaps resulting in the sort fitting in memory
rather than spilling out to disk.You might want to test this by having the leading sort column as an
INT, and then the 2nd one as a long text column of maybe around two
kilobytes. Make all the leading column values the same so that the
comparison for the text column is always performed. Make the LIMIT
small compared to the total number of rows, that should test the worse
case. Check the performance with and without the limitCount != NULL
part of the patch that disables the optimization for LIMIT.
I checked this. For limit <<< total number of rows, top-n sort was
performed but when I changed limit to higher value (or no limit),
quick sort was performed.
Top-n sort was twice as fast.
Also, tested (first) patch version vs master, top-n sort
was twice as fast there as well (outputs mentioned below).
Current patch version (with limit excluded for optimizations)
explain (analyze ,costs off) select count(*) over (order by id) from tt
order by id, name limit 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Limit (actual time=1.718..1.719 rows=1 loops=1)
-> Incremental Sort (actual time=1.717..1.717 rows=1 loops=1)
Sort Key: id, name
Presorted Key: id
Full-sort Groups: 1 Sort Method: top-N heapsort Average
Memory: 25kB Peak Memory: 25kB
-> WindowAgg (actual time=0.028..0.036 rows=6 loops=1)
-> Sort (actual time=0.017..0.018 rows=6 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on tt (actual time=0.011..0.012
rows=6 loops=1)
Planning Time: 0.069 ms
Execution Time: 1.799 ms
Earlier patch(which included limit clause for optimizations)
explain (analyze ,costs off) select count(*) over (order by id) from tt
order by id, name limit 1;
QUERY PLAN
----------------------------------------------------------------------------
Limit (actual time=3.766..3.767 rows=1 loops=1)
-> WindowAgg (actual time=3.764..3.765 rows=1 loops=1)
-> Sort (actual time=3.749..3.750 rows=6 loops=1)
Sort Key: id, name
Sort Method: quicksort Memory: 25kB
-> Seq Scan on tt (actual time=0.011..0.013 rows=6 loops=1)
Planning Time: 0.068 ms
Execution Time: 3.881 ms
I am just wondering though, why can we not do top-N sort
in optimized version if we include limit clause? Is top-N sort is
limited to non strict sorting or
cases last operation before limit is sort? .
Is it okay
to add comments in test cases? I don't see it much on existing cases
so kind of reluctant to add but it makes intentions much more clear.I think tests should always have a comment to state what they're
testing. Not many people seem to do that, unfortunately. The problem
with not stating what the test is testing is that if, for example, the
test is checking that the EXPLAIN output is showing a Sort, what if at
some point in the future someone adjusts some costing code and the
plan changes to an Index Scan. If there's no comment to state that
we're looking for a Sort plan, then the author of the patch that's
adjusting the costs might just think it's ok to change the expected
plan to an Index Scan. I've seen this problem occur even when the
comments *do* exist. There's just about no hope of such a test
continuing to do what it's meant to if the comments don't exist.
Thanks for clarifying this out, I will freely add comments if that helps
to explain things better.
--
Regards,
Ankit Kumar Pandey
On 07/01/23 09:58, David Rowley wrote:
The attached patch has no tests added. It's going to need some of
those.
While writing test cases, I found that optimization do not happen for
case #1
(which is prime candidate for such operation) like
EXPLAIN (COSTS OFF)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date
This happens because mutual exclusiveness of two operands (when number
of window functions > 1) viz
is_sorted and last activeWindow in the condition:
( !is_sorted && lnext(activeWindows, l) == NULL)
For 2nd last window function, is_sorted is false and path keys get added.
In next run (for last window function), is_sorted becomes true and whole
optimization
part is skipped.
Note: Major issue that if I remove is_sorted from condition, even though
path keys are added, it still do not perform optimization and works same
as in master/unoptimized case.
Perhaps adding path keys at last window function is not doing trick?
Maybe we need to add pathkeys
to all window functions which are subset of query's order by
irrespective of being last or not?
Case #2:
For presorted columns, eg
CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;
EXPLAIN (COSTS OFF)
SELECT empno,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;
Is this correct plan:
a)
QUERY PLAN
-------------------------------------------------------
Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> WindowAgg
-> Index Scan using depname_idx on empsalary
(5 rows)
or this:
b) (Akin to Optimized version)
QUERY PLAN
-------------------------------------------------------
WindowAgg
-> Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> Index Scan using depname_idx on empsalary
(5 rows)
Patched version does (a) because of is_sorted condition.
If we remove both is_sorted and lnext(activeWindows, l) == NULL conditions,
we get correct results in these two cases.
--
Regards,
Ankit Kumar Pandey
On 07/01/23 21:57, Ankit Kumar Pandey wrote:
On 07/01/23 09:58, David Rowley wrote:
The attached patch has no tests added. It's going to need some of
those.While writing test cases, I found that optimization do not happen for
case #1(which is prime candidate for such operation) like
EXPLAIN (COSTS OFF)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_dateThis happens because mutual exclusiveness of two operands (when number
of window functions > 1) vizis_sorted and last activeWindow in the condition:
( !is_sorted && lnext(activeWindows, l) == NULL)
For 2nd last window function, is_sorted is false and path keys get added.
In next run (for last window function), is_sorted becomes true and whole
optimizationpart is skipped.
Note: Major issue that if I remove is_sorted from condition, even though
path keys are added, it still do not perform optimization and works same
as in master/unoptimized case.Perhaps adding path keys at last window function is not doing trick?
Maybe we need to add pathkeysto all window functions which are subset of query's order by
irrespective of being last or not?Case #2:
For presorted columns, eg
CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;
EXPLAIN (COSTS OFF)
SELECT empno,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;Is this correct plan:
a)
QUERY PLAN
-------------------------------------------------------
Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> WindowAgg
-> Index Scan using depname_idx on empsalary
(5 rows)or this:
b) (Akin to Optimized version)
QUERY PLAN
-------------------------------------------------------
WindowAgg
-> Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> Index Scan using depname_idx on empsalary
(5 rows)Patched version does (a) because of is_sorted condition.
If we remove both is_sorted and lnext(activeWindows, l) == NULL conditions,
we get correct results in these two cases.
Attached patch with test cases.
For case #2, test cases still uses (a) as expected output which I don't
think is right
and we should revisit. Other than that, only failing case is due to
issue mentioned in case #1.
Thanks
--
Regards,
Ankit Kumar Pandey
Attachments:
better_windowclause_sorting_dgr-v2.patchtext/x-patch; charset=UTF-8; name=better_windowclause_sorting_dgr-v2.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..781916f219 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4423,6 +4423,7 @@ create_one_window_path(PlannerInfo *root,
PathTarget *window_target;
ListCell *l;
List *topqual = NIL;
+ bool has_runcondition = false;
/*
* Since each window clause could require a different sort order, we stack
@@ -4457,6 +4458,42 @@ create_one_window_path(PlannerInfo *root,
path->pathkeys,
&presorted_keys);
+ has_runcondition |= (wc->runCondition != NIL);
+
+ /*
+ * If not sorted already and the query has an ORDER BY clause, then we
+ * try to push the entire ORDER BY below the final WindowAgg. We
+ * don't do this when the query has a DISTINCT clause as the planner
+ * might opt to do a Hash Aggregate and spoil our efforts here. We
+ * also can't do this when the ORDER BY contains any WindowFuncs as we
+ * obviously can't sort something that's yet to be evaluated. We also
+ * don't bother with this when there is a WindowClauses with a
+ * runCondition as those can reduce the number of rows to sort in the
+ * ORDER BY and we don't want add complexity to the WindowAgg sort if
+ * the final ORDER BY sort is going to have far fewer rows to sort.
+ * For the same reason, don't bother if the query has a LIMIT clause.
+ */
+ if (!is_sorted && lnext(activeWindows, l) == NULL &&
+ root->parse->distinctClause == NIL &&
+ root->parse->sortClause != NIL &&
+ !has_runcondition && root->parse->limitCount == NULL &&
+ !contain_window_function((Node *) get_sortgrouplist_exprs(root->parse->sortClause,
+ root->processed_tlist)))
+ {
+ List *orderby_pathkeys;
+
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ root->parse->sortClause,
+ root->processed_tlist);
+
+ if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+ window_pathkeys = orderby_pathkeys;
+
+ is_sorted = pathkeys_count_contained_in(window_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ }
+
/* Sort if necessary */
if (!is_sorted)
{
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 90e89fb5b6..2b8b5c4533 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3982,7 +3982,158 @@ ORDER BY depname, empno;
-> Seq Scan on empsalary
(11 rows)
+-- Do not perform additional sort if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------
+ Incremental Sort
+ Sort Key: depname, empno
+ Presorted Key: depname
+ -> WindowAgg
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-----------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------
+ Sort
+ Sort Key: empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(6 rows)
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?)), (count(*) OVER (?))
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(10 rows)
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+ QUERY PLAN
+-----------------------------------------------
+ Limit
+ -> Incremental Sort
+ Sort Key: empno, depname
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+ QUERY PLAN
+--------------------------------------------
+ Incremental Sort
+ Sort Key: empno, (row_number() OVER (?))
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(7 rows)
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index b7bd0a83da..1f4264ce03 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1324,8 +1324,84 @@ SELECT DISTINCT
FROM empsalary
ORDER BY depname, empno;
+-- Do not perform additional sort if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
--
2.37.2
On Sun, 8 Jan 2023 at 01:52, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
I am just wondering though, why can we not do top-N sort
in optimized version if we include limit clause? Is top-N sort is
limited to non strict sorting or
cases last operation before limit is sort? .
Maybe the sort bound can be pushed down. You'd need to adjust
ExecSetTupleBound() so that it pushes the bound through
WindowAggState.
David
(your email client still seems broken)
On Sun, 8 Jan 2023 at 05:27, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
While writing test cases, I found that optimization do not happen for
case #1(which is prime candidate for such operation) like
EXPLAIN (COSTS OFF)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_dateThis happens because mutual exclusiveness of two operands (when number
of window functions > 1) vizis_sorted and last activeWindow in the condition:
( !is_sorted && lnext(activeWindows, l) == NULL)
For 2nd last window function, is_sorted is false and path keys get added.
In next run (for last window function), is_sorted becomes true and whole
optimizationpart is skipped.
Note: Major issue that if I remove is_sorted from condition, even though
path keys are added, it still do not perform optimization and works same
as in master/unoptimized case.Perhaps adding path keys at last window function is not doing trick?
Maybe we need to add pathkeysto all window functions which are subset of query's order by
irrespective of being last or not?
You might need to have another loop before the foreach loop that loops
backwards through the WindowClauses and remembers the index of the
WindowClause which has pathkeys contained in the query's ORDER BY
pathkeys then apply the optimisation from that point in the main
foreach loop. Also, if the condition within the foreach loop which
checks when we want to apply this optimisation is going to be run > 1
time, then you should probably have boolean variable that's set
before the loop which saves if we're going to try to apply the
optimisation. That'll save from having to check things like if the
query has a LIMIT clause multiple times.
Case #2:
For presorted columns, eg
CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;
EXPLAIN (COSTS OFF)
SELECT empno,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;Is this correct plan:
a)
QUERY PLAN
-------------------------------------------------------
Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> WindowAgg
-> Index Scan using depname_idx on empsalary
(5 rows)or this:
b) (Akin to Optimized version)
QUERY PLAN
-------------------------------------------------------
WindowAgg
-> Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> Index Scan using depname_idx on empsalary
(5 rows)Patched version does (a) because of is_sorted condition.
a) looks like the best plan to me. What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.
David
On Sun, 8 Jan 2023 at 05:45, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Attached patch with test cases.
I can look at this in a bit more detail if you find a way to fix the
case you mentioned earlier. i.e, push the sort down to the deepest
WindowAgg that has pathkeys contained in the query's ORDER BY
pathkeys.
EXPLAIN (COSTS OFF)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
QUERY PLAN
-----------------------------------------------
Incremental Sort
Sort Key: depname, empno, enroll_date
Presorted Key: depname, empno
-> WindowAgg
-> WindowAgg
-> Sort
Sort Key: depname, empno
-> Seq Scan on empsalary
(8 rows)
You'll also need to pay attention to how the has_runcondition is set.
If you start pushing before looking at all the WindowClauses then you
won't know if some later WindowClause has a runCondition. Adding an
additional backwards foreach loop should allow you to do all the
required prechecks and find the index of the WindowClause which you
should start pushing from.
David
On 08/01/23 03:56, David Rowley wrote:
(your email client still seems broken)
I am looking at this again, will be changing client for here onward.
You might need to have another loop before the foreach loop that loops
backwards through the WindowClauses and remembers the index of the
WindowClause which has pathkeys contained in the query's ORDER BY
pathkeys then apply the optimisation from that point in the main
foreach loop. Also, if the condition within the foreach loop which
checks when we want to apply this optimisation is going to be run > 1
time, then you should probably have boolean variable that's set
before the loop which saves if we're going to try to apply the
optimisation. That'll save from having to check things like if the
query has a LIMIT clause multiple times.
Thanks, this should do the trick.
a) looks like the best plan to me. What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.
Yes, you are right, not in this case. I actually mentioned wrong case here,
real problematic case is:
EXPLAIN (COSTS OFF)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
QUERY PLAN
-------------------------------------------------------------------
Incremental Sort
Sort Key: depname, empno, enroll_date
Presorted Key: depname, empno
-> WindowAgg
-> WindowAgg
-> Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> Index Scan using depname_idx on empsalary
(9 rows)
Here, it could have sorted on depname, empno, enroll_date.
Again, as I mentioned before, this is implementation issue. We shouldn't be
skipping optimization if pre-sorted keys are present.
--
Regards,
Ankit Kumar Pandey
On 08/01/23 04:06, David Rowley wrote:
On Sun, 8 Jan 2023 at 05:45, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Attached patch with test cases.
I can look at this in a bit more detail if you find a way to fix the
case you mentioned earlier. i.e, push the sort down to the deepest
WindowAgg that has pathkeys contained in the query's ORDER BY
pathkeys.
EXPLAIN (COSTS OFF)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
QUERY PLAN
-----------------------------------------------
Incremental Sort
Sort Key: depname, empno, enroll_date
Presorted Key: depname, empno
-> WindowAgg
-> WindowAgg
-> Sort
Sort Key: depname, empno
-> Seq Scan on empsalary
(8 rows)You'll also need to pay attention to how the has_runcondition is set.
If you start pushing before looking at all the WindowClauses then you
won't know if some later WindowClause has a runCondition.
Yes, this should be the main culprit.
Adding an additional backwards foreach loop should allow you to do all the
required prechecks and find the index of the WindowClause which you
should start pushing from.
This should do the trick. Thanks for headup, I will update the patch
with suggested
changes and required fixes.
Regards,
Ankit
Hi David,
Please find attached patch with addressed issues mentioned before.
Things resolved:
1. Correct position of window function from where order by push down can
happen
is determined, this fixes issue mentioned in case #1.
While writing test cases, I found that optimization do not happen for
case #1(which is prime candidate for such operation) like
EXPLAIN (COSTS OFF)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date
2. Point #2 as in above discussions
a) looks like the best plan to me. What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.
Works as mentioned.
All test cases (newly added and existing ones) are green.
--
Regards,
Ankit Kumar Pandey
Attachments:
better_windowclause_sorting_dgr-v3.patchtext/x-patch; charset=UTF-8; name=better_windowclause_sorting_dgr-v3.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..fd3fcd43a0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4423,6 +4423,12 @@ create_one_window_path(PlannerInfo *root,
PathTarget *window_target;
ListCell *l;
List *topqual = NIL;
+ bool has_runcondition = false;
+
+ List *window_pathkeys;
+ int index = 0;
+ int i;
+ bool enable_order_by_pushdown = false;
/*
* Since each window clause could require a different sort order, we stack
@@ -4441,10 +4447,45 @@ create_one_window_path(PlannerInfo *root,
*/
window_target = input_target;
+ enable_order_by_pushdown = (root->parse->distinctClause == NIL &&
+ root->parse->sortClause != NIL &&
+ root->parse->limitCount == NULL &&
+ !contain_window_function((Node *) get_sortgrouplist_exprs(root->parse->sortClause,
+ root->processed_tlist)));
+
+
+ i=0;
+ if (enable_order_by_pushdown)
+ {
+ /*
+ * Search for the window function whose path keys are
+ * subset of orderby_path keys, this allows us to perform
+ * order by pushdown from this position of window function onwards
+ */
+ foreach(l, activeWindows)
+ {
+ List *orderby_pathkeys;
+ WindowClause *wc = lfirst_node(WindowClause, l);
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ root->parse->sortClause,
+ root->processed_tlist);
+ window_pathkeys = make_pathkeys_for_window(root,
+ wc,
+ root->processed_tlist);
+ if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+ {
+
+ index = i;
+ break;
+ }
+ i++;
+ }
+ }
+
+ i=0;
foreach(l, activeWindows)
{
WindowClause *wc = lfirst_node(WindowClause, l);
- List *window_pathkeys;
int presorted_keys;
bool is_sorted;
bool topwindow;
@@ -4457,6 +4498,38 @@ create_one_window_path(PlannerInfo *root,
path->pathkeys,
&presorted_keys);
+ has_runcondition |= (wc->runCondition != NIL);
+
+ /*
+ * If not sorted already and the query has an ORDER BY clause, then we
+ * try to push the entire ORDER BY below the final WindowAgg. We
+ * don't do this when the query has a DISTINCT clause as the planner
+ * might opt to do a Hash Aggregate and spoil our efforts here. We
+ * also can't do this when the ORDER BY contains any WindowFuncs as we
+ * obviously can't sort something that's yet to be evaluated. We also
+ * don't bother with this when there is a WindowClauses with a
+ * runCondition as those can reduce the number of rows to sort in the
+ * ORDER BY and we don't want add complexity to the WindowAgg sort if
+ * the final ORDER BY sort is going to have far fewer rows to sort.
+ * For the same reason, don't bother if the query has a LIMIT clause.
+ */
+ if (enable_order_by_pushdown && !is_sorted && !has_runcondition && (i == index))
+ {
+ List *orderby_pathkeys;
+
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ root->parse->sortClause,
+ root->processed_tlist);
+
+ if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+ window_pathkeys = orderby_pathkeys;
+
+ is_sorted = pathkeys_count_contained_in(window_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ }
+ i++;
+
/* Sort if necessary */
if (!is_sorted)
{
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 90e89fb5b6..08663fd914 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3982,7 +3982,194 @@ ORDER BY depname, empno;
-> Seq Scan on empsalary
(11 rows)
+-- Do not perform additional sort if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------
+ Incremental Sort
+ Sort Key: depname, empno
+ Presorted Key: depname
+ -> WindowAgg
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-------------------------------------------------------
+ WindowAgg
+ -> Incremental Sort
+ Sort Key: depname, empno, enroll_date
+ Presorted Key: depname
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-----------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------
+ Sort
+ Sort Key: empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(6 rows)
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?)), (count(*) OVER (?))
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(10 rows)
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+ QUERY PLAN
+-----------------------------------------------
+ Limit
+ -> Incremental Sort
+ Sort Key: empno, depname
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+ QUERY PLAN
+--------------------------------------------
+ Incremental Sort
+ Sort Key: empno, (row_number() OVER (?))
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(7 rows)
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index b7bd0a83da..274b7fed7d 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1324,8 +1324,102 @@ SELECT DISTINCT
FROM empsalary
ORDER BY depname, empno;
+-- Do not perform additional sort if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
--
2.37.2
On Sun, 8 Jan 2023 at 23:21, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Please find attached patch with addressed issues mentioned before.
Here's a quick review:
1. You can use foreach_current_index(l) to obtain the index of the list element.
2. I'd rather see you looping backwards over the list in the first
loop. I think it'll be more efficient to loop backwards as you can
just break out the loop when you see the pathkeys are not contained in
the order by pathkreys. When the optimisation does not apply that
means you only need to look at the last item in the list. You could
maybe just invent foreach_reverse() for this purpose and put it in
pg_list.h. That'll save having to manually code up the loop.
3. I don't think you should call the variable
enable_order_by_pushdown. We've a bunch of planner related GUCs that
start with enable_. That might cause a bit of confusion. Maybe just
try_sort_pushdown.
4. You should widen the scope of orderby_pathkeys and set it within
the if (enable_order_by_pushdown) scope. You can reuse this again in
the 2nd loop too. Just initialise it to NIL
5. You don't seem to be checking all WindowClauses for a runCondtion.
If you do #2 above then you can start that process in the initial
reverse loop so that you've checked them all by the time you get
around to that WindowClause again in the 2nd loop.
6. The test with "+-- Do not perform additional sort if column is
presorted". I don't think "additional" is the correct word here. I
think you want to ensure that we don't push down the ORDER BY below
the WindowAgg for this case. There is no ability to reduce the sorts
here, only move them around, which we agreed we don't want to do for
this case.
Also, do you want to have a go at coding up the sort bound pushdown
too? It'll require removing the limitCount restriction and adjusting
ExecSetTupleBound() to recurse through a WindowAggState. I think it's
pretty easy. You could try it then play around with it to make sure it
works and we get the expected performance. You'll likely want to add a
few more rows than the last performance test you did or run the query
with pgbench. Running a query once that only takes 1-2ms is likely not
a reliable way to test the performance of something.
David
On 1/8/23 11:21, Ankit Kumar Pandey wrote:
Please find attached patch with addressed issues mentioned before.
I am curious about this plan:
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
Why aren't min() and sum() calculated on the same WindowAgg run?
--
Vik Fearing
On 08/01/23 21:36, Vik Fearing wrote:
On 1/8/23 11:21, Ankit Kumar Pandey wrote:
Please find attached patch with addressed issues mentioned before.
I am curious about this plan:
+-- ORDER BY's in multiple Window functions can be combined into one +-- if they are subset of QUERY's ORDER BY +EXPLAIN (COSTS OFF) +SELECT empno, + depname, + min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary, + sum(salary) OVER (PARTITION BY depname) depsalary, + count(*) OVER (ORDER BY enroll_date DESC) c +FROM empsalary +ORDER BY depname, empno, enroll_date; + QUERY PLAN +------------------------------------------------------ + WindowAgg + -> WindowAgg + -> Sort + Sort Key: depname, empno, enroll_date + -> WindowAgg + -> Sort + Sort Key: enroll_date DESC + -> Seq Scan on empsalary +(8 rows) +
Why aren't min() and sum() calculated on the same WindowAgg run?
Isn't that exactly what is happening here? First count() with sort on
enroll_date is run and
then min() and sum()?
Only difference between this and plan generated by master(given below)
is a sort in the end.
QUERY PLAN
------------------------------------------------------------
Incremental Sort
Sort Key: depname, empno, enroll_date
Presorted Key: depname, empno
-> WindowAgg
-> WindowAgg
-> Sort
Sort Key: depname, empno
-> WindowAgg
-> Sort
Sort Key: enroll_date DESC
-> Seq Scan on empsalary
Let me know if I am missing anything. Thanks.
--
Regards,
Ankit Kumar Pandey
On 08/01/23 16:33, David Rowley wrote:
On Sun, 8 Jan 2023 at 23:21, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Please find attached patch with addressed issues mentioned before.
Here's a quick review:
1. You can use foreach_current_index(l) to obtain the index of the list element.
2. I'd rather see you looping backwards over the list in the first
loop. I think it'll be more efficient to loop backwards as you can
just break out the loop when you see the pathkeys are not contained in
the order by pathkreys. When the optimisation does not apply that
means you only need to look at the last item in the list. You could
maybe just invent foreach_reverse() for this purpose and put it in
pg_list.h. That'll save having to manually code up the loop.3. I don't think you should call the variable
enable_order_by_pushdown. We've a bunch of planner related GUCs that
start with enable_. That might cause a bit of confusion. Maybe just
try_sort_pushdown.4. You should widen the scope of orderby_pathkeys and set it within
the if (enable_order_by_pushdown) scope. You can reuse this again in
the 2nd loop too. Just initialise it to NIL5. You don't seem to be checking all WindowClauses for a runCondtion.
If you do #2 above then you can start that process in the initial
reverse loop so that you've checked them all by the time you get
around to that WindowClause again in the 2nd loop.6. The test with "+-- Do not perform additional sort if column is
presorted". I don't think "additional" is the correct word here. I
think you want to ensure that we don't push down the ORDER BY below
the WindowAgg for this case. There is no ability to reduce the sorts
here, only move them around, which we agreed we don't want to do for
this case.
I have addressed all points 1-6 in the attached patch.
I have one doubt regarding runCondition, do we only need to ensure
that window function which has subset sort clause of main query should
not have runCondition or none of the window functions should not contain
runCondition? I have gone with later case but wanted to clarify.
Also, do you want to have a go at coding up the sort bound pushdown
too? It'll require removing the limitCount restriction and adjusting
ExecSetTupleBound() to recurse through a WindowAggState. I think it's
pretty easy. You could try it then play around with it to make sure it
works and we get the expected performance.
I tried this in the patch but kept getting `retrieved too many tuples in
a bounded sort`.
Added following code in ExecSetTupleBound which correctly found sortstate
and set bound value.
else if(IsA(child_node, WindowAggState))
{
WindowAggState *winstate = (WindowAggState *) child_node;
if (outerPlanState(winstate))
ExecSetTupleBound(tuples_needed, outerPlanState(winstate));
}
I think problem is that are not using limit clause inside window
function (which
may need to scan all tuples) so passing bound value to
WindowAggState->sortstate
is not working as we might expect. Or maybe I am getting it wrong? I was
trying to
have top-N sort for limit clause if orderby pushdown happens.
You'll likely want to add a few more rows than the last performance test you did or run the query
with pgbench. Running a query once that only takes 1-2ms is likely not
a reliable way to test the performance of something.
I did some profiling.
CREATE TABLE empsalary1 (
depname varchar,
empno bigint,
salary int,
enroll_date date
);
INSERT INTO empsalary1(depname, empno, salary, enroll_date)
SELECT string_agg (substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', ceil (random() * 62)::integer, 1000), '')
AS depname, generate_series(1, 10000000) AS empno, ceil (random()*10000)::integer AS salary
, NOW() + (random() * (interval '90 days')) + '30 days' AS enroll_date;
1) Optimization case
EXPLAIN (ANALYZE)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary,
count(*) OVER (ORDER BY enroll_date DESC) c
FROM empsalary1
ORDER BY depname, empno, enroll_date;
EXPLAIN (ANALYZE)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary1
ORDER BY depname, empno;
In patched version:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
-----------------------
WindowAgg (cost=93458.04..93458.08 rows=1 width=61) (actual time=149996.006..156756.995 rows=10000000 loops=1)
-> WindowAgg (cost=93458.04..93458.06 rows=1 width=57) (actual time=108559.126..135892.188 rows=10000000 loops=1)
-> Sort (cost=93458.04..93458.04 rows=1 width=53) (actual time=108554.213..112564.168 rows=10000000 loops=1)
Sort Key: depname, empno, enroll_date
Sort Method: external merge Disk: 645856kB
-> WindowAgg (cost=93458.01..93458.03 rows=1 width=53) (actual time=30386.551..62357.669 rows=10000000 lo
ops=1)
-> Sort (cost=93458.01..93458.01 rows=1 width=45) (actual time=23260.104..26313.395 rows=10000000 l
oops=1)
Sort Key: enroll_date DESC
Sort Method: external merge Disk: 528448kB
-> Seq Scan on empsalary1 (cost=0.00..93458.00 rows=1 width=45) (actual time=0.032..4833.603
rows=10000000 loops=1)
Planning Time: 4.693 ms
Execution Time: 158161.281 ms
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
-------------
WindowAgg (cost=1903015.63..2078015.74 rows=10000006 width=39) (actual time=40565.305..46598.984 rows=10000000 loops=1)
-> Sort (cost=1903015.63..1928015.65 rows=10000006 width=39) (actual time=23411.837..27467.962 rows=10000000 loops=1)
Sort Key: depname, empno
Sort Method: external merge Disk: 528448kB
-> Seq Scan on empsalary1 (cost=0.00..193458.06 rows=10000006 width=39) (actual time=5.095..5751.675 rows=10000
000 loops=1)
Planning Time: 0.099 ms
Execution Time: 47415.926 ms
In master:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
-------------------------------------
Incremental Sort (cost=3992645.36..4792645.79 rows=10000006 width=59) (actual time=147130.132..160985.373 rows=10000000
loops=1)
Sort Key: depname, empno, enroll_date
Presorted Key: depname, empno
Full-sort Groups: 312500 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB
-> WindowAgg (cost=3992645.31..4342645.52 rows=10000006 width=59) (actual time=147129.936..154023.147 rows=10000000 l
oops=1)
-> WindowAgg (cost=3992645.31..4192645.43 rows=10000006 width=55) (actual time=104665.289..133089.188 rows=1000
0000 loops=1)
-> Sort (cost=3992645.31..4017645.33 rows=10000006 width=51) (actual time=104665.257..108710.282 rows=100
00000 loops=1)
Sort Key: depname, empno
Sort Method: external merge Disk: 645856kB
-> WindowAgg (cost=1971370.63..2146370.74 rows=10000006 width=51) (actual time=28314.300..59737.949
rows=10000000 loops=1)
-> Sort (cost=1971370.63..1996370.65 rows=10000006 width=43) (actual time=21190.188..24098.59
6 rows=10000000 loops=1)
Sort Key: enroll_date DESC
Sort Method: external merge Disk: 528448kB
-> Seq Scan on empsalary1 (cost=0.00..193458.06 rows=10000006 width=43) (actual time=0.
630..5317.862 rows=10000000 loops=1)
Planning Time: 0.982 ms
Execution Time: 163369.242 ms
(16 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
-------------------
Incremental Sort (cost=3787573.31..3912573.41 rows=10000006 width=39) (actual time=51547.195..53950.034 rows=10000000 lo
ops=1)
Sort Key: depname, empno
Presorted Key: depname
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB
Pre-sorted Groups: 1 Sort Method: external merge Average Disk: 489328kB Peak Disk: 489328kB
-> WindowAgg (cost=1903015.63..2078015.74 rows=10000006 width=39) (actual time=33413.954..39771.262 rows=10000000 loo
ps=1)
-> Sort (cost=1903015.63..1928015.65 rows=10000006 width=39) (actual time=18991.129..21992.353 rows=10000000 lo
ops=1)
Sort Key: depname
Sort Method: external merge Disk: 528456kB
-> Seq Scan on empsalary1 (cost=0.00..193458.06 rows=10000006 width=39) (actual time=1.300..5269.729 rows
=10000000 loops=1)
Planning Time: 4.506 ms
Execution Time: 54768.697 ms
2) No optimization case:
EXPLAIN (ANALYZE)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
FROM empsalary1
ORDER BY enroll_date;
Patch:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
--------------------------------
Sort (cost=3968191.79..3993191.80 rows=10000006 width=43) (actual
time=57863.173..60976.324 rows=10000000 loops=1)
Sort Key: enroll_date
Sort Method: external merge Disk: 528448kB
-> WindowAgg (cost=850613.62..2190279.21 rows=10000006 width=43)
(actual time=7478.966..42502.541 rows=10000000 loops
=1)
-> Gather Merge (cost=850613.62..2015279.11 rows=10000006
width=43) (actual time=7478.935..18037.001 rows=10000
000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=849613.60..860030.27 rows=4166669
width=43) (actual time=7349.101..9397.713 rows=3333333 lo
ops=3)
Sort Key: depname, empno
Sort Method: external merge Disk: 181544kB
Worker 0: Sort Method: external merge Disk: 169328kB
Worker 1: Sort Method: external merge Disk: 177752kB
-> Parallel Seq Scan on empsalary1
(cost=0.00..135124.69 rows=4166669 width=43) (actual time=0.213.
.2450.635 rows=3333333 loops=3)
Planning Time: 0.100 ms
Execution Time: 63341.783 ms
master:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
--------------------------------
Sort (cost=3968191.79..3993191.80 rows=10000006 width=43) (actual
time=54097.880..57000.806 rows=10000000 loops=1)
Sort Key: enroll_date
Sort Method: external merge Disk: 528448kB
-> WindowAgg (cost=850613.62..2190279.21 rows=10000006 width=43)
(actual time=7075.245..39200.756 rows=10000000 loops
=1)
-> Gather Merge (cost=850613.62..2015279.11 rows=10000006
width=43) (actual time=7075.217..15988.922 rows=10000
000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=849613.60..860030.27 rows=4166669
width=43) (actual time=6993.974..8799.701 rows=3333333 lo
ops=3)
Sort Key: depname, empno
Sort Method: external merge Disk: 171904kB
Worker 0: Sort Method: external merge Disk: 178496kB
Worker 1: Sort Method: external merge Disk: 178224kB
-> Parallel Seq Scan on empsalary1
(cost=0.00..135124.69 rows=4166669 width=43) (actual time=0.044.
.2683.598 rows=3333333 loops=3)
Planning Time: 5.718 ms
Execution Time: 59188.469 ms
(15 rows)
Master and patch have same performance as plan is same.
pgbench (this is to find average performance):
create table empsalary2 as select * from empsalary1 limit 1000;
-------------------------------------------------------------------
test.sql
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary,
count(*) OVER (ORDER BY enroll_date DESC) c
FROM empsalary2
ORDER BY depname, empno, enroll_date;
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary2
ORDER BY depname, empno;
----------------------------------------------------------------------
/usr/local/pgsql/bin/pgbench -d test -c 10 -j 4 -t 1000 -f test.sql
Patch:
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 4
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 55.262 ms
initial connection time = 8.480 ms
tps = 180.957685 (without initial connection time)
Master:
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 4
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 60.489 ms
initial connection time = 7.069 ms
tps = 165.320205 (without initial connection time)
TPS of patched version is higher than that of master's for same set of
queries.
where optimization is performed.
--
Regards,
Ankit Kumar Pandey
Attachments:
better_windowclause_sorting_dgr-v4.patchtext/x-patch; charset=UTF-8; name=better_windowclause_sorting_dgr-v4.patchDownload
From f3dd5a27f63030dea8d1530896daf4ba1c8a5265 Mon Sep 17 00:00:00 2001
From: Ankit Kumar Pandey <itsankitkp@gmail.com>
Date: Sat, 7 Jan 2023 19:10:24 +0530
Subject: [PATCH] Additional sort for root's order by can be optimized if
sorting for extra columns are performed while sorting for window functions
(if sort clause of window functions are subset of root's sort clause).
---
src/backend/optimizer/plan/planner.c | 68 +++++++++-
src/include/nodes/pg_list.h | 14 ++
src/test/regress/expected/window.out | 187 +++++++++++++++++++++++++++
src/test/regress/sql/window.sql | 94 ++++++++++++++
4 files changed, 362 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..10d68e2fc3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4423,6 +4423,13 @@ create_one_window_path(PlannerInfo *root,
PathTarget *window_target;
ListCell *l;
List *topqual = NIL;
+ bool has_runcondition = false;
+
+ List *window_pathkeys;
+ List *orderby_pathkeys = NIL;
+ int index = 0;
+ int i;
+ bool try_sort_pushdown = false;
/*
* Since each window clause could require a different sort order, we stack
@@ -4441,10 +4448,41 @@ create_one_window_path(PlannerInfo *root,
*/
window_target = input_target;
+ try_sort_pushdown = (root->parse->distinctClause == NIL &&
+ root->parse->sortClause != NIL &&
+ root->parse->limitCount == NULL &&
+ !contain_window_function((Node *) get_sortgrouplist_exprs(root->parse->sortClause,
+ root->processed_tlist)));
+
+
+ if (try_sort_pushdown)
+ {
+ /*
+ * Search for the window function whose path keys are
+ * subset of orderby_path keys, this allows us to perform
+ * order by pushdown from this position of window function onwards
+ */
+ foreach_reverse(l, activeWindows)
+ {
+ WindowClause *wc = lfirst_node(WindowClause, l);
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ root->parse->sortClause,
+ root->processed_tlist);
+ window_pathkeys = make_pathkeys_for_window(root,
+ wc,
+ root->processed_tlist);
+ has_runcondition |= (wc->runCondition != NIL);
+ if (!pathkeys_contained_in(window_pathkeys, orderby_pathkeys) || has_runcondition)
+ break;
+
+ index = foreach_current_index(l);
+ }
+ }
+
+ i=0;
foreach(l, activeWindows)
{
WindowClause *wc = lfirst_node(WindowClause, l);
- List *window_pathkeys;
int presorted_keys;
bool is_sorted;
bool topwindow;
@@ -4457,6 +4495,34 @@ create_one_window_path(PlannerInfo *root,
path->pathkeys,
&presorted_keys);
+ /*
+ * If not sorted already and the query has an ORDER BY clause, then we
+ * try to push the entire ORDER BY below the final WindowAgg. We
+ * don't do this when the query has a DISTINCT clause as the planner
+ * might opt to do a Hash Aggregate and spoil our efforts here. We
+ * also can't do this when the ORDER BY contains any WindowFuncs as we
+ * obviously can't sort something that's yet to be evaluated. We also
+ * don't bother with this when there is a WindowClauses with a
+ * runCondition as those can reduce the number of rows to sort in the
+ * ORDER BY and we don't want add complexity to the WindowAgg sort if
+ * the final ORDER BY sort is going to have far fewer rows to sort.
+ * For the same reason, don't bother if the query has a LIMIT clause.
+ */
+ if (try_sort_pushdown && !is_sorted && (i == index))
+ {
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ root->parse->sortClause,
+ root->processed_tlist);
+
+ if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+ window_pathkeys = orderby_pathkeys;
+
+ is_sorted = pathkeys_count_contained_in(window_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ }
+ i++;
+
/* Sort if necessary */
if (!is_sorted)
{
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 529a382d28..82d108f672 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -378,6 +378,20 @@ lnext(const List *l, const ListCell *c)
(cell = NULL, false); \
cell##__state.i++)
+/*
+ * foreach_reverse -
+ * a convenience macro for looping through a list in reverse
+ *
+ * This is equivalent to foreach, only in reverse
+ */
+#define foreach_reverse(cell, lst) \
+ for (ForEachState cell##__state = {(lst), cell##__state.l->length-1}; \
+ (cell##__state.l != NIL && \
+ cell##__state.i >= 0) ? \
+ (cell = &cell##__state.l->elements[cell##__state.i], true) : \
+ (cell = NULL, false); \
+ cell##__state.i--)
+
/*
* foreach_delete_current -
* delete the current list element from the List associated with a
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 90e89fb5b6..d83fea8091 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3982,7 +3982,194 @@ ORDER BY depname, empno;
-> Seq Scan on empsalary
(11 rows)
+-- Do not perform sort pushdown if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------
+ Incremental Sort
+ Sort Key: depname, empno
+ Presorted Key: depname
+ -> WindowAgg
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-------------------------------------------------------
+ WindowAgg
+ -> Incremental Sort
+ Sort Key: depname, empno, enroll_date
+ Presorted Key: depname
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-----------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------
+ Sort
+ Sort Key: empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(6 rows)
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?)), (count(*) OVER (?))
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(10 rows)
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+ QUERY PLAN
+-----------------------------------------------
+ Limit
+ -> Incremental Sort
+ Sort Key: empno, depname
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+ QUERY PLAN
+--------------------------------------------
+ Incremental Sort
+ Sort Key: empno, (row_number() OVER (?))
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(7 rows)
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index b7bd0a83da..e989e93737 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1324,8 +1324,102 @@ SELECT DISTINCT
FROM empsalary
ORDER BY depname, empno;
+-- Do not perform sort pushdown if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
--
2.37.2
On 1/8/23 18:05, Ankit Kumar Pandey wrote:
On 08/01/23 21:36, Vik Fearing wrote:
On 1/8/23 11:21, Ankit Kumar Pandey wrote:
Please find attached patch with addressed issues mentioned before.
I am curious about this plan:
+-- ORDER BY's in multiple Window functions can be combined into one +-- if they are subset of QUERY's ORDER BY +EXPLAIN (COSTS OFF) +SELECT empno, + depname, + min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary, + sum(salary) OVER (PARTITION BY depname) depsalary, + count(*) OVER (ORDER BY enroll_date DESC) c +FROM empsalary +ORDER BY depname, empno, enroll_date; + QUERY PLAN +------------------------------------------------------ + WindowAgg + -> WindowAgg + -> Sort + Sort Key: depname, empno, enroll_date + -> WindowAgg + -> Sort + Sort Key: enroll_date DESC + -> Seq Scan on empsalary +(8 rows) +Why aren't min() and sum() calculated on the same WindowAgg run?
Isn't that exactly what is happening here? First count() with sort on
enroll_date is run andthen min() and sum()?
No, there are two passes over the window for those two but I don't see
that there needs to be.
Only difference between this and plan generated by master(given below)
is a sort in the end.
Then this is probably not this patch's job to fix.
--
Vik Fearing
On Mon, 9 Jan 2023 at 05:06, Vik Fearing <vik@postgresfriends.org> wrote:
+EXPLAIN (COSTS OFF) +SELECT empno, + depname, + min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary, + sum(salary) OVER (PARTITION BY depname) depsalary, + count(*) OVER (ORDER BY enroll_date DESC) c +FROM empsalary +ORDER BY depname, empno, enroll_date; + QUERY PLAN +------------------------------------------------------ + WindowAgg + -> WindowAgg + -> Sort + Sort Key: depname, empno, enroll_date + -> WindowAgg + -> Sort + Sort Key: enroll_date DESC + -> Seq Scan on empsalary
Why aren't min() and sum() calculated on the same WindowAgg run?
We'd need to have an ORDER BY per WindowFunc rather than per
WindowClause to do that. The problem is when there is no ORDER BY,
all rows are peers.
Likely there likely are a bunch more optimisations we could do in that
area. I think all the builtin window functions (not aggregates being
used as window functions) don't care about peer rows, so it may be
possible to merge the WindowClauses when the WIndowClause being merged
only has window functions that don't care about peer rows. Not for
this patch though.
David
On Mon, 9 Jan 2023 at 06:17, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
I have addressed all points 1-6 in the attached patch.
A few more things:
1. You're still using the 'i' variable in the foreach loop.
foreach_current_index() will work.
2. I think the "index" variable needs a better name. sort_pushdown_idx maybe.
3. I don't think you need to set "index" on every loop. Why not just
set it to foreach_current_index(l) - 1; break;
4. You're still setting orderby_pathkeys in the foreach loop. That's
already been set above and it won't have changed.
5. I don't think there's any need to check pathkeys_contained_in() in
the foreach loop anymore. With #3 the index will be -1 if the
optimisation cannot apply. You could likely also get rid of
try_sort_pushdown too and just make the condition "if
(sort_pushdown_idx == foreach_current_index(l))". I'm a little unsure
why there's still the is_sorted check there. Shouldn't that always be
false now that you're looping until the pathkeys don't match in the
foreach_reverse loop?
Correct me if I'm wrong as I've not tested, but I think the new code
in the foreach loop can just become:
if (sort_pushdown_idx == foreach_current_index(l))
{
Assert(!is_sorted);
window_pathkeys = orderby_pathkeys;
is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys, &presorted_keys);
}
I have one doubt regarding runCondition, do we only need to ensure
that window function which has subset sort clause of main query should
not have runCondition or none of the window functions should not contain
runCondition? I have gone with later case but wanted to clarify.
Actually, maybe it's ok just to check the top-level WindowClause for
runConditions. It's only that one that'll filter rows. That probably
simplifies the code quite a bit. Lower-level runConditions only serve
to halt the evaluation of WindowFuncs when the runCondition is no
longer met.
Also, do you want to have a go at coding up the sort bound pushdown
too? It'll require removing the limitCount restriction and adjusting
ExecSetTupleBound() to recurse through a WindowAggState. I think it's
pretty easy. You could try it then play around with it to make sure it
works and we get the expected performance.I tried this in the patch but kept getting `retrieved too many tuples in
a bounded sort`.Added following code in ExecSetTupleBound which correctly found sortstate
and set bound value.
else if(IsA(child_node, WindowAggState))
{
WindowAggState *winstate = (WindowAggState *) child_node;
if (outerPlanState(winstate))
ExecSetTupleBound(tuples_needed, outerPlanState(winstate));
}
I think problem is that are not using limit clause inside window
function (which
may need to scan all tuples) so passing bound value to
WindowAggState->sortstate
is not working as we might expect. Or maybe I am getting it wrong? I was
trying to
have top-N sort for limit clause if orderby pushdown happens.
hmm, perhaps the Limit would have to be put between the WindowAgg and
Sort for it to work. Maybe that's more complexity than it's worth.
David
On 09/01/23 03:48, David Rowley wrote:
On Mon, 9 Jan 2023 at 06:17, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
I have addressed all points 1-6 in the attached patch.
A few more things:
1. You're still using the 'i' variable in the foreach loop.
foreach_current_index() will work.
2. I think the "index" variable needs a better name. sort_pushdown_idx maybe.
Done these (1 & 2)
3. I don't think you need to set "index" on every loop. Why not just
set it to foreach_current_index(l) - 1; break;
Consider this query
EXPLAIN (COSTS OFF)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary,
count(*) OVER (ORDER BY enroll_date DESC) c
FROM empsalary
ORDER BY depname, empno, enroll_date;
Here, W1 = min(salary) OVER (PARTITION BY depname ORDER BY empno) W2 =
sum(salary) OVER (PARTITION BY depname)
W3 = count(*) OVER (ORDER BY enroll_date DESC)
(1,2,3 are winref).
activeWindows = [W3, W1, W2]
If we iterate from reverse and break at first occurrence, we will
break at W2 and add extra keys there, but what we want it to add
keys at W1 so that it gets spilled to W2 (as existing logic is designed to
carry over sorted cols first to last).
4. You're still setting orderby_pathkeys in the foreach loop. That's
already been set above and it won't have changed.
5. I don't think there's any need to check pathkeys_contained_in() in
the foreach loop anymore. With #3 the index will be -1 if the
optimisation cannot apply. You could likely also get rid of
try_sort_pushdown too and just make the condition "if
(sort_pushdown_idx == foreach_current_index(l))".
Done this.
Added pathkeys_contained_in as an assert, hope that's okay.
I'm a little unsure why there's still the is_sorted check there.
Shouldn't that always be false now that you're looping until the pathkeys
don't match in the foreach_reverse loop?
Removing is_sorted causes issue if there is matching pathkey which is
presorted
eg this case
-- Do not perform sort pushdown if column is presorted
CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;
EXPLAIN (COSTS OFF)
SELECT empno,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;
We can move this to if (try_sort_pushdown) block but it looks to me bit
ugly.
Nevertheless, it make sense to have it here, sort_pushdown_index should
point to exact
window function which needs to be modified. Having extra check (for
is_sorted) in 2nd foreach loop
adds ambiguity if we don't add it in first check.
foreach_reverse(l, activeWindows)
{
WindowClause *wc = lfirst_node(WindowClause, l);
orderby_pathkeys = make_pathkeys_for_sortclauses(root,root->parse->sortClause,root->processed_tlist);
window_pathkeys = make_pathkeys_for_window(root,wc,root->processed_tlist);
is_sorted = pathkeys_count_contained_in(window_pathkeys,path->pathkeys,&presorted_keys);
has_runcondition |= (wc->runCondition != NIL);
if (!pathkeys_contained_in(window_pathkeys, orderby_pathkeys) || has_runcondition)
break;
if(!is_sorted)
sort_pushdown_idx = foreach_current_index(l);
}
Tests passes on this so logically it is ok.
Correct me if I'm wrong as I've not tested, but I think the new code
in the foreach loop can just become:if (sort_pushdown_idx == foreach_current_index(l))
{
Assert(!is_sorted);
window_pathkeys = orderby_pathkeys;
is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys, &presorted_keys);
}
Depending on where we have is_sorted (as mentioned above) it looks a lot
like you mentioned.
Also, we can add Assert(pathkeys_contained_in(window_pathkeys,
orderby_pathkeys))
I have one doubt regarding runCondition, do we only need to ensure
that window function which has subset sort clause of main query should
not have runCondition or none of the window functions should not contain
runCondition? I have gone with later case but wanted to clarify.
Actually, maybe it's ok just to check the top-level WindowClause for
runConditions. It's only that one that'll filter rows. That probably
simplifies the code quite a bit. Lower-level runConditions only serve
to halt the evaluation of WindowFuncs when the runCondition is no
longer met.
Okay, then this approach makes sense.
hmm, perhaps the Limit would have to be put between the WindowAgg and
Sort for it to work. Maybe that's more complexity than it's worth.
Yes, not specific to this change. It is more around allowing top-N sort in
window functions (in general). Once we have it there, then this could be
taken care of.
I have attached patch which fixes 1 & 2 and rearrange is_sorted.
Point #3 needs to be resolved (and perhaps another way to handle is_sorted)
Thanks,
--
Regards,
Ankit Kumar Pandey
Attachments:
better_windowclause_sorting_dgr-v5.patchtext/x-patch; charset=UTF-8; name=better_windowclause_sorting_dgr-v5.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..7e7f95d929 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4423,6 +4423,14 @@ create_one_window_path(PlannerInfo *root,
PathTarget *window_target;
ListCell *l;
List *topqual = NIL;
+ bool has_runcondition = false;
+
+ List *window_pathkeys;
+ List *orderby_pathkeys = NIL;
+ int sort_pushdown_idx = -1;
+ int presorted_keys;
+ bool is_sorted;
+ bool try_sort_pushdown = false;
/*
* Since each window clause could require a different sort order, we stack
@@ -4441,12 +4449,43 @@ create_one_window_path(PlannerInfo *root,
*/
window_target = input_target;
+ try_sort_pushdown = (root->parse->distinctClause == NIL &&
+ root->parse->sortClause != NIL &&
+ root->parse->limitCount == NULL &&
+ !contain_window_function((Node *) get_sortgrouplist_exprs(root->parse->sortClause,
+ root->processed_tlist)));
+
+
+ if (try_sort_pushdown)
+ {
+ /*
+ * Search for the window function whose path keys are
+ * subset of orderby_path keys, this allows us to perform
+ * order by pushdown from this position of window function onwards
+ */
+ foreach_reverse(l, activeWindows)
+ {
+ WindowClause *wc = lfirst_node(WindowClause, l);
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ root->parse->sortClause,
+ root->processed_tlist);
+ window_pathkeys = make_pathkeys_for_window(root,
+ wc,
+ root->processed_tlist);
+ is_sorted = pathkeys_count_contained_in(window_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ has_runcondition |= (wc->runCondition != NIL);
+ if (!pathkeys_contained_in(window_pathkeys, orderby_pathkeys) || has_runcondition)
+ break;
+ if (!is_sorted)
+ sort_pushdown_idx = foreach_current_index(l);
+ }
+ }
+
foreach(l, activeWindows)
{
WindowClause *wc = lfirst_node(WindowClause, l);
- List *window_pathkeys;
- int presorted_keys;
- bool is_sorted;
bool topwindow;
window_pathkeys = make_pathkeys_for_window(root,
@@ -4457,6 +4496,30 @@ create_one_window_path(PlannerInfo *root,
path->pathkeys,
&presorted_keys);
+ /*
+ * If not sorted already and the query has an ORDER BY clause, then we
+ * try to push the entire ORDER BY below the final WindowAgg. We
+ * don't do this when the query has a DISTINCT clause as the planner
+ * might opt to do a Hash Aggregate and spoil our efforts here. We
+ * also can't do this when the ORDER BY contains any WindowFuncs as we
+ * obviously can't sort something that's yet to be evaluated. We also
+ * don't bother with this when there is a WindowClauses with a
+ * runCondition as those can reduce the number of rows to sort in the
+ * ORDER BY and we don't want add complexity to the WindowAgg sort if
+ * the final ORDER BY sort is going to have far fewer rows to sort.
+ * For the same reason, don't bother if the query has a LIMIT clause.
+ */
+ if (foreach_current_index(l) == sort_pushdown_idx)
+ {
+ Assert(!is_sorted);
+ Assert(pathkeys_contained_in(window_pathkeys, orderby_pathkeys));
+
+ window_pathkeys = orderby_pathkeys;
+ is_sorted = pathkeys_count_contained_in(window_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ }
+
/* Sort if necessary */
if (!is_sorted)
{
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 529a382d28..82d108f672 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -378,6 +378,20 @@ lnext(const List *l, const ListCell *c)
(cell = NULL, false); \
cell##__state.i++)
+/*
+ * foreach_reverse -
+ * a convenience macro for looping through a list in reverse
+ *
+ * This is equivalent to foreach, only in reverse
+ */
+#define foreach_reverse(cell, lst) \
+ for (ForEachState cell##__state = {(lst), cell##__state.l->length-1}; \
+ (cell##__state.l != NIL && \
+ cell##__state.i >= 0) ? \
+ (cell = &cell##__state.l->elements[cell##__state.i], true) : \
+ (cell = NULL, false); \
+ cell##__state.i--)
+
/*
* foreach_delete_current -
* delete the current list element from the List associated with a
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 90e89fb5b6..d83fea8091 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3982,7 +3982,194 @@ ORDER BY depname, empno;
-> Seq Scan on empsalary
(11 rows)
+-- Do not perform sort pushdown if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------
+ Incremental Sort
+ Sort Key: depname, empno
+ Presorted Key: depname
+ -> WindowAgg
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-------------------------------------------------------
+ WindowAgg
+ -> Incremental Sort
+ Sort Key: depname, empno, enroll_date
+ Presorted Key: depname
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-----------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------
+ Sort
+ Sort Key: empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(6 rows)
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?)), (count(*) OVER (?))
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(10 rows)
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+ QUERY PLAN
+-----------------------------------------------
+ Limit
+ -> Incremental Sort
+ Sort Key: empno, depname
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+ QUERY PLAN
+--------------------------------------------
+ Incremental Sort
+ Sort Key: empno, (row_number() OVER (?))
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(7 rows)
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index b7bd0a83da..e989e93737 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1324,8 +1324,102 @@ SELECT DISTINCT
FROM empsalary
ORDER BY depname, empno;
+-- Do not perform sort pushdown if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
--
2.37.2
On Mon, 9 Jan 2023 at 21:34, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
On 09/01/23 03:48, David Rowley wrote:
3. I don't think you need to set "index" on every loop. Why not justset it to foreach_current_index(l) - 1; break;
Consider this query
EXPLAIN (COSTS OFF)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary,
count(*) OVER (ORDER BY enroll_date DESC) c
FROM empsalary
ORDER BY depname, empno, enroll_date;Here, W1 = min(salary) OVER (PARTITION BY depname ORDER BY empno) W2 =
sum(salary) OVER (PARTITION BY depname)W3 = count(*) OVER (ORDER BY enroll_date DESC)
(1,2,3 are winref).
activeWindows = [W3, W1, W2]
If we iterate from reverse and break at first occurrence, we will
break at W2 and add extra keys there, but what we want it to add
keys at W1 so that it gets spilled to W2 (as existing logic is designed to
carry over sorted cols first to last).
We need to keep looping backwards until we find the first WindowClause
which does not contain the pathkeys of the ORDER BY. When we find a
WindowClause that does not contain the pathkeys of the ORDER BY, then
we must set the sort_pushdown_idx to the index of the prior
WindowClause. I couldn't quite understand why the foreach() loop's
condition couldn't just be "if (foreach_current_index(l) ==
sort_pushdown_idx)", but I see that if we don't check if the path is
already correctly sorted that we could end up pushing the sort down
onto the path that's already correctly sorted. We decided we didn't
want to move the sort around if it does not reduce the amount of
sorting.
I had to try this out for myself and I've ended up with the attached
v6 patch. All the tests you added still pass. Although, I didn't
really study the tests yet to see if everything we talked about is
covered.
It turned out the sort_pushdown_idx = foreach_current_index(l) - 1;
break; didn't work as if all the WindowClauses have pathkeys contained
in the order by pathkeys then we don't ever set sort_pushdown_idx. I
adjusted it to do:
if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
sort_pushdown_idx = foreach_current_index(l);
else
break;
I also fixed up the outdated comments and changed it so we only set
orderby_pathkeys once instead of once per loop in the
foreach_reverse() loop.
I gave some thought to whether doing foreach_delete_current() is safe
within a foreach_reverse() loop. I didn't try it, but I couldn't see
any reason why not. It is pretty late here and I'd need to test that
to be certain. If it turns out not to be safe then we need to document
that fact in the comments of the foreach_reverse() macro and the
foreach_delete_current() macro.
David
Attachments:
better_windowclause_sorting_dgr-v6.patchtext/plain; charset=US-ASCII; name=better_windowclause_sorting_dgr-v6.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..cf59ebfe87 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4421,8 +4421,14 @@ create_one_window_path(PlannerInfo *root,
List *activeWindows)
{
PathTarget *window_target;
+ Query *parse = root->parse;
ListCell *l;
List *topqual = NIL;
+ List *window_pathkeys;
+ List *orderby_pathkeys = NIL;
+ int sort_pushdown_idx = -1;
+ int presorted_keys;
+ bool is_sorted;
/*
* Since each window clause could require a different sort order, we stack
@@ -4441,17 +4447,99 @@ create_one_window_path(PlannerInfo *root,
*/
window_target = input_target;
+ /*
+ * Here we attempt to minimize the number of sorts which must be performed
+ * for queries with an ORDER BY clause.
+ *
+ * It's possible due to select_active_windows() putting the WindowClauses
+ * with the lowest tleSortGroupRef last in the activeWindows list that the
+ * final WindowClause has a subset of the sort requirements that the
+ * query's ORDER BY clause has. Below we try to detect this case and if
+ * we find this is true, we consider adjusting the sort that's done for
+ * WindowAggs and include the additional clauses so that no additional
+ * sorting is required for the query's ORDER BY clause.
+ *
+ * We don't try this optimization for the following cases:
+ *
+ * 1. If the query has a DISTINCT clause. We needn't waste any
+ * additional effort for the more strict sort order as if DISTINCT
+ * is done via Hash Aggregate then that's going to undo this work.
+ *
+ * 2. If the query has a LIMIT clause. The top-level sort will be
+ * able to use a top-n sort which might be more efficient than
+ * sorting by the additional columns. If the LIMIT does not reduce
+ * the number of rows significantly then this might not be true, but
+ * we don't try to consider that here.
+ *
+ * 3. If the top-level WindowClause has a runCondition then this can
+ * filter out tuples and make the final sort cheaper. If we pushed
+ * the sort down below the WindowAgg then we'd need to sort all rows
+ * including ones that the runCondition might filter. This may waste
+ * effort so we just don't try to push down the sort for this case.
+ *
+ * 4. If the ORDER BY contains any expressions containing WindowFuncs
+ * then we can't push down the sort as these obviously must be
+ * evaluated before they can be sorted.
+ */
+ if (parse->sortClause != NIL && parse->distinctClause == NIL &&
+ parse->limitCount == NULL &&
+ linitial_node(WindowClause, activeWindows)->runCondition == NIL &&
+ !contain_window_function((Node *) get_sortgrouplist_exprs(parse->sortClause,
+ root->processed_tlist)))
+ {
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ parse->sortClause,
+ root->processed_tlist);
+
+ /*
+ * Loop backwards over the WindowClauses looking for the first
+ * WindowClause which has pathkeys not contained in the
+ * orderby_pathkeys. What we're looking for here is the lowest level
+ * that we push the additional sort requirements down into. If we
+ * fail here then sort_pushdown_idx will remain at -1.
+ */
+ foreach_reverse(l, activeWindows)
+ {
+ WindowClause *wc = lfirst_node(WindowClause, l);
+
+ window_pathkeys = make_pathkeys_for_window(root,
+ wc,
+ root->processed_tlist);
+
+ if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+ sort_pushdown_idx = foreach_current_index(l);
+ else
+ break;
+ }
+ }
+
foreach(l, activeWindows)
{
WindowClause *wc = lfirst_node(WindowClause, l);
- List *window_pathkeys;
- int presorted_keys;
- bool is_sorted;
bool topwindow;
window_pathkeys = make_pathkeys_for_window(root,
- wc,
- root->processed_tlist);
+ wc,
+ root->processed_tlist);
+
+ /*
+ * When we get to the sort_pushdown_idx WindowClause, if the path is
+ * not already correctly sorted, then just use the pathkeys for the
+ * query's ORDER BY clause instead of the WindowClause's pathkeys.
+ * When the path is already correctly sorted there's no point in
+ * adjusting the pathkeys as this just moves the sort down without
+ * actually reducing the number of sorts which are required in the
+ * plan overall. sort_pushdown_idx will be -1 and never match when
+ * this optimization was disabled above.
+ */
+ if (foreach_current_index(l) == sort_pushdown_idx &&
+ !pathkeys_contained_in(window_pathkeys, path->pathkeys))
+ {
+ /* check to make sure sort_pushdown_idx was set correctly */
+ Assert(pathkeys_contained_in(window_pathkeys, orderby_pathkeys));
+
+ window_pathkeys = orderby_pathkeys;
+ }
is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys,
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 529a382d28..0de9d73dac 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -378,14 +378,30 @@ lnext(const List *l, const ListCell *c)
(cell = NULL, false); \
cell##__state.i++)
+/*
+ * foreach_reverse -
+ * a convenience macro for looping through a list in reverse
+ *
+ * This is equivalent to foreach, only it loops over the list starting with
+ * the last element and ends on the first element.
+ */
+#define foreach_reverse(cell, lst) \
+ for (ForEachState cell##__state = {(lst), cell##__state.l->length - 1}; \
+ (cell##__state.l != NIL && \
+ cell##__state.i >= 0) ? \
+ (cell = &cell##__state.l->elements[cell##__state.i], true) : \
+ (cell = NULL, false); \
+ cell##__state.i--)
+
/*
* foreach_delete_current -
* delete the current list element from the List associated with a
- * surrounding foreach() loop, returning the new List pointer.
+ * surrounding foreach() or foreach_reverse() loop, returning the new List
+* pointer.
*
* This is equivalent to list_delete_cell(), but it also adjusts the foreach
- * loop's state so that no list elements will be missed. Do not delete
- * elements from an active foreach loop's list in any other way!
+ * or foreach_reverse loop's state so that no list elements will be missed.
+ * Do not delete elements from an active foreach loop's list in any other way!
*/
#define foreach_delete_current(lst, cell) \
(cell##__state.i--, \
@@ -393,8 +409,9 @@ lnext(const List *l, const ListCell *c)
/*
* foreach_current_index -
- * get the zero-based list index of a surrounding foreach() loop's
- * current element; pass the name of the "ListCell *" iterator variable.
+ * get the zero-based index within the list of the current element in the
+ * surrounding foreach() or foreach_reverse() loop; pass the name of the
+ * "ListCell *" iterator variable.
*
* Beware of using this after foreach_delete_current(); the value will be
* out of sync for the rest of the current loop iteration. Anyway, since
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 90e89fb5b6..d83fea8091 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3982,7 +3982,194 @@ ORDER BY depname, empno;
-> Seq Scan on empsalary
(11 rows)
+-- Do not perform sort pushdown if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------
+ Incremental Sort
+ Sort Key: depname, empno
+ Presorted Key: depname
+ -> WindowAgg
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-------------------------------------------------------
+ WindowAgg
+ -> Incremental Sort
+ Sort Key: depname, empno, enroll_date
+ Presorted Key: depname
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-----------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------
+ Sort
+ Sort Key: empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(6 rows)
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?)), (count(*) OVER (?))
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(10 rows)
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+ QUERY PLAN
+-----------------------------------------------
+ Limit
+ -> Incremental Sort
+ Sort Key: empno, depname
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+ QUERY PLAN
+--------------------------------------------
+ Incremental Sort
+ Sort Key: empno, (row_number() OVER (?))
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(7 rows)
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index b7bd0a83da..e989e93737 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1324,8 +1324,102 @@ SELECT DISTINCT
FROM empsalary
ORDER BY depname, empno;
+-- Do not perform sort pushdown if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
On 09/01/23 17:53, David Rowley wrote:
We need to keep looping backwards until we find the first WindowClause
which does not contain the pathkeys of the ORDER BY.
I found the cause of confusion, *first* WindowClause means first from
forward direction. Since we are looping from backward, I took it first
from the last.
eg:
select count(*) over (order by a), count(*) over (order by a,b),
count(*) over (order by a,b,c) from abcd order by a,b,c,d;
first window clause is count(*) over (order by a) which we are using for
order-by pushdown.
This is in sync with implementation as well.
I couldn't quite understand why the foreach() loop's
condition couldn't just be "if (foreach_current_index(l) ==
sort_pushdown_idx)", but I see that if we don't check if the path is
already correctly sorted that we could end up pushing the sort down
onto the path that's already correctly sorted. We decided we didn't
want to move the sort around if it does not reduce the amount of
sorting.
Yes, this was the reason, the current patch handles this without is_sort
now, which is great.
All the tests you added still pass. Although, I didn't
really study the tests yet to see if everything we talked about is
covered.
It covers general cases and exceptions. Also, I did few additional
tests. Looked good.
It turned out the sort_pushdown_idx = foreach_current_index(l) - 1;
break; didn't work as if all the WindowClauses have pathkeys contained
in the order by pathkeys then we don't ever set sort_pushdown_idx. I
adjusted it to do:
if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
sort_pushdown_idx = foreach_current_index(l);
else
break;
Yes, that would have been problematic. I have verified this case
and on related note, I have added a test case that ensure order by pushdown
shouldn't happen if window function's order by is superset of query's
order by.
I also fixed up the outdated comments and changed it so we only set
orderby_pathkeys once instead of once per loop in the
foreach_reverse() loop.
Thanks, code look a lot neater now (is_sorted is gone and handled in a
better way).
I gave some thought to whether doing foreach_delete_current() is safe
within a foreach_reverse() loop. I didn't try it, but I couldn't see
any reason why not. It is pretty late here and I'd need to test that
to be certain. If it turns out not to be safe then we need to document
that fact in the comments of the foreach_reverse() macro and the
foreach_delete_current() macro.
I tested foreach_delete_current inside foreach_reverse loop.
It worked fine.
I have attached patch with one extra test case (as mentioned above).
Rest of the changes are looking fine.
Ran pgbench again and optimized version still had a lead (168 tps vs 135
tps) in performance.
Do we have any pending items for this patch now?
--
Regards,
Ankit Kumar Pandey
Attachments:
better_windowclause_sorting_dgr-v7.patchtext/x-patch; charset=UTF-8; name=better_windowclause_sorting_dgr-v7.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 000d757bdd..cf59ebfe87 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4421,8 +4421,14 @@ create_one_window_path(PlannerInfo *root,
List *activeWindows)
{
PathTarget *window_target;
+ Query *parse = root->parse;
ListCell *l;
List *topqual = NIL;
+ List *window_pathkeys;
+ List *orderby_pathkeys = NIL;
+ int sort_pushdown_idx = -1;
+ int presorted_keys;
+ bool is_sorted;
/*
* Since each window clause could require a different sort order, we stack
@@ -4441,17 +4447,99 @@ create_one_window_path(PlannerInfo *root,
*/
window_target = input_target;
+ /*
+ * Here we attempt to minimize the number of sorts which must be performed
+ * for queries with an ORDER BY clause.
+ *
+ * It's possible due to select_active_windows() putting the WindowClauses
+ * with the lowest tleSortGroupRef last in the activeWindows list that the
+ * final WindowClause has a subset of the sort requirements that the
+ * query's ORDER BY clause has. Below we try to detect this case and if
+ * we find this is true, we consider adjusting the sort that's done for
+ * WindowAggs and include the additional clauses so that no additional
+ * sorting is required for the query's ORDER BY clause.
+ *
+ * We don't try this optimization for the following cases:
+ *
+ * 1. If the query has a DISTINCT clause. We needn't waste any
+ * additional effort for the more strict sort order as if DISTINCT
+ * is done via Hash Aggregate then that's going to undo this work.
+ *
+ * 2. If the query has a LIMIT clause. The top-level sort will be
+ * able to use a top-n sort which might be more efficient than
+ * sorting by the additional columns. If the LIMIT does not reduce
+ * the number of rows significantly then this might not be true, but
+ * we don't try to consider that here.
+ *
+ * 3. If the top-level WindowClause has a runCondition then this can
+ * filter out tuples and make the final sort cheaper. If we pushed
+ * the sort down below the WindowAgg then we'd need to sort all rows
+ * including ones that the runCondition might filter. This may waste
+ * effort so we just don't try to push down the sort for this case.
+ *
+ * 4. If the ORDER BY contains any expressions containing WindowFuncs
+ * then we can't push down the sort as these obviously must be
+ * evaluated before they can be sorted.
+ */
+ if (parse->sortClause != NIL && parse->distinctClause == NIL &&
+ parse->limitCount == NULL &&
+ linitial_node(WindowClause, activeWindows)->runCondition == NIL &&
+ !contain_window_function((Node *) get_sortgrouplist_exprs(parse->sortClause,
+ root->processed_tlist)))
+ {
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ parse->sortClause,
+ root->processed_tlist);
+
+ /*
+ * Loop backwards over the WindowClauses looking for the first
+ * WindowClause which has pathkeys not contained in the
+ * orderby_pathkeys. What we're looking for here is the lowest level
+ * that we push the additional sort requirements down into. If we
+ * fail here then sort_pushdown_idx will remain at -1.
+ */
+ foreach_reverse(l, activeWindows)
+ {
+ WindowClause *wc = lfirst_node(WindowClause, l);
+
+ window_pathkeys = make_pathkeys_for_window(root,
+ wc,
+ root->processed_tlist);
+
+ if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+ sort_pushdown_idx = foreach_current_index(l);
+ else
+ break;
+ }
+ }
+
foreach(l, activeWindows)
{
WindowClause *wc = lfirst_node(WindowClause, l);
- List *window_pathkeys;
- int presorted_keys;
- bool is_sorted;
bool topwindow;
window_pathkeys = make_pathkeys_for_window(root,
- wc,
- root->processed_tlist);
+ wc,
+ root->processed_tlist);
+
+ /*
+ * When we get to the sort_pushdown_idx WindowClause, if the path is
+ * not already correctly sorted, then just use the pathkeys for the
+ * query's ORDER BY clause instead of the WindowClause's pathkeys.
+ * When the path is already correctly sorted there's no point in
+ * adjusting the pathkeys as this just moves the sort down without
+ * actually reducing the number of sorts which are required in the
+ * plan overall. sort_pushdown_idx will be -1 and never match when
+ * this optimization was disabled above.
+ */
+ if (foreach_current_index(l) == sort_pushdown_idx &&
+ !pathkeys_contained_in(window_pathkeys, path->pathkeys))
+ {
+ /* check to make sure sort_pushdown_idx was set correctly */
+ Assert(pathkeys_contained_in(window_pathkeys, orderby_pathkeys));
+
+ window_pathkeys = orderby_pathkeys;
+ }
is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys,
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 529a382d28..0de9d73dac 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -378,14 +378,30 @@ lnext(const List *l, const ListCell *c)
(cell = NULL, false); \
cell##__state.i++)
+/*
+ * foreach_reverse -
+ * a convenience macro for looping through a list in reverse
+ *
+ * This is equivalent to foreach, only it loops over the list starting with
+ * the last element and ends on the first element.
+ */
+#define foreach_reverse(cell, lst) \
+ for (ForEachState cell##__state = {(lst), cell##__state.l->length - 1}; \
+ (cell##__state.l != NIL && \
+ cell##__state.i >= 0) ? \
+ (cell = &cell##__state.l->elements[cell##__state.i], true) : \
+ (cell = NULL, false); \
+ cell##__state.i--)
+
/*
* foreach_delete_current -
* delete the current list element from the List associated with a
- * surrounding foreach() loop, returning the new List pointer.
+ * surrounding foreach() or foreach_reverse() loop, returning the new List
+* pointer.
*
* This is equivalent to list_delete_cell(), but it also adjusts the foreach
- * loop's state so that no list elements will be missed. Do not delete
- * elements from an active foreach loop's list in any other way!
+ * or foreach_reverse loop's state so that no list elements will be missed.
+ * Do not delete elements from an active foreach loop's list in any other way!
*/
#define foreach_delete_current(lst, cell) \
(cell##__state.i--, \
@@ -393,8 +409,9 @@ lnext(const List *l, const ListCell *c)
/*
* foreach_current_index -
- * get the zero-based list index of a surrounding foreach() loop's
- * current element; pass the name of the "ListCell *" iterator variable.
+ * get the zero-based index within the list of the current element in the
+ * surrounding foreach() or foreach_reverse() loop; pass the name of the
+ * "ListCell *" iterator variable.
*
* Beware of using this after foreach_delete_current(); the value will be
* out of sync for the rest of the current loop iteration. Anyway, since
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 90e89fb5b6..fe0f8a039b 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3982,7 +3982,210 @@ ORDER BY depname, empno;
-> Seq Scan on empsalary
(11 rows)
+-- Do not perform sort pushdown if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------
+ Incremental Sort
+ Sort Key: depname, empno
+ Presorted Key: depname
+ -> WindowAgg
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-------------------------------------------------------
+ WindowAgg
+ -> Incremental Sort
+ Sort Key: depname, empno, enroll_date
+ Presorted Key: depname
+ -> Index Scan using depname_idx on empsalary
+(5 rows)
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-----------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------
+ Sort
+ Sort Key: empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(6 rows)
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?)), (count(*) OVER (?))
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(10 rows)
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+ QUERY PLAN
+-----------------------------------------------
+ Limit
+ -> Incremental Sort
+ Sort Key: empno, depname
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+ QUERY PLAN
+--------------------------------------------
+ Incremental Sort
+ Sort Key: empno, (row_number() OVER (?))
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(7 rows)
+
+-- Case #4: When WindowFunc ORDER BY is superset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname;
+ QUERY PLAN
+-----------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno
+ -> Seq Scan on empsalary
+(5 rows)
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index b7bd0a83da..665e156920 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1324,8 +1324,110 @@ SELECT DISTINCT
FROM empsalary
ORDER BY depname, empno;
+-- Do not perform sort pushdown if column is presorted
+CREATE INDEX depname_idx ON empsalary(depname);
+SET enable_seqscan=0;
+
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Do perform sort pushdown if columns are only partially sorted
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+DROP INDEX depname_idx;
+RESET enable_seqscan;
RESET enable_hashagg;
+-- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
+-- QUERY's ORDER BY to save additional sorting in the end
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (ORDER BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Same as above, but PARTITION BY clause is included as well
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Negation of above, when sort optimization is not needed to be performed
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY empno, enroll_date;
+
+-- ORDER BY's in multiple Window functions can be combined into one
+-- if they are subset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- order of window function shouldn't affect query's ORDER BY pushdown
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- SORT OPTIMIZATION (as in cases above) is not needed to be performed
+-- in cases, as mentioned below.
+
+-- Case #1: When distinct clause is included
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Case #2: When Limit clause is specified
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname LIMIT 1;
+
+-- Case #3: When ORDER BY contains any WindowFuncs
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+
+-- Case #4: When WindowFunc ORDER BY is superset of QUERY's ORDER BY
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname;
+
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
--
2.37.2
On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Do we have any pending items for this patch now?
I'm just wondering if not trying this when the query has a DISTINCT
clause is a copout. What I wanted to avoid was doing additional
sorting work for WindowAgg just to have it destroyed by Hash
Aggregate. I'm now wondering if adding both the original
slightly-less-sorted path plus the new slightly-more-sorted path then
if distinct decides to Hash Aggregate then it'll still be able to pick
the cheapest input path to do that on. Unfortunately, our sort
costing just does not seem to be advanced enough to know that sorting
by fewer columns might be cheaper, so adding the additional path is
likely just going to result in add_path() ditching the old
slightly-less-sorted path due to the new slightly-more-sorted path
having better pathkeys. So, we'd probably be wasting our time if we
added both paths with the current sort costing code.
# explain analyze select * from pg_Class order by relkind,relname;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Sort (cost=36.01..37.04 rows=412 width=273) (actual
time=0.544..0.567 rows=412 loops=1)
Sort Key: relkind, relname
Sort Method: quicksort Memory: 109kB
-> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=273)
(actual time=0.014..0.083 rows=412 loops=1)
Planning Time: 0.152 ms
Execution Time: 0.629 ms
(6 rows)
# explain analyze select * from pg_Class order by relkind;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Sort (cost=36.01..37.04 rows=412 width=273) (actual
time=0.194..0.218 rows=412 loops=1)
Sort Key: relkind
Sort Method: quicksort Memory: 109kB
-> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=273)
(actual time=0.014..0.083 rows=412 loops=1)
Planning Time: 0.143 ms
Execution Time: 0.278 ms
(6 rows)
the total cost is the same for both of these, but the execution time
seems to vary quite a bit.
Maybe we should try and do this for DISTINCT queries if the
distinct_pathkeys match the orderby_pathkeys. That seems a little less
copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
likely that the ORDER BY might opt to use the Unique path for DISTINCT
since it'll already have the correct pathkeys. However, if the ORDER
BY has fewer columns then it might be cheaper to Hash Aggregate and
then sort all over again, especially so when the DISTINCT removes a
large proportion of the rows.
Ideally, our sort costing would just be better, but I think that
raises the bar a little too high to start thinking of making
improvements to that for this patch.
David
David Rowley <dgrowleyml@gmail.com> writes:
Ideally, our sort costing would just be better, but I think that
raises the bar a little too high to start thinking of making
improvements to that for this patch.
It's trickier than it looks, cf f4c7c410e. But if you just want
to add a small correction based on number of columns being sorted
by, that seems within reach. See the comment for cost_sort though.
Also, I suppose for incremental sorts we'd want to consider only
the number of newly-sorted columns, but I'm not sure if that info
is readily at hand either.
regards, tom lane
On 10/01/23 10:53, David Rowley wrote:
On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Do we have any pending items for this patch now?
I'm just wondering if not trying this when the query has a DISTINCT
clause is a copout. What I wanted to avoid was doing additional
sorting work for WindowAgg just to have it destroyed by Hash
Aggregate. I'm now wondering if adding both the original
slightly-less-sorted path plus the new slightly-more-sorted path then
if distinct decides to Hash Aggregate then it'll still be able to pick
the cheapest input path to do that on. Unfortunately, our sort
costing just does not seem to be advanced enough to know that sorting
by fewer columns might be cheaper, so adding the additional path is
likely just going to result in add_path() ditching the old
slightly-less-sorted path due to the new slightly-more-sorted path
having better pathkeys. So, we'd probably be wasting our time if we
added both paths with the current sort costing code.
Maybe we should try and do this for DISTINCT queries if the
distinct_pathkeys match the orderby_pathkeys. That seems a little less
copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
likely that the ORDER BY might opt to use the Unique path for DISTINCT
since it'll already have the correct pathkeys. However, if the ORDER
BY has fewer columns then it might be cheaper to Hash Aggregate and
then sort all over again, especially so when the DISTINCT removes a
large proportion of the rows.Ideally, our sort costing would just be better, but I think that
raises the bar a little too high to start thinking of making
improvements to that for this patch.
Let me take a stab at this. Depending on complexity, we can take
a call to address this in current patch or a follow up.
--
Regards,
Ankit Kumar Pandey
On 10/01/23 10:53, David Rowley wrote:
the total cost is the same for both of these, but the execution time
seems to vary quite a bit.
This is really weird, I tried it different ways (to rule out any issues
due to
caching) and execution time varied in spite of having same cost.
Maybe we should try and do this for DISTINCT queries if the
distinct_pathkeys match the orderby_pathkeys. That seems a little less
copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
likely that the ORDER BY might opt to use the Unique path for DISTINCT
since it'll already have the correct pathkeys.
However, if the ORDER BY has fewer columns then it might be cheaper to Hash Aggregate and
then sort all over again, especially so when the DISTINCT removes a
large proportion of the rows.
Isn't order by pathkeys are always fewer than distinct pathkeys?
distinct pathkeys = order by pathkeys + window functions pathkeys
Again, I got your point which that it is okay to pushdown order by clause
if distinct is doing unique sort. But problem is (atleast from what I am
facing),
distinct is not caring about pushed down sortkeys, it goes with hashagg
or unique
with some other logic (mentioned below).
Consider following (with distinct clause restriction removed)
if (parse->distinctClause)
{
List* distinct_pathkeys = make_pathkeys_for_sortclauses(root, parse->distinctClause, root->processed_tlist);
if (!compare_pathkeys(distinct_pathkeys, orderby_pathkeys)==1) // distinct key > order by key
skip = true; // this is used to skip order by pushdown
}
CASE #1:
explain (costs off) select distinct a,b, min(a) over (partition by a), sum (a) over (partition by a) from abcd order by a,b;
QUERY PLAN
-----------------------------------------------------------
Sort
Sort Key: a, b
-> HashAggregate
Group Key: a, b, min(a) OVER (?), sum(a) OVER (?)
-> WindowAgg
-> Sort
Sort Key: a, b
-> Seq Scan on abcd
(8 rows)
explain (costs off) select distinct a,b,c, min(a) over (partition by a), sum (a) over (partition by a) from abcd order by a,b,c;
QUERY PLAN
--------------------------------------------------------------
Sort
Sort Key: a, b, c
-> HashAggregate
Group Key: a, b, c, min(a) OVER (?), sum(a) OVER (?)
-> WindowAgg
-> Sort
Sort Key: a, b, c
-> Seq Scan on abcd
(8 rows)
No matter how many columns are pushed down, it does hashagg.
On the other hand:
CASE #2:
EXPLAIN (costs off) SELECT DISTINCT depname, empno, min(salary) OVER (PARTITION BY depname) depminsalary,sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno;
QUERY PLAN
----------------------------------------------------------------------------------
Unique
-> Sort
Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?))
-> WindowAgg
-> Sort
Sort Key: depname, empno
-> Seq Scan on empsalary
(7 rows)
EXPLAIN (costs off) SELECT DISTINCT depname, empno, enroll_date, min(salary) OVER (PARTITION BY depname) depminsalary,sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Unique
-> Sort
Sort Key: depname, empno, enroll_date, (min(salary) OVER (?)), (sum(salary) OVER (?))
-> WindowAgg
-> Sort
Sort Key: depname, empno, enroll_date
-> Seq Scan on empsalary
(7 rows)
It keeps doing Unique.
In both of the cases, compare_pathkeys(distinct_pathkeys,
orderby_pathkeys) returns 1
Looking bit further, planner is choosing things correctly.
I could see cost of unique being higher in 1st case and lower in 2nd case.
But the point is, if sort for orderby is pushdown, shouldn't there be
some discount on
cost of Unique sort (so that there is more possibility of it being
favorable compared to HashAgg in certain cases)?
Again, cost of Unqiue node is taken as cost of sort node as it is, but
for HashAgg, new cost is being computed. If we do incremental sort here
(for unique node),
as we have pushed down orderby's, unique cost could be reduced and our
optimization could
be made worthwhile (I assume this is what you intended here) in case of
distinct.
Eg:
EXPLAIN SELECT DISTINCT depname, empno, enroll_date, min(salary) OVER (PARTITION BY depname) depminsalary,sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Unique (cost=1.63..1.78 rows=10 width=56)
-> Sort (cost=1.63..1.66 rows=10 width=56)
Sort Key: depname, empno, enroll_date, (min(salary) OVER (?)), (sum(salary) OVER (?))
-> WindowAgg (cost=1.27..1.47 rows=10 width=56)
-> Sort (cost=1.27..1.29 rows=10 width=48)
Sort Key: depname, empno, enroll_date
-> Seq Scan on empsalary (cost=0.00..1.10 rows=10 width=48)
depname, empno, enroll_date are presorted but still strict sorting is
done on all columns.
Additionally,
the total cost is the same for both of these, but the execution time
seems to vary quite a bit.
Even if I pushdown one or two path keys, end result is same cost (which
isn't helping).
--
Regards,
Ankit Kumar Pandey
On Wed, 11 Jan 2023 at 08:17, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
On 10/01/23 10:53, David Rowley wrote:
the total cost is the same for both of these, but the execution time
seems to vary quite a bit.This is really weird, I tried it different ways (to rule out any issues
due tocaching) and execution time varied in spite of having same cost.
Maybe we should try and do this for DISTINCT queries if the
distinct_pathkeys match the orderby_pathkeys. That seems a little less
copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
likely that the ORDER BY might opt to use the Unique path for DISTINCT
since it'll already have the correct pathkeys.However, if the ORDER BY has fewer columns then it might be cheaper to Hash Aggregate and
then sort all over again, especially so when the DISTINCT removes a
large proportion of the rows.Isn't order by pathkeys are always fewer than distinct pathkeys?
Just thinking about this again, I remembered why I thought DISTINCT
was uninteresting to start with. The problem is that if the query has
WindowFuncs and also has a DISTINCT clause, then the WindowFunc
results *must* be in the DISTINCT clause and, optionally also in the
ORDER BY clause. There's no other place to write WindowFuncs IIRC.
Since we cannot pushdown the sort when the more strict version of the
pathkeys have WindowFuncs, then we must still perform the additional
sort if the planner chooses to do a non-hashed DISTINCT. The aim of
this patch is to reduce the total number of sorts, and I don't think
that's possible in this case as you can't have WindowFuncs in the
ORDER BY when they're not in the DISTINCT clause:
postgres=# select distinct relname from pg_Class order by row_number()
over (order by oid);
ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: select distinct relname from pg_Class order by row_number() ...
Another type of query which is suboptimal still is when there's a
DISTINCT and WindowClause but no ORDER BY. We'll reorder the DISTINCT
clause so that the leading columns of the ORDER BY come first in
transformDistinctClause(), but we've nothing to do the same for
WindowClauses. It can't happen around when transformDistinctClause()
is called as we've yet to decide the WindowClause evaluation order,
so if we were to try to make that better it would maybe have to do in
the upper planner somewhere. It's possible it's too late by that time
to adjust the DISTINCT clause.
Here's an example of it.
# set enable_hashagg=0;
# explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
------------------------------------------------------------------------------------
Unique (cost=61.12..65.24 rows=412 width=73)
-> Sort (cost=61.12..62.15 rows=412 width=73)
Sort Key: relname, relkind, (count(*) OVER (?))
-> WindowAgg (cost=36.01..43.22 rows=412 width=73)
-> Sort (cost=36.01..37.04 rows=412 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=0.00..18.12
rows=412 width=65)
(7 rows)
We can simulate the optimisation by swapping the column order in the
targetlist. Note the planner can use Incremental Sort (at least since
3c6fc5820, from about 2 hours ago)
# explain select distinct relkind,relname,count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
------------------------------------------------------------------------------------
Unique (cost=41.26..65.32 rows=412 width=73)
-> Incremental Sort (cost=41.26..62.23 rows=412 width=73)
Sort Key: relkind, relname, (count(*) OVER (?))
Presorted Key: relkind
-> WindowAgg (cost=36.01..43.22 rows=412 width=73)
-> Sort (cost=36.01..37.04 rows=412 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=0.00..18.12
rows=412 width=65)
(8 rows)
Not sure if we should be trying to improve that in this patch. I just
wanted to identify it as something else that perhaps could be done.
I'm not really all that sure the above query shape makes much sense in
the real world. Would anyone ever want to use DISTINCT on some results
containing WindowFuncs?
David
On Tue, 10 Jan 2023 at 18:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
Ideally, our sort costing would just be better, but I think that
raises the bar a little too high to start thinking of making
improvements to that for this patch.It's trickier than it looks, cf f4c7c410e. But if you just want
to add a small correction based on number of columns being sorted
by, that seems within reach. See the comment for cost_sort though.
Also, I suppose for incremental sorts we'd want to consider only
the number of newly-sorted columns, but I'm not sure if that info
is readily at hand either.
Yeah, I had exactly that in mind when I mentioned about setting the
bar higher. It seems like a worthy enough goal to improve the sort
costs separately from this work. I'm starting to consider if we might
need to revisit cost_sort() anyway. There's been quite a number of
performance improvements made to sort in the past few years and I
don't recall if anything has been done to check if the sort costs are
still realistic. I'm aware that it's a difficult problem as the number
of comparisons is highly dependent on the order of the input rows.
David
On 11/01/23 06:18, David Rowley wrote:
Not sure if we should be trying to improve that in this patch. I just
wanted to identify it as something else that perhaps could be done.
This could be within reach but still original problem of having hashagg
removing
any gains from this remains.
eg
set enable_hashagg=0;
explain select distinct relkind, relname, count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
------------------------------------------------------------------------------------
Unique (cost=41.26..65.32 rows=412 width=73)
-> Incremental Sort (cost=41.26..62.23 rows=412 width=73)
Sort Key: relkind, relname, (count(*) OVER (?))
Presorted Key: relkind
-> WindowAgg (cost=36.01..43.22 rows=412 width=73)
-> Sort (cost=36.01..37.04 rows=412 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=65)
(8 rows)
reset enable_hashagg;
explain select distinct relkind, relname, count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
------------------------------------------------------------------------------
HashAggregate (cost=46.31..50.43 rows=412 width=73)
Group Key: relkind, relname, count(*) OVER (?)
-> WindowAgg (cost=36.01..43.22 rows=412 width=73)
-> Sort (cost=36.01..37.04 rows=412 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=65)
(6 rows)
HashAgg has better cost than Unique even with incremental sort (tried
with other case
where we have more columns pushed down but still hashAgg wins).
explain select distinct a, b, count(*) over (partition by a order by b) from abcd;
QUERY PLAN
--------------------------------------------------------------------------------------
Unique (cost=345712.12..400370.25 rows=1595 width=16)
-> Incremental Sort (cost=345712.12..395456.14 rows=655214 width=16)
Sort Key: a, b, (count(*) OVER (?))
Presorted Key: a, b
-> WindowAgg (cost=345686.08..358790.36 rows=655214 width=16)
-> Sort (cost=345686.08..347324.11 rows=655214 width=8)
Sort Key: a, b
-> Seq Scan on abcd (cost=0.00..273427.14 rows=655214 width=8)
explain select distinct a, b, count(*) over (partition by a order by b) from abcd;
QUERY PLAN
--------------------------------------------------------------------------------
HashAggregate (cost=363704.46..363720.41 rows=1595 width=16)
Group Key: a, b, count(*) OVER (?)
-> WindowAgg (cost=345686.08..358790.36 rows=655214 width=16)
-> Sort (cost=345686.08..347324.11 rows=655214 width=8)
Sort Key: a, b
-> Seq Scan on abcd (cost=0.00..273427.14 rows=655214 width=8)
(6 rows)
I'm not really all that sure the above query shape makes much sense in
the real world. Would anyone ever want to use DISTINCT on some results
containing WindowFuncs?
This could still have been good to have if there were no negative impact
and some benefit in few cases but as mentioned before, if hashagg removes
any sort (which happened due to push down), all gains will be lost
and we will be probably worse off than before.
--
Regards,
Ankit Kumar Pandey
On Wed, 11 Jan 2023 at 19:21, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
HashAgg has better cost than Unique even with incremental sort (tried
with other casewhere we have more columns pushed down but still hashAgg wins).
I don't think you can claim that one so easily. The two should have
quite different scaling characteristics which will be more evident
with a larger number of input rows. Also, Hash Aggregate makes use of
work_mem * hash_mem_multiplier, whereas sort uses work_mem. Consider
a hash_mem_multiplier less than 1.0.
I'm not really all that sure the above query shape makes much sense in
the real world. Would anyone ever want to use DISTINCT on some results
containing WindowFuncs?This could still have been good to have if there were no negative impact
and some benefit in few cases but as mentioned before, if hashagg removes
any sort (which happened due to push down), all gains will be lost
and we will be probably worse off than before.
We could consider adjusting the create_distinct_paths() so that it
uses some newly invented and less strict pathkey comparison where the
order of the pathkeys does not matter. It would just care if the
pathkeys were present and return a list of pathkeys not contained so
that an incremental sort could be done only on the returned list and a
Unique on an empty returned list. Something like that might be able
to apply in more cases, for example:
select distinct b,a from ab where a < 10;
the distinct pathkeys would be b,a but if there's an index on (a),
then we might have a path with pathkeys containing "a".
You can see when we manually swap the order of the DISTINCT clause
that we get a more optimal plan (even if they're not costed quite as
accurately as we might have liked)
create table ab(a int, b int);
create index on ab(a);
set enable_hashagg=0;
set enable_seqscan=0;
insert into ab select x,y from generate_series(1,100)x, generate_Series(1,100)y;
analyze ab;
# explain select distinct b,a from ab where a < 10;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=72.20..78.95 rows=611 width=8)
-> Sort (cost=72.20..74.45 rows=900 width=8)
Sort Key: b, a
-> Index Scan using ab_a_idx on ab (cost=0.29..28.04
rows=900 width=8)
Index Cond: (a < 10)
(5 rows)
# explain select distinct a,b from ab where a < 10; -- manually swap
DISTINCT column order.
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=0.71..60.05 rows=611 width=8)
-> Incremental Sort (cost=0.71..55.55 rows=900 width=8)
Sort Key: a, b
Presorted Key: a
-> Index Scan using ab_a_idx on ab (cost=0.29..28.04
rows=900 width=8)
Index Cond: (a < 10)
(6 rows)
We might also want to also consider if Pathkey.pk_strategy and
pk_nulls_first need to be compared too. That makes the check a bit
more expensive as Pathkeys are canonical and if those fields vary then
we need to perform more than just a comparison by the memory address
of the pathkey. This very much seems like a separate effort than the
WindowClause sort reduction work. I think it gives us everything we've
talked about extra we might want out of reducing WindowClause sorts
and more.
David
On 13/01/23 07:48, David Rowley wrote:
I don't think you can claim that one so easily. The two should have
quite different scaling characteristics which will be more evident
with a larger number of input rows. Also, Hash Aggregate makes use of
work_mem * hash_mem_multiplier, whereas sort uses work_mem. Consider
a hash_mem_multiplier less than 1.0.
In this case, it would make sense to do push down. I will do testing
around large data to see it myself.
We could consider adjusting the create_distinct_paths() so that it
uses some newly invented and less strict pathkey comparison where the
order of the pathkeys does not matter. It would just care if the
pathkeys were present and return a list of pathkeys not contained so
that an incremental sort could be done only on the returned list and a
Unique on an empty returned list. Something like that might be able
to apply in more cases, for example:
select distinct b,a from ab where a < 10;
the distinct pathkeys would be b,a but if there's an index on (a),
then we might have a path with pathkeys containing "a".
This would be a very good improvement.
even if they're not costed quite as
accurately as we might have liked
This is very exciting piece actually. Once current set of optimizations
gets ahead,
I will be giving this a shot. We need to look at cost models for sorting.
We might also want to also consider if Pathkey.pk_strategy and
pk_nulls_first need to be compared too. That makes the check a bit
more expensive as Pathkeys are canonical and if those fields vary then
we need to perform more than just a comparison by the memory address
of the pathkey.
This very much seems like a separate effort than the
WindowClause sort reduction work. I think it gives us everything we've
talked about extra we might want out of reducing WindowClause sorts
and more.
I will work on this as separate patch (against HEAD). It makes much more
sense to look this as distinct sort related optimizations (which window
sort optimization
can take benefit). We may take a call to combine them or apply in series.
For unit of work perspective, I would prefer later.
Anyways, the forthcoming patch will contain the following:
1. Modify create_distinct_paths with newly invented and less strict
pathkey comparison where the
order of the pathkeys does not matter.
2. Handle Pathkey.pk_strategy and pk_nulls comparison.
3. Test cases
Thanks
Regards,
Ankit Kumar Pandey
On 13/01/23 07:48, David Rowley wrote:
It would just care if the
pathkeys were present and return a list of pathkeys not contained so
that an incremental sort could be done only on the returned list and a
Unique on an empty returned list.
In create_final_distinct_paths, presorted keys are determined from
input_rel->pathlist & needed_pathkeys. Problem with input_rel->pathlist
is that, for index node, useful_pathkeys is stored in
input_rel->pathlist but this useful_pathkeys
is determined from truncate_useless_pathkeys(index_pathkeys) which
removes index_keys if ordering is different.
Hence, input_rel->pathlist returns null for select distinct b,a from ab
where a < 10; even if index is created on a.
In order to tackle this, I have added index_pathkeys in indexpath node
itself.
Although I started this patch from master, I merged changes to window sort
optimizations.
In patched version:
set enable_hashagg=0;
set enable_seqscan=0;
explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Unique (cost=10000000039.49..10000000063.73 rows=415 width=73)
-> Incremental Sort (cost=10000000039.49..10000000060.62 rows=415 width=73)
Sort Key: relkind, relname, (count(*) OVER (?))
Presorted Key: relkind
-> WindowAgg (cost=10000000034.20..10000000041.46 rows=415 width=73)
-> Sort (cost=10000000034.20..10000000035.23 rows=415 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=10000000000.00..10000000016.15 rows=415 width=65)
(8 rows)
explain select distinct b,a from ab where a < 10;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=0.71..60.05 rows=611 width=8)
-> Incremental Sort (cost=0.71..55.55 rows=900 width=8)
Sort Key: a, b
Presorted Key: a
-> Index Scan using ab_a_idx on ab (cost=0.29..28.04 rows=900 width=8)
Index Cond: (a < 10)
(6 rows)
explain select distinct b,a, count(*) over (partition by a) from abcd order by a,b;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Unique (cost=10000021174.63..10000038095.75 rows=60 width=16)
-> Incremental Sort (cost=10000021174.63..10000036745.75 rows=180000 width=16)
Sort Key: a, b, (count(*) OVER (?))
Presorted Key: a, b
-> WindowAgg (cost=10000020948.87..10000024098.87 rows=180000 width=16)
-> Sort (cost=10000020948.87..10000021398.87 rows=180000 width=8)
Sort Key: a, b
-> Seq Scan on abcd (cost=10000000000.00..10000002773.00 rows=180000 width=8)
(8 rows)
explain select distinct a, b, count(*) over (partition by a,b,c) from abcd;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Unique (cost=10000021580.47..10000036629.31 rows=60 width=20)
-> Incremental Sort (cost=10000021580.47..10000035279.31 rows=180000 width=20)
Sort Key: a, b, c, (count(*) OVER (?))
Presorted Key: a, b, c
-> WindowAgg (cost=10000021561.37..10000025611.37 rows=180000 width=20)
-> Sort (cost=10000021561.37..10000022011.37 rows=180000 width=12)
Sort Key: a, b, c
-> Seq Scan on abcd (cost=10000000000.00..10000002773.00 rows=180000 width=12)
(8 rows)
explain select distinct a, b, count(*) over (partition by b,a, c) from abcd;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Unique (cost=2041.88..36764.90 rows=60 width=20)
-> Incremental Sort (cost=2041.88..35414.90 rows=180000 width=20)
Sort Key: b, a, c, (count(*) OVER (?))
Presorted Key: b, a, c
-> WindowAgg (cost=1989.94..25746.96 rows=180000 width=20)
-> Incremental Sort (cost=1989.94..22146.96 rows=180000 width=12)
Sort Key: b, a, c
Presorted Key: b
-> Index Scan using b_idx on abcd (cost=0.29..7174.62 rows=180000 width=12)
(9 rows)
In master:
explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Unique (cost=10000000059.50..10000000063.65 rows=415 width=73)
-> Sort (cost=10000000059.50..10000000060.54 rows=415 width=73)
Sort Key: relname, relkind, (count(*) OVER (?))
-> WindowAgg (cost=10000000034.20..10000000041.46 rows=415 width=73)
-> Sort (cost=10000000034.20..10000000035.23 rows=415 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=10000000000.00..10000000016.15 rows=415 width=65)
(7 rows)
explain select distinct b,a from ab where a < 10;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=72.20..78.95 rows=611 width=8)
-> Sort (cost=72.20..74.45 rows=900 width=8)
Sort Key: b, a
-> Index Scan using ab_a_idx on ab (cost=0.29..28.04 rows=900 width=8)
Index Cond: (a < 10)
(5 rows)
explain select distinct b,a, count(*) over (partition by a) from abcd order by a,b;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Unique (cost=10000023704.77..10000041084.40 rows=60 width=16)
-> Incremental Sort (cost=10000023704.77..10000039734.40 rows=180000 width=16)
Sort Key: a, b, (count(*) OVER (?))
Presorted Key: a
-> WindowAgg (cost=10000020948.87..10000024098.87 rows=180000 width=16)
-> Sort (cost=10000020948.87..10000021398.87 rows=180000 width=8)
Sort Key: a
-> Seq Scan on abcd (cost=10000000000.00..10000002773.00 rows=180000 width=8)
(8 rows)
explain select distinct a, b, count(*) over (partition by b,a, c) from abcd;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Unique (cost=45151.33..46951.33 rows=60 width=20)
-> Sort (cost=45151.33..45601.33 rows=180000 width=20)
Sort Key: a, b, (count(*) OVER (?))
-> WindowAgg (cost=1989.94..25746.96 rows=180000 width=20)
-> Incremental Sort (cost=1989.94..22146.96 rows=180000 width=12)
Sort Key: b, a, c
Presorted Key: b
-> Index Scan using b_idx on abcd (cost=0.29..7174.62 rows=180000 width=12)
(8 rows)
Note: Composite keys are also handled.
create index xy_idx on xyz(x,y);
explain select distinct x,z,y from xyz;
QUERY PLAN
----------------------------------------------------------------------------------
Unique (cost=0.86..55.97 rows=60 width=12)
-> Incremental Sort (cost=0.86..51.47 rows=600 width=12)
Sort Key: x, y, z
Presorted Key: x, y
-> Index Scan using xy_idx on xyz (cost=0.15..32.80 rows=600 width=12)
(5 rows)
There are some cases where different kind of scan happens
explain select distinct x,y from xyz where y < 10;
QUERY PLAN
-----------------------------------------------------------------------------------
Unique (cost=47.59..51.64 rows=60 width=8)
-> Sort (cost=47.59..48.94 rows=540 width=8)
Sort Key: x, y
-> Bitmap Heap Scan on xyz (cost=12.34..23.09 rows=540 width=8)
Recheck Cond: (y < 10)
-> Bitmap Index Scan on y_idx (cost=0.00..12.20
rows=540 width=0)
Index Cond: (y < 10)
(7 rows)
As code only checks from IndexPath (at the moment), other scan paths are
not covered.
Is it okay to cover these in same way as I did for IndexPath? (with no
limitation on this behaviour on certain path types?)
Also, I am assuming distinct pathkeys can be changed without any issues.
As changes are limited to modification in distinct path only,
I don't see this affecting other nodes. Test cases are green,
with a couple of failures in window functions (one which I had added)
and one very weird:
EXPLAIN (COSTS OFF)
SELECT DISTINCT
empno,
enroll_date,
depname,
sum(salary) OVER (PARTITION BY depname order by empno) depsalary,
min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary
FROM empsalary
ORDER BY depname, empno;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> Unique
-> Incremental Sort
Sort Key: depname, enroll_date, empno, (sum(salary) OVER (?)), (min(salary) OVER (?))
Presorted Key: depname
-> WindowAgg
-> Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> WindowAgg
-> Sort
Sort Key: depname, enroll_date
-> Seq Scan on empsalary
(15 rows)
In above query plan, unique used to come after Incremental sort in the
master.
Pending:
1. Consider if Pathkey.pk_strategy and pk_nulls_first need to be
compared too, this is pending
as I have to look these up and understand them.
2. Test cases (failures and new cases)
3. Improve comments
Patch v8 attached.
Please let me know any review comments, will address these in followup patch
with pending items.
Thanks,
Ankit
Attachments:
better_windowclause_sorting_dgr-v8.patchtext/x-patch; charset=UTF-8; name=better_windowclause_sorting_dgr-v8.patchDownload
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index b9da588114..e13c8f1914 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1015,7 +1015,6 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
orderbyclauses,
orderbyclausecols,
useful_pathkeys,
- index_pathkeys,
index_is_ordered ?
ForwardScanDirection :
NoMovementScanDirection,
@@ -1038,7 +1037,6 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
orderbyclauses,
orderbyclausecols,
useful_pathkeys,
- index_pathkeys,
index_is_ordered ?
ForwardScanDirection :
NoMovementScanDirection,
@@ -1074,7 +1072,6 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
NIL,
NIL,
useful_pathkeys,
- index_pathkeys,
BackwardScanDirection,
index_only_scan,
outer_relids,
@@ -1092,7 +1089,6 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
NIL,
NIL,
useful_pathkeys,
- index_pathkeys,
BackwardScanDirection,
index_only_scan,
outer_relids,
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 28be495370..609df93dc9 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1968,21 +1968,3 @@ has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel)
return true; /* might be able to use them for ordering */
return false; /* definitely useless */
}
-
-/*
- * Returns True if key1 subset is key2 i.e.
- * atleast one key from keys1 is present on key2
- */
-bool
-is_pathkey_subset(List *keys1, List* keys2)
-{
- ListCell* l;
- foreach(l, keys1)
- {
- PathKey *pathkey1 = (PathKey *) lfirst(l);
- if(pathkey_is_redundant(pathkey1, keys2)){
- return true;
- }
- }
- return false;
-}
\ No newline at end of file
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ee883048e3..044fb24666 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4421,14 +4421,8 @@ create_one_window_path(PlannerInfo *root,
List *activeWindows)
{
PathTarget *window_target;
- Query *parse = root->parse;
ListCell *l;
List *topqual = NIL;
- List *window_pathkeys;
- List *orderby_pathkeys = NIL;
- int sort_pushdown_idx = -1;
- int presorted_keys;
- bool is_sorted;
/*
* Since each window clause could require a different sort order, we stack
@@ -4447,99 +4441,17 @@ create_one_window_path(PlannerInfo *root,
*/
window_target = input_target;
- /*
- * Here we attempt to minimize the number of sorts which must be performed
- * for queries with an ORDER BY clause.
- *
- * It's possible due to select_active_windows() putting the WindowClauses
- * with the lowest tleSortGroupRef last in the activeWindows list that the
- * final WindowClause has a subset of the sort requirements that the
- * query's ORDER BY clause has. Below we try to detect this case and if
- * we find this is true, we consider adjusting the sort that's done for
- * WindowAggs and include the additional clauses so that no additional
- * sorting is required for the query's ORDER BY clause.
- *
- * We don't try this optimization for the following cases:
- *
- * 1. If the query has a DISTINCT clause. We needn't waste any
- * additional effort for the more strict sort order as if DISTINCT
- * is done via Hash Aggregate then that's going to undo this work.
- *
- * 2. If the query has a LIMIT clause. The top-level sort will be
- * able to use a top-n sort which might be more efficient than
- * sorting by the additional columns. If the LIMIT does not reduce
- * the number of rows significantly then this might not be true, but
- * we don't try to consider that here.
- *
- * 3. If the top-level WindowClause has a runCondition then this can
- * filter out tuples and make the final sort cheaper. If we pushed
- * the sort down below the WindowAgg then we'd need to sort all rows
- * including ones that the runCondition might filter. This may waste
- * effort so we just don't try to push down the sort for this case.
- *
- * 4. If the ORDER BY contains any expressions containing WindowFuncs
- * then we can't push down the sort as these obviously must be
- * evaluated before they can be sorted.
- */
- if (parse->sortClause != NIL &&
- parse->limitCount == NULL &&
- linitial_node(WindowClause, activeWindows)->runCondition == NIL &&
- !contain_window_function((Node *) get_sortgrouplist_exprs(parse->sortClause,
- root->processed_tlist)))
- {
- orderby_pathkeys = make_pathkeys_for_sortclauses(root,
- parse->sortClause,
- root->processed_tlist);
-
- /*
- * Loop backwards over the WindowClauses looking for the first
- * WindowClause which has pathkeys not contained in the
- * orderby_pathkeys. What we're looking for here is the lowest level
- * that we push the additional sort requirements down into. If we
- * fail here then sort_pushdown_idx will remain at -1.
- */
- foreach_reverse(l, activeWindows)
- {
- WindowClause *wc = lfirst_node(WindowClause, l);
-
- window_pathkeys = make_pathkeys_for_window(root,
- wc,
- root->processed_tlist);
-
- if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
- sort_pushdown_idx = foreach_current_index(l);
- else
- break;
- }
- }
-
foreach(l, activeWindows)
{
WindowClause *wc = lfirst_node(WindowClause, l);
+ List *window_pathkeys;
+ int presorted_keys;
+ bool is_sorted;
bool topwindow;
window_pathkeys = make_pathkeys_for_window(root,
- wc,
- root->processed_tlist);
-
- /*
- * When we get to the sort_pushdown_idx WindowClause, if the path is
- * not already correctly sorted, then just use the pathkeys for the
- * query's ORDER BY clause instead of the WindowClause's pathkeys.
- * When the path is already correctly sorted there's no point in
- * adjusting the pathkeys as this just moves the sort down without
- * actually reducing the number of sorts which are required in the
- * plan overall. sort_pushdown_idx will be -1 and never match when
- * this optimization was disabled above.
- */
- if (foreach_current_index(l) == sort_pushdown_idx &&
- !pathkeys_contained_in(window_pathkeys, path->pathkeys))
- {
- /* check to make sure sort_pushdown_idx was set correctly */
- Assert(pathkeys_contained_in(window_pathkeys, orderby_pathkeys));
-
- window_pathkeys = orderby_pathkeys;
- }
+ wc,
+ root->processed_tlist);
is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys,
@@ -4933,43 +4845,9 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
bool is_sorted;
int presorted_keys;
-
- /*
- * Consider any any sorting which can be reutilized for distinct
- * even if ordering differs.
- */
- if (IsA(lfirst(lc), IndexPath))
- {
- /*
- * Find if index pathkeys are subset of needed pathkeys
- * if so then prepend these keys so that sorting is not needed
- */
- IndexPath *idx_path = (IndexPath*) lfirst(lc);
- if (is_pathkey_subset(needed_pathkeys, idx_path->indexpathkeys))
- {
-
- needed_pathkeys = list_concat_unique(list_copy(idx_path->indexpathkeys), needed_pathkeys);
- }
- is_sorted = pathkeys_count_contained_in(needed_pathkeys,
- idx_path->indexpathkeys,
- &presorted_keys);
- }
- else
- {
- /*
- * If needed pathkeys are subset of query_pathkeys,
- * if so, lower nodes will have sorted some columns which
- * we can reuse.
- */
- if (is_pathkey_subset(needed_pathkeys, root->query_pathkeys))
- {
- needed_pathkeys = list_concat_unique(list_copy(root->query_pathkeys), needed_pathkeys);
- }
- is_sorted = pathkeys_count_contained_in(needed_pathkeys,
+ is_sorted = pathkeys_count_contained_in(needed_pathkeys,
input_path->pathkeys,
&presorted_keys);
- }
-
if (is_sorted)
sorted_path = input_path;
@@ -6626,7 +6504,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
/* Estimate the cost of index scan */
indexScanPath = create_index_path(root, indexInfo,
- NIL, NIL, NIL, NIL, NIL,
+ NIL, NIL, NIL, NIL,
ForwardScanDirection, false,
NULL, 1.0, false);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c666be4f4f..4478036bb6 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1000,7 +1000,6 @@ create_index_path(PlannerInfo *root,
List *indexorderbys,
List *indexorderbycols,
List *pathkeys,
- List *indexpathkeys,
ScanDirection indexscandir,
bool indexonly,
Relids required_outer,
@@ -1024,9 +1023,7 @@ create_index_path(PlannerInfo *root,
pathnode->indexclauses = indexclauses;
pathnode->indexorderbys = indexorderbys;
pathnode->indexorderbycols = indexorderbycols;
-
pathnode->indexscandir = indexscandir;
- pathnode->indexpathkeys = indexpathkeys;
cost_index(pathnode, root, loop_count, partial_path);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a2abb40356..c20b7298a3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1601,7 +1601,6 @@ typedef struct IndexPath
List *indexclauses;
List *indexorderbys;
List *indexorderbycols;
- List *indexpathkeys;
ScanDirection indexscandir;
Cost indextotalcost;
Selectivity indexselectivity;
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 0de9d73dac..529a382d28 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -378,30 +378,14 @@ lnext(const List *l, const ListCell *c)
(cell = NULL, false); \
cell##__state.i++)
-/*
- * foreach_reverse -
- * a convenience macro for looping through a list in reverse
- *
- * This is equivalent to foreach, only it loops over the list starting with
- * the last element and ends on the first element.
- */
-#define foreach_reverse(cell, lst) \
- for (ForEachState cell##__state = {(lst), cell##__state.l->length - 1}; \
- (cell##__state.l != NIL && \
- cell##__state.i >= 0) ? \
- (cell = &cell##__state.l->elements[cell##__state.i], true) : \
- (cell = NULL, false); \
- cell##__state.i--)
-
/*
* foreach_delete_current -
* delete the current list element from the List associated with a
- * surrounding foreach() or foreach_reverse() loop, returning the new List
-* pointer.
+ * surrounding foreach() loop, returning the new List pointer.
*
* This is equivalent to list_delete_cell(), but it also adjusts the foreach
- * or foreach_reverse loop's state so that no list elements will be missed.
- * Do not delete elements from an active foreach loop's list in any other way!
+ * loop's state so that no list elements will be missed. Do not delete
+ * elements from an active foreach loop's list in any other way!
*/
#define foreach_delete_current(lst, cell) \
(cell##__state.i--, \
@@ -409,9 +393,8 @@ lnext(const List *l, const ListCell *c)
/*
* foreach_current_index -
- * get the zero-based index within the list of the current element in the
- * surrounding foreach() or foreach_reverse() loop; pass the name of the
- * "ListCell *" iterator variable.
+ * get the zero-based list index of a surrounding foreach() loop's
+ * current element; pass the name of the "ListCell *" iterator variable.
*
* Beware of using this after foreach_delete_current(); the value will be
* out of sync for the rest of the current loop iteration. Anyway, since
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 1833218efc..02305ef902 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -44,7 +44,6 @@ extern IndexPath *create_index_path(PlannerInfo *root,
List *indexorderbys,
List *indexorderbycols,
List *pathkeys,
- List *indexpathkeys,
ScanDirection indexscandir,
bool indexonly,
Relids required_outer,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 1204e9623f..65a3c35611 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -248,7 +248,6 @@ extern List *truncate_useless_pathkeys(PlannerInfo *root,
RelOptInfo *rel,
List *pathkeys);
extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo *rel);
-extern bool is_pathkey_subset(List *keys1, List* keys2);
extern List *append_pathkeys(List *target, List *source);
extern PathKey *make_canonical_pathkey(PlannerInfo *root,
EquivalenceClass *eclass, Oid opfamily,
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index ab94f810fc..b2c6605e60 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3984,210 +3984,7 @@ ORDER BY depname, empno;
-> Seq Scan on empsalary
(12 rows)
--- Do not perform sort pushdown if column is presorted
-CREATE INDEX depname_idx ON empsalary(depname);
-SET enable_seqscan=0;
-EXPLAIN (COSTS OFF)
-SELECT empno,
- min(salary) OVER (PARTITION BY depname) depminsalary
-FROM empsalary
-ORDER BY depname, empno;
- QUERY PLAN
--------------------------------------------------------
- Incremental Sort
- Sort Key: depname, empno
- Presorted Key: depname
- -> WindowAgg
- -> Index Scan using depname_idx on empsalary
-(5 rows)
-
--- Do perform sort pushdown if columns are only partially sorted
-EXPLAIN (COSTS OFF)
-SELECT empno,
- min(salary) OVER (PARTITION BY depname, empno) depminsalary
-FROM empsalary
-ORDER BY depname, empno, enroll_date;
- QUERY PLAN
--------------------------------------------------------
- WindowAgg
- -> Incremental Sort
- Sort Key: depname, empno, enroll_date
- Presorted Key: depname
- -> Index Scan using depname_idx on empsalary
-(5 rows)
-
-DROP INDEX depname_idx;
-RESET enable_seqscan;
RESET enable_hashagg;
--- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
--- QUERY's ORDER BY to save additional sorting in the end
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (ORDER BY depname) depminsalary
-FROM empsalary
-ORDER BY depname, empno;
- QUERY PLAN
------------------------------------
- WindowAgg
- -> Sort
- Sort Key: depname, empno
- -> Seq Scan on empsalary
-(4 rows)
-
--- Same as above, but PARTITION BY clause is included as well
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
-FROM empsalary
-ORDER BY depname, empno, enroll_date;
- QUERY PLAN
------------------------------------------------
- WindowAgg
- -> Sort
- Sort Key: depname, empno, enroll_date
- -> Seq Scan on empsalary
-(4 rows)
-
--- Negation of above, when sort optimization is not needed to be performed
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
-FROM empsalary
-ORDER BY empno, enroll_date;
- QUERY PLAN
------------------------------------------
- Sort
- Sort Key: empno, enroll_date
- -> WindowAgg
- -> Sort
- Sort Key: depname, empno
- -> Seq Scan on empsalary
-(6 rows)
-
--- ORDER BY's in multiple Window functions can be combined into one
--- if they are subset of QUERY's ORDER BY
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
- sum(salary) OVER (PARTITION BY depname) depsalary,
- count(*) OVER (ORDER BY enroll_date DESC) c
-FROM empsalary
-ORDER BY depname, empno, enroll_date;
- QUERY PLAN
-------------------------------------------------------
- WindowAgg
- -> WindowAgg
- -> Sort
- Sort Key: depname, empno, enroll_date
- -> WindowAgg
- -> Sort
- Sort Key: enroll_date DESC
- -> Seq Scan on empsalary
-(8 rows)
-
--- order of window function shouldn't affect query's ORDER BY pushdown
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- count(*) OVER (ORDER BY enroll_date DESC) c,
- sum(salary) OVER (PARTITION BY depname) depsalary,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
-FROM empsalary
-ORDER BY depname, empno, enroll_date;
- QUERY PLAN
-------------------------------------------------------
- WindowAgg
- -> WindowAgg
- -> Sort
- Sort Key: depname, empno, enroll_date
- -> WindowAgg
- -> Sort
- Sort Key: enroll_date DESC
- -> Seq Scan on empsalary
-(8 rows)
-
--- SORT OPTIMIZATION (as in cases above) is not needed to be performed
--- in cases, as mentioned below.
--- Case #1: When distinct clause is included
-EXPLAIN (COSTS OFF)
-SELECT DISTINCT empno,
- depname,
- min(salary) OVER (PARTITION BY depname) depminsalary,
- sum(salary) OVER (PARTITION BY depname) depsalary,
- count(*) OVER (ORDER BY enroll_date DESC) c
-FROM empsalary
-ORDER BY depname, empno;
- QUERY PLAN
--------------------------------------------------------------------------------------------------------
- Unique
- -> Sort
- Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?)), (count(*) OVER (?))
- -> WindowAgg
- -> Sort
- Sort Key: depname
- -> WindowAgg
- -> Sort
- Sort Key: enroll_date DESC
- -> Seq Scan on empsalary
-(10 rows)
-
--- Case #2: When Limit clause is specified
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (PARTITION BY empno) depminsalary,
- sum(salary) OVER (PARTITION BY empno) depsalary
-FROM empsalary
-ORDER BY empno, depname LIMIT 1;
- QUERY PLAN
------------------------------------------------
- Limit
- -> Incremental Sort
- Sort Key: empno, depname
- Presorted Key: empno
- -> WindowAgg
- -> Sort
- Sort Key: empno
- -> Seq Scan on empsalary
-(8 rows)
-
--- Case #3: When ORDER BY contains any WindowFuncs
-EXPLAIN (COSTS OFF)
-SELECT empno,
- row_number() over (PARTITION BY empno) esalary
-FROM empsalary
-ORDER BY empno, esalary;
- QUERY PLAN
---------------------------------------------
- Incremental Sort
- Sort Key: empno, (row_number() OVER (?))
- Presorted Key: empno
- -> WindowAgg
- -> Sort
- Sort Key: empno
- -> Seq Scan on empsalary
-(7 rows)
-
--- Case #4: When WindowFunc ORDER BY is superset of QUERY's ORDER BY
-EXPLAIN (COSTS OFF)
-SELECT empno,
- sum(salary) OVER (PARTITION BY depname) depsalary,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
-FROM empsalary
-ORDER BY depname;
- QUERY PLAN
------------------------------------------
- WindowAgg
- -> WindowAgg
- -> Sort
- Sort Key: depname, empno
- -> Seq Scan on empsalary
-(5 rows)
-
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index 665e156920..b7bd0a83da 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1324,110 +1324,8 @@ SELECT DISTINCT
FROM empsalary
ORDER BY depname, empno;
--- Do not perform sort pushdown if column is presorted
-CREATE INDEX depname_idx ON empsalary(depname);
-SET enable_seqscan=0;
-
-EXPLAIN (COSTS OFF)
-SELECT empno,
- min(salary) OVER (PARTITION BY depname) depminsalary
-FROM empsalary
-ORDER BY depname, empno;
-
--- Do perform sort pushdown if columns are only partially sorted
-EXPLAIN (COSTS OFF)
-SELECT empno,
- min(salary) OVER (PARTITION BY depname, empno) depminsalary
-FROM empsalary
-ORDER BY depname, empno, enroll_date;
-
-DROP INDEX depname_idx;
-RESET enable_seqscan;
RESET enable_hashagg;
--- If window function's ORDER BY is subset of QUERY's ORDER BY, sort as per as
--- QUERY's ORDER BY to save additional sorting in the end
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (ORDER BY depname) depminsalary
-FROM empsalary
-ORDER BY depname, empno;
-
--- Same as above, but PARTITION BY clause is included as well
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
-FROM empsalary
-ORDER BY depname, empno, enroll_date;
-
--- Negation of above, when sort optimization is not needed to be performed
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
-FROM empsalary
-ORDER BY empno, enroll_date;
-
--- ORDER BY's in multiple Window functions can be combined into one
--- if they are subset of QUERY's ORDER BY
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
- sum(salary) OVER (PARTITION BY depname) depsalary,
- count(*) OVER (ORDER BY enroll_date DESC) c
-FROM empsalary
-ORDER BY depname, empno, enroll_date;
-
--- order of window function shouldn't affect query's ORDER BY pushdown
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- count(*) OVER (ORDER BY enroll_date DESC) c,
- sum(salary) OVER (PARTITION BY depname) depsalary,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
-FROM empsalary
-ORDER BY depname, empno, enroll_date;
-
--- SORT OPTIMIZATION (as in cases above) is not needed to be performed
--- in cases, as mentioned below.
-
--- Case #1: When distinct clause is included
-EXPLAIN (COSTS OFF)
-SELECT DISTINCT empno,
- depname,
- min(salary) OVER (PARTITION BY depname) depminsalary,
- sum(salary) OVER (PARTITION BY depname) depsalary,
- count(*) OVER (ORDER BY enroll_date DESC) c
-FROM empsalary
-ORDER BY depname, empno;
-
--- Case #2: When Limit clause is specified
-EXPLAIN (COSTS OFF)
-SELECT empno,
- depname,
- min(salary) OVER (PARTITION BY empno) depminsalary,
- sum(salary) OVER (PARTITION BY empno) depsalary
-FROM empsalary
-ORDER BY empno, depname LIMIT 1;
-
--- Case #3: When ORDER BY contains any WindowFuncs
-EXPLAIN (COSTS OFF)
-SELECT empno,
- row_number() over (PARTITION BY empno) esalary
-FROM empsalary
-ORDER BY empno, esalary;
-
--- Case #4: When WindowFunc ORDER BY is superset of QUERY's ORDER BY
-EXPLAIN (COSTS OFF)
-SELECT empno,
- sum(salary) OVER (PARTITION BY depname) depsalary,
- min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
-FROM empsalary
-ORDER BY depname;
-
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
SELECT
On Mon, 16 Jan 2023 at 06:52, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Hence, input_rel->pathlist returns null for select distinct b,a from ab
where a < 10; even if index is created on a.In order to tackle this, I have added index_pathkeys in indexpath node
itself.
I don't think we should touch this. It could significantly increase
the number of indexes that we consider when generating paths on base
relations and therefore *significantly* increase the number of paths
we consider during the join search. What I had in mind was just
making better use of existing paths to see if we can find a cheaper
way to perform the DISTINCT. That'll only possibly increase the
number of paths for the distinct upper relation which really only
increases the number of paths which are considered in
create_ordered_paths(). That's unlikely to cause much of a slowdown in
the planner.
Although I started this patch from master, I merged changes to window sort
optimizations.
I'm seeing these two things as separate patches. I don't think there's
any need to add further complexity to the patch that tries to reduce
the number of sorts for WindowAggs. I think you'd better start a new
thread for this.
Also, I am assuming distinct pathkeys can be changed without any issues.
As changes are limited to modification in distinct path only,
As far as I see it, you shouldn't be touching the distinct_pathkeys.
Those are set in such a way as to minimise the likelihood of an
additional sort for the ORDER BY. If you've fiddled with that, then I
imagine this is why the plan below has an additional Incremental Sort
that didn't exist before.
I've not looked at your patch, but all I imagine you need to do for it
is to invent a function in pathkeys.c which is along the lines of what
pathkeys_count_contained_in() does, but returns a List of pathkeys
which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
that does not exist as a pathkey in keys1. In
create_final_distinct_paths(), you can then perform an incremental
sort on any input_path which has a non-empty return list and in
create_incremental_sort_path(), you'll pass presorted_keys as the
number of pathkeys in the path, and the required pathkeys the
input_path->pathkeys + the pathkeys returned from the new function.
As an optimization, you might want to consider that the
distinct_pathkeys list might be long and that the new function, if you
code the lookup as a nested loop, might be slow. You might want to
consider hashing the distinct_pathkeys once in
create_final_distinct_paths(), then for each input_path, perform a
series of hash lookups to see which of the input_path->pathkeys are in
the hash table. That might require adding two functions to pathkeys.c,
one to build the hash table and then another to probe it and return
the remaining pathkeys list. I'd go and make sure it all works as we
expect before going to the trouble of trying to optimize this. A
simple nested loop lookup will allow us to review that this works as
we expect.
EXPLAIN (COSTS OFF)
SELECT DISTINCT
empno,
enroll_date,
depname,
sum(salary) OVER (PARTITION BY depname order by empno) depsalary,
min(salary) OVER (PARTITION BY depname order by enroll_date) depminsalary
FROM empsalary
ORDER BY depname, empno;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> Unique
-> Incremental Sort
Sort Key: depname, enroll_date, empno, (sum(salary) OVER (?)), (min(salary) OVER (?))
Presorted Key: depname
-> WindowAgg
-> Incremental Sort
Sort Key: depname, empno
Presorted Key: depname
-> WindowAgg
-> Sort
Sort Key: depname, enroll_date
-> Seq Scan on empsalary
(15 rows)
David
On 16/01/23 09:48, David Rowley wrote:
I don't think we should touch this. It could significantly increase
the number of indexes that we consider when generating paths on base
relations and therefore *significantly* increase the number of paths
we consider during the join search. What I had in mind was just
making better use of existing paths to see if we can find a cheaper
way to perform the DISTINCT. That'll only possibly increase the
number of paths for the distinct upper relation which really only
increases the number of paths which are considered in
create_ordered_paths(). That's unlikely to cause much of a slowdown in
the planner.
Okay, I see the issue. Makes sense
I'm seeing these two things as separate patches. I don't think there's
any need to add further complexity to the patch that tries to reduce
the number of sorts for WindowAggs. I think you'd better start a new
thread for this.
Will be starting new thread for this and separate patch.
As far as I see it, you shouldn't be touching the distinct_pathkeys.
Those are set in such a way as to minimise the likelihood of an
additional sort for the ORDER BY. If you've fiddled with that, then I
imagine this is why the plan below has an additional Incremental Sort
that didn't exist before.
I've not looked at your patch, but all I imagine you need to do for it
is to invent a function in pathkeys.c which is along the lines of what
pathkeys_count_contained_in() does, but returns a List of pathkeys
which are in keys1 but not in keys2 and NIL if keys2 has a pathkey
that does not exist as a pathkey in keys1. In
create_final_distinct_paths(), you can then perform an incremental
sort on any input_path which has a non-empty return list and in
create_incremental_sort_path(), you'll pass presorted_keys as the
number of pathkeys in the path, and the required pathkeys the
input_path->pathkeys + the pathkeys returned from the new function.
Okay, this should be straightforward. Let me try this.
As an optimization, you might want to consider that the
distinct_pathkeys list might be long and that the new function, if you
code the lookup as a nested loop, might be slow. You might want to
consider hashing the distinct_pathkeys once in
create_final_distinct_paths(), then for each input_path, perform a
series of hash lookups to see which of the input_path->pathkeys are in
the hash table. That might require adding two functions to pathkeys.c,
one to build the hash table and then another to probe it and return
the remaining pathkeys list. I'd go and make sure it all works as we
expect before going to the trouble of trying to optimize this. A
simple nested loop lookup will allow us to review that this works as
we expect.
Okay make sense, will start with nested loop while it is in review and then
optimal version once it is all good to go.
Thanks,
Ankit
On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
On 09/01/23 17:53, David Rowley wrote:
I gave some thought to whether doing foreach_delete_current() is safe
within a foreach_reverse() loop. I didn't try it, but I couldn't see
any reason why not. It is pretty late here and I'd need to test that
to be certain. If it turns out not to be safe then we need to document
that fact in the comments of the foreach_reverse() macro and the
foreach_delete_current() macro.I tested foreach_delete_current inside foreach_reverse loop.
It worked fine.
I also thought I'd better test that foreach_delete_current() works
with foreach_reverse(). I can confirm that it *does not* work
correctly. I guess maybe you only tested the fact that it deleted the
current item and not that the subsequent loop correctly went to the
item directly before the deleted item. There's a problem with that. We
skip an item.
Instead of fixing that, I think it's likely better just to loop
backwards manually with a for() loop, so I've adjusted the patch to
work that way. It's quite likely that the additional code in
foreach() and what was in foreach_reverse() is slower than looping
manually due to the additional work those macros do to set the cell to
NULL when we run out of cells to loop over.
I have attached patch with one extra test case (as mentioned above).
Rest of the changes are looking fine.
I made another pass over the v7 patch and fixed a bug that was
disabling the optimization when the deepest WindowAgg had a
runCondition. This should have been using llast_node instead of
linitial_node. The top-level WindowAgg is the last in the list.
I also made a pass over the tests and added a test case for the
runCondition check to make sure we disable the optimization when the
top-level WindowAgg has one of those. I wasn't sure what your test
comments case numbers were meant to represent. They were not aligned
with the code comments that document when the optimisation is
disabled, they started out aligned, but seemed to go off the rails at
#3. I've now made them follow the comments in create_one_window_path()
and made it more clear what we expect the test outcome to be in each
case.
I've attached the v9 patch. I feel like this patch is quite
self-contained and I'm quite happy with it now. If there are no
objections soon, I'm planning on pushing it.
David
Attachments:
better_windowclause_sorting_dgr-v9.patchtext/plain; charset=US-ASCII; name=better_windowclause_sorting_dgr-v9.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 093ace0864..c110696403 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4464,8 +4464,11 @@ create_one_window_path(PlannerInfo *root,
List *activeWindows)
{
PathTarget *window_target;
+ Query *parse = root->parse;
ListCell *l;
List *topqual = NIL;
+ List *orderby_pathkeys = NIL;
+ int sort_pushdown_idx = -1;
/*
* Since each window clause could require a different sort order, we stack
@@ -4484,6 +4487,75 @@ create_one_window_path(PlannerInfo *root,
*/
window_target = input_target;
+ /*
+ * Here we attempt to minimize the number of sorts which must be performed
+ * for queries with an ORDER BY clause.
+ *
+ * It's possible due to select_active_windows() putting the WindowClauses
+ * with the lowest tleSortGroupRef last in the activeWindows list that the
+ * final WindowClause has a subset of the sort requirements that the
+ * query's ORDER BY clause has. Below we try to detect this case and if
+ * we find this is true, we consider adjusting the sort that's done for
+ * WindowAggs and include the additional clauses so that no additional
+ * sorting is required for the query's ORDER BY clause.
+ *
+ * We don't try this optimization for the following cases:
+ *
+ * 1. If the query has a DISTINCT clause. We needn't waste any additional
+ * effort for the more strict sort order as if DISTINCT is done via Hash
+ * Aggregate then that's going to undo this work.
+ *
+ * 2. If the query has a LIMIT clause. The top-level sort will be able to
+ * use a top-n sort which might be more efficient than sorting by the
+ * additional columns. If the LIMIT does not reduce the number of rows
+ * significantly then this might not be true, but we don't try to consider
+ * that here.
+ *
+ * 3. If the top-level WindowClause has a runCondition then this can
+ * filter out tuples and make the final sort cheaper. If we pushed the
+ * sort down below the WindowAgg then we'd need to sort all rows including
+ * ones that the runCondition might filter out. This may waste effort so
+ * we just don't try to push down the sort for this case.
+ *
+ * 4. If the ORDER BY contains any expressions containing WindowFuncs then
+ * we can't push down the sort as these obviously must be evaluated before
+ * they can be sorted.
+ */
+ if (parse->sortClause != NIL && parse->distinctClause == NIL &&
+ parse->limitCount == NULL &&
+ llast_node(WindowClause, activeWindows)->runCondition == NIL &&
+ !contain_window_function((Node *) get_sortgrouplist_exprs(parse->sortClause,
+ root->processed_tlist)))
+ {
+ orderby_pathkeys = make_pathkeys_for_sortclauses(root,
+ parse->sortClause,
+ root->processed_tlist);
+
+ /*
+ * Loop backwards over the WindowClauses looking for the first
+ * WindowClause which has pathkeys not contained in the
+ * orderby_pathkeys. What we're looking for here is the lowest level
+ * that we push the additional sort requirements down into. If we
+ * can't find any WindowClause with suitable pathkeys here then
+ * sort_pushdown_idx will remain at -1 which will effectively disable
+ * the pushdown attempt later in this function.
+ */
+ for (int i = list_length(activeWindows) - 1; i >= 0; i--)
+ {
+ WindowClause *wc = list_nth_node(WindowClause, activeWindows, i);
+ List *window_pathkeys;
+
+ window_pathkeys = make_pathkeys_for_window(root,
+ wc,
+ root->processed_tlist);
+
+ if (pathkeys_contained_in(window_pathkeys, orderby_pathkeys))
+ sort_pushdown_idx = i;
+ else
+ break;
+ }
+ }
+
foreach(l, activeWindows)
{
WindowClause *wc = lfirst_node(WindowClause, l);
@@ -4496,6 +4568,25 @@ create_one_window_path(PlannerInfo *root,
wc,
root->processed_tlist);
+ /*
+ * When we get to the sort_pushdown_idx WindowClause, if the path is
+ * not already correctly sorted, then just use the pathkeys for the
+ * query's ORDER BY clause instead of the WindowClause's pathkeys.
+ * When the path is already correctly sorted there's no point in
+ * adjusting the pathkeys as this just moves the sort down without
+ * actually reducing the number of sorts which are required in the
+ * plan overall. sort_pushdown_idx will be -1 and never match when
+ * this optimization was disabled above.
+ */
+ if (foreach_current_index(l) == sort_pushdown_idx &&
+ !pathkeys_contained_in(window_pathkeys, path->pathkeys))
+ {
+ /* check to make sure sort_pushdown_idx was set correctly */
+ Assert(pathkeys_contained_in(window_pathkeys, orderby_pathkeys));
+
+ window_pathkeys = orderby_pathkeys;
+ }
+
is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys,
&presorted_keys);
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index b2c6605e60..1a141413f8 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -3984,6 +3984,226 @@ ORDER BY depname, empno;
-> Seq Scan on empsalary
(12 rows)
+CREATE INDEX empsalary_depname_idx ON empsalary(depname);
+SET enable_seqscan TO off;
+-- Ensure we don't push the ORDER BY sort below the WindowAgg when the input
+-- node to the WindowAgg is already sorted correctly. This would just move
+-- the sort down and not reduce the number of sorts.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Incremental Sort
+ Sort Key: depname, empno
+ Presorted Key: depname
+ -> WindowAgg
+ -> Index Scan using empsalary_depname_idx on empsalary
+(5 rows)
+
+DROP INDEX empsalary_depname_idx;
+RESET enable_seqscan;
+-- Ensure we push the ORDER BY sort below the WindowAgg
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- A more complex permutation of the above. Ensure the ORDER BY sort is
+-- pushed down below the WindowAgg when there is a PARTITION BY and ORDER BY
+-- clause in the WindowFunc
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+-----------------------------------------------
+ WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> Seq Scan on empsalary
+(4 rows)
+
+-- Try more complex permutations of pushing the ORDER BY's sort below the
+-- WindowAgg. With multiple WindowAggs sharing the same sort, ensure we push
+-- the sort to the correct level.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- As above, but swap the order of the WindowFuncs to ensure their order is
+-- not a factor in the plan choice.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+ QUERY PLAN
+------------------------------------------------------
+ WindowAgg
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname, empno, enroll_date
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- Ensure that we don't push the ORDER BY's sort down for the following cases.
+-- See the comment in create_one_window_path() for details of each case that
+-- we don't perform the optimization.
+-- Case #1: Don't push the ORDER BY's sort down when there's a DISTINCT clause
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------
+ Unique
+ -> Incremental Sort
+ Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?)), (count(*) OVER (?))
+ Presorted Key: depname
+ -> WindowAgg
+ -> Sort
+ Sort Key: depname
+ -> WindowAgg
+ -> Sort
+ Sort Key: enroll_date DESC
+ -> Seq Scan on empsalary
+(11 rows)
+
+-- Case #2: Don't push the ORDER BY's sort down when there's a LIMIT clause
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname
+LIMIT 1;
+ QUERY PLAN
+-----------------------------------------------
+ Limit
+ -> Incremental Sort
+ Sort Key: empno, depname
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(8 rows)
+
+-- Case #3: Don't push the ORDER BY's sort down when the top-level WindowAgg
+-- has a runCondition.
+EXPLAIN (COSTS OFF)
+SELECT * FROM (
+ SELECT depname,
+ empno,
+ row_number() OVER (PARTITION BY depname ORDER BY enroll_date) emprn,
+ sum(salary) OVER (PARTITION BY depname ORDER BY empno) depsalary
+ FROM empsalary
+ ORDER BY depname, enroll_date, empno
+) e
+WHERE emprn < 3;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Subquery Scan on e
+ -> Incremental Sort
+ Sort Key: empsalary.depname, empsalary.enroll_date, empsalary.empno
+ Presorted Key: empsalary.depname, empsalary.enroll_date
+ -> WindowAgg
+ Run Condition: (row_number() OVER (?) < 3)
+ -> Incremental Sort
+ Sort Key: empsalary.depname, empsalary.enroll_date
+ Presorted Key: empsalary.depname
+ -> WindowAgg
+ -> Sort
+ Sort Key: empsalary.depname, empsalary.empno
+ -> Seq Scan on empsalary
+(13 rows)
+
+-- Try a positive version of case #3 where the runCondition exists but it's
+-- not on the final WindowAgg. Ensure the sort is pushed below the WindowAgg
+-- in this case.
+EXPLAIN (COSTS OFF)
+SELECT * FROM (
+ SELECT depname,
+ empno,
+ row_number() OVER (PARTITION BY depname ORDER BY enroll_date) emprn,
+ row_number() OVER (PARTITION BY depname ORDER BY empno) emprn2
+ FROM empsalary
+ ORDER BY depname, enroll_date, empno
+) e
+WHERE emprn2 < 3;
+ QUERY PLAN
+-----------------------------------------------------------------------------------
+ Subquery Scan on e
+ -> WindowAgg
+ Filter: ((row_number() OVER (?)) < 3)
+ -> Incremental Sort
+ Sort Key: empsalary.depname, empsalary.enroll_date, empsalary.empno
+ Presorted Key: empsalary.depname
+ -> WindowAgg
+ Run Condition: (row_number() OVER (?) < 3)
+ -> Sort
+ Sort Key: empsalary.depname, empsalary.empno
+ -> Seq Scan on empsalary
+(11 rows)
+
+-- Case #4: Don't push the ORDER BY's sort down when the ORDER BY contains
+-- WindowFunc results
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+ QUERY PLAN
+--------------------------------------------
+ Incremental Sort
+ Sort Key: empno, (row_number() OVER (?))
+ Presorted Key: empno
+ -> WindowAgg
+ -> Sort
+ Sort Key: empno
+ -> Seq Scan on empsalary
+(7 rows)
+
RESET enable_hashagg;
-- Test Sort node reordering
EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index b7bd0a83da..6c7de48a2b 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -1324,6 +1324,120 @@ SELECT DISTINCT
FROM empsalary
ORDER BY depname, empno;
+CREATE INDEX empsalary_depname_idx ON empsalary(depname);
+SET enable_seqscan TO off;
+
+-- Ensure we don't push the ORDER BY sort below the WindowAgg when the input
+-- node to the WindowAgg is already sorted correctly. This would just move
+-- the sort down and not reduce the number of sorts.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname) depminsalary
+FROM empsalary
+ORDER BY depname, empno;
+
+DROP INDEX empsalary_depname_idx;
+RESET enable_seqscan;
+
+-- Ensure we push the ORDER BY sort below the WindowAgg
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ min(salary) OVER (PARTITION BY depname, empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- A more complex permutation of the above. Ensure the ORDER BY sort is
+-- pushed down below the WindowAgg when there is a PARTITION BY and ORDER BY
+-- clause in the WindowFunc
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Try more complex permutations of pushing the ORDER BY's sort below the
+-- WindowAgg. With multiple WindowAggs sharing the same sort, ensure we push
+-- the sort to the correct level.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- As above, but swap the order of the WindowFuncs to ensure their order is
+-- not a factor in the plan choice.
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ count(*) OVER (ORDER BY enroll_date DESC) c,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
+FROM empsalary
+ORDER BY depname, empno, enroll_date;
+
+-- Ensure that we don't push the ORDER BY's sort down for the following cases.
+-- See the comment in create_one_window_path() for details of each case that
+-- we don't perform the optimization.
+
+-- Case #1: Don't push the ORDER BY's sort down when there's a DISTINCT clause
+EXPLAIN (COSTS OFF)
+SELECT DISTINCT empno,
+ depname,
+ min(salary) OVER (PARTITION BY depname) depminsalary,
+ sum(salary) OVER (PARTITION BY depname) depsalary,
+ count(*) OVER (ORDER BY enroll_date DESC) c
+FROM empsalary
+ORDER BY depname, empno;
+
+-- Case #2: Don't push the ORDER BY's sort down when there's a LIMIT clause
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ depname,
+ min(salary) OVER (PARTITION BY empno) depminsalary,
+ sum(salary) OVER (PARTITION BY empno) depsalary
+FROM empsalary
+ORDER BY empno, depname
+LIMIT 1;
+
+-- Case #3: Don't push the ORDER BY's sort down when the top-level WindowAgg
+-- has a runCondition.
+EXPLAIN (COSTS OFF)
+SELECT * FROM (
+ SELECT depname,
+ empno,
+ row_number() OVER (PARTITION BY depname ORDER BY enroll_date) emprn,
+ sum(salary) OVER (PARTITION BY depname ORDER BY empno) depsalary
+ FROM empsalary
+ ORDER BY depname, enroll_date, empno
+) e
+WHERE emprn < 3;
+
+-- Try a positive version of case #3 where the runCondition exists but it's
+-- not on the final WindowAgg. Ensure the sort is pushed below the WindowAgg
+-- in this case.
+EXPLAIN (COSTS OFF)
+SELECT * FROM (
+ SELECT depname,
+ empno,
+ row_number() OVER (PARTITION BY depname ORDER BY enroll_date) emprn,
+ row_number() OVER (PARTITION BY depname ORDER BY empno) emprn2
+ FROM empsalary
+ ORDER BY depname, enroll_date, empno
+) e
+WHERE emprn2 < 3;
+
+-- Case #4: Don't push the ORDER BY's sort down when the ORDER BY contains
+-- WindowFunc results
+EXPLAIN (COSTS OFF)
+SELECT empno,
+ row_number() over (PARTITION BY empno) esalary
+FROM empsalary
+ORDER BY empno, esalary;
+
RESET enable_hashagg;
-- Test Sort node reordering
On 18/01/23 15:12, David Rowley wrote:
I also thought I'd better test that foreach_delete_current() works
with foreach_reverse(). I can confirm that it *does not* work
correctly. I guess maybe you only tested the fact that it deleted the
current item and not that the subsequent loop correctly went to the
item directly before the deleted item. There's a problem with that. We
skip an item.
Hmm, not really sure why did I miss that. I tried this again (added
following in postgres.c above
PortalStart)
List* l = NIL;
l = lappend(l, 1);
l = lappend(l, 2);
l = lappend(l, 3);
l = lappend(l, 4);
ListCell *lc;
foreach_reverse(lc, l)
{
if (foreach_current_index(lc) == 2) // delete 3
{
foreach_delete_current(l, lc);
}
}
foreach(lc, l)
{
int i = (int) lfirst(lc);
ereport(LOG,(errmsg("%d", i)));
}
Got result:
2023-01-18 20:23:28.115 IST [51007] LOG: 1
2023-01-18 20:23:28.115 IST [51007] STATEMENT: select pg_backend_pid();
2023-01-18 20:23:28.115 IST [51007] LOG: 2
2023-01-18 20:23:28.115 IST [51007] STATEMENT: select pg_backend_pid();
2023-01-18 20:23:28.115 IST [51007] LOG: 4
2023-01-18 20:23:28.115 IST [51007] STATEMENT: select pg_backend_pid();
I had expected list_delete_cell to take care of rest.
Instead of fixing that, I think it's likely better just to loop
backwards manually with a for() loop, so I've adjusted the patch to
work that way. It's quite likely that the additional code in
foreach() and what was in foreach_reverse() is slower than looping
manually due to the additional work those macros do to set the cell to
NULL when we run out of cells to loop over.
Okay, current version looks fine as well.
I made another pass over the v7 patch and fixed a bug that was
disabling the optimization when the deepest WindowAgg had a
runCondition. This should have been using llast_node instead of
linitial_node. The top-level WindowAgg is the last in the list.
I also made a pass over the tests and added a test case for the
runCondition check to make sure we disable the optimization when the
top-level WindowAgg has one of those.
I wasn't sure what your test comments case numbers were meant to represent.
They were not aligned with the code comments that document when the optimisation is
disabled, they started out aligned, but seemed to go off the rails at
#3. I've now made them follow the comments in create_one_window_path()
and made it more clear what we expect the test outcome to be in each
case.
Those were just numbering for exceptional cases, making them in sync
with comments
wasn't really on my mind, but now they looks better.
I've attached the v9 patch. I feel like this patch is quite
self-contained and I'm quite happy with it now. If there are no
objections soon, I'm planning on pushing it.
Patch is already rebased with latest master, tests are all green.
Tried some basic profiling and it looked good.
I also tried a bit unrealistic case.
create table abcdefgh(a int, b int, c int, d int, e int, f int, g int, h int);
insert into abcdefgh select a,b,c,d,e,f,g,h from
generate_series(1,7) a,
generate_series(1,7) b,
generate_series(1,7) c,
generate_series(1,7) d,
generate_series(1,7) e,
generate_series(1,7) f,
generate_series(1,7) g,
generate_series(1,7) h;
explain analyze select count(*) over (order by a),
row_number() over (partition by a order by b) from abcdefgh order by a,b,c,d,e,f,g,h;
In patch version
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
---------------
WindowAgg (cost=1023241.14..1225007.67 rows=5764758 width=48) (actual
time=64957.894..81950.352 rows=5764801 loops=1)
-> WindowAgg (cost=1023241.14..1138536.30 rows=5764758 width=40)
(actual time=37959.055..60391.799 rows=5764801 loops
=1)
-> Sort (cost=1023241.14..1037653.03 rows=5764758 width=32)
(actual time=37959.045..52968.791 rows=5764801 loop
s=1)
Sort Key: a, b, c, d, e, f, g, h
Sort Method: external merge Disk: 237016kB
-> Seq Scan on abcdefgh (cost=0.00..100036.58
rows=5764758 width=32) (actual time=0.857..1341.107 rows=57
64801 loops=1)
Planning Time: 0.168 ms
Execution Time: 82748.789 ms
(8 rows)
In Master
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
---------------------
Incremental Sort (cost=1040889.72..1960081.97 rows=5764758 width=48)
(actual time=23461.815..69654.700 rows=5764801 loop
s=1)
Sort Key: a, b, c, d, e, f, g, h
Presorted Key: a, b
Full-sort Groups: 49 Sort Method: quicksort Average Memory: 30kB
Peak Memory: 30kB
Pre-sorted Groups: 49 Sort Method: external merge Average Disk:
6688kB Peak Disk: 6688kB
-> WindowAgg (cost=1023241.14..1225007.67 rows=5764758 width=48)
(actual time=22729.171..40189.407 rows=5764801 loops
=1)
-> WindowAgg (cost=1023241.14..1138536.30 rows=5764758
width=40) (actual time=8726.562..18268.663 rows=5764801
loops=1)
-> Sort (cost=1023241.14..1037653.03 rows=5764758
width=32) (actual time=8726.551..11291.494 rows=5764801
loops=1)
Sort Key: a, b
Sort Method: external merge Disk: 237016kB
-> Seq Scan on abcdefgh (cost=0.00..100036.58
rows=5764758 width=32) (actual time=0.029..1600.042 r
ows=5764801 loops=1)
Planning Time: 2.742 ms
Execution Time: 71172.586 ms
(13 rows)
Patch version is approx 11 sec slower.
Patch version sort took around 15 sec, whereas master sort took ~2 + 46
= around 48 sec
BUT somehow master version is faster as Window function took 10 sec less
time.
Maybe I am interpreting it wrong but still wanted to bring this to notice.
Thanks,
Ankit
On Thu, 19 Jan 2023 at 06:27, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
Hmm, not really sure why did I miss that. I tried this again (added
following in postgres.c abovePortalStart)
List* l = NIL;
l = lappend(l, 1);
l = lappend(l, 2);
l = lappend(l, 3);
l = lappend(l, 4);ListCell *lc;
foreach_reverse(lc, l)
{
if (foreach_current_index(lc) == 2) // delete 3
{
foreach_delete_current(l, lc);
}
}
The problem is that the next item looked at is 1 and the value 2 is skipped.
I also tried a bit unrealistic case.
create table abcdefgh(a int, b int, c int, d int, e int, f int, g int, h int);
insert into abcdefgh select a,b,c,d,e,f,g,h from
generate_series(1,7) a,
generate_series(1,7) b,
generate_series(1,7) c,
generate_series(1,7) d,
generate_series(1,7) e,
generate_series(1,7) f,
generate_series(1,7) g,
generate_series(1,7) h;explain analyze select count(*) over (order by a),
row_number() over (partition by a order by b) from abcdefgh order by a,b,c,d,e,f,g,h;In patch version
Execution Time: 82748.789 ms
In Master
Execution Time: 71172.586 ms
Patch version sort took around 15 sec, whereas master sort took ~2 + 46
= around 48 sec
Maybe I am interpreting it wrong but still wanted to bring this to notice.
I think you are misinterpreting the results, but the main point
remains - it's slower. The explain analyze timing shows the time
between outputting the first row and the last row. For sort, there's a
lot of work that needs to be done before you output the first row.
I looked into this a bit further and using the same table as you, and
the attached set of hacks that adjust the ORDER BY path generation to
split a Sort into a Sort and Incremental Sort when the number of
pathkeys to sort by is > 2.
work_mem = '1GB' in all cases below:
I turned off the timing in EXPLAIN so that wasn't a factor in the
results. Generally having more nodes means more timing requests and
that's got > 0 overhead.
explain (analyze,timing off) select * from abcdefgh order by a,b,c,d,e,f,g,h;
7^8 rows
Master: Execution Time: 7444.479 ms
master + sort_hacks.diff: Execution Time: 5147.937 ms
So I'm also seeing Incremental Sort - > Sort faster than a single Sort
for 7^8 rows.
With 5^8 rows:
master: Execution Time: 299.949 ms
master + sort_hacks: Execution Time: 239.447 ms
and 4^8 rows:
master: Execution Time: 62.596 ms
master + sort_hacks: Execution Time: 67.900 ms
So at 4^8 sort_hacks becomes slower. I suspect this might be to do
with having to perform more swaps in the array elements that we're
sorting by with the full sort. When work_mem is large this array no
longer fits in CPU cache and I suspect those swaps become more costly
when there are more cache lines having to be fetched from RAM.
I think we really should fix tuplesort.c so that it batches sorts into
about L3 CPU cache-sized chunks in RAM rather than trying to sort much
larger arrays.
I'm just unsure if we should write this off as the expected behaviour
of Sort and continue with the patch or delay the whole thing until we
make some improvements to sort. I think more benchmarking is required
so we can figure out if this is a corner case or a common case. On the
other hand, we already sort WindowClauses with the most strict sort
order first which results in a full Sort and no additional sort for
subsequent WindowClauses that can make use of that sort order. It
would be bizarre to reverse the final few lines of common_prefix_cmp
so that we sort the least strict order first so we end up injecting
Incremental Sorts into the plan to make it faster!
David
Attachments:
sort_hacks.difftext/plain; charset=US-ASCII; name=sort_hacks.diffDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 093ace0864..a714e83a69 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5085,11 +5085,30 @@ create_ordered_paths(PlannerInfo *root,
* incremental sort when there are presorted keys.
*/
if (presorted_keys == 0 || !enable_incremental_sort)
- sorted_path = (Path *) create_sort_path(root,
- ordered_rel,
- input_path,
- root->sort_pathkeys,
- limit_tuples);
+ {
+ if (list_length(root->sort_pathkeys) > 2)
+ {
+ sorted_path = (Path *) create_sort_path(root,
+ ordered_rel,
+ input_path,
+ list_copy_head(root->sort_pathkeys,2),
+ limit_tuples);
+ sorted_path = (Path *) create_incremental_sort_path(root,
+ ordered_rel,
+ sorted_path,
+ root->sort_pathkeys,
+ 2,
+ limit_tuples);
+ }
+ else
+ {
+ sorted_path = (Path *) create_sort_path(root,
+ ordered_rel,
+ input_path,
+ root->sort_pathkeys,
+ limit_tuples);
+ }
+ }
else
sorted_path = (Path *) create_incremental_sort_path(root,
ordered_rel,
On 19/01/23 08:58, David Rowley wrote:
The problem is that the next item looked at is 1 and the value 2 is skipped.
Okay, I see the issue.
I think you are misinterpreting the results, but the main point
remains - it's slower. The explain analyze timing shows the time
between outputting the first row and the last row. For sort, there's a
lot of work that needs to be done before you output the first row.
Okay got it.
I looked into this a bit further and using the same table as you, and
the attached set of hacks that adjust the ORDER BY path generation to
split a Sort into a Sort and Incremental Sort when the number of
pathkeys to sort by is > 2.
Okay, so this is issue with strict sort itself.
I will play around with current patch but it should be fine for
current work and would account for issue in strict sort.
I suspect this might be to do with having to perform more swaps in the
array elements that we'resorting by with the full sort. When work_mem is
large this array no longer fits in CPU cache and I suspect those swaps become
more costly when there are more cache lines having to be fetched from RAM.
Looks like possible reason.
I think we really should fix tuplesort.c so that it batches sorts into
about L3 CPU cache-sized chunks in RAM rather than trying to sort much
larger arrays.
I assume same thing we are doing for incremental sort and that's why it
perform
better here?
Or is it due to shorter tuple width and thus being able to fit into
memory easily?
On the
other hand, we already sort WindowClauses with the most strict sort
order first which results in a full Sort and no additional sort for
subsequent WindowClauses that can make use of that sort order. It
would be bizarre to reverse the final few lines of common_prefix_cmp
so that we sort the least strict order first so we end up injecting
Incremental Sorts into the plan to make it faster!
I would see this as beneficial when tuple size is too huge for strict sort.
Basically
cost(sort(a,b,c,d,e)) > cost(sort(a)+sort(b)+sort(c)+ sort(d,e))
Don't know if cost based decision is realistic or not in current
state of things.
I think more benchmarking is required
so we can figure out if this is a corner case or a common case
I will do few more benchmarks. I assume all scaling issue with strict sort
to remain as it is but no further issue due to this change itself comes up.
I'm just unsure if we should write this off as the expected behaviour
of Sort and continue with the patch or delay the whole thing until we
make some improvements to sort.
Unless we found more issues, we might be good with the former but
you can make better call on this. As much as I would love my first patch
to get
merged, I would want it to be right.
Thanks,
Ankit
I think more benchmarking is required
so we can figure out if this is a corner case or a common case
I did some more benchmarks:
#1. AIM: Pushdown column whose size is very high
create table test(a int, b int, c text);
insert into test select a,b,c from generate_series(1,1000)a, generate_series(1,1000)b, repeat(md5(random()::text), 999)c;
explain (analyze, costs off) select count(*) over (order by a), row_number() over (order by a, b) from test order by a,b,c;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Incremental Sort (actual time=1161.605..6577.141 rows=1000000 loops=1)
Sort Key: a, b, c
Presorted Key: a, b
Full-sort Groups: 31250 Sort Method: quicksort Average Memory: 39kB Peak Memory: 39kB
-> WindowAgg (actual time=1158.896..5819.460 rows=1000000 loops=1)
-> WindowAgg (actual time=1154.614..3391.537 rows=1000000 loops=1)
-> Gather Merge (actual time=1154.602..2404.125 rows=1000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (actual time=1118.326..1295.743 rows=333333 loops=3)
Sort Key: a, b
Sort Method: external merge Disk: 145648kB
Worker 0: Sort Method: external merge Disk: 140608kB
Worker 1: Sort Method: external merge Disk: 132792kB
-> Parallel Seq Scan on test (actual time=0.018..169.319 rows=333333 loops=3)
Planning Time: 0.091 ms
Execution Time: 6816.616 ms
(17 rows)
Planner choose faster path correctly (which was not path which had pushed down column).
#2. AIM: Check strict vs incremental sorts wrt to large size data
Patch version is faster as for external merge sort, disk IO is main bottleneck and if we sort an extra column,
it doesn't have major impact. This is when work mem is very small.
For larger work_mem, difference between patched version and master is minimal and
they both provide somewhat comparable performance.
Tried permutation of few cases which we have already covered but I did not see anything alarming in those.
I'm just unsure if we should write this off as the expected behaviour
of Sort and continue with the patch or delay the whole thing until we
make some improvements to sort.
I am not seeing other cases where patch version is consistently slower.
Thanks,
Ankit
On Wed, 25 Jan 2023 at 06:32, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
I am not seeing other cases where patch version is consistently slower.
I just put together a benchmark script to try to help paint a picture
of under what circumstances reducing the number of sorts slows down
performance.
The benchmark I used creates a table with 2 int4 columns, "a" and "b".
I have a single WindowClause that sorts on "a" and then ORDER BY a,b.
In each test, I change the number of distinct values in "a" starting
with 1 distinct value then in each subsequent test I multiply that
number by 10 each time all the way up to 1 million. The idea here is
to control how often the sort that's performing a sort by both columns
must call the tiebreak function. I did 3 subtests for each number of
distinct rows in "a". 1) Unsorted case where the rows are put into the
tuples store in an order that's not presorted, 2) tuplestore rows are
already sorted leading to hitting our qsort fast path. 3) random
order.
You can see from the results that the patched version is not looking
particularly great. I did a 1 million row test and a 10 million row
test. work_mem was 4GB for each, so the sorts never went to disk.
Overall, the 1 million row test on master took 11.411 seconds and the
patched version took 11.282 seconds, so there was a *small* gain
overall there. With the 10 million row test, overall, master took
121.063 seconds, whereas patched took 130.727 seconds. So quite a
large performance regression there.
I'm unsure if 69749243 might be partially to blame here as it favours
single-key sorts. If you look at qsort_tuple_signed_compare(), you'll
see that the tiebreak function will be called only when it's needed
and there are > 1 sort keys. The comparetup function will re-compare
the first key all over again. If I get some more time I'll run the
tests again with the sort specialisation code disabled to see if the
situation is the same or not.
Another way to test that would be to have a table with 3 columns and
always sort by at least 2.
I've attached the benchmark script that I used and also a copy of the
patch with a GUC added solely to allow easier benchmarking of patched
vs unpatched.
I think we need to park this patch until we can figure out what can be
done to stop these regressions. We might want to look at 1) Expanding
on what 69749243 did and considering if we want sort specialisations
that are specifically for 1 column and another set for multi-columns.
The multi-column ones don't need to re-compare key[0] again. 2)
Sorting in smaller batches that better fit into CPU cache. Both of
these ideas would require a large amount of testing and discussion.
For #1 we're considering other specialisations, for example NOT NULL,
and we don't want to explode the number of specialisations we have to
compile into the binary.
David
Attachments:
1million_row_test.pngimage/png; name=1million_row_test.pngDownload
�PNG
IHDR � -ogG sRGB ��� gAMA ���a pHYs � ����e ��IDATx^�� �\U�����~3��<����~����
���
#�ATYp7APv�,��� ����v�%$��}�N��s���'��}�S��*��<�Q�n��[U����:�
M$>