From 69d7de5ea2a7fbb1aa272b8b7b32017ea053160e Mon Sep 17 00:00:00 2001
From: "David G. Johnston" <David.G.Johnston@Gmail.com>
Date: Fri, 7 Mar 2025 20:07:35 -0700
Subject: [PATCH] doc: more thoroughly explain window function processing.

---
 doc/src/sgml/queries.sgml | 98 +++++++++++++++++++++++++++++++++++----
 1 file changed, 88 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 372cce1a48..c59dfaa3cb 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -1460,25 +1460,103 @@ GROUP BY GROUPING SETS (
    </para>
 
    <para>
-    When multiple window functions are used, all the window functions having
-    syntactically equivalent <literal>PARTITION BY</literal> and <literal>ORDER BY</literal>
-    clauses in their window definitions are guaranteed to be evaluated in a
-    single pass over the data. Therefore they will see the same sort ordering,
+    The following is a query using multiple window functions to provide
+    a concrete example to refer to when explaining the execution behaviors
+    of such a query.  The result is shown for reference while
+    the <command>EXPLAIN</command> output is needed for the subsequent discussion.
+    A full understanding of explain output is not necessary here; the pertinent
+    details are described below.
+
+<programlisting>
+WITH vals (i,j) AS ( VALUES ('A',1), ('B',2),('B',3) )
+SELECT *,
+  COUNT(i) OVER (PARTITION BY i) AS count_i_noframe,
+  COUNT(j) OVER (PARTITION BY j) AS count_j,
+  SUM(j) OVER (PARTITION BY j) AS sum_j,
+  COUNT(i) OVER (
+    PARTITION BY i
+    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+  ) AS count_i_frame
+FROM vals;
+
+ i | j | count_i_noframe | count_j | sum_j | count_i_frame
+---+---+-----------------+---------+-------+---------------
+ A | 1 |               1 |       1 |     1 |             1
+ B | 2 |               2 |       1 |     2 |             2
+ B | 3 |               2 |       1 |     3 |             2
+
+                       QUERY PLAN
+---------------------------------------------------------
+ WindowAgg
+   ->  WindowAgg
+         ->  Sort
+               Sort Key: "*VALUES*".column1
+               ->  WindowAgg
+                     ->  Sort
+                           Sort Key: "*VALUES*".column2
+                           ->  Values Scan on "*VALUES*"
+</programlisting>
+
+   </para>
+
+   <para>
+    In the explain output example above the bottom-most node is the unsorted data-producing
+    values scan which adds columns <literal>i</literal> and <literal>j</literal> to the output.
+    (This would  be replaced with group by results or a normal join and filter for most real queries.)
+    Next, in order to add columns <literal>count_j</literal> and <literal>sum_j</literal> to the output,
+    the data is sorted on <literal>j</literal> to match their shared partition specification.
+    Next, the data is re-sorted using <literal>i</literal> so that <literal>count_i_noframe</literal>
+    can be added; and lastly the same ordered data is traversed again to add <literal>count_i_frame</literal> to the result.
+    (Written ordering assumed but they could be swapped.)
+   </para>
+
+   <para>
+    As seen in the explain output, the presence of window functions will result
+    in a stack of <literal>WindowAgg</literal> nodes with either an inital data producer plus explicit
+    sort node (see example) or a leaf node that produces ordered results
+    (e.g., an Index Only Scan).
+    Additional sort nodes (see example) may be added if there are multiple window
+    functions each specifying different orderings.
+    Each window function is associated with one <literal>WindowAgg</literal> node during planning,
+    though the same <literal>WindowAgg</literal> node may be used for multiple window functions.
+    (This is why there is only a single <literal>WindowAgg</literal> for column2/partition by j:
+    the <literal>sum_j</literal> and <literal>count_j</literal> columns are sharing.)
+    In particular, de-duplication of <literal>WindowAgg</literal> nodes happens during parsing and
+    planning (the later taking into account any frame clause optimizations actual
+    window functions may allow).  Note that while there is an implicit default window
+    frame for any over clause that does not specify one, for purposes of uniqueness the
+    absence of a frame clause and one specifying an equivalent frame explicitly are
+    considered different.
+    (This is why there are still two <literal>WindowAgg</literal> nodes for the two window
+    functions specifying <literal>partition by i</literal>.)
+   </para>
+
+   <para>
+    When multiple window functions are used the SQL standard requires that all
+    order-equivalent invocations see the same ordering of peer rows.
+    In practice this means that <literal>WindowAgg</literal> nodes corresponding to
+    specifications containing the same
+    <literal>PARTITION BY</literal> and <literal>ORDER BY</literal>
+    clauses in their window definitions will appear together in the stack,
+    without any intervening sort node. Thereby guaranteeing they will see the same sort ordering,
     even if the <literal>ORDER BY</literal> does not uniquely determine an ordering.
     However, no guarantees are made about the evaluation of functions having
     different <literal>PARTITION BY</literal> or <literal>ORDER BY</literal> specifications.
     (In such cases a sort step is typically required between the passes of
     window function evaluations, and the sort is not guaranteed to preserve
     ordering of rows that its <literal>ORDER BY</literal> sees as equivalent.)
+    However, the <productname>PostgreSQL</productname> planner does attempt to determine
+    whether, given two <literal>ORDER BY</literal> specifications, one is a prefix of the other,
+    in which case both window functions will see the ordering required for the more specific one.
    </para>
 
    <para>
-    Currently, window functions always require presorted data, and so the
-    query output will be ordered according to one or another of the window
-    functions' <literal>PARTITION BY</literal>/<literal>ORDER BY</literal> clauses.
-    It is not recommended to rely on this, however.  Use an explicit
-    top-level <literal>ORDER BY</literal> clause if you want to be sure the
-    results are sorted in a particular way.
+    Currently, a query without an order by clause, but containing one or more order-specifying
+    window functions (i.e., e.g., not <literal>count(*) over ()</literal>), will still end
+    up producing sorted output since the <literal>WindowAgg</literal> node will be the the
+    row-producing node and it usually outputs sorted data.  It is not recommended to rely on this,
+    however.  Use an explicit top-level <literal>ORDER BY</literal> clause if you want to be sure
+    the results are sorted in a particular way.
    </para>
   </sect2>
  </sect1>
-- 
2.34.1