pgsql: Add memory/disk usage for Window aggregate nodes in EXPLAIN.

Started by Tatsuo Ishiiover 1 year ago6 messages
#1Tatsuo Ishii
ishii@postgresql.org

Add memory/disk usage for Window aggregate nodes in EXPLAIN.

This commit is similar to 1eff8279d and expands the idea to Window
aggregate nodes so that users can know how much memory or disk the
tuplestore used.

This commit uses newly introduced tuplestore_get_stats() to inquire this
information and add some additional output in EXPLAIN ANALYZE to
display the information for the Window aggregate node.

Reviewed-by: David Rowley, Ashutosh Bapat, Maxim Orlov, Jian He
Discussion: /messages/by-id/20240706.202254.89740021795421286.ishii@postgresql.org

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/95d6e9af07d2e5af2fdd272e72b5b552bad3ea0a

Modified Files
--------------
src/backend/commands/explain.c | 68 +++++++++++++++++++++++++----------
src/test/regress/expected/explain.out | 38 ++++++++++++++++++++
src/test/regress/sql/explain.sql | 9 +++++
3 files changed, 97 insertions(+), 18 deletions(-)

#2Christoph Berg
myon@debian.org
In reply to: Tatsuo Ishii (#1)
Re: pgsql: Add memory/disk usage for Window aggregate nodes in EXPLAIN.

Re: Tatsuo Ishii

Add memory/disk usage for Window aggregate nodes in EXPLAIN.

This is failing for PG18 on Debian unstable on 32-bit i386:

******** build/src/test/regress/regression.diffs ********
diff -U3 /build/reproducible-path/postgresql-18-18~~devel.20250331/src/test/regress/expected/explain.out /build/reproducible-path/postgresql-18-18~~devel.20250331/build/src/test/regress/results/explain.out
--- /build/reproducible-path/postgresql-18-18~~devel.20250331/src/test/regress/expected/explain.out	2025-03-31 06:17:21.000000000 +0000
+++ /build/reproducible-path/postgresql-18-18~~devel.20250331/build/src/test/regress/results/explain.out	2025-03-31 15:18:19.914783369 +0000
@@ -792,7 +792,7 @@
 ----------------------------------------------------------------------------------
  WindowAgg (actual time=N.N..N.N rows=N.N loops=N)
    Window: w1 AS ()
-   Storage: Disk  Maximum Storage: NkB
+   Storage: Memory  Maximum Storage: NkB
    ->  Function Scan on generate_series a (actual time=N.N..N.N rows=N.N loops=N)
  Planning Time: N.N ms
  Execution Time: N.N ms

This is the first build of PG18 on i386, so I cannot say if it broke with that commit or later.

Full log: https://buildd.debian.org/status/fetch.php?pkg=postgresql-18&arch=i386&ver=18%7E%7Edevel.20250331-1&stamp=1743434305&raw=0

Christoph

#3David Rowley
dgrowleyml@gmail.com
In reply to: Christoph Berg (#2)
Re: pgsql: Add memory/disk usage for Window aggregate nodes in EXPLAIN.

On Tue, 1 Apr 2025 at 04:40, Christoph Berg <myon@debian.org> wrote:

-   Storage: Disk  Maximum Storage: NkB
+   Storage: Memory  Maximum Storage: NkB
->  Function Scan on generate_series a (actual time=N.N..N.N rows=N.N loops=N)

We'll probably just need to bump that 2000 row count to something a
bit more for 32-bit.

Any chance you could share the output of:

explain (analyze,buffers off,costs off) select sum(n) over() from
generate_series(1,2000) a(n);

Could you maybe also do a binary search for the number of rows where
it goes to disk by adjusting the 2000 up in some increments until the
Storage method is disk? (Not that I think we should set it to the
minimum, but it would be good to not set it too much higher than we
need to)

David

#4Christoph Berg
myon@debian.org
In reply to: David Rowley (#3)
Re: pgsql: Add memory/disk usage for Window aggregate nodes in EXPLAIN.

Re: David Rowley

Any chance you could share the output of:

explain (analyze,buffers off,costs off) select sum(n) over() from
generate_series(1,2000) a(n);

PostgreSQL 18devel on x86-linux, compiled by gcc-14.2.0, 32-bit

=# explain (analyze,buffers off,costs off) select sum(n) over() from generate_series(1,2000) a(n);
QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────
WindowAgg (actual time=1.248..1.731 rows=2000.00 loops=1)
Window: w1 AS ()
Storage: Memory Maximum Storage: 63kB
-> Function Scan on generate_series a (actual time=0.301..0.536 rows=2000.00 loops=1)
Planning Time: 0.066 ms
Execution Time: 1.913 ms
(6 rows)

Could you maybe also do a binary search for the number of rows where
it goes to disk by adjusting the 2000 up in some increments until the
Storage method is disk? (Not that I think we should set it to the
minimum, but it would be good to not set it too much higher than we
need to)

The test has a `set work_mem = 64;` which I used here:

=# explain (analyze,buffers off,costs off) select sum(n) over() from generate_series(1,2047) a(n);
QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────
WindowAgg (actual time=1.037..1.429 rows=2047.00 loops=1)
Window: w1 AS ()
Storage: Memory Maximum Storage: 64kB
-> Function Scan on generate_series a (actual time=0.262..0.457 rows=2047.00 loops=1)
Planning Time: 0.058 ms
Execution Time: 1.594 ms
(6 rows)

=# explain (analyze,buffers off,costs off) select sum(n) over() from generate_series(1,2048) a(n);
QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────
WindowAgg (actual time=2.073..2.686 rows=2048.00 loops=1)
Window: w1 AS ()
Storage: Disk Maximum Storage: 65kB
-> Function Scan on generate_series a (actual time=0.624..1.064 rows=2048.00 loops=1)
Planning Time: 0.064 ms
Execution Time: 2.934 ms
(6 rows)

(With the default work_mem, the tipping point is around 149500)

Christoph

#5David Rowley
dgrowleyml@gmail.com
In reply to: Christoph Berg (#4)
Re: pgsql: Add memory/disk usage for Window aggregate nodes in EXPLAIN.

On Tue, 1 Apr 2025 at 09:40, Christoph Berg <myon@debian.org> wrote:

=# explain (analyze,buffers off,costs off) select sum(n) over() from generate_series(1,2048) a(n);
QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────
WindowAgg (actual time=2.073..2.686 rows=2048.00 loops=1)
Window: w1 AS ()
Storage: Disk Maximum Storage: 65kB

Thank you for testing that. I've just pushed a patch to bump it up to 2500.

I suspect the buildfarm didn't catch this due to the tuplestore
consuming enough memory in MEMORY_CONTEXT_CHECKING builds.

David

#6Tatsuo Ishii
ishii@postgresql.org
In reply to: David Rowley (#5)
Re: pgsql: Add memory/disk usage for Window aggregate nodes in EXPLAIN.

From: David Rowley <dgrowleyml@gmail.com>
Subject: Re: pgsql: Add memory/disk usage for Window aggregate nodes in EXPLAIN.
Date: Tue, 1 Apr 2025 11:09:11 +1300
Message-ID: <CAApHDvoOHfFYXUryAymxiZjvyvhEt0ueeBOJRUOJWn1W7e3eyA@mail.gmail.com>

On Tue, 1 Apr 2025 at 09:40, Christoph Berg <myon@debian.org> wrote:

=# explain (analyze,buffers off,costs off) select sum(n) over() from generate_series(1,2048) a(n);
QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────
WindowAgg (actual time=2.073..2.686 rows=2048.00 loops=1)
Window: w1 AS ()
Storage: Disk Maximum Storage: 65kB

Thank you for testing that. I've just pushed a patch to bump it up to 2500.

I suspect the buildfarm didn't catch this due to the tuplestore
consuming enough memory in MEMORY_CONTEXT_CHECKING builds.

David,
Christoph,

Thank you for fixing this!
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp