Add memory/disk usage for WindowAgg nodes in EXPLAIN
Commit 1eff8279d4 added memory/disk usage for Materialize nodes in
EXPLAIN ANALYZE.
In the commit message:
There are a few other executor node types that use tuplestores, so we
could also consider adding these details to the EXPLAIN ANALYZE for
those nodes too.
So I wanted to add memory/disk usage for WindowAgg. Patch attached.
Since a WindowAgg node can create multiple tuplestores, one for each
window partition, we need to track each tuplestore's storage usage so
that the maximum storage usage can be determined. For this purpose I
added new fields to WindowAggState.
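The per-partition tracking described above can be sketched as a minimal, self-contained model. DemoWindowAggState and demo_update_max() are illustrative stand-ins, not the actual WindowAggState fields or executor code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins for the fields added to WindowAggState. */
typedef struct DemoWindowAggState
{
    int64_t     storageSize;    /* maximum storage used by any partition's
                                 * tuplestore so far */
    const char *storageType;    /* "Memory" or "Disk" at that maximum */
} DemoWindowAggState;

/*
 * Called once per partition before its tuplestore is destroyed:
 * fold this partition's usage into the running maximum.
 */
static void
demo_update_max(DemoWindowAggState *state, int64_t space_used,
                const char *storage_type)
{
    if (space_used > state->storageSize)
    {
        state->storageSize = space_used;
        state->storageType = storage_type;
    }
}
```

EXPLAIN then reports storageSize/storageType once, after the last partition has been processed.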
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Attachments:
v1-0001-Add-memory-disk-usage-for-WindowAgg-nodes-in-EXPL.patch
On Sat, 6 Jul 2024 at 23:23, Tatsuo Ishii <ishii@postgresql.org> wrote:
So I wanted to add memory/disk usage for WindowAgg. Patch attached.
Thanks for working on that.
Since a WindowAgg node can create multiple tuplestores, one for each
window partition, we need to track each tuplestore's storage usage so
that the maximum storage usage can be determined. For this purpose I
added new fields to WindowAggState.
I'd recently been looking at the code that recreates the tuplestore
for each partition and thought we could do a bit better. In [1], I
proposed a patch to make this better.
If you based your patch on [1], maybe a better way of doing this is
having tuplestore.c track the maximum space used on disk in an extra
field which is updated with tuplestore_clear(). It's probably ok to
update a new field called maxDiskSpace in tuplestore_clear() if
state->status != TSS_INMEM. If the tuplestore went to disk then an
extra call to BufFileSize() isn't going to be noticeable, even in
cases where we only just went over work_mem. You could then adjust
tuplestore_space_used() to look at maxDiskSpace and return that value
if it's larger than BufFileSize(state->myfile) and state->maxSpace.
You could check if maxDiskSpace == 0 to determine if the tuplestore
has ever gone to disk. tuplestore_storage_type_name() would also need
to check maxDiskSpace and return "Disk" if that's non-zero.
David
[1]: /messages/by-id/CAHoyFK9n-QCXKTUWT_xxtXninSMEv+gbJN66-y6prM3f4WkEHw@mail.gmail.com
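David's suggestion above can be modeled in a self-contained way. The struct and functions below are simplified stand-ins for the tuplestore.c internals he names (maxDiskSpace, tuplestore_clear(), tuplestore_space_used(), tuplestore_storage_type_name()), not the real API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { TSS_INMEM, TSS_WRITEFILE } DemoStatus;

typedef struct DemoTuplestore
{
    DemoStatus status;
    int64_t    currentDiskSpace; /* stand-in for BufFileSize(myfile) */
    int64_t    maxSpace;         /* in-memory high-water mark */
    int64_t    maxDiskSpace;     /* proposed: disk high-water mark */
} DemoTuplestore;

/* On clear, capture the disk space used before the file is reset. */
static void
demo_clear(DemoTuplestore *ts)
{
    if (ts->status != TSS_INMEM && ts->currentDiskSpace > ts->maxDiskSpace)
        ts->maxDiskSpace = ts->currentDiskSpace;
    ts->currentDiskSpace = 0;
    ts->status = TSS_INMEM;
}

/* Return the largest space used over the store's whole lifetime. */
static int64_t
demo_space_used(const DemoTuplestore *ts)
{
    int64_t result = ts->maxSpace;

    if (ts->status != TSS_INMEM && ts->currentDiskSpace > result)
        result = ts->currentDiskSpace;
    if (ts->maxDiskSpace > result)
        result = ts->maxDiskSpace;
    return result;
}

/* "Disk" if the store has ever gone to disk, else "Memory". */
static const char *
demo_storage_type(const DemoTuplestore *ts)
{
    return (ts->maxDiskSpace > 0 || ts->status != TSS_INMEM)
        ? "Disk" : "Memory";
}
```

The point of updating the maximum only in clear() is that the extra BufFileSize()-style lookup happens once per partition, not once per tuple.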
On Sat, 6 Jul 2024 at 23:23, Tatsuo Ishii <ishii@postgresql.org> wrote:
So I wanted to add memory/disk usage for WindowAgg. Patch attached.
Thanks for working on that.
Thank you for the infrastructure you created in tuplestore.c and explain.c.
BTW, it seems these executor nodes (other than the Materialize and
Window Aggregate nodes) use tuplestores for their own purposes:
CTE Scan
Recursive Union
Table Function Scan
I have already implemented that for CTE Scan. Do you think the other
two nodes are worth adding the information to? For consistency's sake,
I think it would be better to add the info to Recursive Union and
Table Function Scan as well.
Since a WindowAgg node can create multiple tuplestores, one for each
window partition, we need to track each tuplestore's storage usage so
that the maximum storage usage can be determined. For this purpose I
added new fields to WindowAggState.
I'd recently been looking at the code that recreates the tuplestore
for each partition and thought we could do a bit better. In [1], I
proposed a patch to make this better.
If you based your patch on [1], maybe a better way of doing this is
having tuplestore.c track the maximum space used on disk in an extra
field which is updated with tuplestore_clear(). It's probably ok to
update a new field called maxDiskSpace in tuplestore_clear() if
state->status != TSS_INMEM. If the tuplestore went to disk then an
extra call to BufFileSize() isn't going to be noticeable, even in
cases where we only just went over work_mem. You could then adjust
tuplestore_space_used() to look at maxDiskSpace and return that value
if it's larger than BufFileSize(state->myfile) and state->maxSpace.
You could check if maxDiskSpace == 0 to determine if the tuplestore
has ever gone to disk. tuplestore_storage_type_name() would also need
to check maxDiskSpace and return "Disk" if that's non-zero.
Thank you for the suggestion. Yes, I noticed [1] and once it is
committed, I will start to study tuplestore.c in this direction.
Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Tue, 9 Jul 2024 at 14:44, Tatsuo Ishii <ishii@postgresql.org> wrote:
BTW, it seems these executor nodes (other than the Materialize and
Window Aggregate nodes) use tuplestores for their own purposes:
CTE Scan
Recursive Union
Table Function Scan
I have already implemented that for CTE Scan. Do you think the other
two nodes are worth adding the information to?
Yes, I think so. I'd keep each as a separate patch so they can be
considered independently. Doing all of them should hopefully ensure we
strike the right balance of what code to put in explain.c and what
code to put in tuplestore.c. I think the WindowAgg's tuplestore usage
pattern might show that the API I picked isn't well suited when a
tuplestore is cleared and refilled over and over.
David
On Tue, Jul 9, 2024 at 8:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 9 Jul 2024 at 14:44, Tatsuo Ishii <ishii@postgresql.org> wrote:
BTW, it seems these executor nodes (other than the Materialize and
Window Aggregate nodes) use tuplestores for their own purposes:
CTE Scan
Recursive Union
Table Function Scan
I have already implemented that for CTE Scan. Do you think the other
two nodes are worth adding the information to?
Yes, I think so. I'd keep each as a separate patch so they can be
considered independently. Doing all of them should hopefully ensure we
strike the right balance of what code to put in explain.c and what
code to put in tuplestore.c.
+1
+ if (es->format != EXPLAIN_FORMAT_TEXT)
+ {
+ ExplainPropertyText("Storage", storageType, es);
+ ExplainPropertyInteger("Maximum Storage", "kB", spaceUsedKB, es);
+ }
+ else
+ {
+ ExplainIndentText(es);
+ appendStringInfo(es->str,
+ "Storage: %s Maximum Storage: " INT64_FORMAT "kB\n",
+ storageType,
+ spaceUsedKB);
+ }
It would be good to move this code to a function which can be called
by the show_*_info() functions. We might even convert it into a
tuplestore-specific implementation hook after David's work.
--
Best Wishes,
Ashutosh Bapat
Yes, I think so. I'd keep each as a separate patch so they can be
considered independently. Doing all of them should hopefully ensure we
strike the right balance of what code to put in explain.c and what
code to put in tuplestore.c.
+1
+ if (es->format != EXPLAIN_FORMAT_TEXT)
+ {
+ ExplainPropertyText("Storage", storageType, es);
+ ExplainPropertyInteger("Maximum Storage", "kB", spaceUsedKB, es);
+ }
+ else
+ {
+ ExplainIndentText(es);
+ appendStringInfo(es->str,
+ "Storage: %s Maximum Storage: " INT64_FORMAT "kB\n",
+ storageType,
+ spaceUsedKB);
+ }
It would be good to move this code to a function which can be called
by the show_*_info() functions.
I have already implemented that in this direction in my
work-in-progress patch:
/*
* Show information regarding storage method and maximum memory/disk space
* used.
*/
static void
show_storage_info(Tuplestorestate *tupstore, ExplainState *es)
which can be shared by the Material and CTE Scan nodes. I am going to
post it after I take care of the Recursive Union and Table Function
Scan nodes.
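A rough, self-contained model of that shared helper: DemoExplainState and the plain snprintf() below stand in for explain.c's ExplainState and output machinery, and only the TEXT-format branch is modeled:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for explain.c's ExplainState; only TEXT output is modeled. */
typedef struct DemoExplainState
{
    char buf[128];
} DemoExplainState;

/*
 * Shared helper: report the storage method and maximum memory/disk
 * space used, mirroring the line Materialize already prints.
 */
static void
demo_show_storage_info(DemoExplainState *es, const char *storage_type,
                       int64_t space_used_kb)
{
    snprintf(es->buf, sizeof(es->buf),
             "Storage: %s Maximum Storage: %lldkB",
             storage_type, (long long) space_used_kb);
}
```

Each node's show_*_info() function would then call this with its own tuplestore's statistics, so the output format stays identical across node types.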
We might even convert it into a tuplestore-specific implementation
hook after David's work.
Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Yes, I think so. I'd keep each as a separate patch so they can be
considered independently. Doing all of them should hopefully ensure we
strike the right balance of what code to put in explain.c and what
code to put in tuplestore.c.
+1
+ if (es->format != EXPLAIN_FORMAT_TEXT)
+ {
+ ExplainPropertyText("Storage", storageType, es);
+ ExplainPropertyInteger("Maximum Storage", "kB", spaceUsedKB, es);
+ }
+ else
+ {
+ ExplainIndentText(es);
+ appendStringInfo(es->str,
+ "Storage: %s Maximum Storage: " INT64_FORMAT "kB\n",
+ storageType,
+ spaceUsedKB);
+ }
It would be good to move this code to a function which can be called
by the show_*_info() functions.
I have already implemented that in this direction in my
work-in-progress patch:
Attached are the v2 patches. As suggested by David, I split them
into multiple patches so that each patch implements the feature for
each node. You need to apply the patches in the order of patch number
(if you want to apply all of them, "git apply v2-*.patch" should
work).
v2-0001-Refactor-show_material_info.patch:
This refactors show_material_info(). The guts are moved to a new
function, show_storage_info(), so that it can be shared by nodes other
than Materialize.
v2-0002-Add-memory-disk-usage-for-CTE-Scan-nodes-in-EXPLA.patch:
This adds memory/disk usage for CTE Scan nodes in EXPLAIN (ANALYZE) command.
v2-0003-Add-memory-disk-usage-for-Table-Function-Scan-nod.patch:
This adds memory/disk usage for Table Function Scan nodes in EXPLAIN (ANALYZE) command.
v2-0004-Add-memory-disk-usage-for-Recursive-Union-nodes-i.patch:
This adds memory/disk usage for Recursive Union nodes in EXPLAIN
(ANALYZE) command. Also, show_storage_info() is changed so that it
accepts int64 storage_used and char *storage_type arguments. They are
used if the target node uses multiple tuplestores, in which case a
simple call to tuplestore_space_used() does not work. Such executor
nodes need to collect storage_used while running the node. This type
of node includes Recursive Union and Window Aggregate.
v2-0005-Add-memory-disk-usage-for-Window-Aggregate-nodes-.patch: This
adds memory/disk usage for Window Aggregate nodes in EXPLAIN (ANALYZE)
command. Note that if David's proposal
/messages/by-id/CAHoyFK9n-QCXKTUWT_xxtXninSMEv+gbJN66-y6prM3f4WkEHw@mail.gmail.com
is committed, this will need to be adjusted.
As a demonstration, here is how storage/memory usage is shown in
EXPLAIN (notice "Storage: Memory Maximum Storage: 120kB" etc.). The
script used is attached (test.sql.txt). The SQLs are shamelessly
copied from David's example and the regression tests (some of them
were modified by me).
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(t1.b) FROM (VALUES(1),(2)) t2(x) LEFT JOIN (SELECT * FROM t1 WHERE a <= 100) t1 ON TRUE;
QUERY PLAN
---------------------------------------------------------------------------------
Aggregate (actual time=0.345..0.346 rows=1 loops=1)
-> Nested Loop Left Join (actual time=0.015..0.330 rows=200 loops=1)
-> Values Scan on "*VALUES*" (actual time=0.001..0.003 rows=2 loops=1)
-> Materialize (actual time=0.006..0.152 rows=100 loops=2)
Storage: Memory Maximum Storage: 120kB
-> Seq Scan on t1 (actual time=0.007..0.213 rows=100 loops=1)
Filter: (a <= 100)
Rows Removed by Filter: 900
Planning Time: 0.202 ms
Execution Time: 0.377 ms
(10 rows)
-- CTE Scan node
EXPLAIN (ANALYZE, COSTS OFF)
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) OVER() FROM t;
QUERY PLAN
-----------------------------------------------------------------------------------
WindowAgg (actual time=0.151..0.169 rows=100 loops=1)
Storage: Memory Maximum Storage: 20kB
CTE t
-> Recursive Union (actual time=0.001..0.105 rows=100 loops=1)
Storage: Memory Maximum Storage: 17kB
-> Result (actual time=0.001..0.001 rows=1 loops=1)
-> WorkTable Scan on t t_1 (actual time=0.000..0.000 rows=1 loops=100)
Filter: (n < 100)
Rows Removed by Filter: 0
-> CTE Scan on t (actual time=0.002..0.127 rows=100 loops=1)
Storage: Memory Maximum Storage: 20kB
Planning Time: 0.053 ms
Execution Time: 0.192 ms
(13 rows)
-- Table Function Scan node
CREATE OR REPLACE VIEW public.jsonb_table_view6 AS
SELECT js2,
jsb2w,
jsb2q,
ia,
ta,
jba
FROM JSON_TABLE(
'null'::jsonb, '$[*]' AS json_table_path_0
PASSING
1 + 2 AS a,
'"foo"'::json AS "b c"
COLUMNS (
js2 json PATH '$' WITHOUT WRAPPER KEEP QUOTES,
jsb2w jsonb PATH '$' WITH UNCONDITIONAL WRAPPER KEEP QUOTES,
jsb2q jsonb PATH '$' WITHOUT WRAPPER OMIT QUOTES,
ia integer[] PATH '$' WITHOUT WRAPPER KEEP QUOTES,
ta text[] PATH '$' WITHOUT WRAPPER KEEP QUOTES,
jba jsonb[] PATH '$' WITHOUT WRAPPER KEEP QUOTES
)
);
CREATE VIEW
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM jsonb_table_view6;
QUERY PLAN
-------------------------------------------------------------------------------
Table Function Scan on "json_table" (actual time=0.024..0.025 rows=1 loops=1)
Storage: Memory Maximum Storage: 17kB
Planning Time: 0.100 ms
Execution Time: 0.054 ms
(4 rows)
Attachments:
v2-0001-Refactor-show_material_info.patch
v2-0002-Add-memory-disk-usage-for-CTE-Scan-nodes-in-EXPLA.patch
v2-0003-Add-memory-disk-usage-for-Table-Function-Scan-nod.patch
v2-0004-Add-memory-disk-usage-for-Recursive-Union-nodes-i.patch
v2-0005-Add-memory-disk-usage-for-Window-Aggregate-nodes-.patch
test.sql.txt
Hi!
+1 for the idea of the patch. I consider it useful.
I looked at the patch set and don't see any obvious defects. It
applies without any problems and looks pretty good to me.
Only one thing is left to do: add basic tests for the added
functionality to make it committable. For example, as in the
mentioned 1eff8279d494b9.
--
Best regards,
Maxim Orlov.
Hi!
+1 for the idea of the patch. I consider it useful.
I looked at the patch set and don't see any obvious defects. It
applies without any problems and looks pretty good to me.
Thank you for reviewing my patch.
Only one thing is left to do: add basic tests for the added
functionality to make it committable. For example, as in the
mentioned 1eff8279d494b9.
Agreed. Probably add to explain.sql?
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Wed, 4 Sept 2024 at 03:07, Tatsuo Ishii <ishii@postgresql.org> wrote:
Agreed. Probably add to explain.sql?
Yeah, I think this is an appropriate place.
--
Best regards,
Maxim Orlov.
On Wed, Jul 10, 2024 at 5:36 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
Attached are the v2 patches. As suggested by David, I split them
into multiple patches so that each patch implements the feature for
each node. You need to apply the patches in the order of patch number
(if you want to apply all of them, "git apply v2-*.patch" should
work).

v2-0001-Refactor-show_material_info.patch:
This refactors show_material_info(). The guts are moved to a new
function, show_storage_info(), so that it can be shared by nodes other
than Materialize.

v2-0002-Add-memory-disk-usage-for-CTE-Scan-nodes-in-EXPLA.patch:
This adds memory/disk usage for CTE Scan nodes in EXPLAIN (ANALYZE).

v2-0003-Add-memory-disk-usage-for-Table-Function-Scan-nod.patch:
This adds memory/disk usage for Table Function Scan nodes in EXPLAIN
(ANALYZE).

v2-0004-Add-memory-disk-usage-for-Recursive-Union-nodes-i.patch:
This adds memory/disk usage for Recursive Union nodes in EXPLAIN
(ANALYZE). Also, show_storage_info() is changed so that it accepts
int64 storage_used and char *storage_type arguments. They are used if
the target node uses multiple tuplestores, in which case a simple call
to tuplestore_space_used() does not work. Such executor nodes need to
collect storage_used while running the node. This type of node
includes Recursive Union and Window Aggregate.

v2-0005-Add-memory-disk-usage-for-Window-Aggregate-nodes-.patch:
This adds memory/disk usage for Window Aggregate nodes in EXPLAIN
(ANALYZE). Note that if David's proposal
/messages/by-id/CAHoyFK9n-QCXKTUWT_xxtXninSMEv+gbJN66-y6prM3f4WkEHw@mail.gmail.com
is committed, this will need to be adjusted.

As a demonstration, here is how storage/memory usage is shown in
EXPLAIN (notice "Storage: Memory Maximum Storage: 120kB" etc.). The
script used is attached (test.sql.txt). The SQLs are shamelessly
copied from David's example and the regression tests (some of them
were modified by me).
Hi. I can roughly understand it.
I have one minor issue with the comment.
typedef struct RecursiveUnionState
{
PlanState ps; /* its first field is NodeTag */
bool recursing;
bool intermediate_empty;
Tuplestorestate *working_table;
Tuplestorestate *intermediate_table;
int64 storageSize; /* max storage size Tuplestore */
char *storageType; /* the storage type above */
....
}
"/* the storage type above */"
is kind of ambiguous, since there is more than one Tuplestorestate.
i think it roughly means: the storage type of working_table
while the max storage of working_table.
typedef struct WindowAggState
{
ScanState ss; /* its first field is NodeTag */
/* these fields are filled in by ExecInitExpr: */
List *funcs; /* all WindowFunc nodes in targetlist */
int numfuncs; /* total number of window functions */
int numaggs; /* number that are plain aggregates */
WindowStatePerFunc perfunc; /* per-window-function information */
WindowStatePerAgg peragg; /* per-plain-aggregate information */
ExprState *partEqfunction; /* equality funcs for partition columns */
ExprState *ordEqfunction; /* equality funcs for ordering columns */
Tuplestorestate *buffer; /* stores rows of current partition */
int64 storageSize; /* max storage size in buffer */
char *storageType; /* the storage type above */
}
" /* the storage type above */"
I think it roughly means:
" the storage type of WindowAggState->buffer while the max storage of
WindowAggState->buffer".
On Wed, 10 Jul 2024 at 21:36, Tatsuo Ishii <ishii@postgresql.org> wrote:
v2-0005-Add-memory-disk-usage-for-Window-Aggregate-nodes-.patch: This
adds memory/disk usage for Window Aggregate nodes in EXPLAIN (ANALYZE)
command. Note that if David's proposal
/messages/by-id/CAHoyFK9n-QCXKTUWT_xxtXninSMEv+gbJN66-y6prM3f4WkEHw@mail.gmail.com
is committed, this will need to be adjusted.
Hi,
I pushed the changes to WindowAgg so as not to call tuplestore_end()
on every partition. Can you rebase this patch over that change?
It would be good to do this in a way that does not add any new state
to WindowAggState, you can see that I had to shuffle fields around in
that struct because the next_partition field would have caused the
struct to become larger. I've not looked closely, but I expect this
can be done by adding more code to tuplestore_updatemax() to also
track the disk space used if the current storage has gone to disk. I
expect the maxSpace field can be used for both, but we'd need another
bool field to track if the max used was by disk or memory.
I think the performance of this would also need to be tested as it
means doing an lseek() on every tuplestore_clear() when we've gone to
disk. Probably that will be dominated by all the other overheads of a
tuplestore going to disk (i.e. dumptuples() etc), but it would be good
to check this. I suggest setting work_mem = 64 and making a test case
that only just spills to disk. Maybe do a few thousand partitions
worth of that and see if you can measure any slowdown.
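The benchmark recipe above might be sketched like this. The table, column names, the OFFSET trick to discard output, and the partition counts are illustrative assumptions, not a script from this thread:

```shell
# Sketch: work_mem = 64kB and a query whose per-partition tuplestores
# only just spill to disk.  Adjust the table and partition counts so
# that each partition sits slightly over the work_mem limit.
cat > bench.sql <<'EOF'
SET work_mem = '64kB';
SELECT a, count(*) OVER (PARTITION BY a) FROM t OFFSET 100000000;
EOF

# Then compare stock master vs. patched binaries, e.g.:
#   pgbench -n -f bench.sql -T 10 postgres
echo "bench.sql written"
```

The OFFSET discards the result rows so the timing is dominated by executor work rather than by transferring output to the client.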
David
Hi,
On Wed, 10 Jul 2024 at 21:36, Tatsuo Ishii <ishii@postgresql.org> wrote:
v2-0005-Add-memory-disk-usage-for-Window-Aggregate-nodes-.patch: This
adds memory/disk usage for Window Aggregate nodes in EXPLAIN (ANALYZE)
command. Note that if David's proposal
/messages/by-id/CAHoyFK9n-QCXKTUWT_xxtXninSMEv+gbJN66-y6prM3f4WkEHw@mail.gmail.com
is committed, this will need to be adjusted.
Hi,
I pushed the changes to WindowAgg so as not to call tuplestore_end()
on every partition. Can you rebase this patch over that change?
It would be good to do this in a way that does not add any new state
to WindowAggState; you can see that I had to shuffle fields around in
that struct because the next_partition field would have caused the
struct to become larger. I've not looked closely, but I expect this
can be done by adding more code to tuplestore_updatemax() to also
track the disk space used if the current storage has gone to disk. I
expect the maxSpace field can be used for both, but we'd need another
bool field to track if the max used was by disk or memory.
I think the performance of this would also need to be tested as it
means doing an lseek() on every tuplestore_clear() when we've gone to
disk. Probably that will be dominated by all the other overheads of a
tuplestore going to disk (i.e. dumptuples() etc), but it would be good
to check this. I suggest setting work_mem = 64 and making a test case
that only just spills to disk. Maybe do a few thousand partitions
worth of that and see if you can measure any slowdown.
Thank you for the suggestion. I will look into this.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Hi,
hi. I can roughly understand it.
I have one minor issue with the comment.
typedef struct RecursiveUnionState
{
PlanState ps; /* its first field is NodeTag */
bool recursing;
bool intermediate_empty;
Tuplestorestate *working_table;
Tuplestorestate *intermediate_table;
int64 storageSize; /* max storage size Tuplestore */
char *storageType; /* the storage type above */
....
}"/* the storage type above */"
is kind of ambiguous, since there is more than one Tuplestorestate.i think it roughly means: the storage type of working_table
while the max storage of working_table.typedef struct WindowAggState
{
ScanState ss; /* its first field is NodeTag *//* these fields are filled in by ExecInitExpr: */
List *funcs; /* all WindowFunc nodes in targetlist */
int numfuncs; /* total number of window functions */
int numaggs; /* number that are plain aggregates */WindowStatePerFunc perfunc; /* per-window-function information */
WindowStatePerAgg peragg; /* per-plain-aggregate information */
ExprState *partEqfunction; /* equality funcs for partition columns */
ExprState *ordEqfunction; /* equality funcs for ordering columns */
Tuplestorestate *buffer; /* stores rows of current partition */
int64 storageSize; /* max storage size in buffer */
char *storageType; /* the storage type above */
}" /* the storage type above */"
I think it roughly means:
" the storage type of WindowAggState->buffer while the max storage of
WindowAggState->buffer".
Thank you for looking into my patch. Unfortunately I need to work on
another issue before adjusting the comments, because the fields might
go away if I change the tuplestore infrastructure per David's
suggestion:
/messages/by-id/CAApHDvoY8cibGcicLV0fNh=9JVx9PANcWvhkdjBnDCc9Quqytg@mail.gmail.com
After that I will rebase the patches; this commit requires them to be
adjusted:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=908a968612f9ed61911d8ca0a185b262b82f1269
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
Hi David,
I pushed the changes to WindowAgg so as not to call tuplestore_end()
on every partition. Can you rebase this patch over that change?
It would be good to do this in a way that does not add any new state
to WindowAggState; you can see that I had to shuffle fields around in
that struct because the next_partition field would have caused the
struct to become larger. I've not looked closely, but I expect this
can be done by adding more code to tuplestore_updatemax() to also
track the disk space used if the current storage has gone to disk. I
expect the maxSpace field can be used for both, but we'd need another
bool field to track if the max used was by disk or memory.
I have created a patch in the direction you suggested. See attached
patch (v1-0001-Enhance-tuplestore.txt). To not confuse CFbot, the
extension is "txt", not "patch".
I think the performance of this would also need to be tested as it
means doing an lseek() on every tuplestore_clear() when we've gone to
disk. Probably that will be dominated by all the other overheads of a
tuplestore going to disk (i.e. dumptuples() etc), but it would be good
to check this. I suggest setting work_mem = 64 and making a test case
that only just spills to disk. Maybe do a few thousand partitions
worth of that and see if you can measure any slowdown.
I copied your shell script and slightly modified it, then ran pgbench
with (1 10 100 1000 5000 10000) window partitions (see attached shell
script). In the script I set work_mem to 64kB. It seems that for
10000, 1000 and 100 partitions, the performance differences are just
noise. However, for 10, 2 and 1 partitions, I see large performance
degradation with the patched version: patched is slower than stock
master by 1.5% (10 partitions), 41% (2 partitions) and 55.7% (1
partition). See the attached graph.
On Fri, 6 Sept 2024 at 16:21, Tatsuo Ishii <ishii@postgresql.org> wrote:
However, for 10, 2 and 1 partitions, I see large performance
degradation with the patched version: patched is slower than stock
master by 1.5% (10 partitions), 41% (2 partitions) and 55.7% (1
partition). See the attached graph.
Thanks for making the adjustments to this.
I don't think there is any need to call tuplestore_updatemax() from
within writetup_heap(). That means having to update the maximum space
used every time a tuple is written to disk. That's a fairly massive
overhead.
Instead, it should be fine to modify tuplestore_updatemax() to set a
flag to true if state->status != TSS_INMEM and then record the disk
space used. That flag won't ever be set to false again.
tuplestore_storage_type_name() should just return "Disk" if the new
disk flag is set, even if state->status == TSS_INMEM. Since the
work_mem size won't change between tuplestore_clear() calls, if we've
once spilt to disk, then we shouldn't care about the memory used for
runs that didn't. Those will always have used less memory.
I did this quickly, but playing around with the attached, I didn't see
any slowdown.
Here's the results I got on my Zen2 AMD machine:
parts master yours mine mine_v_master
10000 5.01 5.12 5.09 99%
1000 4.30 4.25 4.24 101%
100 4.17 4.13 4.12 101%
10 4.16 4.12 4.10 101%
2 4.75 7.64 4.73 100%
1 4.75 8.57 4.73 100%
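The flag-based scheme David describes can be modeled as follows. The struct and function names are simplified stand-ins for the tuplestore.c internals; his actual patch is attached to the mail above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum { TSS_INMEM, TSS_WRITEFILE } DemoStatus;

typedef struct DemoTuplestore
{
    DemoStatus status;
    bool       usedDisk;   /* set once we ever spill; never cleared */
    int64_t    diskSpace;  /* stand-in for BufFileSize(myfile) */
    int64_t    maxSpace;   /* high-water mark, memory or disk */
} DemoTuplestore;

static void
demo_updatemax(DemoTuplestore *ts, int64_t mem_space)
{
    if (ts->status != TSS_INMEM)
    {
        ts->usedDisk = true;
        if (ts->diskSpace > ts->maxSpace)
            ts->maxSpace = ts->diskSpace;
    }
    else if (!ts->usedDisk && mem_space > ts->maxSpace)
        ts->maxSpace = mem_space;  /* memory runs only matter pre-spill */
}

/* Once spilled, always report "Disk", even if currently in memory. */
static const char *
demo_storage_type_name(const DemoTuplestore *ts)
{
    return ts->usedDisk ? "Disk" : "Memory";
}
```

Because work_mem is fixed across tuplestore_clear() calls, any run that stayed in memory after a spill necessarily used less space than the spilled run, so ignoring memory sizes once usedDisk is set loses nothing.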
David
Attachments:
tuplestore_track_max_disk_space_used.patch.txt
Thanks for making the adjustments to this.
I don't think there is any need to call tuplestore_updatemax() from
within writetup_heap(). That means having to update the maximum space
used every time a tuple is written to disk. That's a fairly massive
overhead.
Instead, it should be fine to modify tuplestore_updatemax() to set a
flag to true if state->status != TSS_INMEM and then record the disk
space used. That flag won't ever be set to false again.
tuplestore_storage_type_name() should just return "Disk" if the new
disk flag is set, even if state->status == TSS_INMEM. Since the
work_mem size won't change between tuplestore_clear() calls, if we've
once spilt to disk, then we shouldn't care about the memory used for
runs that didn't. Those will always have used less memory.
I did this quickly, but playing around with the attached, I didn't see
any slowdown.
Your patch looks good to me and I confirmed that with your patch I
didn't see any slowdown either. Thanks!
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Fri, Sep 6, 2024 at 11:32 AM Tatsuo Ishii <ishii@postgresql.org> wrote:
Thanks for making the adjustments to this.
I don't think there is any need to call tuplestore_updatemax() from
within writetup_heap(). That means having to update the maximum space
used every time a tuple is written to disk. That's a fairly massive
overhead.
Instead, it should be fine to modify tuplestore_updatemax() to set a
flag to true if state->status != TSS_INMEM and then record the disk
space used. That flag won't ever be set to false again.
tuplestore_storage_type_name() should just return "Disk" if the new
disk flag is set, even if state->status == TSS_INMEM. Since the
work_mem size won't change between tuplestore_clear() calls, if we've
once spilt to disk, then we shouldn't care about the memory used for
runs that didn't. Those will always have used less memory.
I did this quickly, but playing around with the attached, I didn't see
any slowdown.
Your patch looks good to me and I confirmed that with your patch I
didn't see any slowdown either. Thanks!
The changes look better. A nitpick though: with their definitions
changed, I think it's better to change the names of the functions,
since their purpose has changed. Right now they report the storage
type and size used, respectively, at the time of calling the function.
With this patch, they report the maximum space ever used and the
storage type corresponding to that maximum. tuplestore_space_used()
may be changed to tuplestore_maxspace_used(). I am having difficulty
with tuplestore_storage_type_name();
tuplestore_largest_storage_type_name() seems a mouthful and yet
doesn't do justice to the functionality. It might be better to just
have one function, tuplestore_maxspace_used(), which returns both the
maximum space used as well as the storage type when the maximum space
was used.
The comments need a few grammar fixes, but that can be done when
finalizing the patches.
--
Best Wishes,
Ashutosh Bapat
The changes look better. A nitpick though: with their definitions
changed, I think it's better to change the names of the functions,
since their purpose has changed. Right now they report the storage
type and size used, respectively, at the time of calling the function.
With this patch, they report the maximum space ever used and the
storage type corresponding to that maximum. tuplestore_space_used()
may be changed to tuplestore_maxspace_used(). I am having difficulty
with tuplestore_storage_type_name();
tuplestore_largest_storage_type_name() seems a mouthful and yet
doesn't do justice to the functionality. It might be better to just
have one function, tuplestore_maxspace_used(), which returns both the
maximum space used as well as the storage type when the maximum space
was used.
+1. Returning the storage type by the same function, not by a separate
function looks more natural.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Fri, 6 Sept 2024 at 19:08, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
The changes look better. A nitpick though: with their definitions
changed, I think it's better to change the names of the functions,
since their purpose has changed. Right now they report the storage
type and size used, respectively, at the time of calling the function.
With this patch, they report the maximum space ever used and the
storage type corresponding to that maximum. tuplestore_space_used()
may be changed to tuplestore_maxspace_used(). I am having difficulty
with tuplestore_storage_type_name();
tuplestore_largest_storage_type_name() seems a mouthful and yet
doesn't do justice to the functionality. It might be better to just
have one function, tuplestore_maxspace_used(), which returns both the
maximum space used as well as the storage type when the maximum space
was used.
How about just removing tuplestore_storage_type_name() and
tuplestore_space_used() and adding tuplestore_get_stats()? I took
some inspiration from tuplesort.c for this, so maybe we can refer back
there for further guidance. I'm not so sure it's worth having a stats
struct type like tuplesort.c has. All we need is a char ** and an
int64 * output parameter to pass to the stats function. I don't think
we need to copy tuplesort_method_name(). It seems fine just to point
the output parameter of the stats function at the statically
allocated constant.
David
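A sketch of what such a combined function might look like. The signature here is an assumption based on the description above (a char ** and an int64 * output parameter), and DemoStore is a stand-in for Tuplestorestate, not the real type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the relevant parts of Tuplestorestate. */
typedef struct DemoStore
{
    bool    usedDisk;   /* ever spilled to disk */
    int64_t maxSpace;   /* largest space used, memory or disk */
} DemoStore;

/*
 * Return both statistics at once.  The caller receives a pointer to a
 * statically allocated name (no copying, following the tuplesort.c
 * precedent mentioned above) and the maximum space used in bytes.
 */
static void
demo_get_stats(const DemoStore *ts, const char **max_storage_type,
               int64_t *max_space)
{
    *max_storage_type = ts->usedDisk ? "Disk" : "Memory";
    *max_space = ts->maxSpace;
}
```

A single call site in explain.c can then fetch both values, which also avoids the naming problem of having two separate "max"-flavored accessors.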