Showing I/O timings spent reading/writing temp buffers in EXPLAIN
Hi all ,
It's reported on pgsql-bugs[1]/messages/by-id/20210709084355.GA6251@depesz.com that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.
Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)
After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)
Feedback is very welcome.
Regards,
[1]: /messages/by-id/20210709084355.GA6251@depesz.com
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v1-0001-Track-I-O-timing-for-temp-buffers.patchapplication/x-patch; name=v1-0001-Track-I-O-timing-for-temp-buffers.patchDownload+67-14
Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com>
escreveu:
Hi all ,
It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)Feedback is very welcome.
The presentation seems a little confusing, wouldn't it be better?
I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073
I think can remove this lines:
+ if (has_temp_timing)
+ appendStringInfoChar(es->str, ',');
regards,
Ranier Vilela
On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
Hi all ,
It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)Feedback is very welcome.
Thank you for the comments!
The presentation seems a little confusing, wouldn't it be better?
I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073
Yeah, it looks better to add "shared/local".
I think can remove this lines: + if (has_temp_timing) + appendStringInfoChar(es->str, ',');
But I think that it's consistent with buffers statistics in EXPLAIN
command. For example, "Buffers" in the output of EXPLAIN command could
be like:
Buffers: shared hit=1398, temp read=526 written=526
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
Hi all ,
It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)Feedback is very welcome.
Thank you for the comments!
The presentation seems a little confusing, wouldn't it be better?
I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073
Yeah, it looks better to add "shared/local".
I've attached the updated patch that incorporates the above comment.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v2-0001-Track-I-O-timing-for-temp-buffers.patchapplication/octet-stream; name=v2-0001-Track-I-O-timing-for-temp-buffers.patchDownload+68-14
Em ter., 24 de ago. de 2021 às 03:11, Masahiko Sawada <sawada.mshk@gmail.com>
escreveu:
On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com>
wrote:
Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <
sawada.mshk@gmail.com> escreveu:
Hi all ,
It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)Feedback is very welcome.
Thank you for the comments!
The presentation seems a little confusing, wouldn't it be better?
I/O Timings: shared/local read=xxxx write=xxx temp read=0.487
write=2.073
Yeah, it looks better to add "shared/local".
I've attached the updated patch that incorporates the above comment.
The patch looks fine to me.
regards,
Ranier Vilela
Hi,
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, August 24th, 2021 at 13:20, Ranier Vilela <ranier.vf@gmail.com> wrote:
Em ter., 24 de ago. de 2021 às 03:11, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
Hi all ,
It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
<snip>
I've attached the updated patch that incorporates the above comment.
The patch looks fine to me.
The patch looks good to me too. However I do wonder why the timing is added only on
the
if (es->format == EXPLAIN_FORMAT_TEXT)
block and is not added when, for example, the format is json. The instrumentation has
clearly recorded the timings regardless of the output format.
Also, it might be worth while to consider adding some regression tests. To my
understanding, explain.sql provides a function, explain_filter, which helps create
a stable result. For example, such a test case can be:
set track_io_timing = 'on';
select explain_filter('explain (analyze, buffers) select count(*) from generate_series(1,100000)');
then it would be enough to verify that the line:
I/O Timings: temp read=N.N write=N.N
is present. The above would apply on the json output via `explain_filter_to_json`
of course.
Thoughts?
Cheers,
//Georgios
On Tue, Sep 7, 2021 at 12:02 AM <gkokolatos@pm.me> wrote:
Hi,
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, August 24th, 2021 at 13:20, Ranier Vilela <ranier.vf@gmail.com> wrote:Em ter., 24 de ago. de 2021 às 03:11, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
Hi all ,
It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
<snip>
I've attached the updated patch that incorporates the above comment.
The patch looks fine to me.
Thank you for the comments!
The patch looks good to me too. However I do wonder why the timing is added only on
theif (es->format == EXPLAIN_FORMAT_TEXT)
block and is not added when, for example, the format is json. The instrumentation has
clearly recorded the timings regardless of the output format.
Good point. Fixed.
Also, it might be worth while to consider adding some regression tests. To my
understanding, explain.sql provides a function, explain_filter, which helps create
a stable result. For example, such a test case can be:set track_io_timing = 'on';
select explain_filter('explain (analyze, buffers) select count(*) from generate_series(1,100000)');then it would be enough to verify that the line:
I/O Timings: temp read=N.N write=N.N
is present. The above would apply on the json output via `explain_filter_to_json`
of course.
Agreed. I've added regression tests.
I've attached an updated patch. Please review it.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v3-0001-Track-I-O-timing-for-temp-buffers.patchapplication/octet-stream; name=v3-0001-Track-I-O-timing-for-temp-buffers.patchDownload+186-13
On Sun, Aug 22, 2021 at 9:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
The presentation seems a little confusing, wouldn't it be better?
I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073
Yeah, it looks better to add "shared/local".
Using the patch, I do feel like the EXPLAIN format of
shared/local xxx, temp xxx
is a bit confusing. If temp is going to be its own EXPLAIN IO timing
category (as opposed to being summed with relation data block IO from
local and shared buffers), then it seems like local and shared should
be separated as well.
shared xxx, local xxx, temp xxx
With your patch applied, below is the top of the EXPLAIN output for a
query joining a temporary table (so using local buffers) to a regular
table (using shared buffers) and spilling the hash join (temp files).
Aggregate (actual rows=1 loops=1)
Buffers: shared read=4425, local read=4425 dirtied=4425
written=4423, temp read=5963 written=5963
I/O Timings: shared/local read=23.546, temp read=13.309 write=74.198
I found that using the same terminology as the "EXPLAIN BUFFERS" output
but not using the same categories was kind of confusing.
If it is only meaningful to distinguish between relation data IO and
query temp file IO, then maybe the words used in I/O Timings in EXPLAIN
should be "rel data" and "temp" or something like that.
- Melanie
Hi,
On Tue, Nov 16, 2021 at 04:37:44PM +0900, Masahiko Sawada wrote:
I've attached an updated patch. Please review it.
It seems that the regression tests aren't entirely stable, per cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.
The failures look like:
diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out
--- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out 2022-01-19 03:50:37.087931908 +0000
+++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out 2022-01-19 03:57:41.013616137 +0000
@@ -512,9 +512,10 @@
I/O Timings: temp read=N.N write=N.N
-> Function Scan on generate_series (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N)
I/O Timings: temp read=N.N write=N.N
+ I/O Timings: shared/local read=N.N
Planning Time: N.N ms
Execution Time: N.N ms
-(6 rows)
+(7 rows)
select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 100000)');
explain_filter
I don't see any obvious error in the code, so I'm wondering if it's simply
the result of populating the cache with generate_series() info.
Hi,
Sorry for the late reply.
On Fri, Nov 19, 2021 at 7:24 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Sun, Aug 22, 2021 at 9:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
The presentation seems a little confusing, wouldn't it be better?
I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073
Yeah, it looks better to add "shared/local".
Using the patch, I do feel like the EXPLAIN format of
shared/local xxx, temp xxx
is a bit confusing. If temp is going to be its own EXPLAIN IO timing
category (as opposed to being summed with relation data block IO from
local and shared buffers), then it seems like local and shared should
be separated as well.shared xxx, local xxx, temp xxx
I think the I/O timing shown as shared/local is the time spent on disk
I/O so it doesn't matter if these disk I/O are for shared buffers or
local buffers.
With your patch applied, below is the top of the EXPLAIN output for a
query joining a temporary table (so using local buffers) to a regular
table (using shared buffers) and spilling the hash join (temp files).Aggregate (actual rows=1 loops=1)
Buffers: shared read=4425, local read=4425 dirtied=4425
written=4423, temp read=5963 written=5963
I/O Timings: shared/local read=23.546, temp read=13.309 write=74.198I found that using the same terminology as the "EXPLAIN BUFFERS" output
but not using the same categories was kind of confusing.If it is only meaningful to distinguish between relation data IO and
query temp file IO, then maybe the words used in I/O Timings in EXPLAIN
should be "rel data" and "temp" or something like that.
But if we do that, we end up using different terminology in "I/O
Timing" and "Buffers". I think it's better to use consistent words
used in them.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Wed, Jan 19, 2022 at 5:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Tue, Nov 16, 2021 at 04:37:44PM +0900, Masahiko Sawada wrote:
I've attached an updated patch. Please review it.
It seems that the regression tests aren't entirely stable, per cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.The failures look like:
diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out --- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out 2022-01-19 03:50:37.087931908 +0000 +++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out 2022-01-19 03:57:41.013616137 +0000 @@ -512,9 +512,10 @@ I/O Timings: temp read=N.N write=N.N -> Function Scan on generate_series (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N) I/O Timings: temp read=N.N write=N.N + I/O Timings: shared/local read=N.N Planning Time: N.N ms Execution Time: N.N ms -(6 rows) +(7 rows)select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 100000)');
explain_filterI don't see any obvious error in the code, so I'm wondering if it's simply
the result of populating the cache with generate_series() info.
Thank you for reporting.
You're right, the regression test with track_io_timing = on is not
entirely stable because we may read catalog data during planning time,
resulting in an additional line in the EXPLAIN output. I've removed
the regression tests.
I've attached updated patches. I've included an improvement of
pg_stat_statement support to support temp I/O timing.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v3-0002-pg_stat_statements-Track-I-O-timing-for-temp-bloc.patchapplication/octet-stream; name=v3-0002-pg_stat_statements-Track-I-O-timing-for-temp-bloc.patchDownload+217-7
v3-0001-Track-I-O-timing-for-temp-buffers.patchapplication/octet-stream; name=v3-0001-Track-I-O-timing-for-temp-buffers.patchDownload+76-16
Hi,
On Tue, Mar 01, 2022 at 11:35:32AM +0900, Masahiko Sawada wrote:
On Wed, Jan 19, 2022 at 5:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
It seems that the regression tests aren't entirely stable, per cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.The failures look like:
diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out --- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out 2022-01-19 03:50:37.087931908 +0000 +++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out 2022-01-19 03:57:41.013616137 +0000 @@ -512,9 +512,10 @@ I/O Timings: temp read=N.N write=N.N -> Function Scan on generate_series (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N) I/O Timings: temp read=N.N write=N.N + I/O Timings: shared/local read=N.N Planning Time: N.N ms Execution Time: N.N ms -(6 rows) +(7 rows)select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 100000)');
explain_filterI don't see any obvious error in the code, so I'm wondering if it's simply
the result of populating the cache with generate_series() info.Thank you for reporting.
You're right, the regression test with track_io_timing = on is not
entirely stable because we may read catalog data during planning time,
resulting in an additional line in the EXPLAIN output. I've removed
the regression tests.
Hmm, but AFAICS the json format would be stable as the counters are always
shown even if zero. So just doing the json format first and then the text
format should also work. Although if you're really unlucky there could be a
cache invalidation in between so we could just ignore the text format. But I
think we should at least keep a regression test with the json format, with a
comment explain why only this one is tested.
I've attached updated patches. I've included an improvement of
pg_stat_statement support to support temp I/O timing.
Great!
Some other considerations:
- should we update pg_stat_statements documentation (and comments) for
blk_(read|write)_time to mention that it's for *data file* blocks rather than
simply blocks? It seems obvious now that we'd have
temp_blk_(read|write)_time, but still. This part should probably be
backpatched.
- not really your patch fault I guess, but I see that extendBufFile() isn't
handled. There shouldn't be much activity there so maybe it's ok.
This is likely because smgr_extend is also not handled, but this one seems
much more likely to take quite some time, and therefore should bump the
timing counters.
On Mon, Apr 4, 2022 at 1:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Tue, Mar 01, 2022 at 11:35:32AM +0900, Masahiko Sawada wrote:
On Wed, Jan 19, 2022 at 5:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
It seems that the regression tests aren't entirely stable, per cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.The failures look like:
diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out --- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out 2022-01-19 03:50:37.087931908 +0000 +++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out 2022-01-19 03:57:41.013616137 +0000 @@ -512,9 +512,10 @@ I/O Timings: temp read=N.N write=N.N -> Function Scan on generate_series (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N) I/O Timings: temp read=N.N write=N.N + I/O Timings: shared/local read=N.N Planning Time: N.N ms Execution Time: N.N ms -(6 rows) +(7 rows)select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 100000)');
explain_filterI don't see any obvious error in the code, so I'm wondering if it's simply
the result of populating the cache with generate_series() info.Thank you for reporting.
You're right, the regression test with track_io_timing = on is not
entirely stable because we may read catalog data during planning time,
resulting in an additional line in the EXPLAIN output. I've removed
the regression tests.Hmm, but AFAICS the json format would be stable as the counters are always
shown even if zero. So just doing the json format first and then the text
format should also work. Although if you're really unlucky there could be a
cache invalidation in between so we could just ignore the text format. But I
think we should at least keep a regression test with the json format, with a
comment explain why only this one is tested.
Fair point. By commit 7e12256b478 we disabled track_io_timing, but
probably we can temporarily enable it and run one query with "buffers"
and "format json" options.
I've attached updated patches. I've included an improvement of
pg_stat_statement support to support temp I/O timing.Great!
Some other considerations:
- should we update pg_stat_statements documentation (and comments) for
blk_(read|write)_time to mention that it's for *data file* blocks rather than
simply blocks? It seems obvious now that we'd have
temp_blk_(read|write)_time, but still. This part should probably be
backpatched.
Agreed.
- not really your patch fault I guess, but I see that extendBufFile() isn't
handled. There shouldn't be much activity there so maybe it's ok.
This is likely because smgr_extend is also not handled, but this one seems
much more likely to take quite some time, and therefore should bump the
timing counters.
You mean we should include the time for opening files as write time?
IIUC smgrextend() writes data while extending file whereas
extendBufFile() doesn't do that but just opens a new file.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Apr 05, 2022 at 12:51:12AM +0900, Masahiko Sawada wrote:
On Mon, Apr 4, 2022 at 1:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hmm, but AFAICS the json format would be stable as the counters are always
shown even if zero. So just doing the json format first and then the text
format should also work. Although if you're really unlucky there could be a
cache invalidation in between so we could just ignore the text format. But I
think we should at least keep a regression test with the json format, with a
comment explain why only this one is tested.Fair point. By commit 7e12256b478 we disabled track_io_timing, but
probably we can temporarily enable it and run one query with "buffers"
and "format json" options.
Yes, enabling it for just this query. It can't really find any problem with
the values themselves but at least the new code path would be partially
executed.
- not really your patch fault I guess, but I see that extendBufFile() isn't
handled. There shouldn't be much activity there so maybe it's ok.
This is likely because smgr_extend is also not handled, but this one seems
much more likely to take quite some time, and therefore should bump the
timing counters.You mean we should include the time for opening files as write time?
Yes. In normal circumstances it shouldn't need a lot of time to do that, but
I'm not so sure with e.g. network filesystems. I'm not strongly in favor of
counting it, especially since smgrextend doesn't either.
IIUC smgrextend() writes data while extending file whereas
extendBufFile() doesn't do that but just opens a new file.
Note that smgrextend can also call register_dirty_segment(), which can also
take some time, so we can't just assume that all the time there is IO time.
On Tue, Apr 5, 2022 at 1:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Apr 05, 2022 at 12:51:12AM +0900, Masahiko Sawada wrote:
On Mon, Apr 4, 2022 at 1:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hmm, but AFAICS the json format would be stable as the counters are always
shown even if zero. So just doing the json format first and then the text
format should also work. Although if you're really unlucky there could be a
cache invalidation in between so we could just ignore the text format. But I
think we should at least keep a regression test with the json format, with a
comment explain why only this one is tested.Fair point. By commit 7e12256b478 we disabled track_io_timing, but
probably we can temporarily enable it and run one query with "buffers"
and "format json" options.Yes, enabling it for just this query. It can't really find any problem with
the values themselves but at least the new code path would be partially
executed.- not really your patch fault I guess, but I see that extendBufFile() isn't
handled. There shouldn't be much activity there so maybe it's ok.
This is likely because smgr_extend is also not handled, but this one seems
much more likely to take quite some time, and therefore should bump the
timing counters.You mean we should include the time for opening files as write time?
Yes. In normal circumstances it shouldn't need a lot of time to do that, but
I'm not so sure with e.g. network filesystems. I'm not strongly in favor of
counting it, especially since smgrextend doesn't either.
Good point. I think that adding a new place to track I/O timing can be
a separate patch so probably we can work on it for PG16 or later.
I've attached updated patches, please review it.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v4-0001-Track-I-O-timing-for-temp-buffers.patchapplication/octet-stream; name=v4-0001-Track-I-O-timing-for-temp-buffers.patchDownload+145-66
v4-0002-pg_stat_statements-Track-I-O-timing-for-temp-bloc.patchapplication/octet-stream; name=v4-0002-pg_stat_statements-Track-I-O-timing-for-temp-bloc.patchDownload+219-9
On Tue, Apr 05, 2022 at 10:40:04AM +0900, Masahiko Sawada wrote:
On Tue, Apr 5, 2022 at 1:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Yes. In normal circumstances it shouldn't need a lot of time to do that, but
I'm not so sure with e.g. network filesystems. I'm not strongly in favor of
counting it, especially since smgrextend doesn't either.Good point. I think that adding a new place to track I/O timing can be
a separate patch so probably we can work on it for PG16 or later.
Agreed.
I've attached updated patches, please review it.
It looks good to me, just one minor thing in 002:
@@ -183,8 +184,10 @@ typedef struct Counters
int64 local_blks_written; /* # of local disk blocks written */
int64 temp_blks_read; /* # of temp blocks read */
int64 temp_blks_written; /* # of temp blocks written */
- double blk_read_time; /* time spent reading, in msec */
- double blk_write_time; /* time spent writing, in msec */
+ double blk_read_time; /* time spent reading blocks, in msec */
+ double blk_write_time; /* time spent writing blocks, in msec */
+ double temp_blk_read_time; /* time spent reading temp blocks, in msec */
+ double temp_blk_write_time; /* time spent writing temp blocks, in msec */
maybe the comments should respectively be data file blocks and temp file
blocks.
This is a minor detail and the rest of the patch looks good to me, so I'm
marking the patch as Ready for Committer!
On Tue, Apr 05, 2022 at 11:57:14AM +0800, Julien Rouhaud wrote:
This is a minor detail and the rest of the patch looks good to me, so I'm
marking the patch as Ready for Committer!
@@ -440,10 +442,14 @@ BufFileLoadBuffer(BufFile *file)
+ if (track_io_timing)
+ INSTR_TIME_SET_CURRENT(io_start);
In places where we don't have clock_gettime(), this means using
gettimeofday(). I would not underestimate the performance impact of
such a change, even if track_io_timing is already known to have a
certain overhead on some platforms.
--
Michael
On Thu, Apr 07, 2022 at 03:58:46PM +0900, Michael Paquier wrote:
On Tue, Apr 05, 2022 at 11:57:14AM +0800, Julien Rouhaud wrote:
This is a minor detail and the rest of the patch looks good to me, so I'm
marking the patch as Ready for Committer!@@ -440,10 +442,14 @@ BufFileLoadBuffer(BufFile *file) + if (track_io_timing) + INSTR_TIME_SET_CURRENT(io_start);In places where we don't have clock_gettime(), this means using
gettimeofday(). I would not underestimate the performance impact of
such a change, even if track_io_timing is already known to have a
certain overhead on some platforms.
Sure, but gettimeofday() has been implemented in vDSO for quite some time on
most platforms, so it shouldn't hurt that much on mainstream platforms
especially compared to the cost of whatever operation is actually using that
temporary file.
I don't think that having an extra GUC for temp IO is sensible, if that's why
you're suggesting? Or are you just asking to do some benchmarking on some
platform where getting the time is known to be slow (Windows?).
On Thu, Apr 07, 2022 at 03:14:01PM +0800, Julien Rouhaud wrote:
Sure, but gettimeofday() has been implemented in vDSO for quite some time on
most platforms, so it shouldn't hurt that much on mainstream platforms
especially compared to the cost of whatever operation is actually using that
temporary file.I don't think that having an extra GUC for temp IO is sensible, if that's why
you're suggesting? Or are you just asking to do some benchmarking on some
platform where getting the time is known to be slow (Windows?).
I am asking about the latter, but the former could be one solution if
the latter proves to be a problem, and this has not been discussed on
the thread yet. So, with some kind of worst-case scenario, how much
worse the performance gets once you add those extra calls when
compared to HEAD? I think that we'd better be careful with any
additions of INSTR_TIME_SET_CURRENT().
--
Michael
On Thu, Apr 07, 2022 at 04:24:54PM +0900, Michael Paquier wrote:
On Thu, Apr 07, 2022 at 03:14:01PM +0800, Julien Rouhaud wrote:
Sure, but gettimeofday() has been implemented in vDSO for quite some time on
most platforms, so it shouldn't hurt that much on mainstream platforms
especially compared to the cost of whatever operation is actually using that
temporary file.I don't think that having an extra GUC for temp IO is sensible, if that's why
you're suggesting? Or are you just asking to do some benchmarking on some
platform where getting the time is known to be slow (Windows?).I am asking about the latter, but the former could be one solution if
the latter proves to be a problem, and this has not been discussed on
the thread yet. So, with some kind of worst-case scenario, how much
worse the performance gets once you add those extra calls when
compared to HEAD? I think that we'd better be careful with any
additions of INSTR_TIME_SET_CURRENT().
I just did a quick test on my linux box, using this data:
CREATE TABLE tt AS select generate_series(1, 10000) id;
VACUUM ANALYZE tt;
and this scenario:
SET work_mem TO '64kB';
SELECT count(*) FROM (SELECT id FROM tt ORDER BY id) s;
which yields this plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1349.39..1349.40 rows=1 width=8) (actual time=5.417..5.417 rows=1 loops=1)
-> Sort (cost=1199.39..1224.39 rows=10000 width=4) (actual time=2.910..4.422 rows=10000 loops=1)
Sort Key: tt.id
Sort Method: external merge Disk: 144kB
-> Seq Scan on tt (cost=0.00..145.00 rows=10000 width=4) (actual time=0.008..1.239 rows=10000 loops=1)
Planning Time: 0.405 ms
Execution Time: 5.524 ms
So maybe not the worst that could be tested, but probably bad enough for this
patch.
I ran that with pgbench, 4 clients (I have 4 cores) for 30 seconds, 3 times.
Comparing master and this patch with track_io_timing activated, I see a 0.95%
overhead, with a 2.6% noise level.