Showing I/O timings spent reading/writing temp buffers in EXPLAIN

Started by Masahiko Sawadaover 4 years ago25 messageshackers
Jump to latest
#1Masahiko Sawada
sawada.mshk@gmail.com

Hi all ,

It's reported on pgsql-bugs[1]/messages/by-id/20210709084355.GA6251@depesz.com that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.

Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)

After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)

Feedback is very welcome.

Regards,

[1]: /messages/by-id/20210709084355.GA6251@depesz.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v1-0001-Track-I-O-timing-for-temp-buffers.patchapplication/x-patch; name=v1-0001-Track-I-O-timing-for-temp-buffers.patchDownload+67-14
#2Ranier Vilela
ranier.vf@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com>
escreveu:

Hi all ,

It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.

Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)

After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)

Feedback is very welcome.

The presentation seems a little confusing, wouldn't it be better?

I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073

I think can remove this lines:
+ if (has_temp_timing)
+ appendStringInfoChar(es->str, ',');

regards,
Ranier Vilela

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Ranier Vilela (#2)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:

Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

Hi all ,

It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.

Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)

After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)

Feedback is very welcome.

Thank you for the comments!

The presentation seems a little confusing, wouldn't it be better?

I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073

Yeah, it looks better to add "shared/local".

I think can remove this lines:
+ if (has_temp_timing)
+ appendStringInfoChar(es->str, ',');

But I think that it's consistent with buffers statistics in EXPLAIN
command. For example, "Buffers" in the output of EXPLAIN command could
be like:

Buffers: shared hit=1398, temp read=526 written=526

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#4Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#3)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:

Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

Hi all ,

It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.

Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)

After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)

Feedback is very welcome.

Thank you for the comments!

The presentation seems a little confusing, wouldn't it be better?

I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073

Yeah, it looks better to add "shared/local".

I've attached the updated patch that incorporates the above comment.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v2-0001-Track-I-O-timing-for-temp-buffers.patchapplication/octet-stream; name=v2-0001-Track-I-O-timing-for-temp-buffers.patchDownload+68-14
#5Ranier Vilela
ranier.vf@gmail.com
In reply to: Masahiko Sawada (#4)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

Em ter., 24 de ago. de 2021 às 03:11, Masahiko Sawada <sawada.mshk@gmail.com>
escreveu:

On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com>

wrote:

Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <

sawada.mshk@gmail.com> escreveu:

Hi all ,

It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show
the one for temp files. I think it's not a bug but could be an item
for PG15. As mentioned on that thread, this would be useful for users
in a case where temp buffers I/O used most of the time. So I've
written the patch for that. Please note that the patch includes only
to show temp buffer I/O timing to EXPLAIN but not other possibly
related changes such as pg_stat_statement improvements yet.

Before (w/o patch):
postgres(1:14101)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=59.025..59.026 rows=1 loops=1)
Buffers: temp read=171 written=171
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
Buffers: temp read=171 written=171
Planning Time: 0.041 ms
Execution Time: 70.867 ms
(6 rows)

After (w/ patch):
postgres(1:28754)=# explain (analyze, buffers) select count(*) from
generate_series(1,100000);
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=1250.00..1250.01 rows=1 width=8) (actual
time=56.189..56.190 rows=1 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
-> Function Scan on generate_series (cost=0.00..1000.00
rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
Buffers: temp read=171 written=171
I/O Timings: temp read=0.487 write=2.073
Planning Time: 0.041 ms
Execution Time: 59.928 ms
(8 rows)

Feedback is very welcome.

Thank you for the comments!

The presentation seems a little confusing, wouldn't it be better?

I/O Timings: shared/local read=xxxx write=xxx temp read=0.487

write=2.073

Yeah, it looks better to add "shared/local".

I've attached the updated patch that incorporates the above comment.

The patch looks fine to me.

regards,
Ranier Vilela

#6Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Ranier Vilela (#5)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

Hi,

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, August 24th, 2021 at 13:20, Ranier Vilela <ranier.vf@gmail.com> wrote:

Em ter., 24 de ago. de 2021 às 03:11, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:

Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

Hi all ,

It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show

<snip>

I've attached the updated patch that incorporates the above comment.

The patch looks fine to me.

The patch looks good to me too. However I do wonder why the timing is added only on
the

if (es->format == EXPLAIN_FORMAT_TEXT)

block and is not added when, for example, the format is json. The instrumentation has
clearly recorded the timings regardless of the output format.

Also, it might be worth while to consider adding some regression tests. To my
understanding, explain.sql provides a function, explain_filter, which helps create
a stable result. For example, such a test case can be:

set track_io_timing = 'on';
select explain_filter('explain (analyze, buffers) select count(*) from generate_series(1,100000)');

then it would be enough to verify that the line:

I/O Timings: temp read=N.N write=N.N

is present. The above would apply on the json output via `explain_filter_to_json`
of course.

Thoughts?

Cheers,
//Georgios

#7Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Georgios Kokolatos (#6)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Tue, Sep 7, 2021 at 12:02 AM <gkokolatos@pm.me> wrote:

Hi,

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, August 24th, 2021 at 13:20, Ranier Vilela <ranier.vf@gmail.com> wrote:

Em ter., 24 de ago. de 2021 às 03:11, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:

Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

Hi all ,

It's reported on pgsql-bugs[1] that I/O timings in EXPLAIN don't show

<snip>

I've attached the updated patch that incorporates the above comment.

The patch looks fine to me.

Thank you for the comments!

The patch looks good to me too. However I do wonder why the timing is added only on
the

if (es->format == EXPLAIN_FORMAT_TEXT)

block and is not added when, for example, the format is json. The instrumentation has
clearly recorded the timings regardless of the output format.

Good point. Fixed.

Also, it might be worth while to consider adding some regression tests. To my
understanding, explain.sql provides a function, explain_filter, which helps create
a stable result. For example, such a test case can be:

set track_io_timing = 'on';
select explain_filter('explain (analyze, buffers) select count(*) from generate_series(1,100000)');

then it would be enough to verify that the line:

I/O Timings: temp read=N.N write=N.N

is present. The above would apply on the json output via `explain_filter_to_json`
of course.

Agreed. I've added regression tests.

I've attached an updated patch. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v3-0001-Track-I-O-timing-for-temp-buffers.patchapplication/octet-stream; name=v3-0001-Track-I-O-timing-for-temp-buffers.patchDownload+186-13
#8Melanie Plageman
melanieplageman@gmail.com
In reply to: Masahiko Sawada (#3)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Sun, Aug 22, 2021 at 9:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:

Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

The presentation seems a little confusing, wouldn't it be better?

I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073

Yeah, it looks better to add "shared/local".

Using the patch, I do feel like the EXPLAIN format of

shared/local xxx, temp xxx

is a bit confusing. If temp is going to be its own EXPLAIN IO timing
category (as opposed to being summed with relation data block IO from
local and shared buffers), then it seems like local and shared should
be separated as well.

shared xxx, local xxx, temp xxx

With your patch applied, below is the top of the EXPLAIN output for a
query joining a temporary table (so using local buffers) to a regular
table (using shared buffers) and spilling the hash join (temp files).

Aggregate (actual rows=1 loops=1)
Buffers: shared read=4425, local read=4425 dirtied=4425
written=4423, temp read=5963 written=5963
I/O Timings: shared/local read=23.546, temp read=13.309 write=74.198

I found that using the same terminology as the "EXPLAIN BUFFERS" output
but not using the same categories was kind of confusing.

If it is only meaningful to distinguish between relation data IO and
query temp file IO, then maybe the words used in I/O Timings in EXPLAIN
should be "rel data" and "temp" or something like that.

- Melanie

#9Julien Rouhaud
rjuju123@gmail.com
In reply to: Masahiko Sawada (#7)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

Hi,

On Tue, Nov 16, 2021 at 04:37:44PM +0900, Masahiko Sawada wrote:

I've attached an updated patch. Please review it.

It seems that the regression tests aren't entirely stable, per cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.

The failures look like:

diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out
--- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out	2022-01-19 03:50:37.087931908 +0000
+++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out	2022-01-19 03:57:41.013616137 +0000
@@ -512,9 +512,10 @@
    I/O Timings: temp read=N.N write=N.N
    ->  Function Scan on generate_series  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N)
          I/O Timings: temp read=N.N write=N.N
+   I/O Timings: shared/local read=N.N
  Planning Time: N.N ms
  Execution Time: N.N ms
-(6 rows)
+(7 rows)

select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 100000)');
explain_filter

I don't see any obvious error in the code, so I'm wondering if it's simply
the result of populating the cache with generate_series() info.

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Melanie Plageman (#8)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

Hi,

Sorry for the late reply.

On Fri, Nov 19, 2021 at 7:24 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Sun, Aug 22, 2021 at 9:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:

Em qui., 19 de ago. de 2021 às 09:21, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

The presentation seems a little confusing, wouldn't it be better?

I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073

Yeah, it looks better to add "shared/local".

Using the patch, I do feel like the EXPLAIN format of

shared/local xxx, temp xxx

is a bit confusing. If temp is going to be its own EXPLAIN IO timing
category (as opposed to being summed with relation data block IO from
local and shared buffers), then it seems like local and shared should
be separated as well.

shared xxx, local xxx, temp xxx

I think the I/O timing shown as shared/local is the time spent on disk
I/O so it doesn't matter if these disk I/O are for shared buffers or
local buffers.

With your patch applied, below is the top of the EXPLAIN output for a
query joining a temporary table (so using local buffers) to a regular
table (using shared buffers) and spilling the hash join (temp files).

Aggregate (actual rows=1 loops=1)
Buffers: shared read=4425, local read=4425 dirtied=4425
written=4423, temp read=5963 written=5963
I/O Timings: shared/local read=23.546, temp read=13.309 write=74.198

I found that using the same terminology as the "EXPLAIN BUFFERS" output
but not using the same categories was kind of confusing.

If it is only meaningful to distinguish between relation data IO and
query temp file IO, then maybe the words used in I/O Timings in EXPLAIN
should be "rel data" and "temp" or something like that.

But if we do that, we end up using different terminology in "I/O
Timing" and "Buffers". I think it's better to use consistent words
used in them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#11Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Julien Rouhaud (#9)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Wed, Jan 19, 2022 at 5:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

On Tue, Nov 16, 2021 at 04:37:44PM +0900, Masahiko Sawada wrote:

I've attached an updated patch. Please review it.

It seems that the regression tests aren't entirely stable, per cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.

The failures look like:

diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out
--- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out      2022-01-19 03:50:37.087931908 +0000
+++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out        2022-01-19 03:57:41.013616137 +0000
@@ -512,9 +512,10 @@
I/O Timings: temp read=N.N write=N.N
->  Function Scan on generate_series  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N)
I/O Timings: temp read=N.N write=N.N
+   I/O Timings: shared/local read=N.N
Planning Time: N.N ms
Execution Time: N.N ms
-(6 rows)
+(7 rows)

select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 100000)');
explain_filter

I don't see any obvious error in the code, so I'm wondering if it's simply
the result of populating the cache with generate_series() info.

Thank you for reporting.

You're right, the regression test with track_io_timing = on is not
entirely stable because we may read catalog data during planning time,
resulting in an additional line in the EXPLAIN output. I've removed
the regression tests.

I've attached updated patches. I've included an improvement of
pg_stat_statement support to support temp I/O timing.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v3-0002-pg_stat_statements-Track-I-O-timing-for-temp-bloc.patchapplication/octet-stream; name=v3-0002-pg_stat_statements-Track-I-O-timing-for-temp-bloc.patchDownload+217-7
v3-0001-Track-I-O-timing-for-temp-buffers.patchapplication/octet-stream; name=v3-0001-Track-I-O-timing-for-temp-buffers.patchDownload+76-16
#12Julien Rouhaud
rjuju123@gmail.com
In reply to: Masahiko Sawada (#11)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

Hi,

On Tue, Mar 01, 2022 at 11:35:32AM +0900, Masahiko Sawada wrote:

On Wed, Jan 19, 2022 at 5:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

It seems that the regression tests aren't entirely stable, per cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.

The failures look like:

diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out
--- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out      2022-01-19 03:50:37.087931908 +0000
+++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out        2022-01-19 03:57:41.013616137 +0000
@@ -512,9 +512,10 @@
I/O Timings: temp read=N.N write=N.N
->  Function Scan on generate_series  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N)
I/O Timings: temp read=N.N write=N.N
+   I/O Timings: shared/local read=N.N
Planning Time: N.N ms
Execution Time: N.N ms
-(6 rows)
+(7 rows)

select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 100000)');
explain_filter

I don't see any obvious error in the code, so I'm wondering if it's simply
the result of populating the cache with generate_series() info.

Thank you for reporting.

You're right, the regression test with track_io_timing = on is not
entirely stable because we may read catalog data during planning time,
resulting in an additional line in the EXPLAIN output. I've removed
the regression tests.

Hmm, but AFAICS the json format would be stable as the counters are always
shown even if zero. So just doing the json format first and then the text
format should also work. Although if you're really unlucky there could be a
cache invalidation in between so we could just ignore the text format. But I
think we should at least keep a regression test with the json format, with a
comment explain why only this one is tested.

I've attached updated patches. I've included an improvement of
pg_stat_statement support to support temp I/O timing.

Great!

Some other considerations:

- should we update pg_stat_statements documentation (and comments) for
blk_(read|write)_time to mention that it's for *data file* blocks rather than
simply blocks? It seems obvious now that we'd have
temp_blk_(read|write)_time, but still. This part should probably be
backpatched.

- not really your patch fault I guess, but I see that extendBufFile() isn't
handled. There shouldn't be much activity there so maybe it's ok.
This is likely because smgr_extend is also not handled, but this one seems
much more likely to take quite some time, and therefore should bump the
timing counters.

#13Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Julien Rouhaud (#12)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Mon, Apr 4, 2022 at 1:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

On Tue, Mar 01, 2022 at 11:35:32AM +0900, Masahiko Sawada wrote:

On Wed, Jan 19, 2022 at 5:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

It seems that the regression tests aren't entirely stable, per cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.

The failures look like:

diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out
--- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out      2022-01-19 03:50:37.087931908 +0000
+++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out        2022-01-19 03:57:41.013616137 +0000
@@ -512,9 +512,10 @@
I/O Timings: temp read=N.N write=N.N
->  Function Scan on generate_series  (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N)
I/O Timings: temp read=N.N write=N.N
+   I/O Timings: shared/local read=N.N
Planning Time: N.N ms
Execution Time: N.N ms
-(6 rows)
+(7 rows)

select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 100000)');
explain_filter

I don't see any obvious error in the code, so I'm wondering if it's simply
the result of populating the cache with generate_series() info.

Thank you for reporting.

You're right, the regression test with track_io_timing = on is not
entirely stable because we may read catalog data during planning time,
resulting in an additional line in the EXPLAIN output. I've removed
the regression tests.

Hmm, but AFAICS the json format would be stable as the counters are always
shown even if zero. So just doing the json format first and then the text
format should also work. Although if you're really unlucky there could be a
cache invalidation in between so we could just ignore the text format. But I
think we should at least keep a regression test with the json format, with a
comment explain why only this one is tested.

Fair point. By commit 7e12256b478 we disabled track_io_timing, but
probably we can temporarily enable it and run one query with "buffers"
and "format json" options.

I've attached updated patches. I've included an improvement of
pg_stat_statement support to support temp I/O timing.

Great!

Some other considerations:

- should we update pg_stat_statements documentation (and comments) for
blk_(read|write)_time to mention that it's for *data file* blocks rather than
simply blocks? It seems obvious now that we'd have
temp_blk_(read|write)_time, but still. This part should probably be
backpatched.

Agreed.

- not really your patch fault I guess, but I see that extendBufFile() isn't
handled. There shouldn't be much activity there so maybe it's ok.
This is likely because smgr_extend is also not handled, but this one seems
much more likely to take quite some time, and therefore should bump the
timing counters.

You mean we should include the time for opening files as write time?
IIUC smgrextend() writes data while extending file whereas
extendBufFile() doesn't do that but just opens a new file.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#14Julien Rouhaud
rjuju123@gmail.com
In reply to: Masahiko Sawada (#13)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Tue, Apr 05, 2022 at 12:51:12AM +0900, Masahiko Sawada wrote:

On Mon, Apr 4, 2022 at 1:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hmm, but AFAICS the json format would be stable as the counters are always
shown even if zero. So just doing the json format first and then the text
format should also work. Although if you're really unlucky there could be a
cache invalidation in between so we could just ignore the text format. But I
think we should at least keep a regression test with the json format, with a
comment explain why only this one is tested.

Fair point. By commit 7e12256b478 we disabled track_io_timing, but
probably we can temporarily enable it and run one query with "buffers"
and "format json" options.

Yes, enabling it for just this query. It can't really find any problem with
the values themselves but at least the new code path would be partially
executed.

- not really your patch fault I guess, but I see that extendBufFile() isn't
handled. There shouldn't be much activity there so maybe it's ok.
This is likely because smgr_extend is also not handled, but this one seems
much more likely to take quite some time, and therefore should bump the
timing counters.

You mean we should include the time for opening files as write time?

Yes. In normal circumstances it shouldn't need a lot of time to do that, but
I'm not so sure with e.g. network filesystems. I'm not strongly in favor of
counting it, especially since smgrextend doesn't either.

IIUC smgrextend() writes data while extending file whereas
extendBufFile() doesn't do that but just opens a new file.

Note that smgrextend can also call register_dirty_segment(), which can also
take some time, so we can't just assume that all the time there is IO time.

#15Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Julien Rouhaud (#14)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Tue, Apr 5, 2022 at 1:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Apr 05, 2022 at 12:51:12AM +0900, Masahiko Sawada wrote:

On Mon, Apr 4, 2022 at 1:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hmm, but AFAICS the json format would be stable as the counters are always
shown even if zero. So just doing the json format first and then the text
format should also work. Although if you're really unlucky there could be a
cache invalidation in between so we could just ignore the text format. But I
think we should at least keep a regression test with the json format, with a
comment explain why only this one is tested.

Fair point. By commit 7e12256b478 we disabled track_io_timing, but
probably we can temporarily enable it and run one query with "buffers"
and "format json" options.

Yes, enabling it for just this query. It can't really find any problem with
the values themselves but at least the new code path would be partially
executed.

- not really your patch fault I guess, but I see that extendBufFile() isn't
handled. There shouldn't be much activity there so maybe it's ok.
This is likely because smgr_extend is also not handled, but this one seems
much more likely to take quite some time, and therefore should bump the
timing counters.

You mean we should include the time for opening files as write time?

Yes. In normal circumstances it shouldn't need a lot of time to do that, but
I'm not so sure with e.g. network filesystems. I'm not strongly in favor of
counting it, especially since smgrextend doesn't either.

Good point. I think that adding a new place to track I/O timing can be
a separate patch so probably we can work on it for PG16 or later.

I've attached updated patches, please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v4-0001-Track-I-O-timing-for-temp-buffers.patchapplication/octet-stream; name=v4-0001-Track-I-O-timing-for-temp-buffers.patchDownload+145-66
v4-0002-pg_stat_statements-Track-I-O-timing-for-temp-bloc.patchapplication/octet-stream; name=v4-0002-pg_stat_statements-Track-I-O-timing-for-temp-bloc.patchDownload+219-9
#16Julien Rouhaud
rjuju123@gmail.com
In reply to: Masahiko Sawada (#15)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Tue, Apr 05, 2022 at 10:40:04AM +0900, Masahiko Sawada wrote:

On Tue, Apr 5, 2022 at 1:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Yes. In normal circumstances it shouldn't need a lot of time to do that, but
I'm not so sure with e.g. network filesystems. I'm not strongly in favor of
counting it, especially since smgrextend doesn't either.

Good point. I think that adding a new place to track I/O timing can be
a separate patch so probably we can work on it for PG16 or later.

Agreed.

I've attached updated patches, please review it.

It looks good to me, just one minor thing in 002:

@@ -183,8 +184,10 @@ typedef struct Counters
 	int64		local_blks_written; /* # of local disk blocks written */
 	int64		temp_blks_read; /* # of temp blocks read */
 	int64		temp_blks_written;	/* # of temp blocks written */
-	double		blk_read_time;	/* time spent reading, in msec */
-	double		blk_write_time; /* time spent writing, in msec */
+	double		blk_read_time;	/* time spent reading blocks, in msec */
+	double		blk_write_time; /* time spent writing blocks, in msec */
+	double		temp_blk_read_time;	/* time spent reading temp blocks, in msec */
+	double		temp_blk_write_time; /* time spent writing temp blocks, in msec */

maybe the comments should respectively be data file blocks and temp file
blocks.

This is a minor detail and the rest of the patch looks good to me, so I'm
marking the patch as Ready for Committer!

#17Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#16)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Tue, Apr 05, 2022 at 11:57:14AM +0800, Julien Rouhaud wrote:

This is a minor detail and the rest of the patch looks good to me, so I'm
marking the patch as Ready for Committer!

@@ -440,10 +442,14 @@ BufFileLoadBuffer(BufFile *file)
+   if (track_io_timing)
+       INSTR_TIME_SET_CURRENT(io_start);

In places where we don't have clock_gettime(), this means using
gettimeofday(). I would not underestimate the performance impact of
such a change, even if track_io_timing is already known to have a
certain overhead on some platforms.
--
Michael

#18Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#17)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Thu, Apr 07, 2022 at 03:58:46PM +0900, Michael Paquier wrote:

On Tue, Apr 05, 2022 at 11:57:14AM +0800, Julien Rouhaud wrote:

This is a minor detail and the rest of the patch looks good to me, so I'm
marking the patch as Ready for Committer!

@@ -440,10 +442,14 @@ BufFileLoadBuffer(BufFile *file)
+   if (track_io_timing)
+       INSTR_TIME_SET_CURRENT(io_start);

In places where we don't have clock_gettime(), this means using
gettimeofday(). I would not underestimate the performance impact of
such a change, even if track_io_timing is already known to have a
certain overhead on some platforms.

Sure, but gettimeofday() has been implemented in vDSO for quite some time on
most platforms, so it shouldn't hurt that much on mainstream platforms
especially compared to the cost of whatever operation is actually using that
temporary file.

I don't think that having an extra GUC for temp IO is sensible, if that's why
you're suggesting? Or are you just asking to do some benchmarking on some
platform where getting the time is known to be slow (Windows?).

#19Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#18)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Thu, Apr 07, 2022 at 03:14:01PM +0800, Julien Rouhaud wrote:

Sure, but gettimeofday() has been implemented in vDSO for quite some time on
most platforms, so it shouldn't hurt that much on mainstream platforms
especially compared to the cost of whatever operation is actually using that
temporary file.

I don't think that having an extra GUC for temp IO is sensible, if that's why
you're suggesting? Or are you just asking to do some benchmarking on some
platform where getting the time is known to be slow (Windows?).

I am asking about the latter, but the former could be one solution if
the latter proves to be a problem, and this has not been discussed on
the thread yet. So, with some kind of worst-case scenario, how much
worse the performance gets once you add those extra calls when
compared to HEAD? I think that we'd better be careful with any
additions of INSTR_TIME_SET_CURRENT().
--
Michael

#20Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#19)
Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN

On Thu, Apr 07, 2022 at 04:24:54PM +0900, Michael Paquier wrote:

On Thu, Apr 07, 2022 at 03:14:01PM +0800, Julien Rouhaud wrote:

Sure, but gettimeofday() has been implemented in vDSO for quite some time on
most platforms, so it shouldn't hurt that much on mainstream platforms
especially compared to the cost of whatever operation is actually using that
temporary file.

I don't think that having an extra GUC for temp IO is sensible, if that's why
you're suggesting? Or are you just asking to do some benchmarking on some
platform where getting the time is known to be slow (Windows?).

I am asking about the latter, but the former could be one solution if
the latter proves to be a problem, and this has not been discussed on
the thread yet. So, with some kind of worst-case scenario, how much
worse the performance gets once you add those extra calls when
compared to HEAD? I think that we'd better be careful with any
additions of INSTR_TIME_SET_CURRENT().

I just did a quick test on my linux box, using this data:
CREATE TABLE tt AS select generate_series(1, 10000) id;
VACUUM ANALYZE tt;

and this scenario:
SET work_mem TO '64kB';
SELECT count(*) FROM (SELECT id FROM tt ORDER BY id) s;

which yields this plan:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1349.39..1349.40 rows=1 width=8) (actual time=5.417..5.417 rows=1 loops=1)
-> Sort (cost=1199.39..1224.39 rows=10000 width=4) (actual time=2.910..4.422 rows=10000 loops=1)
Sort Key: tt.id
Sort Method: external merge Disk: 144kB
-> Seq Scan on tt (cost=0.00..145.00 rows=10000 width=4) (actual time=0.008..1.239 rows=10000 loops=1)
Planning Time: 0.405 ms
Execution Time: 5.524 ms

So maybe not the worst that could be tested, but probably bad enough for this
patch.

I ran that with pgbench, 4 clients (I have 4 cores) for 30 seconds, 3 times.

Comparing master and this patch with track_io_timing activated, I see a 0.95%
overhead, with a 2.6% noise level.

#21Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Julien Rouhaud (#20)
#22Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#23)
#25Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#24)