per backend I/O statistics
Hi hackers,
Please find attached a patch to implement $SUBJECT.
While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
pg_my_stat_io view to display "my" backend I/O statistics and a new
pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
backend pid.
By having per-backend granularity, one could for example identify
which running backend is responsible for most of the reads, most of the extends,
and so on. The pg_my_stat_io view could also be useful to check the I/O impact
of some operations or queries in the current session.
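For example, a quick way to check the latter could be (assuming the view exposes
the same columns as pg_stat_io):

SELECT object, context, reads, writes, extends, hits
  FROM pg_my_stat_io
 WHERE object = 'relation' AND context = 'normal';

Run it once, execute the operations of interest, then run it again and compare the
counters to see the I/O generated by the current session.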
Some remarks:
- It is split into 2 sub-patches: 0001 introduces the necessary changes to provide
the pg_my_stat_io view and 0002 adds the pg_stat_get_backend_io() function.
- The idea of having per backend I/O statistics has already been mentioned in
[1] by Andres.
Some implementation choices:
- The KIND_IO stats are still "fixed amount" ones, as the maximum number of
backends is fixed.
- The statistics snapshot is made for the global stats (the aggregated ones) and
for my backend stats. The snapshot is not built for all the backend stats (that
could be memory expensive depending on the max_connections setting and given
the fact that PgStat_IO is 16KB long).
- The above point means that pg_stat_get_backend_io() behaves as if
stats_fetch_consistency is set to none (each execution re-fetches counters
from shared memory).
- The above 2 points are also the reasons why the pg_my_stat_io view has been
added (its results take the stats_fetch_consistency setting into account). I think
it makes sense to rely on it in that case, while I'm not sure it would make
much sense to retrieve another backend's I/O stats while also honoring
stats_fetch_consistency.
[1]: /messages/by-id/20230309003438.rectf7xo7pw5t5cj@awork3.anarazel.de
Looking forward to your feedback,
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in
Hi hackers,
Please find attached a patch to implement $SUBJECT.
While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
pg_my_stat_io view to display "my" backend I/O statistics and a new
pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
backend pid.

By having per-backend granularity, one could for example identify
which running backend is responsible for most of the reads, most of the extends,
and so on. The pg_my_stat_io view could also be useful to check the I/O impact
of some operations or queries in the current session.

Some remarks:
- It is split into 2 sub-patches: 0001 introduces the necessary changes to provide
the pg_my_stat_io view and 0002 adds the pg_stat_get_backend_io() function.
- The idea of having per backend I/O statistics has already been mentioned in
[1] by Andres.

Some implementation choices:
- The KIND_IO stats are still "fixed amount" ones, as the maximum number of
backends is fixed.
- The statistics snapshot is made for the global stats (the aggregated ones) and
for my backend stats. The snapshot is not built for all the backend stats (that
could be memory expensive depending on the max_connections setting and given
the fact that PgStat_IO is 16KB long).
- The above point means that pg_stat_get_backend_io() behaves as if
stats_fetch_consistency is set to none (each execution re-fetches counters
from shared memory).
- The above 2 points are also the reasons why the pg_my_stat_io view has been
added (its results take the stats_fetch_consistency setting into account). I think
it makes sense to rely on it in that case, while I'm not sure it would make
much sense to retrieve another backend's I/O stats while also honoring
stats_fetch_consistency.

[1]: /messages/by-id/20230309003438.rectf7xo7pw5t5cj@awork3.anarazel.de
I'm not sure about the usefulness of having the stats only available
from the current session. Since they are stored in shared memory,
shouldn't we make them accessible to all backends? However, this would
introduce permission considerations and could become complex.
When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.
Just to be clear, the above comments are not meant to oppose the
current implementation approach. They are purely for the sake of
discussing comparisons with other possible approaches.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.
As an additional benefit of this approach, the client can set a
connection variable, for example, no_backend_iostats to true, or set
its inverse variable to false, to restrict memory usage to only the
required backends.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi,
On Tue, Sep 03, 2024 at 03:37:49PM +0900, Kyotaro Horiguchi wrote:
At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in
Hi hackers,
Please find attached a patch to implement $SUBJECT.
While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
pg_my_stat_io view to display "my" backend I/O statistics and a new
pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
backend pid.

I'm not sure about the usefulness of having the stats only available
from the current session. Since they are stored in shared memory,
shouldn't we make them accessible to all backends?
Thanks for the feedback!
The stats are accessible to all backends thanks to 0002 and the introduction
of the pg_stat_get_backend_io() function.
However, this would
introduce permission considerations and could become complex.
Not sure that the data exposed here is sensitive enough to warrant permission
restrictions.
When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.
The pros of using the fixed amount approach are:
- less code change (I think, as I did not write the non-fixed counterpart)
- probably better performance and less scalability impact (in case of a high rate
of backend creation/deletion)
The main con is probably allocating shared memory space that might not be used
(sizeof(PgStat_IO) is 16392, so that could be a concern with a high number of
unused connections). OTOH, if a high number of connections is not used, it might
be worth reducing the max_connections setting.
"Conceptually" speaking, we know what the maximum number of backend is, so I
think that using the fixed amount approach makes sense (somehow I think it can
be compared to PGSTAT_KIND_SLRU which relies on SLRU_NUM_ELEMENTS).
Just to be clear, the above comments are not meant to oppose the
current implementation approach. They are purely for the sake of
discussing comparisons with other possible approaches.
No problem at all, thanks for your feedback and sharing your thoughts!
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:
At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.

As an additional benefit of this approach, the client can set a
connection variable, for example, no_backend_iostats to true, or set
its inverse variable to false, to restrict memory usage to only the
required backends.
Thanks for the feedback!
If we were to add an on/off switch, I think I'd vote for a global one
instead. Indeed, I see this feature more as an "Administrator" one, where
the administrator wants to be able to find out which session is responsible for
what (from an I/O point of view): like being able to answer "which session is
generating this massive amount of reads?".
If we allow each session to disable the feature, then the administrator
would lose this ability.
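For example, with the pg_stat_get_backend_io() function from 0002, the
administrator could run something like this to spot the top readers (just a
sketch):

SELECT a.pid, a.backend_type, sum(io.reads) AS reads
  FROM pg_stat_activity a,
       LATERAL pg_stat_get_backend_io(a.pid) AS io
 GROUP BY a.pid, a.backend_type
 ORDER BY reads DESC
 LIMIT 5;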
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 2024-Sep-03, Bertrand Drouvot wrote:
The main con is probably allocating shared memory space that might not be used
(sizeof(PgStat_IO) is 16392, so that could be a concern with a high number of
unused connections). OTOH, if a high number of connections is not used, it might
be worth reducing the max_connections setting.
I was surprised by the size you mention so I went to look for the
definition of that struct:
typedef struct PgStat_IO
{
    TimestampTz stat_reset_timestamp;
    PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
(It would be good to have more than zero comments here BTW)
So what's happening is that struct PgStat_IO stores the data for every
single process type, be it regular backends, backend-like auxiliary
processes, and all other potential postmaster children. So it seems to
me that storing one of these structs for "my backend stats" is wrong: I
think you should be using PgStat_BktypeIO instead (or maybe another
struct which has a reset timestamp plus BktypeIO, if you care about the
reset time). That struct is only 1024 bytes long, so it seems much more
reasonable to have one of those per backend.
Another way to think about this might be to say that the B_BACKEND
element of the PgStat_Snapshot.io array should really be spread out to
MaxBackends separate elements. This would make it more difficult to
obtain a total value accumulating ops done by all backends (since it would
require summing the values of each backend), but it allows you to obtain
backend-specific values, including those of remote backends rather than
only your own, as Kyotaro suggests.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Hi,
On Thu, Sep 05, 2024 at 03:03:32PM +0200, Alvaro Herrera wrote:
On 2024-Sep-03, Bertrand Drouvot wrote:
The main con is probably allocating shared memory space that might not be used
(sizeof(PgStat_IO) is 16392, so that could be a concern with a high number of
unused connections). OTOH, if a high number of connections is not used, it might
be worth reducing the max_connections setting.

I was surprised by the size you mention so I went to look for the
definition of that struct:
Thanks for looking at it!
typedef struct PgStat_IO
{
    TimestampTz stat_reset_timestamp;
    PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;

(It would be good to have more than zero comments here BTW)
So what's happening is that struct PgStat_IO stores the data for every
single process type, be it regular backends, backend-like auxiliary
processes, and all other potential postmaster children.
Yeap.
So it seems to
me that storing one of these structs for "my backend stats" is wrong: I
think you should be using PgStat_BktypeIO instead (or maybe another
struct which has a reset timestamp plus BktypeIO, if you care about the
reset time). That struct is only 1024 bytes long, so it seems much more
reasonable to have one of those per backend.
Yeah, that could be an area of improvement. Thanks, I'll look at it.
Currently the filtering is done when retrieving the per-backend stats, but it
would be better to do it when storing the stats.
Another way to think about this might be to say that the B_BACKEND
element of the PgStat_Snapshot.io array should really be spread out to
MaxBackends separate elements. This would make it more difficult to
obtain a total value accumulating ops done by all backends (since it would
require summing the values of each backend), but it allows you to obtain
backend-specific values, including those of remote backends rather than
only your own, as Kyotaro suggests.
One backend can already see the stats of the other backends thanks to the
pg_stat_get_backend_io() function (that takes a backend pid as parameter)
that is introduced in the 0002 sub-patch.
I'll ensure that's still the case with the next version of the patch.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Thu, Sep 05, 2024 at 02:14:47PM +0000, Bertrand Drouvot wrote:
Hi,
On Thu, Sep 05, 2024 at 03:03:32PM +0200, Alvaro Herrera wrote:
So it seems to
me that storing one of these struct for "my backend stats" is wrong: I
think you should be using PgStat_BktypeIO instead (or maybe another
struct which has a reset timestamp plus BktypeIO, if you care about the
reset time). That struct is only 1024 bytes long, so it seems much more
reasonable to have one of those per backend.

Yeah, that could be an area of improvement. Thanks, I'll look at it.
Currently the filtering is done when retrieving the per-backend stats, but it
would be better to do it when storing the stats.
I ended up creating (in v2 attached):
"
typedef struct PgStat_Backend_IO
{
    TimestampTz stat_reset_timestamp;   /* last time this backend's I/O stats were reset */
    BackendType bktype;                 /* backend type, used to filter the stats at display time */
    PgStat_BktypeIO stats;              /* the per-backend I/O counters */
} PgStat_Backend_IO;
"
The stat_reset_timestamp is there so that one knows when a particular backend
had its I/O stats reset. Also the backend type is part of the struct to
be able to filter the stats correctly when we display them.
The struct size is 1040 Bytes and that's much more reasonable than the size
needed for per backend I/O stats in v1 (about 16KB).
One backend can already see the stats of the other backends thanks to the
pg_stat_get_backend_io() function (that takes a backend pid as parameter)
that is introduced in the 0002 sub-patch.
0002 still provides the pg_stat_get_backend_io() function so that one could
get the stats of other backends.
Example:
postgres=# select backend_type,object,context,reads,extends,hits from pg_stat_get_backend_io(3779502);
backend_type | object | context | reads | extends | hits
----------------+---------------+-----------+-------+---------+--------
client backend | relation | bulkread | 0 | | 0
client backend | relation | bulkwrite | 0 | 0 | 0
client backend | relation | normal | 73 | 2216 | 504674
client backend | relation | vacuum | 0 | 0 | 0
client backend | temp relation | normal | 0 | 0 | 0
(5 rows)
could be an individual walsender:
postgres=# select pid, backend_type from pg_stat_activity where backend_type = 'walsender';
pid | backend_type
---------+--------------
3779565 | walsender
(1 row)
postgres=# select backend_type,object,context,reads,hits from pg_stat_get_backend_io(3779565);
backend_type | object | context | reads | hits
--------------+---------------+-----------+-------+------
walsender | relation | bulkread | 0 | 0
walsender | relation | bulkwrite | 0 | 0
walsender | relation | normal | 6 | 48
walsender | relation | vacuum | 0 | 0
walsender | temp relation | normal | 0 | 0
(5 rows)
and so on...
Remarks:
- As stated up-thread, pg_stat_get_backend_io() behaves as if
stats_fetch_consistency is set to none (each execution re-fetches counters
from shared memory). Indeed, the snapshot is not built in each backend to copy
all the other backends' stats, as 1/ there is no use case (there is no need to
get other backends' I/O statistics while honoring stats_fetch_consistency)
and 2/ that could be memory expensive depending on the number of max connections.
- If we agree on the current proposal then I'll do some refactoring in
pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
duplicated code (it's not done yet to ease the review).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Fri, Sep 06, 2024 at 03:03:17PM +0000, Bertrand Drouvot wrote:
- If we agree on the current proposal then I'll do some refactoring in
pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
duplicated code (it's not done yet to ease the review).
Please find attached v3, a mandatory rebase due to fc415edf8c.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Fri, Sep 06, 2024 at 03:03:17PM +0000, Bertrand Drouvot wrote:
The struct size is 1040 Bytes and that's much more reasonable than the size
needed for per backend I/O stats in v1 (about 16KB).
One could think this results in a high increase of shared memory usage with a high
number of connections (due to the fact that it's implemented as "fixed amount"
stats based on the max number of backends).
So out of curiosity, I did some measurements with:
dynamic_shared_memory_type=sysv
shared_memory_type=sysv
max_connections=20000
On my lab, ipcs -m gives me:
- on master
key shmid owner perms bytes nattch status
0x00113a04 51347487 postgres 600 1149394944 6
0x4bc9f2fa 51347488 postgres 600 4006976 6
0x46790800 51347489 postgres 600 1048576 2
- with v3 applied
key shmid owner perms bytes nattch status
0x00113a04 51347477 postgres 600 1170227200 6
0x08e04b0a 51347478 postgres 600 4006976 6
0x74688c9c 51347479 postgres 600 1048576 2
So, with 20000 max_connections (not advocating that's a reasonable number in
all cases), that's a 1170227200 - 1149394944 = 20 MB increase of shared
memory (expected with 20K max_connections and the new struct size of 1040 bytes)
compared to master, which is 1096 MB. It means that v3 produces about 2% shared
memory increase with 20000 max_connections.
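(That lines up with a back-of-the-envelope check: 20000 slots * 1040 bytes =
20,800,000 bytes, close to the measured 1170227200 - 1149394944 = 20,832,256
bytes, the small remainder presumably being allocation rounding/overhead.)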
Out of curiosity with max_connections=100000, then:
- on master:
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00113a04 52953134 postgres 600 5053915136 6
0x37abf5ce 52953135 postgres 600 20006976 6
0x71c07d5c 52953136 postgres 600 1048576 2
- with v3:
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00113a04 52953144 postgres 600 5157945344 6
0x7afb24de 52953145 postgres 600 20006976 6
0x2695af58 52953146 postgres 600 1048576 2
So, with 100000 max_connections (not advocating that's a reasonable number in
all cases), that's a 5157945344 - 5053915136 = 100 MB increase of shared
memory (expected with 100K max_connections and the new struct size of 1040 bytes)
compared to master, which is about 4800 MB. It means that v3 produces about
2% shared memory increase with 100000 max_connections.
Then, based on those numbers, I think that the "fixed amount" approach is a good
one, as 1/ the amount of shared memory increase is "relatively" small and 2/ it most
probably provides performance benefits compared to the "non fixed" approach.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
Thanks for working on this!
Your patch applies and builds cleanly.
On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
- As stated up-thread, pg_stat_get_backend_io() behaves as if
stats_fetch_consistency is set to none (each execution re-fetches counters
from shared memory). Indeed, the snapshot is not built in each backend to copy
all the other backends' stats, as 1/ there is no use case (there is no need to
get other backends' I/O statistics while honoring stats_fetch_consistency)
and 2/ that could be memory expensive depending on the number of max connections.
I believe this is the correct approach.
I manually tested your patches, and they work as expected. Here is
some feedback:
- The stats_reset column is NULL in both pg_my_stat_io and
pg_stat_get_backend_io() until the first call to reset io statistics.
While I'm not sure if it's necessary, it might be useful to display
the more recent of the two times in the stats_reset column: the
statistics reset time or the backend creation time.
- The pgstat_reset_io_counter_internal() is called in the
pgstat_shutdown_hook(). This causes the stats_reset column to show the
termination time of the old backend when its proc num is reassigned to
a new backend.
--
Regards,
Nazir Bilal Yavuz
Microsoft
Hi,
On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:
Hi,
Thanks for working on this!
Your patch applies and builds cleanly.
Thanks for looking at it!
On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

- As stated up-thread, pg_stat_get_backend_io() behaves as if
stats_fetch_consistency is set to none (each execution re-fetches counters
from shared memory). Indeed, the snapshot is not built in each backend to copy
all the other backends' stats, as 1/ there is no use case (there is no need to
get other backends' I/O statistics while honoring stats_fetch_consistency)
and 2/ that could be memory expensive depending on the number of max connections.

I believe this is the correct approach.
Thanks for sharing your thoughts.
I manually tested your patches, and they work as expected. Here is
some feedback:

- The stats_reset column is NULL in both pg_my_stat_io and
pg_stat_get_backend_io() until the first call to reset io statistics.
While I'm not sure if it's necessary, it might be useful to display
the more recent of the two times in the stats_reset column: the
statistics reset time or the backend creation time.
I'm not sure about that as one can already get the backend "creation time"
through pg_stat_activity.backend_start.
- The pgstat_reset_io_counter_internal() is called in the
pgstat_shutdown_hook(). This causes the stats_reset column to show the
termination time of the old backend when its proc num is reassigned to
a new backend.
doh! Nice catch, thanks!
And also new backends that are not re-using a previous "existing" process slot
are getting the last reset time (if any). So the right place to fix this is in
pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be
right "conceptually" speaking).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:
- The pgstat_reset_io_counter_internal() is called in the
pgstat_shutdown_hook(). This causes the stats_reset column to show the
termination time of the old backend when its proc num is reassigned to
a new backend.

doh! Nice catch, thanks!
And also new backends that are not re-using a previous "existing" process slot
are getting the last reset time (if any). So the right place to fix this is in
pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be
right "conceptually" speaking).
Thanks, I confirm that it is fixed.
You mentioned some refactoring upthread:
On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
- If we agree on the current proposal then I'll do some refactoring in
pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
duplicated code (it's not done yet to ease the review).
Could we remove pg_stat_get_my_io() completely and use
pg_stat_get_backend_io() with the current backend's pid to get the
current backend's stats? If you meant the same thing, please don't
mind it.
--
Regards,
Nazir Bilal Yavuz
Microsoft
Hi,
On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:
Hi,
On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:
- The pgstat_reset_io_counter_internal() is called in the
pgstat_shutdown_hook(). This causes the stats_reset column to show the
termination time of the old backend when its proc num is reassigned to
a new backend.

doh! Nice catch, thanks!
And also new backends that are not re-using a previous "existing" process slot
are getting the last reset time (if any). So the right place to fix this is in
pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be
right "conceptually" speaking).Thanks, I confirm that it is fixed.
Thanks!
You mentioned some refactoring upthread:
On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

- If we agree on the current proposal then I'll do some refactoring in
pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
duplicated code (it's not done yet to ease the review).

Could we remove pg_stat_get_my_io() completely and use
pg_stat_get_backend_io() with the current backend's pid to get the
current backend's stats?
The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]) the
statistics snapshot is built for "my backend stats" (meaning it depends on the
stats_fetch_consistency value). I can see use cases for that.
OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to
none (each execution re-fetches counters from shared memory) (see [2]). Indeed,
the snapshot is not built in each backend to copy all the other backends' stats,
as 1/ I think that there is no use case (there is no need to get other backends'
I/O statistics while honoring stats_fetch_consistency) and 2/ that
could be memory expensive depending on the number of max connections.
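To illustrate the difference (the pid below is just a placeholder):

-- my backend's stats, snapshot built according to stats_fetch_consistency
SELECT * FROM pg_stat_get_my_io();

-- another backend's stats, always re-fetched from shared memory
SELECT * FROM pg_stat_get_backend_io(12345);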
So I think it's better to keep both functions as they behave differently.
Thoughts?
If you meant the same thing, please don't
mind it.
What I meant to say is to try to remove the duplicate code that we can find in
those 3 functions (say creating a new function that would contain the duplicate
code and make use of this new function in the 3 others). I did not look at it in
detail yet.
[1]: /messages/by-id/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/ZtsZtaRza9bFFeF8@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:
Could we remove pg_stat_get_my_io() completely and use
pg_stat_get_backend_io() with the current backend's pid to get the
current backend's stats?

The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]) the
statistics snapshot is built for "my backend stats" (meaning it depends on the
stats_fetch_consistency value). I can see use cases for that.

OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to
none (each execution re-fetches counters from shared memory) (see [2]). Indeed,
the snapshot is not built in each backend to copy all the other backends' stats,
as 1/ I think that there is no use case (there is no need to get other backends'
I/O statistics while honoring stats_fetch_consistency) and 2/ that
could be memory expensive depending on the number of max connections.

So I think it's better to keep both functions as they behave differently.
Thoughts?
Yes, that is correct. Sorry, you had already explained it and I had
agreed with it, but I forgot.
If you meant the same thing, please don't
mind it.

What I meant to say is to try to remove the duplicate code that we can find in
those 3 functions (say creating a new function that would contain the duplicate
code and make use of this new function in the 3 others). I did not look at it in
detail yet.
I got it, thanks for the explanation.
--
Regards,
Nazir Bilal Yavuz
Microsoft
Hi,
On Tue, Sep 17, 2024 at 04:47:51PM +0300, Nazir Bilal Yavuz wrote:
Hi,
On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

So I think it's better to keep both functions as they behave differently.
Thoughts?
Yes, that is correct. Sorry, you had already explained it and I had
agreed with it, but I forgot.
No problem at all! (I re-explained because I'm not always 100% sure that my
explanations are crystal clear ;-) )
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Sep 04, 2024 at 04:45:24AM +0000, Bertrand Drouvot wrote:
On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:
As an additional benefit of this approach, the client can set a
connection variable, for example, no_backend_iostats to true, or set
its inverse variable to false, to restrict memory usage to only the
required backends.

Thanks for the feedback!
If we were to add an on/off switch, I think I'd vote for a global one
instead. Indeed, I see this feature more as an "Administrator" one, where
the administrator wants to be able to find out which session is responsible for
what (from an I/O point of view): like being able to answer "which session is
generating this massive amount of reads?".
If we allow each session to disable the feature, then the administrator
would lose this ability.
Hmm, I've been studying this patch, and I am not completely sure I
agree with using fixed-numbered stats, actually, after
reading the whole patch and seeing its structure
(FLEXIBLE_ARRAY_MEMBER is a new way to handle the fact that we don't
know exactly how many slots the fixed-numbered stats need, as
MaxBackends may change). If we make this
kind of stats variable-numbered, does it have to actually involve many
creations or removals of the stats entries, though? One point is that
the number of entries to know about is capped by max_connections,
which is a PGC_POSTMASTER. That's the same kind of control as
replication slots. So one approach would be to reuse entries in the
dshash and use in the hashing key the number in the procarrays. If a
new connection spawns and reuses a slot that was used in the past,
then reset all the existing fields and assign its PID.
Another thing is the consistency of the data that we'd like to keep at
shutdown. If the connections have a balanced amount of stats shared
among them, doing decision-making based on them is kind of easy. But
that may cause confusion if the activity is unbalanced across the
sessions. We could also not flush them to disk as an option, but it
still seems more useful to me to save this data across restarts if one
takes frequent snapshots of the new system view reporting everything,
so that it is possible to get an idea of the deltas across the snapshots
for each connection slot.
--
Michael
On Tue, Sep 17, 2024 at 01:56:34PM +0000, Bertrand Drouvot wrote:
No problem at all! (I re-explained because I'm not always 100% sure that my
explanations are crystal clear ;-) )
We've discussed this patch a bit offline, but after studying it
I doubt that this part is a good idea:
+ /* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
+ PgStatShared_IO io;
} PgStat_ShmemControl;
We are going to be in trouble if we introduce a second member in this
structure that has a FLEXIBLE_ARRAY_MEMBER, because PgStat_ShmemControl
relies on the fact that all its members are at deterministic offset
positions in this structure. So this lacks flexibility. This choice
is caused by the fact that we don't exactly know the number of
backends because that's controlled by the PGC_POSTMASTER GUC
max_connections so the size of the structure would be undefined.
There is a parallel with replication slot statistics here, where we
save the replication slot data in the dshash based on their index
number in shmem. Hence, wouldn't it be better to do like replication
slot stats, where we use the dshash and a key based on the procnum of
each backend or auxiliary process (ProcNumber in procnumber.h)? If at
restart max_connections is lower than what was previously used, we
could discard entries that would not fit anymore into the charts.
This is probably not something that happens often, so I'm not really
worried about forcing the removal of these stats depending on how the
upper-bound of ProcNumber evolves.
So, using a new kind of ID and making this kind variable-numbered may
ease the implementation quite a bit, while avoiding any extensibility
issues with the shmem portion of the patch if these are
fixed-numbered. The reporting of these stats comes down to having a
parallel with pgstat_count_io_op_time(), but to make sure that the
stats are split by connection slot number rather than the current
split of pg_stat_io. All its callers are in localbuf.c, bufmgr.c and
md.c; living with some duplication in the code paths to gather the
stats may be OK.
pg_stat_get_my_io() is based on an O(n^3) loop. IOOBJECT_NUM_TYPES is
fortunately low, but that's still annoying.
This would rely on the fact that we would use the ProcNumber for the
dshash key, and this information is not provided in pg_stat_activity.
Perhaps we should add this information in pg_stat_activity so that it
would be easily possible to do joins with a SQL function that returns
a SRF with all the stats associated with a given connection slot
(auxiliary or backend process)? That would be a separate patch.
Perhaps that's even something that has popped up for the work with
threading (did not follow this part closely, TBH)?
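For instance, I am thinking of something like this (both the proc_number column
in pg_stat_activity and that flavor of the function are hypothetical at this
stage):

SELECT a.pid, a.backend_type, io.object, io.context, io.reads, io.extends, io.hits
  FROM pg_stat_activity a,
       LATERAL pg_stat_get_backend_io(a.proc_number) AS io;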
The active PIDs of the live sessions are not stored in the active
stats, why not? Perhaps that's useless anyway if we expose the
ProcNumbers in pg_stat_activity and make the stats available with a
single function taking a ProcNumber as input. Just mentioning an
option to consider.
--
Michael
Hi,
On Fri, Sep 20, 2024 at 12:53:43PM +0900, Michael Paquier wrote:
On Wed, Sep 04, 2024 at 04:45:24AM +0000, Bertrand Drouvot wrote:
On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:
As an additional benefit of this approach, the client can set a
connection variable, for example, no_backend_iostats to true, or set
its inverse variable to false, to restrict memory usage to only the
required backends.

Thanks for the feedback!

If we were to add an on/off switch, I think I'd vote for a global one
instead. Indeed, I see this feature more as an "Administrator" one, where
the administrator wants to be able to find out which session is responsible for
what (from an I/O point of view): like being able to answer "which session is
generating this massive amount of reads?".
If we allow each session to disable the feature, then the administrator
would lose this ability.

Hmm, I've been studying this patch,
Thanks for looking at it!
and I am not completely sure I
agree with using fixed-numbered stats, actually, after
reading the whole patch and seeing its structure
(FLEXIBLE_ARRAY_MEMBER is a new way to handle the fact that we don't
know exactly how many slots the fixed-numbered stats need, as
MaxBackends may change).
Right, that's a new way of dealing with an "unknown" number of slots (and it has
cons, as you mentioned in [1]).
If we make these
kind of stats variable-numbered, does it have to actually involve many
creations or removals of the stats entries, though? One point is that
the number of entries to know about is capped by max_connections,
which is a PGC_POSTMASTER. That's the same kind of control as
replication slots. So one approach would be to reuse entries in the
dshash and use in the hashing key the number in the procarrays. If a
new connection spawns and reuses a slot that was used in the past,
then reset all the existing fields and assign its PID.
Yeah, like it's done currently with the "fixed-numbered" stats proposal. That
sounds reasonable to me, I'll look at this proposed approach and come back with
a new patch version, thanks!
Another thing is the consistency of the data that we'd like to keep at
shutdown. If the connections have a balanced amount of stats shared
among them, doing decision-making based on them is kind of easy. But
that may cause confusion if the activity is unbalanced across the
sessions. We could also not flush them to disk as an option, but it
still seems more useful to me to save this data across restarts if one
takes frequent snapshots of the new system view reporting everything,
so that it is possible to get an idea of the deltas across the snapshots
for each connection slot.
The idea that has been implemented so far in this patch is that we still maintain
an aggregated version of the stats (visible through pg_stat_io) and that only the
aggregated stats are flushed/read to/from disk (meaning we don't flush the
per-backend stats).
I think that it makes sense that way. The way I see it, the per-backend
I/O stats are more for current activity instrumentation. So it's not clear to me
what the benefits would be of restoring the per-backend stats at startup, knowing
that: 1) we restored the aggregated stats and 2) the sessions that were responsible
for the restored stats are gone.
[1]: /messages/by-id/Zuz5iQ4AjcuOMx_w@paquier.xyz
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:
On Tue, Sep 17, 2024 at 01:56:34PM +0000, Bertrand Drouvot wrote:
No problem at all! (I re-explained because I'm not always 100% sure that my
explanations are crystal clear ;-) )

We've discussed this patch a bit offline, but after studying it
I doubt that this part is a good idea:

+ /* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
+ PgStatShared_IO io;
} PgStat_ShmemControl;

We are going to be in trouble if we introduce a second member in this
structure that has a FLEXIBLE_ARRAY_MEMBER, because PgStat_ShmemControl
relies on the fact that all its members are at deterministic offset
positions in this structure.
Agree that it would be an issue should we have to add a new FLEXIBLE_ARRAY_MEMBER.
So this lacks flexibility. This choice
is caused by the fact that we don't exactly know the number of
backends because that's controlled by the PGC_POSTMASTER GUC
max_connections so the size of the structure would be undefined.
Right.
There is a parallel with replication slot statistics here, where we
save the replication slot data in the dshash based on their index
number in shmem. Hence, wouldn't it be better to do like replication
slot stats, where we use the dshash and a key based on the procnum of
each backend or auxiliary process (ProcNumber in procnumber.h)? If at
restart max_connections is lower than what was previously used, we
could discard entries that would not fit anymore into the charts.
This is probably not something that happens often, so I'm not really
worried about forcing the removal of these stats depending on how the
upper-bound of ProcNumber evolves.
Yeah, I'll look at implementing the dshash based on their procnum and see
where it goes.
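Something like this is what I have in mind for the lookup side (sketch only, the
kind name is made up at this stage):

PgStat_Backend_IO *io;

/* one variable-numbered entry per connection slot, keyed by ProcNumber */
io = (PgStat_Backend_IO *) pgstat_fetch_entry(PGSTAT_KIND_BACKEND,
                                              InvalidOid, procNumber);
/* io == NULL would mean that no stats have been reported for that slot yet */

and a backend reusing a slot would just reset the existing entry at startup
rather than dropping and recreating it.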
So, using a new kind of ID and making this kind variable-numbered may
ease the implementation quite a bit, while avoiding any extensibility
issues with the shmem portion of the patch if these are
fixed-numbered.
Agree.
This would rely on the fact that we would use the ProcNumber for the
dshash key, and this information is not provided in pg_stat_activity.
Perhaps we should add this information in pg_stat_activity so that it
would be easily possible to do joins with a SQL function that returns
a SRF with all the stats associated with a given connection slot
(auxiliary or backend process)?
I'm not sure that's needed. What has been done in the previous versions is
to get the stats based on the pid (see pg_stat_get_backend_io()) where the
procnumber is retrieved with something like GetNumberFromPGProc(BackendPidGetProc(pid)).
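i.e. something along these lines inside the SRF (error handling elided):

PGPROC *proc = BackendPidGetProc(pid);

if (proc != NULL)
{
    ProcNumber procNumber = GetNumberFromPGProc(proc);

    /* look up the per-backend I/O stats entry for procNumber here */
}
/* else: the pid has gone away or is not a backend, so nothing to report */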
Do you see an issue with that approach?
The active PIDs of the live sessions are not stored in the active
stats, why not?
That was not needed. We can still retrieve the stats based on the pid thanks
to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having
to actually store the pid in the stats. I think that's fine because the pid
only matters at "display" time (pg_stat_get_backend_io()).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com