Add callbacks for fixed-numbered stats flush in pgstats
Hi all,
The last TODO item I had in my bucket about the generalization of
pgstats is the option to a better control on the flush of the stats
depending on the kind for fixed-numbered stats. Currently, this is
controlled by pgstat_report_stat(), that includes special handling for
WAL, IO and SLRU stats, with two generic concepts:
- Check if there are pending entries, allowing a fast-path exit.
- Do the actual flush, with a recheck on pending entries.
SLRU and IO control that with one variable each, and WAL uses a
routine for the same called pgstat_have_pending_wal(). Please find
attached a patch to generalize the concept, with two new callbacks
that can be used for fixed-numbered stats. SLRU, IO and WAL are
switched to use these (the two pgstat_flush_* routines have been kept
on purpose). This brings some clarity in the code, by making
have_iostats and have_slrustats static in their respective files. The
two pgstat_flush_* wrappers do not need a boolean as return result.
Running Postgres on scissors with a read-only workload that does not
trigger stats, I was not able to see a difference in runtime, but that
was on my own laptop, and I am planning to do more measurements on a
bigger machine.
This is in the same line of thoughts as the recent thread about the
backend init callback, generalizing more the whole facility:
/messages/by-id/ZtZr1K4PLdeWclXY@paquier.xyz
Like the other one, I wanted to send that a few days ago, but well,
life likes going its own ways sometimes.
Thanks,
--
Michael
Attachments:
0001-Add-callbacks-to-control-flush-of-fixed-numbered-sta.patchtext/x-diff; charset=us-asciiDownload+119-43
At Tue, 3 Sep 2024 13:48:59 +0900, Michael Paquier <michael@paquier.xyz> wrote in
Hi all,
The last TODO item I had in my bucket about the generalization of
pgstats is the option to a better control on the flush of the stats
depending on the kind for fixed-numbered stats. Currently, this is
controlled by pgstat_report_stat(), that includes special handling for
WAL, IO and SLRU stats, with two generic concepts:
- Check if there are pending entries, allowing a fast-path exit.
- Do the actual flush, with a recheck on pending entries.SLRU and IO control that with one variable each, and WAL uses a
routine for the same called pgstat_have_pending_wal(). Please find
attached a patch to generalize the concept, with two new callbacks
that can be used for fixed-numbered stats. SLRU, IO and WAL are
switched to use these (the two pgstat_flush_* routines have been kept
on purpose). This brings some clarity in the code, by making
have_iostats and have_slrustats static in their respective files. The
two pgstat_flush_* wrappers do not need a boolean as return result.
The generalization sounds good to me, and hiding the private flags in
private places also seems good.
Regarding pgstat_io_flush_cb, I think it no longer needs to check the
have_iostats variable, and that check should be moved to the new
pgstat_flush_io(). The same applies to pgstat_wal_flush_cb() (and
pgstat_flush_wal()). As for pgstat_slru_flush_cb, it simply doesn't
need the check anymore.
Running Postgres on scissors with a read-only workload that does not
trigger stats, I was not able to see a difference in runtime, but that
was on my own laptop, and I am planning to do more measurements on a
bigger machine.
I don't think it matters, since the actual flushing occurs at
10-second intervals during busy times. We could change the check from
a callback to a variable, but that would just shift the function call
overhead to a more frequently called side.
This is in the same line of thoughts as the recent thread about the
backend init callback, generalizing more the whole facility:
/messages/by-id/ZtZr1K4PLdeWclXY@paquier.xyzLike the other one, I wanted to send that a few days ago, but well,
life likes going its own ways sometimes.
reards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi,
On Wed, Sep 04, 2024 at 02:05:46PM +0900, Kyotaro Horiguchi wrote:
At Tue, 3 Sep 2024 13:48:59 +0900, Michael Paquier <michael@paquier.xyz> wrote in
Hi all,
The last TODO item I had in my bucket about the generalization of
pgstats is the option to a better control on the flush of the stats
depending on the kind for fixed-numbered stats. Currently, this is
controlled by pgstat_report_stat(), that includes special handling for
WAL, IO and SLRU stats, with two generic concepts:
- Check if there are pending entries, allowing a fast-path exit.
- Do the actual flush, with a recheck on pending entries.SLRU and IO control that with one variable each, and WAL uses a
routine for the same called pgstat_have_pending_wal(). Please find
attached a patch to generalize the concept, with two new callbacks
that can be used for fixed-numbered stats. SLRU, IO and WAL are
switched to use these (the two pgstat_flush_* routines have been kept
on purpose). This brings some clarity in the code, by making
have_iostats and have_slrustats static in their respective files. The
two pgstat_flush_* wrappers do not need a boolean as return result.The generalization sounds good to me, and hiding the private flags in
private places also seems good.
+1 on the idea.
Regarding pgstat_io_flush_cb, I think it no longer needs to check the
have_iostats variable, and that check should be moved to the new
pgstat_flush_io(). The same applies to pgstat_wal_flush_cb() (and
pgstat_flush_wal()). As for pgstat_slru_flush_cb, it simply doesn't
need the check anymore.
Agree. Another option could be to add only one callback (the flush_fixed_cb one)
and get rid of the have_fixed_pending_cb one. Then add an extra parameter to the
flush_fixed_cb that would only check if there is pending stats (without
flushing them). I think those 2 callbacks are highly related that's why I
think we could "merge" them, thoughts?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Sep 04, 2024 at 05:28:56AM +0000, Bertrand Drouvot wrote:
On Wed, Sep 04, 2024 at 02:05:46PM +0900, Kyotaro Horiguchi wrote:
The generalization sounds good to me, and hiding the private flags in
private places also seems good.+1 on the idea.
Thanks for the feedback.
Regarding pgstat_io_flush_cb, I think it no longer needs to check the
have_iostats variable, and that check should be moved to the new
pgstat_flush_io(). The same applies to pgstat_wal_flush_cb() (and
pgstat_flush_wal()). As for pgstat_slru_flush_cb, it simply doesn't
need the check anymore.Agree. Another option could be to add only one callback (the flush_fixed_cb one)
and get rid of the have_fixed_pending_cb one. Then add an extra parameter to the
flush_fixed_cb that would only check if there is pending stats (without
flushing them). I think those 2 callbacks are highly related that's why I
think we could "merge" them, thoughts?
I would still value the shortcut that we can use if there is no
activity to avoid the clock check with GetCurrentTimestamp(), though,
which is why I'd rather stick with two callbacks as that can lead to a
much cheaper path.
Anyway, I am not sure to follow your point about removing the boolean
checks in the flush callbacks. It's surely always a better to never
call LWLockAcquire() or LWLockConditionalAcquire() if we don't have
to, and we would go through the whole flush dance as long as we detect
some stats activity in at least one stats kind. I mean, that's cheap
enough to keep around.
--
Michael
Hi,
On Wed, Sep 04, 2024 at 03:12:37PM +0900, Michael Paquier wrote:
On Wed, Sep 04, 2024 at 05:28:56AM +0000, Bertrand Drouvot wrote:
On Wed, Sep 04, 2024 at 02:05:46PM +0900, Kyotaro Horiguchi wrote:
The generalization sounds good to me, and hiding the private flags in
private places also seems good.+1 on the idea.
Thanks for the feedback.
Regarding pgstat_io_flush_cb, I think it no longer needs to check the
have_iostats variable, and that check should be moved to the new
pgstat_flush_io(). The same applies to pgstat_wal_flush_cb() (and
pgstat_flush_wal()). As for pgstat_slru_flush_cb, it simply doesn't
need the check anymore.Agree. Another option could be to add only one callback (the flush_fixed_cb one)
and get rid of the have_fixed_pending_cb one. Then add an extra parameter to the
flush_fixed_cb that would only check if there is pending stats (without
flushing them). I think those 2 callbacks are highly related that's why I
think we could "merge" them, thoughts?I would still value the shortcut that we can use if there is no
activity to avoid the clock check with GetCurrentTimestamp(),
Agree. The idea was to add an additional parameter (say "check_only") to the
flush_fixed_cb. If this parameter is set to true then the flush_fixed_cb would
do nothing (no flush at all) but return a boolean that would indicate if there
is pending stats. In case it returns false, then we could avoid the clock check.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
At Wed, 4 Sep 2024 15:12:37 +0900, Michael Paquier <michael@paquier.xyz> wrote in
On Wed, Sep 04, 2024 at 05:28:56AM +0000, Bertrand Drouvot wrote:
On Wed, Sep 04, 2024 at 02:05:46PM +0900, Kyotaro Horiguchi wrote:
The generalization sounds good to me, and hiding the private flags in
private places also seems good.+1 on the idea.
Thanks for the feedback.
Regarding pgstat_io_flush_cb, I think it no longer needs to check the
have_iostats variable, and that check should be moved to the new
pgstat_flush_io(). The same applies to pgstat_wal_flush_cb() (and
pgstat_flush_wal()). As for pgstat_slru_flush_cb, it simply doesn't
need the check anymore.Agree. Another option could be to add only one callback (the flush_fixed_cb one)
and get rid of the have_fixed_pending_cb one. Then add an extra parameter to the
flush_fixed_cb that would only check if there is pending stats (without
flushing them). I think those 2 callbacks are highly related that's why I
think we could "merge" them, thoughts?I would still value the shortcut that we can use if there is no
activity to avoid the clock check with GetCurrentTimestamp(), though,
which is why I'd rather stick with two callbacks as that can lead to a
much cheaper path.
Yeah, I was wrong in this point.
Anyway, I am not sure to follow your point about removing the boolean
checks in the flush callbacks. It's surely always a better to never
call LWLockAcquire() or LWLockConditionalAcquire() if we don't have
to, and we would go through the whole flush dance as long as we detect
some stats activity in at least one stats kind. I mean, that's cheap
enough to keep around.
I doubt that overprotection is always better, but in this case, it's
not overprotection at all. The flush callbacks are called
unconditionally once we decide to flush anything. Sorry for the noise.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Sep 04, 2024 at 06:37:01PM +0900, Kyotaro Horiguchi wrote:
I doubt that overprotection is always better, but in this case, it's
not overprotection at all. The flush callbacks are called
unconditionally once we decide to flush anything. Sorry for the noise.
Yes, it's intended to be a fastpath, with checks happening if we have
no data pending for any variable-numbered stats.
--
Michael
On Wed, Sep 04, 2024 at 06:27:43AM +0000, Bertrand Drouvot wrote:
Agree. The idea was to add an additional parameter (say "check_only") to the
flush_fixed_cb. If this parameter is set to true then the flush_fixed_cb would
do nothing (no flush at all) but return a boolean that would indicate if there
is pending stats. In case it returns false, then we could avoid the clock check.
Hmm. It's the same thing in terms of logic, but I am not really
convinced that it is a good idea to mix a code path for a sanity check
with a code path dealing with the action, particularly considering
that there is the nowait option to think about in the flush callback
which would require one to think about 4 possible states rather than
two possibilities. So that's more error-prone for extension
developers, IMO.
At the end, I have used my original suggestions for the callbacks. If
there are voices in favor of something different, we still have a good
chunk of the development and beta cycle for that.
--
Michael