Add tracking of backend memory allocated to pg_stat_activity
Hi Hackers,
Attached is a patch to
Add tracking of backend memory allocated to pg_stat_activity
This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included
in the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero.
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
001-pg-stat-activity-backend-memory-allocated.patchtext/x-patch; charset=UTF-8; name=001-pg-stat-activity-backend-memory-allocated.patchDownload+269-9
On Wed, Aug 31, 2022 at 12:03:06PM -0400, Reid Thompson wrote:
Hi Hackers,
Attached is a patch to
Add tracking of backend memory allocated to pg_stat_activity
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
In the past, there was concern about making pg_stat_activity wider by
adding information that's less-essential than what's been there for
years. This is only an int64, so it's not "wide", but I wonder if
there's another way to expose this information? Like adding backends to
pg_get_backend_memory_contexts() , maybe with another view on top of
that ?
+ * shown allocated in pgstat_activity when the creator destroys the
pg_stat
+ * Posix creation calls dsm_impl_posix_resize implying that resizing + * occurs or may be added in the future. As implemented + * dsm_impl_posix_resize utilizes fallocate or truncate, passing the + * whole new size as input, growing the allocation as needed * (only + * truncate supports shrinking). We update by replacing the * old
wrapping caused extraneous stars
+ * Do not allow backend_mem_allocated to go below zero. ereport if we + * would have. There's no need for a lock around the read here asit's
as it's
+ ereport(LOG, (errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0")));
errmsg() doesn't require the outside paranthesis since a couple years
ago.
+ /* + * Until header allocation is included in context->mem_allocated cast to + * slab and decrement the headerSize
add a comma before "cast" ?
--
Justin
At Wed, 31 Aug 2022 12:05:55 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
On Wed, Aug 31, 2022 at 12:03:06PM -0400, Reid Thompson wrote:
Hi Hackers,
Attached is a patch to
Add tracking of backend memory allocated to pg_stat_activity+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
In the past, there was concern about making pg_stat_activity wider by
adding information that's less-essential than what's been there for
years. This is only an int64, so it's not "wide", but I wonder if
there's another way to expose this information? Like adding backends to
The view looks already too wide to me. I don't want the numbers for
metrics are added to the view.
pg_get_backend_memory_contexts() , maybe with another view on top of
that ?
+1
+ * shown allocated in pgstat_activity when the creator destroys the
pg_stat
+ * Posix creation calls dsm_impl_posix_resize implying that resizing + * occurs or may be added in the future. As implemented + * dsm_impl_posix_resize utilizes fallocate or truncate, passing the + * whole new size as input, growing the allocation as needed * (only + * truncate supports shrinking). We update by replacing the * oldwrapping caused extraneous stars
+ * Do not allow backend_mem_allocated to go below zero. ereport if we + * would have. There's no need for a lock around the read here asit'sas it's
+ ereport(LOG, (errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0")));
errmsg() doesn't require the outside paranthesis since a couple years
ago.
+1
+ /* + * Until header allocation is included in context->mem_allocated cast to + * slab and decrement the headerSizeadd a comma before "cast" ?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Wed, 31 Aug 2022 12:03:06 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in
Attached is a patch to
Add tracking of backend memory allocated to pg_stat_activity
@@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;context->mem_allocated += blksize;
+ pgstat_report_backend_mem_allocated_increase(blksize);
I'm not sure this is acceptable. The function adds a branch even when
the feature is turned off, which I think may cause a certain extent of
performance degradation. A past threads [1]/messages/by-id/1434311039.4369.39.camel@jeff-desktop, [2]/messages/by-id/72a656e0f71d0860161e0b3f67e4d771@oss.nttdata.com and [3]/messages/by-id/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com might be
informative.
[1]: /messages/by-id/1434311039.4369.39.camel@jeff-desktop
[2]: /messages/by-id/72a656e0f71d0860161e0b3f67e4d771@oss.nttdata.com
[3]: /messages/by-id/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi,
On 9/1/22 3:28 AM, Kyotaro Horiguchi wrote:
At Wed, 31 Aug 2022 12:05:55 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
On Wed, Aug 31, 2022 at 12:03:06PM -0400, Reid Thompson wrote:
Hi Hackers,
Attached is a patch to
Add tracking of backend memory allocated
Thanks for the patch.
+ 1 on the idea.
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
In the past, there was concern about making pg_stat_activity wider by
adding information that's less-essential than what's been there for
years. This is only an int64, so it's not "wide", but I wonder if
there's another way to expose this information? Like adding backends toThe view looks already too wide to me. I don't want the numbers for
metrics are added to the view.
+1 for a dedicated view.
While we are at it, what do you think about also recording the max
memory allocated by a backend? (could be useful and would avoid sampling
for which there is no guarantee to sample the max anyway).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, 2022-08-31 at 12:05 -0500, Justin Pryzby wrote:
+ proargmodes =>
'{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}'
,In the past, there was concern about making pg_stat_activity wider by
adding information that's less-essential than what's been there for
years. This is only an int64, so it's not "wide", but I wonder if
there's another way to expose this information? Like adding backends
to
pg_get_backend_memory_contexts() , maybe with another view on top of
I will take a look at pg_get_backend_memory_contexts. I will also look
at the other suggestions in the thread.
+ * shown allocated in pgstat_activity when the
pg_stat
Corrected,
replacing the * old
wrapping caused extraneous stars
Corrected
here asit's
as it's
Corrected
errmsg() doesn't require the outside paranthesis since a couple years
ago.
Corrected
mem_allocated cast to
add a comma before "cast" ?
Corrected
Patch with the corrections attached
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload+269-10
On Thu, 2022-09-01 at 13:43 +0900, Kyotaro Horiguchi wrote:
@@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size) return NULL; context->mem_allocated += blksize; + pgstat_report_backend_mem_allocated_increase(blksi ze);I'm not sure this is acceptable. The function adds a branch even when
the feature is turned off, which I think may cause a certain extent
of
performance degradation. A past threads [1], [2] and [3] might be
informative.
Stated above is '...even when the feature is turned off...', I want to
note that this feature/patch (for tracking memory allocated) doesn't
have an 'on/off'. Tracking would always occur.
I'm open to guidance on testing for performance degradation. I did
note some basic pgbench comparison numbers in the thread regarding
limiting backend memory allocations.
Show quoted text
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Tue, 06 Sep 2022 17:10:49 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in
On Thu, 2022-09-01 at 13:43 +0900, Kyotaro Horiguchi wrote:
@@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size) return NULL; context->mem_allocated += blksize; + pgstat_report_backend_mem_allocated_increase(blksi ze);I'm not sure this is acceptable. The function adds a branch even when
the feature is turned off, which I think may cause a certain extent
of
performance degradation. A past threads [1], [2] and [3] might be
informative.Stated above is '...even when the feature is turned off...', I want to
note that this feature/patch (for tracking memory allocated) doesn't
have an 'on/off'. Tracking would always occur.
In the patch, I see that
pgstat_report_backend_mem_allocated_increase() runs the following
code, which seems like to me to be a branch..
+ if (!beentry || !pgstat_track_activities)
+ {
+ /*
+ * Account for memory before pgstats is initialized. This will be
+ * migrated to pgstats on initialization.
+ */
+ backend_mem_allocated += allocation;
+
+ return;
+ }
I'm open to guidance on testing for performance degradation. I did
note some basic pgbench comparison numbers in the thread regarding
limiting backend memory allocations.
Yeah.. That sounds good..
(I have a patch that is stuck at benchmarking on slight possible
degradation caused by a branch (or indirect call) on a hot path
similary to this one. The test showed fluctuation that is not clearly
distinguishable between noise and degradation by running the target
functions in a busy loop..)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Wed, 07 Sep 2022 17:08:41 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
At Tue, 06 Sep 2022 17:10:49 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in
On Thu, 2022-09-01 at 13:43 +0900, Kyotaro Horiguchi wrote:
@@ -916,6 +930,7 @@ AllocSetAlloc(MemoryContext context, Size size) return NULL; context->mem_allocated += blksize; + pgstat_report_backend_mem_allocated_increase(blksi ze);I'm not sure this is acceptable. The function adds a branch even when
the feature is turned off, which I think may cause a certain extent
of
performance degradation. A past threads [1], [2] and [3] might be
informative.Stated above is '...even when the feature is turned off...', I want to
note that this feature/patch (for tracking memory allocated) doesn't
have an 'on/off'. Tracking would always occur.In the patch, I see that
pgstat_report_backend_mem_allocated_increase() runs the following
code, which seems like to me to be a branch..
Ah.. sorry.
pgstat_report_backend_mem_allocated_increase() runs the following
- code, which seems like to me to be a branch..
+ code, which seems like to me to be a branch that can turn of the feature..
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Greetings,
* Drouvot, Bertrand (bdrouvot@amazon.com) wrote:
On 9/1/22 3:28 AM, Kyotaro Horiguchi wrote:
At Wed, 31 Aug 2022 12:05:55 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
On Wed, Aug 31, 2022 at 12:03:06PM -0400, Reid Thompson wrote:
Attached is a patch to
Add tracking of backend memory allocatedThanks for the patch.
+ 1 on the idea.
Glad folks are in support of the general idea.
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
In the past, there was concern about making pg_stat_activity wider by
adding information that's less-essential than what's been there for
years. This is only an int64, so it's not "wide", but I wonder if
there's another way to expose this information? Like adding backends toThe view looks already too wide to me. I don't want the numbers for
metrics are added to the view.+1 for a dedicated view.
A dedicated view with a single column in it hardly seems sensible. I'd
also argue that this particular bit of information is extremely useful
and therefore worthy of being put directly into pg_stat_activity. I
could see a dedicated view possibly *also* being added later if/when we
provide a more detailed break-down of how the memory is being used but
that's a whole other thing and I'm not even 100% sure we'll ever
actually get there, as you can already poke a backend and have it dump
out the memory context-level information on an as-needed basis.
While we are at it, what do you think about also recording the max memory
allocated by a backend? (could be useful and would avoid sampling for which
there is no guarantee to sample the max anyway).
What would you do with that information..? By itself, it doesn't strike
me as useful. Perhaps it'd be interesting to grab the max required for
a particular query in pg_stat_statements or such but again, that's a
very different thing.
Thanks,
Stephen
Greetings,
* Kyotaro Horiguchi (horikyota.ntt@gmail.com) wrote:
At Tue, 06 Sep 2022 17:10:49 -0400, Reid Thompson <reid.thompson@crunchydata.com> wrote in
I'm open to guidance on testing for performance degradation. I did
note some basic pgbench comparison numbers in the thread regarding
limiting backend memory allocations.Yeah.. That sounds good..
(I have a patch that is stuck at benchmarking on slight possible
degradation caused by a branch (or indirect call) on a hot path
similary to this one. The test showed fluctuation that is not clearly
distinguishable between noise and degradation by running the target
functions in a busy loop..)
Just to be clear- this path is (hopefully) not *super* hot as we're only
tracking actual allocations (that is- malloc() calls), this isn't
changing anything for palloc() calls that aren't also needing to do a
malloc(), and we already try to reduce the amount of malloc() calls
we're doing by allocating more and more each time we run out in a given
context.
While I'm generally supportive of doing some benchmarking around this, I
don't think the bar is as high as it would be if we were actually
changing the cost of routine palloc() or such calls.
Thanks,
Stephen
On Fri, Sep 09, 2022 at 12:34:15PM -0400, Stephen Frost wrote:
While we are at it, what do you think about also recording the max memory
allocated by a backend? (could be useful and would avoid sampling for which
there is no guarantee to sample the max anyway).
FYI, that's already kind-of available from getrusage:
$ psql ts -c "SET log_executor_stats=on; SET client_min_messages=debug;
SELECT a, COUNT(1) FROM generate_series(1,999999) a GROUP BY 1;" |wc
LOG: EXECUTOR STATISTICS
...
! 194568 kB max resident size
Note that max rss counts things allocated outside postgres (like linked
libraries).
What would you do with that information..? By itself, it doesn't strike
me as useful. Perhaps it'd be interesting to grab the max required for
a particular query in pg_stat_statements or such but again, that's a
very different thing.
log_executor_stats is at level "debug", so it's not great to enable it
for a single session, and painful to think about enabling it globally.
This would be a lot friendlier.
Storing the maxrss per backend somewhere would be useful (and avoid the
issue of "sampling" with top), after I agree that it ought to be exposed
to a view. For example, it might help to determine whether (and which!)
backends are using large multiple of work_mem, and then whether that can
be increased. If/when we had a "memory budget allocator", this would
help to determine how to set its GUCs, maybe to see "which backends are
using the work_mem that are precluding this other backend from using
efficient query plan".
I wonder if it's better to combine these two threads into one. The 0001
patch of course can be considered independently from the 0002 patch, as
usual. Right now, there's different parties on both threads ...
--
Justin
Hi,
On 9/9/22 7:08 PM, Justin Pryzby wrote:
On Fri, Sep 09, 2022 at 12:34:15PM -0400, Stephen Frost wrote:
While we are at it, what do you think about also recording the max memory
allocated by a backend? (could be useful and would avoid sampling for which
there is no guarantee to sample the max anyway).What would you do with that information..? By itself, it doesn't strike
me as useful. Perhaps it'd be interesting to grab the max required for
a particular query in pg_stat_statements or such but again, that's a
very different thing.Storing the maxrss per backend somewhere would be useful (and avoid the
issue of "sampling" with top), after I agree that it ought to be exposed
to a view. For example, it might help to determine whether (and which!)
backends are using large multiple of work_mem, and then whether that can
be increased. If/when we had a "memory budget allocator", this would
help to determine how to set its GUCs, maybe to see "which backends are
using the work_mem that are precluding this other backend from using
efficient query plan".
+1.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services:https://aws.amazon.com
Greetings,
* Drouvot, Bertrand (bdrouvot@amazon.com) wrote:
On 9/9/22 7:08 PM, Justin Pryzby wrote:
On Fri, Sep 09, 2022 at 12:34:15PM -0400, Stephen Frost wrote:
While we are at it, what do you think about also recording the max memory
allocated by a backend? (could be useful and would avoid sampling for which
there is no guarantee to sample the max anyway).What would you do with that information..? By itself, it doesn't strike
me as useful. Perhaps it'd be interesting to grab the max required for
a particular query in pg_stat_statements or such but again, that's a
very different thing.Storing the maxrss per backend somewhere would be useful (and avoid the
issue of "sampling" with top), after I agree that it ought to be exposed
to a view. For example, it might help to determine whether (and which!)
backends are using large multiple of work_mem, and then whether that can
be increased. If/when we had a "memory budget allocator", this would
help to determine how to set its GUCs, maybe to see "which backends are
using the work_mem that are precluding this other backend from using
efficient query plan".+1.
I still have a hard time seeing the value in tracking which backends are
using the most memory over the course of a backend's entire lifetime,
which would involve lots of different queries, some of which might use
many multiples of work_mem and others not. Much more interesting would
be to track this as part of pg_stat_statements and associated with
queries.
Either way, this looks like an independent feature which someone who has
interest in could work on but generally doesn't impact what the feature
of this thread is about; a feature which has already shown merit in
finding a recently introduced memory leak and is the basis of another
feature being contemplated to help avoid OOM-killer introduced crashes.
Thanks,
Stephen
patch rebased to current master
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
On Tue, 2022-10-25 at 14:51 -0400, Reid Thompson wrote:
patch rebased to current master
actually attach the patch
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload+269-10
Hi,
On 10/25/22 8:59 PM, Reid Thompson wrote:
On Tue, 2022-10-25 at 14:51 -0400, Reid Thompson wrote:
patch rebased to current master
actually attach the patch
It looks like the patch does not apply anymore since b1099eca8f.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Fri, 2022-11-04 at 11:06 +0100, Drouvot, Bertrand wrote:
Hi,
It looks like the patch does not apply anymore since b1099eca8f.
Regards,
Thanks,
rebased to current master attached.
--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.
reid.thompson@crunchydata.com
www.crunchydata.com
Attachments:
0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchtext/x-patch; charset=UTF-8; name=0001-Add-tracking-of-backend-memory-allocated-to-pg_stat_.patchDownload+270-10
Hi,
On 2022-11-04 08:56:13 -0400, Reid Thompson wrote:
From a8de5d29c0c6f10962181926a49ad4fec1e52bd1 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH] Add tracking of backend memory allocated to pg_stat_activityThis new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
I'm not convinced that counting DSM values this way is quite right. There are
a few uses of DSMs that track shared resources, with the biggest likely being
the stats for relations etc. I suspect that tracking that via backend memory
usage will be quite confusing, because fairly random backends that had to grow
the shared state end up being blamed with the memory usage in perpituity - and
then suddenly that memory usage will vanish when that backend exits, despite
the memory continuing to exist.
@@ -734,6 +747,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;context->mem_allocated += blksize;
+ pgstat_report_backend_allocated_bytes_increase(blksize);block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -944,6 +958,7 @@ AllocSetAlloc(MemoryContext context, Size size)
return NULL;context->mem_allocated += blksize;
+ pgstat_report_backend_allocated_bytes_increase(blksize);block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1043,6 +1058,7 @@ AllocSetFree(void *pointer)
block->next->prev = block->prev;set->header.mem_allocated -= block->endptr - ((char *) block);
+ pgstat_report_backend_allocated_bytes_decrease(block->endptr - ((char *) block));#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
I suspect this will be noticable cost-wise. Even though these paths aren't the
hottest memory allocation paths, by nature of going down into malloc, adding
an external function call that then does a bunch of branching etc. seems
likely to add up to some. See below for how I think we can deal with that...
+ +/* -------- + * pgstat_report_backend_allocated_bytes_increase() - + * + * Called to report increase in memory allocated for this backend + * -------- + */ +void +pgstat_report_backend_allocated_bytes_increase(uint64 allocation) +{ + volatile PgBackendStatus *beentry = MyBEEntry; + + if (!beentry || !pgstat_track_activities) + { + /* + * Account for memory before pgstats is initialized. This will be + * migrated to pgstats on initialization. + */ + backend_allocated_bytes += allocation; + + return; + } + + /* + * Update my status entry, following the protocol of bumping + * st_changecount before and after. We use a volatile pointer here to + * ensure the compiler doesn't try to get cute. + */ + PGSTAT_BEGIN_WRITE_ACTIVITY(beentry); + beentry->backend_allocated_bytes += allocation; + PGSTAT_END_WRITE_ACTIVITY(beentry); +}
This is quite a few branches, including write/read barriers.
It doesn't really make sense to use the PGSTAT_BEGIN_WRITE_ACTIVITY() pattern
here - you're just updating a single value, there's nothing to be gained by
it. The point of PGSTAT_BEGIN_*ACTIVITY() stuff is to allow readers to get a
consistent view of multiple values - but there aren't multiple values here!
To avoid the overhead of checking (!beentry || !pgstat_track_activities) and
the external function call, I think you'd be best off copying the trickery I
introduced for pgstat_report_wait_start(), in 225a22b19ed.
I.e. make pgstat_report_backend_allocated_bytes_increase() a static inline
function that unconditionally updates something like
*my_backend_allocated_memory. To deal with the case of (!beentry ||
!pgstat_track_activities), that variable initially points to some backend
local state and is set to the shared state in pgstat_bestart().
This additionally has the nice benefit that you can track memory usage from
before pgstat_bestart(), it'll be in the local variable.
+void +pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation) +{ + volatile PgBackendStatus *beentry = MyBEEntry; + + /* + * Cases may occur where shared memory from a previous postmaster + * invocation still exist. These are cleaned up at startup by + * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to + * zero in case no corresponding prior increase exists or decrease has + * already been accounted for. + */
I don't really follow - postmaster won't ever have a backend status array, so
how would they be tracked here?
Greetings,
Andres Freund
On Fri, Nov 04, 2022 at 08:56:13AM -0400, Reid Thompson wrote:
From a8de5d29c0c6f10962181926a49ad4fec1e52bd1 Mon Sep 17 00:00:00 2001
From: Reid Thompson <jreidthompson@nc.rr.com>
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH] Add tracking of backend memory allocated to pg_stat_activityThis new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value. Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
doc/src/sgml/monitoring.sgml | 12 +++
src/backend/catalog/system_views.sql | 1 +
src/backend/storage/ipc/dsm_impl.c | 81 +++++++++++++++
src/backend/utils/activity/backend_status.c | 105 ++++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 4 +-
src/backend/utils/mmgr/aset.c | 18 ++++
src/backend/utils/mmgr/generation.c | 15 +++
src/backend/utils/mmgr/slab.c | 21 ++++
src/include/catalog/pg_proc.dat | 6 +-
src/include/utils/backend_status.h | 7 +-
src/test/regress/expected/rules.out | 9 +-
11 files changed, 270 insertions(+), 9 deletions(-)diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index e5d622d514..972805b85a 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -947,6 +947,18 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser </para></entry> </row>+ <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>backend_allocated_bytes</structfield> <type>bigint</type> + </para> + <para> + The byte count of memory allocated to this backend. Dynamic shared memory + allocations are included only in the value displayed for the backend that + created them, they are not included in the value for backends that are + attached to them to avoid double counting. + </para></entry> + </row> +
It doesn't seem like you need the backend_ prefix in the view since that
is implied by it being in pg_stat_activity.
For the wording on the description, I find "memory allocated to this
backend" a bit confusing. Perhaps you could reword it to make clear you
mean that the number represents the balance of allocations by this
backend. Something like:
Memory currently allocated to this backend in bytes. This is the
balance of bytes allocated and freed by this backend.
I would also link to the system administration function pg_size_pretty()
so users know how to easily convert the value.
If you end up removing shared memory as Andres suggests in [1]/messages/by-id/20221105024146.xxlbtsxh2niyz2fu@awork3.anarazel.de, I would link
pg_shmem_allocations view here and point out that shared memory allocations can
be viewed there instead (and why).
You could instead add dynamic shared memory allocation to the
pg_shmem_allocations view as suggested as follow-on work by the commit which
introduced it, ed10f32e3.
+/* -------- + * pgstat_report_backend_allocated_bytes_increase() - + * + * Called to report increase in memory allocated for this backend + * -------- + */
It seems like you could combine the
pgstat_report_backend_allocated_bytes_decrease/increase() by either using a
signed integer to represent the allocation/deallocation or passing in a
"direction" that is just a positive or negative multiplier enum.
Especially if you don't use the write barriers, I think you could
simplify the logic in the two functions.
If you do combine them, you might shorten the name to
pgstat_report_backend_allocation() or pgstat_report_allocation().
+void +pgstat_report_backend_allocated_bytes_increase(uint64 allocation) +{ + volatile PgBackendStatus *beentry = MyBEEntry; + + if (!beentry || !pgstat_track_activities) + { + /* + * Account for memory before pgstats is initialized. This will be + * migrated to pgstats on initialization. + */ + backend_allocated_bytes += allocation; + + return; + } + + /* + * Update my status entry, following the protocol of bumping + * st_changecount before and after. We use a volatile pointer here to + * ensure the compiler doesn't try to get cute. + */ + PGSTAT_BEGIN_WRITE_ACTIVITY(beentry); + beentry->backend_allocated_bytes += allocation; + PGSTAT_END_WRITE_ACTIVITY(beentry); +} + +/* -------- + * pgstat_report_backend_allocated_bytes_decrease() - + * + * Called to report decrease in memory allocated for this backend + * -------- + */ +void +pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation) +{ + volatile PgBackendStatus *beentry = MyBEEntry; + + /* + * Cases may occur where shared memory from a previous postmaster + * invocation still exist. These are cleaned up at startup by + * dsm_cleanup_using_control_segment. Limit decreasing memory allocated to + * zero in case no corresponding prior increase exists or decrease has + * already been accounted for. + */ + + if (!beentry || !pgstat_track_activities) + { + /* + * Account for memory before pgstats is initialized. This will be + * migrated to pgstats on initialization. Do not allow + * backend_allocated_bytes to go below zero. If pgstats has not been + * initialized, we are in startup and we set backend_allocated_bytes + * to zero in cases where it would go negative and skip generating an + * ereport. + */ + if (deallocation > backend_allocated_bytes) + backend_allocated_bytes = 0; + else + backend_allocated_bytes -= deallocation; + + return; + } + + /* + * Do not allow backend_allocated_bytes to go below zero. ereport if we + * would have. There's no need for a lock around the read here as it's + * being referenced from the same backend which means that there shouldn't + * be concurrent writes. We want to generate an ereport in these cases. + */ + if (deallocation > beentry->backend_allocated_bytes) + { + ereport(LOG, errmsg("decrease reduces reported backend memory allocated below zero; setting reported to 0")); +
I also think it would be nice to include the deallocation amount and
backend_allocated_bytes amount in the log message.
It also might be nice to start the message with something more clear
than "decrease".
For example, I would find this clear as a user:
Backend [backend_type or pid] deallocated [deallocation number] bytes,
[backend_allocated_bytes - deallocation number] more than this backend
has reported allocating.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 96bffc0f2a..b6d135ad2f 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -553,7 +553,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS) Datum pg_stat_get_activity(PG_FUNCTION_ARGS) { -#define PG_STAT_GET_ACTIVITY_COLS 30 +#define PG_STAT_GET_ACTIVITY_COLS 31 int num_backends = pgstat_fetch_stat_numbackends(); int curr_backend; int pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0); @@ -609,6 +609,8 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) else nulls[16] = true;+ values[30] = UInt64GetDatum(beentry->backend_allocated_bytes);
Though not the fault of this patch, it is becoming very difficult to
keep the columns straight in pg_stat_get_activity(). Perhaps you could
add a separate commit to add an enum for the columns so the function is
easier to understand.
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index b582b46e9f..75d87e8308 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -169,6 +169,9 @@ typedef struct PgBackendStatus/* query identifier, optionally computed using post_parse_analyze_hook */ uint64 st_query_id; + + /* Current memory allocated to this backend */ + uint64 backend_allocated_bytes; } PgBackendStatus;
I don't think you need the backend_ prefix here since it is in
PgBackendStatus.
@@ -313,7 +316,9 @@ extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser); extern const char *pgstat_get_crashed_backend_activity(int pid, char *buffer, int buflen); extern uint64 pgstat_get_my_query_id(void); - +extern void pgstat_report_backend_allocated_bytes_increase(uint64 allocation); +extern void pgstat_report_backend_allocated_bytes_decrease(uint64 deallocation); +extern uint64 pgstat_get_all_backend_memory_allocated(void);/* ---------- * Support functions for the SQL-callable functions to diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 624d0e5aae..ba9f494806 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1753,10 +1753,11 @@ pg_stat_activity| SELECT s.datid, s.state, s.backend_xid, s.backend_xmin, + s.backend_allocated_bytes, s.query_id, s.query, s.backend_type
Seems like it would be possible to add a functional test to stats.sql.
- Melanie
[1]: /messages/by-id/20221105024146.xxlbtsxh2niyz2fu@awork3.anarazel.de