Summary function for pg_buffercache
Hi hackers,
Added a pg_buffercache_summary() function to retrieve an aggregated summary
information with less cost.
It's often useful to know only how many buffers are used, how many of them
are dirty etc. for monitoring purposes.
This info can already be retrieved by pg_buffercache. The extension
currently creates a row with many details for each buffer, then summary
info can be aggregated from that returned table.
But it is quite expensive to run regularly for monitoring.
The attached patch adds a pg_buffercache_summary() function to get this
summary info faster.
New function only collects following info and returns them in a single row:
- used_buffers = number of buffers with a valid relfilenode (both dirty and
not)
- unused_buffers = number of buffers with invalid relfilenode
- dirty_buffers = number of dirty buffers.
- pinned_buffers = number of buffers that have at least one pinning backend
(i.e. refcount > 0)
- average usagecount of used buffers
One other difference between pg_buffercache_summary and
pg_buffercache_pages is that pg_buffercache_summary does not get locks on
buffer headers as opposed to pg_buffercache_pages.
Since the purpose of pg_buffercache_summary is just to give us an overall
idea about shared buffers and to be a cheaper function, locks are not
strictly needed.
To compare pg_buffercache_summary() and pg_buffercache_pages(), I used a
simple query to aggregate the summary information above by calling
pg_buffercache_pages().
Here is the result:
postgres=# show shared_buffers;
shared_buffers
----------------
16GB
(1 row)
Time: 0.756 ms
postgres=# SELECT relfilenode <> 0 AS is_valid, isdirty, count(*) FROM
pg_buffercache GROUP BY relfilenode <> 0, isdirty;
is_valid | isdirty | count
----------+---------+---------
t | f | 209
| | 2096904
t | t | 39
(3 rows)
Time: 1434.870 ms (00:01.435)
postgres=# select * from pg_buffercache_summary();
used_buffers | unused_buffers | dirty_buffers | pinned_buffers |
avg_usagecount
--------------+----------------+---------------+----------------+----------------
248 | 2096904 | 39 | 0 |
3.141129
(1 row)
Time: 9.712 ms
There is a significant difference between timings of those two functions,
even though they return similar results.
I would appreciate any feedback/comment on this change.
Thanks,
Melih
Attachments:
0001-Added-pg_buffercache_summary-function.patchapplication/octet-stream; name=0001-Added-pg_buffercache_summary-function.patchDownload+108-5
Hi hackers,
I also added documentation changes into the patch.
You can find it attached.
I would appreciate any feedback about this pg_buffercache_summary function.
Best,
Melih
Attachments:
v2-0001-Added-pg_buffercache_summary-function.patchapplication/x-patch; name=v2-0001-Added-pg_buffercache_summary-function.patchDownload+215-8
Hi Melih,
I would appreciate any feedback/comment on this change.
Another benefit of pg_buffercache_summary() you didn't mention is that
it allocates much less memory than pg_buffercache_pages() does.
Here is v3 where I added this to the documentation. The patch didn't
apply to the current master branch with the following error:
```
pg_buffercache_pages.c:286:19: error: no member named 'rlocator' in
'struct buftag'
if (bufHdr->tag.rlocator.relNumber != InvalidOid)
~~~~~~~~~~~ ^
1 error generated.
```
I fixed this too. Additionally, the patch was pgindent'ed and some
typos were fixed.
However I'm afraid you can't examine BufferDesc's without taking
locks. This is explicitly stated in buf_internals.h:
"""
Buffer header lock (BM_LOCKED flag) must be held to EXAMINE or change
TAG, state or wait_backend_pgprocno fields.
"""
Let's consider this code again (this is after my fix):
```
if (RelFileNumberIsValid(BufTagGetRelNumber(bufHdr))) {
/* ... */
}
```
When somebody modifies relNumber concurrently (e.g. calls
ClearBufferTag()) this will cause an undefined behaviour.
I suggest we focus on saving the memory first and then think about the
performance, if necessary.
--
Best regards,
Aleksander Alekseev
Attachments:
v3-0001-Added-pg_buffercache_summary-function.patchapplication/octet-stream; name=v3-0001-Added-pg_buffercache_summary-function.patchDownload+213-9
On Fri, Sep 09, 2022 at 05:36:45PM +0300, Aleksander Alekseev wrote:
However I'm afraid you can't examine BufferDesc's without taking
locks. This is explicitly stated in buf_internals.h:
Yeah, when I glanced at this patch earlier, I wondered about this.
I suggest we focus on saving the memory first and then think about the
performance, if necessary.
+1
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Hi hackers,
I suggest we focus on saving the memory first and then think about the
performance, if necessary.+1
I made a mistake in v3 cfbot complained about. It should have been:
```
if (RelFileNumberIsValid(BufTagGetRelNumber(&bufHdr->tag)))
```
Here is the corrected patch.
--
Best regards,
Aleksander Alekseev
Attachments:
v4-0001-Added-pg_buffercache_summary-function.patchapplication/octet-stream; name=v4-0001-Added-pg_buffercache_summary-function.patchDownload+213-9
Hi Aleksander and Nathan,
Thanks for your comments.
Aleksander Alekseev <aleksander@timescale.com>, 9 Eyl 2022 Cum, 17:36
tarihinde şunu yazdı:
However I'm afraid you can't examine BufferDesc's without taking
locks. This is explicitly stated in buf_internals.h:"""
Buffer header lock (BM_LOCKED flag) must be held to EXAMINE or change
TAG, state or wait_backend_pgprocno fields.
"""
I wasn't aware of this explanation. Thanks for pointing it out.
When somebody modifies relNumber concurrently (e.g. calls
ClearBufferTag()) this will cause an undefined behaviour.
I thought that it wouldn't really be a problem even if relNumber is
modified concurrently, since the function does not actually rely on the
actual values.
I'm not sure about what undefined behaviour could harm this badly. It
seemed to me that it would read an invalid relNumber in the worst case
scenario.
But I'm not actually familiar with buffer related parts of the code, so I
might be wrong.
And I'm okay with taking header locks if necessary.
In the attached patch, I added buffer header locks just before examining
tag as follows:
+ buf_state = LockBufHdr(bufHdr);
+ + /* Invalid RelFileNumber means the buffer is unused */ + if (RelFileNumberIsValid(BufTagGetRelNumber(&bufHdr->tag))) + { ... + } ... + UnlockBufHdr(bufHdr, buf_state);
I suggest we focus on saving the memory first and then think about the
performance, if necessary.
+1
I again did the same quick benchmarking, here are the numbers with locks.
postgres=# show shared_buffers;
shared_buffers
----------------
16GB
(1 row)
postgres=# SELECT relfilenode <> 0 AS is_valid, isdirty, count(*) FROM
pg_buffercache GROUP BY relfilenode <> 0, isdirty;
is_valid | isdirty | count
----------+---------+---------
t | f | 256
| | 2096876
t | t | 20
(3 rows)
Time: 1024.456 ms (00:01.024)
postgres=# select * from pg_buffercache_summary();
used_buffers | unused_buffers | dirty_buffers | pinned_buffers |
avg_usagecount
--------------+----------------+---------------+----------------+----------------
282 | 2096870 | 20 | 0 |
3.4574468
(1 row)
Time: 33.074 ms
Yes, locks slowed pg_buffercache_summary down. But there is still quite a
bit of performance improvement, plus memory saving as you mentioned.
Here is the corrected patch.
Also thanks for corrections.
Best,
Melih
Attachments:
v5-0001-Added-pg_buffercache_summary-function.patchapplication/octet-stream; name=v5-0001-Added-pg_buffercache_summary-function.patchDownload+211-9
Hi Melih,
I'm not sure about what undefined behaviour could harm this badly.
You are right that in practice nothing wrong will (probably) happen on
x86/x64 architecture with (most?) modern C compilers. This is not true in
the general case though. It's up to the compiler to decide how reading the
bufHdr->tag is going to be actually implemented. This can be one assembly
instruction or several instructions. This reading can be optimized-out if
the compiler believes the required value is already in the register, etc.
Since the result will be different depending on the assembly code used this
is an undefined behaviour and we can't use code like this.
In the attached patch, I added buffer header locks just before examining
tag as follows
Many thanks for the updated patch! It looks better now.
However I have somewhat mixed feelings about avg_usagecount. Generally
AVG() is a relatively useless methric for monitoring. What if the user
wants MIN(), MAX() or let's say a 99th percentile? I suggest splitting it
into usagecount_min, usagecount_max and usagecount_sum. AVG() can be
derived as usercount_sum / used_buffers.
Also I suggest changing the names of the columns in order to make them
consistent with the rest of the system. If you consider pg_stat_activity
and family [1]https://www.postgresql.org/docs/current/monitoring-stats.html you will notice that the columns are named
(entity)_(property), e.g. backend_xid, backend_type, client_addr, etc. So
instead of used_buffers and unused_buffers the naming should be
buffers_used and buffers_unused.
[1]: https://www.postgresql.org/docs/current/monitoring-stats.html
--
Best regards,
Aleksander Alekseev
Hello Aleksander,
I'm not sure about what undefined behaviour could harm this badly.
You are right that in practice nothing wrong will (probably) happen on
x86/x64 architecture with (most?) modern C compilers. This is not true in
the general case though. It's up to the compiler to decide how reading the
bufHdr->tag is going to be actually implemented. This can be one assembly
instruction or several instructions. This reading can be optimized-out if
the compiler believes the required value is already in the register, etc.
Since the result will be different depending on the assembly code used this
is an undefined behaviour and we can't use code like this.
Got it. Thanks for explaining.
However I have somewhat mixed feelings about avg_usagecount. Generally
AVG() is a relatively useless methric for monitoring. What if the user
wants MIN(), MAX() or let's say a 99th percentile? I suggest splitting it
into usagecount_min, usagecount_max and usagecount_sum. AVG() can be
derived as usercount_sum / used_buffers.
Won't be usagecount_max almost always 5 as "BM_MAX_USAGE_COUNT" set to 5 in
buf_internals.h? I'm not sure about how much usagecount_min would add
either.
A usagecount is always an integer between 0 and 5, it's not
something unbounded. I think the 99th percentile would be much better than
average if strong outlier values could occur. But in this case, I feel like
an average value would be sufficiently useful as well.
usagecount_sum would actually be useful since average can be derived from
it. If you think that the sum of usagecounts has a meaning just by itself,
it makes sense to include it. Otherwise, wouldn't showing directly averaged
value be more useful?
Also I suggest changing the names of the columns in order to make them
consistent with the rest of the system. If you consider pg_stat_activity
and family [1] you will notice that the columns are named
(entity)_(property), e.g. backend_xid, backend_type, client_addr, etc. So
instead of used_buffers and unused_buffers the naming should be
buffers_used and buffers_unused.[1]: https://www.postgresql.org/docs/current/monitoring-stats.html
You're right. I will change the names accordingly. Thanks.
Regards,
Melih
Hi,
On 2022-09-09 17:36:45 +0300, Aleksander Alekseev wrote:
I suggest we focus on saving the memory first and then think about the
performance, if necessary.
Personally I think the locks part is at least as important - it's what makes
the production impact higher.
Greetings,
Andres Freund
Hi,
Also I suggest changing the names of the columns in order to make them
consistent with the rest of the system. If you consider pg_stat_activity
and family [1] you will notice that the columns are named
(entity)_(property), e.g. backend_xid, backend_type, client_addr, etc. So
instead of used_buffers and unused_buffers the naming should be
buffers_used and buffers_unused.[1]: https://www.postgresql.org/docs/current/monitoring-stats.html
I changed these names and updated the patch.
However I have somewhat mixed feelings about avg_usagecount. Generally
AVG() is a relatively useless methric for monitoring. What if the user
wants MIN(), MAX() or let's say a 99th percentile? I suggest splitting it
into usagecount_min, usagecount_max and usagecount_sum. AVG() can be
derived as usercount_sum / used_buffers.Won't be usagecount_max almost always 5 as "BM_MAX_USAGE_COUNT" set to 5
in buf_internals.h? I'm not sure about how much usagecount_min would add
either.
A usagecount is always an integer between 0 and 5, it's not
something unbounded. I think the 99th percentile would be much better than
average if strong outlier values could occur. But in this case, I feel like
an average value would be sufficiently useful as well.
usagecount_sum would actually be useful since average can be derived from
it. If you think that the sum of usagecounts has a meaning just by itself,
it makes sense to include it. Otherwise, wouldn't showing directly averaged
value be more useful?
Aleksander, do you still think the average usagecount is a bit useless? Or
does it make sense to you to keep it like this?
I suggest we focus on saving the memory first and then think about the
performance, if necessary.
Personally I think the locks part is at least as important - it's what
makes
the production impact higher.
I agree that it's important due to its high impact. I'm not sure how to
avoid any undefined behaviour without locks though.
Even with locks, performance is much better. But is it good enough for
production?
Thanks,
Melih
Attachments:
v6-0001-Added-pg_buffercache_summary-function.patchapplication/octet-stream; name=v6-0001-Added-pg_buffercache_summary-function.patchDownload+210-9
Hi Melih,
I changed these names and updated the patch.
Thanks for the updated patch!
Aleksander, do you still think the average usagecount is a bit useless? Or does it make sense to you to keep it like this?
I don't mind keeping the average.
I'm not sure how to avoid any undefined behaviour without locks though.
Even with locks, performance is much better. But is it good enough for production?
Potentially you could avoid taking locks by utilizing atomic
operations and lock-free algorithms. But these algorithms are
typically error-prone and not always produce a faster code than the
lock-based ones. I'm pretty confident this is out of scope of this
particular patch.
The patch v6 had several defacts:
* Trailing whitespaces (can be checked by applying the patch with `git am`)
* Wrong code formatting (can be fixed with pgindent)
* Several empty lines were removed which is not related to the
proposed change (can be seen with `git diff`)
* An unlikely division by zero if buffers_used = 0
* Missing part of the commit message added in v4
Here is a corrected patch v7. To me it seems to be in pretty good
shape, unless cfbot and/or other hackers will report any issues.
--
Best regards,
Aleksander Alekseev
Attachments:
v7-0001-Added-pg_buffercache_summary-function.patchapplication/octet-stream; name=v7-0001-Added-pg_buffercache_summary-function.patchDownload+212-7
Hi hackers,
Here is a corrected patch v7. To me it seems to be in pretty good
shape, unless cfbot and/or other hackers will report any issues.
There was a missing empty line in pg_buffercache.out which made the
tests fail. Here is a corrected v8 patch.
--
Best regards,
Aleksander Alekseev
Attachments:
v8-0001-Added-pg_buffercache_summary-function.patchapplication/octet-stream; name=v8-0001-Added-pg_buffercache_summary-function.patchDownload+213-7
Aleksander Alekseev <aleksander@timescale.com>, 20 Eyl 2022 Sal, 13:57
tarihinde şunu yazdı:
There was a missing empty line in pg_buffercache.out which made the
tests fail. Here is a corrected v8 patch.
I was just sending a corrected patch without the missing line.
Thanks a lot for all these reviews and the corrected patch.
Best,
Melih
Hi,
Seems like cfbot tests are passing now:
https://cirrus-ci.com/build/4727923671302144
Best,
Melih
Melih Mutlu <m.melihmutlu@gmail.com>, 20 Eyl 2022 Sal, 14:00 tarihinde şunu
yazdı:
Show quoted text
Aleksander Alekseev <aleksander@timescale.com>, 20 Eyl 2022 Sal, 13:57
tarihinde şunu yazdı:There was a missing empty line in pg_buffercache.out which made the
tests fail. Here is a corrected v8 patch.I was just sending a corrected patch without the missing line.
Thanks a lot for all these reviews and the corrected patch.
Best,
Melih
Hi,
Correct me if I’m wrong.
The doc says we don’t take lock during pg_buffercache_summary, but I see locks in the v8 patch, Isn’t it?
```
Similar to <function>pg_buffercache_pages</function> function
<function>pg_buffercache_summary</function> doesn't take buffer manager
locks, thus the result is not consistent across all buffers. This is
intentional. The purpose of this function is to provide a general idea about
the state of shared buffers as fast as possible. Additionally,
<function>pg_buffercache_summary</function> allocates much less memory.
```
Regards,
Zhang Mingli
Show quoted text
On Sep 20, 2022, 20:10 +0800, Melih Mutlu <m.melihmutlu@gmail.com>, wrote:
Hi,
Seems like cfbot tests are passing now:
https://cirrus-ci.com/build/4727923671302144Best,
MelihMelih Mutlu <m.melihmutlu@gmail.com>, 20 Eyl 2022 Sal, 14:00 tarihinde şunu yazdı:
Aleksander Alekseev <aleksander@timescale.com>, 20 Eyl 2022 Sal, 13:57 tarihinde şunu yazdı:
There was a missing empty line in pg_buffercache.out which made the
tests fail. Here is a corrected v8 patch.I was just sending a corrected patch without the missing line.
Thanks a lot for all these reviews and the corrected patch.
Best,
Melih
Hi Zhang,
The doc says we don’t take lock during pg_buffercache_summary, but I see locks in the v8 patch, Isn’t it?
```
Similar to <function>pg_buffercache_pages</function> function
<function>pg_buffercache_summary</function> doesn't take buffer manager
locks [...]
```
Correct, the procedure doesn't take the locks of the buffer manager.
It does take the locks of every individual buffer.
I agree that the text is somewhat confusing, but it is consistent with
the current description of pg_buffercache [1]https://www.postgresql.org/docs/current/pgbuffercache.html. I think this is a
problem worth addressing but it also seems to be out of scope of the
proposed patch.
[1]: https://www.postgresql.org/docs/current/pgbuffercache.html
--
Best regards,
Aleksander Alekseev
Hi,
Regards,
Zhang Mingli
On Sep 20, 2022, 20:43 +0800, Aleksander Alekseev <aleksander@timescale.com>, wrote:
Correct, the procedure doesn't take the locks of the buffer manager.
It does take the locks of every individual buffer.
Ah, now I get it, thanks.
Hi Zhang,
Those are two different locks.
The locks that are taken in the patch are for buffer headers. This locks
only the current buffer and makes that particular buffer's info consistent
within itself.
However, the lock mentioned in the doc is for buffer manager which would
prevent changes on any buffer if it's held.
pg_buffercache_summary (and pg_buffercache_pages) does not hold buffer
manager lock. Therefore, consistency across all buffers is not guaranteed.
For pg_buffercache_pages, self-consistent buffer information is useful
since it shows each buffer separately.
For pg_buffercache_summary, even self-consistency may not matter much since
results are aggregated and we can't see individual buffer information.
Consistency across all buffers is also not a concern since its purpose is
to give an overall idea about the state of buffers.
I see that these two different locks in the same context can be confusing.
I hope it is a bit more clear now.
Best,
Melih
Show quoted text
Hi,
On Sep 20, 2022, 20:49 +0800, Melih Mutlu <m.melihmutlu@gmail.com>, wrote:
Hi Zhang,
Those are two different locks.
The locks that are taken in the patch are for buffer headers. This locks only the current buffer and makes that particular buffer's info consistent within itself.However, the lock mentioned in the doc is for buffer manager which would prevent changes on any buffer if it's held.
pg_buffercache_summary (and pg_buffercache_pages) does not hold buffer manager lock. Therefore, consistency across all buffers is not guaranteed.For pg_buffercache_pages, self-consistent buffer information is useful since it shows each buffer separately.
For pg_buffercache_summary, even self-consistency may not matter much since results are aggregated and we can't see individual buffer information.
Consistency across all buffers is also not a concern since its purpose is to give an overall idea about the state of buffers.I see that these two different locks in the same context can be confusing. I hope it is a bit more clear now.
Best,
Melih
Thanks for your explanation, LGTM.
Hi,
On 2022-09-20 12:45:24 +0300, Aleksander Alekseev wrote:
I'm not sure how to avoid any undefined behaviour without locks though.
Even with locks, performance is much better. But is it good enough for production?Potentially you could avoid taking locks by utilizing atomic
operations and lock-free algorithms. But these algorithms are
typically error-prone and not always produce a faster code than the
lock-based ones. I'm pretty confident this is out of scope of this
particular patch.
Why would you need lockfree operations? All you need to do is to read
BufferDesc->state into a local variable and then make decisions based on that?
+ for (int i = 0; i < NBuffers; i++) + { + BufferDesc *bufHdr; + uint32 buf_state; + + bufHdr = GetBufferDescriptor(i); + + /* Lock each buffer header before inspecting. */ + buf_state = LockBufHdr(bufHdr); + + /* Invalid RelFileNumber means the buffer is unused */ + if (RelFileNumberIsValid(BufTagGetRelNumber(&bufHdr->tag))) + { + buffers_used++; + usagecount_avg += BUF_STATE_GET_USAGECOUNT(buf_state); + + if (buf_state & BM_DIRTY) + buffers_dirty++; + } + else + buffers_unused++; + + if (BUF_STATE_GET_REFCOUNT(buf_state) > 0) + buffers_pinned++; + + UnlockBufHdr(bufHdr, buf_state); + }
I.e. instead of locking the buffer header as done above, this could just do
something along these lines:
BufferDesc *bufHdr;
uint32 buf_state;
bufHdr = GetBufferDescriptor(i);
buf_state = pg_atomic_read_u32(&bufHdr->state);
if (buf_state & BM_VALID)
{
buffers_used++;
usagecount_avg += BUF_STATE_GET_USAGECOUNT(buf_state);
if (buf_state & BM_DIRTY)
buffers_dirty++;
}
else
buffers_unused++;
if (BUF_STATE_GET_REFCOUNT(buf_state) > 0)
buffers_pinned++;
Without a memory barrier you can get very slightly "out-of-date" values of the
state, but that's fine in this case.
Greetings,
Andres Freund