Bypassing shared_buffers
Hello,
There is often a need to test particular queries executed in the worst-case
scenario, i.e. right after a server restart or with no or minimal amount of
data in shared buffers. In Postgres it's currently hard to achieve (other
than to restart the server completely to run a single query, which is not
practical). Is there a simple way to introduce a GUC variable that makes
queries bypass shared_buffers and always read from storage? It would make
testing like that orders of magnitude simpler. I mean, are there serious
technical obstacles or any other objections to that idea in principle?
Thanks,
-Vladimir Churyukin
To be clear, I'm talking about bypassing shared buffers for reading data /
indexes only, not about disabling it completely (which I guess is
impossible anyway).
-Vladimir Churyukin
Vladimir Churyukin <vladimir@churyukin.com> writes:
There is often a need to test particular queries executed in the worst-case
scenario, i.e. right after a server restart or with no or minimal amount of
data in shared buffers. In Postgres it's currently hard to achieve (other
than to restart the server completely to run a single query, which is not
practical). Is there a simple way to introduce a GUC variable that makes
queries bypass shared_buffers and always read from storage? It would make
testing like that orders of magnitude simpler. I mean, are there serious
technical obstacles or any other objections to that idea in principle?
It's a complete non-starter. Pages on disk are not necessarily up to
date; but what is in shared buffers is.
regards, tom lane
Ok, got it, thanks.
Is there any alternative approach to measuring the performance as if the
cache was empty?
The goal is basically to calculate the max possible I/O time for a query,
to get a range between min and max timing.
It's ok if it's done during EXPLAIN ANALYZE call only, not for regular
executions.
One thing I can think of is to issue read calls from storage anyway, even if
the data there might be stale, just for measuring purposes.
For EXPLAIN ANALYZE it should be fine as it doesn't return real data anyway.
Is it possible that some pages do not exist in storage at all? Is there a
different way to simulate something like that?
-Vladimir Churyukin
On Thu, Jun 15, 2023 at 1:37 PM Vladimir Churyukin
<vladimir@churyukin.com> wrote:
Ok, got it, thanks.
Is there any alternative approach to measuring the performance as if the cache was empty?
There are two levels of cache. If you're on Linux you can ask it to
drop its caches by writing certain values to /proc/sys/vm/drop_caches.
For PostgreSQL's own buffer pool, it would be nice if someone would
extend the pg_prewarm extension to have a similar 'unwarm' operation,
for testing like that. But one thing you can do is just restart the
database cluster, or use pg_prewarm to fill its buffer pool up with
other stuff (and thus kick out the stuff you didn't want in there).
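For example, a minimal sketch of that pg_prewarm trick ('filler' is just a
made-up table name; it needs to be at least as large as shared_buffers, and
even then hot pages with high usage counts may survive a single pass):

-- fill shared_buffers with throwaway data to push out the pages under test
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
CREATE TABLE filler AS                        -- placeholder table, size to taste
    SELECT g AS id, repeat('x', 1000) AS pad
    FROM generate_series(1, 1000000) g;
SELECT pg_prewarm('filler');                  -- reads the whole table into shared buffers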
Thomas Munro <thomas.munro@gmail.com> writes:
There are two levels of cache. If you're on Linux you can ask it to
drop its caches by writing certain values to /proc/sys/vm/drop_caches.
For PostgreSQL's own buffer pool, it would be nice if someone would
extend the pg_prewarm extension to have a similar 'unwarm' operation,
for testing like that. But one thing you can do is just restart the
database cluster, or use pg_prewarm to fill its buffer pool up with
other stuff (and thus kick out the stuff you didn't want in there).
But that'd also have to push out any dirty buffers. I'm skeptical
that it'd be noticeably cheaper than stopping and restarting the
server.
regards, tom lane
Do you foresee any difficulties in the implementation of the "unwarm"
operation? It requires a cache flush operation,
so I'm curious how complicated that is (there is probably a reason this is
not supported by Postgres by now? MSSQL and Oracle have supported stuff like
that for a long time).
Cluster restart is not an option for us unfortunately, as it will be
required for each query pretty much, and there are a lot of them.
An ideal solution would be, if it's possible, to test it in parallel with
other activities...
Evicting all the other stuff using pg_prewarm is an interesting idea though
(if a large prewarm operation really evicts all the previously stored data
reliably).
It's a bit hacky, but thanks, I think it's possible to make this work with
some effort.
It will require exclusive access just for that testing, which is not ideal
but may work for us.
-Vladimir Churyukin
It could be cheaper if the testing is done for many SELECT queries
sequentially - you pretty much need to flush dirty buffers just once.
-Vladimir Churyukin
On Thu, Jun 15, 2023 at 2:51 PM Vladimir Churyukin
<vladimir@churyukin.com> wrote:
Do you foresee any difficulties in the implementation of the "unwarm" operation? It requires a cache flush operation,
so I'm curious how complicated that is (there is probably a reason this is not supported by Postgres by now? MSSQL and Oracle have supported stuff like that for a long time)
If they have a way to kick individual relations out of the buffer
pool, then I suspect they have an efficient way to find the relevant
buffers. We'd have to scan the entire buffer pool, or (for small
relations), probe for blocks 0..n (when we know that n isn't too
high). We'll probably eventually get something tree-based, like
operating system kernels and perhaps those other databases use for
their own buffer pools, which is useful for I/O merging and for faster
DROP, but until then you'll face the same problem while implementing
unwarm, and you'd probably have to understand a lot of details about
bufmgr.c and add some new interfaces.
As Tom says, in the end it's going to work out much like restarting,
which requires a pleasing zero lines of new code, perhaps explaining
why no one has tried this before... Though of course you can be more
selective about which tables are zapped.
Cluster restart is not an option for us unfortunately, as it will be required for each query pretty much, and there are a lot of them.
An ideal solution would be, if it's possible, to test it in parallel with other activities...
Evicting all the other stuff using pg_prewarm is an interesting idea though (if a large prewarm operation really evicts all the previously stored data reliably).
It's a bit hacky, but thanks, I think it's possible to make this work with some effort.
It will require exclusive access just for that testing, which is not ideal but may work for us.
You can use pg_buffercache to check the current contents of the buffer
pool, to confirm that a relation you're interested in is gone.
https://www.postgresql.org/docs/current/pgbuffercache.html#PGBUFFERCACHE-COLUMNS
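For instance, a quick sketch of such a check ('my_table' is a placeholder):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;
-- count buffers still caching pages of the table under test
SELECT count(*)
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('my_table')
  AND reldatabase = (SELECT oid FROM pg_database
                     WHERE datname = current_database());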
I guess another approach if you really want to write code to do this
would be to introduce a function that takes a buffer ID and
invalidates it, and then you could use queries of pg_buffercache to
drive it. It would simplify things greatly if you only supported
invalidating clean buffers, and then you could query pg_buffercache to
see if any dirty buffers are left and if so run a checkpoint and try
again or something like that...
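Driving it from SQL could then look roughly like this (pg_unwarm_buffer is
purely hypothetical and doesn't exist today):

-- hypothetical sketch: evict only the clean buffers belonging to one relation
SELECT pg_unwarm_buffer(bufferid)             -- hypothetical function
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('my_table')
  AND NOT isdirty;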
Another thing I have wondered about while hacking on I/O code is
whether pg_prewarm should also have an unwarm-the-kernel-cache thing.
There is that drop_caches thing, but that's holus bolus and Linux-only.
Perhaps POSIX_FADV_WONTNEED could be used for this, though that would
seem to require a double decker bus-sized layering violation.
On 15.06.2023 4:37 AM, Vladimir Churyukin wrote:
Ok, got it, thanks.
Is there any alternative approach to measuring the performance as if
the cache was empty?
The goal is basically to calculate the max possible I/O time for a
query, to get a range between min and max timing.
It's ok if it's done during EXPLAIN ANALYZE call only, not for regular
executions.
One thing I can think of is to issue read calls from storage anyway, even if
the data there might be stale, just for measuring purposes.
For EXPLAIN ANALYZE it should be fine as it doesn't return real data
anyway.
Is it possible that some pages do not exist in storage at all? Is
there a different way to simulate something like that?
I do not completely understand what you want to measure: how fast the cache
can be prewarmed, or what the performance is when the working set doesn't fit
in memory?
Why doesn't changing the `shared_buffers` size to some very small value (e.g.
1MB) work for you?
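A minimal sketch of what I mean, assuming superuser access (note that
shared_buffers only changes after a full server restart):

ALTER SYSTEM SET shared_buffers = '1MB';
-- a reload (pg_reload_conf()) is not enough; shared_buffers requires a restart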
As it was already noticed, there are two levels of caching: shared buffers
and the OS file cache.
By reducing the size of shared buffers you rely mostly on the OS file cache.
And actually there is no big performance gap here - in most workloads I
didn't see more than a 15% difference.
You can certainly flush the OS cache with `echo 3 > /proc/sys/vm/drop_caches`
and so simulate a cold start.
But the OS cache will be prewarmed quite fast (unlike shared buffers, because
of the strange Postgres ring-buffer strategies, which cause eviction of pages
from shared buffers even if there is a lot of free space).
So please specify the goal of your experiment more precisely.
The "max possible I/O time for a query" depends on so many factors...
Do you consider just one client working in isolation, or will there be
many concurrent queries and background tasks like autovacuum and
checkpointer competing for the resources?
My point is that if you need a deterministic result, then you will
have to exclude a lot of different factors which may affect performance,
and then ... you calculate the speed of a horse in a vacuum, which has almost
no relation to real performance.
On Thu, Jun 15, 2023 at 12:32 AM Konstantin Knizhnik <knizhnik@garret.ru>
wrote:
I do not completely understand what you want to measure: how fast the cache
can be prewarmed, or what the performance is when the working set doesn't fit
in memory?
No, it's not about the working set or prewarming speed.
We're trying to see what the worst performance in terms of I/O is, i.e. when
the database has just started up or the data/indexes being queried are not
cached at all.
Why doesn't changing the `shared_buffers` size to some very small value (e.g.
1MB) work for you?
As it was already noticed, there are two levels of caching: shared buffers
and the OS file cache.
By reducing the size of shared buffers you rely mostly on the OS file cache.
And actually there is no big performance gap here - in most workloads I
didn't see more than a 15% difference.
I thought about the option of setting minimal shared_buffers, but it
requires a server restart anyway, something I'd like to avoid.
You can certainly flush the OS cache with `echo 3 > /proc/sys/vm/drop_caches`
and so simulate a cold start.
But the OS cache will be prewarmed quite fast (unlike shared buffers, because
of the strange Postgres ring-buffer strategies, which cause eviction of pages
from shared buffers even if there is a lot of free space).
So please specify the goal of your experiment more precisely.
The "max possible I/O time for a query" depends on so many factors...
Do you consider just one client working in isolation, or will there be
many concurrent queries and background tasks like autovacuum and
checkpointer competing for the resources?
My point is that if you need a deterministic result, then you will
have to exclude a lot of different factors which may affect performance,
and then ... you calculate the speed of a horse in a vacuum, which has almost
no relation to real performance.
Exactly, we need more or less deterministic results for how bad the I/O
timings can be.
Even though those are not necessarily the numbers we will be getting in real
life, it gives us an idea of the distribution,
and it's useful because we care about the long tail (p99+) of our queries.
For simplicity, let's say it will be a single client only (it will be hard
to do the proposed solutions reliably with other stuff running in parallel
anyway).
-Vladimir Churyukin
On Thu, Jun 15, 2023 at 4:16 AM Vladimir Churyukin <vladimir@churyukin.com>
wrote:
We're trying to see what the worst performance in terms of I/O is, i.e. when
the database has just started up or the data/indexes being queried are not
cached at all.
You could create new tables that are copies of the existing ones (CREATE
TABLE foo AS SELECT * FROM ...), create new indexes, and run a query on
those. Use schemas and search_path to keep the queries the same. No restart
needed! (just potentially lots of I/O, time, and disk space :) Don't forget
to use EXPLAIN (ANALYZE, BUFFERS) to double-check things.
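A minimal sketch of that approach (all names are placeholders):

-- build copies of the relations under a separate schema
CREATE SCHEMA cold;
CREATE TABLE cold.orders AS SELECT * FROM public.orders;
CREATE INDEX ON cold.orders (customer_id);
-- run the unchanged query text against the copies
SET search_path = cold, public;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders WHERE customer_id = 42;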
Hi!
On 15 Jun 2023, at 03:57, Vladimir Churyukin <vladimir@churyukin.com> wrote:
Hello,
There is often a need to test particular queries executed in the worst-case scenario, i.e. right after a server restart or with no or minimal amount of data in shared buffers. In Postgres it's currently hard to achieve (other than to restart the server completely to run a single query, which is not practical). Is there a simple way to introduce a GUC variable that makes queries bypass shared_buffers and always read from storage? It would make testing like that orders of magnitude simpler. I mean, are there serious technical obstacles or any other objections to that idea in principle?
A few months ago I implemented a "drop of caches" to demonstrate the basic
structure of shared buffers [0]. The patch is very unsafe in the form it was
implemented in, but if you think that functionality is really useful (it was
not intended to be), I can try to do the same as an extension.
It worked like "SELECT FlushAllBuffers();", and what it does resembles a
checkpoint, but it evicts every buffer that can be evicted. Obviously, the
emptied buffers would be immediately reused by concurrent sessions.
[0] https://www.youtube.com/watch?v=u8BAOqeKnwY
Best regards, Andrey Borodin.