Option to not use ringbuffer in VACUUM, using it in failsafe mode

Started by Andres Freund · over 3 years ago · 79 messages · pgsql-hackers
#1Andres Freund
andres@anarazel.de

Hi,

The use of the ringbuffer in VACUUM often causes very substantial slowdowns.

The primary reason for that is that often most/all the buffers in the
ringbuffer have been dirtied when they were processed, with an associated WAL
record. When we then reuse the buffer via the (quite small) ringbuffer, we
need to flush the WAL before reclaiming the buffer. A synchronous flush of
the WAL every few buffers ends up as a very significant bottleneck, unless you
have local low-latency durable storage (i.e. local storage with some sort of
power protection).

The slowdown caused by the frequent WAL flushes is very significant.

A secondary issue is that, when we end up doing multiple passes, we have to
re-read data into shared_buffers that we just wrote out and evicted.

An example:

On a local SSD with decently fast fdatasync [1]. Table size is 3322MB, with ~10%
updated, ~30% deleted tuples, and a single index. m_w_m is large enough to do
this in one pass. I used pg_prewarm() of another relation to ensure the
vacuumed table isn't in s_b (otherwise ringbuffers aren't doing anything).

s_b     ringbuffer enabled   time      wal_syncs   wal_sync_time
128MB   1                    77797ms
128MB   0                    13676ms   241         2538ms
8GB     1                    72834ms   23976       51989ms
8GB     0                    9544ms    150         1634ms

see [2] for logs / stats of the runs. All the data here is in the OS page
cache, so we don't even pay the real price for reading the data multiple
times.

On cloud hardware with higher fsync latency I've seen > 15x time differences
between using the ringbuffers and avoiding them by using pg_prewarm.

Of course there's a good reason we have the ringbuffer - we don't want
maintenance operations to completely disturb the buffer pool and harm the
production workload. But if, e.g., the database isn't available due to
anti-wraparound measures, there's no other workload to protect, and the
ringbuffer substantially reduces availability. Initial data loads could
similarly benefit.

Therefore I'd like to add an option to the VACUUM command to disable
the use of the ringbuffer. Not sure about the name yet.

I think we should auto-enable that mode once we're using the failsafe mode,
similar to [auto]vacuum cost delays getting disabled
(c.f. lazy_check_wraparound_failsafe()). If things are bad enough that we're
soon going to shut down, we want to be aggressive.

Greetings,

Andres Freund

[1]: according to pg_test_fsync: fdatasync 769.189 ops/sec 1300 usecs/op

[2]: For the s_b 128MB case:

ringbuffers enabled:

2023-01-11 10:24:58.726 PST [355353][client backend][2/19:0][psql] INFO: aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:26:19.488 PST [355353][client backend][2/19:0][psql] INFO: finished vacuuming "postgres.public.copytest_0": index scans: 1
pages: 0 removed, 424975 remain, 424975 scanned (100.00% of total)
tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
removable cutoff: 2981, which was 0 XIDs old when operation ended
new relfrozenxid: 2981, which is 102 XIDs ahead of previous value
frozen: 424975 pages from table (100.00% of total) had 6666700 tuples frozen
index scan needed: 424975 pages from table (100.00% of total) had 4325101 dead item identifiers removed
index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
I/O timings: read: 2284.292 ms, write: 4325.009 ms
avg read rate: 83.203 MB/s, avg write rate: 83.199 MB/s
buffer usage: 425044 hits, 860113 misses, 860074 dirtied
WAL usage: 1709902 records, 434990 full page images, 2273501683 bytes
system usage: CPU: user: 11.62 s, system: 11.86 s, elapsed: 80.76 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │ stats_reset │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│ 8092795 │ 443356 │ 2999358740 │ 1569 │ 28651 │ 27081 │ 1874.391 │ 59895.674 │ 2023-01-11 10:24:58.664859-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

ringbuffers disabled:

2023-01-11 10:23:05.081 PST [355054][client backend][2/19:0][psql] INFO: aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:23:18.755 PST [355054][client backend][2/19:0][psql] INFO: finished vacuuming "postgres.public.copytest_0": index scans: 1
pages: 0 removed, 424979 remain, 424979 scanned (100.00% of total)
tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
removable cutoff: 2879, which was 0 XIDs old when operation ended
new relfrozenxid: 2879, which is 102 XIDs ahead of previous value
frozen: 424979 pages from table (100.00% of total) had 6666700 tuples frozen
index scan needed: 424979 pages from table (100.00% of total) had 4325176 dead item identifiers removed
index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
I/O timings: read: 1247.366 ms, write: 2888.756 ms
avg read rate: 491.485 MB/s, avg write rate: 491.395 MB/s
buffer usage: 424927 hits, 860242 misses, 860083 dirtied
WAL usage: 1709918 records, 434994 full page images, 2273503049 bytes
system usage: CPU: user: 5.42 s, system: 6.26 s, elapsed: 13.67 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │ stats_reset │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│ 8092963 │ 443362 │ 2999373996 │ 212190 │ 212333 │ 241 │ 1209.516 │ 2538.706 │ 2023-01-11 10:23:05.004783-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

For the s_b 8GB case:

ringbuffers enabled:

2023-01-11 10:04:12.479 PST [352665][client backend][2/19:0][psql] INFO: aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:05:25.312 PST [352665][client backend][2/19:0][psql] INFO: finished vacuuming "postgres.public.copytest_0": index scans: 1
pages: 0 removed, 424977 remain, 424977 scanned (100.00% of total)
tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
removable cutoff: 2675, which was 0 XIDs old when operation ended
new relfrozenxid: 2675, which is 102 XIDs ahead of previous value
frozen: 424977 pages from table (100.00% of total) had 6666700 tuples frozen
index scan needed: 424977 pages from table (100.00% of total) had 4325066 dead item identifiers removed
index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
I/O timings: read: 2610.875 ms, write: 4177.842 ms
avg read rate: 81.688 MB/s, avg write rate: 87.515 MB/s
buffer usage: 523611 hits, 761552 misses, 815868 dirtied
WAL usage: 1709910 records, 434992 full page images, 2273502729 bytes
system usage: CPU: user: 11.00 s, system: 11.86 s, elapsed: 72.83 s
┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │ stats_reset │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│ 8092707 │ 443358 │ 2999354090 │ 42259 │ 66227 │ 23976 │ 2050.963 │ 51989.099 │ 2023-01-11 10:04:12.404054-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

ringbuffers disabled:

2023-01-11 10:08:48.414 PST [353287][client backend][3/19:0][psql] INFO: aggressively vacuuming "postgres.public.copytest_0"
2023-01-11 10:08:57.956 PST [353287][client backend][3/19:0][psql] INFO: finished vacuuming "postgres.public.copytest_0": index scans: 1
pages: 0 removed, 424977 remain, 424977 scanned (100.00% of total)
tuples: 4333300 removed, 6666700 remain, 0 are dead but not yet removable
removable cutoff: 2777, which was 0 XIDs old when operation ended
new relfrozenxid: 2777, which is 102 XIDs ahead of previous value
frozen: 424977 pages from table (100.00% of total) had 6666700 tuples frozen
index scan needed: 424976 pages from table (100.00% of total) had 4325153 dead item identifiers removed
index "copytest_0_id_idx": pages: 10032 in total, 0 newly deleted, 0 currently deleted, 0 reusable
I/O timings: read: 1040.230 ms, write: 0.000 ms
avg read rate: 312.016 MB/s, avg write rate: 356.242 MB/s
buffer usage: 904078 hits, 381084 misses, 435101 dirtied
WAL usage: 1709908 records, 434992 full page images, 2273499663 bytes
system usage: CPU: user: 5.57 s, system: 2.26 s, elapsed: 9.54 s

┌─────────────┬─────────┬────────────┬──────────────────┬───────────┬──────────┬────────────────┬───────────────┬───────────────────────────────┐
│ wal_records │ wal_fpi │ wal_bytes │ wal_buffers_full │ wal_write │ wal_sync │ wal_write_time │ wal_sync_time │ stats_reset │
├─────────────┼─────────┼────────────┼──────────────────┼───────────┼──────────┼────────────────┼───────────────┼───────────────────────────────┤
│ 8092933 │ 443358 │ 2999364596 │ 236354 │ 236398 │ 150 │ 1166.314 │ 1634.408 │ 2023-01-11 10:08:48.350328-08 │
└─────────────┴─────────┴────────────┴──────────────────┴───────────┴──────────┴────────────────┴───────────────┴───────────────────────────────┘

#2Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#1)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

On Wed, Jan 11, 2023 at 10:27 AM Andres Freund <andres@anarazel.de> wrote:

Therefore I'd like to add an option to the VACUUM command to disable
the use of the ringbuffer. Not sure about the name yet.

Sounds like a good idea.

I think we should auto-enable that mode once we're using the failsafe mode,
similar to [auto]vacuum cost delays getting disabled
(c.f. lazy_check_wraparound_failsafe()). If things are bad enough that we're
soon going to shut down, we want to be aggressive.

+1

--
Peter Geoghegan

#3Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Hi,

On 2023-01-11 10:27:20 -0800, Andres Freund wrote:

On cloud hardware with higher fsync latency I've seen > 15x time differences
between using the ringbuffers and avoiding them by using pg_prewarm.

A slightly edited version of what I've used in the past to defeat the ringbuffers
using pg_prewarm, as I think it might be useful for others:

WITH what_rel AS (
    SELECT 'copytest_0'::regclass AS vacuum_me
),
what_to_prefetch AS (
    SELECT vacuum_me, greatest(heap_blks_total - 1, 0) AS last_block,
        CASE WHEN phase = 'scanning heap' THEN heap_blks_scanned
             ELSE heap_blks_vacuumed END AS current_pos
    FROM what_rel, pg_stat_progress_vacuum
    WHERE relid = vacuum_me AND phase IN ('scanning heap', 'vacuuming heap')
)
SELECT
    vacuum_me, current_pos,
    pg_prewarm(vacuum_me, 'buffer', 'main', current_pos,
               least(current_pos + 10000, last_block))
FROM what_to_prefetch
\watch 0.1

Having this running in the background brings the s_b=128MB, ringbuffer enabled
case down from 77797ms to 14838ms. Close to the version with the ringbuffer
disabled.

Unfortunately, afaik, that trick isn't currently possible for the index vacuum
phase, as we don't yet expose the current scan position. And not every index
might be as readily prefetchable as just prefetching the next 10k blocks from
the current position.

That's not too bad if your indexes are small, but unfortunately that's not
always the case...

Greetings,

Andres Freund

#4Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#2)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Hi,

On 2023-01-11 10:35:19 -0800, Peter Geoghegan wrote:

On Wed, Jan 11, 2023 at 10:27 AM Andres Freund <andres@anarazel.de> wrote:

Therefore I'd like to add an option to the VACUUM command to disable
the use of the ringbuffer. Not sure about the name yet.

Sounds like a good idea.

Any idea about the name? The obvious thing is to reference ring buffers in the
option name, but that's more of an implementation detail...

Some ideas:

USE_RING_BUFFERS on|off
SCAN_PROTECTION on|off
REUSE_BUFFERS on|off
LIMIT_BUFFER_USAGE on|off

Regards,

Andres

#5Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#4)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

On Wed, Jan 11, 2023 at 10:58 AM Andres Freund <andres@anarazel.de> wrote:

Any idea about the name? The obvious thing is to reference ring buffers in the
option name, but that's more of an implementation detail...

What are the chances that anybody using this feature via a manual
VACUUM command will also use INDEX_CLEANUP off? It's not really
supposed to be used routinely, at all. Right? It's just for
emergencies.

Perhaps it can be tied to INDEX_CLEANUP=off? That makes it hard to get
just the behavior you want when testing VACUUM, but maybe that doesn't
matter.

Realistically, most of the value here comes from changing the failsafe
behavior, which doesn't require the user to know anything about
VACUUM. I know that AWS has reduced the vacuum_failsafe_age default on
RDS to 1.2 billion (a decision made before I joined Amazon), so it is
already something AWS leans on quite a bit where available.

--
Peter Geoghegan

#6Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#5)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Hi,

On 2023-01-11 11:06:26 -0800, Peter Geoghegan wrote:

On Wed, Jan 11, 2023 at 10:58 AM Andres Freund <andres@anarazel.de> wrote:

Any idea about the name? The obvious thing is to reference ring buffers in the
option name, but that's more of an implementation detail...

What are the chances that anybody using this feature via a manual
VACUUM command will also use INDEX_CLEANUP off? It's not really
supposed to be used routinely, at all. Right? It's just for
emergencies.

I think it's also quite useful for e.g. vacuuming after initial data loads or
if you need to do a first vacuum after a lot of bloat accumulated due to a
stuck transaction.

Perhaps it can be tied to INDEX_CLEANUP=off? That makes it hard to get
just the behavior you want when testing VACUUM, but maybe that doesn't
matter.

I don't like that - it's also quite useful to disable use of ringbuffers when
you actually need to clean up indexes. Especially when we have a lot of dead
tuples we'll rescan indexes over and over...

Greetings,

Andres Freund

#7Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#6)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:

I don't like that - it's also quite useful to disable use of ringbuffers when
you actually need to clean up indexes. Especially when we have a lot of dead
tuples we'll rescan indexes over and over...

That's a fair point.

My vote goes to "REUSE_BUFFERS", then.

--
Peter Geoghegan

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Andres Freund (#4)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

On Wed, Jan 11, 2023 at 10:58:54AM -0800, Andres Freund wrote:

Hi,

On 2023-01-11 10:35:19 -0800, Peter Geoghegan wrote:

On Wed, Jan 11, 2023 at 10:27 AM Andres Freund <andres@anarazel.de> wrote:

Therefore I'd like to add an option to the VACUUM command to disable
the use of the ringbuffer. Not sure about the name yet.

Sounds like a good idea.

Any idea about the name? The obvious thing is to reference ring buffers in the
option name, but that's more of an implementation detail...

Some ideas:

USE_RING_BUFFERS on|off
REUSE_BUFFERS on|off

+1 for either of these.

I don't think it's an issue to expose implementation details here.
Anyone who wants to change this will know about the implementation
details that they're changing, and it's better to refer to it by the
same/similar name and not by some other name that's hard to find.

BTW I can't see that the ring buffer is currently exposed in any
user-facing docs for COPY/ALTER/VACUUM/CREATE ?

--
Justin

#9Andres Freund
andres@anarazel.de
In reply to: Justin Pryzby (#8)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Hi,

On 2023-01-11 14:38:34 -0600, Justin Pryzby wrote:

On Wed, Jan 11, 2023 at 10:58:54AM -0800, Andres Freund wrote:

Some ideas:

USE_RING_BUFFERS on|off
REUSE_BUFFERS on|off

+1 for either of these.

Then I'd go for REUSE_BUFFERS. What made you prefer it over
LIMIT_BUFFER_USAGE?

USE_BUFFER_ACCESS_STRATEGY would be a name tied to the implementation that's
not awful, I think..

I don't think it's an issue to expose implementation details here.
Anyone who wants to change this will know about the implementation
details that they're changing, and it's better to refer to it by the
same/similar name and not by some other name that's hard to find.

A ringbuffer could refer to a lot of things other than something limiting
buffer usage, that's why I don't like it.

BTW I can't see that the ring buffer is currently exposed in any
user-facing docs for COPY/ALTER/VACUUM/CREATE ?

Yea, there's surprisingly little in the docs about it, given how much it
influences behaviour. It's mentioned in tablesample-method.sgml, but without
explanation - and it's a page documenting C API...

Greetings,

Andres Freund

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#7)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Peter Geoghegan <pg@bowt.ie> writes:

On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:

I don't like that - it's also quite useful to disable use of ringbuffers when
you actually need to clean up indexes. Especially when we have a lot of dead
tuples we'll rescan indexes over and over...

That's a fair point.

My vote goes to "REUSE_BUFFERS", then.

I wonder whether it could make sense to allow a larger ringbuffer size,
rather than just the limit cases of "on" and "off".

regards, tom lane

#11Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#10)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Hi,

On 2023-01-11 16:18:34 -0500, Tom Lane wrote:

Peter Geoghegan <pg@bowt.ie> writes:

On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:

I don't like that - it's also quite useful to disable use of ringbuffers when
you actually need to clean up indexes. Especially when we have a lot of dead
tuples we'll rescan indexes over and over...

That's a fair point.

My vote goes to "REUSE_BUFFERS", then.

I wonder whether it could make sense to allow a larger ringbuffer size,
rather than just the limit cases of "on" and "off".

I can see that making sense, particularly if we were to later extend this to
other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
data > 16MB but also << s_b vastly slower, but it can still be very important
to use if there's lots of parallel processes loading data.

Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
default value, 0 preventing use of a buffer access strategy, and 1..N
indicating the size in blocks?

Would we want to set an upper limit lower than implied by the memory limit for
the BufferAccessStrategy allocation?

Greetings,

Andres Freund

#12David G. Johnston
david.g.johnston@gmail.com
In reply to: Andres Freund (#11)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

On Wed, Jan 11, 2023 at 2:39 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-01-11 16:18:34 -0500, Tom Lane wrote:

Peter Geoghegan <pg@bowt.ie> writes:

On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:

I don't like that - it's also quite useful to disable use of ringbuffers when
you actually need to clean up indexes. Especially when we have a lot of dead
tuples we'll rescan indexes over and over...

That's a fair point.

My vote goes to "REUSE_BUFFERS", then.

I wonder whether it could make sense to allow a larger ringbuffer size,
rather than just the limit cases of "on" and "off".

I can see that making sense, particularly if we were to later extend this to
other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
data > 16MB but also << s_b vastly slower, but it can still be very important
to use if there's lots of parallel processes loading data.

Should we just add "ring_buffers" to the existing "shared_buffers" and
"temp_buffers" settings?

Then give VACUUM a (BUFFER_POOL=ring*|shared) option?

I think making DBAs aware of this dynamic and making the ring buffer usage
user-facing is beneficial in its own right (at least, the concept that
changes done by vacuum don't impact shared_buffers, regardless of how that
non-impact manifests). But I don't see much benefit trying to come up with
a different name.

David J.

#13Andres Freund
andres@anarazel.de
In reply to: David G. Johnston (#12)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Hi,

On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:

Should we just add "ring_buffers" to the existing "shared_buffers" and
"temp_buffers" settings?

The different types of ring buffers have different sizes, for good reasons. So
I don't see that working well. I also think it'd be more often useful to
control this on a statement basis - if you have a parallel import tool that
starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of
course each session can change the ring buffer settings, but still.

Then give VACUUM a (BUFFER_POOL=ring*|shared) option?

That seems likely to mislead, because it'd still use shared buffers when the
blocks are already present. The ring buffers aren't a separate buffer pool,
they're a subset of the normal buffer pool. Lookup is done normally; only when
a page isn't found does the search for a victim buffer first try to use a
buffer from the ring.

I think making DBAs aware of this dynamic and making the ring buffer usage
user-facing is beneficial in its own right (at least, the concept that
changes done by vacuum don't impact shared_buffers, regardless of how that
non-impact manifests).

VACUUM can end up dirtying all of shared buffers, even with the ring buffer in
use...

Greetings,

Andres Freund

#14Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#11)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Hi,

So, I attached a rough implementation of both the autovacuum failsafe
reverts to shared buffers and the vacuum option (no tests or docs or
anything).

The first three patches in the set are just for enabling use of shared
buffers in failsafe mode for autovacuum. I haven't actually ensured it
works (i.e. triggering failsafe mode and checking the stats for whether
or not shared buffers were used).

I was wondering about the status of the autovacuum wraparound failsafe
test suggested in [1]. I don't see it registered for the March
commitfest. I'll probably review it since it will be useful for this
patchset.

The first patch in the set is to free the BufferAccessStrategy object
that is made in do_autovacuum() -- I don't see when the memory context
it is allocated in is destroyed, so it seems like it might be a leak?

The last patch in the set is a trial implementation of the VACUUM option
suggested -- BUFFER_USAGE_LIMIT. More on that below.

On Wed, Jan 11, 2023 at 4:39 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-01-11 16:18:34 -0500, Tom Lane wrote:

Peter Geoghegan <pg@bowt.ie> writes:

On Wed, Jan 11, 2023 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:

I don't like that - it's also quite useful to disable use of ringbuffers when
you actually need to clean up indexes. Especially when we have a lot of dead
tuples we'll rescan indexes over and over...

That's a fair point.

My vote goes to "REUSE_BUFFERS", then.

I wonder whether it could make sense to allow a larger ringbuffer size,
rather than just the limit cases of "on" and "off".

I can see that making sense, particularly if we were to later extend this to
other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
data > 16MB but also << s_b vastly slower, but it can still be very important
to use if there's lots of parallel processes loading data.

Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
default value, 0 preventing use of a buffer access strategy, and 1..N
indicating the size in blocks?

I have found the implementation you suggested very hard to use.
The attached fourth patch in the set implements it the way you suggest.
I had to figure out what number to set the BUFFER_USAGE_LIMIT to -- and,
since I don't specify shared buffers in units of nbuffer, it's pretty
annoying to have to figure out a valid number.

I think that it would be better to have it be either a percentage of
shared buffers or a size in units of bytes/kb/mb like that of shared
buffers.

Using a fraction or percentage appeals to me because you don't need to
reference your shared buffers setting and calculate what size you want
to set it to. Also, parsing the size in different units sounds like more
work.

Unfortunately, the fraction doesn't really work if we cap the ring size
of a buffer access strategy to NBuffers / 8. Also, there are other
issues like what would 0% and 100% mean.

I have a list of other questions, issues, and TODOs related to the code
I wrote to implement BUFFER_USAGE_LIMIT, but I'm not sure those are
worth discussing until we shape up the interface.

Would we want to set an upper limit lower than implied by the memory limit for
the BufferAccessStrategy allocation?

So, I was wondering what you thought about NBuffers / 8 (the current
limit). Does it make sense?

If we clamp the user-specified value to this, I think we definitely need
to inform them through some kind of logging or message. I am sure there
are lots of other gucs doing this -- do you know any off the top of your
head?

- Melanie

[1]: /messages/by-id/CAB8KJ=j1b3kscX8Cg5G=Q39ZQsv2x4URXsuTueJLz=fcvJ3eoQ@mail.gmail.com

Attachments:

v1-0003-use-shared-buffers-when-failsafe-active.patch (+5 -1)
v1-0001-dont-leak-strategy-object.patch (+2 -1)
v1-0002-remove-global-variable-vac_strategy.patch (+7 -10)
v1-0004-add-vacuum-option-to-specify-nbuffers.patch (+127 -26)
#15Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#14)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Hi,

On 2023-02-22 16:32:53 -0500, Melanie Plageman wrote:

I was wondering about the status of the autovacuum wraparound failsafe
test suggested in [1]. I don't see it registered for the March's
commitfest. I'll probably review it since it will be useful for this
patchset.

It's pretty hard to make it work reliably. I was suggesting somewhere that we
ought to add an EMERGENCY parameter to manual VACUUMs to allow testing that
path a tad more easily.

The first patch in the set is to free the BufferAccessStrategy object
that is made in do_autovacuum() -- I don't see when the memory context
it is allocated in is destroyed, so it seems like it might be a leak?

The backend shuts down just after, so that's not a real issue. Not that it'd
hurt to fix it.

I can see that making sense, particularly if we were to later extend this to
other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
data > 16MB but also << s_b vastly slower, but it can still be very important
to use if there's lots of parallel processes loading data.

Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
default value, 0 preventing use of a buffer access strategy, and 1..N
indicating the size in blocks?

I have found the implementation you suggested very hard to use.
The attached fourth patch in the set implements it the way you suggest.
I had to figure out what number to set the BUFFER_USAGE_LIMIT to -- and,
since I don't specify shared buffers in units of nbuffer, it's pretty
annoying to have to figure out a valid number.

I think we should be able to parse it in a similar way to how we parse
shared_buffers. You could even implement this as a GUC that is then set by
VACUUM (similar to how VACUUM FREEZE is implemented).

I think that it would be better to have it be either a percentage of
shared buffers or a size in units of bytes/kb/mb like that of shared
buffers.

I don't think a percentage of shared buffers works particularly well - you
very quickly run into the ringbuffer becoming impractically big.

Would we want to set an upper limit lower than implied by the memory limit for
the BufferAccessStrategy allocation?

So, I was wondering what you thought about NBuffers / 8 (the current
limit). Does it make sense?

That seems *way* too big. Imagine how large allocations we'd end up with a
shared_buffers size of a few TB.

I'd probably make it a hard error at 1GB and a silent cap at NBuffers / 2 or
such.

@@ -547,7 +547,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
"amcheck context",
ALLOCSET_DEFAULT_SIZES);
-	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
+	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD, -1);

/* Get true root block from meta-page */
metapage = palloc_btree_page(state, BTREE_METAPAGE);

Changing this everywhere seems pretty annoying, particularly because I suspect
a bunch of extensions also use GetAccessStrategy(). How about a
GetAccessStrategyExt(), GetAccessStrategyCustomSize() or such?

BufferAccessStrategy
-GetAccessStrategy(BufferAccessStrategyType btype)
+GetAccessStrategy(BufferAccessStrategyType btype, int buffers)
{
BufferAccessStrategy strategy;
int			ring_size;
+	const char *strategy_name = btype_get_name(btype);

Shouldn't be executed when we don't need it.

+	if (btype != BAS_VACUUM)
+	{
+		if (buffers == 0)
+			elog(ERROR, "Use of shared buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
+					strategy_name);
+
+		if (buffers > 0)
+			elog(ERROR, "Specification of ring size in buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
+					strategy_name);
+	}
+
+	// TODO: DEBUG logging message for dev?
+	if (buffers == 0)
+		btype = BAS_NORMAL;

GetAccessStrategy() often can be executed hundreds of thousands of times a
second, so I'm very sceptical that adding log messages to it is useful.

Greetings,

Andres Freund

#16Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Andres Freund (#13)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

On Thu, Jan 12, 2023 at 6:06 AM Andres Freund <andres@anarazel.de> wrote:

On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:

Should we just add "ring_buffers" to the existing "shared_buffers" and
"temp_buffers" settings?

The different types of ring buffers have different sizes, for good reasons. So
I don't see that working well. I also think it'd be more often useful to
control this on a statement basis - if you have a parallel import tool that
starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of
course each session can change the ring buffer settings, but still.

How about having GUCs for each ring buffer (bulk_read_ring_buffers,
bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)?
These options can help especially when statement level controls aren't
easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If
needed users can also set them at the system level. For instance, one
can set bulk_write_ring_buffers to something other than 16MB, or to -1
to disable the ring buffer and use shared_buffers, and run a bunch of
bulk write queries.

Although I'm not quite opposing the idea of statement level controls
(like the VACUUM one proposed here), it is better to make these ring
buffer sizes configurable across the system to help with the other
similar cases e.g., a CTAS or RMV can help subsequent reads from
shared buffers if the ring buffer is skipped.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#17Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Melanie Plageman (#14)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

On Thu, Feb 23, 2023 at 3:03 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Hi,

So, I attached a rough implementation of both the autovacuum failsafe
revert to shared buffers and the vacuum option (no tests or docs or
anything).

Thanks for the patches. I have some comments.

0001:
1. I don't quite understand the need for this 0001 patch. Firstly,
bstrategy allocated once per autovacuum worker in AutovacMemCxt which
goes away with the process. Secondly, the worker exits after
do_autovacuum() with which memory context is gone. I think this patch
is unnecessary unless I'm missing something.

0002:
1. Don't we need to remove vac_strategy for analyze.c as well? It's
even more meaningless there than in vacuum.c, as we're passing bstrategy to
all required functions.

0004:
1. I think we shouldn't put multiple sentences in a single error message. How about
"of %d, changing it to %d"?
+ elog(WARNING, "buffer_usage_limit %d is below the
minimum buffer_usage_limit of %d. setting it to %d",

2. Typically, postgres error messages start with lowercase letters,
hints and detail messages start with uppercase letters.
+        if (buffers == 0)
+            elog(ERROR, "Use of shared buffers unsupported for buffer
access strategy: %s. nbuffers must be -1.",
+                    strategy_name);
+
+        if (buffers > 0)
+            elog(ERROR, "Specification of ring size in buffers
unsupported for buffer access strategy: %s. nbuffers must be -1.",
+                    strategy_name);
3. A function for this seems unnecessary, especially when a static
array would do the needful, something like forkNames[].
+static const char *
+btype_get_name(BufferAccessStrategyType btype)
+{
+    switch (btype)
+    {
4. Why are these assumptions needed? Can't we simplify by doing
validations on the new buffers parameter only when the btype is
BAS_VACUUM?
+        if (buffers == 0)
+            elog(ERROR, "Use of shared buffers unsupported for buffer
access strategy: %s. nbuffers must be -1.",
+                    strategy_name);
+    // TODO: DEBUG logging message for dev?
+    if (buffers == 0)
+        btype = BAS_NORMAL;
5. Is this change needed for this patch?
         default:
             elog(ERROR, "unrecognized buffer access strategy: %d",
-                 (int) btype);
-            return NULL;        /* keep compiler quiet */
+                    (int) btype);
+
+        pg_unreachable();

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#18Melanie Plageman
melanieplageman@gmail.com
In reply to: Bharath Rupireddy (#17)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

Thank you to all reviewers!

This email is in answer to the three reviews
done since my last version. Attached is v2 and inline below is replies
to all the review comments.

The main difference between this version and the previous version is
that I've added a guc, buffer_usage_limit, and the VACUUM option
BUFFER_USAGE_LIMIT is now specified as a size (kB, MB, etc).

I currently only use the guc value for VACUUM, but it is meant to be
used for all buffer access strategies and is configurable at the session
level.

I would prefer that we had the option of resizing the buffer access
strategy object per table being autovacuumed. Since autovacuum reloads
the config file between tables, this would be quite possible.

I started implementing this, but stopped because the code is not really
in a good state for that.

In fact, I'm not very happy with my implementation at all because I
think given the current structure of vacuum() and vacuum_rel(), it will
potentially make the code more confusing.

I don't like how autovacuum and vacuum use vacuum_rel() and vacuum()
differently (autovacuum always calls vacuum() with a list containing a
single relation). And vacuum() takes buffer access strategy as a
parameter, supposedly so that autovacuum can change the buffer access
strategy object per call, but it doesn't do that. And then vacuum() and
vacuum_rel() go and access VacuumParams at various places with no rhyme
or reason -- seemingly just based on the random availability of other
objects whose state they would like to check on. So, IMO, in adding a
"buffers" parameter to VacuumParams, I am asking for confusion in
autovacuum code with table-level VacuumParams containing a value for
buffers that isn't used.

On Mon, Feb 27, 2023 at 4:21 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-02-22 16:32:53 -0500, Melanie Plageman wrote:

The first patch in the set is to free the BufferAccessStrategy object
that is made in do_autovacuum() -- I don't see when the memory context
it is allocated in is destroyed, so it seems like it might be a leak?

The backend shuts down just after, so that's not a real issue. Not that it'd
hurt to fix it.

I've dropped that patch from the set.

I can see that making sense, particularly if we were to later extend this to
other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
data > 16MB but also << s_b vastly slower, but it can still be very important
to use if there's lots of parallel processes loading data.

Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
default value, 0 preventing use of a buffer access strategy, and 1..N
indicating the size in blocks?

I have found the implementation you suggested very hard to use.
The attached fourth patch in the set implements it the way you suggest.
I had to figure out what number to set the BUFFER_USAGE_LIMIT to -- and,
I had to figure out what number to set the BUFER_USAGE_LIMIT to -- and,
since I don't specify shared buffers in units of nbuffer, it's pretty
annoying to have to figure out a valid number.

I think we should be able to parse it in a similar way to how we parse
shared_buffers. You could even implement this as a GUC that is then set by
VACUUM (similar to how VACUUM FREEZE is implemented).

in the attached v2, I've used parse_int() to do this.
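For illustration, the kind of unit handling involved could be sketched standalone like this (a simplified stand-in for the spirit of guc.c's parse_int() with a kB base unit, not the real parser; parse_size_kb is a made-up name):

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch: accept a number with an optional kB/MB/GB suffix and return
 * the value in kB.  A simplified stand-in, not PostgreSQL's parser.
 */
static bool
parse_size_kb(const char *value, long *result_kb)
{
	char	   *end;
	long		num = strtol(value, &end, 10);

	if (end == value)
		return false;			/* no digits at all */
	while (isspace((unsigned char) *end))
		end++;

	if (*end == '\0' || strcmp(end, "kB") == 0)
		*result_kb = num;		/* bare numbers are taken as kB */
	else if (strcmp(end, "MB") == 0)
		*result_kb = num * 1024L;
	else if (strcmp(end, "GB") == 0)
		*result_kb = num * 1024L * 1024L;
	else
		return false;			/* unrecognized unit */
	return true;
}
```

With something of this shape, "256kB" and "16MB" both resolve to a kB count that can then be converted to a number of buffers.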

I think that it would be better to have it be either a percentage of
shared buffers or a size in units of bytes/kb/mb like that of shared
buffers.

I don't think a percentage of shared buffers works particularly well - you
very quickly run into the ringbuffer becoming impractically big.

It is now a size.

Would we want to set an upper limit lower than implied by the memory limit for
the BufferAccessStrategy allocation?

So, I was wondering what you thought about NBuffers / 8 (the current
limit). Does it make sense?

That seems *way* too big. Imagine how large allocations we'd end up with a
shared_buffers size of a few TB.

I'd probably make it a hard error at 1GB and a silent cap at NBuffers / 2 or
such.

Well, as I mentioned NBuffers / 8 is the current GetAccessStrategy()
cap.

In the attached patchset, I have introduced a hard cap of 16GB which is
enforced for the VACUUM option and for the buffer_usage_limit guc. I
kept the "silent cap" at NBuffers / 8 but am open to changing it to
NBuffers / 2 if we think it is okay for its silent cap to be different
than GetAccessStrategy()'s cap.
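Expressed as code, the clamping being discussed might look like this standalone sketch (the 16GB hard cap and the NBuffers / 8 divisor come from the discussion above; the function name is illustrative and nothing here is committed code):

```c
#include <assert.h>

#define BLCKSZ 8192
/* Hard cap from the discussion: 16GB, expressed in kB. */
#define MAX_RING_SIZE_KB (16L * 1024 * 1024)

/*
 * Convert a requested ring size in kB to a buffer count, enforcing the
 * hard 16GB cap and then silently clamping to NBuffers / 8, the same
 * divisor GetAccessStrategy() applies today.
 */
static int
ring_size_kb_to_nbuffers(long ring_size_kb, int NBuffers)
{
	int			nbuffers;

	/* the hard error; a real implementation would ereport(ERROR) */
	assert(ring_size_kb > 0 && ring_size_kb <= MAX_RING_SIZE_KB);

	nbuffers = (int) (ring_size_kb / (BLCKSZ / 1024));

	/* silent cap, mirroring GetAccessStrategy()'s NBuffers / 8 */
	if (nbuffers > NBuffers / 8)
		nbuffers = NBuffers / 8;

	return nbuffers;
}
```

E.g. with 128MB of shared buffers (NBuffers = 16384) a 256kB request passes through unchanged, while the full 16GB request is silently clamped to 2048 buffers.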

@@ -547,7 +547,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
"amcheck context",
ALLOCSET_DEFAULT_SIZES);
-     state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
+     state->checkstrategy = GetAccessStrategy(BAS_BULKREAD, -1);

/* Get true root block from meta-page */
metapage = palloc_btree_page(state, BTREE_METAPAGE);

Changing this everywhere seems pretty annoying, particularly because I suspect
a bunch of extensions also use GetAccessStrategy(). How about a
GetAccessStrategyExt(), GetAccessStrategyCustomSize() or such?

Yes, I don't know what I was thinking. Changed it to
GetAccessStrategyExt() -- though now I am thinking I don't like Ext and
want to change it.

BufferAccessStrategy
-GetAccessStrategy(BufferAccessStrategyType btype)
+GetAccessStrategy(BufferAccessStrategyType btype, int buffers)
{
BufferAccessStrategy strategy;
int                     ring_size;
+     const char *strategy_name = btype_get_name(btype);

Shouldn't be executed when we don't need it.

I got rid of it for now.

+     if (btype != BAS_VACUUM)
+     {
+             if (buffers == 0)
+                     elog(ERROR, "Use of shared buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
+                                     strategy_name);
+
+             if (buffers > 0)
+                     elog(ERROR, "Specification of ring size in buffers unsupported for buffer access strategy: %s. nbuffers must be -1.",
+                                     strategy_name);
+     }
+
+     // TODO: DEBUG logging message for dev?
+     if (buffers == 0)
+             btype = BAS_NORMAL;

GetAccessStrategy() often can be executed hundreds of thousands of times a
second, so I'm very sceptical that adding log messages to it is useful.

So, in the case of vacuum and autovacuum, I don't see how
GetAccessStrategyExt() could be called hundreds of thousands of times a
second. It is not even called for each table being vacuumed -- it is
only called before vacuuming a list of tables.

On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Jan 12, 2023 at 6:06 AM Andres Freund <andres@anarazel.de> wrote:

On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:

Should we just add "ring_buffers" to the existing "shared_buffers" and
"temp_buffers" settings?

The different types of ring buffers have different sizes, for good reasons. So
I don't see that working well. I also think it'd be more often useful to
control this on a statement basis - if you have a parallel import tool that
starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of
course each session can change the ring buffer settings, but still.

How about having GUCs for each ring buffer (bulk_read_ring_buffers,
bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)?
These options can help especially when statement level controls aren't
easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If
needed users can also set them at the system level. For instance, one
can set bulk_write_ring_buffers to something other than 16MB, or to -1
to disable the ring buffer and use shared_buffers, and run a bunch of
bulk write queries.

So, I've rebelled a bit and implemented a single guc,
buffer_usage_limit, in the attached patchset. Users can set it at the
session or system level or they can specify BUFFER_USAGE_LIMIT to
vacuum. It is the same size for all operations. By default all of this
would be the same as it is now.

The attached patchset does not use the guc for any operations except
VACUUM, though. I will add on another patch if people still feel
strongly that we cannot have a single guc. If the other operations use
this guc, I think we could get much of the same flexibility as having
multiple gucs by just being able to set it at the session level (or
having command options).

Although I'm not quite opposing the idea of statement level controls
(like the VACUUM one proposed here), it is better to make these ring
buffer sizes configurable across the system to help with the other
similar cases e.g., a CTAS or RMV can help subsequent reads from
shared buffers if the ring buffer is skipped.

Yes, I've done both.

On Tue, Feb 28, 2023 at 3:52 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Feb 23, 2023 at 3:03 AM Melanie Plageman
0001:
1. I don't quite understand the need for this 0001 patch. Firstly,
bstrategy allocated once per autovacuum worker in AutovacMemCxt which
goes away with the process. Secondly, the worker exits after
do_autovacuum() with which memory context is gone. I think this patch
is unnecessary unless I'm missing something.

I've dropped this one.

0002:
1. Don't we need to remove vac_strategy for analyze.c as well? It's
even more meaningless there than in vacuum.c, as we're passing bstrategy to
all required functions.

So, it is a bit harder to remove it from analyze because the
acquire_func callback doesn't take the buffer access strategy as a
parameter and acquire_sample_rows uses the vac_strategy global variable
to pass to table_scan_analyze_next_block().

We could change acquire_func, but it looks like FDW uses it, so I'm not
sure. It would be more for consistency than as a performance win, as I
imagine analyze is less of a problem than vacuum (i.e. it is probably
reading fewer blocks and probably not dirtying them [unless it does
on-access pruning?]).

I haven't done this in the attached set.

0004:
1. I think we shouldn't put multiple sentences in a single error message. How about
"of %d, changing it to %d"?
+ elog(WARNING, "buffer_usage_limit %d is below the
minimum buffer_usage_limit of %d. setting it to %d",

I've removed this message, but if I put back a message about clamping, I
will remember this note.

2. Typically, postgres error messages start with lowercase letters,
hints and detail messages start with uppercase letters.
+        if (buffers == 0)
+            elog(ERROR, "Use of shared buffers unsupported for buffer
access strategy: %s. nbuffers must be -1.",
+                    strategy_name);
+
+        if (buffers > 0)
+            elog(ERROR, "Specification of ring size in buffers
unsupported for buffer access strategy: %s. nbuffers must be -1.",
+                    strategy_name);

Thanks! I've removed some of the error messages for now, but, for the
ones I kept, I think they are consistent now with this pattern.

3. A function for this seems unnecessary, especially when a static
array would do the needful, something like forkNames[].
+static const char *
+btype_get_name(BufferAccessStrategyType btype)
+{
+    switch (btype)
+    {

I've removed it for now.

4. Why are these assumptions needed? Can't we simplify by doing
validations on the new buffers parameter only when the btype is
BAS_VACUUM?
+        if (buffers == 0)
+            elog(ERROR, "Use of shared buffers unsupported for buffer
access strategy: %s. nbuffers must be -1.",
+                    strategy_name);
+    // TODO: DEBUG logging message for dev?
+    if (buffers == 0)
+        btype = BAS_NORMAL;

So, I've moved validation to the vacuum option parsing for the vacuum
option and am using the guc infrastructure to check min and max for the
guc value.

5. Is this change needed for this patch?
default:
elog(ERROR, "unrecognized buffer access strategy: %d",
-                 (int) btype);
-            return NULL;        /* keep compiler quiet */
+                    (int) btype);
+
+        pg_unreachable();

The pg_unreachable() is removed, as I've left GetAccessStrategy()
untouched.

- Melanie

Attachments:

v2-0003-add-vacuum-option-to-specify-nbuffers-and-guc.patch (text/x-patch, +134 −4)
v2-0002-use-shared-buffers-when-failsafe-active.patch (text/x-patch, +5 −1)
v2-0001-remove-global-variable-vac_strategy.patch (text/x-patch, +7 −10)
#19Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#18)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Jan 12, 2023 at 6:06 AM Andres Freund <andres@anarazel.de> wrote:

On 2023-01-11 17:26:19 -0700, David G. Johnston wrote:

Should we just add "ring_buffers" to the existing "shared_buffers" and
"temp_buffers" settings?

The different types of ring buffers have different sizes, for good reasons. So
I don't see that working well. I also think it'd be more often useful to
control this on a statement basis - if you have a parallel import tool that
starts NCPU COPYs you'd want a smaller buffer than a single threaded COPY. Of
course each session can change the ring buffer settings, but still.

How about having GUCs for each ring buffer (bulk_read_ring_buffers,
bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)?
These options can help especially when statement level controls aren't
easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If
needed users can also set them at the system level. For instance, one
can set bulk_write_ring_buffers to something other than 16MB, or to -1
to disable the ring buffer and use shared_buffers, and run a bunch of
bulk write queries.

In attached v3, I've changed the name of the guc from buffer_usage_limit
to vacuum_buffer_usage_limit, since it is only used for vacuum and
autovacuum.

I haven't added the other suggested strategy gucs, as those could easily
be done in a future patchset.

I've also changed GetAccessStrategyExt() to GetAccessStrategyWithSize()
-- similar to initArrayResultWithSize().

And I've added tab completion for BUFFER_USAGE_LIMIT so that it is
easier to try out my patch.

Most of the TODOs in the code are related to the question of how
autovacuum uses the guc vacuum_buffer_usage_limit. autovacuum creates
the buffer access strategy ring in do_autovacuum() before looping
through and vacuuming tables. It passes this strategy object on to
vacuum(). Since we reuse the same strategy object for all tables in a
given invocation of do_autovacuum(), only failsafe autovacuum would
change buffer access strategies. This is probably okay, but it does mean
that the table-level VacuumParams variable, ring_size, means something
different for autovacuum than vacuum. Autovacuum workers will always
have set it to -1. We won't ever reach code in vacuum() which relies on
VacuumParams->ring_size as long as autovacuum workers pass a non-NULL
BufferAccessStrategy object to vacuum(), though.

- Melanie

Attachments:

v3-0002-use-shared-buffers-when-failsafe-active.patch (text/x-patch, +5 −1)
v3-0003-add-vacuum-option-to-specify-ring_size-and-guc.patch (text/x-patch, +136 −5)
v3-0001-remove-global-variable-vac_strategy.patch (text/x-patch, +7 −10)
#20Justin Pryzby
pryzby@telsasoft.com
In reply to: Melanie Plageman (#19)
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

On Sat, Mar 11, 2023 at 09:55:33AM -0500, Melanie Plageman wrote:

Subject: [PATCH v3 2/3] use shared buffers when failsafe active

+		/*
+		 * Assume the caller who allocated the memory for the
+		 * BufferAccessStrategy object will free it.
+		 */
+		vacrel->bstrategy = NULL;

This comment could use elaboration:

** VACUUM normally restricts itself to a small ring buffer; but in
failsafe mode, in order to process tables as quickly as possible, allow
it to leave behind a large number of dirty buffers.

Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc

#define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var))))
+#define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ)

Macros are normally capitalized

It's a good idea to write "(bufsize)", in case someone passes "a+b".
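With BLCKSZ = 8192, the difference those parentheses make can be shown directly (BUFSIZE_TO_NBUFFERS_BAD and BUFSIZE_TO_NBUFFERS are illustrative names for the unparenthesized and parenthesized variants; the patch's macro is the former shape):

```c
#include <assert.h>

#define BLCKSZ 8192

/* as in the patch: the argument expands unparenthesized */
#define BUFSIZE_TO_NBUFFERS_BAD(bufsize)  (bufsize * 1024 / BLCKSZ)

/* parenthesized: the argument is evaluated as a single unit */
#define BUFSIZE_TO_NBUFFERS(bufsize)  (((bufsize) * 1024) / BLCKSZ)

/*
 * BUFSIZE_TO_NBUFFERS_BAD(8 + 8) expands to (8 + 8 * 1024 / 8192),
 * i.e. 8 + 1 = 9, while the intended result for 16kB is
 * (16 * 1024) / 8192 = 2.
 */
```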

@@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype)
+BufferAccessStrategy
+GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size)

Maybe it would make sense for GetAccessStrategy() to call
GetAccessStrategyWithSize(). Or maybe not.

+		{"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM,
+			gettext_noop("Sets the buffer pool size for operations employing a buffer access strategy."),

The description should mention vacuum, if that's the scope of the GUC's
behavior.

+#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring.
+				# -1 to use default,
+				# 0 to disable vacuum buffer access strategy and use shared buffers
+				# > 0 to specify size

If I'm not wrong, there's still no documentation about "ring buffers" or
postgres' "strategy". Which seems important to do for this patch, along
with other documentation.

This patch should add support in vacuumdb.c. And maybe a comment about
adding support there, since it's annoying when the release notes one
year say "support VACUUM (FOO)" and then one year later say "support
vacuumdb --foo".

--
Justin

#21Ants Aasma
ants.aasma@cybertec.at
In reply to: Melanie Plageman (#19)
#22Melanie Plageman
melanieplageman@gmail.com
In reply to: Ants Aasma (#21)
#23Melanie Plageman
melanieplageman@gmail.com
In reply to: Justin Pryzby (#20)
#24Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Melanie Plageman (#19)
#25Ants Aasma
ants.aasma@cybertec.at
In reply to: Melanie Plageman (#22)
#26Ants Aasma
ants.aasma@cybertec.at
In reply to: Melanie Plageman (#23)
#27Justin Pryzby
pryzby@telsasoft.com
In reply to: Melanie Plageman (#23)
#28Melanie Plageman
melanieplageman@gmail.com
In reply to: Masahiko Sawada (#24)
#29Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#28)
#30Justin Pryzby
pryzby@telsasoft.com
In reply to: Melanie Plageman (#28)
#31Melanie Plageman
melanieplageman@gmail.com
In reply to: Justin Pryzby (#30)
#32Melanie Plageman
melanieplageman@gmail.com
In reply to: Ants Aasma (#25)
#33Ants Aasma
ants.aasma@cybertec.at
In reply to: Melanie Plageman (#32)
#34Melanie Plageman
melanieplageman@gmail.com
In reply to: Ants Aasma (#33)
#35David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#31)
#36Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#35)
#37David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#36)
#38Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#37)
#39David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#38)
#40Justin Pryzby
pryzby@telsasoft.com
In reply to: David Rowley (#39)
#41Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#39)
#42Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#41)
#43Justin Pryzby
pryzby@telsasoft.com
In reply to: Melanie Plageman (#42)
#44Melanie Plageman
melanieplageman@gmail.com
In reply to: Justin Pryzby (#43)
#45David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#41)
#46David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#44)
#47Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#46)
#48David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#47)
#49Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#48)
#50David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#49)
#51Peter Geoghegan
In reply to: David Rowley (#50)
#52Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#49)
#53Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#52)
#54Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#52)
#55Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#54)
#56David Rowley
dgrowleyml@gmail.com
In reply to: Peter Geoghegan (#51)
In reply to: David Rowley (#56)
#58Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#55)
#59Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#58)
#60David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#59)
#61Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#60)
#62David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#60)
#63Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#61)
#64David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#63)
#65Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#64)
#66David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#65)
#67Justin Pryzby
pryzby@telsasoft.com
In reply to: David Rowley (#64)
#68Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#66)
#69David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#68)
#70Peter Eisentraut
peter_e@gmx.net
In reply to: David Rowley (#69)
#71David Rowley
dgrowleyml@gmail.com
In reply to: Peter Eisentraut (#70)
#72David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#71)
#73Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#71)
#74David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#73)
#75Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David Rowley (#72)
#76David Rowley
dgrowleyml@gmail.com
In reply to: Masahiko Sawada (#75)
#77Daniel Gustafsson
daniel@yesql.se
In reply to: David Rowley (#76)
#78Melanie Plageman
melanieplageman@gmail.com
In reply to: Daniel Gustafsson (#77)
#79Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Melanie Plageman (#78)