Trigger more frequent autovacuums of heavy insert tables
Hi,
Because of the way autovacuum_vacuum_[insert]_scale_factor works,
autovacuums trigger less frequently as the relation gets larger.
See this math in relation_needs_vacanalyze:
vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;
For an insert-only table, nearly all the unvacuumed pages will be
eligible to be set all-visible and many will be eligible to be set
all-frozen.
Because normal vacuums can skip all-visible pages, proactively setting
these pages all-visible by vacuuming them sooner often reduces IO
overhead, as they are more likely to still be in shared buffers the
sooner they are vacuumed after last being touched.
Vacuuming these pages more proactively and setting them frozen also
helps amortize the work of aggressive vacuums -- which often
negatively impact the performance of the system.
By considering only the unfrozen portion of the table when calculating
the vacuum insert threshold, we can trigger vacuums more proactively
on insert-heavy tables. This changes the definition of
insert_scale_factor to a percentage of "active" table size. The
attached patch does this.
I've estimated the unfrozen percentage of the table by adding a new
field to pg_class, relallfrozen, which is updated in the same places
as relallvisible.
As an example of this patch in action, I designed a benchmark in which
a table is bulk-loaded with 1 GB of data using COPY FREEZE. Then I run
a custom pgbench script with four clients, each inserting 10 tuples per
transaction into the table for 1,000,000 transactions.
Note that I configured Postgres so that the effects of this patch
could be observed on a compressed timeline. At the bottom of this
mail, I've included details on all of the GUCs I set and why.
Over the course of the same number of transactions, master triggered 8
autovacuums of the table and the patch triggered 16.
With the patch, despite doing twice as many vacuums, autovacuum
workers did 10% fewer reads and 93% fewer writes.
At the end of the benchmark, the patched version of Postgres had
emitted twice as many FPIs as master.
More frequent vacuums means each vacuum scans fewer pages, but, more
interestingly, the first vacuum after a checkpoint is much more
efficient. With the patch, the first vacuum after a checkpoint emits
half as many FPIs. You can see that only 18 pages were newly dirtied.
So, with the patch, the pages being vacuumed are usually still in
shared buffers and still dirty.
Master
------
2024-10-22 13:53:14.293 EDT [3594] LOG: checkpoint starting: time
2024-10-22 13:53:27.849 EDT [3964] LOG: automatic vacuum of table "history"
pages: 0 removed, 753901 remain, 151589 scanned (20.11% of total)
I/O timings: read: 77.962 ms, write: 92.879 ms
avg read rate: 95.840 MB/s, avg write rate: 96.852 MB/s
buffer usage: 268318 hits, 35133 reads, 35504 dirtied
WAL usage: 218320 records, 98672 full page images, 71314906 bytes
Patch
-----
2024-10-22 13:48:43.951 EDT [1471] LOG: checkpoint starting: time
2024-10-22 13:48:59.741 EDT [1802] LOG: automatic vacuum of table "history"
pages: 0 removed, 774375 remain, 121367 scanned (15.67% of total)
I/O timings: read: 2.974 ms, write: 4.434 ms
avg read rate: 1.363 MB/s, avg write rate: 0.126 MB/s
buffer usage: 242817 hits, 195 reads, 18 dirtied
WAL usage: 121389 records, 49216 full page images, 34408291 bytes
While it is true that timing will change significantly from run to
run, I observed over many runs that the more frequent vacuums of the
table led to less overall overhead due to vacuuming pages before they
are evicted from shared buffers.
Below is a detailed description of the benchmark and Postgres configuration:
Benchmark
=========
Set these GUCs:
-- initial table data should fill shared buffers
shared_buffers=1GB
-- give us a chance to try and vacuum the table a bunch of times
autovacuum_naptime=2
-- all checkpoints should be triggered by timing
max/min_wal_size=150GB
-- let's get at least 1 checkpoint during the short benchmark
checkpoint_timeout='2min'
-- let's not be bottlenecked on WAL I/O
wal_buffers='128MB'
wal_compression='zstd'
-- let's get a lot of inserts done quickly
synchronous_commit='off'
-- let's not take too many breaks for vacuum delay
vacuum_cost_limit = 2000
-- so we can see what happened
log_checkpoints = on
log_autovacuum_min_duration=0
-- so we can get more stats
track_wal_io_timing=on
track_io_timing = on
First I created the table shown in the DDL below and loaded it by
running pgbench the same way as in the benchmark until there was 1 GB
of table data. Then I copied that data out to a file, 'history.data'.
I included an index because the more up-to-date visibility map would
benefit index-only scans -- which you could add to the benchmark if
you want.
DDL
---
BEGIN;
DROP TABLE IF EXISTS history;
CREATE TABLE history(
id BIGINT,
client_id INT NOT NULL,
mtime TIMESTAMPTZ DEFAULT NOW(),
data TEXT);
COPY history FROM 'history.data' WITH (freeze on);
CREATE INDEX ON history(id);
COMMIT;
pgbench \
--random-seed=0 \
--no-vacuum \
-M prepared \
-c 4 \
-j 4 \
-t 1000000 \
-R 27000 \
-f- <<EOF
INSERT INTO history(id, client_id, data)
VALUES
(:client_id, :client_id, repeat('a', 90)),
(:client_id, :client_id, repeat('b', 90)),
(:client_id, :client_id, repeat('c', 90)),
(:client_id, :client_id, repeat('d', 90)),
(:client_id, :client_id, repeat('e', 90)),
(:client_id, :client_id, repeat('f', 90)),
(:client_id, :client_id, repeat('g', 90)),
(:client_id, :client_id, repeat('h', 90)),
(:client_id, :client_id, repeat('i', 90)),
(:client_id, :client_id, repeat('j', 90));
EOF
- Melanie
Attachments:
v1-0002-Trigger-more-frequent-autovacuums-with-relallfroz.patch
v1-0001-Add-relallfrozen-to-pg_class.patch
On Tue, Oct 22, 2024 at 3:12 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
The attached patch does this.
I realized that I broke relation_statistics_update(). Attached v2 is fixed.
I've estimated the unfrozen percentage of the table by adding a new
field to pg_class, relallfrozen, which is updated in the same places
as relallvisible.
While my relallfrozen column correctly appears in pg_class, I noticed
that it seems like catalog/pg_class_d.h did not have my column added
(this file is auto-generated), despite my adding relallfrozen to
catalog/pg_class.h. Is there something else I have to do when adding a
new column to pg_class?
At the end of the benchmark, the patched version of Postgres had
emitted twice as many FPIs as master.
This was meant to say the reverse -- _master_ did twice as many FPIs
as the patch
- Melanie
Attachments:
v2-0001-Add-relallfrozen-to-pg_class.patch
v2-0002-Trigger-more-frequent-autovacuums-with-relallfroz.patch
I really appreciate all the work to make vacuum better. Anything that helps
our problem of autovacuum not scaling well for large tables is a win.
I'm not overly familiar with this part of the code base, but here are some
questions/ideas:
+ /*
+ * Every block marked all-frozen in the VM must also be marked
+ * all-visible.
+ */
+ if (new_rel_allfrozen > new_rel_allvisible)
+ new_rel_allfrozen = new_rel_allvisible;
+
Maybe tweak either the comment, or the code, as I read that comment as
meaning:
if (new_rel_allfrozen > new_rel_allvisible)
    new_rel_allvisible = new_rel_allfrozen;
+ /*
+  * If we are modifying relallvisible manually, it is not clear
+  * what relallfrozen value would make sense. Therefore, set it to
+  * -1, or unknown. It will be updated the next time these fields
+  * are updated.
+  */
+ replaces[ncols] = Anum_pg_class_relallfrozen;
+ values[ncols] = Int32GetDatum(-1);
Do we need some extra checks later on when we are actually using this to
prevent negative numbers in the calculations? It's only going to make
pcnt_unfrozen something like 1.0001 but still might want to skip that.
In autovacuum.c, seems we could simplify some of the logic there to this?:
if (relpages > 0 && reltuples > 0)
{
    relallfrozen = classForm->relallfrozen;
    relallvisible = classForm->relallvisible;
    if (relallvisible > relpages)
        relallvisible = relpages;
    if (relallfrozen > relallvisible)
        relallfrozen = relallvisible;
    pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);
}
vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor *
reltuples * pcnt_unfrozen;
Again, I'm not clear under what circumstances will relallvisible > relpages?
Cheers,
Greg
Thanks for the review!
On Thu, Oct 24, 2024 at 3:51 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
I really appreciate all the work to make vacuum better. Anything that helps our problem of autovacuum not scaling well for large tables is a win.
I'm not overly familiar with this part of the code base, but here are some questions/ideas:
+ /*
+  * Every block marked all-frozen in the VM must also be marked
+  * all-visible.
+  */
+ if (new_rel_allfrozen > new_rel_allvisible)
+     new_rel_allfrozen = new_rel_allvisible;

Maybe tweak either the comment, or the code, as I read that comment as meaning:

if (new_rel_allfrozen > new_rel_allvisible)
    new_rel_allvisible = new_rel_allfrozen;
I've updated it. An all-frozen block must also be all-visible, but not
all all-visible blocks are all-frozen.
+ /*
+  * If we are modifying relallvisible manually, it is not clear
+  * what relallfrozen value would make sense. Therefore, set it to
+  * -1, or unknown. It will be updated the next time these fields
+  * are updated.
+  */
+ replaces[ncols] = Anum_pg_class_relallfrozen;
+ values[ncols] = Int32GetDatum(-1);

Do we need some extra checks later on when we are actually using this
to prevent negative numbers in the calculations? It's only going to
make pcnt_unfrozen something like 1.0001 but still might want to skip
that.
Great point! I've added this.
In autovacuum.c, seems we could simplify some of the logic there to this?:
if (relpages > 0 && reltuples > 0)
{
    relallfrozen = classForm->relallfrozen;
    relallvisible = classForm->relallvisible;
    if (relallvisible > relpages)
        relallvisible = relpages;
    if (relallfrozen > relallvisible)
        relallfrozen = relallvisible;
    pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);
}

vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor *
reltuples * pcnt_unfrozen;
I've done something similar to this in attached v2.
Again, I'm not clear under what circumstances will relallvisible > relpages?
I think this is mostly if someone manually updated the relation stats,
so we clamp it for safety.
- Melanie
Attachments:
v3-0002-Trigger-more-frequent-autovacuums-with-relallfroz.patch
v3-0001-Add-relallfrozen-to-pg_class.patch
On Fri, Oct 25, 2024 at 11:14 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I've done something similar to this in attached v2.
This needed a rebase. See attached v4.
- Melanie
Attachments:
v4-0001-Add-relallfrozen-to-pg_class.patch
v4-0002-Trigger-more-frequent-autovacuums-with-relallfroz.patch
On Thu, Jan 16, 2025 at 4:43 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Fri, Oct 25, 2024 at 11:14 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I've done something similar to this in attached v2.
This needed a rebase. See attached v4.
Whoops -- docs didn't build. Attached v5.
- Melanie
Attachments:
v5-0002-Trigger-more-frequent-autovacuums-with-relallfroz.patch
v5-0001-Add-relallfrozen-to-pg_class.patch
On Thu, Jan 16, 2025 at 5:50 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Thu, Jan 16, 2025 at 4:43 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
This needed a rebase. See attached v4.
Whoops -- docs didn't build. Attached v5.
Outside of the positive performance impact of vacuuming pages before
they go cold (detailed in my first email [1]), there is also a
substantial positive effect with this patch for large tables with
substantial cold regions: fewer anti-wraparound vacuums and more
frequent normal/aggressive vacuums.
With the default vacuum settings, you often see an append-only table
devolve to _only_ anti-wraparound vacuums after the first aggressive
vacuum. I ran an insert-only workload for an hour (with 32 clients and
synchronous commit off to maximize the amount of data inserted) with
the default vacuum settings. On master, after the first aggressive
vacuum, we do only anti-wraparound vacuums (and only two of these are
triggered). With the patch, after the first aggressive vacuum, 10 more
vacuums are triggered -- none of which are anti-wraparound vacuums.
I attached a chart comparing the autovacuums triggered on master vs
with the patch.
Besides the performance benefit of spreading the freezing work over
more normal vacuums (thereby disrupting foreground workloads less),
anti-wraparound vacuums are not auto canceled by DDL -- making them
more of a nuisance to users.
[1]: /messages/by-id/CAAKRu_aj-P7YyBz_cPNwztz6ohP+vWis=iz3YcomkB3NpYA--w@mail.gmail.com
Attachments:
autovacuums.png
Hi Melanie Plageman
Thank you for working on this. Actually, there were two patches aimed
at optimizing when vacuums are triggered, and one of them reached
consensus and has been committed:
https://commitfest.postgresql.org/52/5046/ ,
https://commitfest.postgresql.org/51/5395/ . Maybe referring to the
already-committed patch and setting a maximum value for
vacuum_max_ins_threshold would be more acceptable.
Thanks
On Thu, Feb 6, 2025 at 6:08 AM Melanie Plageman <melanieplageman@gmail.com>
wrote:
Attached v6 is rebased over 306dc520b9dfd60
On Wed, Feb 5, 2025 at 8:54 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
Hi Melanie Plageman
Maybe referring to the already committed patch and setting a maximum
value for vacuum_max_ins_threshold would be more acceptable.
We could add autovacuum_vacuum_insert_max_threshold, but with an
insert-only workload, we can expect that the cold data is being
frozen. By calculating the threshold based on unfrozen data, we are
effectively capping the threshold for inserted data without adding
another guc. If any of that data is being unfrozen via updates or
deletes, then the autovacuum_vacuum_max_threshold would apply.
Perhaps I'm missing a case where calculating the insert threshold on
unfrozen data would not act as a cap, in which case I could get on
board with a guc.
- Melanie
Attachments:
v6-0002-Trigger-more-frequent-autovacuums-with-relallfroz.patch
v6-0001-Add-relallfrozen-to-pg_class.patch
Hi
Actually, I like your solution. I even think this formula could use
the pcnt_unfrozen parameter:

vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples *
pcnt_unfrozen;
Thanks
On Thu, Feb 6, 2025 at 11:42 PM Melanie Plageman <melanieplageman@gmail.com>
wrote:
On Thu, 06 Feb 2025 at 10:42, Melanie Plageman <melanieplageman@gmail.com> wrote:
Attached v6 is rebased over 306dc520b9dfd60
Makes sense.
It appears that there is incorrect indentation in the config.sgml
file, and a <literal> tag was accidentally removed:
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 38244409e3c..571c73668f9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8652,10 +8652,10 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</term>
<listitem>
<para>
- Specifies a fraction of the table size to add to
+ Specifies a fraction of the active (unfrozen) table size to add to
<varname>autovacuum_vacuum_insert_threshold</varname>
when deciding whether to trigger a <command>VACUUM</command>.
- The default is <literal>0.2</literal> (20% of table size).
+ The default is <literal>0.2</literal> (20% of active table size).
This parameter can only be set in the <filename>postgresql.conf</filename>
file or on the server command line;
but the setting can be overridden for individual tables by
--
Regrads,
Japin Li
On Tue, Oct 22, 2024 at 03:12:53PM -0400, Melanie Plageman wrote:
By considering only the unfrozen portion of the table when calculating
the vacuum insert threshold, we can trigger vacuums more proactively
on insert-heavy tables. This changes the definition of
insert_scale_factor to a percentage of "active" table size. The
attached patch does this.
I think this is a creative idea. My first reaction is to question whether
it makes sense to have two strategies for this sort of thing:
autovacuum_vacuum_max_threshold for updates/deletes and this for inserts.
Perhaps we don't want to more aggressively clean up bloat (except for the
very largest tables via the hard cap), but we do want to more aggressively
mark newly-inserted tuples frozen. I'm curious what you think.
I've estimated the unfrozen percentage of the table by adding a new
field to pg_class, relallfrozen, which is updated in the same places
as relallvisible.
Wouldn't relallvisible be sufficient here? We'll skip all-visible pages
unless this is an anti-wraparound vacuum, at which point I would think the
insert threshold goes out the window.
More frequent vacuums means each vacuum scans fewer pages, but, more
interestingly, the first vacuum after a checkpoint is much more
efficient. With the patch, the first vacuum after a checkpoint emits
half as many FPIs. You can see that only 18 pages were newly dirtied.
So, with the patch, the pages being vacuumed are usually still in
shared buffers and still dirty.
Are you aware of any scenarios where your proposed strategy might make
things worse? From your test results, it sounds like these vacuums ought
to usually be relatively efficient, so sending insert-only tables to the
front of the line is normally okay, but maybe that's not always true.
--
nathan
On Fri, Feb 7, 2025 at 12:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Tue, Oct 22, 2024 at 03:12:53PM -0400, Melanie Plageman wrote:
By considering only the unfrozen portion of the table when calculating
the vacuum insert threshold, we can trigger vacuums more proactively
on insert-heavy tables. This changes the definition of
insert_scale_factor to a percentage of "active" table size. The
attached patch does this.

I think this is a creative idea.
Indeed. I can't take much credit for it -- Andres suggested this
direction during an off-list conversation where I was complaining
about how difficult it was to benchmark my vacuum eager scanning patch
set [1] because normal vacuums were so rarely triggered for
insert-only tables after the first aggressive vacuum.
My first reaction is to question whether
it makes sense to have two strategies for this sort of thing:
autovacuum_vacuum_max_threshold for updates/deletes and this for inserts.
Perhaps we don't want to more aggressively clean up bloat (except for the
very largest tables via the hard cap), but we do want to more aggressively
mark newly-inserted tuples frozen. I'm curious what you think.
The goal with insert-only tables is to set the whole page frozen in
the VM. So, the number of pages is more important than the total
number of tuples inserted. Whereas, with updates/deletes, it seems
like the total amount of garbage (# tuples) needing cleaning is more
important.
My intuition (maybe wrong) is that it is more common to have a bunch
of pages with a single (or few) updates/deletes than it is to have a
bunch of pages with a single insert. This patch is mostly meant to
trigger vacuums sooner on large insert-only or bulk loaded tables.
Though, it is more common to have a cluster of hot pages than
uniformly distributed updates and deletes...
I've estimated the unfrozen percentage of the table by adding a new
field to pg_class, relallfrozen, which is updated in the same places
as relallvisible.

Wouldn't relallvisible be sufficient here? We'll skip all-visible pages
unless this is an anti-wraparound vacuum, at which point I would think the
insert threshold goes out the window.
It's a great question. There are a couple reasons why I don't think so.
I think this might lead to triggering vacuums too often for
insert-mostly tables. For those tables, the pages that are not
all-visible will largely be just those with data that is new since the
last vacuum. And if we trigger vacuums based off of the % not
all-visible, we might decrease the number of cases where we are able
to vacuum inserted data and freeze it the first time it is vacuumed --
thereby increasing the total amount of work.
As for your point about us skipping all-visible pages except in
anti-wraparound vacuums -- that's not totally true. Autovacuums
triggered by the insert or update/delete thresholds and not by
autovacuum_freeze_max_age can also be aggressive (that's based on
vacuum_freeze_table_age). Aggressive vacuums scan all-visible pages.
And we actually want to trigger more normal aggressive (non-anti-wrap)
vacuums because anti-wraparound vacuums are not canceled by
conflicting lock requests (like those needed by DDL) -- see
PROC_VACUUM_FOR_WRAPAROUND in ProcSleep().
We also scan a surprising number of all-visible pages in practice due
to SKIP_PAGES_THRESHOLD. I was pretty taken aback while testing [1]
how many all-visible pages we scan due to this optimization. And, I'm
planning on merging [1] in the next few days, so this will also
increase the number of all-visible pages scanned during normal
vacuums.
More frequent vacuums means each vacuum scans fewer pages, but, more
interestingly, the first vacuum after a checkpoint is much more
efficient. With the patch, the first vacuum after a checkpoint emits
half as many FPIs. You can see that only 18 pages were newly dirtied.
So, with the patch, the pages being vacuumed are usually still in
shared buffers and still dirty.

Are you aware of any scenarios where your proposed strategy might make
things worse? From your test results, it sounds like these vacuums ought
to usually be relatively efficient, so sending insert-only tables to the
front of the line is normally okay, but maybe that's not always true.
So, of course they aren't exactly at the front of the line since we
autovacuum based on the order in pg_class. But, I suppose if you spend
a bunch of time vacuuming an insert-mostly table you previously would
have skipped instead of some other table -- that is effectively
prioritizing the insert-mostly tables.
For insert-only/mostly tables, what you are ideally doing is vacuuming
more frequently and handling a small number of pages each vacuum of
the relation, so it has a low performance impact. I suppose if you
only have a few autovacuum workers and an equal number of massive
insert-only tables, you could end up starving other actively updated
tables of vacuum resources. But, those insert-only tables would have
to be vacuumed eventually -- and I imagine that the impact of a
massive aggressive vacuum of all of the data in those tables would be
more disruptive than some extra bloat in your other tables.
I'd be interested if other people with more field experience can
imagine starvation scenarios that would be much worse with this patch.
What kinds of starvation scenarios do you normally see?
In terms of specific, dramatic differences in behavior (since this
wouldn't be hidden behind a guc) people might be surprised by how soon
tables start being vacuumed after a huge COPY FREEZE.
- Melanie
On Fri, Feb 07, 2025 at 02:21:07PM -0500, Melanie Plageman wrote:
On Fri, Feb 7, 2025 at 12:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
My first reaction is to question whether
it makes send to have two strategies for this sort of thing:
autovacuum_vacuum_max_threshold for updates/deletes and this for inserts.
Perhaps we don't want to more aggressively clean up bloat (except for the
very largest tables via the hard cap), but we do want to more aggressively
mark newly-inserted tuples frozen. I'm curious what you think.

The goal with insert-only tables is to set the whole page frozen in
the VM. So, the number of pages is more important than the total
number of tuples inserted. Whereas, with updates/deletes, it seems
like the total amount of garbage (# tuples) needing cleaning is more
important.
I think this is a reasonable position. To be clear, I don't have a problem
with having different strategies, or even with swapping
autovacuum_vacuum_max_threshold with a similar change, if it's the right
thing to do. I just want to be able to articulate why they're different.
Wouldn't relallvisible be sufficient here? We'll skip all-visible pages
unless this is an anti-wraparound vacuum, at which point I would think the
insert threshold goes out the window.

It's a great question. There are a couple reasons why I don't think so.
I think this might lead to triggering vacuums too often for
insert-mostly tables. For those tables, the pages that are not
all-visible will largely be just those with data that is new since the
last vacuum. And if we trigger vacuums based off of the % not
all-visible, we might decrease the number of cases where we are able
to vacuum inserted data and freeze it the first time it is vacuumed --
thereby increasing the total amount of work.
Rephrasing to make sure I understand correctly: you're saying that using
all-frozen would trigger less frequent insert vacuums, which would give us
a better chance of freezing more than more frequent insert vacuums
triggered via all-visible? My suspicion is that the difference would tend
to be quite subtle in practice, but I have no concrete evidence to back
that up.
--
nathan
On Fri, Feb 7, 2025 at 3:38 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
Rephrasing to make sure I understand correctly: you're saying that using
all-frozen would trigger less frequent insert vacuums, which would give us
a better chance of freezing more than more frequent insert vacuums
triggered via all-visible? My suspicion is that the difference would tend
to be quite subtle in practice, but I have no concrete evidence to back
that up.
You understood me correctly.
As for relallfrozen, one of the justifications for adding it to
pg_class is actually for the visibility it would provide. We have no
way of knowing how many all-visible but not all-frozen pages there are
on users' systems without pg_visibility. If users had this
information, they could potentially tune their freeze-related settings
more aggressively. Regularly reading the whole visibility map with
pg_visibility_map_summary() is pretty hard to justify on most
production systems. But querying pg_class every 10 minutes or
something is much more reasonable.
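As a sketch of the kind of monitoring this would enable (assuming the patch's relallfrozen column exists in pg_class; this query is my illustration, not part of the patch):

```sql
-- Estimate the unfrozen fraction of each table from pg_class alone,
-- without scanning the visibility map. relallfrozen is the proposed
-- new column; relpages and relallvisible already exist.
SELECT relname,
       relpages,
       relallvisible,
       relallfrozen,
       CASE WHEN relpages > 0
            THEN 1.0 - relallfrozen::float8 / relpages
            ELSE 0
       END AS est_unfrozen_fraction
FROM pg_class
WHERE relkind = 'r'
ORDER BY est_unfrozen_fraction DESC;
```

A query like this could run every few minutes from a monitoring system at negligible cost, whereas pg_visibility_map_summary() has to read the whole visibility map for each relation.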
- Melanie
On Fri, Feb 07, 2025 at 03:57:49PM -0500, Melanie Plageman wrote:
As for relallfrozen, one of the justifications for adding it to
pg_class is actually for the visibility it would provide. We have no
way of knowing how many all-visible but not all-frozen pages there are
on users' systems without pg_visibility. If users had this
information, they could potentially tune their freeze-related settings
more aggressively. Regularly reading the whole visibility map with
pg_visibility_map_summary() is pretty hard to justify on most
production systems. But querying pg_class every 10 minutes or
something is much more reasonable.
If we need it anyway, then I have no objections to using a freeze-related
metric for a freeze-related feature.
Okay, I'll actually look at the patches next...
--
nathan
On Fri, Feb 07, 2025 at 03:05:09PM -0600, Nathan Bossart wrote:
Okay, I'll actually look at the patches next...
Ugh, it's already been 10 days since I said that. A couple of thoughts on
0001:
I'm not sure I understand the reason for capping relallfrozen to
relallvisible. From upthread, I gather this is mostly to deal with manual
statistics manipulation, but my first reaction is that we should just let
those values be bogus. Is there something that fundamentally requires
relallfrozen to be <= relallvisible? These are only estimates, so I don't
think it would be that surprising for them to defy this expectation.
Should we allow manipulating relallfrozen like we do relallvisible? My
assumption is that would even be required for the ongoing statistics
import/export work.
Upthread, you mentioned that you weren't seeing relallfrozen in
pg_class_d.h. I checked on my machine and see it there as expected. Are
you still missing it?
--
nathan
On Mon, Feb 17, 2025 at 11:11 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
On Fri, Feb 07, 2025 at 03:05:09PM -0600, Nathan Bossart wrote:
Okay, I'll actually look at the patches next...
Thanks for taking a look!
I'm not sure I understand the reason for capping relallfrozen to
relallvisible. From upthread, I gather this is mostly to deal with manual
statistics manipulation, but my first reaction is that we should just let
those values be bogus. Is there something that fundamentally requires
relallfrozen to be <= relallvisible? These are only estimates, so I don't
think it would be that surprising for them to defy this expectation.
I wasn't quite sure what to do here. I see your perspective: for
example, relallvisible can't possibly be more than relpages, but we
don't do any validation of that. My rationale wasn't exactly
principled, so I'll change it to not cap relallfrozen.
This makes me think I should also not cap relallfrozen when using it
in relation_needs_vacanalyze(). There I cap it to relallvisible and
relallvisible is capped to relpages. One of the ideas behind letting
people modify these stats in pg_class is that they can change a single
field to see what the effect on their system is, right?
Should we allow manipulating relallfrozen like we do relallvisible? My
assumption is that would even be required for the ongoing statistics
import/export work.
Why would it be required for the statistics import/export work?
Upthread, you mentioned that you weren't seeing relallfrozen in
pg_class_d.h. I checked on my machine and see it there as expected. Are
you still missing it?
I see it now. No idea what was happening.
- Melanie
On Wed, Feb 19, 2025 at 04:36:05PM -0500, Melanie Plageman wrote:
This makes me think I should also not cap relallfrozen when using it
in relation_needs_vacanalyze(). There I cap it to relallvisible and
relallvisible is capped to relpages. One of the ideas behind letting
people modify these stats in pg_class is that they can change a single
field to see what the effect on their system is, right?
Right. Capping these values to reflect reality seems like it could make
that more difficult.
Should we allow manipulating relallfrozen like we do relallvisible? My
assumption is that would even be required for the ongoing statistics
import/export work.

Why would it be required for the statistics import/export work?
It's probably not strictly required, but my naive expectation would be that
we'd handle relallfrozen just like relallvisible, which appears to be
dumped in the latest stats import/export patch. Is there any reason we
shouldn't do the same for relallfrozen?
--
nathan
On Wed, Feb 19, 2025 at 4:59 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Feb 19, 2025 at 04:36:05PM -0500, Melanie Plageman wrote:
This makes me think I should also not cap relallfrozen when using it
in relation_needs_vacanalyze(). There I cap it to relallvisible and
relallvisible is capped to relpages. One of the ideas behind letting
people modify these stats in pg_class is that they can change a single
field to see what the effect on their system is, right?

Right. Capping these values to reflect reality seems like it could make
that more difficult.
Attached v7 doesn't cap the result for manual stats updating done with
relation_statistics_update(). I did, however, keep the cap in the
places where vacuum/analyze/create index update the stats. There the
number for relallfrozen comes directly from visibilitymap_count(),
so it should be correct. I could perhaps add an assert instead, but I
didn't think that really made sense: an assert is meant to help the
developer, and what could the developer do about the visibility map
being corrupted?
Should we allow manipulating relallfrozen like we do relallvisible? My
assumption is that would even be required for the ongoing statistics
import/export work.

Why would it be required for the statistics import/export work?
It's probably not strictly required, but my naive expectation would be that
we'd handle relallfrozen just like relallvisible, which appears to be
dumped in the latest stats import/export patch. Is there any reason we
shouldn't do the same for relallfrozen?
Nope, I don't think so, but I also don't know how people are
envisioning using a manually updated relallvisible.
- Melanie