New criteria for autovacuum
Hi hackers,
Some time ago I investigated a slow query performance case for one of our customers
and noticed that an index-only scan had made a lot of heap fetches:
-> Index Only Scan using ix_client_objects_vendor_object_id on client_objects client_objects_1 (cost=0.56..2.78 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=208081)
   Index Cond: (vendor_object_id = vendor_objects.id)
   Heap Fetches: 208081
   Buffers: shared hit=1092452 read=156034
So almost every index entry requires a visibility check, and the index-only scan
is effectively a normal index scan.
This certainly has a bad impact on performance.
I do not know what happened in this particular case: why the pages were not
marked as all-visible, and why an index-only scan plan was still chosen by the optimizer.
But the problem can be quite easily reproduced. We can just populate a table
with some data while some other transaction with an assigned XID is active,
and then explicitly vacuum this table or wait until autovacuum does it.
At this point the table has no more dead or inserted tuples, so autovacuum
will not be called for it again. But the heap pages of this table are still not
marked as all-visible, and they will never be marked as all-visible unless the
table is updated or explicitly vacuumed.
This is why I think it may be useful to add more columns to the
pg_stat_all_tables and pg_stat_all_indexes views providing information about
heap visibility checks performed by index-only scans, and, in addition to the
number of dead/inserted tuples, to use the number of such visibility checks as
a criterion for triggering autovacuum on a particular table.
The proposed patch is attached.
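To illustrate, with the patch applied the new counters and per-table settings would be used roughly like this (column and parameter names are taken from the patch itself; the trigger condition mirrors the existing insert threshold, i.e. checks since last vacuum > threshold + scale_factor * reltuples):
```
-- heap visibility checks performed by index-only scans, in total and since
-- the last vacuum (columns added by the patch)
select relname, n_tup_check, n_check_since_vacuum
from pg_stat_all_tables
where relname = 't';

-- the new criterion can also be tuned per table, like the existing thresholds
alter table t set (autovacuum_vacuum_check_threshold = 1000,
                   autovacuum_vacuum_check_scale_factor = 0.2);
```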
I am not quite happy with the test: it is intended to check whether autovacuum
is really triggered by this new criterion, but it depends on the autovacuum
activation frequency and may take several seconds.
I will be glad to receive any feedback.
Attachments:
check_autovacuum-v1.patch (text/plain)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fea683cb49c..f8dc1d04deb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8929,6 +8929,28 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-autovacuum-vacuum-check-threshold" xreflabel="autovacuum_vacuum_check_threshold">
+ <term><varname>autovacuum_vacuum_check_threshold</varname> (<type>integer</type>)
+ <indexterm>
+ <primary><varname>autovacuum_vacuum_check_threshold</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the number of heap tuple visibility checks by index-only scan needed to trigger a
+ <command>VACUUM</command> in any one table.
+ The default is 1000 tuples. If -1 is specified, autovacuum will not
+ trigger a <command>VACUUM</command> operation on any tables based on
+ the number of visibility checks.
+ This parameter can only be set in the <filename>postgresql.conf</filename>
+ file or on the server command line;
+ but the setting can be overridden for individual tables by
+ changing table storage parameters.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-autovacuum-analyze-threshold" xreflabel="autovacuum_analyze_threshold">
<term><varname>autovacuum_analyze_threshold</varname> (<type>integer</type>)
<indexterm>
@@ -8990,6 +9012,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-autovacuum-vacuum-check-scale-factor" xreflabel="autovacuum_vacuum_check_scale_factor">
+ <term><varname>autovacuum_vacuum_check_scale_factor</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>autovacuum_vacuum_check_scale_factor</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies a fraction of the table size to add to
+ <varname>autovacuum_vacuum_check_threshold</varname>
+ when deciding whether to trigger a <command>VACUUM</command>.
+ The default is 0.2 (20% of table size).
+ This parameter can only be set in the <filename>postgresql.conf</filename>
+ file or on the server command line;
+ but the setting can be overridden for individual tables by
+ changing table storage parameters.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-autovacuum-analyze-scale-factor" xreflabel="autovacuum_analyze_scale_factor">
<term><varname>autovacuum_analyze_scale_factor</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 46c1dce222d..f93cf1671ed 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1869,6 +1869,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_max_threshold)},
{"autovacuum_vacuum_insert_threshold", RELOPT_TYPE_INT,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_ins_threshold)},
+ {"autovacuum_vacuum_check_threshold", RELOPT_TYPE_INT,
+ offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_check_threshold)},
{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, analyze_threshold)},
{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,
@@ -1895,6 +1897,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_scale_factor)},
{"autovacuum_vacuum_insert_scale_factor", RELOPT_TYPE_REAL,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_ins_scale_factor)},
+ {"autovacuum_vacuum_check_scale_factor", RELOPT_TYPE_REAL,
+ offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_check_scale_factor)},
{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, analyze_scale_factor)},
{"user_catalog_table", RELOPT_TYPE_BOOL,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 31d269b7ee0..dfaf8e51ac6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -700,7 +700,9 @@ CREATE VIEW pg_stat_all_tables AS
pg_stat_get_total_vacuum_time(C.oid) AS total_vacuum_time,
pg_stat_get_total_autovacuum_time(C.oid) AS total_autovacuum_time,
pg_stat_get_total_analyze_time(C.oid) AS total_analyze_time,
- pg_stat_get_total_autoanalyze_time(C.oid) AS total_autoanalyze_time
+ pg_stat_get_total_autoanalyze_time(C.oid) AS total_autoanalyze_time,
+ pg_stat_get_check_since_vacuum(C.oid) AS n_check_since_vacuum,
+ pg_stat_get_tuples_checked(C.oid) AS n_tup_check
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
@@ -721,7 +723,8 @@ CREATE VIEW pg_stat_xact_all_tables AS
pg_stat_get_xact_tuples_updated(C.oid) AS n_tup_upd,
pg_stat_get_xact_tuples_deleted(C.oid) AS n_tup_del,
pg_stat_get_xact_tuples_hot_updated(C.oid) AS n_tup_hot_upd,
- pg_stat_get_xact_tuples_newpage_updated(C.oid) AS n_tup_newpage_upd
+ pg_stat_get_xact_tuples_newpage_updated(C.oid) AS n_tup_newpage_upd,
+ pg_stat_get_xact_tuples_checked(C.oid) AS n_tup_check
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
@@ -802,7 +805,8 @@ CREATE VIEW pg_stat_all_indexes AS
pg_stat_get_numscans(I.oid) AS idx_scan,
pg_stat_get_lastscan(I.oid) AS last_idx_scan,
pg_stat_get_tuples_returned(I.oid) AS idx_tup_read,
- pg_stat_get_tuples_fetched(I.oid) AS idx_tup_fetch
+ pg_stat_get_tuples_fetched(I.oid) AS idx_tup_fetch,
+ pg_stat_get_tuples_checked(I.oid) AS idx_tup_check
FROM pg_class C JOIN
pg_index X ON C.oid = X.indrelid JOIN
pg_class I ON I.oid = X.indexrelid
@@ -1084,7 +1088,8 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_sessions_killed(D.oid) AS sessions_killed,
pg_stat_get_db_parallel_workers_to_launch(D.oid) as parallel_workers_to_launch,
pg_stat_get_db_parallel_workers_launched(D.oid) as parallel_workers_launched,
- pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
+ pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset,
+ pg_stat_get_db_tuples_checked(D.oid) AS tup_checked
FROM (
SELECT 0 AS oid, NULL::name AS datname
UNION ALL
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f464cca9507..321da2bcda3 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -29,6 +29,7 @@
* ExecIndexOnlyScanInitializeWorker attach to DSM info in parallel worker
*/
#include "postgres.h"
+#include "pgstat.h"
#include "access/genam.h"
#include "access/relscan.h"
@@ -166,6 +167,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
* Rats, we have to visit the heap to check visibility.
*/
InstrCountTuples2(node, 1);
+ pgstat_count_heap_check(scandesc->heapRelation);
+ pgstat_count_heap_check(scandesc->indexRelation);
if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
continue; /* no visible tuple, try next index entry */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2513a8ef8a6..05b92bf330b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -125,6 +125,8 @@ int autovacuum_vac_max_thresh;
double autovacuum_vac_scale;
int autovacuum_vac_ins_thresh;
double autovacuum_vac_ins_scale;
+int autovacuum_vac_check_thresh;
+double autovacuum_vac_check_scale;
int autovacuum_anl_thresh;
double autovacuum_anl_scale;
int autovacuum_freeze_max_age;
@@ -2945,19 +2947,23 @@ relation_needs_vacanalyze(Oid relid,
int vac_base_thresh,
vac_max_thresh,
vac_ins_base_thresh,
+ vac_check_base_thresh,
anl_base_thresh;
float4 vac_scale_factor,
vac_ins_scale_factor,
+ vac_check_scale_factor,
anl_scale_factor;
/* thresholds calculated from above constants */
float4 vacthresh,
vacinsthresh,
+ vaccheckthresh,
anlthresh;
/* number of vacuum (resp. analyze) tuples at this time */
float4 vactuples,
instuples,
+ checktuples,
anltuples;
/* freeze parameters */
@@ -2999,6 +3005,15 @@ relation_needs_vacanalyze(Oid relid,
? relopts->vacuum_ins_threshold
: autovacuum_vac_ins_thresh;
+ vac_check_scale_factor = (relopts && relopts->vacuum_check_scale_factor >= 0)
+ ? relopts->vacuum_check_scale_factor
+ : autovacuum_vac_check_scale;
+
+ /* -1 is used to disable check vacuums */
+ vac_check_base_thresh = (relopts && relopts->vacuum_check_threshold >= -1)
+ ? relopts->vacuum_check_threshold
+ : autovacuum_vac_check_thresh;
+
anl_scale_factor = (relopts && relopts->analyze_scale_factor >= 0)
? relopts->analyze_scale_factor
: autovacuum_anl_scale;
@@ -3032,7 +3047,7 @@ relation_needs_vacanalyze(Oid relid,
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
force_vacuum = MultiXactIdIsValid(relminmxid) &&
- MultiXactIdPrecedes(relminmxid, multiForceLimit);
+ MultiXactIdPrecedes(relminmxid, multiForceLimit);
}
*wraparound = force_vacuum;
@@ -3061,6 +3076,7 @@ relation_needs_vacanalyze(Oid relid,
vactuples = tabentry->dead_tuples;
instuples = tabentry->ins_since_vacuum;
anltuples = tabentry->mod_since_analyze;
+ checktuples = tabentry->check_since_vacuum;
/* If the table hasn't yet been vacuumed, take reltuples as zero */
if (reltuples < 0)
@@ -3089,6 +3105,7 @@ relation_needs_vacanalyze(Oid relid,
vacinsthresh = (float4) vac_ins_base_thresh +
vac_ins_scale_factor * reltuples * pcnt_unfrozen;
+ vaccheckthresh = (float4) vac_check_base_thresh + vac_check_scale_factor * reltuples;
anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
/*
@@ -3096,18 +3113,27 @@ relation_needs_vacanalyze(Oid relid,
* reset, because if that happens, the last vacuum and analyze counts
* will be reset too.
*/
- if (vac_ins_base_thresh >= 0)
- elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
+ if (vac_ins_base_thresh >= 0 && vac_check_base_thresh >= 0)
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: %.0f (threshold %.0f), anl: %.0f (threshold %.0f), check: %.0f (threshold %.0f)",
+ NameStr(classForm->relname),
+ vactuples, vacthresh, instuples, vacinsthresh, anltuples, anlthresh, checktuples, vaccheckthresh);
+ else if (vac_ins_base_thresh >= 0)
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: %.0f (threshold %.0f), anl: %.0f (threshold %.0f), check: (disabled)",
NameStr(classForm->relname),
vactuples, vacthresh, instuples, vacinsthresh, anltuples, anlthresh);
+ else if (vac_check_base_thresh >= 0)
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: (disabled), anl: %.0f (threshold %.0f), check: %.0f (threshold %.0f)",
+ NameStr(classForm->relname),
+ vactuples, vacthresh, anltuples, anlthresh, checktuples, vaccheckthresh);
else
- elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: (disabled), anl: %.0f (threshold %.0f)",
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: (disabled), anl: %.0f (threshold %.0f), check: (disabled)",
NameStr(classForm->relname),
vactuples, vacthresh, anltuples, anlthresh);
/* Determine if this table needs vacuum or analyze. */
*dovacuum = force_vacuum || (vactuples > vacthresh) ||
- (vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
+ (vac_ins_base_thresh >= 0 && instuples > vacinsthresh) ||
+ (vac_check_base_thresh >= 0 && checktuples > vaccheckthresh);
*doanalyze = (anltuples > anlthresh);
}
else
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index fbaf8364117..9adfbf4f87d 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -449,6 +449,7 @@ pgstat_database_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
PGSTAT_ACCUM_DBCOUNT(tuples_inserted);
PGSTAT_ACCUM_DBCOUNT(tuples_updated);
PGSTAT_ACCUM_DBCOUNT(tuples_deleted);
+ PGSTAT_ACCUM_DBCOUNT(tuples_checked);
/* last_autovac_time is reported immediately */
Assert(pendingent->last_autovac_time == 0);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index d64595a165c..1cca4e8c4f6 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -246,6 +246,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
* stragglers.
*/
tabentry->ins_since_vacuum = 0;
+ tabentry->check_since_vacuum = 0;
if (AmAutoVacuumWorkerProcess())
{
@@ -852,6 +853,7 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
tabentry->tuples_inserted += lstats->counts.tuples_inserted;
tabentry->tuples_updated += lstats->counts.tuples_updated;
tabentry->tuples_deleted += lstats->counts.tuples_deleted;
+ tabentry->tuples_checked += lstats->counts.tuples_checked;
tabentry->tuples_hot_updated += lstats->counts.tuples_hot_updated;
tabentry->tuples_newpage_updated += lstats->counts.tuples_newpage_updated;
@@ -863,12 +865,14 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
tabentry->live_tuples = 0;
tabentry->dead_tuples = 0;
tabentry->ins_since_vacuum = 0;
+ tabentry->check_since_vacuum = 0;
}
tabentry->live_tuples += lstats->counts.delta_live_tuples;
tabentry->dead_tuples += lstats->counts.delta_dead_tuples;
tabentry->mod_since_analyze += lstats->counts.changed_tuples;
tabentry->ins_since_vacuum += lstats->counts.tuples_inserted;
+ tabentry->check_since_vacuum += lstats->counts.tuples_checked;
tabentry->blocks_fetched += lstats->counts.blocks_fetched;
tabentry->blocks_hit += lstats->counts.blocks_hit;
@@ -886,6 +890,7 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
dbentry->tuples_inserted += lstats->counts.tuples_inserted;
dbentry->tuples_updated += lstats->counts.tuples_updated;
dbentry->tuples_deleted += lstats->counts.tuples_deleted;
+ dbentry->tuples_checked += lstats->counts.tuples_checked;
dbentry->blocks_fetched += lstats->counts.blocks_fetched;
dbentry->blocks_hit += lstats->counts.blocks_hit;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97af7c6554f..d5ae28ccdb6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -73,6 +73,9 @@ PG_STAT_GET_RELENTRY_INT64(dead_tuples)
/* pg_stat_get_ins_since_vacuum */
PG_STAT_GET_RELENTRY_INT64(ins_since_vacuum)
+/* pg_stat_get_check_since_vacuum */
+PG_STAT_GET_RELENTRY_INT64(check_since_vacuum)
+
/* pg_stat_get_live_tuples */
PG_STAT_GET_RELENTRY_INT64(live_tuples)
@@ -103,6 +106,9 @@ PG_STAT_GET_RELENTRY_INT64(tuples_returned)
/* pg_stat_get_tuples_updated */
PG_STAT_GET_RELENTRY_INT64(tuples_updated)
+/* pg_stat_get_tuples_checked */
+PG_STAT_GET_RELENTRY_INT64(tuples_checked)
+
/* pg_stat_get_vacuum_count */
PG_STAT_GET_RELENTRY_INT64(vacuum_count)
@@ -1091,6 +1097,9 @@ PG_STAT_GET_DBENTRY_INT64(tuples_returned)
/* pg_stat_get_db_tuples_updated */
PG_STAT_GET_DBENTRY_INT64(tuples_updated)
+/* pg_stat_get_db_tuples_checked */
+PG_STAT_GET_DBENTRY_INT64(tuples_checked)
+
/* pg_stat_get_db_xact_commit */
PG_STAT_GET_DBENTRY_INT64(xact_commit)
@@ -1774,6 +1783,9 @@ PG_STAT_GET_XACT_RELENTRY_INT64(tuples_hot_updated)
/* pg_stat_get_xact_tuples_newpage_updated */
PG_STAT_GET_XACT_RELENTRY_INT64(tuples_newpage_updated)
+/* pg_stat_get_xact_tuples_checked */
+PG_STAT_GET_XACT_RELENTRY_INT64(tuples_checked)
+
/* pg_stat_get_xact_blocks_fetched */
PG_STAT_GET_XACT_RELENTRY_INT64(blocks_fetched)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4eaeca89f2c..567dc2c6fa9 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3514,6 +3514,15 @@ struct config_int ConfigureNamesInt[] =
60, 1, INT_MAX / 1000,
NULL, NULL, NULL
},
+ {
+ {"autovacuum_vacuum_check_threshold", PGC_SIGHUP, VACUUM_AUTOVACUUM,
+ gettext_noop("Minimum number of heap tuple checks by index-only scan prior to vacuum, or -1 to disable check vacuums."),
+ NULL
+ },
+ &autovacuum_vac_check_thresh,
+ 1000, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"autovacuum_vacuum_threshold", PGC_SIGHUP, VACUUM_AUTOVACUUM,
gettext_noop("Minimum number of tuple updates or deletes prior to vacuum."),
@@ -4057,6 +4066,16 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"autovacuum_vacuum_check_scale_factor", PGC_SIGHUP, VACUUM_AUTOVACUUM,
+ gettext_noop("Number of heap tuple checks by index-only scan prior to vacuum as a fraction of reltuples."),
+ NULL
+ },
+ &autovacuum_vac_check_scale,
+ 0.2, 0.0, 100.0,
+ NULL, NULL, NULL
+ },
+
{
{"autovacuum_vacuum_cost_delay", PGC_SIGHUP, VACUUM_AUTOVACUUM,
gettext_noop("Vacuum cost delay in milliseconds, for autovacuum."),
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 8b68b16d79d..6123321b6ee 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5511,6 +5511,10 @@
proname => 'pg_stat_get_tuples_deleted', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_tuples_deleted' },
+{ oid => '4551', descr => 'statistics: number of heap tuples checked by index-only scan',
+ proname => 'pg_stat_get_tuples_checked', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_tuples_checked' },
{ oid => '1972', descr => 'statistics: number of tuples hot updated',
proname => 'pg_stat_get_tuples_hot_updated', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5538,6 +5542,11 @@
proname => 'pg_stat_get_ins_since_vacuum', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_ins_since_vacuum' },
+{ oid => '4554',
+ descr => 'statistics: number of heap tuples checked by index-only scan since last vacuum',
+ proname => 'pg_stat_get_check_since_vacuum', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_check_since_vacuum' },
{ oid => '1934', descr => 'statistics: number of blocks fetched',
proname => 'pg_stat_get_blocks_fetched', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5766,6 +5775,10 @@
proname => 'pg_stat_get_db_tuples_deleted', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_tuples_deleted' },
+{ oid => '4552', descr => 'statistics: number of heap tuples checked by index-only scan for database',
+ proname => 'pg_stat_get_db_tuples_checked', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_tuples_checked' },
{ oid => '3065',
descr => 'statistics: recovery conflicts in database caused by drop tablespace',
proname => 'pg_stat_get_db_conflict_tablespace', provolatile => 's',
@@ -6062,6 +6075,11 @@
proname => 'pg_stat_get_xact_tuples_newpage_updated', provolatile => 'v',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_xact_tuples_newpage_updated' },
+{ oid => '4553',
+ descr => 'statistics: number of tuples checked by index-only scan in current transaction',
+ proname => 'pg_stat_get_xact_tuples_checked', provolatile => 'v',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_xact_tuples_checked' },
{ oid => '3044',
descr => 'statistics: number of blocks fetched in current transaction',
proname => 'pg_stat_get_xact_blocks_fetched', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 378f2f2c2ba..7c11360bb6e 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -139,6 +139,7 @@ typedef struct PgStat_TableCounts
PgStat_Counter tuples_returned;
PgStat_Counter tuples_fetched;
+ PgStat_Counter tuples_checked;
PgStat_Counter tuples_inserted;
PgStat_Counter tuples_updated;
@@ -350,6 +351,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter tuples_inserted;
PgStat_Counter tuples_updated;
PgStat_Counter tuples_deleted;
+ PgStat_Counter tuples_checked;
TimestampTz last_autovac_time;
PgStat_Counter conflict_tablespace;
PgStat_Counter conflict_lock;
@@ -425,6 +427,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter tuples_returned;
PgStat_Counter tuples_fetched;
+ PgStat_Counter tuples_checked;
PgStat_Counter tuples_inserted;
PgStat_Counter tuples_updated;
@@ -436,6 +439,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter dead_tuples;
PgStat_Counter mod_since_analyze;
PgStat_Counter ins_since_vacuum;
+ PgStat_Counter check_since_vacuum;
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
@@ -691,6 +695,11 @@ extern void pgstat_report_analyze(Relation rel,
if (pgstat_should_count_relation(rel)) \
(rel)->pgstat_info->counts.tuples_fetched++; \
} while (0)
+#define pgstat_count_heap_check(rel) \
+ do { \
+ if (pgstat_should_count_relation(rel)) \
+ (rel)->pgstat_info->counts.tuples_checked++; \
+ } while (0)
#define pgstat_count_index_scan(rel) \
do { \
if (pgstat_should_count_relation(rel)) \
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
index e8135f41a1c..76af3b5bce7 100644
--- a/src/include/postmaster/autovacuum.h
+++ b/src/include/postmaster/autovacuum.h
@@ -37,6 +37,8 @@ extern PGDLLIMPORT int autovacuum_vac_max_thresh;
extern PGDLLIMPORT double autovacuum_vac_scale;
extern PGDLLIMPORT int autovacuum_vac_ins_thresh;
extern PGDLLIMPORT double autovacuum_vac_ins_scale;
+extern PGDLLIMPORT int autovacuum_vac_check_thresh;
+extern PGDLLIMPORT double autovacuum_vac_check_scale;
extern PGDLLIMPORT int autovacuum_anl_thresh;
extern PGDLLIMPORT double autovacuum_anl_scale;
extern PGDLLIMPORT int autovacuum_freeze_max_age;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d94fddd7cef..d750aa0fb91 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -311,6 +311,7 @@ typedef struct AutoVacOpts
int vacuum_threshold;
int vacuum_max_threshold;
int vacuum_ins_threshold;
+ int vacuum_check_threshold;
int analyze_threshold;
int vacuum_cost_limit;
int freeze_min_age;
@@ -323,6 +324,7 @@ typedef struct AutoVacOpts
float8 vacuum_cost_delay;
float8 vacuum_scale_factor;
float8 vacuum_ins_scale_factor;
+ float8 vacuum_check_scale_factor;
float8 analyze_scale_factor;
} AutoVacOpts;
diff --git a/src/test/regress/expected/check_autovacuum.out b/src/test/regress/expected/check_autovacuum.out
index 429f80a9a08..aa0ba5b355d 100644
--- a/src/test/regress/expected/check_autovacuum.out
+++ b/src/test/regress/expected/check_autovacuum.out
@@ -26,6 +26,14 @@ begin
end loop;
end;
$$ language plpgsql;
+-- Make autovacuum run more frequently
+alter system set autovacuum_naptime=2;
+select pg_reload_conf();
+ pg_reload_conf
+----------------
+ t
+(1 row)
+
-- Force index-only scan
set enable_bitmapscan=off;
set enable_seqscan=off;
@@ -35,15 +43,15 @@ begin;
-- Just force assignment XID to transaction
create table t2(x integer);
prepare transaction 'pt1';
-insert into t values (generate_series(1,1000000), 0);
+insert into t values (generate_series(1,100000), 0);
-- vacuum should't mark pages as all visible because of prepared xact
vacuum t;
commit prepared 'pt1';
--- With default autovacuum tuning we need to check more than 2k of tuples to trigger autovacuum
-select explain_to_json('select sum(pk) from t where pk between 100000 and 400000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
+-- With default autovacuum tuning we need to check more than 20k of tuples to trigger autovacuum
+select explain_to_json('select sum(pk) from t where pk between 10000 and 40000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
heap_fetches
--------------
- 300001
+ 30001
(1 row)
select pg_sleep(1); -- let statistic be updated
@@ -53,13 +61,7 @@ select pg_sleep(1); -- let statistic be updated
(1 row)
call wait_autovacuum_completion('t');
-select autovacuum_count,last_autovacuum,n_check_since_vacuum,n_tup_check from pg_stat_all_tables where relname='t';
- autovacuum_count | last_autovacuum | n_check_since_vacuum | n_tup_check
-------------------+-------------------------------------+----------------------+-------------
- 1 | Tue Apr 01 08:47:51.957169 2025 PDT | 0 | 300001
-(1 row)
-
-select explain_to_json('select sum(pk) from t where pk between 100000 and 400000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
+select explain_to_json('select sum(pk) from t where pk between 10000 and 40000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
heap_fetches
--------------
0
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 47478969135..88910eb724e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1773,7 +1773,8 @@ pg_stat_all_indexes| SELECT c.oid AS relid,
pg_stat_get_numscans(i.oid) AS idx_scan,
pg_stat_get_lastscan(i.oid) AS last_idx_scan,
pg_stat_get_tuples_returned(i.oid) AS idx_tup_read,
- pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch
+ pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch,
+ pg_stat_get_tuples_checked(i.oid) AS idx_tup_check
FROM (((pg_class c
JOIN pg_index x ON ((c.oid = x.indrelid)))
JOIN pg_class i ON ((i.oid = x.indexrelid)))
@@ -1808,7 +1809,9 @@ pg_stat_all_tables| SELECT c.oid AS relid,
pg_stat_get_total_vacuum_time(c.oid) AS total_vacuum_time,
pg_stat_get_total_autovacuum_time(c.oid) AS total_autovacuum_time,
pg_stat_get_total_analyze_time(c.oid) AS total_analyze_time,
- pg_stat_get_total_autoanalyze_time(c.oid) AS total_autoanalyze_time
+ pg_stat_get_total_autoanalyze_time(c.oid) AS total_autoanalyze_time,
+ pg_stat_get_check_since_vacuum(c.oid) AS n_check_since_vacuum,
+ pg_stat_get_tuples_checked(c.oid) AS n_tup_check
FROM ((pg_class c
LEFT JOIN pg_index i ON ((c.oid = i.indrelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
@@ -1869,7 +1872,8 @@ pg_stat_database| SELECT oid AS datid,
pg_stat_get_db_sessions_killed(oid) AS sessions_killed,
pg_stat_get_db_parallel_workers_to_launch(oid) AS parallel_workers_to_launch,
pg_stat_get_db_parallel_workers_launched(oid) AS parallel_workers_launched,
- pg_stat_get_db_stat_reset_time(oid) AS stats_reset
+ pg_stat_get_db_stat_reset_time(oid) AS stats_reset,
+ pg_stat_get_db_tuples_checked(oid) AS tup_checked
FROM ( SELECT 0 AS oid,
NULL::name AS datname
UNION ALL
@@ -2169,7 +2173,8 @@ pg_stat_sys_indexes| SELECT relid,
idx_scan,
last_idx_scan,
idx_tup_read,
- idx_tup_fetch
+ idx_tup_fetch,
+ idx_tup_check
FROM pg_stat_all_indexes
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
pg_stat_sys_tables| SELECT relid,
@@ -2201,7 +2206,9 @@ pg_stat_sys_tables| SELECT relid,
total_vacuum_time,
total_autovacuum_time,
total_analyze_time,
- total_autoanalyze_time
+ total_autoanalyze_time,
+ n_check_since_vacuum,
+ n_tup_check
FROM pg_stat_all_tables
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
pg_stat_user_functions| SELECT p.oid AS funcid,
@@ -2221,7 +2228,8 @@ pg_stat_user_indexes| SELECT relid,
idx_scan,
last_idx_scan,
idx_tup_read,
- idx_tup_fetch
+ idx_tup_fetch,
+ idx_tup_check
FROM pg_stat_all_indexes
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
pg_stat_user_tables| SELECT relid,
@@ -2253,7 +2261,9 @@ pg_stat_user_tables| SELECT relid,
total_vacuum_time,
total_autovacuum_time,
total_analyze_time,
- total_autoanalyze_time
+ total_autoanalyze_time,
+ n_check_since_vacuum,
+ n_tup_check
FROM pg_stat_all_tables
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
pg_stat_wal| SELECT wal_records,
@@ -2290,7 +2300,8 @@ pg_stat_xact_all_tables| SELECT c.oid AS relid,
pg_stat_get_xact_tuples_updated(c.oid) AS n_tup_upd,
pg_stat_get_xact_tuples_deleted(c.oid) AS n_tup_del,
pg_stat_get_xact_tuples_hot_updated(c.oid) AS n_tup_hot_upd,
- pg_stat_get_xact_tuples_newpage_updated(c.oid) AS n_tup_newpage_upd
+ pg_stat_get_xact_tuples_newpage_updated(c.oid) AS n_tup_newpage_upd,
+ pg_stat_get_xact_tuples_checked(c.oid) AS n_tup_check
FROM ((pg_class c
LEFT JOIN pg_index i ON ((c.oid = i.indrelid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
@@ -2307,7 +2318,8 @@ pg_stat_xact_sys_tables| SELECT relid,
n_tup_upd,
n_tup_del,
n_tup_hot_upd,
- n_tup_newpage_upd
+ n_tup_newpage_upd,
+ n_tup_check
FROM pg_stat_xact_all_tables
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
pg_stat_xact_user_functions| SELECT p.oid AS funcid,
@@ -2330,7 +2342,8 @@ pg_stat_xact_user_tables| SELECT relid,
n_tup_upd,
n_tup_del,
n_tup_hot_upd,
- n_tup_newpage_upd
+ n_tup_newpage_upd,
+ n_tup_check
FROM pg_stat_xact_all_tables
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
pg_statio_all_indexes| SELECT c.oid AS relid,
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 0a35f2f8f6a..b9f61655d66 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -133,6 +133,9 @@ test: event_trigger_login
# this test also uses event triggers, so likewise run it by itself
test: fast_default
+# this test checks behaviour of autovacuum so run it standalone
+test: check_autovacuum
+
# run tablespace test at the end because it drops the tablespace created during
# setup that other tests may use.
test: tablespace
diff --git a/src/test/regress/sql/check_autovacuum.sql b/src/test/regress/sql/check_autovacuum.sql
index 37dc74208db..8c83081ea89 100644
--- a/src/test/regress/sql/check_autovacuum.sql
+++ b/src/test/regress/sql/check_autovacuum.sql
@@ -1,21 +1,37 @@
-create function wait_autovacuum_completion(table_name text) returns void AS $$
+create function explain_to_json(text) returns jsonb language plpgsql as
+$$
declare
- autovacuum_counter_before integer;
- autovacuum_counter_after integer;
- n_heap_checkes_since_last_autovacuum integer;
+ js text;
+ heap_fetches integer;
+begin
+ for js in execute 'EXPLAIN (ANALYZE, TIMING OFF, COSTS OFF, SUMMARY OFF, BUFFERS ON, FORMAT JSON)' || $1 loop
+ return js::jsonb;
+ end loop;
+end;
+$$;
+
+create or replace procedure wait_autovacuum_completion(table_name text) AS $$
+declare
+ checks_since_last_autovacuum integer;
+ total_checks integer;
begin
- select autovacuum_count from pg_stat_all_tables where relname=table_name into autovacuum_counter_before;
-- default autovacuum naptime is 1 minute
- for i in 1..60 loop
- select autovacuum_count,n_check_since_vacuum from pg_stat_all_tables where relname=table_name into autovacuum_counter_after,n_heap_checkes_since_last_autovacuum;
- if autovacuum_counter_before != autovacuum_counter_after OR n_heap_checkes_since_last_autovacuum = 0 then
+ loop
+ select n_check_since_vacuum,n_tup_check from pg_stat_all_tables where relname=table_name
+ into checks_since_last_autovacuum,total_checks;
+ if checks_since_last_autovacuum = 0 and total_checks != 0 then
exit;
end if;
- perform pg_sleep_for('1 seconds');
+ perform pg_sleep_for('1 second');
+ rollback;
end loop;
end;
$$ language plpgsql;
+-- Make autovacuum run more frequently
+alter system set autovacuum_naptime=2;
+select pg_reload_conf();
+
-- Force index-only scan
set enable_bitmapscan=off;
set enable_seqscan=off;
@@ -28,17 +44,16 @@ begin;
create table t2(x integer);
prepare transaction 'pt1';
-insert into t values (generate_series(1,1000000), 0);
+insert into t values (generate_series(1,100000), 0);
-- vacuum should't mark pages as all visible because of prepared xact
vacuum t;
commit prepared 'pt1';
--- With default autovacuum tuning we need to check more than 2k of tuples to trigger autovacuum
-select sum(pk) from t where pk between 100000 and 400000;
-select n_tup_check from pg_stat_all_tables where relname='t';
+-- With default autovacuum tuning we need to check more than 20k of tuples to trigger autovacuum
+select explain_to_json('select sum(pk) from t where pk between 10000 and 40000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
-select wait_autovacuum_completion('t');
+select pg_sleep(1); -- let statistic be updated
+call wait_autovacuum_completion('t');
-select sum(pk) from t where pk between 100000 and 400000;
-select n_tup_check from pg_stat_all_tables where relname='t';
+select explain_to_json('select sum(pk) from t where pk between 10000 and 40000') #>> '{0,Plan,Plans,0,"Heap Fetches"}' as heap_fetches;
Hi Konstantin,
But the problem can be quite easily reproduced. We can just populate table with some data with some other transaction with assigned XID active.
Then explicitly vacuum this tables or wait until autovacuum does it.
At this moment table has no more dead or inserted tuples so autovacuum will not be called for it. But heap pages of this table are still not marked as all-visible.
And will never be marked as all-visible unless table is updated or is explicitly vacuumed.
I decided to experiment with the scenario you are describing. For
those who like to have exact steps to reproduce the issue, as I do,
here they are:
Session 1:
```
CREATE TABLE humidity(
ts TIMESTAMP NOT NULL,
city TEXT NOT NULL,
humidity INT NOT NULL);
CREATE INDEX humidity_idx ON humidity (ts, city) INCLUDE (humidity);
INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2010-01-01' :: TIMESTAMP,
'2020-12-31', '1 hour') AS ts,
unnest(array['Moscow', 'Berlin']) AS city;
```
Session 2:
```
BEGIN;
INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2022-01-01' :: TIMESTAMP,
'2025-12-31', '1 hour') AS ts,
unnest(array['Moscow', 'Berlin']) AS city;
-- no COMMIT
```
Session 1:
```
VACUUM humidity;
```
Session 2:
```
COMMIT;
```
Session 1:
```
EXPLAIN (ANALYZE, BUFFERS ON)
SELECT humidity FROM humidity
WHERE ts >= '2022-01-01' AND ts < '2022-05-02' AND city = 'Moscow';
```
The result is:
```
Index Only Scan using humidity_idx on humidity (cost=0.42..58.75 rows=67 width=4) (actual time=0.060..7.490 rows=2904.00 loops=1)
  Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone) AND (city = 'Moscow'::text))
Heap Fetches: 2904
Index Searches: 1
Buffers: shared hit=123
Planning:
Buffers: shared hit=10
Planning Time: 0.840 ms
Execution Time: 7.964 ms
```
... and it is claimed that autovacuum will never be triggered in order
to set hint bits, assuming we never modify the table again.
I have mixed feelings about addressing this. Although this behavior is
somewhat counterintuitive, if the user has a read-only lookup table
he/she can always execute VACUUM manually. In order to relieve him of
this unbearable burden we are going to need to introduce some overhead
that will affect all the users (not to mention people maintaining the
code). This would be convenient for sure but I'm not entirely sure if
it's worth it.
Thoughts?
--
Best regards,
Aleksander Alekseev
Hi,
... and it is claimed that autovacuum will never be triggered in order
to set hint bits, assuming we never modify the table again.
Actually I waited a bit and got a better EXPLAIN:
```
Index Only Scan using humidity_idx on humidity (cost=0.42..181.36 rows=1970 width=4) (actual time=0.372..16.869 rows=2904.00 loops=1)
  Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone) AND (city = 'Moscow'::text))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=44
Planning Time: 0.520 ms
Execution Time: 17.980 ms
(7 rows)
```
This happens when CHECKPOINT starts:
```
2025-04-03 18:36:23.435 MSK [209952] LOG: checkpoint starting: time
```
Interestingly it takes unusually long for my toy database:
```
2025-04-03 18:40:53.082 MSK [209952] LOG: checkpoint complete: wrote
3522 buffers (5.4%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
removed, 5 recycled; write=269.463 s, sync=0.029 s, total=269.647 s;
sync files=32, longest=0.004 s, average=0.001 s; distance=68489 kB,
estimate=68489 kB; lsn=0/F4F3870, redo lsn=0/F167DD0
```
There is nothing in between these two lines.
To my humble knowledge, CHECKPOINT shouldn't set hint bits and shouldn't
take that long. At this point I don't know what's going on.
This is `master` branch, b82e7eddb023.
--
Best regards,
Aleksander Alekseev
On 03/04/2025 6:29 pm, Aleksander Alekseev wrote:
I have mixed feelings about addressing this. Although this behavior is
somewhat counterintuitive, if the user has a read-only lookup table
he/she can always execute VACUUM manually. In order to relieve him of
this unbearable burden we are going to need to introduce some overhead
that will affect all the users (not to mention people maintaining the
code). This would be convenient for sure but I'm not entirely sure if
it's worth it.
The overhead seems to be minimal (an update of one statistics field when an
index-only scan has to fetch from the heap), and information about the
visibility checks performed by IOS seems useful in any case, even if it is not
used by autovacuum.
I am not sure how frequent this scenario is (an index-only scan having to
perform extra heap checks, or just not being chosen because of the VM state),
but if it can be prevented without much effort or overhead, then why not
implement it?
Interestingly it takes unusually long for my toy database:
There is nothing in between these two lines.
To my humble knowledge, CHECKOINT shouldn't set hint bits and should
take that long. At this point I don't know what's going on.
From what I can tell in your example, you ran the manual vacuum (session 1)
while you had an open transaction (session 2), so vacuum could not remove
the dead tuples or update the visibility map. Once you committed session 2,
autovacuum came in and did its job after autovacuum_naptime passed
(default 1 minute). Checkpoint does not update the visibility map; only
vacuum does.
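One way to confirm that, assuming the contrib module pg_visibility is available, is to look at how much of the table the visibility map currently covers:
```
create extension if not exists pg_visibility;
-- number of pages currently marked all-visible / all-frozen in the VM
select * from pg_visibility_map_summary('humidity');
```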
IMO, I don't think we need this patch for vacuum; simply making sure
autovacuum runs more frequently on a table that is often accessed via
index-only scans is already a way to deal with this, e.g. by lowering
autovacuum_vacuum_scale_factor, as sketched below. Maybe others have
a different opinion?
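As a rough illustration of that kind of per-table tuning (the table name and values are made up for the example):
```
-- vacuum after ~1% of the table's rows have been inserted, or ~2% updated/deleted,
-- instead of the 20% defaults
alter table client_objects set (
    autovacuum_vacuum_insert_scale_factor = 0.01,
    autovacuum_vacuum_scale_factor = 0.02
);
```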
PostgreSQL 13 also introduced autovacuum_vacuum_insert_threshold and
autovacuum_vacuum_insert_scale_factor to deal with append-only tables,
which previously only saw their first vacuum for wraparound prevention
(200 million transactions by default); that made index-only scans slow
because of an outdated visibility map, and also made the wraparound vacuum
more disruptive.
As far as extra metrics go for the scenario in which an index-only scan must
visit the table, pg_stat_all_indexes and pg_stat_all_tables do have an
idx_tup_fetch counter which increases any time an index scan visits the table,
i.e. an index-only scan with heap fetches or a regular index scan. I think
having a counter specifically for heap fetches due to index-only scans could
be valuable.
--
Sami Imseih
Amazon Web Services (AWS)
On Thu, Apr 3, 2025 at 4:37 PM Sami Imseih <samimseih@gmail.com> wrote:
From what I can tell in your example, you ran the manual vacuum ( session 1)
while you had an open transaction (session 2), so vacuum could not remove
the dead tuples or update the visibility map. Once you committed session 2,
autovacuum came in and did its job after the autovacuum_naptime passed
(default 1 minute). Checkpoint does not update the visibility map, only
vacuum does.
IMO, I don't think we need this patch for vacuum, as simply making sure
autovacuum runs more frequently on the table that is accessed via
index-only scans often is a way to deal with this already, i.e
lowering autovacuum_vacuum_scale_factor. Maybe others have
a different opinion?
Yea, I mean if you have a table that isn't meeting the threshold for
being vacuumed by autovacuum based on inserted/modified tuples that
you are querying a lot, then you should probably change the autovacuum
table storage options for that table.
I don't fully understand if the example provided here is in that
situation (i.e. autovac not being triggered because they were below
the thresholds).
Historically, we haven't updated the VM when doing on-access pruning
because we wanted vacuum to have to vacuum those pages and freeze them
before an aggressive vacuum was required. This release, in
052026c9b903, I added functionality to vacuum to scan more all-visible
pages during regular vacuums. I think this enables us to update the VM
during on-access pruning. This is something I plan to work on in 19.
It seems like it would alleviate situations like this.
As far as extra metrics go for the scenario in which and index only scan must
visit a table, pg_stat_all_indexes and pg_stat_all_tables do have a
idx_tup_fetch
counter which increases anytime an index scan visits the table, i.e.
index-only scan
with heap fetches or a regular index scan. I think having a counter
specifically for
heap fetches due to index-only scans could be valuable.
What do you imagine using the heap fetches during index-only scans counter for?
- Melanie
I think this enables us to update the VM
during on-access pruning. This is something I plan to work on in 19.
It seems like it would alleviate situations like this.
IMO, index-only scans hitting the heap have always caught users off guard,
especially because scan performance fluctuates between good and bad.
Reducing this performance volatility is a great improvement.
As far as extra metrics go for the scenario in which and index only scan must
visit a table, pg_stat_all_indexes and pg_stat_all_tables do have a
idx_tup_fetch
counter which increases anytime an index scan visits the table, i.e.
index-only scan
with heap fetches or a regular index scan. I think having a counter
specifically for
heap fetches due to index-only scans could be valuable.
What do you imagine using the heap fetches during index-only scans counter for?
A user can only determine that they are using index-only scans through EXPLAIN,
which limits visibility. It would be beneficial to provide additional metrics
to support autovacuum/vacuum tuning for workloads that rely on index-only scans
and require them to be as efficient as possible.
Currently, pg_stat_all_tables/indexes displays the number of index scans,
but it would also be helpful to show the number of index-only scans
specifically, along with their heap fetches. This would help in identifying the
"heap fetches" issue and in confirming whether vacuum tuning had any impact on
reducing them. The existing idx_tup_fetch counter combines heap fetches from
normal and index-only scans, so it is useful but not perfect for this task,
as shown below.
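For reference, a sketch of what can be observed today with the existing counters (the table name is illustrative):
```
-- idx_tup_fetch counts live heap tuples fetched via any kind of index scan,
-- so heap fetches caused by index-only scans are mixed in with regular ones
select indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
from pg_stat_all_indexes
where relname = 'humidity';
```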
--
Sami Imseih
Amazon Web Services (AWS)
On 03/04/2025 6:50 pm, Aleksander Alekseev wrote:
Hi,
... and it is claimed that autovacuum will never be triggered in order
to set hint bits, assuming we never modify the table again.
Actually I waited a bit and got a better EXPLAIN:
```
Index Only Scan using humidity_idx on humidity (cost=0.42..181.36 rows=1970 width=4) (actual time=0.372..16.869 rows=2904.00 loops=1)
  Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone) AND (city = 'Moscow'::text))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=44
Planning Time: 0.520 ms
Execution Time: 17.980 ms
(7 rows)
```
This happens when CHECKPOINT starts:
```
2025-04-03 18:36:23.435 MSK [209952] LOG: checkpoint starting: time
```
Interestingly it takes unusually long for my toy database:
```
2025-04-03 18:40:53.082 MSK [209952] LOG: checkpoint complete: wrote
3522 buffers (5.4%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0
removed, 5 recycled; write=269.463 s, sync=0.029 s, total=269.647 s;
sync files=32, longest=0.004 s, average=0.001 s; distance=68489 kB,
estimate=68489 kB; lsn=0/F4F3870, redo lsn=0/F167DD0
```
There is nothing in between these two lines.
To my humble knowledge, CHECKPOINT shouldn't set hint bits and shouldn't
take that long. At this point I don't know what's going on.
This is `master` branch, b82e7eddb023.
A checkpoint does not set hint bits and does not update the VM;
it just writes dirty pages to disk.
My patch includes a simple test reproducing the problem. You can check
that the VM is never updated in this case and that an explicit checkpoint
doesn't solve the issue.
Certainly a manual vacuum will help. But the user would somehow have to
notice that the index-only scan is not used, or is used but performs a large
number of heap fetches, and understand that the problem is an out-of-date
VM and that a vacuum is needed to fix it.
In your example both sessions are inserting data into the table. A vacuum
performed in one session doesn't take into account records created by an
uncommitted transaction in another session.
So I do not think the plan in your case improved because of the checkpoint.
Most likely autovacuum was just triggered for this table and updated the VM,
which can be checked as shown below.
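An easy way to check whether that is what happened, using columns that already exist in pg_stat_all_tables:
```
-- if autovacuum ran, last_autovacuum is set, autovacuum_count is incremented,
-- and n_ins_since_vacuum drops back to 0
select relname, last_autovacuum, autovacuum_count, n_ins_since_vacuum
from pg_stat_all_tables
where relname = 'humidity';
```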
What is needed to reproduce the problem?
1. A table populated with data.
2. The presence of a transaction with an assigned XID which prevents vacuum
from marking pages of this table as all-visible.
3. A vacuum or autovacuum that processed this table (eliminating dead tuples
and resetting the number of inserted tuples since the last vacuum).
After these 3 steps autovacuum will never be called for this table (at
least until a forced vacuum caused by wraparound), and IOS will either not
be used or be very inefficient for this table.
On Fri, Apr 4, 2025 at 1:53 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
What is needed to reproduce the problem?
1. Table with populated data
2. Presence of transaction with assigned XID which prevents vacuum from
marking pages of this table as all visible
3. Vacuum or autovacuum processed this table (to eliminate dead tuple
and reset number of inserted tuples since last vacuum).
After this 3 steps autovacuum will never be called for this table (at
least until forced vacuum caused by wraparound).
And IOS will not be used or be very inefficient fot this table.
ISTM, this is more an issue of ins_since_vacuum being reset to 0 in
pgstat_report_vacuum() even though those inserted tuples weren't
necessarily frozen and their pages not set all-visible. I don't know
exactly how we could modify that logic, but insert-triggered vacuums
are meant to set pages all-visible and freeze tuples, and if they
don't do that, it doesn't quite make sense to zero out the counter
that could trigger another one.
That being said, long-running transactions are a problem for
autovacuum in general. Even if you track this stat you are proposing
about heap fetches by index only scans, you won't know if the long
running transaction is over and thus if it makes sense to try and
trigger an autovacuum for that table again anyway.
- Melanie
On Fri, Apr 4, 2025 at 12:11 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
That being said, long-running transactions are a problem for
autovacuum in general. Even if you track this stat you are proposing
about heap fetches by index only scans, you won't know if the long
running transaction is over and thus if it makes sense to try and
trigger an autovacuum for that table again anyway.
This. It would be really useful to have some kind of a system for
figuring out when -- in terms of XIDs -- we ought to vacuum which
table. I think that's a hard problem, but it would help a lot of
people.
I do not think the approach in the proposed patch is correct at all.
The proposed new check would have exactly the same problem as the
existing one -- this could easily trigger vacuuming at a time when the
relevant tuples can't yet be made all-visible, in which case we'd just
do a lot of VACUUM work for nothing. That's already a problem with
autovacuum in some scenarios, and I bet this would be way worse.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 04/04/2025 7:10 pm, Melanie Plageman wrote:
On Fri, Apr 4, 2025 at 1:53 AM Konstantin Knizhnik<knizhnik@garret.ru> wrote:
What is needed to reproduce the problem?
1. Table with populated data
2. Presence of transaction with assigned XID which prevents vacuum from
marking pages of this table as all visible
3. Vacuum or autovacuum processed this table (to eliminate dead tuple
and reset number of inserted tuples since last vacuum).
After this 3 steps autovacuum will never be called for this table (at
least until forced vacuum caused by wraparound).
And IOS will not be used or be very inefficient fot this table.
ISTM, this is more an issue of ins_since_vacuum being reset to 0 in
pstat_report_vacuum() even though those inserted tuples weren't
necessarily frozen and their pages not set all-visible. I don't know
exactly how we could modify that logic, but insert-triggered vacuums
are meant to set pages all-visible and freeze tuples, and if they
don't do that, it doesn't quite make sense to zero out the counter
that could trigger another one.
That being said, long-running transactions are a problem for
autovacuum in general. Even if you track this stat you are proposing
about heap fetches by index only scans, you won't know if the long
running transaction is over and thus if it makes sense to try and
trigger an autovacuum for that table again anyway.
- Melanie
From a logical point of view I agree with you: taking the number of inserted
tuples into account makes sense if it allows pages to be marked as all-visible.
So `ins_since_vacuum` would be better renamed to
`ins_all_visible_since_vacuum` and count only all-visible tuples. If a newly
inserted tuple is not visible to all, then it should not be accounted for in
the statistics and should not trigger autovacuum. But I have no idea how to
efficiently maintain such a counter: we would have to keep track of the XIDs
of all recently inserted tuples and, on each transaction commit, determine
which of them have become all-visible.
And your suggestion to just not reset `ins_since_vacuum` until all of them
become all-visible may be easier to implement, but under a permanent workload
it can lead to a situation where `ins_since_vacuum` is never reset and the
table is vacuumed on every vacuum iteration, which may significantly degrade
performance, even without long-living transactions.
So I still think that maintaining a count of heap visibility checks is the
best alternative. It is quite easy to implement, adds almost no overhead,
and this information indicates the efficiency of index-only scans, so it
seems useful even if not used by autovacuum.
Yes, long-living transactions and vacuum are "antagonists". If there is a
long-living transaction, then forcing autovacuum because of the
visibility-check criterion can also (as in the case of not resetting the
`ins_since_vacuum` counter) degrade performance because of too frequent and
useless autovacuum runs for the table. But there is a big difference: with
the `checks_since_vacuum` criterion we trigger the next autovacuum only when
this counter exceeds the threshold again, which should not happen quickly
because the counter is reset after each vacuum. This is unlike the
`ins_since_vacuum` counter, which you suggested not resetting until the pages
are marked as all-visible by vacuum; in the latter case autovacuum would be
invoked for the table every `autovacuum_naptime` seconds.
On Fri, Apr 4, 2025 at 3:27 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
From logical point of view I agree with you: taken in account number of inserted tuples makes sense if it allows to mark page as all-visible.
So `ins_since_vacuum` should be better renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples. If newly inserted tuple is not visible to all, then it should not be accounted in statistic and trigger autovacuum. But I have completely no idea of how to efficiently maintain such counter: we should keep track of xids of all recently inserted tuples and on each transaction commit determine which one of them become all-visible.
And your suggestion just to not reset `ins_since_vacuum` until all of them becomes all-visible may be easier to implement, but under permanent workload it can lead to situation when `ins_since_vacuum` is never reset and at each vacuum iteration cause vacuuming of the table. Which may cause significant degrade of performance. Even without long-living transactions.
Right, you definitely can't just never reset it to 0. As it is
currently defined, the behavior is correct, that is how many tuples
were inserted since the last vacuum.
As for your proposed patch, I agree with Robert that triggering
vacuums using the stat you proposed just adds to the problem we know
we already have with wasted vacuuming work due to long-running
transactions.
A more targeted solution to your specific problem would be to update
the visibility map on access. Then, the first time you have to fetch
that heap page, you could mark it all-visible (assuming the long
running transaction has ended) and then the next index-only scan
wouldn't have to do the same heap fetch. It doesn't add any overhead
in the case that the long running transaction has not ended, unlike
trying to trigger another autovacuum.
- Melanie
From logical point of view I agree with you: taken in account number of inserted tuples makes sense if it allows to mark page as all-visible.
So `ins_since_vacuum` should be better renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples.
If newly inserted tuple is not visible to all, then it should not be accounted in statistic and trigger autovacuum.
cumulative stats are not updated for an in-flight transaction, so
effectively, these rows
are not accounted for by vacuum.
I did just realize, however, that it is strange that a rolled-back
session will accumulate n_tup_ins and n_ins_since_vacuum:
# session 1
test=# create table t (id int);
CREATE TABLE
test=# begin;
BEGIN
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1
# session 2
test=# select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, n_ins_since_vacuum
       from pg_stat_all_tables where relname = 't';
 n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
         0 |         0 |         0 |          0 |          0 |                  0
(1 row)
# session 1
ROLLBACK
# session 2
test=# select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, n_ins_since_vacuum
       from pg_stat_all_tables where relname = 't';
 n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
         3 |         0 |         0 |          0 |          3 |                  3
(1 row)
ROLLBACKs should not be the norm, but this looks like a bug to me
as it may trigger vacuum based on insert threshold more often.
Thoughts?
So I still think that maintaining a count of heap visibility checks is the best alternative.
It is quite easy to implement, adds almost no overhead, and this information indicates the efficiency
of index-only scans. So it seems useful even if it is not used by autovacuum.
I agree. It could help a user who wants to schedule manual vacuums for
tables that need very efficient index-only scans, or to tune autovacuum
per table for those kinds of tables. I actually think we should add a
few columns, as mentioned here [0].
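As a sketch of that per-table tuning with options that exist today (the table name below is a placeholder, and the counter column mentioned in the last comment is hypothetical, since this thread has not settled on a name):

-- Make insert-driven autovacuum more aggressive for a table whose
-- index-only scans must stay cheap (placeholder table name).
ALTER TABLE ios_heavy_table
    SET (autovacuum_vacuum_insert_scale_factor = 0.01,
         autovacuum_vacuum_insert_threshold = 1000);

-- Or schedule an off-peak manual pass, which also sets visibility-map bits:
VACUUM (VERBOSE) ios_heavy_table;

-- With the proposed counter (hypothetical column name), the same decision
-- could be driven by something like:
--   SELECT relname FROM pg_stat_all_tables WHERE n_heap_check_since_vacuum > 100000;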
That being said, long-running transactions are a problem for
autovacuum in general. Even if you track this stat you are proposing
about heap fetches by index only scans, you won't know if the long
running transaction is over and thus if it makes sense to try and
trigger an autovacuum for that table again anyway.
This. It would be really useful to have some kind of a system for
figuring out when -- in terms of XIDs -- we ought to vacuum which
table. I think that's a hard problem, but it would help a lot of
people.
hmm, isn't that what anti-wraparound autovac does? or, I may
have missed the point here completely
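Whatever the right design turns out to be, the "has the long-running transaction finished yet" part can at least be checked by hand today; the sessions at the top of this query are the ones holding back the xmin horizon that keeps pages from becoming all-visible (a sketch for a reasonably recent release):

SELECT pid, state, xact_start, backend_xmin,
       age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;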
[0]: /messages/by-id/CAA5RZ0t1U38qtVAmg3epjh5RBbpT4VRB_Myfp0oGm_73w-UNRA@mail.gmail.com
--
Sami Imseih
Amazon Web Services (AWS)
On 04/04/2025 10:41 pm, Melanie Plageman wrote:
On Fri, Apr 4, 2025 at 3:27 PM Konstantin Knizhnik<knizhnik@garret.ru> wrote:
From a logical point of view I agree with you: taking into account the number of inserted tuples makes sense if it allows pages to be marked as all-visible.
So `ins_since_vacuum` would better be renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples. If a newly inserted tuple is not yet visible to all, it should not be accounted for in the statistics and should not trigger autovacuum. But I have no idea how to efficiently maintain such a counter: we would have to keep track of the XIDs of all recently inserted tuples and, on each transaction commit, determine which of them have become all-visible. Your suggestion to simply not reset `ins_since_vacuum` until all of them become all-visible may be easier to implement, but under a permanent workload it can lead to a situation where `ins_since_vacuum` is never reset and the table is vacuumed at every autovacuum iteration. That could significantly degrade performance, even without long-running transactions.
Right, you definitely can't just never reset it to 0. As it is
currently defined, the behavior is correct, that is how many tuples
were inserted since the last vacuum.
As for your proposed patch, I agree with Robert that triggering
vacuums using the stat you proposed just adds to the problem we know
we already have with wasted vacuuming work due to long-running
transactions.
If we already have a problem with wasted vacuuming, then IMHO it should
be addressed (for example by keeping last_autovacuum_xid_horizon). I do
not think that the proposed patch adds anything new to the problem. Yes,
it will trigger autovacuum after every N visibility checks, and if there
is still some long-running transaction, then this vacuum is useless. But
the same is true for the criterion based on the number of dead tuples.
And from my point of view, additional autovacuum runs are less critical
than missed autovacuum runs (when the table contains garbage or the VM
is not updated, but vacuum is not launched).
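For what it's worth, whether a vacuum run actually managed to set visibility-map bits can be observed directly with the contrib pg_visibility extension (a sketch, using the demo table t from the earlier example):

CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- How many of the table's heap pages are currently marked all-visible in the VM?
SELECT count(*) FILTER (WHERE all_visible) AS vm_all_visible,
       count(*) AS heap_pages
FROM pg_visibility_map('t');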
A more targeted solution to your specific problem would be to update
the visibility map on access. Then, the first time you have to fetch
that heap page, you could mark it all-visible (assuming the long
running transaction has ended) and then the next index-only scan
wouldn't have to do the same heap fetch. It doesn't add any overhead
in the case that the long running transaction has not ended, unlike
trying to trigger another autovacuum.
I really considered this alternative when thinking about a solution to
the problem. It is more consistent with the hint-bit approach.
I declined it in favor of this solution for the following reasons:
1. Index-only scan holds a read-only lock on the heap page. In order to
update it, we need to upgrade this lock to exclusive.
2. We need to check visibility for all elements on the page (actually do
something like `heap_page_is_all_visible`), but if there is a large
number of elements on the page this can be quite expensive, and I am
afraid it could slow down index-only scans. Yes, only in the "slow
case", when the scan has to access the heap to perform a visibility
check, but it still may not be acceptable. Also, it is not clear how to
mark the page as already checked; otherwise we will have to repeat this
check for every TID referring to this page.
3. `heap_page_is_all_visible` is local to vacuumlazy.c. So to use it in
index-only scan we either have to make it global or copy and paste its
code. Just removing "static" is not possible, because it uses the local
`LVRelState`, so some refactoring is needed in any case.
4. We need to WAL-log the VM page and the heap page when setting the
all-visible bit, which is a quite expensive operation. Doing it inside
an index-only scan can significantly increase the time of a SELECT.
Certainly Postgres is not a real-time DBMS, but it is still better to
provide predictable query execution time. This is why I think it is
better to do such work in the background (in vacuum).
On Sat, Apr 5, 2025 at 2:02 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
A more targeted solution to your specific problem would be to update
the visibility map on access. Then, the first time you have to fetch
that heap page, you could mark it all-visible (assuming the long
running transaction has ended) and then the next index-only scan
wouldn't have to do the same heap fetch. It doesn't add any overhead
in the case that the long running transaction has not ended, unlike
trying to trigger another autovacuum.
I really considered this alternative when thinking about a solution to the problem. It is more consistent with the hint-bit approach.
I declined it in favor of this solution for the following reasons:
1. Index-only scan holds a read-only lock on the heap page. In order to update it, we need to upgrade this lock to exclusive.
2. We need to check visibility for all elements on the page (actually do something like `heap_page_is_all_visible`), but if there is a large number of elements on the page this can be quite expensive, and I am afraid it could slow down index-only scans. Yes, only in the "slow case", when the scan has to access the heap to perform a visibility check, but it still may not be acceptable. Also, it is not clear how to mark the page as already checked; otherwise we will have to repeat this check for every TID referring to this page.
3. `heap_page_is_all_visible` is local to vacuumlazy.c. So to use it in index-only scan we either have to make it global or copy and paste its code. Just removing "static" is not possible, because it uses the local `LVRelState`, so some refactoring is needed in any case.
4. We need to WAL-log the VM page and the heap page when setting the all-visible bit, which is a quite expensive operation. Doing it inside an index-only scan can significantly increase the time of a SELECT. Certainly Postgres is not a real-time DBMS, but it is still better to provide predictable query execution time. This is why I think it is better to do such work in the background (in vacuum).
I wasn't thinking about adding new VM-setting functionality to
index-only scan in particular. heapam_index_fetch_tuple() already calls
heap_page_prune_opt() which will do pruning under certain conditions.
I was thinking that we start updating the VM after pruning in the
on-access case too (not just when pruning is invoked by vacuum).
If you look at the callers of heap_page_prune_opt(), it includes
bitmap heap scan and also heap_prepare_pagescan() which is invoked as
part of sequential scans and other operations.
- Melanie
I wasn't thinking about adding new VM-setting functionality to
index-only scan in particular. heapam_index_fetch_tuple() already calls
heap_page_prune_opt() which will do pruning under certain conditions.
I was thinking that we start updating the VM after pruning in the
on-access case too (not just when pruning is invoked by vacuum).
I think this would need to be careful when the workload is constantly updating
and reading the same pages, as we may end up in a continuous cycle of
updates clearing the visibility map and selects setting it again.
That may degrade the select workloads, maybe?
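The clearing half of that cycle is easy to demonstrate: updating any row on an all-visible page clears the page's bit again (a sketch, continuing with the demo table t, assuming no concurrent old snapshots and the pg_visibility extension from above):

INSERT INTO t VALUES (1);
VACUUM t;                                           -- block 0 becomes all-visible
SELECT all_visible FROM pg_visibility_map('t', 0);  -- expected: t
UPDATE t SET id = id;                               -- new row version clears the bit
SELECT all_visible FROM pg_visibility_map('t', 0);  -- expected: f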
--
Sami Imseih
Amazon Web Services (AWS)