Behaviour when autovacuum is canceled

Started by Martín Fernándezover 7 years ago8 messagesgeneral
Jump to latest
#1Martín Fernández
fmartin91@gmail.com

Hello,

I'm working on a high volume transaction database and we are starting to tune our autovacuum setting to improve our vacuuming performance.

Once thing that we know about autovacuum is that is can be automatically canceled if a dependent transaction is blocked by the autovacuum transaction. In a situation like this one, autovacuum would be canceled. 

My question is, what will happen to the work that autovacuum has achieved before being canceled ? Is that work lost ? I tried to look for the answer in the code and I'm not completely sure what will happen. From what I could understand (that can be totally wrong), the vacuum process is split in multiple small transactions. If the autovacuum is canceled, could it be possible that only the latest transaction work be lost ?

Sorry if my understanding is not the correct one, understanding how to tune the autovacuum seems really complicated to me.

Thanks before hand! 

Martín

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Martín Fernández (#1)
Re: Behaviour when autovacuum is canceled

On Thu, Sep 13, 2018 at 3:45 PM, Martín Fernández <fmartin91@gmail.com>
wrote:

From what I could understand (that can be totally wrong), the vacuum
process is split in multiple small transactions. If the autovacuum is
canceled, could it be possible that only the latest transaction work be
lost

From the docs:

"VACUUM cannot be executed inside a transaction block."

As it is non-transactional any work it performs is live immediately and
irrevocably as it occurs.

David J.

#3Martín Fernández
fmartin91@gmail.com
In reply to: David G. Johnston (#2)
Re: Behaviour when autovacuum is canceled

David,

Thanks a lot for the quick reply. 

I clearly misunderstood the references in the code. 

Best,

Martín

On Thu, Sep 13th, 2018 at 7:55 PM, "David G. Johnston" <david.g.johnston@gmail.com> wrote:

Show quoted text

On Thu, Sep 13, 2018 at 3:45 PM, Martín Fernández < fmartin91@gmail.com > wrote:

From what I could understand (that can be totally wrong), the vacuum
process is split in multiple small transactions. If the autovacuum is
canceled, could it be possible that only the latest transaction work be
lost 

From the docs:

"VACUUM cannot be executed inside a transaction block."

As it is non-transactional any work it performs is live immediately and
irrevocably as it occurs.

David J.

#4Martín Fernández
fmartin91@gmail.com
In reply to: Martín Fernández (#3)
Re: Behaviour when autovacuum is canceled

David,

Your last comment applies for cleaning up indexes as well ? We performed a simple test in our production database to understand behaviour and we got a result that surprised us based on your last comment.

We basically started a VACUUM on a given table, waited for one index to process (captured cleaned rows count) and cancel the VACUUM. When we run another VACUUM on the same table the dead rows removed from the first index was a number slightly higher than the value logged on the first VACUUM. This behaviour made us feel that the work done to clean dead tuples on the first index was performed again. 

Thanks!

Martín

On Thu, Sep 13th, 2018 at 8:0 PM, "Martín Fernández" <fmartin91@gmail.com> wrote:

Show quoted text

David,

Thanks a lot for the quick reply. 

I clearly misunderstood the references in the code. 

Best,

Martín

On Thu, Sep 13th, 2018 at 7:55 PM, "David G. Johnston" < david.g.johnston@gmail.com

wrote:

On Thu, Sep 13, 2018 at 3:45 PM, Martín Fernández < fmartin91@gmail.com > wrote:

From what I could understand (that can be totally wrong), the vacuum
process is split in multiple small transactions. If the autovacuum is
canceled, could it be possible that only the latest transaction work be
lost 

From the docs:

"VACUUM cannot be executed inside a transaction block."

As it is non-transactional any work it performs is live immediately and
irrevocably as it occurs.

David J.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martín Fernández (#4)
Re: Behaviour when autovacuum is canceled

=?UTF-8?q?Mart=C3=ADn_Fern=C3=A1ndez?= <fmartin91@gmail.com> writes:

We basically started a VACUUM on a given table, waited for one index to process (captured cleaned rows count) and cancel the VACUUM. When we run another VACUUM on the same table the dead rows removed from the first index was a number slightly higher than the value logged on the first VACUUM. This behaviour made us feel that the work done to clean dead tuples on the first index was performed again. 

The unit of work that doesn't have to be repeated if VACUUM is canceled
is:

1. Scan a bunch of heap pages to identify dead tuples;
2. Scan *all* the table's indexes to remove the corresponding index entries;
3. Rescan those heap pages to actually remove the tuples.

It sounds like you canceled partway through phase 2.

The actual size of this unit of work is the number of dead-tuple TIDs
that will fit in maintenance_work_mem (at six or eight bytes apiece,
I forget whether it's aligned...). Normally, people make
maintenance_work_mem big so that they can reduce the number of index
scan cycles needed to complete vacuuming a table. But if you're
concerned about reducing the amount of work lost to a cancel,
you might try *reducing* maintenance_work_mem. This will make
vacuum slower overall (more index scans), but you have a better
chance that it will manage to actually remove some tuples before
getting canceled.

Or you could look at fixing the access patterns that are causing
so many autovacuum cancels.

regards, tom lane

#6Martín Fernández
fmartin91@gmail.com
In reply to: Tom Lane (#5)
Re: Behaviour when autovacuum is canceled

Tom,

Thanks for the detailed explanation. I can start mapping your explanation with the source code I've been reading :)

We are in the process of tuning our autovacuum settings (on some tables) and stop relying on crontabs that are performing manual vacuums. 

By performing this changes we are going to start relying more heavily on the autovacuum work and the concern of "lost work" caused by autovacuum canceling itself when locking contention happen showed up. I'm guessing that we might be over thinking this and the canceling is not going to happen as frequently as we think it will.

Martín

On Thu, Sep 13th, 2018 at 9:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

=?UTF-8?q?Mart=C3=ADn_Fern=C3=A1ndez?= < fmartin91@gmail.com > writes:

We basically started a VACUUM on a given table, waited for one index to

process (captured cleaned rows count) and cancel the VACUUM. When we run
another VACUUM on the same table the dead rows removed from the first
index was a number slightly higher than the value logged on the first
VACUUM. This behaviour made us feel that the work done to clean dead
tuples on the first index was performed again. 

The unit of work that doesn't have to be repeated if VACUUM is canceled
is:

1. Scan a bunch of heap pages to identify dead tuples;
2. Scan *all* the table's indexes to remove the corresponding index
entries;
3. Rescan those heap pages to actually remove the tuples.

It sounds like you canceled partway through phase 2.

The actual size of this unit of work is the number of dead-tuple TIDs
that will fit in maintenance_work_mem (at six or eight bytes apiece,
I forget whether it's aligned...). Normally, people make
maintenance_work_mem big so that they can reduce the number of index
scan cycles needed to complete vacuuming a table. But if you're
concerned about reducing the amount of work lost to a cancel,
you might try *reducing* maintenance_work_mem. This will make
vacuum slower overall (more index scans), but you have a better
chance that it will manage to actually remove some tuples before
getting canceled.

Or you could look at fixing the access patterns that are causing
so many autovacuum cancels.

regards, tom lane

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Martín Fernández (#6)
Re: Behaviour when autovacuum is canceled

On 2018-Sep-13, Mart�n Fern�ndez wrote:

By performing this changes we are going to start relying more heavily
on the autovacuum work and the concern of "lost work" caused by
autovacuum canceling itself when locking contention happen showed up.
I'm guessing that we might be over thinking this and the canceling is
not going to happen as frequently as we think it will.

Any DDL run on a table will cancel an autovacuum over that table (except
for-wraparound autovacuums). If these are rare, you don't need to worry
about that too much. If they are frequent enough that autovacuum will
be cancelled regularly in one table, you'll be sad.

If you're running vacuum by hand, you'd probably see your DDL blocking
behind VACUUM, which would be very noticeable. I think if you don't
have trouble today without having tuned the system carefully to avoid
such trouble, you're not likely to have trouble with autovacuum either.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Martín Fernández
fmartin91@gmail.com
In reply to: Alvaro Herrera (#7)
Re: Behaviour when autovacuum is canceled

Alvaro,

Thanks for the insight, was really helpful!

Best,

Martín

On Fri, Sep 14th, 2018 at 12:41 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Show quoted text

On 2018-Sep-13, Martín Fernández wrote:

By performing this changes we are going to start relying more heavily
on the autovacuum work and the concern of "lost work" caused by
autovacuum canceling itself when locking contention happen showed up.
I'm guessing that we might be over thinking this and the canceling is
not going to happen as frequently as we think it will.

Any DDL run on a table will cancel an autovacuum over that table (except
for-wraparound autovacuums). If these are rare, you don't need to worry
about that too much. If they are frequent enough that autovacuum will
be cancelled regularly in one table, you'll be sad.

If you're running vacuum by hand, you'd probably see your DDL blocking
behind VACUUM, which would be very noticeable. I think if you don't
have trouble today without having tuned the system carefully to avoid
such trouble, you're not likely to have trouble with autovacuum either.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services