[DOC] Document auto vacuum interruption

Started by James Colemanover 6 years ago8 messages
#1James Coleman
jtc331@gmail.com
1 attachment(s)

We've discussed this internally many times, but today finally decided
to write up a doc patch.

Autovacuum holds a SHARE UPDATE EXCLUSIVE lock, but other processes
can cancel autovacuum if blocked by that lock unless the autovacuum is
to prevent wraparound.This can result in very surprising behavior:
imagine a system that needs to run ANALYZE manually before batch jobs
to ensure reasonable query plans. That ANALYZE will interrupt attempts
to run autovacuum, and pretty soon the table is far more bloated than
expected, and query plans (ironically) degrade further.

Attached is a patch to document that behavior (as opposed to just in
the code at src/backend/storage/lmgr/proc.c:1320-1321).

James Coleman

Attachments:

autovacuum-interruption-v1.patchapplication/octet-stream; name=autovacuum-interruption-v1.patchDownload
commit 097a06d35d3bae67aa82baa56cddccc654df3c79
Author: James Coleman <jtc331@gmail.com>
Date:   Thu Jul 25 15:06:32 2019 -0500

    Document autovacuum interruption
    
    It's important users be able to know (without looking at the source
    code) that running DDL or DDL-like commands frequently can interrupt
    autovacuum enough that it will effectively never run, resulting in
    lots of pain with dead tuples.

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 1972c20a8f..1b6fe468a1 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -825,6 +825,24 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
     <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
     are not considered in the balancing algorithm.
    </para>
+
+   <para>
+    Autovacuum workers generally avoid interfering with other commands.
+    If a process attempts to acquire a <literal>SHARE UPDATE EXCLUSIVE</literal>
+    lock (the lock type held by autovacuum), lock acquisition will interrupt
+    the autovacuum.  However if the autovacuum is running to prevent transaction
+    ID wraparound (i.e., the autovacuum query name in the
+    <structname>pg_stat_activity</structname> view ends with
+    <literal>(for wraparound)</quote>), the autovacuum is not automatically
+    interrupted.
+
+   <warning>
+    <para>
+      Regularly running commands requiring a <literal>SHARE UPDATE EXCLUSIVE</literal>
+      lock (e.g., ANALYZE) can effectively prevent autovacuums from ever
+      completing.
+    </para>
+   </warning>
   </sect2>
  </sect1>
 
#2Amit Kapila
amit.kapila16@gmail.com
In reply to: James Coleman (#1)
Re: [DOC] Document auto vacuum interruption

On Fri, Jul 26, 2019 at 1:45 AM James Coleman <jtc331@gmail.com> wrote:

We've discussed this internally many times, but today finally decided
to write up a doc patch.

Thanks, I think something on the lines of what you have written can
help some users to understand the behavior in this area and there
doesn't seem to be any harm in giving such information to the user.

Autovacuum holds a SHARE UPDATE EXCLUSIVE lock, but other processes
can cancel autovacuum if blocked by that lock unless the autovacuum is
to prevent wraparound.This can result in very surprising behavior:
imagine a system that needs to run ANALYZE manually before batch jobs
to ensure reasonable query plans. That ANALYZE will interrupt attempts
to run autovacuum, and pretty soon the table is far more bloated than
expected, and query plans (ironically) degrade further.

+    If a process attempts to acquire a <literal>SHARE UPDATE
EXCLUSIVE</literal>
+    lock (the lock type held by autovacuum), lock acquisition will interrupt
+    the autovacuum.

I think it is not only for a process that tries to acquire a lock in
SHARE UPDATE EXCLUSIVE mode, rather when a process tries to acquire
any lock mode that conflicts with SHARE UPDATE EXCLUSIVE. For the
conflicting lock modes, you can refer docs [1]https://www.postgresql.org/docs/devel/explicit-locking.html (See Table 13.2.
Conflicting Lock Modes).

[1]: https://www.postgresql.org/docs/devel/explicit-locking.html

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#3James Coleman
jtc331@gmail.com
In reply to: Amit Kapila (#2)
1 attachment(s)
Re: [DOC] Document auto vacuum interruption

On Sat, Aug 31, 2019 at 10:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 26, 2019 at 1:45 AM James Coleman <jtc331@gmail.com> wrote:

We've discussed this internally many times, but today finally decided
to write up a doc patch.

Thanks, I think something on the lines of what you have written can
help some users to understand the behavior in this area and there
doesn't seem to be any harm in giving such information to the user.

Autovacuum holds a SHARE UPDATE EXCLUSIVE lock, but other processes
can cancel autovacuum if blocked by that lock unless the autovacuum is
to prevent wraparound.This can result in very surprising behavior:
imagine a system that needs to run ANALYZE manually before batch jobs
to ensure reasonable query plans. That ANALYZE will interrupt attempts
to run autovacuum, and pretty soon the table is far more bloated than
expected, and query plans (ironically) degrade further.

+    If a process attempts to acquire a <literal>SHARE UPDATE
EXCLUSIVE</literal>
+    lock (the lock type held by autovacuum), lock acquisition will interrupt
+    the autovacuum.

I think it is not only for a process that tries to acquire a lock in
SHARE UPDATE EXCLUSIVE mode, rather when a process tries to acquire
any lock mode that conflicts with SHARE UPDATE EXCLUSIVE. For the
conflicting lock modes, you can refer docs [1] (See Table 13.2.
Conflicting Lock Modes).

[1] - https://www.postgresql.org/docs/devel/explicit-locking.html

Updated patch attached. I changed the wording to be about conflicting
locks rather than a single lock type, added a link to the conflicting
locks table, and fixed a few sgml syntax issues in the original.

James Coleman

Attachments:

autovacuum-interruption-v2.patchapplication/octet-stream; name=autovacuum-interruption-v2.patchDownload
commit 8f7568e4c438d9e6264458d7472747fda84ccace
Author: jcoleman <james.coleman@getbraintree.com>
Date:   Fri Sep 13 18:12:29 2019 +0000

    Document autovacuum interruption
    
    It's important users be able to know (without looking at the source
    code) that running DDL or DDL-like commands frequently can interrupt
    autovacuum enough that it will effectively never run, resulting in
    lots of pain with dead tuples.

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index d95c218d39..b138fba0c7 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -825,6 +825,26 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
     <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
     are not considered in the balancing algorithm.
    </para>
+
+   <para>
+    Autovacuum workers generally avoid interfering with other commands.
+    If a process attempts to acquire a lock that conficts with the
+    <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum
+    (see <xref linkend="mvcc-locking-tables-table-lock-compatibility"/>), lock
+    acquisition will interrupt the autovacuum.  However if the autovacuum is
+    running to prevent transaction ID wraparound (i.e., the autovacuum query
+    name in the <structname>pg_stat_activity</structname> view ends with
+    <literal>(for wraparound)</literal>), the autovacuum is not automatically
+    interrupted.
+   </para>
+
+   <warning>
+    <para>
+      Regularly running commands that acquire locks conflicting with a
+      <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
+      effectively prevent autovacuums from ever completing.
+    </para>
+   </warning>
   </sect2>
  </sect1>
 
#4Amit Kapila
amit.kapila16@gmail.com
In reply to: James Coleman (#3)
1 attachment(s)
Re: [DOC] Document auto vacuum interruption

On Fri, Sep 13, 2019 at 11:59 PM James Coleman <jtc331@gmail.com> wrote:

On Sat, Aug 31, 2019 at 10:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Updated patch attached. I changed the wording to be about conflicting
locks rather than a single lock type, added a link to the conflicting
locks table, and fixed a few sgml syntax issues in the original.

I see error while compiling this patch on HEAD. See the below error:
/usr/bin/xmllint --path . --noout --valid postgres.sgml
postgres.sgml:833: element xref: validity error : IDREF attribute
linkend references an unknown ID
"mvcc-locking-tables-table-lock-compatibility"
make: *** [check] Error 4

The tag id mvcc-locking-tables-table-lock-compatibility is wrong. The
other problem I see is the wrong wording in one of the literals. I
have fixed both of these issues and slightly tweaked one of the
sentence. See the updated patch attached. On which version, are you
preparing this patch? I see both HEAD and 9.4 has the problems fixed
by me.

Let me know what you think of attached? I think we can back-patch
this patch. What do you think? Does anyone else have an opinion on
this patch especially if we see any problem in back-patching this?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

autovacuum-interruption-v3.patchapplication/octet-stream; name=autovacuum-interruption-v3.patchDownload
From f73cf585727aec1fd926fdf2b729935bf6c9c6e2 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Tue, 17 Sep 2019 11:20:00 +0530
Subject: [PATCH] Doc: document autovacuum interruption.

It's important users be able to know (without looking at the source code)
that running DDL or DDL-like commands frequently can interrupt autovacuum
enough that it will effectively never run, resulting in lots of pain with
dead tuples.

Reported-by: James Coleman
Author: James Coleman
Reviewed-by: Amit Kapila
Backpatch-through: 9.4
Discussion: https://postgr.es/m/CAAaqYe-XYyNwML1=f=gnd0qWg46PnvD=BDrCZ5-L94B887XVxQ@mail.gmail.com
---
 doc/src/sgml/maintenance.sgml | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index d95c218..6957e21 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -825,6 +825,26 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
     <literal>autovacuum_vacuum_cost_limit</literal> storage parameters have been set
     are not considered in the balancing algorithm.
    </para>
+
+   <para>
+    Autovacuum workers generally don't block other commands.  If a process
+    attempts to acquire a lock that conficts with the
+    <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
+    acquisition will interrupt the autovacuum.  For conflicting lock modes,
+    see <xref linkend="table-lock-compatibility"/>.  However, if the autovacuum
+    is running to prevent transaction ID wraparound (i.e., the autovacuum query
+    name in the <structname>pg_stat_activity</structname> view ends with
+    <literal>(to prevent wraparound)</literal>), the autovacuum is not
+    automatically interrupted.
+   </para>
+
+   <warning>
+    <para>
+      Regularly running commands that acquire locks conflicting with a
+      <literal>SHARE UPDATE EXCLUSIVE</literal> lock (e.g., ANALYZE) can
+      effectively prevent autovacuums from ever completing.
+    </para>
+   </warning>
   </sect2>
  </sect1>
 
-- 
1.8.3.1

#5James Coleman
jtc331@gmail.com
In reply to: Amit Kapila (#4)
Re: [DOC] Document auto vacuum interruption

On Tue, Sep 17, 2019 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Sep 13, 2019 at 11:59 PM James Coleman <jtc331@gmail.com> wrote:

On Sat, Aug 31, 2019 at 10:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Updated patch attached. I changed the wording to be about conflicting
locks rather than a single lock type, added a link to the conflicting
locks table, and fixed a few sgml syntax issues in the original.

I see error while compiling this patch on HEAD. See the below error:
/usr/bin/xmllint --path . --noout --valid postgres.sgml
postgres.sgml:833: element xref: validity error : IDREF attribute
linkend references an unknown ID
"mvcc-locking-tables-table-lock-compatibility"
make: *** [check] Error 4

The tag id mvcc-locking-tables-table-lock-compatibility is wrong.

My apologies; I'd fixed that on my local copy before sending my last
email, but I must have somehow grabbed the wrong patch file to attach
to the email.

The
other problem I see is the wrong wording in one of the literals. I
have fixed both of these issues and slightly tweaked one of the
sentence. See the updated patch attached. On which version, are you
preparing this patch? I see both HEAD and 9.4 has the problems fixed
by me.

Let me know what you think of attached? I think we can back-patch
this patch. What do you think? Does anyone else have an opinion on
this patch especially if we see any problem in back-patching this?

The attached looks great!

I was working on HEAD for the patch, but this concern has been an
issue for quite a long time. We were running into it on 9.6 in
production, for example. And given how frequently it seems like there
are large-scale production issues related to auto vacuum, I think any
amount of back patching we can do to make that footgun less likely
would be a good thing.

James Coleman

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: James Coleman (#5)
Re: [DOC] Document auto vacuum interruption

On Tue, Sep 17, 2019 at 5:48 PM James Coleman <jtc331@gmail.com> wrote:

On Tue, Sep 17, 2019 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Let me know what you think of attached? I think we can back-patch
this patch. What do you think? Does anyone else have an opinion on
this patch especially if we see any problem in back-patching this?

The attached looks great!

I was working on HEAD for the patch, but this concern has been an
issue for quite a long time. We were running into it on 9.6 in
production, for example. And given how frequently it seems like there
are large-scale production issues related to auto vacuum, I think any
amount of back patching we can do to make that footgun less likely
would be a good thing.

Okay, I will commit this tomorrow unless someone has any comments or objections.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#6)
Re: [DOC] Document auto vacuum interruption

On Wed, Sep 18, 2019 at 10:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Sep 17, 2019 at 5:48 PM James Coleman <jtc331@gmail.com> wrote:

On Tue, Sep 17, 2019 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Let me know what you think of attached? I think we can back-patch
this patch. What do you think? Does anyone else have an opinion on
this patch especially if we see any problem in back-patching this?

The attached looks great!

I was working on HEAD for the patch, but this concern has been an
issue for quite a long time. We were running into it on 9.6 in
production, for example. And given how frequently it seems like there
are large-scale production issues related to auto vacuum, I think any
amount of back patching we can do to make that footgun less likely
would be a good thing.

Okay, I will commit this tomorrow unless someone has any comments or objections.

Pushed with minor changes. There was one extra space in a few lines
and the tag for back-branches (from 10~9.4) was slightly different.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#8James Coleman
jtc331@gmail.com
In reply to: Amit Kapila (#7)
Re: [DOC] Document auto vacuum interruption

On Thu, Sep 19, 2019 at 5:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Sep 18, 2019 at 10:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Sep 17, 2019 at 5:48 PM James Coleman <jtc331@gmail.com> wrote:

On Tue, Sep 17, 2019 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Let me know what you think of attached? I think we can back-patch
this patch. What do you think? Does anyone else have an opinion on
this patch especially if we see any problem in back-patching this?

The attached looks great!

I was working on HEAD for the patch, but this concern has been an
issue for quite a long time. We were running into it on 9.6 in
production, for example. And given how frequently it seems like there
are large-scale production issues related to auto vacuum, I think any
amount of back patching we can do to make that footgun less likely
would be a good thing.

Okay, I will commit this tomorrow unless someone has any comments or objections.

Pushed with minor changes. There was one extra space in a few lines
and the tag for back-branches (from 10~9.4) was slightly different.

I completely forgot to reply to this; thanks Amit for working on this.

James