Autovac cancellation is broken in v14

Started by Jeff Janesover 5 years ago4 messages
#1Jeff Janes
jeff.janes@gmail.com

If I create a large table with "CREATE TABLE ... AS SELECT ... from
generate_series(1,3e7)" with no explicit transactions, then once it is done
I wait for autovac to kick in, then when I try to build an index on that
table (or drop the table) the autovac doesn't go away on its own.

Bisects down to:

commit 5788e258bb26495fab65ff3aa486268d1c50b123
Author: Andres Freund <andres@anarazel.de>
Date: Wed Jul 15 15:35:07 2020 -0700

snapshot scalability: Move PGXACT->vacuumFlags to
ProcGlobal->vacuumFlags.

Which makes sense given the parts of the code this touches, although I
don't understand exactly what the problem is. The problem persists in HEAD
(77c7267c37).

Cheers,

Jeff

#2Jeff Janes
jeff.janes@gmail.com
In reply to: Jeff Janes (#1)
1 attachment(s)
Re: Autovac cancellation is broken in v14

On Thu, Aug 27, 2020 at 3:10 PM Jeff Janes <jeff.janes@gmail.com> wrote:

If I create a large table with "CREATE TABLE ... AS SELECT ... from
generate_series(1,3e7)" with no explicit transactions, then once it is done
I wait for autovac to kick in, then when I try to build an index on that
table (or drop the table) the autovac doesn't go away on its own.

After a bit more poking at this, I think we are checking if we ourselves
are an autovac process, not doing the intended check of whether the other
guy is one.

Where would be a good spot to add a regression test for this?
"isolation_regression" ?
Cheers,

Jeff

Attachments:

fix_autovac_cancel.patchapplication/octet-stream; name=fix_autovac_cancel.patchDownload
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index f6113b2d24..2964b2fc79 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1334,7 +1334,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			 * Only do it if the worker is not working to protect against Xid
 			 * wraparound.
 			 */
-			vacuumFlags = ProcGlobal->vacuumFlags[proc->pgxactoff];
+			vacuumFlags = ProcGlobal->vacuumFlags[autovac->pgxactoff];
 			if ((vacuumFlags & PROC_IS_AUTOVACUUM) &&
 				!(vacuumFlags & PROC_VACUUM_FOR_WRAPAROUND))
 			{
#3Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#2)
Re: Autovac cancellation is broken in v14

Hi,

On 2020-08-27 16:20:30 -0400, Jeff Janes wrote:

On Thu, Aug 27, 2020 at 3:10 PM Jeff Janes <jeff.janes@gmail.com> wrote:

If I create a large table with "CREATE TABLE ... AS SELECT ... from
generate_series(1,3e7)" with no explicit transactions, then once it is done
I wait for autovac to kick in, then when I try to build an index on that
table (or drop the table) the autovac doesn't go away on its own.

After a bit more poking at this, I think we are checking if we ourselves
are an autovac process, not doing the intended check of whether the other
guy is one.

Ugh, good catch.

Where would be a good spot to add a regression test for this?
"isolation_regression" ?

I'm not immediately sure how we could write a good test for this,
particularly not in the isolation tests. We'd basically have to make
sure that a table needs autovacuuming, then sleep for long enough for
autovacuum to have come around, and block autovacuum from making
progress. That latter is doable by holding a pin on a page it needs to
freeze, e.g. using a cursor. I suspect all of that would at least
require a TAP test, and might still be too fragile.

Other ideas?

Regards,

Andres

#4Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#3)
Re: Autovac cancellation is broken in v14

Hi,

On 2020-08-27 14:35:06 -0700, Andres Freund wrote:

On 2020-08-27 16:20:30 -0400, Jeff Janes wrote:

On Thu, Aug 27, 2020 at 3:10 PM Jeff Janes <jeff.janes@gmail.com> wrote:

If I create a large table with "CREATE TABLE ... AS SELECT ... from
generate_series(1,3e7)" with no explicit transactions, then once it is done
I wait for autovac to kick in, then when I try to build an index on that
table (or drop the table) the autovac doesn't go away on its own.

After a bit more poking at this, I think we are checking if we ourselves
are an autovac process, not doing the intended check of whether the other
guy is one.

Ugh, good catch.

Pushed the fix.

Where would be a good spot to add a regression test for this?
"isolation_regression" ?

I'm not immediately sure how we could write a good test for this,
particularly not in the isolation tests. We'd basically have to make
sure that a table needs autovacuuming, then sleep for long enough for
autovacuum to have come around, and block autovacuum from making
progress. That latter is doable by holding a pin on a page it needs to
freeze, e.g. using a cursor. I suspect all of that would at least
require a TAP test, and might still be too fragile.

Perhaps the easiest way for this would be to have an option to have
manual VACUUMs be interruptible by other backends. That seems like a
useful option anyway? I'll start a new thread.

Greetings,

Andres Freund