Wake up autovacuum launcher from postmaster when a worker exits

Started by Heikki Linnakangas, 3 days ago, 4 messages
#1 Heikki Linnakangas
hlinnaka@iki.fi
1 attachment(s)

When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the
launcher. I propose moving that responsibility to the postmaster, because:

* It's simpler IMHO

* The postmaster is already responsible for sending the signal if fork()
fails

* It makes it consistent with background workers. When a background
worker exits, the postmaster sends the signal to the launching process
(if requested).

* Postmaster doesn't need to worry about sending the signal to the wrong
process if the launcher's PID is reused, because it always has
up-to-date PID information, because the launcher is postmaster's child
process. That risk was negligible to begin with, but this eliminates it
completely, so we don't need the comment excusing it anymore.

I'm a little surprised it wasn't done this way to begin with, so I
wonder if I'm missing something?

- Heikki

Attachments:

v1-0001-Wake-up-autovacuum-launcher-from-postmaster-when-.patch (text/x-patch; charset=UTF-8)
From ab4362c9e85ad4ad288c80042b7b862bccd2a326 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 8 Jan 2026 21:36:32 +0200
Subject: [PATCH v1 1/1] Wake up autovacuum launcher from postmaster when a
 worker exits

When an autovacuum worker exits, the launcher needs to be notified
with SIGUSR2, so that it can rebalance and possibly launch a new
worker. The launcher must be notified only after the worker has
finished ProcKill(), so that the worker slot is available for a new
worker. Before this commit, the autovacuum worker was responsible for
that, which required a slightly complicated dance to pass the
launcher's PID from FreeWorkerInfo() to ProcKill() in a global
variable.

Simplify that by moving the responsibility of the signaling to the
postmaster. The postmaster was already doing it when it failed to fork
a worker process, so it seems logical to make it responsible for
notifying the launcher on worker exit too. That's also how the
notification on background worker exit is done.
---
 src/backend/postmaster/autovacuum.c | 22 ----------------------
 src/backend/postmaster/postmaster.c |  8 ++++++++
 src/backend/storage/lmgr/proc.c     |  4 ----
 src/include/postmaster/autovacuum.h |  3 ---
 4 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3e507d23cc9..22379de1e31 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -323,9 +323,6 @@ avl_dbase  *avl_dbase_array;
 /* Pointer to my own WorkerInfo, valid on each worker */
 static WorkerInfo MyWorkerInfo = NULL;
 
-/* PID of launcher, valid only in worker while shutting down */
-int			AutovacuumLauncherPid = 0;
-
 static Oid	do_start_worker(void);
 static void ProcessAutoVacLauncherInterrupts(void);
 pg_noreturn static void AutoVacLauncherShutdown(void);
@@ -1604,11 +1601,6 @@ AutoVacWorkerMain(const void *startup_data, size_t startup_data_len)
 		do_autovacuum();
 	}
 
-	/*
-	 * The launcher will be notified of my death in ProcKill, *if* we managed
-	 * to get a worker slot at all
-	 */
-
 	/* All done, go away */
 	proc_exit(0);
 }
@@ -1623,20 +1615,6 @@ FreeWorkerInfo(int code, Datum arg)
 	{
 		LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
 
-		/*
-		 * Wake the launcher up so that he can launch a new worker immediately
-		 * if required.  We only save the launcher's PID in local memory here;
-		 * the actual signal will be sent when the PGPROC is recycled.  Note
-		 * that we always do this, so that the launcher can rebalance the cost
-		 * limit setting of the remaining workers.
-		 *
-		 * We somewhat ignore the risk that the launcher changes its PID
-		 * between us reading it and the actual kill; we expect ProcKill to be
-		 * called shortly after us, and we assume that PIDs are not reused too
-		 * quickly after a process exits.
-		 */
-		AutovacuumLauncherPid = AutoVacuumShmem->av_launcherpid;
-
 		dlist_delete(&MyWorkerInfo->wi_links);
 		MyWorkerInfo->wi_dboid = InvalidOid;
 		MyWorkerInfo->wi_tableoid = InvalidOid;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 921d73226d6..d6133bfebc6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2664,6 +2664,14 @@ CleanupBackend(PMChild *bp,
 	if (bp_bgworker_notify)
 		BackgroundWorkerStopNotifications(bp_pid);
 
+	/*
+	 * If it was an autovacuum worker, wake up the launcher so that it can
+	 * immediately launch a new worker or rebalance the cost limit setting of
+	 * the remaining workers.
+	 */
+	if (bp_bkend_type == B_AUTOVAC_WORKER && AutoVacLauncherPMChild != NULL)
+		signal_child(AutoVacLauncherPMChild, SIGUSR2);
+
 	/*
 	 * If it was a background worker, also update its RegisteredBgWorker
 	 * entry.
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 66274029c74..063826ae576 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1035,10 +1035,6 @@ ProcKill(int code, Datum arg)
 	ProcGlobal->spins_per_delay = update_spins_per_delay(ProcGlobal->spins_per_delay);
 
 	SpinLockRelease(ProcStructLock);
-
-	/* wake autovac launcher if needed -- see comments in FreeWorkerInfo */
-	if (AutovacuumLauncherPid != 0)
-		kill(AutovacuumLauncherPid, SIGUSR2);
 }
 
 /*
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
index e43067d0260..5aa0f3a8ac1 100644
--- a/src/include/postmaster/autovacuum.h
+++ b/src/include/postmaster/autovacuum.h
@@ -44,9 +44,6 @@ extern PGDLLIMPORT int autovacuum_multixact_freeze_max_age;
 extern PGDLLIMPORT double autovacuum_vac_cost_delay;
 extern PGDLLIMPORT int autovacuum_vac_cost_limit;
 
-/* autovacuum launcher PID, only valid when worker is shutting down */
-extern PGDLLIMPORT int AutovacuumLauncherPid;
-
 extern PGDLLIMPORT int Log_autovacuum_min_duration;
 extern PGDLLIMPORT int Log_autoanalyze_min_duration;
 
-- 
2.47.3

#2 Nathan Bossart
nathandbossart@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: Wake up autovacuum launcher from postmaster when a worker exits

On Thu, Jan 08, 2026 at 09:57:38PM +0200, Heikki Linnakangas wrote:

> When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the launcher. I
> propose moving that responsibility to the postmaster, because:

This seems generally reasonable to me. So does the patch.

> * It makes it consistent with background workers. When a background worker
> exits, the postmaster sends the signal to the launching process (if
> requested).

I've wondered about making autovacuum workers proper background workers.

> I'm a little surprised it wasn't done this way to begin with, so I wonder if
> I'm missing something?

This code dates back to commit e2a186b03c. I skimmed through the nearby
thread [0] and didn't immediately notice any discussion about this. My
guess is that it seemed simpler to directly alert the launcher, since it's
the one that needs to take action.

[0]: /messages/by-id/flat/20070404233954.GK19251@alvh.no-ip.org

--
nathan

#3 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: Wake up autovacuum launcher from postmaster when a worker exits

On Thu, Jan 8, 2026 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the
> launcher. I propose moving that responsibility to the postmaster, because:
>
> * It's simpler IMHO
>
> * The postmaster is already responsible for sending the signal if fork()
> fails
>
> * It makes it consistent with background workers. When a background
> worker exits, the postmaster sends the signal to the launching process
> (if requested).
>
> * Postmaster doesn't need to worry about sending the signal to the wrong
> process if the launcher's PID is reused, because it always has
> up-to-date PID information, because the launcher is postmaster's child
> process. That risk was negligible to begin with, but this eliminates it
> completely, so we don't need the comment excusing it anymore.

It sounds reasonable to me too. +1.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#4 li carol
carol.li2025@outlook.com
In reply to: Masahiko Sawada (#3)
RE: Wake up autovacuum launcher from postmaster when a worker exits

Hi all,

I have completed the testing for this patch with a focus on the signaling logic from postmaster to launcher.

Test Environment:
- autovacuum_max_workers = 1
- autovacuum_naptime = 5s
- Multiple databases (4 total)

Observations:
I performed cross-database vacuum tests. With the 1-worker limit, I observed that the launcher starts the next worker for a pending database immediately (within the ~1.25s scheduled stagger) after the previous worker exits.
The logs show a seamless handover between worker processes (e.g., Worker A exits at 16:06:20.209, and the next scheduled Worker B starts at 16:06:21.447).

2026-01-09 16:06:20.209 CST [1918017] LOG: automatic vacuum of table "db_test2.public.t2": index scans: 0
pages: 222 removed, 0 remain, 222 scanned (100.00% of total), 0 eagerly scanned
tuples: 50000 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 813, which was 1 XIDs old when operation ended
new relfrozenxid: 813, which is 3 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 222 pages set all-visible, 222 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 2.794 MB/s
buffer usage: 697 hits, 0 reads, 4 dirtied
WAL usage: 450 records, 4 full page images, 158352 bytes, 32768 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2026-01-09 16:06:21.447 CST [1918022] DEBUG: autovacuum: processing database "template1"
2026-01-09 16:06:22.699 CST [1918027] DEBUG: autovacuum: processing database "postgres"
2026-01-09 16:06:23.946 CST [1918044] DEBUG: autovacuum: processing database "db_test1"
2026-01-09 16:06:23.958 CST [1918044] LOG: automatic vacuum of table "db_test1.public.t1": index scans: 0
pages: 222 removed, 0 remain, 222 scanned (100.00% of total), 0 eagerly scanned
tuples: 50000 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 814, which was 1 XIDs old when operation ended
new relfrozenxid: 814, which is 7 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 222 pages set all-visible, 222 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 2.793 MB/s
buffer usage: 697 hits, 0 reads, 4 dirtied
WAL usage: 450 records, 4 full page images, 158352 bytes, 32768 full page image bytes, 0 buffers full
memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s

This confirms that the postmaster successfully notifies the launcher and the worker slot is freed appropriately before the notification.
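
As a sanity check on the ~1.25s figure, the gaps between the log timestamps above can be recomputed with a small standalone C sketch. The helper names log_time_to_ms() and expected_stagger_ms() are hypothetical, and the sketch assumes that the launcher spreads the databases evenly across autovacuum_naptime (5s over 4 databases), which is an assumption about the scheduling rather than something shown in the log.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Convert an "HH:MM:SS.mmm" log timestamp to milliseconds since midnight.
 * Returns -1 if the string does not match that shape.
 */
long
log_time_to_ms(const char *ts)
{
	int			h, m, s, ms;

	if (sscanf(ts, "%d:%d:%d.%3d", &h, &m, &s, &ms) != 4)
		return -1;
	return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/*
 * Expected gap between worker launches in milliseconds, ASSUMING the
 * databases are spaced evenly over the naptime interval.
 */
long
expected_stagger_ms(int naptime_s, int ndatabases)
{
	assert(ndatabases > 0);
	return naptime_s * 1000L / ndatabases;
}
```

With the timestamps from the log, log_time_to_ms("16:06:21.447") - log_time_to_ms("16:06:20.209") gives 1238 ms, and the following two gaps are 1252 ms and 1247 ms, all close to expected_stagger_ms(5, 4), which is 1250 ms.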

The patch looks correct and robust. +1 from my side.

Best Regards,
Yuan Li(carol)