[PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

Started by Baji Shaik2 months ago8 messageshackers

baji.pgdev@gmail.com

2 months ago

Hi,

When a REPACK (CONCURRENTLY) session is terminated via
pg_terminate_backend(), the REPACK decoding worker keeps running
indefinitely and holds its temporary replication slot.

To reproduce:

-- Session 1: start a long REPACK
REPACK (CONCURRENTLY) big_table;

-- Session 2: kill it
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query LIKE '%REPACK%';

-- The slot persists:
SELECT slot_name, active FROM pg_replication_slots;
-- repack_NNNN | t (still active, cannot be dropped)

Root cause:

pg_terminate_backend() causes ereport(FATAL) via ProcDiePending.
FATAL exits bypass PG_FINALLY blocks, so stop_repack_decoding_worker()
is never called. The decoding worker is left running.

Fix:

Register an on_proc_exit callback when the decoding worker starts.
The callback calls TerminateBackgroundWorker() to signal the worker.
We do not wait for the worker to exit in the callback (WaitLatch is
not safe during proc_exit); the worker's RS_TEMPORARY slot is dropped
automatically when the worker process exits.

Thanks,
Baji Shaik
AWS RDS

Sami Imseih

samimseih@gmail.com

2 months ago

In reply to: Baji Shaik (#1)

Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

Hi,

Thanks for reporting. This indeed looks like a bug.

With pg_terminate_backend, the logical replication worker has no
way to know that it needs to stop, as the PG_FINALLY is not
reached in this case.

I think registering a callback to terminate the worker is the proper fix,
but I don't think on_proc_exit() is the right place to register the
callback.

With 0001 applied and building with asserts, I see a segfault.

postgres=# select pg_terminate_backend(26707);
pg_terminate_backend
----------------------
t
(1 row)

```
postgres=# select 1;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
postgres=?#
```

```
2026-05-12 21:50:33.866 CDT [26569] LOG: client backend (PID 26707)
was terminated by signal 11: Segmentation fault: 11
2026-05-12 21:50:33.866 CDT [26569] LOG: terminating any other active
server processes
2026-05-12 21:50:33.872 CDT [26569] LOG: all server processes
terminated; reinitializing
2026-05-12 21:50:33.882 CDT [27131] LOG: database system was
interrupted; last known up at 2026-05-12 21:45:39 CDT
2026-05-12 21:50:34.278 CDT [27131] LOG: database system was not
properly shut down; automatic recovery in progress
2026-05-12 21:50:34.281 CDT [27131] LOG: redo starts at 13/619E9470
```

From lldb on my Mac, I see

```
Process 22683 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason =
EXC_BAD_ACCESS (code=1, address=0x7f7f7f7f7f7f7f7f)
frame #0: 0x00000001044c607c
postgres`TerminateBackgroundWorker(handle=0x7f7f7f7f7f7f7f7f) at
bgworker.c:1324:2 [opt]
1321 BackgroundWorkerSlot *slot;
1322 bool signal_postmaster = false;
1323
-> 1324 Assert(handle->slot < max_worker_processes);
1325 slot = &BackgroundWorkerData->slot[handle->slot];
1326
1327 /* Set terminate flag in shared memory, unless
slot has been reused. */
```

The 0x7f7f7f7f7f7f7f7f is the CLOBBER_FREED_MEMORY fill pattern from
wipe_mem(). The handle's memory context has already been destroyed by
the time on_proc_exit callbacks run.

A better fix is to use before_shmem_exit instead, which is for
user-level cleanup.

/* ----------------------------------------------------------------
* before_shmem_exit
*
* Register early callback to perform user-level cleanup,

If we do that, we can also wait for the worker to shutdown, so we can use
stop_repack_decoding_worker();

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)

Alvaro Herrera

alvherre@2ndquadrant.com

2 months ago

In reply to: Baji Shaik (#1)

Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

On 2026-May-12, Baji Shaik wrote:

Root cause:

pg_terminate_backend() causes ereport(FATAL) via ProcDiePending.
FATAL exits bypass PG_FINALLY blocks, so stop_repack_decoding_worker()
is never called. The decoding worker is left running.

Fix:

Register an on_proc_exit callback when the decoding worker starts.

I think a better fix for this is to use PG_ENSURE_ERROR_CLEANUP(). That
way we avoid leaving callbacks in place, which would not be great if the
same backend does a lot of REPACKs: after a dozen or so, it dies with

FATAL: out of before_shmem_exit slots

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)

Baji Shaik

baji.pgdev@gmail.com

2 months ago

In reply to: Alvaro Herrera (#3)

Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

Hi Alvaro, Sami,

Thank you both for the feedback.

Sami, thanks for identifying the use-after-free with on_proc_exit
and providing the v2 patch with before_shmem_exit. That helped
narrow down the right approach.

On Sun, May 17, 2026 at 3:46 PM Alvaro Herrera <alvherre@kurilemu.de> wrote:

I think a better fix for this is to use PG_ENSURE_ERROR_CLEANUP(). That
way we avoid leaving callbacks in place, which would not be great if the
same backend does a lot of REPACKs: after a dozen or so, it dies with

FATAL: out of before_shmem_exit slots

Alvaro, you're right that PG_ENSURE_ERROR_CLEANUP is the better
approach here. Attached is v3 which uses it instead of
before_shmem_exit or on_proc_exit.

Summary of the issues with v1 and v2:

- v1 (on_proc_exit): Crashes with assertions enabled because
memory contexts are already destroyed by the time on_proc_exit
callbacks run. The worker handle is clobbered with 0x7f
(CLOBBER_FREED_MEMORY).

- v2 (before_shmem_exit): Works correctly for a single REPACK,
but leaks a callback slot on each successful REPACK
(CONCURRENTLY). After ~15 REPACKs in the same session:
FATAL: out of before_shmem_exit slots.

v3 uses PG_ENSURE_ERROR_CLEANUP which:
- Handles both ERROR and FATAL exits
- Automatically cancels the callback on normal completion
(no slot leak)
- Runs before memory contexts are destroyed (no use-after-free)

The existing PG_TRY/PG_FINALLY block is replaced with
PG_ENSURE_ERROR_CLEANUP for the concurrent path, and a plain
rebuild_relation() call for the non-concurrent path.

Tested with --enable-cassert --enable-debug --enable-injection-points:
- All 245 regression tests pass (including cluster)
- All 8 injection point tests pass (including repack and repack_toast)
- pg_terminate_backend cleans up worker and slot without crash
- 20 REPACK (CONCURRENTLY) in same session completes without
slot exhaustion

I have not added a dedicated regression test for the
pg_terminate_backend scenario yet, but I can write one using
injection points if needed.

Thanks,
Baji Shaik

Alvaro Herrera

alvherre@2ndquadrant.com

2 months ago

In reply to: Baji Shaik (#4)

Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

On 2026-May-17, Baji Shaik wrote:

v3 uses PG_ENSURE_ERROR_CLEANUP which:
- Handles both ERROR and FATAL exits
- Automatically cancels the callback on normal completion
(no slot leak)
- Runs before memory contexts are destroyed (no use-after-free)

Yeah, looks good. I have pushed it, with some comment wordsmithing and
other cosmetic changes.

While looking at it, I realized that I didn't like the way
stop_repack_decoding_worker() works, mainly because if there's no
handle, we leak everything else -- and the way we initialize things
means we leak the shared memory segment. This is maybe a rare case and
just a small memory leak, but it seems better to do it nicely. So
here's a followup patch that reworks that code. This also forced me to
understand more clearly what is going on, so I rewrote the comments.

- 20 REPACK (CONCURRENTLY) in same session completes without
slot exhaustion

FWIW I tested this by doing "repack (concurrently) foo \watch 0.1" and
letting it run for some time. I happened to notice that if I have two
psqls running, one with the above and the second with the equivalent for
table bar, when they run together, each runs more quickly than when only
one of them is running. I don't know what causes this; I suspect/assume
it's because the WAL messages for initial historic snapshot creation
from one gets the other running.

I have not added a dedicated regression test for the
pg_terminate_backend scenario yet, but I can write one using
injection points if needed.

I don't feel a need for that.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

Antonin Houska

ah@cybertec.at

2 months ago

In reply to: Alvaro Herrera (#5)

Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

Alvaro Herrera <alvherre@kurilemu.de> wrote:

On 2026-May-17, Baji Shaik wrote:

v3 uses PG_ENSURE_ERROR_CLEANUP which:
- Handles both ERROR and FATAL exits
- Automatically cancels the callback on normal completion
(no slot leak)
- Runs before memory contexts are destroyed (no use-after-free)

Yeah, looks good. I have pushed it, with some comment wordsmithing and
other cosmetic changes.

While looking at it, I realized that I didn't like the way
stop_repack_decoding_worker() works, mainly because if there's no
handle, we leak everything else -- and the way we initialize things
means we leak the shared memory segment. This is maybe a rare case and
just a small memory leak, but it seems better to do it nicely. So
here's a followup patch that reworks that code. This also forced me to
understand more clearly what is going on, so I rewrote the comments.

The call of shm_mq_detach() got lost, or do you rely on dsm_detach() to call
shm_mq_detach_callback() ? The latter does not free ->mqh_buffer. Since each
REPACK runs in a separate transaction, I wouldn't consider that a leak, but I
still think that explicit call of shm_mq_detach() makes the code a bit easier
to read (i.e. no need for the developer to check if the detaching happens
automatically).

 	/*
-	 * If we could not cancel the current sleep due to ERROR, do that before
-	 * we detach from the shared memory the condition variable is located in.
-	 * If we did not, the bgworker ERROR handling code would try and fail
-	 * badly.
+	 * Now detach from our shared memory segment.  In error cases there might
+	 * still be messages from the worker in the queue, which ProcessInterrupts
+	 * would try to read; this is pointless (and causes an assertion failure),
+	 * so set the global pointer to NULL to have ProcessRepackMessages ignore
+	 * them.
 	 */
-	ConditionVariableCancelSleep();
-
-	dsm_detach(decoding_worker->seg);
+	dsmseg = decoding_worker->seg;
 	pfree(decoding_worker);
 	decoding_worker = NULL;
+
+	/* We must also cancel the current sleep, if one is still set up */
+	ConditionVariableCancelSleep();
+
+	if (dsmseg != NULL)
+		dsm_detach(dsmseg);

I suppose the reason for the assertion failure was reading from the queue
after the backend had detached from it? Thanks for fixing that.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Alvaro Herrera

alvherre@2ndquadrant.com

2 months ago

In reply to: Antonin Houska (#6)

Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

On 2026-May-20, Antonin Houska wrote:

Alvaro Herrera <alvherre@kurilemu.de> wrote:

The call of shm_mq_detach() got lost, or do you rely on dsm_detach() to call
shm_mq_detach_callback() ? The latter does not free ->mqh_buffer.

Hmm, this I think this is a code documentation bug then, because the
comment for shm_mq_attach() says

* If seg != NULL, the queue will be automatically detached when that dynamic
* shared memory segment is detached.

I think it's strange, or buggy even, to say that "the queue is
automatically detached", but that you still have to call dsm_mq_detach()
afterwards.

I can put back the shm_mq_detach() call, of course.

+	 * Now detach from our shared memory segment.  In error cases there might
+	 * still be messages from the worker in the queue, which ProcessInterrupts
+	 * would try to read; this is pointless (and causes an assertion failure),
+	 * so set the global pointer to NULL to have ProcessRepackMessages ignore
+	 * them.

I suppose the reason for the assertion failure was reading from the queue
after the backend had detached from it? Thanks for fixing that.

Yeah, that's exactly what happened.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La vida es para el que se aventura"

Alvaro Herrera

alvherre@2ndquadrant.com

2 months ago

In reply to: Alvaro Herrera (#7)

Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

On 2026-May-20, Alvaro Herrera wrote:

On 2026-May-20, Antonin Houska wrote:

The call of shm_mq_detach() got lost, or do you rely on dsm_detach() to call
shm_mq_detach_callback() ? The latter does not free ->mqh_buffer.

I can put back the shm_mq_detach() call, of course.

I did that and pushed.

Many thanks!

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Los dioses no protegen a los insensatos. Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)

[PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

Attachments:

Attachments:

Attachments:

Attachments: