Logical replication launcher did not automatically restart when got SIGKILL

Started by cca5507 · 6 months ago · 19 messages

cca5507

cca5507@qq.com

6 months ago

1 attachment(s)

Hi, hackers

I found the $SUBJECT, the main reason is that RegisteredBgWorker::rw_pid has not been cleaned.

Attach a patch to fix it.

--
Regards,
ChangAo Chen

Attachments:

v1-0001-logical-replication-launcher-did-not-automaticall.patch (application/octet-stream; charset=ISO-8859-1)

From b50be46d728f4aca19956373244286853a9a9a7a Mon Sep 17 00:00:00 2001
From: ChangAo Chen <cca5507@qq.com>
Date: Tue, 15 Jul 2025 17:05:03 +0800
Subject: [PATCH v1] logical replication launcher did not automatically restart
 when got SIGKILL.

The main reason is that RegisteredBgWorker::rw_pid has not been cleaned.
---
 src/backend/postmaster/bgworker.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 116ddf7b835..11930024e78 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -192,6 +192,7 @@ BackgroundWorkerShmemInit(void)
 			slot->terminate = false;
 			slot->pid = InvalidPid;
 			slot->generation = 0;
+			rw->rw_pid = 0;
 			rw->rw_shmem_slot = slotno;
 			rw->rw_worker.bgw_notify_pid = 0;	/* might be reinit after crash */
 			memcpy(&slot->worker, &rw->rw_worker, sizeof(BackgroundWorker));
-- 
2.34.1

shveta malik

shveta.malik@gmail.com

6 months ago

In reply to: cca5507 (#1)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On Tue, Jul 15, 2025 at 2:56 PM cca5507 <cca5507@qq.com> wrote:

> Hi, hackers
>
> I found the $SUBJECT, the main reason is that RegisteredBgWorker::rw_pid has not been cleaned.
>
> Attach a patch to fix it.

Thank You for reporting this. The problem exists and the patch works
as expected.

In the patch, we are resetting the PID during shared memory
initialization. Is there a better place to handle PID reset in the
case of a SIGKILL, possibly within a cleanup flow? For example, during
a regular shutdown, we reset the launcher (background worker) PID in
CleanupBackend(). Or is this the only possibility?

thanks
Shveta

cca5507

cca5507@qq.com

6 months ago

In reply to: shveta malik (#2)

Re: Logical replication launcher did not automatically restart when got SIGKILL

Hi,

Resetting the PID in ResetBackgroundWorkerCrashTimes() may also work, but I'm not sure which is better.

--
Regards,
ChangAo Chen

Fujii Masao

masao.fujii@oss.nttdata.com

6 months ago

In reply to: shveta malik (#2)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On 2025/07/15 19:34, shveta malik wrote:

> On Tue, Jul 15, 2025 at 2:56 PM cca5507 <cca5507@qq.com> wrote:
>>
>> Hi, hackers
>>
>> I found the $SUBJECT, the main reason is that RegisteredBgWorker::rw_pid has not been cleaned.
>>
>> Attach a patch to fix it.

Thanks for the report!

This issue appears to have been introduced by commit 28a520c0b77. As a result,
not only the logical replication launcher but also other background workers
(like autoprewarm) may fail to restart after a server crash.

> Thank You for reporting this. The problem exists and the patch works
> as expected.
>
> In the patch, we are resetting the PID during shared memory
> initialization. Is there a better place to handle PID reset in the
> case of a SIGKILL, possibly within a cleanup flow? For example, during
> a regular shutdown, we reset the launcher (background worker) PID in
> CleanupBackend(). Or is this the only possibility?

From a quick look at the code, it seems that the second half of CleanupBackend()
is responsible for cleaning up background workers and resetting rw_pid to 0.
However, in the crash case, the function exits immediately after calling
HandleChildCrash(), skipping that cleanup:

    if (crashed)
    {
        HandleChildCrash(bp_pid, exitstatus, procname);
        return;
    }

This could be the problem? Shouldn't the background worker cleanup still
happen even in the crash case?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

Fujii Masao

masao.fujii@oss.nttdata.com

6 months ago

In reply to: Fujii Masao (#4)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On 2025/07/16 0:08, Fujii Masao wrote:

> On 2025/07/15 19:34, shveta malik wrote:
>> On Tue, Jul 15, 2025 at 2:56 PM cca5507 <cca5507@qq.com> wrote:
>>>
>>> Hi, hackers
>>>
>>> I found the $SUBJECT, the main reason is that RegisteredBgWorker::rw_pid has not been cleaned.
>>>
>>> Attach a patch to fix it.
>
> Thanks for the report!
>
> This issue appears to have been introduced by commit 28a520c0b77. As a result,
> not only the logical replication launcher but also other background workers
> (like autoprewarm) may fail to restart after a server crash.

I found that the same issue was previously reported here [1],
and a patch has been added to the current commitfest [2].

Regards,

[1]: /messages/by-id/CAF6JsWiO=i24qYitWe6ns1sXqcL86rYxdyU+pNYk-WueKPSySg@mail.gmail.com
[2]: https://commitfest.postgresql.org/patch/5844/

--
Fujii Masao
NTT DATA Japan Corporation

cca5507

cca5507@qq.com

6 months ago

In reply to: Fujii Masao (#5)

Re: Logical replication launcher did not automatically restart when got SIGKILL

Hi,

The v1-0002 in [1] will call ReportBackgroundWorkerExit() which will send SIGUSR1 to 'bgw_notify_pid', but it may already exit in HandleChildCrash(), is this ok?

[1]: https://commitfest.postgresql.org/patch/5844/

--
Regards,
ChangAo Chen

shveta malik

shveta.malik@gmail.com

6 months ago

In reply to: cca5507 (#6)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On Wed, Jul 16, 2025 at 8:51 AM cca5507 <cca5507@qq.com> wrote:

> Hi,
>
> The v1-0002 in [1] will call ReportBackgroundWorkerExit() which will send SIGUSR1 to 'bgw_notify_pid', but it may already exit in HandleChildCrash(), is this ok?

Shall ReportBackgroundWorkerExit() be skipped for 'crashed' background worker?

If we look at code prior to commit 28a520c0b77, there we were setting
'rw_crashed_at' in CleanupBackgroundWorker() and then
HandleChildCrash() was resetting the pid and exiting with no
additional processing.

thanks
Shveta

Fujii Masao

masao.fujii@gmail.com

6 months ago

In reply to: shveta malik (#7)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On Thu, Jul 17, 2025 at 6:58 PM shveta malik <shveta.malik@gmail.com> wrote:

> On Wed, Jul 16, 2025 at 8:51 AM cca5507 <cca5507@qq.com> wrote:
>>
>> Hi,
>>
>> The v1-0002 in [1] will call ReportBackgroundWorkerExit() which will send SIGUSR1 to 'bgw_notify_pid', but it may already exit in HandleChildCrash(), is this ok?
>
> Shall ReportBackgroundWorkerExit() be skipped for 'crashed' background worker?
>
> If we look at code prior to commit 28a520c0b77, there we were setting
> 'rw_crashed_at' in CleanupBackgroundWorker() and then
> HandleChildCrash() was resetting the pid and exiting with no
> additional processing.

It seems we don't need to set rw_crashed_at in crash cases,
since it's always reset to 0 by ResetBackgroundWorkerCrashTimes()
in restart-after-crash code. So, the only additional step we need may be
resetting rw_pid to 0.

Instead of modifying CleanupBackend() to do this, another option
could be to reset rw_pid during restart-after-crash code, for example,
inside ResetBackgroundWorkerCrashTimes(). Thought?

Regards,

--
Fujii Masao

shveta malik

shveta.malik@gmail.com

6 months ago

In reply to: Fujii Masao (#8)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On Thu, Jul 24, 2025 at 2:39 PM Fujii Masao <masao.fujii@gmail.com> wrote:

> On Thu, Jul 17, 2025 at 6:58 PM shveta malik <shveta.malik@gmail.com> wrote:
>> On Wed, Jul 16, 2025 at 8:51 AM cca5507 <cca5507@qq.com> wrote:
>>>
>>> Hi,
>>>
>>> The v1-0002 in [1] will call ReportBackgroundWorkerExit() which will send SIGUSR1 to 'bgw_notify_pid', but it may already exit in HandleChildCrash(), is this ok?
>>
>> Shall ReportBackgroundWorkerExit() be skipped for 'crashed' background worker?
>>
>> If we look at code prior to commit 28a520c0b77, there we were setting
>> 'rw_crashed_at' in CleanupBackgroundWorker() and then
>> HandleChildCrash() was resetting the pid and exiting with no
>> additional processing.
>
> It seems we don't need to set rw_crashed_at in crash cases,
> since it's always reset to 0 by ResetBackgroundWorkerCrashTimes()
> in restart-after-crash code.

Yes, that seems the case,

> So, the only additional step we need may be
> resetting rw_pid to 0.

I agree.

> Instead of modifying CleanupBackend() to do this, another option
> could be to reset rw_pid during restart-after-crash code, for example,
> inside ResetBackgroundWorkerCrashTimes(). Thought?

Sounds reasonable.
Thinking out loud, when cleaning up after a backend or background
worker crash, process_pm_child_exit() is invoked, which subsequently
calls both CleanupBackend() and HandleChildCrash(). After the cleanup
completes, process_pm_child_exit() calls PostmasterStateMachine() to
move to the next state. As part of that, PostmasterStateMachine()
invokes ResetBackgroundWorkerCrashTimes() (only in crash
scenarios/FatalError), to reset a few things. Since it also resets
rw_worker.bgw_notify_pid, it seems reasonable to reset the rw_pid as
well there.

thanks
Shveta
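
To make the flow described above concrete, here is a minimal, self-contained C model of it. It is only a sketch: ModelWorker and the model_* functions are hypothetical stand-ins, not PostgreSQL code, and the rule that a worker with a nonzero rw_pid is treated as already running is a simplified assumption about the postmaster's start-up logic. The reset_pid flag switches between the behavior without and with the rw_pid reset in the restart-after-crash step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for RegisteredBgWorker. */
typedef struct ModelWorker
{
	int		rw_pid;			/* 0 means "not running" */
	long	rw_crashed_at;	/* 0 means "no crash recorded" */
} ModelWorker;

/* Assumption: only workers that look "not running" are (re)started. */
static bool
model_worker_needs_start(const ModelWorker *rw)
{
	assert(rw != NULL);
	return rw->rw_pid == 0;
}

/*
 * Models the crash path: the cleanup returns early, so the worker's
 * rw_pid is left pointing at the dead process.
 */
static void
model_cleanup_after_crash(ModelWorker *rw)
{
	(void) rw;				/* intentionally no reset here */
}

/*
 * Models the restart-after-crash reset step. With reset_pid == false
 * only the crash time is cleared; with reset_pid == true the stale
 * pid is cleared as well.
 */
static void
model_reset_crash_times(ModelWorker *workers, size_t n, bool reset_pid)
{
	for (size_t i = 0; i < n; i++)
	{
		workers[i].rw_crashed_at = 0;
		if (reset_pid)
			workers[i].rw_pid = 0;
	}
}
```

In this model, a worker that crashed while rw_pid was nonzero still looks running after model_reset_crash_times(..., false), so model_worker_needs_start() returns false and it would never be started again. After model_reset_crash_times(..., true) the same worker is eligible to start.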

#10

Fujii Masao

masao.fujii@gmail.com

6 months ago

In reply to: shveta malik (#9)

1 attachment(s)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On Thu, Jul 24, 2025 at 6:46 PM shveta malik <shveta.malik@gmail.com> wrote:

> Sounds reasonable.
> Thinking out loud, when cleaning up after a backend or background
> worker crash, process_pm_child_exit() is invoked, which subsequently
> calls both CleanupBackend() and HandleChildCrash(). After the cleanup
> completes, process_pm_child_exit() calls PostmasterStateMachine() to
> move to the next state. As part of that, PostmasterStateMachine()
> invokes ResetBackgroundWorkerCrashTimes() (only in crash
> scenarios/FatalError), to reset a few things. Since it also resets
> rw_worker.bgw_notify_pid, it seems reasonable to reset the rw_pid as
> well there.

Thanks!
Attached is a patch that fixes the issue by resetting rw_pid in
ResetBackgroundWorkerCrashTimes().

We should probably add a regression test for this case,
but I'd prefer to commit the fix first and work on the test separately.
Andrey Rudometov proposed a test patch in thread [1],
which we might use as a starting point.

Regards,

[1]: /messages/by-id/CAF6JsWiO=i24qYitWe6ns1sXqcL86rYxdyU+pNYk-WueKPSySg@mail.gmail.com

--
Fujii Masao

Attachments:

v2-0001-Fix-background-worker-not-restarting-after-crash-.patch (application/octet-stream)

From f1faa31a43ac7365d7681c01fe39977469bfe107 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 25 Jul 2025 09:50:35 +0900
Subject: [PATCH v2] Fix background worker not restarting after
 crash-and-restart cycle.

Previously, if a background worker crashed (e.g., due to a SIGKILL) and
the server restarted due to restart_after_crash being enabled,
the worker was not restarted as expected. Background workers without
the never-restart flag should automatically restart in this case.

This issue was introduced in commit 28a520c0b77, which failed to reset
the rw_pid field in the RegisteredBgWorker struct for the crashed worker.

This commit fixes the problem by resetting rw_pid for all eligible
background workers during the crash-and-restart cycle.

Back-patched to v18, where the bug was introduced.

Bug fix patches were proposed by Andrey Rudometov and ChangAo Chen,
but this commit uses a different approach.

Reported-by: Andrey Rudometov <unlimitedhikari@gmail.com>
Reported-by: ChangAo Chen <cca5507@qq.com>
Author: Andrey Rudometov <unlimitedhikari@gmail.com>
Author: ChangAo Chen <cca5507@qq.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAF6JsWiO=i24qYitWe6ns1sXqcL86rYxdyU+pNYk-WueKPSySg@mail.gmail.com
Discussion: https://postgr.es/m/tencent_E00A056B3953EE6440F0F40F80EC30427D09@qq.com
Backpatch-through: 18
---
 src/backend/postmaster/bgworker.c   | 1 +
 src/backend/postmaster/postmaster.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 116ddf7b835..1ad65c237c3 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -613,6 +613,7 @@ ResetBackgroundWorkerCrashTimes(void)
 			 * resetting.
 			 */
 			rw->rw_crashed_at = 0;
+			rw->rw_pid = 0;
 
 			/*
 			 * If there was anyone waiting for it, they're history.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index cca9b946e53..e01d9f0cfe8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2630,6 +2630,13 @@ CleanupBackend(PMChild *bp,
 	}
 	bp = NULL;
 
+	/*
+	 * In a crash case, exit immediately without resetting background worker
+	 * state. However, if restart_after_crash is enabled, the background
+	 * worker state (e.g., rw_pid) still needs be reset so the worker can
+	 * restart after crash recovery. This reset is handled in
+	 * ResetBackgroundWorkerCrashTimes(), not here.
+	 */
 	if (crashed)
 	{
 		HandleChildCrash(bp_pid, exitstatus, procname);
-- 
2.50.1

#11

cca5507

cca5507@qq.com

6 months ago

In reply to: Fujii Masao (#10)

Re: Logical replication launcher did not automatically restart when got SIGKILL

Hi,

The v2-0001 LGTM!

--
Regards,
ChangAo Chen

#12

shveta malik

shveta.malik@gmail.com

6 months ago

In reply to: Fujii Masao (#10)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On Fri, Jul 25, 2025 at 7:17 AM Fujii Masao <masao.fujii@gmail.com> wrote:

> On Thu, Jul 24, 2025 at 6:46 PM shveta malik <shveta.malik@gmail.com> wrote:
>>
>> Sounds reasonable.
>> Thinking out loud, when cleaning up after a backend or background
>> worker crash, process_pm_child_exit() is invoked, which subsequently
>> calls both CleanupBackend() and HandleChildCrash(). After the cleanup
>> completes, process_pm_child_exit() calls PostmasterStateMachine() to
>> move to the next state. As part of that, PostmasterStateMachine()
>> invokes ResetBackgroundWorkerCrashTimes() (only in crash
>> scenarios/FatalError), to reset a few things. Since it also resets
>> rw_worker.bgw_notify_pid, it seems reasonable to reset the rw_pid as
>> well there.
>
> Thanks!
> Attached is a patch that fixes the issue by resetting rw_pid in
> ResetBackgroundWorkerCrashTimes().

The patch LGTM.

> We should probably add a regression test for this case,
> but I'd prefer to commit the fix first and work on the test separately.
> Andrey Rudometov proposed a test patch in thread [1],
> which we might use as a starting point.

Sounds good.

thanks
Shveta

#13

Fujii Masao

masao.fujii@gmail.com

6 months ago

In reply to: shveta malik (#12)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On Fri, Jul 25, 2025 at 5:25 PM shveta malik <shveta.malik@gmail.com> wrote:

> On Fri, Jul 25, 2025 at 7:17 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Thu, Jul 24, 2025 at 6:46 PM shveta malik <shveta.malik@gmail.com> wrote:
>>>
>>> Sounds reasonable.
>>> Thinking out loud, when cleaning up after a backend or background
>>> worker crash, process_pm_child_exit() is invoked, which subsequently
>>> calls both CleanupBackend() and HandleChildCrash(). After the cleanup
>>> completes, process_pm_child_exit() calls PostmasterStateMachine() to
>>> move to the next state. As part of that, PostmasterStateMachine()
>>> invokes ResetBackgroundWorkerCrashTimes() (only in crash
>>> scenarios/FatalError), to reset a few things. Since it also resets
>>> rw_worker.bgw_notify_pid, it seems reasonable to reset the rw_pid as
>>> well there.
>>
>> Thanks!
>> Attached is a patch that fixes the issue by resetting rw_pid in
>> ResetBackgroundWorkerCrashTimes().
>
> The patch LGTM.

Thanks to both ChangAo and Shveta for the review!
I've pushed the patch and back-patched it to v18.

>> We should probably add a regression test for this case,
>> but I'd prefer to commit the fix first and work on the test separately.
>> Andrey Rudometov proposed a test patch in thread [1],
>> which we might use as a starting point.
>
> Sounds good.

This proposed patch adds a new regression test file to verify background
worker restarts when restart_after_crash is enabled. However, since
we already have 013_crash_restart.pl for testing that scenario,
I’m thinking it might be sufficient to check that the logical replication
launcher, i.e., the background worker, restarts properly there.

For example, we could add a check like the following to 013_crash_restart.pl.
Thoughts?

-----------------
is($node->safe_psql('postgres',
"SELECT count(*) = 1 FROM pg_stat_activity WHERE backend_type =
'logical replication launcher'"),
't',
'logical replication launcher is running after crash');
-----------------

Regards,

--
Fujii Masao

#14

cca5507

cca5507@qq.com

6 months ago

In reply to: Fujii Masao (#13)

Re: Logical replication launcher did not automatically restart when got SIGKILL

> For example, we could add a check like the following to 013_crash_restart.pl.
> Thoughts?
> 
> -----------------
> is($node->safe_psql('postgres',
> "SELECT count(*) = 1 FROM pg_stat_activity WHERE backend_type =
> 'logical replication launcher'"),
> 't',
> 'logical replication launcher is running after crash');
> -----------------

Agreed, but we need to note that the logical replication launcher won't be registered in some cases (see ApplyLauncherRegister()).

--
Regards,
ChangAo Chen

#15

Fujii Masao

masao.fujii@gmail.com

6 months ago

In reply to: cca5507 (#14)

1 attachment(s)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On Fri, Jul 25, 2025 at 10:53 PM cca5507 <cca5507@qq.com> wrote:

>> For example, we could add a check like the following to 013_crash_restart.pl.
>> Thoughts?
>>
>> -----------------
>> is($node->safe_psql('postgres',
>> "SELECT count(*) = 1 FROM pg_stat_activity WHERE backend_type =
>> 'logical replication launcher'"),
>> 't',
>> 'logical replication launcher is running after crash');
>> -----------------

Patch attached.
I confirmed that the test fails when applied to a version before
commit b5d084c5353 (i.e., before the bug was fixed), and it passes
on HEAD with the patch applied.

> Agreed, but we need to note that the logical replication launcher won't be registered in some cases (see ApplyLauncherRegister()).

Since 013_crash_restart.pl doesn't set any parameters that would
prevent the logical replication launcher from starting, I think
we don't need to worry about that. No??

Regards,

--
Fujii Masao

Attachments:

v1-0001-Add-regression-test-for-background-worker-restart.patch (application/octet-stream)

From d3a07c4dd54bb4c1a5df9fc7412839c3112ba5e0 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Fri, 25 Jul 2025 23:46:01 +0900
Subject: [PATCH v1] Add regression test for background worker restart after
 crash.

Previously, if a background worker crashed and the server restarted
with restart_after_crash enabled, the worker was not restarted
as expected. This issue was fixed by commit b5d084c5353,
which ensures that background workers without the never-restart flag
are correctly restarted after a crash-and-restart cycle.

To guard against regressions, this commit adds a test that verifies
a background worker successfully restarts in such a scenario.
---
 src/test/recovery/t/013_crash_restart.pl | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/test/recovery/t/013_crash_restart.pl b/src/test/recovery/t/013_crash_restart.pl
index debfa635c36..fadbfb3c4f2 100644
--- a/src/test/recovery/t/013_crash_restart.pl
+++ b/src/test/recovery/t/013_crash_restart.pl
@@ -228,6 +228,13 @@ is( $node->safe_psql(
 	'before-orderly-restart',
 	'can still write after crash restart');

+# Confirm that the logical replication launcher, a background worker
+# without the never-restart flag, has also restarted successfully.
+is($node->safe_psql('postgres',
+	"SELECT count(*) = 1 FROM pg_stat_activity WHERE backend_type = 'logical replication launcher'"),
+	't',
+	'logical replication launcher restarted after crash');
+
 # Just to be sure, check that an orderly restart now still works
 $node->restart();

-- 
2.50.1

#16

cca5507

cca5507@qq.com

6 months ago

In reply to: Fujii Masao (#15)

Re: Logical replication launcher did not automatically restart when got SIGKILL

Hi,

The test case seems to have a problem:

We cannot ensure that the SELECT happens after the pg_stat_activity can show the logical replication launcher.

With the following patch the test will fail (without the patch it may happen very rarely):

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 742d9ba68e9..1e155587c55 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1162,6 +1162,7 @@ ApplyLauncherMain(Datum main_arg)
 	 * Establish connection to nailed catalogs (we only ever access
 	 * pg_subscription).
 	 */
+	sleep(100);
 	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
 
 	/*

--
Regards,
ChangAo Chen

#17

Fujii Masao

masao.fujii@gmail.com

6 months ago

In reply to: cca5507 (#16)

1 attachment(s)

Re: Logical replication launcher did not automatically restart when got SIGKILL

On Sat, Jul 26, 2025 at 6:27 PM cca5507 <cca5507@qq.com> wrote:

> Hi,
>
> The test case seems to have a problem:
>
> We cannot ensure that the SELECT happens after the pg_stat_activity can show the logical replication launcher.

Thanks for the review! You're right. We should use poll_query_until()
instead of safe_psql() to handle this properly.
I've attached an updated patch with that change.

Regards,

--
Fujii Masao

Attachments:

v2-0001-Add-regression-test-for-background-worker-restart.patch (application/octet-stream)

From 2b28f26a3aebfd8fabe9ebcf037668e818d0d749 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Sun, 27 Jul 2025 13:59:08 +0900
Subject: [PATCH v2] Add regression test for background worker restart after
 crash.

Previously, if a background worker crashed and the server restarted
with restart_after_crash enabled, the worker was not restarted
as expected. This issue was fixed by commit b5d084c5353,
which ensures that background workers without the never-restart flag
are correctly restarted after a crash-and-restart cycle.

To guard against regressions, this commit adds a test that verifies
a background worker successfully restarts in such a scenario.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Discussion: https://postgr.es/m/CAHGQGwHF-PdUOgiXCH_8K5qBm8b13h0Qt=dSoFXZybXQdbf-tw@mail.gmail.com
---
 src/test/recovery/t/013_crash_restart.pl | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/test/recovery/t/013_crash_restart.pl b/src/test/recovery/t/013_crash_restart.pl
index debfa635c36..4c5af018ee4 100644
--- a/src/test/recovery/t/013_crash_restart.pl
+++ b/src/test/recovery/t/013_crash_restart.pl
@@ -228,6 +228,13 @@ is( $node->safe_psql(
 	'before-orderly-restart',
 	'can still write after crash restart');

+# Confirm that the logical replication launcher, a background worker
+# without the never-restart flag, has also restarted successfully.
+is($node->poll_query_until('postgres',
+	"SELECT count(*) = 1 FROM pg_stat_activity WHERE backend_type = 'logical replication launcher'"),
+	'1',
+	'logical replication launcher restarted after crash');
+
 # Just to be sure, check that an orderly restart now still works
 $node->restart();

-- 
2.50.1
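
The race that motivated switching from a one-shot query to polling can be sketched in C as a generic poll-until loop. This is an illustration only, not how poll_query_until() is implemented: poll_until(), launcher_visible(), and the "visible from the third check onward" behavior are all hypothetical, chosen to mimic a launcher that needs some time after the restart before it shows up in pg_stat_activity.

```c
#include <assert.h>
#include <stdbool.h>

/* A condition to evaluate repeatedly; state is caller-defined. */
typedef bool (*condition_fn) (void *state);

/*
 * Re-check cond up to max_attempts times and report whether it ever
 * became true. A real caller would sleep between attempts.
 */
static bool
poll_until(condition_fn cond, void *state, int max_attempts)
{
	assert(max_attempts > 0);

	for (int attempt = 0; attempt < max_attempts; attempt++)
	{
		if (cond(state))
			return true;
	}
	return false;
}

/*
 * Hypothetical condition: counts how often it has been checked and
 * only reports success from the third check onward.
 */
static bool
launcher_visible(void *state)
{
	int		   *checks = state;

	(*checks)++;
	return *checks >= 3;
}
```

A single call to launcher_visible() on a fresh counter returns false, which corresponds to the one-shot check failing spuriously. Calling poll_until(launcher_visible, &checks, 10) on a fresh counter succeeds on the third check, while a budget of only two attempts still fails, so the retry limit has to be generous enough for a slow start.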

#18

cca5507

cca5507@qq.com

6 months ago

In reply to: Fujii Masao (#17)

Re: Logical replication launcher did not automatically restart when got SIGKILL

> I've attached an updated patch with that change.

LGTM!

--
Regards,
ChangAo Chen

#19

Fujii Masao

masao.fujii@gmail.com