[PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
In ResolveRecoveryConflictWithBufferPin(), when deadlock_timeout fires,
the function sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and returns.
The caller (LockBufferForCleanup) loops back, sets up another
deadlock_timeout,
and the signal gets sent again every interval.
The lock-conflict path had the same problem and was fixed in 8900b5a9d59a
by adding a second ProcWaitForSignal() after the deadlock-check signal.
The buffer-pin path was left with an XXX comment asking "should we fix
this?".
The attached patch applies the same fix: after sending the deadlock-check
signal, reset got_standby_deadlock_timeout and call ProcWaitForSignal()
so the startup process waits for UnpinBuffer() rather than looping
and re-signaling.
Patch attached.
Attachments:
0001-Prevent-repeated-deadlock-check-signals-in-standby-b.patchapplication/octet-stream; name=0001-Prevent-repeated-deadlock-check-signals-in-standby-b.patchDownload+11-10
On Sun, Apr 19, 2026 at 2:47 PM JoongHyuk Shin <sjh910805@gmail.com> wrote:
In ResolveRecoveryConflictWithBufferPin(), when deadlock_timeout fires,
the function sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and returns.
The caller (LockBufferForCleanup) loops back, sets up another deadlock_timeout,
and the signal gets sent again every interval.The lock-conflict path had the same problem and was fixed in 8900b5a9d59a
by adding a second ProcWaitForSignal() after the deadlock-check signal.
The buffer-pin path was left with an XXX comment asking "should we fix this?".The attached patch applies the same fix: after sending the deadlock-check
signal, reset got_standby_deadlock_timeout and call ProcWaitForSignal()
so the startup process waits for UnpinBuffer() rather than looping
and re-signaling.Patch attached.
Thanks for the patch! LGTM.
Since this change improves recovery-conflict behavior rather than fixing a bug,
it doesn't seem to need backpatching and we may need to wait until v20
development opens (probably July) before committing it.
While reading the patch and ResolveRecoveryConflictWithBufferPin(), I also
noticed that got_standby_delay_timeout is not initialized to false before
enabling the timeout. This is unrelated to the patch, and I think it is
harmless in the current code, but would it be better to initialize it there,
as we already do for got_standby_deadlock_timeout?
if (ltime != 0)
{
+ got_standby_delay_timeout = false;
timeouts[cnt].id = STANDBY_TIMEOUT;
timeouts[cnt].type = TMPARAM_AT;
timeouts[cnt].fin_time = ltime;
Regards,
--
Fujii Masao
On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
Since this change improves recovery-conflict behavior rather than fixing a bug,
it doesn't seem to need backpatching and we may need to wait until v20
development opens (probably July) before committing it.
Yeah, this one is an improvement, not an actual bug, so let's wait for
v20 if worth doing (I did not check it).
--
Michael
Thanks for the review.
v2 attached, with the suggested initialization added for symmetry.
Agreed this is an improvement rather than a bug fix,
so I've updated the CF tag to Performance accordingly.
I also verified the fix locally on a primary-standby setup,
using the buffer-pin conflict scenario from src/test/recovery/t/
031_recovery_conflict.pl
(aborted INSERT + cursor on standby + VACUUM FREEZE on primary).
On master, strace showed 9 SIGUSR1 broadcasts to the conflicting backend
over a 10-second window (one per deadlock_timeout).
With the patch applied, only 1 broadcast over the same window.
Patch attached.
--
JoongHyuk Shin
On Tue, Apr 21, 2026 at 2:55 PM Michael Paquier <michael@paquier.xyz> wrote:
Show quoted text
On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
Since this change improves recovery-conflict behavior rather than fixing
a bug,
it doesn't seem to need backpatching and we may need to wait until v20
development opens (probably July) before committing it.Yeah, this one is an improvement, not an actual bug, so let's wait for
v20 if worth doing (I did not check it).
--
Michael