Add progressive backoff to XactLockTableWait functions

Started by Xuneng Zhou, 11 months ago, 32 messages, hackers

xunengzhou@gmail.com

11 months ago

Hi hackers,

This patch implements progressive backoff in XactLockTableWait() and
ConditionalXactLockTableWait().

As Kevin reported in this thread [1], XactLockTableWait() can enter a
tight polling loop during logical replication slot creation on standby
servers, sleeping for fixed 1ms intervals that can continue for a long
time. This creates significant CPU overhead.

The patch implements a time-based threshold approach based on Fujii’s
idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
seconds, then start exponential backoff (doubling the sleep duration
each cycle) up to a maximum of 10 seconds per sleep. This balances
responsiveness for normal operations (which typically complete within
seconds) against CPU efficiency for the long waits in some logical
replication scenarios.
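
The schedule described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in the patch: BackoffState, backoff_init(), backoff_next() and the three constants are hypothetical names chosen here, and only the numbers (1ms initial sleep, 10s threshold, 10s cap, doubling) come from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants taken from the numbers in the description. */
#define INITIAL_SLEEP_US      1000L      /* 1ms per sleep before the threshold */
#define BACKOFF_THRESHOLD_US  10000000L  /* 10s of accumulated sleep time */
#define MAX_SLEEP_US          10000000L  /* 10s cap on a single sleep */

/* Hypothetical state carried across iterations of the wait loop. */
typedef struct BackoffState
{
    long        total_slept_us;  /* sum of all sleeps requested so far */
    long        next_sleep_us;   /* duration of the next sleep */
} BackoffState;

static void
backoff_init(BackoffState *s)
{
    assert(s != NULL);
    s->total_slept_us = 0;
    s->next_sleep_us = INITIAL_SLEEP_US;
}

/*
 * Return how long to sleep now (in microseconds) and advance the state.
 * Until the accumulated sleep reaches the threshold, every sleep is 1ms.
 * After that, each sleep is twice the previous one, capped at MAX_SLEEP_US.
 */
static long
backoff_next(BackoffState *s)
{
    long        cur;

    assert(s != NULL);
    cur = s->next_sleep_us;
    s->total_slept_us += cur;

    if (s->total_slept_us >= BACKOFF_THRESHOLD_US)
    {
        s->next_sleep_us = cur * 2;
        if (s->next_sleep_us > MAX_SLEEP_US)
            s->next_sleep_us = MAX_SLEEP_US;
    }
    return cur;
}
```

In the wait loop, the fixed pg_usleep(1000L) would then become something like pg_usleep(backoff_next(&state)). With these numbers the first 10,000 sleeps are 1ms each, followed by 2ms, 4ms, and so on until the 10s cap is reached.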

[1]: /messages/by-id/CAM45KeELdjhS-rGuvN=ZLJ_asvZACucZ9LZWVzH7bGcD12DDwg@mail.gmail.com

Best regards,
Xuneng

Fujii Masao

masao.fujii@gmail.com

11 months ago

In reply to: Xuneng Zhou (#1)

Re: Add progressive backoff to XactLockTableWait functions

On 2025/06/08 23:33, Xuneng Zhou wrote:

Hi hackers,

This patch implements progressive backoff in XactLockTableWait() and
ConditionalXactLockTableWait().

As Kevin reported in this thread [1], XactLockTableWait() can enter a
tight polling loop during logical replication slot creation on standby
servers, sleeping for fixed 1ms intervals that can continue for a long
time. This creates significant CPU overhead.

The patch implements a time-based threshold approach based on Fujii’s
idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
seconds, then start exponential backoff (doubling the sleep duration
each cycle) up to a maximum of 10 seconds per sleep. This balances
responsiveness for normal operations (which typically complete within
seconds) against CPU efficiency for the long waits in some logical
replication scenarios.

Thanks for the patch!

When I first suggested this idea, I used 10s as an example for
the maximum sleep time. But thinking more about it now, 10s might
be too long. Even if the target transaction has already finished,
XactLockTableWait() could still wait up to 10 seconds,
which seems excessive.

What about using 1s instead? That value is already used as a max
sleep time in other places, like WaitExceedsMaxStandbyDelay().

If we agree on 1s as the max, then using exponential backoff from
1ms to 1s after the threshold might not be necessary. It might
be simpler and sufficient to just sleep for 1s once we hit
the threshold.

Based on that, I think a change like the following could work well.
Thoughts?

----------------------------------------
         XactLockTableWaitInfo info;
         ErrorContextCallback callback;
         bool            first = true;
+       int             left_till_hibernate = 5000;

<snip>

                 if (!first)
                 {
                         CHECK_FOR_INTERRUPTS();
-                       pg_usleep(1000L);
+
+                       if (left_till_hibernate > 0)
+                       {
+                               pg_usleep(1000L);
+                               left_till_hibernate--;
+                       }
+                       else
+                               pg_usleep(1000000L);
----------------------------------------
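
To make the timing of the change above concrete, the sleep selection can be pulled out into a standalone helper. This is only a sketch: next_sleep_us() and the three macro names are hypothetical and do not appear in the diff; the values 5000, 1000L and 1000000L are the ones used there.

```c
#include <assert.h>
#include <stddef.h>

/* Values from the diff above; the macro names are hypothetical. */
#define SHORT_SLEEP_US                 1000L     /* 1ms */
#define LONG_SLEEP_US                  1000000L  /* 1s */
#define SHORT_SLEEPS_BEFORE_HIBERNATE  5000

/*
 * Return the next sleep duration in microseconds. While the counter is
 * positive, use the short sleep and decrement it; once it reaches zero,
 * always use the long sleep.
 */
static long
next_sleep_us(int *left_till_hibernate)
{
    assert(left_till_hibernate != NULL);

    if (*left_till_hibernate > 0)
    {
        (*left_till_hibernate)--;
        return SHORT_SLEEP_US;
    }
    return LONG_SLEEP_US;
}
```

With these values the loop requests 5000 sleeps of 1ms, that is 5 seconds of nominal sleep time, before switching to 1s sleeps. From then on, a waiter can notice that the target transaction has finished up to about 1 second late.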

Regards,

--
Fujii Masao
NTT DATA Japan Corporation
