Add progressive backoff to XactLockTableWait functions
Hi hackers,
This patch implements progressive backoff in XactLockTableWait() and
ConditionalXactLockTableWait().
As Kevin reported in this thread [1]/messages/by-id/CAM45KeELdjhS-rGuvN=ZLJ_asvZACucZ9LZWVzH7bGcD12DDwg@mail.gmail.com, XactLockTableWait() can enter a
tight polling loop during logical replication slot creation on standby
servers, sleeping for fixed 1ms intervals that can continue for a long
time. This creates significant CPU overhead.
The patch implements a time-based threshold approach based on Fujii’s
idea [1]/messages/by-id/CAM45KeELdjhS-rGuvN=ZLJ_asvZACucZ9LZWVzH7bGcD12DDwg@mail.gmail.com: keep sleeping for 1ms until the total sleep time reaches 10
seconds, then start exponential backoff (doubling the sleep duration
each cycle) up to a maximum of 10 seconds per sleep. This balances
responsiveness for normal operations (which typically complete within
seconds) against CPU efficiency for the long waits in some logical
replication scenarios.
[1]: /messages/by-id/CAM45KeELdjhS-rGuvN=ZLJ_asvZACucZ9LZWVzH7bGcD12DDwg@mail.gmail.com
Best regards,
Xuneng
Attachments:
0001-Add-progressive-backoff-to-XactLockTableWait.patchapplication/octet-stream; name=0001-Add-progressive-backoff-to-XactLockTableWait.patchDownload+44-5
On 2025/06/08 23:33, Xuneng Zhou wrote:
Hi hackers,
This patch implements progressive backoff in XactLockTableWait() and
ConditionalXactLockTableWait().As Kevin reported in this thread [1], XactLockTableWait() can enter a
tight polling loop during logical replication slot creation on standby
servers, sleeping for fixed 1ms intervals that can continue for a long
time. This creates significant CPU overhead.The patch implements a time-based threshold approach based on Fujii’s
idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
seconds, then start exponential backoff (doubling the sleep duration
each cycle) up to a maximum of 10 seconds per sleep. This balances
responsiveness for normal operations (which typically complete within
seconds) against CPU efficiency for the long waits in some logical
replication scenarios.
Thanks for the patch!
When I first suggested this idea, I used 10s as an example for
the maximum sleep time. But thinking more about it now, 10s might
be too long. Even if the target transaction has already finished,
XactLockTableWait() could still wait up to 10 seconds,
which seems excessive.
What about using 1s instead? That value is already used as a max
sleep time in other places, like WaitExceedsMaxStandbyDelay().
If we agree on 1s as the max, then using exponential backoff from
1ms to 1s after the threshold might not be necessary. It might
be simpler and sufficient to just sleep for 1s once we hit
the threshold.
Based on that, I think a change like the following could work well.
Thought?
----------------------------------------
XactLockTableWaitInfo info;
ErrorContextCallback callback;
bool first = true;
+ int left_till_hibernate = 5000;
<snip>
if (!first)
{
CHECK_FOR_INTERRUPTS();
- pg_usleep(1000L);
+
+ if (left_till_hibernate > 0)
+ {
+ pg_usleep(1000L);
+ left_till_hibernate--;
+ }
+ else
+ pg_usleep(1000000L);
----------------------------------------
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
Hi,
Thanks for the feedback!
On Thu, Jun 12, 2025 at 10:02 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
When I first suggested this idea, I used 10s as an example for
the maximum sleep time. But thinking more about it now, 10s might
be too long. Even if the target transaction has already finished,
XactLockTableWait() could still wait up to 10 seconds,
which seems excessive.
+1, this could be a problem
What about using 1s instead? That value is already used as a max
sleep time in other places, like WaitExceedsMaxStandbyDelay().
1s should be generally good
If we agree on 1s as the max, then using exponential backoff from
1ms to 1s after the threshold might not be necessary. It might
be simpler and sufficient to just sleep for 1s once we hit
the threshold.
That makes sense to me.
Based on that, I think a change like the following could work well.
Thought?
I'll update the patch accordingly.
Best regards,
Xuneng
Hi,
Attached is v2 of the patch to add threshold-based sleep to
XactLockTableWait functions.
Changes from v1:
- Simplified approach based on Fujii's feedback [1]/messages/by-id/7c72c5d1-4d2f-46f7-8b68-dd96905f8c42@oss.nttdata.com: instead of exponential
backoff,
we now sleep 1ms for the first 5 seconds, then switch directly to 1s
sleeps
- Reduced the threshold from 10 seconds to 5 seconds to avoid excessive
delays
[1]: /messages/by-id/7c72c5d1-4d2f-46f7-8b68-dd96905f8c42@oss.nttdata.com
/messages/by-id/7c72c5d1-4d2f-46f7-8b68-dd96905f8c42@oss.nttdata.com
Best regards,
Xuneng
Attachments:
v2-0001-Add-threshold-based-sleep-to-XactLockTableWait-functions.patchapplication/octet-stream; name=v2-0001-Add-threshold-based-sleep-to-XactLockTableWait-functions.patchDownload+22-5
Hi,
Although it’s clear that replacing tight 1 ms polling loops will reduce CPU
usage, I'm curious about the hard numbers. To that end, I ran a 60 s
logical-replication slot–creation workload on a standby using three
different XactLockTableWait() variants—on an 8-core, 16 GB AMD system—and
collected both profiling traces and hardware-counter metrics.
1. Hardware‐counter results
[image: image.png]
- CPU cycles drop by 58% moving from 1 ms to exp. backoff, and another
25% to the 1 s threshold variant.
- Cache‐misses and context‐switches see similarly large reductions.
- IPC remains around 0.45, dipping slightly under longer sleeps.
2. Flame‐graph
See attached files
Best regards,
Xuneng
Hi,
On Sun, Jun 15, 2025 at 4:01 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
Hi,
Attached is v2 of the patch to add threshold-based sleep to
XactLockTableWait functions.Changes from v1:
- Simplified approach based on Fujii's feedback [1]: instead of
exponential backoff,
we now sleep 1ms for the first 5 seconds, then switch directly to 1s
sleeps
- Reduced the threshold from 10 seconds to 5 seconds to avoid excessive
delays
When applying the v2 patch for benchmarking, warnings about trailing
whitespaces are emitted. I’ve removed them—please find the updated v3 patch
attached.
Best regards,
Xuneng
Attachments:
v3-0001-Add-threshold-based-sleep-to-XactLockTableWait-functions.patchapplication/octet-stream; name=v3-0001-Add-threshold-based-sleep-to-XactLockTableWait-functions.patchDownload+22-5
Hi,
On Tue, Jun 17, 2025 at 9:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
Hi,
Although it’s clear that replacing tight 1 ms polling loops will reduce
CPU usage, I'm curious about the hard numbers. To that end, I ran a 60 s
logical-replication slot–creation workload on a standby using three
different XactLockTableWait() variants—on an 8-core, 16 GB AMD system—and
collected both profiling traces and hardware-counter metrics.1. Hardware‐counter results
[image: image.png]
- CPU cycles drop by 58% moving from 1 ms to exp. backoff, and another
25% to the 1 s threshold variant.
- Cache‐misses and context‐switches see similarly large reductions.
- IPC remains around 0.45, dipping slightly under longer sleeps.
Gmail does not seem to support embedded images, so I’ve included it as an
attachment.
Best regards,
Xuneng
Hi,
On 2025-06-08 22:33:39 +0800, Xuneng Zhou wrote:
This patch implements progressive backoff in XactLockTableWait() and
ConditionalXactLockTableWait().As Kevin reported in this thread [1], XactLockTableWait() can enter a
tight polling loop during logical replication slot creation on standby
servers, sleeping for fixed 1ms intervals that can continue for a long
time. This creates significant CPU overhead.The patch implements a time-based threshold approach based on Fujii’s
idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
seconds, then start exponential backoff (doubling the sleep duration
each cycle) up to a maximum of 10 seconds per sleep. This balances
responsiveness for normal operations (which typically complete within
seconds) against CPU efficiency for the long waits in some logical
replication scenarios.
ISTM that this is going to wrong way - the real problem is that we seem to
have extended periods where XactLockTableWait() doesn't actually work, not
that the sleep time is too short. The sleep in XactLockTableWait() was
intended to address a very short race, not something that's essentially
unbound.
Greetings,
Andres Freund
Hi Andres,
Thanks for looking into this!
On Tue, Jun 17, 2025 at 10:17 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2025-06-08 22:33:39 +0800, Xuneng Zhou wrote:
This patch implements progressive backoff in XactLockTableWait() and
ConditionalXactLockTableWait().As Kevin reported in this thread [1], XactLockTableWait() can enter a
tight polling loop during logical replication slot creation on standby
servers, sleeping for fixed 1ms intervals that can continue for a long
time. This creates significant CPU overhead.The patch implements a time-based threshold approach based on Fujii’s
idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
seconds, then start exponential backoff (doubling the sleep duration
each cycle) up to a maximum of 10 seconds per sleep. This balances
responsiveness for normal operations (which typically complete within
seconds) against CPU efficiency for the long waits in some logical
replication scenarios.ISTM that this is going to wrong way - the real problem is that we seem to
have extended periods where XactLockTableWait() doesn't actually work, not
that the sleep time is too short.
Yeah, XactLockTableWait() works well when the targeted XID lock exists and
can be waited for. However, this assumption breaks down on hot standby
because transactions from the primary aren't running locally—no exclusive
locks are registered in the standby's lock table, leading to potentially
unbounded polling instead of proper blocking waits.
The purpose of XactLockTableWait() is to wait for a specified transaction
to commit or abort, which remains semantically correct for
SnapBuildWaitSnapshot() during logical decoding. How about adding special
handling for the hot standby case within XactLockTableWait() by detecting
RecoveryInProgres(). Although after studying the various calls of
XactLockTableWait(), I'm uncertain whether this condition is proper and
sufficient to avoid affecting other use cases.
The sleep in XactLockTableWait() was
intended to address a very short race, not something that's essentially
unbound.
The sleep in function is used to address the possible race between txn in
ProcArray and locktable that can occur in
building snapshots for logical decoding. But in the hot standby case, the
race does not exist and it becomes a polling.
Best regards,
Xuneng
Hi,
Here's patch version 4.
1. The problem
I conducted further investigation on this issue. The 1ms sleep in
XactLockTableWait that falls back to polling was not problematic in certain
scenarios prior to v16. It became an potential issue after the "Allow
logical decoding on standbys" feature was introduced [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0fdab27ad68a059a1663fa5ce48d76333f1bd74c. After that
commit, we can now create logical replication slots on hot standby servers
and perform many other logical decoding operations. [2]https://bdrouvot.github.io/2023/04/19/postgres-16-highlight-logical-decoding-on-standby/
2. The analysis
Is this issue limited to calling sites related to logical decoding? I
believe so. As we've discussed, the core issue is that primary transactions
are not running locally on the standby, which breaks the assumption of
XactLockTableWait—that there is an xid lock to wait on. Other call stacks
of this function, except those from logical decoding, are unlikely to
encounter this problem because they are unlikely to be invoked from standby
servers.
Analysis of XactLockTableWait calling sites:
Problematic Case (Hot Standby):
- Logical decoding operations in snapbuild.c during snapshot building
- This is the case that can run on hot standby and experience the issue
Non-problematic Cases (Primary Only):
- Heap operations (heapam.c): DML operations not possible on read-only
standby
- B-tree operations (nbtinsert.c): Index modifications impossible on
standby
- Logical apply workers (execReplication.c): Cannot start during recovery
(BgWorkerStart_RecoveryFinished)
- All other cases: Require write operations unavailable on standby
3. The proposed solution
If the above analysis is sound, one potential fix would be to add separate
branching for standby in XactLockTableWait. However, this seems
inconsistent with the function's definition—there's simply no lock entry in
the lock table for waiting. We could implement a new function for this
logic, but it's hard to find a proper place for it to fit well: lmgr.c is
for locks, while standby.c or procarray.c are not that ideal either.
Therefore, I placed the special handling for standby in
SnapBuildWaitSnapshot.
The waiting strategy for this version uses threshold-based sleep. A more
ideal approach might be a continuous sleep that's only woken up when the
transaction completes, but I didn't find the proper infrastructure to
support it.
The CHECK_FOR_INTERRUPTS() calls in the sleep sections have been removed.
With separate handling for primary and standby, unbounded waiting seems
unlikely to occur here.
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0fdab27ad68a059a1663fa5ce48d76333f1bd74c
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0fdab27ad68a059a1663fa5ce48d76333f1bd74c
[2]: https://bdrouvot.github.io/2023/04/19/postgres-16-highlight-logical-decoding-on-standby/
https://bdrouvot.github.io/2023/04/19/postgres-16-highlight-logical-decoding-on-standby/
Best regards,
Xuneng
Show quoted text
Attachments:
v4-0001-Add-threshold-based-sleep-to-SnapBuildWaitSnapshot.patchapplication/octet-stream; name=v4-0001-Add-threshold-based-sleep-to-SnapBuildWaitSnapshot.patchDownload+39-4
On 2025/06/24 1:32, Xuneng Zhou wrote:
Hi,
Here's patch version 4.
1. The problem
I conducted further investigation on this issue. The 1ms sleep in XactLockTableWait that falls back to polling was not problematic in certain scenarios prior to v16. It became an potential issue after the "Allow logical decoding on standbys" feature was introduced [1]. After that commit, we can now create logical replication slots on hot standby servers and perform many other logical decoding operations. [2]
2. The analysis
Is this issue limited to calling sites related to logical decoding? I believe so. As we've discussed, the core issue is that primary transactions are not running locally on the standby, which breaks the assumption of XactLockTableWait—that there is an xid lock to wait on. Other call stacks of this function, except those from logical decoding, are unlikely to encounter this problem because they are unlikely to be invoked from standby servers.
Analysis of XactLockTableWait calling sites:
Problematic Case (Hot Standby):
- Logical decoding operations in snapbuild.c during snapshot building
- This is thecase that can run on hot standby and experience the issue
Non-problematic Cases (Primary Only):
- Heap operations (heapam.c): DML operations not possible on read-only standby
- B-tree operations (nbtinsert.c): Index modifications impossible on standby
- Logical apply workers (execReplication.c): Cannot start during recovery (BgWorkerStart_RecoveryFinished)
- All other cases: Require write operations unavailable on standby
3. The proposed solution
If the above analysis is sound, one potential fix would be to add separate branching for standby in XactLockTableWait. However, this seems inconsistent with the function's definition—there's simply no lock entry in the lock table for waiting. We could implement a new function for this logic,
To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.
But if these functions aren't intended to be used during recovery and
the loop shouldn't reach that many iterations, I'm also okay with
the v4 approach of introducing a separate function specifically for recovery.
But this amakes me wonder if we should add something like
Assert(!RecoveryInProgress()) to those two functions, to prevent them
from being used during recovery in the future.
but it's hard to find a proper place for it to fit well: lmgr.c is for locks, while standby.c or procarray.c are not that ideal either. Therefore, I placed the special handling for standby in SnapBuildWaitSnapshot.
Since the purpose and logic of the new function are similar to
XactLockTableWait(), I think it would be better to place it nearby
in lmgr.c, even though it doesn't handle a lock directly. That would
help keep the related logic together and improve readability.
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
Hi,
On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
On 2025/06/24 1:32, Xuneng Zhou wrote:
3. The proposed solution
If the above analysis is sound, one potential fix would be to add
separate branching for standby in XactLockTableWait. However, this seems
inconsistent with the function's definition—there's simply no lock entry
in the lock table for waiting. We could implement a new function for
this logic,To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.
I think this is completely the wrong direction. We should make
XactLockTableWait() on standbys, not make the polling smarter.
I think neither v3 nor v4 are viable patches.
Greetings,
Andres Freund
On 2025/07/02 23:04, Andres Freund wrote:
Hi,
On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
On 2025/06/24 1:32, Xuneng Zhou wrote:
3. The proposed solution
If the above analysis is sound, one potential fix would be to add
separate branching for standby in XactLockTableWait. However, this seems
inconsistent with the function's definition—there's simply no lock entry
in the lock table for waiting. We could implement a new function for
this logic,To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.I think this is completely the wrong direction. We should make
XactLockTableWait() on standbys, not make the polling smarter.
On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
But are you suggesting that this doesn't need to be addressed?
Or do you have another idea for how to handle it?
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
Hi,
On July 2, 2025 10:15:09 AM EDT, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2025/07/02 23:04, Andres Freund wrote:
Hi,
On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
On 2025/06/24 1:32, Xuneng Zhou wrote:
3. The proposed solution
If the above analysis is sound, one potential fix would be to add
separate branching for standby in XactLockTableWait. However, this seems
inconsistent with the function's definition—there's simply no lock entry
in the lock table for waiting. We could implement a new function for
this logic,To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.I think this is completely the wrong direction. We should make
XactLockTableWait() on standbys, not make the polling smarter.On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
Right.
But are you suggesting that this doesn't need to be addressed?
No.
Or do you have another idea for how to handle it?
We have all the information to make it work properly on standby. I've not find through the code to figure out not, but that's what needs to happen, instead on putting on another layer of hacks.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 2025/07/02 23:19, Andres Freund wrote:
Hi,
On July 2, 2025 10:15:09 AM EDT, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2025/07/02 23:04, Andres Freund wrote:
Hi,
On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
On 2025/06/24 1:32, Xuneng Zhou wrote:
3. The proposed solution
If the above analysis is sound, one potential fix would be to add
separate branching for standby in XactLockTableWait. However, this seems
inconsistent with the function's definition—there's simply no lock entry
in the lock table for waiting. We could implement a new function for
this logic,To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.I think this is completely the wrong direction. We should make
XactLockTableWait() on standbys, not make the polling smarter.On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
Right.
But are you suggesting that this doesn't need to be addressed?
No.
Or do you have another idea for how to handle it?
We have all the information to make it work properly on standby. I've not find through the code to figure out not, but that's what needs to happen, instead on putting on another layer of hacks.
Sorry, maybe I failed to get your point...
Could you explain your idea or reasoning in a bit more detail?
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
Hi,
On Wed, Jul 2, 2025 at 10:56 PM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:
On 2025/07/02 23:19, Andres Freund wrote:
Hi,
On July 2, 2025 10:15:09 AM EDT, Fujii Masao <
masao.fujii@oss.nttdata.com> wrote:
On 2025/07/02 23:04, Andres Freund wrote:
Hi,
On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
On 2025/06/24 1:32, Xuneng Zhou wrote:
3. The proposed solution
If the above analysis is sound, one potential fix would be to add
separate branching for standby in XactLockTableWait. However, thisseems
inconsistent with the function's definition—there's simply no lock
entry
in the lock table for waiting. We could implement a new function for
this logic,To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.I think this is completely the wrong direction. We should make
XactLockTableWait() on standbys, not make the polling smarter.On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
Right.
But are you suggesting that this doesn't need to be addressed?
No.
Or do you have another idea for how to handle it?
We have all the information to make it work properly on standby. I've
not find through the code to figure out not, but that's what needs to
happen, instead on putting on another layer of hacks.Sorry, maybe I failed to get your point...
Could you explain your idea or reasoning in a bit more detail?
My take is that XactLockTableWait() isn’t really designed to work on
standby. Instead of adding another layer on top like v4 did, maybe we can
tweak the function itself to support standby. One possible approach could
be to add a check like RecoveryInProgress() to handle the logic when
running on a standby.
Best regards,
Xuneng
Thanks for the feedbacks!
To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.
But if these functions aren't intended to be used during recovery and
the loop shouldn't reach that many iterations, I'm also okay with
the v4 approach of introducing a separate function specifically for
recovery.But this amakes me wonder if we should add something like
Assert(!RecoveryInProgress()) to those two functions, to prevent them
from being used during recovery in the future.but it's hard to find a proper place for it to fit well: lmgr.c is for
locks, while standby.c or procarray.c are not that ideal either. Therefore,
I placed the special handling for standby in SnapBuildWaitSnapshot.Since the purpose and logic of the new function are similar to
XactLockTableWait(), I think it would be better to place it nearby
in lmgr.c, even though it doesn't handle a lock directly. That would
help keep the related logic together and improve readability.
Now I see two possible approaches here. One is to extend
XactLockTableWait(), and the other is to introduce a new wait function
specifically for standby. For the first option, adding standby-specific
logic might not align well with the function’s name or its original design.
If we go with a new function, we might need to consider what other
scenarios it could be reused in. If this is the only place it would apply,
whether it’s worth introducing a separate function just for this case.
Best regards,
Xuneng
Show quoted text
Hi,
On Thu, Jul 3, 2025 at 9:30 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
Hi,
On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
On 2025/06/24 1:32, Xuneng Zhou wrote:
3. The proposed solution
If the above analysis is sound, one potential fix would be to add
separate branching for standby in XactLockTableWait. However, thisseems
inconsistent with the function's definition—there's simply no lock
entry
in the lock table for waiting. We could implement a new function for
this logic,To be honest, I'm fine with v3, since it only increases the sleep
time
after 5000 loop iterations, which has negligible performance impact.
I think this is completely the wrong direction. We should make
XactLockTableWait() on standbys, not make the polling smarter.On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
Right.
But are you suggesting that this doesn't need to be addressed?
No.
Or do you have another idea for how to handle it?
We have all the information to make it work properly on standby. I've
not find through the code to figure out not, but that's what needs to
happen, instead on putting on another layer of hacks.Sorry, maybe I failed to get your point...
Could you explain your idea or reasoning in a bit more detail?My take is that XactLockTableWait() isn’t really designed to work on
standby. Instead of adding another layer on top like v4 did, maybe we can
tweak the function itself to support standby. One possible approach could
be to add a check like RecoveryInProgress() to handle the logic when
running on a standby.
After thinking about this further, Andres's suggestion might be replacing
polling(whether smart or not) with event-driven like waiting in
XactLockTableWait. To achieve this, implementing a new notification
mechanism for standby servers seems to be required. From what I can
observe, the codebase appears to lack IPC infrastructure for waiting on
remote transaction completion and receiving notifications when those
transactions finish. I'm not familiar with this area, so additional inputs
would be very helpful here.
On 2025/07/04 17:57, Xuneng Zhou wrote:
Hi,
On Thu, Jul 3, 2025 at 9:30 AM Xuneng Zhou <xunengzhou@gmail.com <mailto:xunengzhou@gmail.com>> wrote:
Hi,
On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
On 2025/06/24 1:32, Xuneng Zhou wrote:
3. The proposed solution
If the above analysis is sound, one potential fix would be to add
separate branching for standby in XactLockTableWait. However, this seems
inconsistent with the function's definition—there's simply no lock entry
in the lock table for waiting. We could implement a new function for
this logic,To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.I think this is completely the wrong direction. We should make
XactLockTableWait() on standbys, not make the polling smarter.On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
Right.
But are you suggesting that this doesn't need to be addressed?
No.
Or do you have another idea for how to handle it?
We have all the information to make it work properly on standby. I've not find through the code to figure out not, but that's what needs to happen, instead on putting on another layer of hacks.
Sorry, maybe I failed to get your point...
Could you explain your idea or reasoning in a bit more detail?My take is that XactLockTableWait() isn’t really designed to work on standby. Instead of adding another layer on top like v4 did, maybe we can tweak the function itself to support standby. One possible approach could be to add a check like RecoveryInProgress() to handle the logic when running on a standby.
After thinking about this further, Andres's suggestion might be replacing polling(whether smart or not) with event-driven like waiting in XactLockTableWait. To achieve this, implementing a new notification mechanism for standby servers seems to be required. From what I can observe, the codebase appears to lack IPC infrastructure for waiting on remote transaction completion and receiving notifications when those transactions finish. I'm not familiar with this area, so additional inputs would be very helpful here.
Your guess might be right, or maybe not. It's hard for me to say for sure.
It seems better to wait for Andres to explain his idea in more detail,
rather than trying to guess...
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
On 2025-07-05 01:14:45 +0900, Fujii Masao wrote:
On 2025/07/04 17:57, Xuneng Zhou wrote:
Hi,
On Thu, Jul 3, 2025 at 9:30 AM Xuneng Zhou <xunengzhou@gmail.com <mailto:xunengzhou@gmail.com>> wrote:
Hi,
On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
On 2025/06/24 1:32, Xuneng Zhou wrote:
3. The proposed solution
If the above analysis is sound, one potential fix would be to add
separate branching for standby in XactLockTableWait. However, this seems
inconsistent with the function's definition—there's simply no lock entry
in the lock table for waiting. We could implement a new function for
this logic,To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.I think this is completely the wrong direction. We should make
XactLockTableWait() on standbys, not make the polling smarter.On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
Right.
But are you suggesting that this doesn't need to be addressed?
No.
Or do you have another idea for how to handle it?
We have all the information to make it work properly on standby. I've not find through the code to figure out not, but that's what needs to happen, instead on putting on another layer of hacks.
Sorry, maybe I failed to get your point...
Could you explain your idea or reasoning in a bit more detail?My take is that XactLockTableWait() isn’t really designed to work on standby. Instead of adding another layer on top like v4 did, maybe we can tweak the function itself to support standby. One possible approach could be to add a check like RecoveryInProgress() to handle the logic when running on a standby.
After thinking about this further, Andres's suggestion might be replacing polling(whether smart or not) with event-driven like waiting in XactLockTableWait. To achieve this, implementing a new notification mechanism for standby servers seems to be required. From what I can observe, the codebase appears to lack IPC infrastructure for waiting on remote transaction completion and receiving notifications when those transactions finish. I'm not familiar with this area, so additional inputs would be very helpful here.
Your guess might be right, or maybe not. It's hard for me to say for sure.
It seems better to wait for Andres to explain his idea in more detail,
rather than trying to guess...
My position is basically:
1) We should *never* add new long-duration polling loops to postgres. We've
regretted it every time. It just ends up masking bugs and biting us in
scenarios we didn't predict (increased wakeups increasing power usage,
increased latency because our more eager wakeup mechanisms were racy).
2) We should try rather hard to not even have any new very short lived polling
code. The existing code in XactLockTableWait() isn't great, even on the
primary, but the window during the polling addresses is really short, so
it's *kinda* acceptable.
3) There are many ways to address the XactLockTableWait() issue here. One way
would be to simply make XactLockTableWait() work on standbys, by
maintaining the lock table. Another would be to teach it to add some
helper to procarray.c that allows XactLockTableWait() to work based on the
KnownAssignedXids machinery.
I don't have a clear preference for how to make this work in a non-polling
way. But it's clear to me that making it poll smarter is the completely wrong
direction.
Greetings,
Andres Freund