wake up logical workers after ALTER SUBSCRIPTION
Hi hackers,
While working on avoiding unnecessary wakeups in logical/worker.c (as was
done for walreceiver.c in 05a7be9), I noticed that the tests began taking
much longer. This seems to be caused by the reduced frequency of calls to
maybe_reread_subscription() in LogicalRepApplyLoop(). Presently,
LogicalRepApplyLoop() only waits for up to a second, so the subscription
info is re-read by workers relatively frequently. If LogicalRepApplyLoop()
sleeps for longer, the subscription info may not be read for much longer.
I think the fix for this problem can be considered independently, as
relying on frequent wakeups seems less than ideal, and the patch seems to
provide a small improvement even before applying the
avoid-unnecessary-wakeups patch. On my machine, the attached patch
improved 'check-world -j8' run time by ~12 seconds (from 3min 8sec to 2min
56 sec) and src/test/subscription test time by ~17 seconds (from 139
seconds to 122 seconds).
I put the new logic in launcher.c, but it might make more sense to put it
in logical/worker.c. I think that might require some new #includes in a
couple of files, but otherwise, the patch would likely look about the same.
Thoughts?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-wake-up-logical-workers-after-ALTER-SUBSCRIPTION.patchtext/x-diff; charset=us-asciiDownload+44-1
On Tue, Nov 22, 2022 at 1:41 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On my machine, the attached patch
improved 'check-world -j8' run time by ~12 seconds (from 3min 8sec to 2min
56 sec) and src/test/subscription test time by ~17 seconds (from 139
seconds to 122 seconds).
Nice!
Maybe a comment to explain why a single variable is enough? And an
assertion that it wasn't already set? And a note to future self: this
would be a candidate user of the nearby SetLatches() patch (which is
about moving SetLatch() syscalls out from under LWLocks, though this
one may not be very hot).
On Tue, Nov 22, 2022 at 03:16:05PM +1300, Thomas Munro wrote:
Maybe a comment to explain why a single variable is enough?
This crossed my mind shortly after sending my previous message. Looking
closer, I see that several types of ALTER SUBSCRIPTION do not call
PreventInTransactionBlock(), so a single variable might not be enough.
Perhaps we can put a list in TopTransactionContext. I'll do some more
investigation and report back.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Hi Nathan,
I have done almost same thing locally for [1]https://commitfest.postgresql.org/40/3581/, but I thought your code seemed better.
Just One comment: IIUC the statement "ALTER SUBSCRIPTION" can be executed
inside the transaction. So if two subscriptions are altered in the same
transaction, only one of them will awake. Is it expected behavior?
I think we can hold a suboid list and record oids when the subscription are
altered, and then the backend process can consume all of list cells at the end of
the transaction.
How do you think?
[1]: https://commitfest.postgresql.org/40/3581/
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Tue, Nov 22, 2022 at 03:03:52AM +0000, Hayato Kuroda (Fujitsu) wrote:
Just One comment: IIUC the statement "ALTER SUBSCRIPTION" can be executed
inside the transaction. So if two subscriptions are altered in the same
transaction, only one of them will awake. Is it expected behavior?I think we can hold a suboid list and record oids when the subscription are
altered, and then the backend process can consume all of list cells at the end of
the transaction.
I think you are correct. I did it this way in v2. I've also moved the
bulk of the logic to logical/worker.c.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-wake-up-logical-workers-after-ALTER-SUBSCRIPTION.patchtext/x-diff; charset=us-asciiDownload+55-1
Dear Nathan,
I think you are correct. I did it this way in v2. I've also moved the
bulk of the logic to logical/worker.c.
Thanks for updating! It becomes better. Further comments:
01. AlterSubscription()
```
+ LogicalRepWorkersWakeupAtCommit(subid);
+
```
Currently subids will be recorded even if the subscription is not modified.
I think LogicalRepWorkersWakeupAtCommit() should be called inside the if (update_tuple).
02. LogicalRepWorkersWakeupAtCommit()
```
+ oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+ on_commit_wakeup_workers_subids = lappend_oid(on_commit_wakeup_workers_subids,
+ subid);
```
If the subscription is altered twice in the same transaction, the same subid will be recorded twice.
I'm not sure whether it may be caused some issued, but list_member_oid() can be used to avoid that.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Tuesday, November 22, 2022 1:39 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Tue, Nov 22, 2022 at 03:03:52AM +0000, Hayato Kuroda (Fujitsu) wrote:
Just One comment: IIUC the statement "ALTER SUBSCRIPTION" can be
executed inside the transaction. So if two subscriptions are altered
in the same transaction, only one of them will awake. Is it expectedbehavior?
I think we can hold a suboid list and record oids when the
subscription are altered, and then the backend process can consume all
of list cells at the end of the transaction.I think you are correct. I did it this way in v2. I've also moved the bulk of
the logic to logical/worker.c.
Hi, thanks for updating.
I just quickly had a look at your patch and had one minor question.
With this patch, when we execute alter subscription in a sub transaction
and additionally rollback to it, is there any possibility that
we'll wake up the workers that don't need to do so ?
I'm not sure if this brings about some substantial issue,
but just wondering if there is any need of improvement for this.
Best Regards,
Takamichi Osumi
On Tuesday, November 22, 2022 2:49 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Dear Nathan,
I think you are correct. I did it this way in v2. I've also moved
the bulk of the logic to logical/worker.c.Thanks for updating! It becomes better. Further comments:
01. AlterSubscription()
``` + LogicalRepWorkersWakeupAtCommit(subid); + ```Currently subids will be recorded even if the subscription is not modified.
I think LogicalRepWorkersWakeupAtCommit() should be called inside the if
(update_tuple).
I think an exception would be REFRESH PULLICATION in which case update_tuple is
false, but it seems better to wake up apply worker in this case as well,
because the apply worker is also responsible to start table sync workers for
newly subscribed tables(in process_syncing_tables()).
Besides, it seems not a must to wake up apply worker for ALTER SKIP TRANSACTION,
Although there might be no harm for waking up in this case.
02. LogicalRepWorkersWakeupAtCommit()
``` + oldcxt = MemoryContextSwitchTo(TopTransactionContext); + on_commit_wakeup_workers_subids = lappend_oid(on_commit_wakeup_workers_subids, + subid); ```If the subscription is altered twice in the same transaction, the same subid will
be recorded twice.
I'm not sure whether it may be caused some issued, but list_member_oid() can
be used to avoid that.
+1, list_append_unique_oid might be better.
Best regards,
Hou zj
On Tue, Nov 22, 2022 at 6:11 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
While working on avoiding unnecessary wakeups in logical/worker.c (as was
done for walreceiver.c in 05a7be9), I noticed that the tests began taking
much longer. This seems to be caused by the reduced frequency of calls to
maybe_reread_subscription() in LogicalRepApplyLoop().
I think it would be interesting to know why tests started taking more
time after a reduced frequency of calls to
maybe_reread_subscription(). IIRC, we anyway call
maybe_reread_subscription for each xact.
--
With Regards,
Amit Kapila.
On Tue, Nov 22, 2022 at 07:25:36AM +0000, houzj.fnst@fujitsu.com wrote:
On Tuesday, November 22, 2022 2:49 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com>
Thanks for updating! It becomes better. Further comments:
01. AlterSubscription()
``` + LogicalRepWorkersWakeupAtCommit(subid); + ```Currently subids will be recorded even if the subscription is not modified.
I think LogicalRepWorkersWakeupAtCommit() should be called inside the if
(update_tuple).I think an exception would be REFRESH PULLICATION in which case update_tuple is
false, but it seems better to wake up apply worker in this case as well,
because the apply worker is also responsible to start table sync workers for
newly subscribed tables(in process_syncing_tables()).Besides, it seems not a must to wake up apply worker for ALTER SKIP TRANSACTION,
Although there might be no harm for waking up in this case.
In v3, I moved the call to LogicalRepWorkersWakeupAtCommit() to the end of
the function. This should avoid waking up workers in some cases where it's
unnecessary (e.g., if ALTER SUBSCRIPTION ERRORs in a subtransaction), but
there are still cases where we'll wake up the workers unnecessarily. I
think this is unlikely to cause any real problems in practice.
02. LogicalRepWorkersWakeupAtCommit()
``` + oldcxt = MemoryContextSwitchTo(TopTransactionContext); + on_commit_wakeup_workers_subids = lappend_oid(on_commit_wakeup_workers_subids, + subid); ```If the subscription is altered twice in the same transaction, the same subid will
be recorded twice.
I'm not sure whether it may be caused some issued, but list_member_oid() can
be used to avoid that.+1, list_append_unique_oid might be better.
Done in v3.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-wake-up-logical-workers-after-ALTER-SUBSCRIPTION.patchtext/x-diff; charset=us-asciiDownload+55-1
On Tue, Nov 22, 2022 at 04:59:28PM +0530, Amit Kapila wrote:
On Tue, Nov 22, 2022 at 6:11 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
While working on avoiding unnecessary wakeups in logical/worker.c (as was
done for walreceiver.c in 05a7be9), I noticed that the tests began taking
much longer. This seems to be caused by the reduced frequency of calls to
maybe_reread_subscription() in LogicalRepApplyLoop().I think it would be interesting to know why tests started taking more
time after a reduced frequency of calls to
maybe_reread_subscription(). IIRC, we anyway call
maybe_reread_subscription for each xact.
At the moment, commands like ALTER SUBSCRIPTION don't wake up the logical
workers for the target subscription, so the next call to
maybe_reread_subscription() may not happen for a while. Presently, we'll
only sleep up to a second in the apply loop, but with my new
prevent-unnecessary-wakeups patch, we may sleep for much longer. This
causes wait_for_subscription_sync to take more time after some ALTER
SUBSCRIPTION commands.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Dear Nathan,
Thank you for updating the patch!
In v3, I moved the call to LogicalRepWorkersWakeupAtCommit() to the end of
the function. This should avoid waking up workers in some cases where it's
unnecessary (e.g., if ALTER SUBSCRIPTION ERRORs in a subtransaction), but
there are still cases where we'll wake up the workers unnecessarily. I
think this is unlikely to cause any real problems in practice.
I understood you could accept false-positive event to avoid missing true-negative
like ALTER SUBSCRIPTION REFRESH. +1.
02. LogicalRepWorkersWakeupAtCommit()
``` + oldcxt = MemoryContextSwitchTo(TopTransactionContext); + on_commit_wakeup_workers_subids = lappend_oid(on_commit_wakeup_workers_subids, + subid); ```If the subscription is altered twice in the same transaction, the same subid will
be recorded twice.
I'm not sure whether it may be caused some issued, but list_member_oid() can
be used to avoid that.+1, list_append_unique_oid might be better.
Done in v3.
I have no comments for the v3 patch.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Thu, Nov 24, 2022 at 05:26:27AM +0000, Hayato Kuroda (Fujitsu) wrote:
I have no comments for the v3 patch.
Thanks for reviewing! Does anyone else have thoughts on the patch?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
I spent some more time on the prevent-unnecessary-wakeups patch for
logical/worker.c that I've been alluding to in this thread, and I found a
few more places where we depend on the worker periodically waking up. This
seems to be a common technique, so I'm beginning to wonder whether these
changes are worthwhile. I think there's a good chance it would become a
game of whac-a-mole.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Nov 30, 2022 at 5:10 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
I spent some more time on the prevent-unnecessary-wakeups patch for
logical/worker.c that I've been alluding to in this thread, and I found a
few more places where we depend on the worker periodically waking up. This
seems to be a common technique, so I'm beginning to wonder whether these
changes are worthwhile. I think there's a good chance it would become a
game of whac-a-mole.
Aren't they all bugs, though, making our tests and maybe even real
systems slower than they need to be?
On Wed, Nov 30, 2022 at 5:23 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Wed, Nov 30, 2022 at 5:10 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
I spent some more time on the prevent-unnecessary-wakeups patch for
logical/worker.c that I've been alluding to in this thread, and I found a
few more places where we depend on the worker periodically waking up. This
seems to be a common technique, so I'm beginning to wonder whether these
changes are worthwhile. I think there's a good chance it would become a
game of whac-a-mole.Aren't they all bugs, though, making our tests and maybe even real
systems slower than they need to be?
(Which isn't to suggest that it's your job to fix them, but please do
share what you have if you run out of whack-a-mole steam, since we
seem to have several people keen to finish those moles off.)
Dear Nathan,
I spent some more time on the prevent-unnecessary-wakeups patch for
logical/worker.c that I've been alluding to in this thread, and I found a
few more places where we depend on the worker periodically waking up. This
seems to be a common technique, so I'm beginning to wonder whether these
changes are worthwhile. I think there's a good chance it would become a
game of whac-a-mole.
I think at least this feature is needed for waking up workers that are slept due to the min_apply_delay.
The author supposed this patch and pinned our thread[1]/messages/by-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9@TYCPR01MB8373.jpnprd01.prod.outlook.com.
[1]: /messages/by-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9@TYCPR01MB8373.jpnprd01.prod.outlook.com
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, Nov 30, 2022 at 05:27:40PM +1300, Thomas Munro wrote:
On Wed, Nov 30, 2022 at 5:23 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Wed, Nov 30, 2022 at 5:10 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
I spent some more time on the prevent-unnecessary-wakeups patch for
logical/worker.c that I've been alluding to in this thread, and I found a
few more places where we depend on the worker periodically waking up. This
seems to be a common technique, so I'm beginning to wonder whether these
changes are worthwhile. I think there's a good chance it would become a
game of whac-a-mole.Aren't they all bugs, though, making our tests and maybe even real
systems slower than they need to be?
Yeah, you're right, it's probably worth proceeding with this particular
thread even if we don't end up porting the suppress-unnecessary-wakeups
patch to logical/worker.c.
(Which isn't to suggest that it's your job to fix them, but please do
share what you have if you run out of whack-a-mole steam, since we
seem to have several people keen to finish those moles off.)
I don't mind fixing it! There are a couple more I'd like to track down
before posting another revision.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Tue, Nov 29, 2022 at 09:04:41PM -0800, Nathan Bossart wrote:
I don't mind fixing it! There are a couple more I'd like to track down
before posting another revision.
Okay, here is a new version of the patch. This seems to clear up
everything that I could find via the tests.
Thanks to this effort, I discovered that we need to include
wal_retrieve_retry_interval in our wait time calculations after failed
tablesyncs (for the suppress-unnecessary-wakeups patch). I'll make that
change and post that patch in a new thread.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v4-0001-wake-up-logical-workers-after-ALTER-SUBSCRIPTION-.patchtext/x-diff; charset=us-asciiDownload+70-1
On Thu, Dec 01, 2022 at 04:21:30PM -0800, Nathan Bossart wrote:
Okay, here is a new version of the patch. This seems to clear up
everything that I could find via the tests.
I cleaned up the patch a bit.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com