Suppressing useless wakeups in walreceiver

Started by Thomas Munroabout 4 years ago51 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hi,

While working on WaitEventSet-ifying various codepaths, I found it
strange that walreceiver wakes up 10 times per second while idle.
Here's a draft patch to compute the correct sleep time.

Attachments:

0001-Suppress-useless-wakeups-in-walreceiver.patchtext/x-patch; charset=US-ASCII; name=0001-Suppress-useless-wakeups-in-walreceiver.patchDownload+158-101
#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Thomas Munro (#1)
Re: Suppressing useless wakeups in walreceiver

Hello.

At Thu, 27 Jan 2022 23:50:04 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in

While working on WaitEventSet-ifying various codepaths, I found it
strange that walreceiver wakes up 10 times per second while idle.
Here's a draft patch to compute the correct sleep time.

Agree to the objective. However I feel the patch makes the code
somewhat less readable maybe because WalRcvComputeNextWakeup hides the
timeout deatils. Of course other might thing differently.

-              ping_sent = false;
-              XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1,
-                         startpointTLI);
+              WalRcvComputeNextWakeup(&state,
+                          WALRCV_WAKEUP_TIMEOUT,
+                          last_recv_timestamp);
+              WalRcvComputeNextWakeup(&state,
+                          WALRCV_WAKEUP_PING,
+                          last_recv_timestamp);

The calculated target times are not used within the closest loop and
the loop is quite busy. WALRCV_WAKEUP_PING is the replacement of
ping_sent, but I think the computation of both two target times would
be better done after the loop only when the "if (len > 0)" block was
passed.

-          XLogWalRcvSendReply(false, false);
+          XLogWalRcvSendReply(&state, GetCurrentTimestamp(),
+                    false, false);

The GetCurrentTimestamp() is same with last_recv_timestamp when the
recent recv() had any bytes received. So we can avoid the call to
GetCurrentTimestamp in that case. If we do the change above, the same
flag notifies the necessity of separete GetCurrentTimestamp().

I understand the reason for startpointTLI being stored in WalRcvInfo
but don't understand about primary_has_standby_xmin. It is just moved
from a static variable of XLogWalRcvSendHSFeedback to the struct
member that is modifed and read only by the same function.

The enum item WALRCV_WAKEUP_TIMEOUT looks somewhat uninformative. How
about naming it WALRCV_WAKEUP_TERMIATE (or WALRCV_WAKEUP_TO_DIE)?

WALRCV_WAKEUP_TIMEOUT and WALRCV_WAKEUP_PING doesn't fire when
wal_receiver_timeout is zero. In that case we should not set the
timeouts at all to avoid spurious wakeups?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Kyotaro Horiguchi (#2)
Re: Suppressing useless wakeups in walreceiver

On Fri, Jan 28, 2022 at 8:26 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 27 Jan 2022 23:50:04 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in

While working on WaitEventSet-ifying various codepaths, I found it
strange that walreceiver wakes up 10 times per second while idle.
Here's a draft patch to compute the correct sleep time.

Agree to the objective. However I feel the patch makes the code
somewhat less readable maybe because WalRcvComputeNextWakeup hides the
timeout deatils. Of course other might thing differently.

Thanks for looking!

The reason why I put the timeout computation into a function is
because there are about 3 places you need to do it: at the beginning,
when rescheduling for next time, and when the configuration file
changes and you might want to recompute them. The logic to decode the
GUCs and compute the times would be duplicated. I have added a
comment about that, and tried to make the code clearer.

Do you have another idea?

-              ping_sent = false;
-              XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1,
-                         startpointTLI);
+              WalRcvComputeNextWakeup(&state,
+                          WALRCV_WAKEUP_TIMEOUT,
+                          last_recv_timestamp);
+              WalRcvComputeNextWakeup(&state,
+                          WALRCV_WAKEUP_PING,
+                          last_recv_timestamp);

The calculated target times are not used within the closest loop and
the loop is quite busy. WALRCV_WAKEUP_PING is the replacement of
ping_sent, but I think the computation of both two target times would
be better done after the loop only when the "if (len > 0)" block was
passed.

Those two wakeup times should only be adjusted when data is received.
The calls should be exactly where last_recv_timestamp is set in
master, no?

-          XLogWalRcvSendReply(false, false);
+          XLogWalRcvSendReply(&state, GetCurrentTimestamp(),
+                    false, false);

The GetCurrentTimestamp() is same with last_recv_timestamp when the
recent recv() had any bytes received. So we can avoid the call to
GetCurrentTimestamp in that case. If we do the change above, the same
flag notifies the necessity of separete GetCurrentTimestamp().

Yeah, I agree that we should remove more GetCurrentTimestamp() calls.
Here's another idea: let's remove last_recv_timestamp, and just use
'now', being careful to update it after sleeping and in the receive
loop (in case it gets busy for a long time), so that it's always fresh
enough.

I understand the reason for startpointTLI being stored in WalRcvInfo
but don't understand about primary_has_standby_xmin. It is just moved
from a static variable of XLogWalRcvSendHSFeedback to the struct
member that is modifed and read only by the same function.

Yeah, this was unnecessary refactoring. I removed that change (we
could look into moving more state into the new state object instead of
using static locals and globals, but that can be for another thread, I
got carried away...)

The enum item WALRCV_WAKEUP_TIMEOUT looks somewhat uninformative. How
about naming it WALRCV_WAKEUP_TERMIATE (or WALRCV_WAKEUP_TO_DIE)?

Ok, let's try TERMINATE.

WALRCV_WAKEUP_TIMEOUT and WALRCV_WAKEUP_PING doesn't fire when
wal_receiver_timeout is zero. In that case we should not set the
timeouts at all to avoid spurious wakeups?

Oh, yeah, I forgot to handle wal_receiver_timeout = 0. Fixed.

Attachments:

v2-0001-Suppress-useless-wakeups-in-walreceiver.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Suppress-useless-wakeups-in-walreceiver.patchDownload+169-109
#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Thomas Munro (#3)
Re: Suppressing useless wakeups in walreceiver

At Fri, 28 Jan 2022 22:41:32 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in

On Fri, Jan 28, 2022 at 8:26 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 27 Jan 2022 23:50:04 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in

The reason why I put the timeout computation into a function is
because there are about 3 places you need to do it: at the beginning,
when rescheduling for next time, and when the configuration file
changes and you might want to recompute them. The logic to decode the
GUCs and compute the times would be duplicated. I have added a
comment about that, and tried to make the code clearer.

Do you have another idea?

Yeah, I understand that and am having no better ideas so I'm fine with
that.

-              ping_sent = false;
-              XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1,
-                         startpointTLI);
+              WalRcvComputeNextWakeup(&state,
+                          WALRCV_WAKEUP_TIMEOUT,
+                          last_recv_timestamp);
+              WalRcvComputeNextWakeup(&state,
+                          WALRCV_WAKEUP_PING,
+                          last_recv_timestamp);

The calculated target times are not used within the closest loop and
the loop is quite busy. WALRCV_WAKEUP_PING is the replacement of
ping_sent, but I think the computation of both two target times would
be better done after the loop only when the "if (len > 0)" block was
passed.

Those two wakeup times should only be adjusted when data is received.

Yes.

The calls should be exactly where last_recv_timestamp is set in
master, no?

The calcuations at other than the last one iteration are waste of
cycles and I thought the loop as a kind of tight-loop. So I don't
mind we consider the additional cycles are acceptable. (Just I'm not
sure.)

-          XLogWalRcvSendReply(false, false);
+          XLogWalRcvSendReply(&state, GetCurrentTimestamp(),
+                    false, false);

The GetCurrentTimestamp() is same with last_recv_timestamp when the
recent recv() had any bytes received. So we can avoid the call to
GetCurrentTimestamp in that case. If we do the change above, the same
flag notifies the necessity of separete GetCurrentTimestamp().

Yeah, I agree that we should remove more GetCurrentTimestamp() calls.
Here's another idea: let's remove last_recv_timestamp, and just use
'now', being careful to update it after sleeping and in the receive
loop (in case it gets busy for a long time), so that it's always fresh
enough.

I like it. It could be slightly off but it doesn't matter at all.

I understand the reason for startpointTLI being stored in WalRcvInfo
but don't understand about primary_has_standby_xmin. It is just moved
from a static variable of XLogWalRcvSendHSFeedback to the struct
member that is modifed and read only by the same function.

Yeah, this was unnecessary refactoring. I removed that change (we
could look into moving more state into the new state object instead of
using static locals and globals, but that can be for another thread, I
got carried away...)

The enum item WALRCV_WAKEUP_TIMEOUT looks somewhat uninformative. How
about naming it WALRCV_WAKEUP_TERMIATE (or WALRCV_WAKEUP_TO_DIE)?

Ok, let's try TERMINATE.

Thanks:)

WALRCV_WAKEUP_TIMEOUT and WALRCV_WAKEUP_PING doesn't fire when
wal_receiver_timeout is zero. In that case we should not set the
timeouts at all to avoid spurious wakeups?

Oh, yeah, I forgot to handle wal_receiver_timeout = 0. Fixed.

Yeah, looks fine.

+static void
+WalRcvComputeNextWakeup(WalRcvInfo *state,
+						WalRcvWakeupReason reason,
+						TimestampTz now)
+{
+	switch (reason)
+	{
...
+	default:
+		break;
+	}
+}

Mmm.. there's NUM_WALRCV_WAKEUPS.. But I think we don't need to allow
the case. Isn't it better to replace the break with Assert()?

It might be suitable for another patch, but we can do that
together. If we passed "now" to the function XLogWalRcvProcessMsg, we
would be able to remove further several calls to
GetCurrentTimestamp(), in the function itself and
ProcessWalSndrMessage (the name is improperly abbreviated..).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#4)
Re: Suppressing useless wakeups in walreceiver

I was surprised to see that this patch was never committed!

On Mon, Jan 31, 2022 at 04:28:11PM +0900, Kyotaro Horiguchi wrote:

+static void
+WalRcvComputeNextWakeup(WalRcvInfo *state,
+						WalRcvWakeupReason reason,
+						TimestampTz now)
+{
+	switch (reason)
+	{
...
+	default:
+		break;
+	}
+}

Mmm.. there's NUM_WALRCV_WAKEUPS.. But I think we don't need to allow
the case. Isn't it better to replace the break with Assert()?

+1

It might be suitable for another patch, but we can do that
together. If we passed "now" to the function XLogWalRcvProcessMsg, we
would be able to remove further several calls to
GetCurrentTimestamp(), in the function itself and
ProcessWalSndrMessage (the name is improperly abbreviated..).

While reading the patch, I did find myself wondering if there were some
additional opportunities for reducing the calls to GetCurrentTimestamp().
That was really the only thing that stood out to me.

Thomas, are you planning to continue with this patch? If not, I don't mind
picking it up.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#5)
Re: Suppressing useless wakeups in walreceiver

On Fri, Sep 30, 2022 at 4:51 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

Thomas, are you planning to continue with this patch? If not, I don't mind
picking it up.

Hi Nathan,

Please go ahead! One thing I had in my notes to check with this patch
is whether there might be a bug due to computing all times in
microseconds, but sleeping for a number of milliseconds without
rounding up (what I mean is that if it's trying to sleep for 900
microseconds, it might finish up sleeping for 0 milliseconds in a
tight loop a few times).

#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#6)
Re: Suppressing useless wakeups in walreceiver

On Fri, Sep 30, 2022 at 05:04:43PM +1300, Thomas Munro wrote:

Please go ahead! One thing I had in my notes to check with this patch
is whether there might be a bug due to computing all times in
microseconds, but sleeping for a number of milliseconds without
rounding up (what I mean is that if it's trying to sleep for 900
microseconds, it might finish up sleeping for 0 milliseconds in a
tight loop a few times).

I think that's right. If 'next_wakeup - now' is less than 1000, 'nap' will
be 0. My first instinct is to simply round to the nearest millisecond, but
this might still result in wakeups before the calculated wakeup time, and I
think we ultimately want to nap until >= the wakeup time whenever possible
to avoid tight loops. So, I agree that it would be better to round up to
the next millisecond whenever nextWakeup > now. Given the current behavior
is to wake up every 100 milliseconds, an extra millisecond here and there
doesn't seem like it should cause any problems. And I suppose we should
assume we might nap longer already, anyway.

I do see that the wakeup time for WALRCV_WAKEUP_PING might be set to 'now'
whenever wal_receiver_timeout is 1 millisecond, but I think that's existing
behavior, and it feels like an extreme case that we probably needn't worry
about too much.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#8Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#7)
Re: Suppressing useless wakeups in walreceiver

Here is an updated patch set with the following changes:

* The creation of the struct for non-shared WAL receiver state is moved to
a prerequisite 0001 patch. This should help ease review of 0002 a bit.

* I updated the nap time calculation to round up to the next millisecond,
as discussed upthread.

* I attempted to minimize the calls to GetCurrentTimestamp(). The WAL
receiver code already calls this function pretty liberally, so I don't know
if this is important, but perhaps it could make a difference for systems
that don't have something like the vDSO to avoid real system calls.

* I removed the 'tli' argument from functions that now have an argument for
the non-shared state struct. The 'tli' is stored within the struct, so the
extra argument is unnecessary.

One thing that still bugs me a little bit about 0002 is that the calls to
GetCurrentTimestamp() feel a bit scattered and more difficult to reason
about. AFAICT 0002 keeps 'now' relatively up-to-date, but it seems
possible that a future change could easily disrupt that. I don't have any
other ideas at the moment, though.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-Move-WAL-receivers-non-shared-state-to-a-new-stru.patchtext/x-diff; charset=us-asciiDownload+26-21
v3-0002-Suppress-useless-wakeups-in-walreceiver.patchtext/x-diff; charset=us-asciiDownload+148-102
#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#8)
Re: Suppressing useless wakeups in walreceiver

On 2022-Oct-04, Nathan Bossart wrote:

Here is an updated patch set with the following changes:

* The creation of the struct for non-shared WAL receiver state is moved to
a prerequisite 0001 patch. This should help ease review of 0002 a bit.

I think that would be even better if you moved the API adjustments of
some functions for the new struct to 0001 as well
(XLogWalRcvSendHSFeedback etc).

* I updated the nap time calculation to round up to the next millisecond,
as discussed upthread.

I didn't look at this part very carefully, but IIRC walreceiver's
responsivity has a direct impact on latency for sync replicas. Maybe
it'd be sensible to the definition that the sleep time is rounded down
rather than up? (At least, for the cases where we have something to do
and not merely continue sleeping.)

* I attempted to minimize the calls to GetCurrentTimestamp(). The WAL
receiver code already calls this function pretty liberally, so I don't know
if this is important, but perhaps it could make a difference for systems
that don't have something like the vDSO to avoid real system calls.

Yeah, we are indeed careful about this in most places. Maybe it's not
particularly important in 2022 and also not particularly important for
walreceiver (again, except in the code paths that affect sync replicas),
but there's no reason not to continue to be careful until we discover
more widely that it doesn't matter.

* I removed the 'tli' argument from functions that now have an argument for
the non-shared state struct. The 'tli' is stored within the struct, so the
extra argument is unnecessary.

+1 (This could also be in 0001, naturally.)

One thing that still bugs me a little bit about 0002 is that the calls to
GetCurrentTimestamp() feel a bit scattered and more difficult to reason
about. AFAICT 0002 keeps 'now' relatively up-to-date, but it seems
possible that a future change could easily disrupt that. I don't have any
other ideas at the moment, though.

Hmm.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#10Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#9)
Re: Suppressing useless wakeups in walreceiver

Thanks for taking a look.

On Wed, Oct 05, 2022 at 09:57:00AM +0200, Alvaro Herrera wrote:

On 2022-Oct-04, Nathan Bossart wrote:

* The creation of the struct for non-shared WAL receiver state is moved to
a prerequisite 0001 patch. This should help ease review of 0002 a bit.

I think that would be even better if you moved the API adjustments of
some functions for the new struct to 0001 as well
(XLogWalRcvSendHSFeedback etc).

I moved as much as I could to 0001 in v4.

* I updated the nap time calculation to round up to the next millisecond,
as discussed upthread.

I didn't look at this part very carefully, but IIRC walreceiver's
responsivity has a direct impact on latency for sync replicas. Maybe
it'd be sensible to the definition that the sleep time is rounded down
rather than up? (At least, for the cases where we have something to do
and not merely continue sleeping.)

The concern is that if we wake up early, we'll end up spinning until the
wake-up time is reached. Given the current behavior is to sleep for 100ms
at a time, and the tasks in question are governed by wal_receiver_timeout
and wal_receiver_status_interval (which are typically set to several
seconds) it seems reasonably safe to sleep up to an extra ~1ms here and
there without sacrificing responsiveness. In fact, I imagine this patch
results in a net improvement in responsiveness for these tasks.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-Move-WAL-receivers-non-shared-state-to-a-new-stru.patchtext/x-diff; charset=us-asciiDownload+48-43
v4-0002-Suppress-useless-wakeups-in-walreceiver.patchtext/x-diff; charset=us-asciiDownload+125-79
#11Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#10)
Re: Suppressing useless wakeups in walreceiver

On Wed, Oct 5, 2022 at 11:38 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

I moved as much as I could to 0001 in v4.

Some comments on v4-0002:

1. You might want to use PG_INT64_MAX instead of INT64_MAX for portability?

2. With the below change, the time walreceiver spends in
XLogWalRcvSendReply() is also included for XLogWalRcvSendHSFeedback
right? I think it's a problem given that XLogWalRcvSendReply() can
take a while. Earlier, this wasn't the case, each function calculating
'now' separately. Any reason for changing this behaviour? I know that
GetCurrentTimestamp(); isn't cheaper all the time, but here it might
result in a change in the behaviour.
-            XLogWalRcvSendReply(false, false);
-            XLogWalRcvSendHSFeedback(false);
+            TimestampTz now = GetCurrentTimestamp();
+
+            XLogWalRcvSendReply(state, now, false, false);
+            XLogWalRcvSendHSFeedback(state, now, false);
3. I understand that TimestampTz type is treated as microseconds.
Would you mind having a comment in below places to say why we're
multiplying with 1000 or 1000000 here? Also, do we need 1000L or
1000000L or type cast to
TimestampTz?
+            state->wakeup[reason] = now + (wal_receiver_timeout / 2) * 1000;
+            state->wakeup[reason] = now + wal_receiver_timeout * 1000;
+            state->wakeup[reason] = now +
wal_receiver_status_interval * 1000000;

4. How about simplifying WalRcvComputeNextWakeup() something like below?
static void
WalRcvComputeNextWakeup(WalRcvInfo *state, WalRcvWakeupReason reason,
TimestampTz now)
{
TimestampTz ts = INT64_MAX;

switch (reason)
{
case WALRCV_WAKEUP_TERMINATE:
if (wal_receiver_timeout > 0)
ts = now + (TimestampTz) (wal_receiver_timeout * 1000L);
break;
case WALRCV_WAKEUP_PING:
if (wal_receiver_timeout > 0)
ts = now + (TimestampTz) ((wal_receiver_timeout / 2) * 1000L);
break;
case WALRCV_WAKEUP_HSFEEDBACK:
if (hot_standby_feedback && wal_receiver_status_interval > 0)
ts = now + (TimestampTz) (wal_receiver_status_interval * 1000000L);
break;
case WALRCV_WAKEUP_REPLY:
if (wal_receiver_status_interval > 0)
ts = now + (TimestampTz) (wal_receiver_status_interval * 1000000);
break;
default:
/* An error is better here. */
}

state->wakeup[reason] = ts;
}

5. Can we move below code snippets to respective static functions for
better readability and code reuse?
This:
+                /* Find the soonest wakeup time, to limit our nap. */
+                nextWakeup = INT64_MAX;
+                for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                    nextWakeup = Min(state.wakeup[i], nextWakeup);
+                nap = Max(0, (nextWakeup - now + 999) / 1000);

And this:

+                    now = GetCurrentTimestamp();
+                    for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                        WalRcvComputeNextWakeup(&state, i, now);

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#12Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#11)
Re: Suppressing useless wakeups in walreceiver

On Mon, Oct 10, 2022 at 03:22:15PM +0530, Bharath Rupireddy wrote:

Some comments on v4-0002:

Thanks for taking a look.

1. You might want to use PG_INT64_MAX instead of INT64_MAX for portability?

Yes, I used PG_INT64_MAX in v5.

2. With the below change, the time walreceiver spends in
XLogWalRcvSendReply() is also included for XLogWalRcvSendHSFeedback
right? I think it's a problem given that XLogWalRcvSendReply() can
take a while. Earlier, this wasn't the case, each function calculating
'now' separately. Any reason for changing this behaviour? I know that
GetCurrentTimestamp(); isn't cheaper all the time, but here it might
result in a change in the behaviour.

Yes, if XLogWalRcvSendReply() takes a long time, we might defer sending the
hot_standby_feedback message until a later time. The only reason I changed
this was to avoid extra calls to GetCurrentTimestamp(), which might be
expensive on some platforms. Outside of the code snippet you pointed out,
I think WalReceiverMain() has a similar problem. That being said, I'm
generally skeptical that this sort of thing is detrimental given the
current behavior (i.e., wake up every 100ms), the usual values of these
GUCs (i.e., tens of seconds), and the fact that any tasks that are
inappropriately skipped will typically be retried in the next iteration of
the loop.

3. I understand that TimestampTz type is treated as microseconds.
Would you mind having a comment in below places to say why we're
multiplying with 1000 or 1000000 here? Also, do we need 1000L or
1000000L or type cast to
TimestampTz?

I used INT64CONST() in v5, as that seemed the most portable, but I stopped
short of adding comments to explain all the conversions. IMO the
conversions are relatively obvious, and the units of the GUCs can be easily
seen in guc_tables.c.

4. How about simplifying WalRcvComputeNextWakeup() something like below?

Other than saving a couple lines of code, IMO this doesn't meaningfully
simplify the function or improve readability.

5. Can we move below code snippets to respective static functions for
better readability and code reuse?
This:
+                /* Find the soonest wakeup time, to limit our nap. */
+                nextWakeup = INT64_MAX;
+                for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                    nextWakeup = Min(state.wakeup[i], nextWakeup);
+                nap = Max(0, (nextWakeup - now + 999) / 1000);

And this:

+                    now = GetCurrentTimestamp();
+                    for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                        WalRcvComputeNextWakeup(&state, i, now);

This did cross my mind, but I opted to avoid creating new functions because
1) they aren't likely to be reused very much, and 2) I actually think it
might hurt readability by forcing developers to traipse around the file to
figure out what these functions are actually doing. It's not like it would
save many lines of code, either.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v5-0001-Move-WAL-receivers-non-shared-state-to-a-new-stru.patchtext/x-diff; charset=us-asciiDownload+48-43
v5-0002-Suppress-useless-wakeups-in-walreceiver.patchtext/x-diff; charset=us-asciiDownload+125-79
#13Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#12)
Re: Suppressing useless wakeups in walreceiver

On Mon, Oct 10, 2022 at 10:51:14AM -0700, Nathan Bossart wrote:

+                /* Find the soonest wakeup time, to limit our nap. */
+                nextWakeup = INT64_MAX;
+                for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                    nextWakeup = Min(state.wakeup[i], nextWakeup);
+                nap = Max(0, (nextWakeup - now + 999) / 1000);

Hm. We should probably be more cautious here since nextWakeup is an int64
and nap is an int. My guess is that we should just set nap to -1 if
nextWakeup > INT_MAX.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#14Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#13)
Re: Suppressing useless wakeups in walreceiver

On Mon, Oct 10, 2022 at 11:10:14AM -0700, Nathan Bossart wrote:

On Mon, Oct 10, 2022 at 10:51:14AM -0700, Nathan Bossart wrote:

+                /* Find the soonest wakeup time, to limit our nap. */
+                nextWakeup = INT64_MAX;
+                for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                    nextWakeup = Min(state.wakeup[i], nextWakeup);
+                nap = Max(0, (nextWakeup - now + 999) / 1000);

Hm. We should probably be more cautious here since nextWakeup is an int64
and nap is an int. My guess is that we should just set nap to -1 if
nextWakeup > INT_MAX.

Here's an attempt at fixing that. I ended up just changing "nap" to an
int64 and then ensuring it's no larger than INT_MAX in the call to
WaitLatchOrSocket(). IIUC we can't use -1 here because WL_TIMEOUT is set.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-Move-WAL-receivers-non-shared-state-to-a-new-stru.patchtext/x-diff; charset=us-asciiDownload+48-43
v6-0002-Suppress-useless-wakeups-in-walreceiver.patchtext/x-diff; charset=us-asciiDownload+125-79
#15Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#12)
Re: Suppressing useless wakeups in walreceiver

On Mon, Oct 10, 2022 at 11:21 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

Thanks for the updated patches.

2. With the below change, the time walreceiver spends in
XLogWalRcvSendReply() is also included for XLogWalRcvSendHSFeedback
right? I think it's a problem given that XLogWalRcvSendReply() can
take a while. Earlier, this wasn't the case, each function calculating
'now' separately. Any reason for changing this behaviour? I know that
GetCurrentTimestamp(); isn't cheaper all the time, but here it might
result in a change in the behaviour.

Yes, if XLogWalRcvSendReply() takes a long time, we might defer sending the
hot_standby_feedback message until a later time. The only reason I changed
this was to avoid extra calls to GetCurrentTimestamp(), which might be
expensive on some platforms.

Agree. However...

Outside of the code snippet you pointed out,
I think WalReceiverMain() has a similar problem. That being said, I'm
generally skeptical that this sort of thing is detrimental given the
current behavior (i.e., wake up every 100ms), the usual values of these
GUCs (i.e., tens of seconds), and the fact that any tasks that are
inappropriately skipped will typically be retried in the next iteration of
the loop.

AFICS, the aim of the patch isn't optimizing around
GetCurrentTimestamp() calls and the patch does seem to change quite a
bit of them which may cause a change of the behaviour. I think that
the GetCurrentTimestamp() optimizations can go to 0003 patch in this
thread itself or we can discuss it as a separate thread to seek more
thoughts.

The 'now' in many instances in the patch may not actually represent
the true current time and it includes time taken by other operations
as well. I think this will have problems.

+            XLogWalRcvSendReply(state, now, false, false);
+            XLogWalRcvSendHSFeedback(state, now, false);
+                    now = GetCurrentTimestamp();
+                    for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                        WalRcvComputeNextWakeup(&state, i, now);
+                    XLogWalRcvSendHSFeedback(&state, now, true);
+                    now = GetCurrentTimestamp();
+                    for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                        WalRcvComputeNextWakeup(&state, i, now);
+                    XLogWalRcvSendHSFeedback(&state, now, true);

4. How about simplifying WalRcvComputeNextWakeup() something like below?

Other than saving a couple lines of code, IMO this doesn't meaningfully
simplify the function or improve readability.

IMO, the duplicate lines of code aren't necessary in
WalRcvComputeNextWakeup() and that function's code isn't that complex
by removing them.

5. Can we move below code snippets to respective static functions for
better readability and code reuse?
This:
+                /* Find the soonest wakeup time, to limit our nap. */
+                nextWakeup = INT64_MAX;
+                for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                    nextWakeup = Min(state.wakeup[i], nextWakeup);
+                nap = Max(0, (nextWakeup - now + 999) / 1000);

And this:

+                    now = GetCurrentTimestamp();
+                    for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
+                        WalRcvComputeNextWakeup(&state, i, now);

This did cross my mind, but I opted to avoid creating new functions because
1) they aren't likely to be reused very much, and 2) I actually think it
might hurt readability by forcing developers to traipse around the file to
figure out what these functions are actually doing. It's not like it would
save many lines of code, either.

The second snippet is being used twice already. IMO having static
functions (perhaps inline) is much better here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#16Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#15)
Re: Suppressing useless wakeups in walreceiver

On Tue, Oct 11, 2022 at 07:01:26AM +0530, Bharath Rupireddy wrote:

On Mon, Oct 10, 2022 at 11:21 PM Nathan Bossart

Outside of the code snippet you pointed out,
I think WalReceiverMain() has a similar problem. That being said, I'm
generally skeptical that this sort of thing is detrimental given the
current behavior (i.e., wake up every 100ms), the usual values of these
GUCs (i.e., tens of seconds), and the fact that any tasks that are
inappropriately skipped will typically be retried in the next iteration of
the loop.

AFICS, the aim of the patch isn't optimizing around
GetCurrentTimestamp() calls and the patch does seem to change quite a
bit of them which may cause a change of the behaviour. I think that
the GetCurrentTimestamp() optimizations can go to 0003 patch in this
thread itself or we can discuss it as a separate thread to seek more
thoughts.

The 'now' in many instances in the patch may not actually represent
the true current time and it includes time taken by other operations
as well. I think this will have problems.

What problems do you think this will cause?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#17Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#16)
Re: Suppressing useless wakeups in walreceiver

On Tue, Oct 11, 2022 at 8:20 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

AFICS, the aim of the patch isn't optimizing around
GetCurrentTimestamp() calls and the patch does seem to change quite a
bit of them which may cause a change of the behaviour. I think that
the GetCurrentTimestamp() optimizations can go to 0003 patch in this
thread itself or we can discuss it as a separate thread to seek more
thoughts.

The 'now' in many instances in the patch may not actually represent
the true current time and it includes time taken by other operations
as well. I think this will have problems.

What problems do you think this will cause?

now = t1;
XLogWalRcvSendReply() /* say it ran for a minute or so for whatever reasons */
XLogWalRcvSendHSFeedback() /* with patch walrecevier sends hot standby
feedback more often without properly honouring
wal_receiver_status_interval because the 'now' isn't actually the
current time as far as that function is concerned, it is
t1 + XLogWalRcvSendReply()'s time. */

Well, is this really a problem? I'm not sure about that. Let's hear from others.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#18Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#17)
Re: Suppressing useless wakeups in walreceiver

On Tue, Oct 11, 2022 at 09:34:25AM +0530, Bharath Rupireddy wrote:

now = t1;
XLogWalRcvSendReply() /* say it ran for a minute or so for whatever reasons */
XLogWalRcvSendHSFeedback() /* with patch walrecevier sends hot standby
feedback more often without properly honouring
wal_receiver_status_interval because the 'now' isn't actually the
current time as far as that function is concerned, it is
t1 + XLogWalRcvSendReply()'s time. */

Well, is this really a problem? I'm not sure about that. Let's hear from others.

For this example, the feedback message would just be sent in the next loop
iteration instead.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#19Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#18)
Re: Suppressing useless wakeups in walreceiver

On Tue, Oct 11, 2022 at 11:22 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

On Tue, Oct 11, 2022 at 09:34:25AM +0530, Bharath Rupireddy wrote:

now = t1;
XLogWalRcvSendReply() /* say it ran for a minute or so for whatever reasons */
XLogWalRcvSendHSFeedback() /* with patch walrecevier sends hot standby
feedback more often without properly honouring
wal_receiver_status_interval because the 'now' isn't actually the
current time as far as that function is concerned, it is
t1 + XLogWalRcvSendReply()'s time. */

Well, is this really a problem? I'm not sure about that. Let's hear from others.

For this example, the feedback message would just be sent in the next loop
iteration instead.

I think the hot standby feedback message gets sent too frequently
without honouring the wal_receiver_status_interval because the 'now'
is actually not the current time with your patch but 'now +
XLogWalRcvSendReply()'s time'. However, it's possible that I may be
wrong here.

/*
* Send feedback at most once per wal_receiver_status_interval.
*/
if (!TimestampDifferenceExceeds(sendTime, now,
wal_receiver_status_interval * 1000))
return;

As said upthread [1]/messages/by-id/CALj2ACV5q2dbVRKwu2PL2s_YY0owZFTRxLdX=t+dZ1iag15khA@mail.gmail.com, I think the best way to move forward is to
separate the GetCurrentTimestamp() calls optimizations into 0003.

Do you have any further thoughts on review comments posted upthread [1]/messages/by-id/CALj2ACV5q2dbVRKwu2PL2s_YY0owZFTRxLdX=t+dZ1iag15khA@mail.gmail.com?

[1]: /messages/by-id/CALj2ACV5q2dbVRKwu2PL2s_YY0owZFTRxLdX=t+dZ1iag15khA@mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Nathan Bossart (#14)
Re: Suppressing useless wakeups in walreceiver

I think in 0001 we should put more stuff in the state struct --
specifically these globals:

static int recvFile = -1;
static TimeLineID recvFileTLI = 0;
static XLogSegNo recvSegNo = 0;

The main reason is that it seems odd to have startpointTLI in the struct
used in some places together with a file-global recvFileTLI which isn't.
The way one is passed as argument and the other as part of a struct
seemed too distracting. This should reduce the number of moving parts,
ISTM.

One thing that confused me for a moment is that we have some state in
walrcv and some more state in 'state'. The difference is pretty obvious
once you look at the other, but it suggest to me that a better variable
name for the latter is 'localstate' to more obviously distinguish them.

I was tempted to suggest that LogstreamResult would also be good to have
in the new struct, but that might be going a bit too far for a first
cut.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#21Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#19)
#22Nathan Bossart
nathandbossart@gmail.com
In reply to: Alvaro Herrera (#20)
#23Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#22)
#24Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#23)
#25Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#23)
#26Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#24)
#27Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#25)
#28Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#26)
#29Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#28)
#30Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#29)
#31Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Thomas Munro (#30)
#32Thomas Munro
thomas.munro@gmail.com
In reply to: Bharath Rupireddy (#31)
#33Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#30)
#34Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#32)
#35Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#30)
#37Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#36)
#38Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#36)
#39Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#38)
#40Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#39)
#41Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#40)
#42Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#41)
#43Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#42)
#44Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#43)
#45Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#44)
#46Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#45)
#47Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#46)
#48Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#44)
#49Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#48)
#50Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#49)
#51Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#50)