suppressing useless wakeups in logical/worker.c

Started by Nathan Bossartover 3 years ago25 messageshackers

nathandbossart@gmail.com

over 3 years ago

Hi hackers,

I've attached an attempt at porting a similar change to 05a7be9 [0]/messages/by-id/CA+hUKGJGhX4r2LPUE3Oy9BX71Eum6PBcS8L3sJpScR9oKaTVaA@mail.gmail.com to
logical/worker.c. The bulk of the patch is lifted from the walreceiver
patch, but I did need to add a hack for waking up after
wal_retrieve_retry_interval to start sync workers. This hack involves a
new wakeup variable that process_syncing_tables_for_apply() sets.

For best results, this patch should be applied on top of [1]/messages/by-id/20221122004119.GA132961@nathanxps13, which is an
attempt at fixing all the stuff that only runs within a reasonable
timeframe because logical worker processes currently wake up at least once
a second. With the attached patch applied, those periodic wakeups are
gone, so we need to make sure we wake up the logical workers as needed.

[0]: /messages/by-id/CA+hUKGJGhX4r2LPUE3Oy9BX71Eum6PBcS8L3sJpScR9oKaTVaA@mail.gmail.com
[1]: /messages/by-id/20221122004119.GA132961@nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Hayato Kuroda (Fujitsu)

kuroda.hayato@fujitsu.com

over 3 years ago

In reply to: Nathan Bossart (#1)

RE: suppressing useless wakeups in logical/worker.c

Dear Nathan,

Thank you for making the patch! I tested your patch, and it basically worked well.
About following part:

```
			ConfigReloadPending = false;
 			ProcessConfigFile(PGC_SIGHUP);
+			now = GetCurrentTimestamp();
+			for (int i = 0; i < NUM_LRW_WAKEUPS; i++)
+				LogRepWorkerComputeNextWakeup(i, now);
+
+			/*
+			 * If a wakeup time for starting sync workers was set, just set it
+			 * to right now.  It will be recalculated as needed.
+			 */
+			if (next_sync_start != PG_INT64_MAX)
+				next_sync_start = now;
 		}
```

Do we have to recalculate the NextWakeup when subscriber receives SIGHUP signal?
I think this may cause the unexpected change like following.

Assuming that wal_receiver_timeout is 60s, and wal_sender_timeout on publisher is
0s (or the network between nodes is disconnected).
And we send SIGHUP signal per 20s to subscriber's postmaster.

Currently the last_recv_time is calcurated when the worker accepts messages,
and the value is used for deciding to send a ping. The worker will exit if the
walsender does not reply.

But in your patch, the apply worker calcurates wakeup[LRW_WAKEUP_PING] and
wakeup[LRW_WAKEUP_TERMINATE] again when it gets SIGHUP, so the worker never sends
ping with requestReply = true, and never exits due to the timeout.

My case seems to be crazy, but there may be another issues if it remains.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Nathan Bossart

nathandbossart@gmail.com

over 3 years ago

In reply to: Hayato Kuroda (Fujitsu) (#2)

Re: suppressing useless wakeups in logical/worker.c

On Mon, Dec 05, 2022 at 01:00:19PM +0000, Hayato Kuroda (Fujitsu) wrote:

But in your patch, the apply worker calcurates wakeup[LRW_WAKEUP_PING] and
wakeup[LRW_WAKEUP_TERMINATE] again when it gets SIGHUP, so the worker never sends
ping with requestReply = true, and never exits due to the timeout.

This is the case for the walreceiver patch, too. If a SIGHUP arrives just
before we are due to ping the server, the ping wakeup time will be pushed
back. To me, this seems unlikely to cause any issues in practice unless
the server is being constantly SIGHUP'd. If we wanted to fix this, we'd
probably need to recompute the wakeup times using the values currently set.
I haven't looked into this too closely, but it doesn't sound tremendously
difficult. Thoughts?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Nathan Bossart

nathandbossart@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#3)

Re: suppressing useless wakeups in logical/worker.c

rebased for cfbot

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Nathan Bossart (#4)

Re: suppressing useless wakeups in logical/worker.c

Nathan Bossart <nathandbossart@gmail.com> writes:

[ v2-0001-suppress-unnecessary-wakeups-in-logical-worker.c.patch ]

I took a look through this, and have a number of mostly-cosmetic
issues:

* It seems wrong that next_sync_start isn't handled as one of the
wakeup[NUM_LRW_WAKEUPS] entries. I see that it needs to be accessed from
another module; but you could handle that without exposing either enum
LogRepWorkerWakeupReason or the array, by providing getter/setter
functions for process_syncing_tables_for_apply() to call.

* This code is far too intimately familiar with the fact that TimestampTz
is an int64 count of microseconds. (I'm picky about that because I
remember that they were once something else, so I wonder if someday
they will be different again.) You could get rid of the PG_INT64_MAX
usages by replacing those with the timestamp infinity macro DT_NOEND;
and I'd even be on board with adding a less-opaque alternate name for
that to datatype/timestamp.h. The various magic-constant multipliers
could perhaps be made less magic by using TimestampTzPlusMilliseconds().

* I think it might be better to construct the enum like this:

+typedef enum LogRepWorkerWakeupReason
+{
+	LRW_WAKEUP_TERMINATE,
+	LRW_WAKEUP_PING,
+	LRW_WAKEUP_STATUS
+#define NUM_LRW_WAKEUPS (LRW_WAKEUP_STATUS + 1)
+} LogRepWorkerWakeupReason;

so that you don't have to have a default: case in switches on the
enum value. I'm more worried about somebody adding an enum value
and missing updating a switch statement elsewhere than I am about
somebody adding an enum value and neglecting to update the
immediately-adjacent macro.

* The updates of "now" in LogicalRepApplyLoop seem rather
randomly placed, and I'm not entirely convinced that we'll
always be using a reasonably up-to-date value. Can't we
just update it right before each usage?

* This special handling of next_sync_start seems mighty ugly:

+            /* Also consider special wakeup time for starting sync workers. */
+            if (next_sync_start < now)
+            {
+                /*
+                 * Instead of spinning while we wait for the sync worker to
+                 * start, wait a bit before retrying (unless there's an earlier
+                 * wakeup time).
+                 */
+                nextWakeup = Min(now + INT64CONST(100000), nextWakeup);
+            }
+            else
+                nextWakeup = Min(next_sync_start, nextWakeup);

Do we really need the slop? If so, is there a reason why it
shouldn't apply to all possible sources of nextWakeup? (It's
going to be hard to fold next_sync_start into the wakeup[]
array unless you can make this not a special case.)

* It'd probably be worth enlarging the comment for
LogRepWorkerComputeNextWakeup to explain why its API is like that,
perhaps "We ask the caller to pass in the value of "now" because
this frequently avoids multiple calls of GetCurrentTimestamp().
It had better be a reasonably-up-to-date value, though."

regards, tom lane

Nathan Bossart

nathandbossart@gmail.com

over 3 years ago

In reply to: Tom Lane (#5)

Re: suppressing useless wakeups in logical/worker.c

On Tue, Jan 24, 2023 at 06:45:08PM -0500, Tom Lane wrote:

I took a look through this, and have a number of mostly-cosmetic
issues:

Thanks for the detailed review.

* It seems wrong that next_sync_start isn't handled as one of the
wakeup[NUM_LRW_WAKEUPS] entries. I see that it needs to be accessed from
another module; but you could handle that without exposing either enum
LogRepWorkerWakeupReason or the array, by providing getter/setter
functions for process_syncing_tables_for_apply() to call.

* This code is far too intimately familiar with the fact that TimestampTz
is an int64 count of microseconds. (I'm picky about that because I
remember that they were once something else, so I wonder if someday
they will be different again.) You could get rid of the PG_INT64_MAX
usages by replacing those with the timestamp infinity macro DT_NOEND;
and I'd even be on board with adding a less-opaque alternate name for
that to datatype/timestamp.h. The various magic-constant multipliers
could perhaps be made less magic by using TimestampTzPlusMilliseconds().

* I think it might be better to construct the enum like this:
+typedef enum LogRepWorkerWakeupReason
+{
+	LRW_WAKEUP_TERMINATE,
+	LRW_WAKEUP_PING,
+	LRW_WAKEUP_STATUS
+#define NUM_LRW_WAKEUPS (LRW_WAKEUP_STATUS + 1)
+} LogRepWorkerWakeupReason;
so that you don't have to have a default: case in switches on the
enum value. I'm more worried about somebody adding an enum value
and missing updating a switch statement elsewhere than I am about
somebody adding an enum value and neglecting to update the
immediately-adjacent macro.

I did all of this in v3.

* The updates of "now" in LogicalRepApplyLoop seem rather
randomly placed, and I'm not entirely convinced that we'll
always be using a reasonably up-to-date value. Can't we
just update it right before each usage?

This came up for walreceiver.c, too. The concern is that
GetCurrentTimestamp() might be rather expensive on systems without
something like the vDSO. I don't know how common that is. If we can rule
that out, then I agree that we should just update it right before each use.

* This special handling of next_sync_start seems mighty ugly:
+            /* Also consider special wakeup time for starting sync workers. */
+            if (next_sync_start < now)
+            {
+                /*
+                 * Instead of spinning while we wait for the sync worker to
+                 * start, wait a bit before retrying (unless there's an earlier
+                 * wakeup time).
+                 */
+                nextWakeup = Min(now + INT64CONST(100000), nextWakeup);
+            }
+            else
+                nextWakeup = Min(next_sync_start, nextWakeup);
Do we really need the slop? If so, is there a reason why it
shouldn't apply to all possible sources of nextWakeup? (It's
going to be hard to fold next_sync_start into the wakeup[]
array unless you can make this not a special case.)

I'm not positive it is absolutely necessary. AFAICT the function that
updates this particular wakeup time is conditionally called, so it at least
seems theoretically possible that we could end up spinning in a tight loop
until we attempt to start a new tablesync worker. But perhaps this is
unlikely enough that we needn't worry about it.

I noticed that this wakeup time wasn't being updated when the number of
active tablesync workers is >= max_sync_workers_per_subscription. In v3, I
tried to handle this by setting the wakeup time to a second later for this
case. I think you could ordinarily depend on the tablesync worker's
notify_pid to wake up the apply worker, but that wouldn't work if the apply
worker has restarted.

Ultimately, this particular wakeup time seems to be a special case, and I
probably need to think about it some more. If you have ideas, I'm all
ears.

* It'd probably be worth enlarging the comment for
LogRepWorkerComputeNextWakeup to explain why its API is like that,
perhaps "We ask the caller to pass in the value of "now" because
this frequently avoids multiple calls of GetCurrentTimestamp().
It had better be a reasonably-up-to-date value, though."

I did this in v3. I noticed that many of your comments also applied to the
similar patch that was recently applied to walreceiver.c, so I created
another patch to fix that up.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Thomas Munro

thomas.munro@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#6)

Re: suppressing useless wakeups in logical/worker.c

On Thu, Jan 26, 2023 at 12:50 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

I did this in v3. I noticed that many of your comments also applied to the
similar patch that was recently applied to walreceiver.c, so I created
another patch to fix that up.

Can we also use TimestampDifferenceMilliseconds()? It knows about
rounding up for WaitLatch().

Nathan Bossart

nathandbossart@gmail.com

over 3 years ago

In reply to: Thomas Munro (#7)

Re: suppressing useless wakeups in logical/worker.c

On Thu, Jan 26, 2023 at 01:23:41PM +1300, Thomas Munro wrote:

Can we also use TimestampDifferenceMilliseconds()? It knows about
rounding up for WaitLatch().

I think we might risk overflowing "long" when all the wakeup times are
DT_NOEND:

* This is typically used to calculate a wait timeout for WaitLatch()
* or a related function. The choice of "long" as the result type
* is to harmonize with that. It is caller's responsibility that the
* input timestamps not be so far apart as to risk overflow of "long"
* (which'd happen at about 25 days on machines with 32-bit "long").

Maybe we can adjust that function or create a new one to deal with this.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Nathan Bossart (#8)

Re: suppressing useless wakeups in logical/worker.c

Nathan Bossart <nathandbossart@gmail.com> writes:

I think we might risk overflowing "long" when all the wakeup times are
DT_NOEND:

* This is typically used to calculate a wait timeout for WaitLatch()
* or a related function. The choice of "long" as the result type
* is to harmonize with that. It is caller's responsibility that the
* input timestamps not be so far apart as to risk overflow of "long"
* (which'd happen at about 25 days on machines with 32-bit "long").

Maybe we can adjust that function or create a new one to deal with this.

It'd probably be reasonable to file down that sharp edge by instead
specifying that TimestampDifferenceMilliseconds will clamp overflowing
differences to LONG_MAX. Maybe there should be a clamp on the underflow
side too ... but should it be to LONG_MIN or to zero?

BTW, as long as we're discussing roundoff gotchas, I noticed while
testing your previous patch that there's some inconsistency between
TimestampDifferenceExceeds and TimestampDifferenceMilliseconds.
What you submitted at [1]/messages/by-id/20230110174345.GA1292607@nathanxps13 did this:

+            if (TimestampDifferenceExceeds(last_start, now,
+                                           wal_retrieve_retry_interval))
+                ...
+            else
+            {
+                long        elapsed;
+
+                elapsed = TimestampDifferenceMilliseconds(last_start, now);
+                wait_time = Min(wait_time, wal_retrieve_retry_interval - elapsed);
+            }

and I discovered that that could sometimes busy-wait by repeatedly
falling through to the "else", but then calculating elapsed ==
wal_retrieve_retry_interval and hence setting wait_time to zero.
I fixed it in the committed version [2]https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=5a3a95385 by always computing "elapsed"
and then checking if that's strictly less than
wal_retrieve_retry_interval, but I bet there's existing code with the
same issue. I think we need to take a closer look at making
TimestampDifferenceMilliseconds' roundoff behavior match the outcome of
TimestampDifferenceExceeds comparisons.

regards, tom lane

[1]: /messages/by-id/20230110174345.GA1292607@nathanxps13
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=5a3a95385

#10

Thomas Munro

thomas.munro@gmail.com

over 3 years ago

In reply to: Tom Lane (#9)

Re: suppressing useless wakeups in logical/worker.c

On Thu, Jan 26, 2023 at 3:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

I think we might risk overflowing "long" when all the wakeup times are
DT_NOEND:

* This is typically used to calculate a wait timeout for WaitLatch()
* or a related function. The choice of "long" as the result type
* is to harmonize with that. It is caller's responsibility that the
* input timestamps not be so far apart as to risk overflow of "long"
* (which'd happen at about 25 days on machines with 32-bit "long").

Maybe we can adjust that function or create a new one to deal with this.

It'd probably be reasonable to file down that sharp edge by instead
specifying that TimestampDifferenceMilliseconds will clamp overflowing
differences to LONG_MAX. Maybe there should be a clamp on the underflow
side too ... but should it be to LONG_MIN or to zero?

That got me curious... Why did WaitLatch() use long in the first
place? I see that it was in Heikki's original sketch[1]/messages/by-id/4C72E85C.3000201@enterprisedb.com, but I can't
think of any implementation reason for it. Note that the current
implementation of WaitLatch() et al will reach WaitEventSetWait()'s
assertion that the timeout is <= INT_MAX, so a LONG_MAX clamp isn't
right without further clamping. Then internally,
WaitEventSetWaitBlock() takes an int, so there is an implicit cast to
int. If I had to guess I'd say the reasons for long in the API are
lost, and the WES rewrite used in "int" because that's what poll() and
epoll_wait() wanted.

[1]: /messages/by-id/4C72E85C.3000201@enterprisedb.com

#11

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Thomas Munro (#10)

Re: suppressing useless wakeups in logical/worker.c

Thomas Munro <thomas.munro@gmail.com> writes:

On Thu, Jan 26, 2023 at 3:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It'd probably be reasonable to file down that sharp edge by instead
specifying that TimestampDifferenceMilliseconds will clamp overflowing
differences to LONG_MAX. Maybe there should be a clamp on the underflow
side too ... but should it be to LONG_MIN or to zero?

That got me curious... Why did WaitLatch() use long in the first
place?

Good question. It's not a great choice, because of the inherent
platform specificity. OTOH, I'm not sure it's worth the pain
to change now.

regards, tom lane

#12

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Tom Lane (#11)

Re: suppressing useless wakeups in logical/worker.c

I wrote:

It'd probably be reasonable to file down that sharp edge by instead
specifying that TimestampDifferenceMilliseconds will clamp overflowing
differences to LONG_MAX. Maybe there should be a clamp on the underflow
side too ... but should it be to LONG_MIN or to zero?

After looking closer, I see that TimestampDifferenceMilliseconds
already explicitly states that its output is intended for WaitLatch
and friends, which makes it perfectly sane for it to clamp the result
to [0, INT_MAX] rather than depending on the caller to not pass
out-of-range values.

I checked existing callers, and found one place in basebackup_copy.c
that had not read the memo about TimestampDifferenceMilliseconds
never returning a negative value, and another in postmaster.c that
had not read the memo about Min() and Max() being macros. There
are none that are unhappy about clamping to INT_MAX, and at least
one that was already assuming it could just cast the result to int.

Hence, I propose the attached. I haven't gone as far as to change
the result type from long to int; that seems like a lot of code
churn (if we are to update WaitLatch and all callers to match)
and it won't really buy anything semantically.

regards, tom lane

#13

Nathan Bossart

nathandbossart@gmail.com

over 3 years ago

In reply to: Tom Lane (#12)

Re: suppressing useless wakeups in logical/worker.c

On Thu, Jan 26, 2023 at 01:54:08PM -0500, Tom Lane wrote:

After looking closer, I see that TimestampDifferenceMilliseconds
already explicitly states that its output is intended for WaitLatch
and friends, which makes it perfectly sane for it to clamp the result
to [0, INT_MAX] rather than depending on the caller to not pass
out-of-range values.

* This is typically used to calculate a wait timeout for WaitLatch()
* or a related function.  The choice of "long" as the result type
- * is to harmonize with that.  It is caller's responsibility that the
- * input timestamps not be so far apart as to risk overflow of "long"
- * (which'd happen at about 25 days on machines with 32-bit "long").
+ * is to harmonize with that; furthermore, we clamp the result to at most
+ * INT_MAX milliseconds, because that's all that WaitLatch() allows.
*
- * Both inputs must be ordinary finite timestamps (in current usage,
- * they'll be results from GetCurrentTimestamp()).
+ * At least one input must be an ordinary finite timestamp, else the "diff"
+ * calculation might overflow.  We do support stop_time == TIMESTAMP_INFINITY,
+ * which will result in INT_MAX wait time.

I wonder if we should explicitly reject negative timestamps to eliminate
any chance of int64 overflow, too. Alternatively, we could detect that the
operation will overflow and return either 0 or INT_MAX, but I assume
there's minimal use of this function with negative timestamps.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#14

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Nathan Bossart (#13)

Re: suppressing useless wakeups in logical/worker.c

Nathan Bossart <nathandbossart@gmail.com> writes:

On Thu, Jan 26, 2023 at 01:54:08PM -0500, Tom Lane wrote:

- * Both inputs must be ordinary finite timestamps (in current usage,
- * they'll be results from GetCurrentTimestamp()).
+ * At least one input must be an ordinary finite timestamp, else the "diff"
+ * calculation might overflow.  We do support stop_time == TIMESTAMP_INFINITY,
+ * which will result in INT_MAX wait time.

I wonder if we should explicitly reject negative timestamps to eliminate
any chance of int64 overflow, too.

Hmm. I'm disinclined to add an assumption that the epoch is in the past,
but I take your point that the subtraction would overflow with
TIMESTAMP_INFINITY and a negative finite timestamp. Maybe we should
make use of pg_sub_s64_overflow()?

BTW, I just noticed that the adjacent function TimestampDifference
has a lot of callers that would be much happier using
TimestampDifferenceMilliseconds.

regards, tom lane

#15

Nathan Bossart

nathandbossart@gmail.com

over 3 years ago

In reply to: Tom Lane (#14)

Re: suppressing useless wakeups in logical/worker.c

On Thu, Jan 26, 2023 at 03:04:30PM -0500, Tom Lane wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

I wonder if we should explicitly reject negative timestamps to eliminate
any chance of int64 overflow, too.

Hmm. I'm disinclined to add an assumption that the epoch is in the past,
but I take your point that the subtraction would overflow with
TIMESTAMP_INFINITY and a negative finite timestamp. Maybe we should
make use of pg_sub_s64_overflow()?

That would be my vote. I think the 'diff <= 0' check might need to be
replaced with something like 'start_time > stop_time' so that we return 0
for the underflow case.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#16

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Nathan Bossart (#15)

Re: suppressing useless wakeups in logical/worker.c

Nathan Bossart <nathandbossart@gmail.com> writes:

On Thu, Jan 26, 2023 at 03:04:30PM -0500, Tom Lane wrote:

Hmm. I'm disinclined to add an assumption that the epoch is in the past,
but I take your point that the subtraction would overflow with
TIMESTAMP_INFINITY and a negative finite timestamp. Maybe we should
make use of pg_sub_s64_overflow()?

That would be my vote. I think the 'diff <= 0' check might need to be
replaced with something like 'start_time > stop_time' so that we return 0
for the underflow case.

Right, so more like this.

regards, tom lane

#17

Nathan Bossart

nathandbossart@gmail.com

over 3 years ago

In reply to: Tom Lane (#16)

Re: suppressing useless wakeups in logical/worker.c

On Thu, Jan 26, 2023 at 04:09:51PM -0500, Tom Lane wrote:

Right, so more like this.

LGTM

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#18

Tom Lane

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Nathan Bossart (#17)

Re: suppressing useless wakeups in logical/worker.c

Nathan Bossart <nathandbossart@gmail.com> writes:

On Thu, Jan 26, 2023 at 04:09:51PM -0500, Tom Lane wrote:

Right, so more like this.

LGTM

Thanks, pushed.

Returning to the prior patch ... I don't much care for this:

+                    /* Maybe there will be a free slot in a second... */
+                    retry_time = TimestampTzPlusSeconds(now, 1);
+                    LogRepWorkerUpdateSyncStartWakeup(retry_time);

We're not moving the goalposts very far on unnecessary wakeups if
we have to do that. Do we need to get a wakeup on sync slot free?
Although having to send that to every worker seems ugly. Maybe this
is being done in the wrong place and we need to find a way to get
the launcher to handle it.

As for the business about process_syncing_tables being only called
conditionally, I was already of the opinion that the way it's
getting called is loony. Why isn't it called from LogicalRepApplyLoop
(and noplace else)? With more certainty about when it runs, we might
not need so many kluges elsewhere.

regards, tom lane

#19

Amit Kapila

amit.kapila16@gmail.com

over 3 years ago

In reply to: Tom Lane (#18)

Re: suppressing useless wakeups in logical/worker.c

On Fri, Jan 27, 2023 at 4:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thanks, pushed.

Returning to the prior patch ... I don't much care for this:
+                    /* Maybe there will be a free slot in a second... */
+                    retry_time = TimestampTzPlusSeconds(now, 1);
+                    LogRepWorkerUpdateSyncStartWakeup(retry_time);
We're not moving the goalposts very far on unnecessary wakeups if
we have to do that. Do we need to get a wakeup on sync slot free?
Although having to send that to every worker seems ugly. Maybe this
is being done in the wrong place and we need to find a way to get
the launcher to handle it.

As for the business about process_syncing_tables being only called
conditionally, I was already of the opinion that the way it's
getting called is loony. Why isn't it called from LogicalRepApplyLoop
(and noplace else)?

Currently, it seems to be called after processing transaction end
commands or when we are not in any transaction. As per my
understanding, that is when we can ensure the sync between tablesync
and apply worker. For example, say when tablesync worker is doing the
initial copy, the apply worker went ahead and processed some
additional xacts (WAL), now the tablesync worker needs to process all
those transactions after initial sync and before it can mark the state
as SYNCDONE. So that can be checked only at transaction boundries.

However, it is not very clear to me why the patch needs the below code.
@@ -3615,7 +3639,33 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
  if (!dlist_is_empty(&lsn_mapping))
  wait_time = WalWriterDelay;
  else
- wait_time = NAPTIME_PER_CYCLE;
+ {
+ TimestampTz nextWakeup = DT_NOEND;
+
+ /*
+ * Since process_syncing_tables() is called conditionally, the
+ * tablesync worker start wakeup time might be in the past, and we
+ * can't know for sure when it will be updated again.  Rather than
+ * spinning in a tight loop in this case, bump this wakeup time by
+ * a second.
+ */
+ now = GetCurrentTimestamp();
+ if (wakeup[LRW_WAKEUP_SYNC_START] < now)
+ wakeup[LRW_WAKEUP_SYNC_START] =
TimestampTzPlusSeconds(wakeup[LRW_WAKEUP_SYNC_START], 1);

Do we see unnecessary wakeups without this, or delay in sync?

BTW, do we need to do something about wakeups in
wait_for_relation_state_change()?

--
With Regards,
Amit Kapila.

#20

Nathan Bossart

nathandbossart@gmail.com

over 3 years ago

In reply to: Amit Kapila (#19)

Re: suppressing useless wakeups in logical/worker.c

On Sat, Jan 28, 2023 at 10:26:25AM +0530, Amit Kapila wrote:

On Fri, Jan 27, 2023 at 4:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Returning to the prior patch ... I don't much care for this:
+                    /* Maybe there will be a free slot in a second... */
+                    retry_time = TimestampTzPlusSeconds(now, 1);
+                    LogRepWorkerUpdateSyncStartWakeup(retry_time);
We're not moving the goalposts very far on unnecessary wakeups if
we have to do that. Do we need to get a wakeup on sync slot free?
Although having to send that to every worker seems ugly. Maybe this
is being done in the wrong place and we need to find a way to get
the launcher to handle it.

It might be feasible to set up a before_shmem_exit() callback that wakes up
the apply worker (like is already done for the launcher). I think the
apply worker is ordinarily notified via the tablesync worker's notify_pid,
but AFAICT there's no guarantee that the apply worker hasn't restarted with
a different PID.

+ /*
+ * Since process_syncing_tables() is called conditionally, the
+ * tablesync worker start wakeup time might be in the past, and we
+ * can't know for sure when it will be updated again.  Rather than
+ * spinning in a tight loop in this case, bump this wakeup time by
+ * a second.
+ */
+ now = GetCurrentTimestamp();
+ if (wakeup[LRW_WAKEUP_SYNC_START] < now)
+ wakeup[LRW_WAKEUP_SYNC_START] =
TimestampTzPlusSeconds(wakeup[LRW_WAKEUP_SYNC_START], 1);

Do we see unnecessary wakeups without this, or delay in sync?

I haven't looked too cloesly to see whether busy loops are likely in
practice.

BTW, do we need to do something about wakeups in
wait_for_relation_state_change()?

... and wait_for_worker_state_change(), and copy_read_data(). From a quick
glance, it looks like fixing these would be a more invasive change. TBH
I'm beginning to wonder whether all this is really worth it to prevent
waking up once per second.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#21

Amit Kapila

amit.kapila16@gmail.com

about 3 years ago

In reply to: Nathan Bossart (#20)

#22

Nathan Bossart

nathandbossart@gmail.com

about 3 years ago

In reply to: Amit Kapila (#21)

#23