recoveryStopsAfter is not usable when recovery_target_inclusive is false

Started by Hayato Kuroda (Fujitsu)6 months ago4 messages
#1Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
1 attachment(s)

Hi hackers,

While working on [1]/messages/by-id/18897-d3db67535860dddb@postgresql.org I found the point. When recovery_target_lsn is specified and
recovery_target_inclusive is false, recoveryStopsAfter() cannot return true.

```
/* Check if the target LSN has been reached */
if (recoveryTarget == RECOVERY_TARGET_LSN &&
recoveryTargetInclusive &&
record->ReadRecPtr >= recoveryTargetLSN)
```

In this case the recovery can stop when next record is read. This normally works
well but if the next record has not been generated yet, startup process will wait
till new one will come then exit from the apply loop.

I feel the process can exit bit earliyer, by comparing with the end point of the
applied record and recovery_target_lsn.
Attached patch roughly implemented the idea.

I read the old discussions, but I cannot find the reason of current style.
Do you have any thoughts for it?

[1]: /messages/by-id/18897-d3db67535860dddb@postgresql.org

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

recovery_stop_after.diffsapplication/octet-stream; name=recovery_stop_after.diffsDownload
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index e8f3ba00caa..bbab772a6f7 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -2572,6 +2572,34 @@ verifyBackupPageConsistency(XLogReaderState *record)
 	}
 }
 
+/*
+ * Support function for recoveryStopsBefore and recoveryStopsAfter. Returns
+ * true if we have already reached the recovery target LSN, otherwise returns
+ * false.
+ */
+static inline bool
+TargetLSNIsReached(XLogReaderState *record, bool is_before)
+{
+	XLogRecPtr	targetLsn;
+
+	Assert(recoveryTarget == RECOVERY_TARGET_LSN);
+
+	if (is_before)
+	{
+		if (recoveryTargetInclusive)
+			return false;
+
+		targetLsn = record->ReadRecPtr;
+	}
+	else
+	{
+		targetLsn = recoveryTargetInclusive ?
+			record->ReadRecPtr : record->EndRecPtr;
+	}
+
+	return targetLsn >= recoveryTargetLSN;
+}
+
 /*
  * For point-in-time recovery, this function decides whether we want to
  * stop applying the XLOG before the current record.
@@ -2612,8 +2640,7 @@ recoveryStopsBefore(XLogReaderState *record)
 
 	/* Check if target LSN has been reached */
 	if (recoveryTarget == RECOVERY_TARGET_LSN &&
-		!recoveryTargetInclusive &&
-		record->ReadRecPtr >= recoveryTargetLSN)
+		TargetLSNIsReached(record, true))
 	{
 		recoveryStopAfter = false;
 		recoveryStopXid = InvalidTransactionId;
@@ -2780,12 +2807,11 @@ recoveryStopsAfter(XLogReaderState *record)
 
 	/* Check if the target LSN has been reached */
 	if (recoveryTarget == RECOVERY_TARGET_LSN &&
-		recoveryTargetInclusive &&
-		record->ReadRecPtr >= recoveryTargetLSN)
+		TargetLSNIsReached(record, false))
 	{
 		recoveryStopAfter = true;
 		recoveryStopXid = InvalidTransactionId;
-		recoveryStopLSN = record->ReadRecPtr;
+		recoveryStopLSN = recoveryTargetInclusive ? record->ReadRecPtr : record->EndRecPtr;
 		recoveryStopTime = 0;
 		recoveryStopName[0] = '\0';
 		ereport(LOG,
#2Michael Paquier
michael@paquier.xyz
In reply to: Hayato Kuroda (Fujitsu) (#1)
Re: recoveryStopsAfter is not usable when recovery_target_inclusive is false

On Wed, Jul 23, 2025 at 05:04:37AM +0000, Hayato Kuroda (Fujitsu) wrote:

While working on [1] I found the point. When recovery_target_lsn is specified and
recovery_target_inclusive is false, recoveryStopsAfter() cannot return true.

/* Check if the target LSN has been reached */
if (recoveryTarget == RECOVERY_TARGET_LSN &&
recoveryTargetInclusive &&
record->ReadRecPtr >= recoveryTargetLSN)

In this case the recovery can stop when next record is read. This normally works
well but if the next record has not been generated yet, startup process will wait
till new one will come then exit from the apply loop.

+static inline bool
+TargetLSNIsReached(XLogReaderState *record, bool is_before)
+{
+    XLogRecPtr    targetLsn;
+
+    Assert(recoveryTarget == RECOVERY_TARGET_LSN);
+
+    if (is_before)
+    {
+        if (recoveryTargetInclusive)
+            return false;
+
+        targetLsn = record->ReadRecPtr;
+    }
+    else
+    {
+        targetLsn = recoveryTargetInclusive ?
+            record->ReadRecPtr : record->EndRecPtr;
+    }
+
+    return targetLsn >= recoveryTargetLSN;
+}

So, what you are doing here is changing the behavior of the check in
recoveryStopsAfter(), with the addition of one exit path based on
EndRecPtr if recovery_target_inclusive is false, to leave a bit
earlier in case if we don't have a follow-up record. Did you notice a
speedup once you did that in your logirep workflow? I am afraid that
this change would be a new mode, skipping a couple of code paths if
one decides to trigger the exit. And I highly suspect that you could
break some cases that worked until now here, skipping one
recoveryPausesHere() and one ProcessStartupProcInterrupts() to say the
least.

I feel the process can exit bit earliyer, by comparing with the end point of the
applied record and recovery_target_lsn.
Attached patch roughly implemented the idea.

Could you prove your point with a test or something that justifies a
new definition?

I read the old discussions, but I cannot find the reason of current style.

35250b6ad7a8 was quite some time ago, as of this thread, with my
fingerprints all over it:
/messages/by-id/CAB7nPqRKKyZQhcB39mty9C_gfB0ODRUQZEWB4GPP8ARpdAB=ZA@mail.gmail.com

If my memory serves me right, this comes down to the consistency with
how stop XID targets are handled before and after records are read,
except that this one applies to ReadRecPtr.
--
Michael

#3Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Michael Paquier (#2)
RE: recoveryStopsAfter is not usable when recovery_target_inclusive is false

Dear Michael,

Sorry for the late reply.

So, what you are doing here is changing the behavior of the check in
recoveryStopsAfter(), with the addition of one exit path based on
EndRecPtr if recovery_target_inclusive is false, to leave a bit
earlier in case if we don't have a follow-up record.

Yes, that's right.

Did you notice a speedup once you did that in your logirep workflow?

No, I have not verified the point. I feel it may not speed up for normal workload,
except last WAL does not exist. I noticed this point while working on the test
issue on pg_createsubscriber like 03b08c8f5 and [1]/messages/by-id/CANhcyEVqFCNhrbkCJwOpT1Su5-D3s+kSsOoc-4edKc7rzbRfeA@mail.gmail.com.

I am afraid that
this change would be a new mode, skipping a couple of code paths if
one decides to trigger the exit. And I highly suspect that you could
break some cases that worked until now here, skipping one
recoveryPausesHere() and one ProcessStartupProcInterrupts() to say the
least.

Hmm. Yes, we must investigate the side-effect deeper.

I feel the process can exit bit earliyer, by comparing with the end point of the
applied record and recovery_target_lsn.
Attached patch roughly implemented the idea.

Could you prove your point with a test or something that justifies a
new definition?

One of my colleagues is creating the test to show the beneficial workload,
could you please see it?

I read the old discussions, but I cannot find the reason of current style.

35250b6ad7a8 was quite some time ago, as of this thread, with my
fingerprints all over it:
/messages/by-id/CAB7nPqRKKyZQhcB39mty9C_gf
B0ODRUQZEWB4GPP8ARpdAB%3DZA%40mail.gmail.com

Yeah, I have found it but recoveryStopsAfter() has the same condition since the first
patch...

If my memory serves me right, this comes down to the consistency with
how stop XID targets are handled before and after records are read,
except that this one applies to ReadRecPtr.

I may find the part you're referring:

```
/*
* There can be only one transaction end record with this exact
* transactionid
*
* when testing for an xid, we MUST test for equality only, since
* transactions are numbered in the order they start, not the order
* they complete. A higher numbered xid will complete before you about
* 50% of the time...
*/
if (recoveryTarget == RECOVERY_TARGET_XID && recoveryTargetInclusive &&
recordXid == recoveryTargetXid)
```

One difference between LSN and XID is the predictability. Regarding the WAL
record, the startpoint of the next WAL record is surely next to the endpoint of
previous WAL. In terms of transaction id, however, the commit ordering can be
different with the number. COMMIT for xid=400 may come after another commit for
xid=402. Thus we cannot predict whether we should finish the recovery after
applying one transaction, when recovery_target_inclusive = false and
recovery_target_xid is set.

[1]: /messages/by-id/CANhcyEVqFCNhrbkCJwOpT1Su5-D3s+kSsOoc-4edKc7rzbRfeA@mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

#4Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#3)
1 attachment(s)
Re: recoveryStopsAfter is not usable when recovery_target_inclusive is false

On Thu, 24 Jul 2025 at 16:49, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Michael,

Sorry for the late reply.

So, what you are doing here is changing the behavior of the check in
recoveryStopsAfter(), with the addition of one exit path based on
EndRecPtr if recovery_target_inclusive is false, to leave a bit
earlier in case if we don't have a follow-up record.

Yes, that's right.

Did you notice a speedup once you did that in your logirep workflow?

No, I have not verified the point. I feel it may not speed up for normal workload,
except last WAL does not exist. I noticed this point while working on the test
issue on pg_createsubscriber like 03b08c8f5 and [1].

I am afraid that
this change would be a new mode, skipping a couple of code paths if
one decides to trigger the exit. And I highly suspect that you could
break some cases that worked until now here, skipping one
recoveryPausesHere() and one ProcessStartupProcInterrupts() to say the
least.

Hmm. Yes, we must investigate the side-effect deeper.

I feel the process can exit bit earliyer, by comparing with the end point of the
applied record and recovery_target_lsn.
Attached patch roughly implemented the idea.

Could you prove your point with a test or something that justifies a
new definition?

One of my colleagues is creating the test to show the beneficial workload,
could you please see it?

I have created a script to show a beneficial workload.
In the script,
1. I have created a streaming replication setup.
2. did some DDLs, DMLs on primary node.
3. Got current lsn on primary using "pg_current_wal_lsn"
4. Set recovery parameters "recover_target_lsn" to the lsn we got on
primary, "recovery_target_action" to "promote" and
"recovery_target_inclusive" to false. And restarted the standby.
Now, when the standby is recovering after the restart, it takes some
time to recover.

With HEAD, the recovery process takes ~21sec to finish:
[16:52:53.559](21.810s) ok 1 - promoted standby is not in recovery

Whereas with patch it takes ~1sec to finish:
[16:51:38.597](1.009s) ok 1 - promoted standby is not in recovery

I read the old discussions, but I cannot find the reason of current style.

35250b6ad7a8 was quite some time ago, as of this thread, with my
fingerprints all over it:
/messages/by-id/CAB7nPqRKKyZQhcB39mty9C_gf
B0ODRUQZEWB4GPP8ARpdAB%3DZA%40mail.gmail.com

Yeah, I have found it but recoveryStopsAfter() has the same condition since the first
patch...

If my memory serves me right, this comes down to the consistency with
how stop XID targets are handled before and after records are read,
except that this one applies to ReadRecPtr.

I may find the part you're referring:

```
/*
* There can be only one transaction end record with this exact
* transactionid
*
* when testing for an xid, we MUST test for equality only, since
* transactions are numbered in the order they start, not the order
* they complete. A higher numbered xid will complete before you about
* 50% of the time...
*/
if (recoveryTarget == RECOVERY_TARGET_XID && recoveryTargetInclusive &&
recordXid == recoveryTargetXid)
```

One difference between LSN and XID is the predictability. Regarding the WAL
record, the startpoint of the next WAL record is surely next to the endpoint of
previous WAL. In terms of transaction id, however, the commit ordering can be
different with the number. COMMIT for xid=400 may come after another commit for
xid=402. Thus we cannot predict whether we should finish the recovery after
applying one transaction, when recovery_target_inclusive = false and
recovery_target_xid is set.

[1]: /messages/by-id/CANhcyEVqFCNhrbkCJwOpT1Su5-D3s+kSsOoc-4edKc7rzbRfeA@mail.gmail.com

I have attached the test script.

Thanks,
Shlok Kyal

Attachments:

100_test.plapplication/octet-stream; name=100_test.plDownload