Changes to recovery_min_apply_delay are ignored while waiting for delay

Started by Soumyadeep Chakrabortyover 4 years ago8 messages
#1Soumyadeep Chakraborty
soumyadeep2007@gmail.com
1 attachment(s)

Hello,

I came across this issue while I was tweaking a TAP test with Ashwin for
this thread [1]/messages/by-id/CANXE4Tc3FNvZ_xAimempJWv_RH9pCvsZH7Yq93o1VuNLjUT-mQ@mail.gmail.com.

We noticed that while the startup process waits for a recovery delay, it does
not respect changes to the recovery_min_apply_delay GUC. So:

// Standby is waiting on recoveryWakeupLatch with a large
recovery_min_apply_delay value
node_standby->safe_psql('postgres', 'ALTER SYSTEM SET
recovery_min_apply_delay TO 0;');
node_standby->reload;
// Standby is still waiting, it does not proceed until the old timeout
is elapsed.

I attached a small patch to fix this. It makes it so that
recovery_min_apply_delay is reconsulted in the loop to redetermine the wait
interval. This makes it similar to the loop in CheckpointerMain, where
CheckPointTimeout is consulted after interrupts are handled:

if (elapsed_secs >= CheckPointTimeout)
continue; /* no sleep for us ... */

I have also added a test to 005_replay_delay.pl.

Regards,
Soumyadeep(VMware)

[1]: /messages/by-id/CANXE4Tc3FNvZ_xAimempJWv_RH9pCvsZH7Yq93o1VuNLjUT-mQ@mail.gmail.com

Attachments:

v1-0001-Refresh-delayUntil-in-recovery-delay-loop.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Refresh-delayUntil-in-recovery-delay-loop.patchDownload
From e1b93d4b3874eb0923e18dca9a9eccd453d6cd56 Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Date: Mon, 2 Aug 2021 20:50:37 -0700
Subject: [PATCH v1 1/1] Refresh delayUntil in recovery delay loop

This ensures that the wait interval in the loop is correctly
recalculated with the updated value of recovery_min_apply_delay.

Now, if one changes the GUC while we are waiting, those changes will
take effect. Practical applications include instantly cancelling a long
wait time by setting recovery_min_apply_delay to 0. This can be useful
in tests.
---
 src/backend/access/transam/xlog.c       | 11 ++++++---
 src/test/recovery/t/005_replay_delay.pl | 32 +++++++++++++++++++++----
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f84c0bb01eb..89dc759586c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6234,12 +6234,11 @@ recoveryApplyDelay(XLogReaderState *record)
 	if (!getRecordTimestamp(record, &xtime))
 		return false;
 
-	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
-
 	/*
 	 * Exit without arming the latch if it's already past time to apply this
 	 * record
 	 */
+	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
 	msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(), delayUntil);
 	if (msecs <= 0)
 		return false;
@@ -6255,7 +6254,13 @@ recoveryApplyDelay(XLogReaderState *record)
 			break;
 
 		/*
-		 * Wait for difference between GetCurrentTimestamp() and delayUntil
+		 * Recalculate delayUntil as recovery_min_apply_delay could have changed
+		 * while we were waiting in the loop.
+		 */
+		delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
+
+		/*
+		 * Wait for difference between GetCurrentTimestamp() and delayUntil.
 		 */
 		msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
 												delayUntil);
diff --git a/src/test/recovery/t/005_replay_delay.pl b/src/test/recovery/t/005_replay_delay.pl
index 0b56380e0a7..74f821b3248 100644
--- a/src/test/recovery/t/005_replay_delay.pl
+++ b/src/test/recovery/t/005_replay_delay.pl
@@ -7,7 +7,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -56,6 +56,30 @@ $node_standby->poll_query_until('postgres',
 ok(time() - $primary_insert_time >= $delay,
 	"standby applies WAL only after replication delay");
 
+# Now set the delay for the next commit to some obscenely high value.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO '2h';");
+$node_standby->reload;
+
+$node_primary->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(21, 30))");
+
+# We should not have replayed the LSN from the last insert on the standby yet,
+# even in spite of it being received and flushed eventually. In other words, we
+# should be stuck in recovery_min_apply_delay.
+my $last_insert_lsn =
+	$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+$node_primary->wait_for_catchup('standby', 'flush', $last_insert_lsn);
+is( $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn() < '$last_insert_lsn'::pg_lsn;"),
+	't', 'standby stuck in recovery_min_apply_delay');
+
+# Clear the recovery_min_apply_delay timeout so that the wait is immediately
+# cancelled and replay can proceed.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO 0;");
+$node_standby->reload;
+
+# Now the standby should catch up.
+$node_primary->wait_for_catchup('standby');
 
 # Check that recovery can be paused or resumed expectedly.
 my $node_standby2 = PostgresNode->new('standby2');
@@ -72,7 +96,7 @@ is( $node_standby2->safe_psql(
 # Request to pause recovery and wait until it's actually paused.
 $node_standby2->safe_psql('postgres', "SELECT pg_wal_replay_pause()");
 $node_primary->safe_psql('postgres',
-	"INSERT INTO tab_int VALUES (generate_series(21,30))");
+	"INSERT INTO tab_int VALUES (generate_series(31,40))");
 $node_standby2->poll_query_until('postgres',
 	"SELECT pg_get_wal_replay_pause_state() = 'paused'")
   or die "Timed out while waiting for recovery to be paused";
@@ -84,7 +108,7 @@ my $receive_lsn =
 my $replay_lsn =
   $node_standby2->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
 $node_primary->safe_psql('postgres',
-	"INSERT INTO tab_int VALUES (generate_series(31,40))");
+	"INSERT INTO tab_int VALUES (generate_series(41,50))");
 $node_standby2->poll_query_until('postgres',
 	"SELECT '$receive_lsn'::pg_lsn < pg_last_wal_receive_lsn()")
   or die "Timed out while waiting for new WAL to be streamed";
@@ -102,7 +126,7 @@ $node_standby2->poll_query_until('postgres',
 # is triggered while recovery is paused.
 $node_standby2->safe_psql('postgres', "SELECT pg_wal_replay_pause()");
 $node_primary->safe_psql('postgres',
-	"INSERT INTO tab_int VALUES (generate_series(41,50))");
+	"INSERT INTO tab_int VALUES (generate_series(51,60))");
 $node_standby2->poll_query_until('postgres',
 	"SELECT pg_get_wal_replay_pause_state() = 'paused'")
   or die "Timed out while waiting for recovery to be paused";
-- 
2.25.1

#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Soumyadeep Chakraborty (#1)
Re: Changes to recovery_min_apply_delay are ignored while waiting for delay

At Mon, 2 Aug 2021 22:21:56 -0700, Soumyadeep Chakraborty <soumyadeep2007@gmail.com> wrote in

Hello,

I came across this issue while I was tweaking a TAP test with Ashwin for
this thread [1].

We noticed that while the startup process waits for a recovery delay, it does
not respect changes to the recovery_min_apply_delay GUC. So:

// Standby is waiting on recoveryWakeupLatch with a large
recovery_min_apply_delay value
node_standby->safe_psql('postgres', 'ALTER SYSTEM SET
recovery_min_apply_delay TO 0;');
node_standby->reload;
// Standby is still waiting, it does not proceed until the old timeout
is elapsed.

I attached a small patch to fix this. It makes it so that
recovery_min_apply_delay is reconsulted in the loop to redetermine the wait
interval. This makes it similar to the loop in CheckpointerMain, where
CheckPointTimeout is consulted after interrupts are handled:

if (elapsed_secs >= CheckPointTimeout)
continue; /* no sleep for us ... */

I have also added a test to 005_replay_delay.pl.

Regards,
Soumyadeep(VMware)

[1] /messages/by-id/CANXE4Tc3FNvZ_xAimempJWv_RH9pCvsZH7Yq93o1VuNLjUT-mQ@mail.gmail.com

Sounds reasonable and the patch looks fine as a whole. Applied cleanly
and works as expected. The added test properly catches the issue.

One comment from me.

+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO 0;");

It might be better do "SET reco.. TO DEFAULT" instead.

And how about adding the new test at the end of existing tests. We can
get rid of the extra lines in the diff.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Kyotaro Horiguchi (#2)
1 attachment(s)
Re: Changes to recovery_min_apply_delay are ignored while waiting for delay

Hi Kyotaro,

Thanks for the review!

On Mon, Aug 2, 2021 at 11:42 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

One comment from me.

+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO 0;");

It might be better do "SET reco.. TO DEFAULT" instead.

Sure.

And how about adding the new test at the end of existing tests. We can
get rid of the extra lines in the diff.

No problem. See attached v2. I didn't do that initially as the test file
looks as though it is split into two halves: one for testing
recovery_min_apply_delay and the other for testing recovery pause. So while
making this change, I added a header comment for the newly added test case.
Please take a look.

Regards,
Soumyadeep (VMware)

Attachments:

v2-0001-Refresh-delayUntil-in-recovery-delay-loop.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Refresh-delayUntil-in-recovery-delay-loop.patchDownload
From 8b4ea51e1a22548fdd3f6921fe374d3a9d987a77 Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Date: Mon, 2 Aug 2021 20:50:37 -0700
Subject: [PATCH v2 1/1] Refresh delayUntil in recovery delay loop

This ensures that the wait interval in the loop is correctly
recalculated with the updated value of recovery_min_apply_delay.

Now, if one changes the GUC while we are waiting, those changes will
take effect. Practical applications include instantly cancelling a long
wait time by setting recovery_min_apply_delay to 0. This can be useful
in tests.
---
 src/backend/access/transam/xlog.c       | 11 ++++++---
 src/test/recovery/t/005_replay_delay.pl | 31 +++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f84c0bb01eb..89dc759586c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6234,12 +6234,11 @@ recoveryApplyDelay(XLogReaderState *record)
 	if (!getRecordTimestamp(record, &xtime))
 		return false;
 
-	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
-
 	/*
 	 * Exit without arming the latch if it's already past time to apply this
 	 * record
 	 */
+	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
 	msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(), delayUntil);
 	if (msecs <= 0)
 		return false;
@@ -6255,7 +6254,13 @@ recoveryApplyDelay(XLogReaderState *record)
 			break;
 
 		/*
-		 * Wait for difference between GetCurrentTimestamp() and delayUntil
+		 * Recalculate delayUntil as recovery_min_apply_delay could have changed
+		 * while we were waiting in the loop.
+		 */
+		delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
+
+		/*
+		 * Wait for difference between GetCurrentTimestamp() and delayUntil.
 		 */
 		msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
 												delayUntil);
diff --git a/src/test/recovery/t/005_replay_delay.pl b/src/test/recovery/t/005_replay_delay.pl
index 0b56380e0a7..d55f1fc257f 100644
--- a/src/test/recovery/t/005_replay_delay.pl
+++ b/src/test/recovery/t/005_replay_delay.pl
@@ -7,7 +7,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -56,7 +56,6 @@ $node_standby->poll_query_until('postgres',
 ok(time() - $primary_insert_time >= $delay,
 	"standby applies WAL only after replication delay");
 
-
 # Check that recovery can be paused or resumed expectedly.
 my $node_standby2 = PostgresNode->new('standby2');
 $node_standby2->init_from_backup($node_primary, $backup_name,
@@ -110,3 +109,31 @@ $node_standby2->poll_query_until('postgres',
 $node_standby2->promote;
 $node_standby2->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
   or die "Timed out while waiting for promotion to finish";
+
+# Now test to see if updates to recovery_min_apply_delay apply when a standby is
+# waiting for a recovery delay to elapse.
+
+# First, set the delay for the next commit to some obscenely high value.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO '2h';");
+$node_standby->reload;
+
+$node_primary->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(51, 60))");
+
+# We should not have replayed the LSN from the last insert on the standby yet,
+# even though it will be received and flushed eventually. In other words, we
+# should be stuck in recovery_min_apply_delay.
+my $last_insert_lsn =
+	$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+$node_primary->wait_for_catchup('standby', 'flush', $last_insert_lsn);
+is( $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn() < '$last_insert_lsn'::pg_lsn;"),
+	't', 'standby stuck in recovery_min_apply_delay');
+
+# Clear the recovery_min_apply_delay timeout so that the wait is immediately
+# cancelled and replay can proceed.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO DEFAULT;");
+$node_standby->reload;
+
+# Now the standby should catch up.
+$node_primary->wait_for_catchup('standby');
-- 
2.25.1

#4Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Soumyadeep Chakraborty (#3)
1 attachment(s)
Re: Changes to recovery_min_apply_delay are ignored while waiting for delay

Rebased. Also added a stronger check to see if the standby is stuck in
recovery_min_apply_delay:

$node_standby->poll_query_until('postgres', qq{
SELECT wait_event = 'RecoveryApplyDelay' FROM pg_stat_activity
WHERE backend_type='startup';
}) or die "Timed out checking if startup is in recovery_min_apply_delay";

Attached v3.

Regards,
Soumyadeep (VMware)

Attachments:

v3-0001-Refresh-delayUntil-in-recovery-delay-loop.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Refresh-delayUntil-in-recovery-delay-loop.patchDownload
From 441716f45d0fbffde1135ba007c3a6c5b6b464de Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Date: Mon, 2 Aug 2021 20:50:37 -0700
Subject: [PATCH v3 1/1] Refresh delayUntil in recovery delay loop

This ensures that the wait interval in the loop is correctly
recalculated with the updated value of recovery_min_apply_delay.

Now, if one changes the GUC while we are waiting, those changes will
take effect. Practical applications include instantly cancelling a long
wait time by setting recovery_min_apply_delay to 0. This can be useful
in tests.
---
 src/backend/access/transam/xlog.c       | 11 +++++---
 src/test/recovery/t/005_replay_delay.pl | 36 +++++++++++++++++++++++--
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d0ec6a834be..3e05119212b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6234,12 +6234,11 @@ recoveryApplyDelay(XLogReaderState *record)
 	if (!getRecordTimestamp(record, &xtime))
 		return false;
 
-	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
-
 	/*
 	 * Exit without arming the latch if it's already past time to apply this
 	 * record
 	 */
+	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
 	msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(), delayUntil);
 	if (msecs <= 0)
 		return false;
@@ -6255,7 +6254,13 @@ recoveryApplyDelay(XLogReaderState *record)
 			break;
 
 		/*
-		 * Wait for difference between GetCurrentTimestamp() and delayUntil
+		 * Recalculate delayUntil as recovery_min_apply_delay could have changed
+		 * while we were waiting in the loop.
+		 */
+		delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
+
+		/*
+		 * Wait for difference between GetCurrentTimestamp() and delayUntil.
 		 */
 		msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
 												delayUntil);
diff --git a/src/test/recovery/t/005_replay_delay.pl b/src/test/recovery/t/005_replay_delay.pl
index 0b56380e0a7..4f409ad10d1 100644
--- a/src/test/recovery/t/005_replay_delay.pl
+++ b/src/test/recovery/t/005_replay_delay.pl
@@ -7,7 +7,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -56,7 +56,6 @@ $node_standby->poll_query_until('postgres',
 ok(time() - $primary_insert_time >= $delay,
 	"standby applies WAL only after replication delay");
 
-
 # Check that recovery can be paused or resumed expectedly.
 my $node_standby2 = PostgresNode->new('standby2');
 $node_standby2->init_from_backup($node_primary, $backup_name,
@@ -110,3 +109,36 @@ $node_standby2->poll_query_until('postgres',
 $node_standby2->promote;
 $node_standby2->poll_query_until('postgres', "SELECT NOT pg_is_in_recovery()")
   or die "Timed out while waiting for promotion to finish";
+
+# Now test to see if updates to recovery_min_apply_delay apply when a standby is
+# waiting for a recovery delay to elapse.
+
+# First, set the delay for the next commit to some obscenely high value.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO '2h';");
+$node_standby->reload;
+
+$node_primary->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(51, 60))");
+
+# The standby's flush LSN should be caught up.
+my $last_insert_lsn =
+	$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+$node_primary->wait_for_catchup('standby', 'flush', $last_insert_lsn);
+
+# The standby's replay LSN will not be caught up as it should be waiting in
+# recovery_min_apply_delay.
+$node_standby->poll_query_until('postgres', qq{
+SELECT wait_event = 'RecoveryApplyDelay' FROM pg_stat_activity
+WHERE backend_type='startup';
+}) or die "Timed out checking if startup is in recovery_min_apply_delay";
+is( $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn() < '$last_insert_lsn'::pg_lsn;"),
+	't', 'standby stuck in recovery_min_apply_delay');
+
+# Clear the recovery_min_apply_delay timeout so that the wait is immediately
+# cancelled and replay can proceed.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO DEFAULT;");
+$node_standby->reload;
+
+# Now the standby should catch up.
+$node_primary->wait_for_catchup('standby');
-- 
2.25.1

#5Michael Paquier
michael@paquier.xyz
In reply to: Soumyadeep Chakraborty (#4)
Re: Changes to recovery_min_apply_delay are ignored while waiting for delay

On Fri, Aug 06, 2021 at 04:59:55PM -0700, Soumyadeep Chakraborty wrote:

Rebased. Also added a stronger check to see if the standby is stuck in
recovery_min_apply_delay:

$node_standby->poll_query_until('postgres', qq{
SELECT wait_event = 'RecoveryApplyDelay' FROM pg_stat_activity
WHERE backend_type='startup';
}) or die "Timed out checking if startup is in recovery_min_apply_delay";

Relying on wait events is a good idea here. This makes the test more
reliable to make sure that the startup process is stuck in the
WaitLatch phase.

Attached v3.

Agreed that the current behavior is confusing. As you are using the
commit timestamp for the comparison, this is right. One small-ish
comment I have about the code is that we should mention
recovery_min_apply_delay for HandleStartupProcInterrupts(), and not
only the trigger file location.

+# First, set the delay for the next commit to some obscenely high value.
It has no need to be obscenely high, just high enough to give the time
to slow machines to catch the wait event lookup done. So this could
use better words to explain this choice.

Anyway, it seems to me that something is incorrect in this new test
(manual tests pass here, the TAP test is off). One thing we need to
make sure for any new test added is that it correctly breaks if the
fix proposed is *not* in place. And as far as I can see, the test
passes even if the recalculation of delayUntil is done within the loop
that reloads the configuration. The thing may be a bit tricky to make
reliable as the parameter reloading may cause wait_event to not be
RecoveryApplyDelay, so we should have at least a check based on a scan
of pg_stat_replication.replay_lsn on the primary. Perhaps you have
an other idea?

When using wait_for_catchup(), I would recommend to list "replay" for
this test, even if that's the default mode, to make clear what the
intention is.
--
Michael

#6Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Michael Paquier (#5)
1 attachment(s)
Re: Changes to recovery_min_apply_delay are ignored while waiting for delay

Hey Michael,

Really appreciate the review!

On Wed, Aug 11, 2021 at 12:40 AM Michael Paquier <michael@paquier.xyz> wrote:

Agreed that the current behavior is confusing. As you are using the
commit timestamp for the comparison, this is right. One small-ish
comment I have about the code is that we should mention
recovery_min_apply_delay for HandleStartupProcInterrupts(), and not
only the trigger file location.

Cool, updated.

+# First, set the delay for the next commit to some obscenely high value.
It has no need to be obscenely high, just high enough to give the time
to slow machines to catch the wait event lookup done. So this could
use better words to explain this choice.

Sounds good. Done.

Anyway, it seems to me that something is incorrect in this new test
(manual tests pass here, the TAP test is off). One thing we need to
make sure for any new test added is that it correctly breaks if the
fix proposed is *not* in place. And as far as I can see, the test
passes even if the recalculation of delayUntil is done within the loop
that reloads the configuration. The thing may be a bit tricky to make
reliable as the parameter reloading may cause wait_event to not be
RecoveryApplyDelay, so we should have at least a check based on a scan
of pg_stat_replication.replay_lsn on the primary. Perhaps you have
an other idea?

Hmm, please see attached v4 which just shifts the test to the middle,
like v1. When I run the test without the code change, the test hangs
as expected in:

# Now the standby should catch up.
$node_primary->wait_for_catchup('standby', 'replay');

and passes with the code change, as expected. I can't explain why the
test doesn't freeze up in v3 in wait_for_catchup() at the end.

As for checking on the wait event, since we only signal the standby
after performing the check, that should be fine. Nothing else would be
sending a SIGHUP before the check. Is that assumption correct?

When using wait_for_catchup(), I would recommend to list "replay" for
this test, even if that's the default mode, to make clear what the
intention is.

Done.

Regards,
Soumyadeep (VMware)

Attachments:

v4-0001-Refresh-delayUntil-in-recovery-delay-loop.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Refresh-delayUntil-in-recovery-delay-loop.patchDownload
From 824076a977af0e40da1c7eb9e4aeac9a8e81a7ee Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Date: Mon, 2 Aug 2021 20:50:37 -0700
Subject: [PATCH v4 1/1] Refresh delayUntil in recovery delay loop

This ensures that the wait interval in the loop is correctly
recalculated with the updated value of recovery_min_apply_delay.

Now, if one changes the GUC while we are waiting, those changes will
take effect. Practical applications include instantly cancelling a long
wait time by setting recovery_min_apply_delay to 0. This can be useful
in tests.
---
 src/backend/access/transam/xlog.c       | 13 +++++---
 src/test/recovery/t/005_replay_delay.pl | 41 ++++++++++++++++++++++---
 2 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d0ec6a834be..74ad7ff905b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6234,12 +6234,11 @@ recoveryApplyDelay(XLogReaderState *record)
 	if (!getRecordTimestamp(record, &xtime))
 		return false;
 
-	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
-
 	/*
 	 * Exit without arming the latch if it's already past time to apply this
 	 * record
 	 */
+	delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
 	msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(), delayUntil);
 	if (msecs <= 0)
 		return false;
@@ -6248,14 +6247,20 @@ recoveryApplyDelay(XLogReaderState *record)
 	{
 		ResetLatch(&XLogCtl->recoveryWakeupLatch);
 
-		/* might change the trigger file's location */
+		/* might change the trigger file's location and recovery_min_apply_delay */
 		HandleStartupProcInterrupts();
 
 		if (CheckForStandbyTrigger())
 			break;
 
 		/*
-		 * Wait for difference between GetCurrentTimestamp() and delayUntil
+		 * Recalculate delayUntil as recovery_min_apply_delay could have changed
+		 * since last time.
+		 */
+		delayUntil = TimestampTzPlusMilliseconds(xtime, recovery_min_apply_delay);
+
+		/*
+		 * Wait for difference between GetCurrentTimestamp() and delayUntil.
 		 */
 		msecs = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
 												delayUntil);
diff --git a/src/test/recovery/t/005_replay_delay.pl b/src/test/recovery/t/005_replay_delay.pl
index 0b56380e0a7..ad8093df41a 100644
--- a/src/test/recovery/t/005_replay_delay.pl
+++ b/src/test/recovery/t/005_replay_delay.pl
@@ -7,7 +7,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
 
 # Initialize primary node
 my $node_primary = PostgresNode->new('primary');
@@ -56,6 +56,39 @@ $node_standby->poll_query_until('postgres',
 ok(time() - $primary_insert_time >= $delay,
 	"standby applies WAL only after replication delay");
 
+# Now test to see if updates to recovery_min_apply_delay apply when a standby is
+# waiting for a recovery delay to elapse.
+
+# First, set the delay for the next commit to a value much higher than the usual
+# poll timeouts, so that replay remains blocked while we are running checks.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO '2h';");
+$node_standby->reload;
+
+$node_primary->safe_psql('postgres',
+	"INSERT INTO tab_int VALUES (generate_series(21, 30))");
+
+# The standby's flush LSN should be caught up thanks to the walreceiver.
+my $last_insert_lsn =
+	$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+$node_primary->wait_for_catchup('standby', 'flush', $last_insert_lsn);
+
+# The standby's replay LSN will not be caught up as it should be waiting in
+# recovery_min_apply_delay.
+$node_standby->poll_query_until('postgres', qq{
+SELECT wait_event = 'RecoveryApplyDelay' FROM pg_stat_activity
+WHERE backend_type='startup';
+}) or die "Timed out checking if startup is in recovery_min_apply_delay";
+is( $node_standby->safe_psql('postgres',
+	"SELECT pg_last_wal_replay_lsn() < '$last_insert_lsn'::pg_lsn;"),
+	't', 'standby stuck in recovery_min_apply_delay');
+
+# Clear the recovery_min_apply_delay timeout so that the wait is immediately
+# cancelled and replay can proceed.
+$node_standby->safe_psql('postgres', "ALTER SYSTEM SET recovery_min_apply_delay TO DEFAULT;");
+$node_standby->reload;
+
+# Now the standby replay should catch up.
+$node_primary->wait_for_catchup('standby', 'replay');
 
 # Check that recovery can be paused or resumed expectedly.
 my $node_standby2 = PostgresNode->new('standby2');
@@ -72,7 +105,7 @@ is( $node_standby2->safe_psql(
 # Request to pause recovery and wait until it's actually paused.
 $node_standby2->safe_psql('postgres', "SELECT pg_wal_replay_pause()");
 $node_primary->safe_psql('postgres',
-	"INSERT INTO tab_int VALUES (generate_series(21,30))");
+	"INSERT INTO tab_int VALUES (generate_series(31,40))");
 $node_standby2->poll_query_until('postgres',
 	"SELECT pg_get_wal_replay_pause_state() = 'paused'")
   or die "Timed out while waiting for recovery to be paused";
@@ -84,7 +117,7 @@ my $receive_lsn =
 my $replay_lsn =
   $node_standby2->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
 $node_primary->safe_psql('postgres',
-	"INSERT INTO tab_int VALUES (generate_series(31,40))");
+	"INSERT INTO tab_int VALUES (generate_series(41,50))");
 $node_standby2->poll_query_until('postgres',
 	"SELECT '$receive_lsn'::pg_lsn < pg_last_wal_receive_lsn()")
   or die "Timed out while waiting for new WAL to be streamed";
@@ -102,7 +135,7 @@ $node_standby2->poll_query_until('postgres',
 # is triggered while recovery is paused.
 $node_standby2->safe_psql('postgres', "SELECT pg_wal_replay_pause()");
 $node_primary->safe_psql('postgres',
-	"INSERT INTO tab_int VALUES (generate_series(41,50))");
+	"INSERT INTO tab_int VALUES (generate_series(51,60))");
 $node_standby2->poll_query_until('postgres',
 	"SELECT pg_get_wal_replay_pause_state() = 'paused'")
   or die "Timed out while waiting for recovery to be paused";
-- 
2.25.1

#7Michael Paquier
michael@paquier.xyz
In reply to: Soumyadeep Chakraborty (#6)
Re: Changes to recovery_min_apply_delay are ignored while waiting for delay

On Fri, Aug 13, 2021 at 05:59:21PM -0700, Soumyadeep Chakraborty wrote:

and passes with the code change, as expected. I can't explain why the
test doesn't freeze up in v3 in wait_for_catchup() at the end.

It took me some some to understand why. If I am right, that's because
of the intermediate test block working on $standby_2 and the two
INSERT queries of the primary. In v1 and v4, we have no activity on
the primary between the first set of tests and yours, meaning that
$standby has nothing to do. In v2 and v3, the two INSERT queries run
on the primary for the purpose of the recovery pause make $standby_1
wait for the default value of recovery_min_apply_delay, aka 3s, in
parallel. If the set of tests for $standby_2 is faster than that,
we'd bump on the phase where the code still waited for 3s, not the 2
hours set, visibly.

After considering this stuff, the order dependency we'd introduce in
this test makes the whole thing more brittle than it should. And such
an edge case does not seem worth spending extra cycles testing anyway,
as if things break we'd finish with a test stuck for an unnecessary
long time by relying on wait_for_catchup("replay"). We could use
something else, say based on a lookup of pg_stat_activity but this
still requires extra run time for the wait phases needed. So at the
end I have dropped the test, but backpatched the fix.
--
Michael

#8Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Michael Paquier (#7)
Re: Changes to recovery_min_apply_delay are ignored while waiting for delay

On Sun, Aug 15, 2021 at 8:16 PM Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Aug 13, 2021 at 05:59:21PM -0700, Soumyadeep Chakraborty wrote:

and passes with the code change, as expected. I can't explain why the
test doesn't freeze up in v3 in wait_for_catchup() at the end.

It took me some some to understand why. If I am right, that's because
of the intermediate test block working on $standby_2 and the two
INSERT queries of the primary. In v1 and v4, we have no activity on
the primary between the first set of tests and yours, meaning that
$standby has nothing to do. In v2 and v3, the two INSERT queries run
on the primary for the purpose of the recovery pause make $standby_1
wait for the default value of recovery_min_apply_delay, aka 3s, in
parallel. If the set of tests for $standby_2 is faster than that,
we'd bump on the phase where the code still waited for 3s, not the 2
hours set, visibly.

I see, thanks a lot for the explanation. Thanks to your investigation, I
can now kind of reuse some of the test mechanisms for the other patch that
I am working on [1]/messages/by-id/CANXE4Tc3FNvZ_xAimempJWv_RH9pCvsZH7Yq93o1VuNLjUT-mQ@mail.gmail.com. There, we don't have multiple standbys getting in the
way, thankfully.

After considering this stuff, the order dependency we'd introduce in
this test makes the whole thing more brittle than it should. And such
an edge case does not seem worth spending extra cycles testing anyway,
as if things break we'd finish with a test stuck for an unnecessary
long time by relying on wait_for_catchup("replay"). We could use
something else, say based on a lookup of pg_stat_activity but this
still requires extra run time for the wait phases needed. So at the
end I have dropped the test, but backpatched the fix.
--

Fair.

Regards,
Soumyadeep (VMware)

[1]: /messages/by-id/CANXE4Tc3FNvZ_xAimempJWv_RH9pCvsZH7Yq93o1VuNLjUT-mQ@mail.gmail.com