pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

Started by Fujii Masaoalmost 5 years ago20 messageshackers
Jump to latest
#1Fujii Masao
masao.fujii@gmail.com

If a promotion is triggered while recovery is paused, the paused state ends
and promotion continues. But currently pg_get_wal_replay_pause_state()
returns 'paused' in that case. Isn't this a bug?

Attached patch fixes this issue by resetting the recovery pause state to
'not paused' when standby promotion is triggered.

Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

reset_wal_pause_state_after_promotion_v1.patchtext/plain; charset=UTF-8; name=reset_wal_pause_state_after_promotion_v1.patch; x-mac-creator=0; x-mac-type=0Download+8-0
#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#1)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

At Mon, 17 May 2021 23:29:18 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

If a promotion is triggered while recovery is paused, the paused state
ends
and promotion continues. But currently pg_get_wal_replay_pause_state()
returns 'paused' in that case. Isn't this a bug?

Attached patch fixes this issue by resetting the recovery pause state
to
'not paused' when standby promotion is triggered.

Thought?

Nice catch!

Once the state enteres "paused" state no more WAL record is expected
to be replayed until exiting the state. I'm not sure but maybe we are
also expecting that the server promotes whthout a record replayed when
triggered while pausing. However, actually there's a chance for a
record to replayed before promotion. Of course it is existing
behavior but I'd like to make sure whether we deliberately allow that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Kyotaro Horiguchi (#2)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On 2021/05/18 9:58, Kyotaro Horiguchi wrote:

At Mon, 17 May 2021 23:29:18 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

If a promotion is triggered while recovery is paused, the paused state
ends
and promotion continues. But currently pg_get_wal_replay_pause_state()
returns 'paused' in that case. Isn't this a bug?

Attached patch fixes this issue by resetting the recovery pause state
to
'not paused' when standby promotion is triggered.

Thought?

Nice catch!

Once the state enteres "paused" state no more WAL record is expected
to be replayed until exiting the state. I'm not sure but maybe we are
also expecting that the server promotes whthout a record replayed when
triggered while pausing.

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#4Dilip Kumar
dilipbalaut@gmail.com
In reply to: Fujii Masao (#1)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On Mon, May 17, 2021 at 7:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

If a promotion is triggered while recovery is paused, the paused state ends
and promotion continues. But currently pg_get_wal_replay_pause_state()
returns 'paused' in that case. Isn't this a bug?

Attached patch fixes this issue by resetting the recovery pause state to
'not paused' when standby promotion is triggered.

Thought?

I think, prior to commit 496ee647ecd2917369ffcf1eaa0b2cdca07c8730
(Prefer standby promotion over recovery pause.) this behavior was fine
because the pause was continued but after this commit now we are
giving preference to pause so this is a bug so need to be fixed.

The fix looks fine but I think along with this we should also return
immediately from the pause loop if promotion is requested. Because if
we recheck the recovery pause then someone can pause again and we will
be in loop so better to exit as soon as promotion is requested, see
attached patch. Should be applied along with your patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

Return_immediately_if_promote_requested.patchtext/x-patch; charset=US-ASCII; name=Return_immediately_if_promote_requested.patchDownload+2-0
#5Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#3)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Sounds like a "promotion immediate" mode. It does not sound difficult
nor expensive to add a small test for that in one of the existing
recovery tests triggerring a promotion. Could you add one based on
pg_get_wal_replay_pause_state()?
--
Michael

#6Fujii Masao
masao.fujii@gmail.com
In reply to: Dilip Kumar (#4)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On 2021/05/18 14:53, Dilip Kumar wrote:

On Mon, May 17, 2021 at 7:59 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

If a promotion is triggered while recovery is paused, the paused state ends
and promotion continues. But currently pg_get_wal_replay_pause_state()
returns 'paused' in that case. Isn't this a bug?

Attached patch fixes this issue by resetting the recovery pause state to
'not paused' when standby promotion is triggered.

Thought?

I think, prior to commit 496ee647ecd2917369ffcf1eaa0b2cdca07c8730
(Prefer standby promotion over recovery pause.) this behavior was fine
because the pause was continued but after this commit now we are
giving preference to pause so this is a bug so need to be fixed.

The fix looks fine but I think along with this we should also return
immediately from the pause loop if promotion is requested. Because if
we recheck the recovery pause then someone can pause again and we will
be in loop so better to exit as soon as promotion is requested, see
attached patch. Should be applied along with your patch.

But this change can cause the recovery to continue with insufficient parameter
settings if a promotion is requested while the server is in the paused state
because of such invalid settings. This behavior seems not safe.
If this my understanding is right, the recovery should abort immediately
(i.e., FATAL error ""recovery aborted because of insufficient parameter settings"
should be thrown) if a promotion is requested in that case, like when
pg_wal_replay_resume() is executed in that case. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#7Dilip Kumar
dilipbalaut@gmail.com
In reply to: Fujii Masao (#6)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On Tue, May 18, 2021 at 1:43 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

The fix looks fine but I think along with this we should also return
immediately from the pause loop if promotion is requested. Because if
we recheck the recovery pause then someone can pause again and we will
be in loop so better to exit as soon as promotion is requested, see
attached patch. Should be applied along with your patch.

But this change can cause the recovery to continue with insufficient parameter
settings if a promotion is requested while the server is in the paused state
because of such invalid settings. This behavior seems not safe.
If this my understanding is right, the recovery should abort immediately
(i.e., FATAL error ""recovery aborted because of insufficient parameter settings"
should be thrown) if a promotion is requested in that case, like when
pg_wal_replay_resume() is executed in that case. Thought?

Yeah, you are right, I missed that.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#3)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

At Tue, 18 May 2021 12:48:38 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Mmm. I was confused with recovery target + pause. Actually promotion
works as so and it is documented. Anyway it is a matter of the next
version.

I forgot to mention the patch itself, but what the patch does looks
fine to me. Disabling pause after setting SharedProteIsTriggered
prevents later re-pausing (from the sql function).

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#9Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#5)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On 2021/05/18 15:46, Michael Paquier wrote:

On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Sounds like a "promotion immediate" mode. It does not sound difficult
nor expensive to add a small test for that in one of the existing
recovery tests triggerring a promotion. Could you add one based on
pg_get_wal_replay_pause_state()?

You're thinking to add the test like the following?
#1. Pause the recovery
#2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'

It seems not easy to do the test #4 stably because
pg_get_wal_replay_pause_state() needs to be executed
before the promotion finishes.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#10Fujii Masao
masao.fujii@gmail.com
In reply to: Kyotaro Horiguchi (#8)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On 2021/05/19 9:53, Kyotaro Horiguchi wrote:

At Tue, 18 May 2021 12:48:38 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Mmm. I was confused with recovery target + pause. Actually promotion
works as so and it is documented. Anyway it is a matter of the next
version.

I forgot to mention the patch itself, but what the patch does looks
fine to me. Disabling pause after setting SharedProteIsTriggered
prevents later re-pausing (from the sql function).

Thanks for the review! I pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#11Michael Paquier
michael@paquier.xyz
In reply to: Fujii Masao (#9)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On Wed, May 19, 2021 at 01:46:45PM +0900, Fujii Masao wrote:

You're thinking to add the test like the following?
#1. Pause the recovery
#2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'

It seems not easy to do the test #4 stably because
pg_get_wal_replay_pause_state() needs to be executed
before the promotion finishes.

Couldn't you rely on recovery_end_command for number #4? The shared
memory state tracked by SharedRecoveryState is updated after the
end-recovery command is triggered, so pg_get_wal_replay_pause_state()
can be executed at this point. A bit hairy, I agree, but that would
work :)

Still, it would be easy enough to have something for
pg_get_wal_replay_pause_state() called on a standby when there is no
pause (your case #2) and a second case on a standby with a pause
triggered, though (not listed above).
--
Michael

#12Dilip Kumar
dilipbalaut@gmail.com
In reply to: Fujii Masao (#9)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On Wed, May 19, 2021 at 10:16 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/05/18 15:46, Michael Paquier wrote:

On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Sounds like a "promotion immediate" mode. It does not sound difficult
nor expensive to add a small test for that in one of the existing
recovery tests triggerring a promotion. Could you add one based on
pg_get_wal_replay_pause_state()?

You're thinking to add the test like the following?
#1. Pause the recovery
#2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'

It seems not easy to do the test #4 stably because
pg_get_wal_replay_pause_state() needs to be executed
before the promotion finishes.

Even for #2, we can not ensure that whether it will be 'paused' or
'pause requested'.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#13Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#12)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

At Wed, 19 May 2021 11:19:13 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

On Wed, May 19, 2021 at 10:16 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/05/18 15:46, Michael Paquier wrote:

On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Sounds like a "promotion immediate" mode. It does not sound difficult
nor expensive to add a small test for that in one of the existing
recovery tests triggerring a promotion. Could you add one based on
pg_get_wal_replay_pause_state()?

You're thinking to add the test like the following?
#1. Pause the recovery
#2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'

It seems not easy to do the test #4 stably because
pg_get_wal_replay_pause_state() needs to be executed
before the promotion finishes.

Even for #2, we can not ensure that whether it will be 'paused' or
'pause requested'.

We often use poll_query_until() to make sure some desired state is
reached. And, as Michael suggested, the function
pg_get_wal_replay_pause_state() still works at the time of
recovery_end_command. So a bit more detailed steps are:

#0. Equip the server with recovery_end_command that waits for some
trigger then start the server.
#1. Pause the recovery
#2. Wait until pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Wait until pg_get_wal_replay_pause_state() returns 'not paused'
#5. Trigger recovery_end_command to let promotion proceed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#14Fujii Masao
masao.fujii@gmail.com
In reply to: Kyotaro Horiguchi (#13)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On 2021/05/19 15:25, Kyotaro Horiguchi wrote:

At Wed, 19 May 2021 11:19:13 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

On Wed, May 19, 2021 at 10:16 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/05/18 15:46, Michael Paquier wrote:

On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Sounds like a "promotion immediate" mode. It does not sound difficult
nor expensive to add a small test for that in one of the existing
recovery tests triggerring a promotion. Could you add one based on
pg_get_wal_replay_pause_state()?

You're thinking to add the test like the following?
#1. Pause the recovery
#2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'

It seems not easy to do the test #4 stably because
pg_get_wal_replay_pause_state() needs to be executed
before the promotion finishes.

Even for #2, we can not ensure that whether it will be 'paused' or
'pause requested'.

We often use poll_query_until() to make sure some desired state is
reached.

Yes.

And, as Michael suggested, the function
pg_get_wal_replay_pause_state() still works at the time of
recovery_end_command. So a bit more detailed steps are:

IMO this idea is tricky and fragile, so I'm inclined to avoid that if possible.
Attached is the POC patch to add the following tests.

#1. Check that pg_get_wal_replay_pause_state() reports "not paused" at first.
#2. Request to pause archive recovery and wait until it's actually paused.
#3. Request to resume archive recovery and wait until it's actually resumed.
#4. Request to pause archive recovery and wait until it's actually paused.
Then, check that the paused state ends and promotion continues
if a promotion is triggered while recovery is paused.

In #4, pg_get_wal_replay_pause_state() is not executed while promotion
is ongoing. #4 checks that pg_is_in_recovery() returns false and
the promotion finishes expectedly in that case. Isn't this test enough for now?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

recovery_pause_test_v1.patchtext/plain; charset=UTF-8; name=recovery_pause_test_v1.patch; x-mac-creator=0; x-mac-type=0Download+40-1
#15Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#14)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

At Wed, 19 May 2021 16:21:58 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2021/05/19 15:25, Kyotaro Horiguchi wrote:

At Wed, 19 May 2021 11:19:13 +0530, Dilip Kumar
<dilipbalaut@gmail.com> wrote in

On Wed, May 19, 2021 at 10:16 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/05/18 15:46, Michael Paquier wrote:

On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Sounds like a "promotion immediate" mode. It does not sound difficult
nor expensive to add a small test for that in one of the existing
recovery tests triggerring a promotion. Could you add one based on
pg_get_wal_replay_pause_state()?

You're thinking to add the test like the following?
#1. Pause the recovery
#2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'

It seems not easy to do the test #4 stably because
pg_get_wal_replay_pause_state() needs to be executed
before the promotion finishes.

Even for #2, we can not ensure that whether it will be 'paused' or
'pause requested'.

We often use poll_query_until() to make sure some desired state is
reached.

Yes.

And, as Michael suggested, the function
pg_get_wal_replay_pause_state() still works at the time of
recovery_end_command. So a bit more detailed steps are:

IMO this idea is tricky and fragile, so I'm inclined to avoid that if

Agreed, the recovery_end_command would be something like the following
avoiding dependency on sh. However, I'm not sure it works as well on
Windows..

recovery_end_command='perl -e "while( -f \'$trigfile\') {sleep 0.1;}"'

possible.
Attached is the POC patch to add the following tests.

#1. Check that pg_get_wal_replay_pause_state() reports "not paused" at
#first.
#2. Request to pause archive recovery and wait until it's actually
#paused.
#3. Request to resume archive recovery and wait until it's actually
#resumed.
#4. Request to pause archive recovery and wait until it's actually
#paused.
Then, check that the paused state ends and promotion continues
if a promotion is triggered while recovery is paused.

In #4, pg_get_wal_replay_pause_state() is not executed while promotion
is ongoing. #4 checks that pg_is_in_recovery() returns false and
the promotion finishes expectedly in that case. Isn't this test enough
for now?

+1 for adding some tests for pg_wal_replay_pause() but the test seems
like checking only that pg_get_wal_replay_pause_state() returns the
expected state value. Don't we need to check that the recovery is
actually paused and that the promotion happens at expected LSN?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#16Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#13)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On Wed, May 19, 2021 at 11:55 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Wed, 19 May 2021 11:19:13 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

On Wed, May 19, 2021 at 10:16 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2021/05/18 15:46, Michael Paquier wrote:

On Tue, May 18, 2021 at 12:48:38PM +0900, Fujii Masao wrote:

Currently a promotion causes all available WAL to be replayed before
a standby becomes a primary whether it was in paused state or not.
OTOH, something like immediate promotion (i.e., standby becomes
a primary without replaying outstanding WAL) might be useful for
some cases. I don't object to that.

Sounds like a "promotion immediate" mode. It does not sound difficult
nor expensive to add a small test for that in one of the existing
recovery tests triggerring a promotion. Could you add one based on
pg_get_wal_replay_pause_state()?

You're thinking to add the test like the following?
#1. Pause the recovery
#2. Confirm that pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Confirm that pg_get_wal_replay_pause_state() returns 'not paused'

It seems not easy to do the test #4 stably because
pg_get_wal_replay_pause_state() needs to be executed
before the promotion finishes.

Even for #2, we can not ensure that whether it will be 'paused' or
'pause requested'.

We often use poll_query_until() to make sure some desired state is
reached. And, as Michael suggested, the function
pg_get_wal_replay_pause_state() still works at the time of
recovery_end_command. So a bit more detailed steps are:

Right, if we are polling for the state change in #2 then that makes sense.

#0. Equip the server with recovery_end_command that waits for some
trigger then start the server.
#1. Pause the recovery
#2. Wait until pg_get_wal_replay_pause_state() returns 'paused'
#3. Trigger standby promotion
#4. Wait until pg_get_wal_replay_pause_state() returns 'not paused'
#5. Trigger recovery_end_command to let promotion proceed.

+1

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#17Fujii Masao
masao.fujii@gmail.com
In reply to: Kyotaro Horiguchi (#15)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On 2021/05/19 16:43, Kyotaro Horiguchi wrote:

+1 for adding some tests for pg_wal_replay_pause() but the test seems
like checking only that pg_get_wal_replay_pause_state() returns the
expected state value. Don't we need to check that the recovery is
actually paused and that the promotion happens at expected LSN?

Sounds good. Attached is the updated version of the patch.
I added such checks into the test.

BTW, while reading some recovery regression tests, I found that
013_crash_restart.pl has "use Time::HiRes qw(usleep)" but it seems
not necessary. We can safely remove that? Patch attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

recovery_pause_test_v2.patchtext/plain; charset=UTF-8; name=recovery_pause_test_v2.patch; x-mac-creator=0; x-mac-type=0Download+57-2
remove_unnecessary_hires_usleep_v1.patchtext/plain; charset=UTF-8; name=remove_unnecessary_hires_usleep_v1.patch; x-mac-creator=0; x-mac-type=0Download+0-1
#18Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#17)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On 2021/05/19 19:24, Fujii Masao wrote:

On 2021/05/19 16:43, Kyotaro Horiguchi wrote:

+1 for adding some tests for pg_wal_replay_pause() but the test seems
like checking only that pg_get_wal_replay_pause_state() returns the
expected state value.  Don't we need to check that the recovery is
actually paused and that the promotion happens at expected LSN?

Sounds good. Attached is the updated version of the patch.
I added such checks into the test.

BTW, while reading some recovery regression tests, I found that
013_crash_restart.pl has "use Time::HiRes qw(usleep)" but it seems
not necessary. We can safely remove that? Patch attached.

Barring any objections, I'm thinking to commit these two patches.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#19Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#18)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

Sorry for missing this.

At Mon, 31 May 2021 12:52:54 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2021/05/19 19:24, Fujii Masao wrote:

On 2021/05/19 16:43, Kyotaro Horiguchi wrote:

+1 for adding some tests for pg_wal_replay_pause() but the test seems
like checking only that pg_get_wal_replay_pause_state() returns the
expected state value.  Don't we need to check that the recovery is
actually paused and that the promotion happens at expected LSN?

Sounds good. Attached is the updated version of the patch.
I added such checks into the test.

Thanks! Looks fine. The paused-state test may get false-success but it
would be sufficient that it detects the problem in most cases.

BTW, while reading some recovery regression tests, I found that
013_crash_restart.pl has "use Time::HiRes qw(usleep)" but it seems
not necessary. We can safely remove that? Patch attached.

Looks just fine for the removal of HiRes usage. All other use of
HiRes are accompanied by a usleep usage.

Barring any objections, I'm thinking to commit these two patches.

+1.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#20Fujii Masao
masao.fujii@gmail.com
In reply to: Kyotaro Horiguchi (#19)
Re: pg_get_wal_replay_pause_state() should not return 'paused' while a promotion is ongoing.

On 2021/05/31 17:18, Kyotaro Horiguchi wrote:

Sorry for missing this.

At Mon, 31 May 2021 12:52:54 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2021/05/19 19:24, Fujii Masao wrote:

On 2021/05/19 16:43, Kyotaro Horiguchi wrote:

+1 for adding some tests for pg_wal_replay_pause() but the test seems
like checking only that pg_get_wal_replay_pause_state() returns the
expected state value.  Don't we need to check that the recovery is
actually paused and that the promotion happens at expected LSN?

Sounds good. Attached is the updated version of the patch.
I added such checks into the test.

Thanks! Looks fine. The paused-state test may get false-success but it
would be sufficient that it detects the problem in most cases.

BTW, while reading some recovery regression tests, I found that
013_crash_restart.pl has "use Time::HiRes qw(usleep)" but it seems
not necessary. We can safely remove that? Patch attached.

Looks just fine for the removal of HiRes usage. All other use of
HiRes are accompanied by a usleep usage.

Barring any objections, I'm thinking to commit these two patches.

+1.

Thanks for the review! I pushed those two patches.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION