Stronger safeguard for archive recovery not to miss data

Started by osumi.takamichi@fujitsu.comover 5 years ago55 messageshackers
Jump to latest
#1osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com

Hello

The attached patch is intended to prevent a scenario that
archive recovery hits WALs which come from wal_level=minimal
and the server continues to work, which was discussed in the thread of [1]/messages/by-id/TYAPR01MB29901EBE5A3ACCE55BA99186FE320@TYAPR01MB2990.jpnprd01.prod.outlook.com.
The motivation is to protect that user ends up with both getting replica
that could miss data and getting the server to miss data in targeted recovery mode.

About how to modify this, we reached the consensus in the thread.
It is by changing the ereport's level from WARNING to FATAL in CheckRequiredParameterValues().

In order to test this fix, what I did is
1 - get a base backup during wal_level is replica
2 - stop the server and change the wal_level from replica to minimal
3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
4 - stop the server and make the wal_level back to replica
5 - start the server again
6 - execute archive recoveries in both cases
(1) by editing the postgresql.conf and
touching recovery.signal in the base backup from 1th step
(2) by making a replica with standby.signal
* During wal_level is replica, I enabled archive_mode in this test.

First of all, I confirmed the server started up without this patch.
After applying this safeguard patch, I checked that
the server cannot start up any more in the scenario case.
I checked the log and got the result below with this patch.

2020-11-26 06:49:46.003 UTC [19715] FATAL: WAL was generated with wal_level=minimal, data may be missing
2020-11-26 06:49:46.003 UTC [19715] HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup.

Lastly, this should be backpatched.
Any comments ?

[1]: /messages/by-id/TYAPR01MB29901EBE5A3ACCE55BA99186FE320@TYAPR01MB2990.jpnprd01.prod.outlook.com
/messages/by-id/TYAPR01MB29901EBE5A3ACCE55BA99186FE320@TYAPR01MB2990.jpnprd01.prod.outlook.com

Best,
Takamichi Osumi

Attachments:

stronger_safeguard_for_archive_recovery.patchapplication/octet-stream; name=stronger_safeguard_for_archive_recovery.patchDownload+1-2
#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#1)
Re: Stronger safeguard for archive recovery not to miss data

At Thu, 26 Nov 2020 07:18:39 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in

Hello

The attached patch is intended to prevent a scenario that
archive recovery hits WALs which come from wal_level=minimal
and the server continues to work, which was discussed in the thread of [1].
The motivation is to protect that user ends up with both getting replica
that could miss data and getting the server to miss data in targeted recovery mode.

About how to modify this, we reached the consensus in the thread.
It is by changing the ereport's level from WARNING to FATAL in CheckRequiredParameterValues().

In order to test this fix, what I did is
1 - get a base backup during wal_level is replica
2 - stop the server and change the wal_level from replica to minimal
3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
4 - stop the server and make the wal_level back to replica
5 - start the server again
6 - execute archive recoveries in both cases
(1) by editing the postgresql.conf and
touching recovery.signal in the base backup from 1th step
(2) by making a replica with standby.signal
* During wal_level is replica, I enabled archive_mode in this test.

First of all, I confirmed the server started up without this patch.
After applying this safeguard patch, I checked that
the server cannot start up any more in the scenario case.
I checked the log and got the result below with this patch.

2020-11-26 06:49:46.003 UTC [19715] FATAL: WAL was generated with wal_level=minimal, data may be missing
2020-11-26 06:49:46.003 UTC [19715] HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup.

Lastly, this should be backpatched.
Any comments ?

Perhaps we need the TAP test that conducts the above steps.

[1]
/messages/by-id/TYAPR01MB29901EBE5A3ACCE55BA99186FE320@TYAPR01MB2990.jpnprd01.prod.outlook.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In reply to: osumi.takamichi@fujitsu.com (#1)
Re: Stronger safeguard for archive recovery not to miss data

Hello

First of all, I confirmed the server started up without this patch.

It is possible only with manually configured hot_standby = off, right?
We have ERROR in hot standby mode just below in CheckRequiredParameterValues.

regards, Sergei

#4osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Kyotaro Horiguchi (#2)
RE: Stronger safeguard for archive recovery not to miss data

Hello, Horiguchi-San

On Thursday, Nov 26, 2020 4:29 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>

At Thu, 26 Nov 2020 07:18:39 +0000, "osumi.takamichi@fujitsu.com"
<osumi.takamichi@fujitsu.com> wrote in

In order to test this fix, what I did is
1 - get a base backup during wal_level is replica
2 - stop the server and change the wal_level from replica to minimal
3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
4 - stop the server and make the wal_level back to replica
5 - start the server again
6 - execute archive recoveries in both cases
(1) by editing the postgresql.conf and
touching recovery.signal in the base backup from 1th step
(2) by making a replica with standby.signal
* During wal_level is replica, I enabled archive_mode in this test.

Perhaps we need the TAP test that conducts the above steps.

It really makes sense that it's better to show the procedures
about how to reproduce the flow.
Thanks for your advice.

Best,
Takamichi Osumi

#5osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Sergei Kornilov (#3)
RE: Stronger safeguard for archive recovery not to miss data

Hello, Sergei

It is possible only with manually configured hot_standby = off, right?
We have ERROR in hot standby mode just below in
CheckRequiredParameterValues.

Yes, you are right. I turned off the hot_standby in the test in the previous e-mail.
I'm sorry that I missed writing this tip of information.

Best,
Takamichi Osumi

#6osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Kyotaro Horiguchi (#2)
RE: Stronger safeguard for archive recovery not to miss data

Hi

On Thursday, November 26, 2020 4:29 PM
Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Thu, 26 Nov 2020 07:18:39 +0000, "osumi.takamichi@fujitsu.com"
<osumi.takamichi@fujitsu.com> wrote in

The attached patch is intended to prevent a scenario that archive
recovery hits WALs which come from wal_level=minimal and the server
continues to work, which was discussed in the thread of [1].
The motivation is to protect that user ends up with both getting
replica that could miss data and getting the server to miss data in targeted

recovery mode.

About how to modify this, we reached the consensus in the thread.
It is by changing the ereport's level from WARNING to FATAL in

CheckRequiredParameterValues().

In order to test this fix, what I did is
1 - get a base backup during wal_level is replica
2 - stop the server and change the wal_level from replica to minimal
3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
4 - stop the server and make the wal_level back to replica
5 - start the server again
6 - execute archive recoveries in both cases
(1) by editing the postgresql.conf and
touching recovery.signal in the base backup from 1th step
(2) by making a replica with standby.signal
* During wal_level is replica, I enabled archive_mode in this test.

First of all, I confirmed the server started up without this patch.
After applying this safeguard patch, I checked that the server cannot
start up any more in the scenario case.
I checked the log and got the result below with this patch.

2020-11-26 06:49:46.003 UTC [19715] FATAL: WAL was generated with
wal_level=minimal, data may be missing
2020-11-26 06:49:46.003 UTC [19715] HINT: This happens if you

temporarily set wal_level=minimal without taking a new base backup.

Lastly, this should be backpatched.
Any comments ?

Perhaps we need the TAP test that conducts the above steps.

I added the TAP tests to reproduce and share the result,
using the case of 6-(1) described above.
Here, I created a new file for it because the purposes of other files in
src/recovery didn't match the purpose of my TAP tests perfectly.
If you are dubious about this idea, please have a look at the comments
in each file.

When the attached patch is applied,
my TAP tests are executed like other ones like below.

t/018_wal_optimize.pl ................ ok
t/019_replslot_limit.pl .............. ok
t/020_archive_status.pl .............. ok
t/021_row_visibility.pl .............. ok
t/022_archive_recovery.pl ............ ok
All tests successful.

Also, I confirmed that there's no regression by make check-world.
Any comments ?

Best,
Takamichi Osumi

Attachments:

stronger_safeguard_for_archive_recovery_v02.patchapplication/octet-stream; name=stronger_safeguard_for_archive_recovery_v02.patchDownload+78-2
#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: osumi.takamichi@fujitsu.com (#6)
Re: Stronger safeguard for archive recovery not to miss data

On Tue, 2020-12-08 at 03:08 +0000, osumi.takamichi@fujitsu.com wrote:

On Thursday, November 26, 2020 4:29 PM
Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

At Thu, 26 Nov 2020 07:18:39 +0000, "osumi.takamichi@fujitsu.com"
<osumi.takamichi@fujitsu.com> wrote in

The attached patch is intended to prevent a scenario that archive
recovery hits WALs which come from wal_level=minimal and the server
continues to work, which was discussed in the thread of [1].

Perhaps we need the TAP test that conducts the above steps.

I added the TAP tests to reproduce and share the result,
using the case of 6-(1) described above.
Here, I created a new file for it because the purposes of other files in
src/recovery didn't match the purpose of my TAP tests perfectly.
If you are dubious about this idea, please have a look at the comments
in each file.

When the attached patch is applied,
my TAP tests are executed like other ones like below.

t/018_wal_optimize.pl ................ ok
t/019_replslot_limit.pl .............. ok
t/020_archive_status.pl .............. ok
t/021_row_visibility.pl .............. ok
t/022_archive_recovery.pl ............ ok
All tests successful.

Also, I confirmed that there's no regression by make check-world.
Any comments ?

The patch applies and passes regression tests, as well as the new TAP test.

I think this should be backpatched, since it fixes a bug.

I am not quite happy with the message:

FATAL: WAL was generated with wal_level=minimal, data may be missing
HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup.

This sounds too harmless to me and doesn't give the user a clue
what would be the best way to proceed.

Suggestion:

FATAL: WAL was generated with wal_level=minimal, cannot continue recovering
DETAIL: This happens if you temporarily set wal_level=minimal on the primary server.
HINT: Create a new standby from a new base backup after setting wal_level=replica.

Yours,
Laurenz Albe

#8osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Laurenz Albe (#7)
RE: Stronger safeguard for archive recovery not to miss data

Hi, Laurenz

On Friday, January 15, 2021 12:56 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2020-12-08 at 03:08 +0000, osumi.takamichi@fujitsu.com wrote:

On Thursday, November 26, 2020 4:29 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 26 Nov 2020 07:18:39 +0000, "osumi.takamichi@fujitsu.com"
<osumi.takamichi@fujitsu.com> wrote in

The attached patch is intended to prevent a scenario that archive
recovery hits WALs which come from wal_level=minimal and the
server continues to work, which was discussed in the thread of [1].

Perhaps we need the TAP test that conducts the above steps.

I added the TAP tests to reproduce and share the result, using the
case of 6-(1) described above.
Here, I created a new file for it because the purposes of other files
in src/recovery didn't match the purpose of my TAP tests perfectly.
If you are dubious about this idea, please have a look at the comments
in each file.

When the attached patch is applied,
my TAP tests are executed like other ones like below.

t/018_wal_optimize.pl ................ ok t/019_replslot_limit.pl
.............. ok t/020_archive_status.pl .............. ok
t/021_row_visibility.pl .............. ok t/022_archive_recovery.pl
............ ok All tests successful.

Also, I confirmed that there's no regression by make check-world.
Any comments ?

The patch applies and passes regression tests, as well as the new TAP test.

Thank you for checking.

I think this should be backpatched, since it fixes a bug.

Agreed.

I am not quite happy with the message:

FATAL: WAL was generated with wal_level=minimal, data may be missing
HINT: This happens if you temporarily set wal_level=minimal without taking a
new base backup.

This sounds too harmless to me and doesn't give the user a clue what would be
the best way to proceed.

Suggestion:

FATAL: WAL was generated with wal_level=minimal, cannot continue
recovering

Adopted.

DETAIL: This happens if you temporarily set wal_level=minimal on the primary
server.
HINT: Create a new standby from a new base backup after setting
wal_level=replica.

Thanks for your suggestion.
I noticed that this message should cover both archive recovery modes,
which means in recovery mode and standby mode. Then, I combined your
suggestion above with this point of view. Have a look at the updated patch.
I also enriched the new tap tests to show this perspective.

Best Regards,
Takamichi Osumi

Attachments:

stronger_safeguard_for_archive_recovery_v03.patchapplication/octet-stream; name=stronger_safeguard_for_archive_recovery_v03.patchDownload+112-4
#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: osumi.takamichi@fujitsu.com (#8)
Re: Stronger safeguard for archive recovery not to miss data

On Mon, 2021-01-18 at 07:34 +0000, osumi.takamichi@fujitsu.com wrote:

I noticed that this message should cover both archive recovery modes,
which means in recovery mode and standby mode. Then, I combined your
suggestion above with this point of view. Have a look at the updated patch.
I also enriched the new tap tests to show this perspective.

Looks good, thanks.

I'll mark this patch as "ready for committer".

Yours,
Laurenz Albe

#10Fujii Masao
masao.fujii@gmail.com
In reply to: Laurenz Albe (#9)
Re: Stronger safeguard for archive recovery not to miss data

On 2021/01/20 1:05, Laurenz Albe wrote:

On Mon, 2021-01-18 at 07:34 +0000, osumi.takamichi@fujitsu.com wrote:

I noticed that this message should cover both archive recovery modes,
which means in recovery mode and standby mode. Then, I combined your
suggestion above with this point of view. Have a look at the updated patch.
I also enriched the new tap tests to show this perspective.

Looks good, thanks.

I'll mark this patch as "ready for committer".

+ errhint("Run recovery again from a new base backup taken after setting wal_level higher than minimal")));

Isn't it impossible to do this in normal archive recovery case? In that case,
since the server crashed and the database got corrupted, probably
we cannot take a new base backup.

Originally even when users accidentally set wal_level to minimal, they could
start the server from the backup by disabling hot_standby and salvage the data.
But with the patch, we lost the way to do that. Right? I was wondering that
WARNING was used intentionally there for that case.

if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
ereport(ERROR,
(errmsg("hot standby is not possible because wal_level was not set to \"replica\" or higher on the primary server"),
errhint("Either set wal_level to \"replica\" on the primary, or turn off hot_standby here.")));

With the patch, we never reach the above code?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#11Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#10)
Re: Stronger safeguard for archive recovery not to miss data

On Wed, 2021-01-20 at 13:10 +0900, Fujii Masao wrote:

+ errhint("Run recovery again from a new base backup taken after setting wal_level higher than minimal")));

Isn't it impossible to do this in normal archive recovery case? In that case,
since the server crashed and the database got corrupted, probably
we cannot take a new base backup.

Originally even when users accidentally set wal_level to minimal, they could
start the server from the backup by disabling hot_standby and salvage the data.
But with the patch, we lost the way to do that. Right? I was wondering that
WARNING was used intentionally there for that case.

I would argue that if you set your "wal_level" to minimal, do some work,
reset it to "replica" and recover past that, two things could happen:

1. Since "archive_mode" was off, you are missing some WAL segments and
cannot recover past that point anyway.

2. You are missing some relevant WAL records, and your recovered
database is corrupted. This is very likely, because you probably
switched to "minimal" with the intention to generate less WAL.

Now I see your point that a corrupted database may be better than no
database at all, but wouldn't you agree that a warning in the log
(that nobody reads) is too little information?

Normally we don't take such a relaxed attitude towards data corruption.

Perhaps there could be a GUC "recovery_allow_data_corruption" to
override this check, but I'd say that a warning is too little.

if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
ereport(ERROR,
(errmsg("hot standby is not possible because wal_level was not set to \"replica\" or higher on the primary server"),
errhint("Either set wal_level to \"replica\" on the primary, or turn off hot_standby here.")));

With the patch, we never reach the above code?

Right, that would have to go. I didn't notice that.

Yours,
Laurenz Albe

#12osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Laurenz Albe (#11)
RE: Stronger safeguard for archive recovery not to miss data

Hi

On Wed, Jan 20, 2021 9:03 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2021-01-20 at 13:10 +0900, Fujii Masao wrote:

+                                errhint("Run recovery again from a
+ new base backup taken after setting wal_level higher than
+ minimal")));

Isn't it impossible to do this in normal archive recovery case? In
that case, since the server crashed and the database got corrupted,
probably we cannot take a new base backup.

Originally even when users accidentally set wal_level to minimal, they
could start the server from the backup by disabling hot_standby and salvage

the data.

But with the patch, we lost the way to do that. Right? I was wondering
that WARNING was used intentionally there for that case.

OK. I understand that this WARNING is necessary to give user a chance
to start up the server again and salvage his/her data,
when hot_standby=off and the server goes through a period of wal_level=minimal
during an archive recovery.

I would argue that if you set your "wal_level" to minimal, do some work, reset it
to "replica" and recover past that, two things could happen:

1. Since "archive_mode" was off, you are missing some WAL segments and
cannot recover past that point anyway.

2. You are missing some relevant WAL records, and your recovered
database is corrupted. This is very likely, because you probably
switched to "minimal" with the intention to generate less WAL.

Now I see your point that a corrupted database may be better than no database
at all, but wouldn't you agree that a warning in the log (that nobody reads) is
too little information?

Normally we don't take such a relaxed attitude towards data corruption.

Yeah, we thought we needed stronger protection for that reason.

Perhaps there could be a GUC "recovery_allow_data_corruption" to override this
check, but I'd say that a warning is too little.

if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
ereport(ERROR,
(errmsg("hot standby is not

possible because wal_level was not set to \"replica\" or higher on the primary
server"),

errhint("Either set wal_level
to \"replica\" on the primary, or turn off hot_standby here.")));

With the patch, we never reach the above code?

Right, that would have to go. I didn't notice that.

Adding a condition to check if "recovery_allow_data_corruption" is 'on' around the end of
CheckRequiredParameterValues() sounds safer for me too, although
implementing a new GUC parameter sounds bigger than what I expected at first.
The default of the value should be 'off' to protect users from getting the corrupted server.
Does everyone agree with this direction ?

Best Regards,
Takamichi Osumi

#13Laurenz Albe
laurenz.albe@cybertec.at
In reply to: osumi.takamichi@fujitsu.com (#12)
Re: Stronger safeguard for archive recovery not to miss data

On Thu, 2021-01-21 at 11:49 +0000, osumi.takamichi@fujitsu.com wrote:

Adding a condition to check if "recovery_allow_data_corruption" is 'on' around the end of
CheckRequiredParameterValues() sounds safer for me too, although
implementing a new GUC parameter sounds bigger than what I expected at first.
The default of the value should be 'off' to protect users from getting the corrupted server.
Does everyone agree with this direction ?

I'd say that adding such a GUC is material for another patch, if we want it at all.

I think it is very unlikely that people will switch from "wal_level=replica" to
"minimal" and back very soon afterwards and also try to recover past such
a switch, which probably explains why nobody has complained about data corruption
generated that way. To get the server to start with "wal_level=minimal", you must
set "archive_mode=off" and "max_wal_senders=0", and few people will do that and
still expect recovery to work.

My vote is that we should not have a GUC for such an unlikely event, and that
stopping recovery is good enough.

Yours,
Laurenz Albe

#14osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Laurenz Albe (#13)
RE: Stronger safeguard for archive recovery not to miss data

Hi, Laurenz

On Thursday, January 21, 2021 9:51 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2021-01-21 at 11:49 +0000, osumi.takamichi@fujitsu.com wrote:

Adding a condition to check if "recovery_allow_data_corruption" is
'on' around the end of
CheckRequiredParameterValues() sounds safer for me too, although
implementing a new GUC parameter sounds bigger than what I expected at

first.

The default of the value should be 'off' to protect users from getting the

corrupted server.

Does everyone agree with this direction ?

I'd say that adding such a GUC is material for another patch, if we want it at all.

OK. You meant another different patch.

I think it is very unlikely that people will switch from "wal_level=replica" to
"minimal" and back very soon afterwards and also try to recover past such a
switch, which probably explains why nobody has complained about data
corruption generated that way. To get the server to start with
"wal_level=minimal", you must set "archive_mode=off" and
"max_wal_senders=0", and few people will do that and still expect recovery to
work.

Yeah, the possibility is low of course.

My vote is that we should not have a GUC for such an unlikely event, and that
stopping recovery is good enough.

OK. IIUC, my current patch for this fix doesn't need to be changed or withdrawn.
Thank you for your explanation.

Best Regards,
Takamichi Osumi

#15Laurenz Albe
laurenz.albe@cybertec.at
In reply to: osumi.takamichi@fujitsu.com (#14)
Re: Stronger safeguard for archive recovery not to miss data

On Thu, 2021-01-21 at 13:09 +0000, osumi.takamichi@fujitsu.com wrote:

My vote is that we should not have a GUC for such an unlikely event, and that
stopping recovery is good enough.

OK. IIUC, my current patch for this fix doesn't need to be changed or withdrawn.
Thank you for your explanation.

Well, that's just my opinion.
Fujii Masao seemed to disagree with the patch, and his voice carries weight.

Yours,
Laurenz Albe

#16Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#15)
Re: Stronger safeguard for archive recovery not to miss data

On Thu, 2021-01-21 at 15:30 +0100, I wrote:

On Thu, 2021-01-21 at 13:09 +0000, osumi.takamichi@fujitsu.com wrote:

My vote is that we should not have a GUC for such an unlikely event, and that
stopping recovery is good enough.

OK. IIUC, my current patch for this fix doesn't need to be changed or withdrawn.
Thank you for your explanation.

Well, that's just my opinion.

Fujii Masao seemed to disagree with the patch, and his voice carries weight.

I think you should pst another patch where the second, now superfluous,
error message is removed.

Yours,
Laurenz Albe

#17osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Laurenz Albe (#16)
RE: Stronger safeguard for archive recovery not to miss data

Hi

On Monday, January 25, 2021 5:13 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2021-01-21 at 15:30 +0100, I wrote:

On Thu, 2021-01-21 at 13:09 +0000, osumi.takamichi@fujitsu.com wrote:

My vote is that we should not have a GUC for such an unlikely
event, and that stopping recovery is good enough.

OK. IIUC, my current patch for this fix doesn't need to be changed or

withdrawn.

Thank you for your explanation.

Well, that's just my opinion.

Fujii Masao seemed to disagree with the patch, and his voice carries weight.

I think you should pst another patch where the second, now superfluous, error
message is removed.

Updated. This patch showed no failure during regression tests
and has been aligned by pgindent.

Best Regards,
Takamichi Osumi

Attachments:

stronger_safeguard_for_archive_recovery_v04.patchapplication/octet-stream; name=stronger_safeguard_for_archive_recovery_v04.patchDownload+140-37
#18Laurenz Albe
laurenz.albe@cybertec.at
In reply to: osumi.takamichi@fujitsu.com (#17)
Re: Stronger safeguard for archive recovery not to miss data

On Mon, 2021-01-25 at 08:19 +0000, osumi.takamichi@fujitsu.com wrote:

I think you should pst another patch where the second, now superfluous, error
message is removed.

Updated. This patch showed no failure during regression tests
and has been aligned by pgindent.

Looks good to me.
I'll set it to "ready for committer" again.

Yours,
Laurenz Albe

#19David Steele
david@pgmasters.net
In reply to: Laurenz Albe (#18)
Re: Stronger safeguard for archive recovery not to miss data

On 1/25/21 3:55 AM, Laurenz Albe wrote:

On Mon, 2021-01-25 at 08:19 +0000, osumi.takamichi@fujitsu.com wrote:

I think you should pst another patch where the second, now superfluous, error
message is removed.

Updated. This patch showed no failure during regression tests
and has been aligned by pgindent.

Looks good to me.
I'll set it to "ready for committer" again.

Fujii, does the new patch in [1]/messages/by-id/OSBPR01MB48887EFFCA39FA9B1DBAFB0FEDBD0@OSBPR01MB4888.jpnprd01.prod.outlook.com address your concerns?

Regards,
--
-David
david@pgmasters.net

[1]: /messages/by-id/OSBPR01MB48887EFFCA39FA9B1DBAFB0FEDBD0@OSBPR01MB4888.jpnprd01.prod.outlook.com
/messages/by-id/OSBPR01MB48887EFFCA39FA9B1DBAFB0FEDBD0@OSBPR01MB4888.jpnprd01.prod.outlook.com

#20Fujii Masao
masao.fujii@gmail.com
In reply to: David Steele (#19)
Re: Stronger safeguard for archive recovery not to miss data

On 2021/03/25 23:21, David Steele wrote:

On 1/25/21 3:55 AM, Laurenz Albe wrote:

On Mon, 2021-01-25 at 08:19 +0000, osumi.takamichi@fujitsu.com wrote:

I think you should pst another patch where the second, now superfluous, error
message is removed.

Updated. This patch showed no failure during regression tests
and has been aligned by pgindent.

Looks good to me.
I'll set it to "ready for committer" again.

Fujii, does the new patch in [1] address your concerns?

No. I'm still not sure if this patch is good idea... I understand
why this safeguard is necessary. OTOH I'm afraid it increases
a bit the risk that users get unstartable database, i.e., lose whole database.
But maybe I'm concerned about rare case and my opinion is minority one.
So I'd like to hear more opinions about this patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#21David Steele
david@pgmasters.net
In reply to: Fujii Masao (#20)
#22Fujii Masao
masao.fujii@gmail.com
In reply to: David Steele (#21)
#23Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#22)
#24Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#23)
#25Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#22)
#26Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#25)
#27osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Kyotaro Horiguchi (#26)
#28Fujii Masao
masao.fujii@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#27)
#29Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Fujii Masao (#28)
#30osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Fujii Masao (#28)
#31osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Laurenz Albe (#29)
#32Fujii Masao
masao.fujii@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#30)
#33tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: osumi.takamichi@fujitsu.com (#27)
#34Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Fujii Masao (#32)
#35Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: tsunakawa.takay@fujitsu.com (#33)
#36tsunakawa.takay@fujitsu.com
tsunakawa.takay@fujitsu.com
In reply to: Kyotaro Horiguchi (#35)
#37Fujii Masao
masao.fujii@gmail.com
In reply to: Kyotaro Horiguchi (#34)
#38osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Fujii Masao (#32)
#39osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Fujii Masao (#37)
#40David Steele
david@pgmasters.net
In reply to: Fujii Masao (#32)
#41osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: osumi.takamichi@fujitsu.com (#38)
#42osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: osumi.takamichi@fujitsu.com (#41)
#43Fujii Masao
masao.fujii@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#38)
#44osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Fujii Masao (#43)
#45Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#44)
#46osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Kyotaro Horiguchi (#45)
#47Fujii Masao
masao.fujii@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#46)
#48David Steele
david@pgmasters.net
In reply to: Fujii Masao (#47)
#49osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: David Steele (#48)
#50Fujii Masao
masao.fujii@gmail.com
In reply to: David Steele (#48)
#51Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#50)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fujii Masao (#50)
#53Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#51)
#54Fujii Masao
masao.fujii@gmail.com
In reply to: Tom Lane (#52)
#55osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Fujii Masao (#54)