Race condition in recovery?

Started by Dilip Kumarabout 5 years ago131 messageshackers
Jump to latest
#1Dilip Kumar
dilipbalaut@gmail.com

While analyzing one of the customer issues, based on the log it
appeared that there is a race condition in the recovery process.

The summary of the issue is, that one of the standby is promoted as
the new primary (Node2) and another standby (Node3) is restarted and
set the primary_info and the restore_command so that it can
stream/restore from Node2 (new primary). The problem is that during
the promotion the timeline switch happened in the middle of the
segment in Node2 and the Node3 is able to restore the newTLI.history
file from the archive but the WAL file with the new TLI is not yet
archived. Now, we will try to stream the wal file from the primary
but if we are fetching the checkpoint that time we will not use the
latest timeline instead we will try with the checkpoint timeline and
walsender will send the WAL from the new timeline file because
requested WAL recptr is before the TLI switch. And once that
happened the expectedTLEs will be set based on the oldTLI.history
file. Now, whenever we try to restore the required WAL file and
recoveryTargetTimeLineGoal is set to the latest we again try to rescan
the latest timeline (rescanLatestTimeLine) but the problem is
recoveryTargetTLI was already set to the latest timeline. So now the
problem is expectedTLEs is set to oldTLI and recoveryTargetTLI is set
to newTLI and rescanLatestTimeLine will never update the expectedTLEs.
Now, walsender will fail to stream new WAL using the old TLI and the
archiver process will also fail because that file doesn't not exists
anymore (converted to .partial). Basically, now we will never try
with the newTLI.

I have given the sequence of the events based on my analysis.

Refer to the sequence of event
-----------------------------------------
Node1 primary, Node2 standby1, Node3 standby2

1. Node2 got promoted to new primary, and node 2 picked new TL 13 in
the middle of the segment.
2. Node3, restarted with new primary info of Node2 and restore command
3. Node3, found the newest TL13 in validateRecoveryParameters()
Because the latest TL was requested in recovery.conf (history file
restored from TL13) and set recoveryTargetTLI to 13
So point to note is recoveryTargetTLI is set to 13 but expectedTLEs is
not yet set.
4. Node3, entered into the standby mode.
5. Node3, tries to read the checkpoint Record, on Node3 still the
checkpoint TL (ControlFile->checkPointCopy.ThisTimeLineID) is 12.
6. Node3, tries to get the checkpoint record file using new TL13 from
the archive which it should get ideally but it may not if the Node2
haven't yet archived it.
7. Node3, tries to stream from primary but using TL12 because
ControlFile->checkPointCopy.ThisTimeLineID is 12.
8. Node3, get it because walsender of Node2 read it from TL13 and send
it and Node2 write in the new WAL file but with TL12.

WalSndSegmentOpen()
{
/*-------
* When reading from a historic timeline, and there is a timeline switch
* within this segment, read from the WAL segment belonging to the new
* timeline.
}

9. Node3, now set the expectedTLEs to 12 because that is what
walreceiver has streamed the WAL using.

10. Node3, now recoveryTargetTLI is 13 and expectedTLEs is 12. So
whenever it tries to find the latest TLE (rescanLatestTimeLine ) it
finds it is 13 which is the same as recoveryTargetTLI so nothing to
change but expectedTLEs is 12 using which it will try to
restore/stream further WAL and fail every time.

rescanLatestTimeLine(void)
{
....
newtarget = findNewestTimeLine(recoveryTargetTLI);
if (newtarget == recoveryTargetTLI)
{
/* No new timelines found */
return false;
}

11. So now the situation is that ideally in rescanLatestTimeLine() we
should get the latest TLI but it assumes that it is already on the
latest TLI because recoveryTargetTLI is on the latest TLI.

Other points to be noted:
- The actual issue happened on 9.6.11 but based on the code comparison
it appeared that same can occur on the latest code as well.
- After Node3 is shutdown wal from its pg_wal/ directory were removed
so that it can follow the new primary.

Based on the sequence of events It is quite clear that something is
wrong in rescanLatestTimeLine, maybe after selecting the latest TLI we
should compare it with the head of the expectedTLEs as well instead of
just comparing it to the recoveryTargetTLI?

Log from Node2:
2020-12-22 04:49:02 UTC [1019]: [9-1] user=; db=; app=; client=;
SQLcode=00000 LOG: received promote request
2020-12-22 04:49:02 UTC [1019]: [10-1] user=; db=; app=; client=;
SQLcode=00000 LOG: redo done at 0/F8000028
2020-12-22 04:49:02 UTC [1019]: [11-1] user=; db=; app=; client=;
SQLcode=00000 LOG: last completed transaction was at log time
2020-12-22 04:48:01.833476+00
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000C00000000000000F8"
failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000D.history" failed:
No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
2020-12-22 04:49:02 UTC [1019]: [12-1] user=; db=; app=; client=;
SQLcode=00000 LOG: selected new timeline ID: 13
2020-12-22 04:49:02 UTC [1019]: [13-1] user=; db=; app=; client=;
SQLcode=00000 LOG: archive recovery complete

Log From Node3 (with pointwise analysis):

1. Node3 restarted, restored 0000000D.history from archive and
recoveryTargetTLI is set to 13
2020-12-22 04:49:40 UTC [2896]: [2-1] user=; db=; app=; client=;
SQLcode=00000 LOG: database system is shut down
2020-12-22 04:49:40 UTC [2872]: [6-1] user=; db=; app=; client=;
SQLcode=00000 LOG: database system is shut down
2020-12-22 04:49:41 UTC [9082]: [1-1] user=; db=; app=; client=;
SQLcode=00000 LOG: database system was shut down in recovery at
2020-12-22 04:49:40 UTC
2020-12-22 04:49:41 UTC [9082]: [2-1] user=; db=; app=; client=;
SQLcode=00000 LOG: creating missing WAL directory
"pg_xlog/archive_status"
2020-12-22 04:49:41 UTC [9082]: [3-1] user=; db=; app=; client=;
SQLcode=00000 LOG: restored log file "0000000D.history" from archive
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000E.history" failed:
No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
2020-12-22 04:49:41 UTC [9082]: [4-1] user=; db=; app=; client=;
SQLcode=00000 LOG: entering standby mode
2020-12-22 04:49:41 UTC [9082]: [5-1] user=; db=; app=; client=;
SQLcode=00000 LOG: restored log file "0000000D.history" from archive

2. Tries to get the WAL file with different timelines from the archive
but did not get so expectedTLEs is not yet set

rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000D00000000000000F8"
failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000C00000000000000F8"
failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000B00000000000000F8"
failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000100000000000000F8"
failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]

3. Since fetching the checkpoint record so use the checkpoint TLI
which is 12, although primary doesn’t have 0000000C00000000000000F8
file as it renamed it to 0000000C00000000000000F8.partial

But there is the logic in walsender that if requested wal is there in
the current TLI then send from their so it will stream from
0000000D00000000000000F8 file

2020-12-22 04:49:42 UTC [9105]: [1-1] user=; db=; app=; client=;
SQLcode=00000 LOG: fetching timeline history file for timeline 12
from primary server
2020-12-22 04:49:42 UTC [9105]: [2-1] user=; db=; app=; client=;
SQLcode=00000 LOG: started streaming WAL from primary at 0/F8000000
on timeline 12
2020-12-22 04:49:42 UTC [9105]: [3-1] user=; db=; app=; client=;
SQLcode=00000 LOG: replication terminated by primary server
2020-12-22 04:49:42 UTC [9105]: [4-1] user=; db=; app=; client=;
SQLcode=00000 DETAIL: End of WAL reached on timeline 12 at
0/F8000098.

4. Now since walreciver assumes that it has restore the WAL from the
TL 12 the recieveTLI is 12 and the expectedTLEs is set base on the
0000000C.history.
See below Logic in WaitForWalToBecomeAvailable
if (readFile < 0)
{
if (!expectedTLEs)
expectedTLEs = readTimeLineHistory(receiveTLI);

2020-12-22 04:49:42 UTC [9082]: [6-1] user=; db=; app=; client=;
SQLcode=00000 LOG: restored log file "0000000C.history" from archive
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000E.history" failed:
No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000C00000000000000F8"
failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
2020-12-22 04:49:42 UTC [9082]: [7-1] user=; db=; app=; client=;
SQLcode=00000 LOG: restored log file "0000000C.history" from archive
2020-12-22 04:49:42 UTC [9082]: [8-1] user=; db=; app=; client=;
SQLcode=00000 LOG: consistent recovery state reached at 0/F8000098
2020-12-22 04:49:42 UTC [9082]: [9-1] user=; db=; app=; client=;
SQLcode=00000 LOG: invalid record length at 0/F8000098: wanted 24,
got 0
2020-12-22 04:49:42 UTC [9074]: [3-1] user=; db=; app=; client=;
SQLcode=00000 LOG: database system is ready to accept read only
connections
2020-12-22 04:49:42 UTC [9105]: [5-1] user=; db=; app=; client=;
SQLcode=00000 LOG: restarted WAL streaming at 0/F8000000 on timeline
12
2020-12-22 04:49:42 UTC [9105]: [6-1] user=; db=; app=; client=;
SQLcode=00000 LOG: replication terminated by primary server
2020-12-22 04:49:42 UTC [9105]: [7-1] user=; db=; app=; client=;
SQLcode=00000 DETAIL: End of WAL reached on timeline 12 at
0/F8000098.

4. Now, expectedTLEs head is as 12 and recoveryTargetTLI is 13 so in
rescanLatestTimeLine we always assume we are at the latest Ali but we
try to archive from expectedTLEs which is old TLI.

rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000E.history" failed:
No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000C00000000000000F8"
failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
2020-12-22 04:49:47 UTC [9105]: [8-1] user=; db=; app=; client=;
SQLcode=00000 LOG: restarted WAL streaming at 0/F8000000 on timeline
12
2020-12-22 04:49:47 UTC [9105]: [9-1] user=; db=; app=; client=;
SQLcode=00000 LOG: replication terminated by primary server
2020-12-22 04:49:47 UTC [9105]: [10-1] user=; db=; app=; client=;
SQLcode=00000 DETAIL: End of WAL reached on timeline 12 at
0/F8000098.
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000E.history" failed:
No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1179) [sender=3.1.2]
rsync: link_stat "/wal_archive/ins30wfm02dbs/0000000C00000000000000F8"
failed: No such file or directory (2)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#2Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#1)
Re: Race condition in recovery?

On Thu, Jan 21, 2021 at 4:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

8. Node3, get it because walsender of Node2 read it from TL13 and send
it and Node2 write in the new WAL file but with TL12.

WalSndSegmentOpen()
{
/*-------
* When reading from a historic timeline, and there is a timeline switch
* within this segment, read from the WAL segment belonging to the new
* timeline.
}

9. Node3, now set the expectedTLEs to 12 because that is what
walreceiver has streamed the WAL using.

This seems to be incorrect, because the comment for expectedTLEs says:

* expectedTLEs: a list of TimeLineHistoryEntries for
recoveryTargetTLI and the timelines of
* its known parents, newest first (so recoveryTargetTLI is always the
* first list member). Only these TLIs are expected to be seen in the WAL
* segments we read, and indeed only these TLIs will be considered as
* candidate WAL files to open at all.

But in your scenario apparently we end up with a situation that
contradicts that, because you go on to say:

10. Node3, now recoveryTargetTLI is 13 and expectedTLEs is 12. So

As I understand, expectedTLEs should end up being a list where the
first element is the timeline we want to end up on, and the last
element is the timeline where we are now, and every timeline in the
list branches off of the next timeline in the list. So here if 13
branches of 12 then the list should be 13,12 not just 12; and if 13
does not branch off of 12 OR if 13 branches off of 12 at an earlier
point in the WAL stream than where we are now, then that should be an
error that shuts down the standby, because then there is no way for
replay to get from where it is now to the correct timeline.

--
Robert Haas
EDB: http://www.enterprisedb.com

#3Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#2)
Re: Race condition in recovery?

On Fri, Jan 22, 2021 at 2:05 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 21, 2021 at 4:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

8. Node3, get it because walsender of Node2 read it from TL13 and send
it and Node2 write in the new WAL file but with TL12.

WalSndSegmentOpen()
{
/*-------
* When reading from a historic timeline, and there is a timeline switch
* within this segment, read from the WAL segment belonging to the new
* timeline.
}

9. Node3, now set the expectedTLEs to 12 because that is what
walreceiver has streamed the WAL using.

This seems to be incorrect, because the comment for expectedTLEs says:

* expectedTLEs: a list of TimeLineHistoryEntries for
recoveryTargetTLI and the timelines of
* its known parents, newest first (so recoveryTargetTLI is always the
* first list member). Only these TLIs are expected to be seen in the WAL
* segments we read, and indeed only these TLIs will be considered as
* candidate WAL files to open at all.

But in your scenario apparently we end up with a situation that
contradicts that, because you go on to say:

10. Node3, now recoveryTargetTLI is 13 and expectedTLEs is 12. So

As I understand, expectedTLEs should end up being a list where the
first element is the timeline we want to end up on, and the last
element is the timeline where we are now, and every timeline in the
list branches off of the next timeline in the list. So here if 13
branches of 12 then the list should be 13,12 not just 12; and if 13
does not branch off of 12 OR if 13 branches off of 12 at an earlier
point in the WAL stream than where we are now, then that should be an
error that shuts down the standby, because then there is no way for
replay to get from where it is now to the correct timeline.

Yeah, I agree with this. So IMHO the expectedTLEs should be set based
on the recoveryTargetTLI instead of receiveTLI. Based on the
expectedTLEs definition it can never be correct to set it based on the
receiveTLI.

Basically, the simple fix could be this.

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index b18257c198..465bc7c929 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12533,7 +12533,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr,
bool randAccess,
                                                if (readFile < 0)
                                                {
                                                        if (!expectedTLEs)
-
expectedTLEs = readTimeLineHistory(receiveTLI);
+
expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
+
                                                       readFile =
XLogFileRead(readSegNo, PANIC,

receiveTLI,

XLOG_FROM_STREAM, false);

But I am afraid that whether this adjustment (setting based on
receiveTLI) is done based on some analysis or part of some bug fix. I
will try to find the history of this and maybe based on that we can
make a better decision.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#4Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#3)
Re: Race condition in recovery?

On Sat, Jan 23, 2021 at 10:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

But I am afraid that whether this adjustment (setting based on
receiveTLI) is done based on some analysis or part of some bug fix. I
will try to find the history of this and maybe based on that we can
make a better decision.

I have done further analysis of this, so this of initializing the
expectedTLEs from receiveTLI instead of recoveryTargetTLI is done in
below commit.

=====
ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58
Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58

Delay reading timeline history file until it's fetched from master.

Streaming replication can fetch any missing timeline history files from the
master, but recovery would read the timeline history file for the target
timeline before reading the checkpoint record, and before walreceiver has
had a chance to fetch it from the master. Delay reading it, and the sanity
checks involving timeline history, until after reading the checkpoint
record.

There is at least one scenario where this makes a difference: if you take
a base backup from a standby server right after a timeline switch, the
WAL segment containing the initial checkpoint record will begin with an
older timeline ID. Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
=====

I did not understand one point about this commit message, "Without the
timeline history file, recovering that file will fail as the older
timeline ID is not recognized to be an ancestor of the target
timeline. " I mean in which case this will be true?

Now the problem is that we have initialized the expectTLEs based on
the older timeline history file instead of recoveryTargetTLI, which is
breaking the sanity of expectedTLEs. So in below function
(rescanLatestTimeLine), if we find the newest TLI is same as
recoveryTargetTLI then we assume we don't need to do anything but the
problem is expectedTLEs is set to wrong target and we never update
unless again timeline changes. So I think first we need to identify
what the above commit is trying to achieve and then can we do it in a
better way without breaking the sanity of expectedTLEs.

rescanLatestTimeLine(void)
{
....
newtarget = findNewestTimeLine(recoveryTargetTLI);
if (newtarget == recoveryTargetTLI)
{
/* No new timelines found */
return false;
}
...
newExpectedTLEs = readTimeLineHistory(newtarget);
...
expectedTLEs = newExpectedTLEs;
}

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#5Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#4)
Re: Race condition in recovery?

On Tue, Mar 2, 2021 at 3:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

=====
ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58
Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58

Delay reading timeline history file until it's fetched from master.

Streaming replication can fetch any missing timeline history files from the
master, but recovery would read the timeline history file for the target
timeline before reading the checkpoint record, and before walreceiver has
had a chance to fetch it from the master. Delay reading it, and the sanity
checks involving timeline history, until after reading the checkpoint
record.

There is at least one scenario where this makes a difference: if you take
a base backup from a standby server right after a timeline switch, the
WAL segment containing the initial checkpoint record will begin with an
older timeline ID. Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
=====

The above commit avoid initializing the expectedTLEs from the
recoveryTargetTLI as shown in below hunk from this commit.

@@ -5279,49 +5299,6 @@ StartupXLOG(void)
*/
readRecoveryCommandFile();

- /* Now we can determine the list of expected TLIs */
- expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
-

I think the fix for the problem will be that, after reading/validating
the checkpoint record, we can free the current value of expectedTLEs
and reinitialize it based on the recoveryTargetTLI as shown in the
attached patch?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-After-reading-checkpoint-record-fix-expectedTLEs-to-.patchtext/x-patch; charset=US-ASCII; name=0001-After-reading-checkpoint-record-fix-expectedTLEs-to-.patchDownload+7-1
#6Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#5)
Re: Race condition in recovery?

At Tue, 4 May 2021 17:41:06 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

I think the fix for the problem will be that, after reading/validating
the checkpoint record, we can free the current value of expectedTLEs
and reinitialize it based on the recoveryTargetTLI as shown in the
attached patch?

I'm not sure I understand the issue here. I think that the attached
should reproduce the issue mentioned here, but didn't for me.

The result of running the attached test script is shown below. TLIs
are adjusted in your descriptions cited below. The lines prefixed by
NodeN> are the server log lines written while running the attached
test script.

1. Node2 got promoted to new primary, and node 2 picked new TL 2 in
the middle of the segment 3.

Node2> LOG: selected new timeline ID: 2

2. Node3, restarted with new primary info of Node2 and restore command

Node2> node_3 LOG: received replication command: IDENTIFY_SYSTEM

3. Node3, found the newest TL2 in validateRecoveryParameters() Because
the latest TL was requested in recovery.conf (history file restored
from TL2) and set recoveryTargetTLI to 2 So point to note is
recoveryTargetTLI is set to 2 but expectedTLEs is not yet set.

This means you specified recovery_target_timeline? Either way,
expectedTLEs is not relevant to the behavior here. Even if
recovery_target_timeline is set to latest, findNewestTimeLine doesn't
look it.

Node3> LOG: restored log file "00000002.history" from archive

4. Node3, entered into the standby mode.

Node3> LOG: entering standby mode

5. Node3, tries to read the checkpoint Record, on Node3 still the
checkpoint TL (ControlFile->checkPointCopy.ThisTimeLineID) is 1.

expectedTLEs is loaded just before fetching the last checkpoint.

ReadCheckpointRecord doesn't consider checkPointCopy.ThisTimeLineID.

The reason for the checkpoint TLI is that the segment file was that of
the newest TLI in expectedTLEs found in pg_wal directory. If the
segment for TLI=2 containing the last checkpoint had been archived,
checkpoint record would be read as TLI=2. Replication starts at TLI=2
in this case because archive recovery has reached that timeline.
(Turn on the optional section in the attached test script to see this
behavior.) This is the expected behavior since we assume that the
segment files for TLI=n and n+1 are identical in the TLI=n part.

Anyway the checkpoint that is read is on TLI=1 in this case and
replication starts at TLI=1.

Node3> LOG: Checkpoint record: TLI=1, 0/3014F78

6. Node3, tries to get the checkpoint record file using new TL2 from
the archive which it should get ideally but it may not if the Node2
haven't yet archived it.

This doesn't happen for me. Instead, node3 runs recovery from the
checkpoint up to the end of the archived WAL. In this case the end
point is 3014FF0@TLI=1.

Node3> LOG: invalid record length at 0/3014FF0: wanted 24, got 0

Then, node3 connects to node2 requesting TLI=1 because the history
file (or expectedTLEs) told that the LSN belongs to TLI=1.

Node3> LOG: 0/3014FF0 is on TLI 1
Node3> LOG: started streaming WAL from primary at 0/3000000 on timeline 1

After a while node2 finds a timeline switch and disconnects the
replication.

Node3> LOG: replication terminated by primary server
Node3> DETAIL: End of WAL reached on timeline 1 at 0/3029A68.

After scanning the archive and pg_wal ends in failure, node3 correctly
requests node2 for TLI=2 because expectedTLEs told that the current
LSN belongs to TLI=2.

Node3> LOG: 0/3029A68 is on TLI 2
Node3> LOG: restarted WAL streaming at 0/3000000 on timeline 2

Finally the items below don't happen for me, because node3 needs not
to go back to the last checkpoint any longer. Perhaps the script is
failing to reproduce your issue correctly.

7. Node3, tries to stream from primary but using TL1 because
ControlFile->checkPointCopy.ThisTimeLineID is 1.

As mentioned above, the checkPointCopy.ThisTimeLineID on either the
primary and secondary is irrelevant to the timline the primary
sends. The primary streams the timeline requested by the secondary.

8. Node3, get it because walsender of Node2 read it from TL2 and send
it and Node2 write in the new WAL file but with TL1.

Walsender strems the requested TLI from walreceiver, then disconnects
at the end of the TLI notifying node3 of the next TLI. Node3
re-establishes replication with the new TLI. Looking into pg_wal of
node3, segment 3 for both TLI=1 and 2 are filled by the correct
content.

So,,, I don't understand what are you saying is the race condition..

An issue that may be slightly relevant to this case have been raised
[1]: . But it is about writing end-of-recovery checkpoint into the older timeline.
timeline.

Could you please fix the test script so that it causes your issue
correctly? And/or elaborate a bit more?

The attached first file is the debugging aid logging. The second is
the test script, to be placed in src/test/recovery/t.

1: /messages/by-id/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

extralog.diff.txttext/plain; charset=us-asciiDownload+16-2
000_test.pltext/plain; charset=us-asciiDownload
#7Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#6)
Re: Race condition in recovery?

On Fri, May 7, 2021 at 8:23 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Tue, 4 May 2021 17:41:06 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
Could you please fix the test script so that it causes your issue
correctly? And/or elaborate a bit more?

The attached first file is the debugging aid logging. The second is
the test script, to be placed in src/test/recovery/t.

I will look into your test case and try to see whether we can
reproduce the issue. But let me summarise what is the exact issue.
Basically, the issue is that first in validateRecoveryParameters if
the recovery target is the latest then we fetch the latest history
file and set the recoveryTargetTLI timeline to the latest available
timeline assume it's 2 but we delay updating the expectedTLEs (as per
commit ee994272ca50f70b53074f0febaec97e28f83c4e). Now, while reading
the checkpoint record if we don't get the required WAL from the
archive then we try to get from primary, and while getting checkpoint
from primary we use "ControlFile->checkPointCopy.ThisTimeLineID"
suppose that is older timeline 1. Now after reading the checkpoint we
will set the expectedTLEs based on the timeline from which we got the
checkpoint record.

See below Logic in WaitForWalToBecomeAvailable
if (readFile < 0)
{
if (!expectedTLEs)
expectedTLEs = readTimeLineHistory(receiveTLI);

Now, the first problem is we are breaking the sanity of expectedTLEs
because as per the definition it should already start with
recoveryTargetTLI but it is starting with the older TLI. Now, in
rescanLatestTimeLine we are trying to fetch the latest TLI which is
still 2, so this logic returns without reinitializing the expectedTLEs
because it assumes that if recoveryTargetTLI is pointing to 2 then
expectedTLEs must be correct and need not be changed.

See below logic:
rescanLatestTimeLine(void)
{
....
newtarget = findNewestTimeLine(recoveryTargetTLI);
if (newtarget == recoveryTargetTLI)
{
/* No new timelines found */
return false;
}
...
newExpectedTLEs = readTimeLineHistory(newtarget);
...
expectedTLEs = newExpectedTLEs;

Solution:
1. Find better way to fix the problem of commit
(ee994272ca50f70b53074f0febaec97e28f83c4e) which is breaking the
sanity of expectedTLEs.
2. Assume, we have to live with fix 1 and we have to initialize
expectedTLEs with an older timeline for validating the checkpoint in
absence of tl.hostory file (as this commit claims). Then as soon as
we read and validate the checkpoint, fix the expectedTLEs and set it
based on the history file of recoveryTargetTLI.

Does this explanation make sense? If not please let me know what part
is not clear in the explanation so I can point to that code.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#8Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#7)
Re: Race condition in recovery?

Thanks.

At Fri, 7 May 2021 11:04:53 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

On Fri, May 7, 2021 at 8:23 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Tue, 4 May 2021 17:41:06 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
Could you please fix the test script so that it causes your issue
correctly? And/or elaborate a bit more?

The attached first file is the debugging aid logging. The second is
the test script, to be placed in src/test/recovery/t.

I will look into your test case and try to see whether we can
reproduce the issue. But let me summarise what is the exact issue.
Basically, the issue is that first in validateRecoveryParameters if
the recovery target is the latest then we fetch the latest history
file and set the recoveryTargetTLI timeline to the latest available
timeline assume it's 2 but we delay updating the expectedTLEs (as per
commit ee994272ca50f70b53074f0febaec97e28f83c4e). Now, while reading

I think it is right up to here.

the checkpoint record if we don't get the required WAL from the
archive then we try to get from primary, and while getting checkpoint
from primary we use "ControlFile->checkPointCopy.ThisTimeLineID"
suppose that is older timeline 1. Now after reading the checkpoint we
will set the expectedTLEs based on the timeline from which we got the
checkpoint record.

I doubt this point. ReadCheckpointRecord finally calls
XLogFileReadAnyTLI and it uses the content of the 00000002.history as
the local timeline entry list, since expectedTLEs is NIL and
recoveryTargetTLI has been updated to 2 by
validateRecoveryParameters(). But node 3 was having only the segment
on TLI=1 so ReadCheckPointRecord() finds the wanted checkpoint recrod
on TLI=1. XLogFileReadAnyTLI() copies the local TLE list based on
TLI=2 to expectedTLEs just after that because the wanted checkpoint
record was available based on the list.

So I don't think checkPointCopy.ThisTimeLineID cannot affect this
logic, and don't think expectedTLEs is left with NIL. It's helpful
that you could show the specific code path to cause that.

See below Logic in WaitForWalToBecomeAvailable
if (readFile < 0)
{
if (!expectedTLEs)
expectedTLEs = readTimeLineHistory(receiveTLI);

Now, the first problem is we are breaking the sanity of expectedTLEs
because as per the definition it should already start with
recoveryTargetTLI but it is starting with the older TLI. Now, in

If my description above is correct, expectedTLEs has been always
filled by TLI=2's hisotry so readTimeLineHistory is not called there.

After that the things are working as described in my previous mail. So
The following is not an issue if I'm not missing something.

rescanLatestTimeLine we are trying to fetch the latest TLI which is
still 2, so this logic returns without reinitializing the expectedTLEs
because it assumes that if recoveryTargetTLI is pointing to 2 then
expectedTLEs must be correct and need not be changed.

See below logic:
rescanLatestTimeLine(void)
{
....
newtarget = findNewestTimeLine(recoveryTargetTLI);
if (newtarget == recoveryTargetTLI)
{
/* No new timelines found */
return false;
}
...
newExpectedTLEs = readTimeLineHistory(newtarget);
...
expectedTLEs = newExpectedTLEs;

Solution:
1. Find better way to fix the problem of commit
(ee994272ca50f70b53074f0febaec97e28f83c4e) which is breaking the
sanity of expectedTLEs.
2. Assume, we have to live with fix 1 and we have to initialize
expectedTLEs with an older timeline for validating the checkpoint in
absence of tl.hostory file (as this commit claims). Then as soon as
we read and validate the checkpoint, fix the expectedTLEs and set it
based on the history file of recoveryTargetTLI.

Does this explanation make sense? If not please let me know what part
is not clear in the explanation so I can point to that code.

So, unfortunately not.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#9Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#8)
Re: Race condition in recovery?

On Fri, May 7, 2021 at 2:33 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Thanks.

At Fri, 7 May 2021 11:04:53 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

On Fri, May 7, 2021 at 8:23 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Tue, 4 May 2021 17:41:06 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
Could you please fix the test script so that it causes your issue
correctly? And/or elaborate a bit more?

The attached first file is the debugging aid logging. The second is
the test script, to be placed in src/test/recovery/t.

I will look into your test case and try to see whether we can
reproduce the issue. But let me summarise what is the exact issue.
Basically, the issue is that first in validateRecoveryParameters if
the recovery target is the latest then we fetch the latest history
file and set the recoveryTargetTLI timeline to the latest available
timeline assume it's 2 but we delay updating the expectedTLEs (as per
commit ee994272ca50f70b53074f0febaec97e28f83c4e). Now, while reading

I think it is right up to here.

the checkpoint record if we don't get the required WAL from the
archive then we try to get from primary, and while getting checkpoint
from primary we use "ControlFile->checkPointCopy.ThisTimeLineID"
suppose that is older timeline 1. Now after reading the checkpoint we
will set the expectedTLEs based on the timeline from which we got the
checkpoint record.

I doubt this point. ReadCheckpointRecord finally calls
XLogFileReadAnyTLI and it uses the content of the 00000002.history as
the local timeline entry list, since expectedTLEs is NIL and
recoveryTargetTLI has been updated to 2 by
validateRecoveryParameters(). But node 3 was having only the segment
on TLI=1 so ReadCheckPointRecord() finds the wanted checkpoint recrod
on TLI=1. XLogFileReadAnyTLI() copies the local TLE list based on
TLI=2 to expectedTLEs just after that because the wanted checkpoint
record was available based on the list.

Okay, I got your point, now, consider the scenario that we are trying
to get the checkpoint record in XLogFileReadAnyTLI, you are right that
it returns history file 00000002.history. I think I did not mention
one point, basically, the tool while restarting node 3 after promoting
node 2 is deleting all the local WAL of node3 (so that node 3 can
follow node2). So now node3 doesn't have the checkpoint in the local
segment. Suppose checkpoint record was in segment
000000010000000000000001, but after TL switch 000000010000000000000001
is renamed to 000000010000000000000001.partial on node2 so now
practically doesn't have 000000010000000000000001 file anywhere.
However if TL switch mid-segment then we copy that segment with new TL
so we have 000000020000000000000001 which contains the checkpoint
record, but node 2 haven't yet archived it.

So now you come out of XLogFileReadAnyTLI, without reading checkpoint
record and without setting expectedTLEs. Because expectedTLEs is only
set if we are able to read the checkpoint record. Make sense?

So I don't think checkPointCopy.ThisTimeLineID cannot affect this
logic, and don't think expectedTLEs is left with NIL. It's helpful
that you could show the specific code path to cause that.

So now expectedTLEs is still NULL and you go to get the checkpoint
record from primary and use checkPointCopy.ThisTimeLineID.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#10Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#9)
Re: Race condition in recovery?

At Fri, 7 May 2021 15:16:03 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

Okay, I got your point, now, consider the scenario that we are trying
to get the checkpoint record in XLogFileReadAnyTLI, you are right that
it returns history file 00000002.history. I think I did not mention
one point, basically, the tool while restarting node 3 after promoting
node 2 is deleting all the local WAL of node3 (so that node 3 can
follow node2). So now node3 doesn't have the checkpoint in the local
segment. Suppose checkpoint record was in segment

...

So now you come out of XLogFileReadAnyTLI, without reading checkpoint
record and without setting expectedTLEs. Because expectedTLEs is only
set if we are able to read the checkpoint record. Make sense?

Thanks. I understood the case and reproduced. Although I don't think
removing WAL files from non-backup cluster is legit, I also think we
can safely start archive recovery from a replicated segment.

So now expectedTLEs is still NULL and you go to get the checkpoint
record from primary and use checkPointCopy.ThisTimeLineID.

I don't think erasing expectedTLEs after once set is the right fix
because expectedTLEs are supposed to be set just once iff we are sure
that we are going to follow the history, until rescan changes it as
the only exception.

It seems to me the issue here is not a race condition but
WaitForWALToBecomeAvailable initializing expectedTLEs with the history
of a improper timeline. So using recoveryTargetTLI instead of
receiveTLI for the case fixes this issue.

- if (!expectedTLEs)
- expectedTLEs = readTimeLineHistory(receiveTLI);

I thought that the reason using receiveTLI instead of
recoveryTargetTLI here is that there's a case where receiveTLI is the
future of recoveryTarrgetTLI but I haven't successfully had such a
situation. If I set recovoryTargetTLI to a TLI that standby doesn't
know but primary knows, validateRecoveryParameters immediately
complains about that before reaching there. Anyway the attached
assumes receiveTLI may be the future of recoveryTargetTLI.

Just inserting if() into the exising code makes the added lines stick
out of the right side edge of 80 columns so I refactored there a bit
to lower indentation.

I believe the 004_timeline_switch.pl detects your issue. And the
attached change fixes it.

Any suggestions are welcome.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Choose-correct-timeline-when-received-historic-timel.patchtext/x-patch; charset=us-asciiDownload+108-12
#11Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#10)
Re: Race condition in recovery?

On Mon, May 10, 2021 at 2:05 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

I thought that the reason using receiveTLI instead of
recoveryTargetTLI here is that there's a case where receiveTLI is the
future of recoveryTarrgetTLI but I haven't successfully had such a
situation. If I set recovoryTargetTLI to a TLI that standby doesn't
know but primary knows, validateRecoveryParameters immediately
complains about that before reaching there. Anyway the attached
assumes receiveTLI may be the future of recoveryTargetTLI.

If you see the note in this commit. It says without the timeline
history file, so does it trying to say that although receiveTLI is the
ancestor of recovoryTargetTLI, it can not detect that because of the
absence of the TL.history file ?

ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58
Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58
.....
Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
=====

I believe the 004_timeline_switch.pl detects your issue. And the
attached change fixes it.

I think this fix looks better to me, but I will think more about it
and give my feedback. Thanks for quickly coming up with the
reproducible test case.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#12Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#11)
Re: Race condition in recovery?

At Mon, 10 May 2021 14:27:21 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

On Mon, May 10, 2021 at 2:05 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

I thought that the reason using receiveTLI instead of
recoveryTargetTLI here is that there's a case where receiveTLI is the
future of recoveryTarrgetTLI but I haven't successfully had such a
situation. If I set recovoryTargetTLI to a TLI that standby doesn't
know but primary knows, validateRecoveryParameters immediately
complains about that before reaching there. Anyway the attached
assumes receiveTLI may be the future of recoveryTargetTLI.

If you see the note in this commit. It says without the timeline
history file, so does it trying to say that although receiveTLI is the
ancestor of recovoryTargetTLI, it can not detect that because of the
absence of the TL.history file ?

Yeah, it reads so for me and it works as described. What I don't
understand is that why the patch uses receiveTLI, not
recovoryTargetTLI to load timeline hisotry in
WaitForWALToBecomeAvailable. The only possible reason is that there
could be a case where receivedTLI is the future of recoveryTargetTLI.
However, AFAICS it's impossible for that case to happen. At
replication start, requsting TLI is that of the last checkpoint, which
is the same to recoveryTargetTLI, or anywhere in exising expectedTLEs
which must be the past of recoveryTargetTLI. That seems to be already
true at the time replication was made possible to follow a timeline
switch (abfd192b1b).

So I was tempted to just load history for recoveryTargetTLI then
confirm that receiveTLI is in the history. Actually that change
doesn't harm any of the recovery TAP tests. It is way simpler than
the last patch. However, I'm not confident that it is right.. ;(

ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58
Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58
.....
Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
=====

I believe the 004_timeline_switch.pl detects your issue. And the
attached change fixes it.

I think this fix looks better to me, but I will think more about it
and give my feedback. Thanks for quickly coming up with the
reproducible test case.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#13Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#12)
Re: Race condition in recovery?

On Tue, May 11, 2021 at 1:42 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 10 May 2021 14:27:21 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

On Mon, May 10, 2021 at 2:05 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

I thought that the reason using receiveTLI instead of
recoveryTargetTLI here is that there's a case where receiveTLI is the
future of recoveryTarrgetTLI but I haven't successfully had such a
situation. If I set recovoryTargetTLI to a TLI that standby doesn't
know but primary knows, validateRecoveryParameters immediately
complains about that before reaching there. Anyway the attached
assumes receiveTLI may be the future of recoveryTargetTLI.

If you see the note in this commit. It says without the timeline
history file, so does it trying to say that although receiveTLI is the
ancestor of recovoryTargetTLI, it can not detect that because of the
absence of the TL.history file ?

Yeah, it reads so for me and it works as described. What I don't
understand is that why the patch uses receiveTLI, not
recovoryTargetTLI to load timeline hisotry in
WaitForWALToBecomeAvailable. The only possible reason is that there
could be a case where receivedTLI is the future of recoveryTargetTLI.
However, AFAICS it's impossible for that case to happen. At
replication start, requsting TLI is that of the last checkpoint, which
is the same to recoveryTargetTLI, or anywhere in exising expectedTLEs
which must be the past of recoveryTargetTLI. That seems to be already
true at the time replication was made possible to follow a timeline
switch (abfd192b1b).

So I was tempted to just load history for recoveryTargetTLI then
confirm that receiveTLI is in the history. Actually that change
doesn't harm any of the recovery TAP tests. It is way simpler than
the last patch. However, I'm not confident that it is right.. ;(

I first thought of fixing like as you describe that instead of loading
history of receiveTLI, load history for recoveryTargetTLI. But then,
this commit (ee994272ca50f70b53074f0febaec97e28f83c4e) has especially
used the history file of receiveTLI to solve a particular issue which
I did not clearly understand. I am not sure that whether it is a good
idea to directly using recoveryTargetTLI, without exactly
understanding why this commit was using receiveTLI. It doesn't seem
like an oversight to me, it seems intentional. Maybe Heikki can
comment on this?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#14Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#10)
Re: Race condition in recovery?

On Mon, May 10, 2021 at 4:35 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

It seems to me the issue here is not a race condition but
WaitForWALToBecomeAvailable initializing expectedTLEs with the history
of a improper timeline. So using recoveryTargetTLI instead of
receiveTLI for the case fixes this issue.

I agree.

I believe the 004_timeline_switch.pl detects your issue. And the
attached change fixes it.

So why does this use recoveryTargetTLI instead of receiveTLI only
conditionally? Why not do it all the time?

The hard thing about this code is that the assumptions are not very
clear. If we don't know why something is a certain way, then we might
break things if we change it. Worse yet, if nobody else knows why it's
like that either, then who knows what assumptions they might be
making? It's hard to be sure that any change is safe.

But that being said, we have a clear definition from the comments for
what expectedTLEs is supposed to contain, and it's only going to end
up with those contents if we initialize it from recoveryTargetTLI. So
I am inclined to think that we ought to do that always, and if it
breaks something, then that's a sign that some other part of the code
also needs fixing, because apparently that hypothetical other part of
the code doesn't work if expctedTLEs contains what the comments say
that it should.

Now maybe that's the wrong idea. But if so, then we're saying that the
definition of expectedTLEs needs to be changed, and we should update
the comments with the new definition, whatever it is. A lot of the
confusion here results from the fact that the code and comments are
inconsistent and we can't tell whether that's intentional or
inadvertent. Let's not leave the next person who looks at this code
wondering the same thing about whatever changes we make.

--
Robert Haas
EDB: http://www.enterprisedb.com

#15Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#14)
Re: Race condition in recovery?

On Fri, May 14, 2021 at 2:37 AM Robert Haas <robertmhaas@gmail.com> wrote:

So why does this use recoveryTargetTLI instead of receiveTLI only
conditionally? Why not do it all the time?

The hard thing about this code is that the assumptions are not very
clear. If we don't know why something is a certain way, then we might
break things if we change it. Worse yet, if nobody else knows why it's
like that either, then who knows what assumptions they might be
making? It's hard to be sure that any change is safe.

But that being said, we have a clear definition from the comments for
what expectedTLEs is supposed to contain, and it's only going to end
up with those contents if we initialize it from recoveryTargetTLI. So
I am inclined to think that we ought to do that always, and if it
breaks something, then that's a sign that some other part of the code
also needs fixing, because apparently that hypothetical other part of
the code doesn't work if expctedTLEs contains what the comments say
that it should.

Now maybe that's the wrong idea. But if so, then we're saying that the
definition of expectedTLEs needs to be changed, and we should update
the comments with the new definition, whatever it is. A lot of the
confusion here results from the fact that the code and comments are
inconsistent and we can't tell whether that's intentional or
inadvertent. Let's not leave the next person who looks at this code
wondering the same thing about whatever changes we make.

I am not sure that have you noticed the commit id which changed the
definition of expectedTLEs, Heikki has committed that change so adding
him in the list to know his opinion.

=====
ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58
Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58

Delay reading timeline history file until it's fetched from master.
.....
Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
=====

Part of this commit message says that it will not identify the
recoveryTargetTLI as the ancestor of the checkpoint timeline (without
history file). I did not understand what it is trying to say. Does
it is trying to say that even though the recoveryTargetTLI is the
ancestor of the checkpoint timeline but we can not track that because
we don't have a history file? So to handle this problem change the
definition of expectedTLEs to directly point to the checkpoint
timeline?

Because before this commit, we were directly initializing expectedTLEs
with the history file of recoveryTargetTLI, we were not even waiting
for reading the checkpoint, but under this commit, it is changed.

I am referring to the below code which was deleted by this commit:

========
@@ -5279,49 +5299,6 @@ StartupXLOG(void)
*/
readRecoveryCommandFile();

- /* Now we can determine the list of expected TLIs */
- expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
-
- /*
- * If the location of the checkpoint record is not on the expected
- * timeline in the history of the requested timeline, we cannot proceed:
- * the backup is not part of the history of the requested timeline.
- */
- if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
- ControlFile->checkPointCopy.ThisTimeLineID)
- {
=========

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#16Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#14)
Re: Race condition in recovery?

At Thu, 13 May 2021 17:07:31 -0400, Robert Haas <robertmhaas@gmail.com> wrote in

On Mon, May 10, 2021 at 4:35 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

It seems to me the issue here is not a race condition but
WaitForWALToBecomeAvailable initializing expectedTLEs with the history
of a improper timeline. So using recoveryTargetTLI instead of
receiveTLI for the case fixes this issue.

I agree.

I believe the 004_timeline_switch.pl detects your issue. And the
attached change fixes it.

So why does this use recoveryTargetTLI instead of receiveTLI only
conditionally? Why not do it all the time?

The commit ee994272ca apparently says that there's a case where primary

The hard thing about this code is that the assumptions are not very
clear. If we don't know why something is a certain way, then we might
break things if we change it. Worse yet, if nobody else knows why it's
like that either, then who knows what assumptions they might be
making? It's hard to be sure that any change is safe.

Thanks for the comment.

But that being said, we have a clear definition from the comments for
what expectedTLEs is supposed to contain, and it's only going to end
up with those contents if we initialize it from recoveryTargetTLI. So
I am inclined to think that we ought to do that always, and if it

Yes, I also found it after that, and agreed. Desynchronization
between recoveryTargetTLI and expectedTLEs prevents
rescanLatestTimeline from working.

breaks something, then that's a sign that some other part of the code
also needs fixing, because apparently that hypothetical other part of
the code doesn't work if expctedTLEs contains what the comments say
that it should.

After some more inspection, I'm now also sure that it is a
typo/thinko. Other than while fetching the first checkpoint,
receivedTLI is always in the history of recoveryTargetTLI, otherwise
recoveryTargetTLI is equal to receiveTLI.

I read that the commit message as "waiting for fetching possible
future history files to know if there's the future for the current
timeline. However now I read it as "don't bother expecting for
possiblly-unavailable history files when we're reading the first
checkpoint the timeline for which is already known to us.". If it is
correct we don't bother considering future history files coming from
primary there.

Now maybe that's the wrong idea. But if so, then we're saying that the
definition of expectedTLEs needs to be changed, and we should update
the comments with the new definition, whatever it is. A lot of the
confusion here results from the fact that the code and comments are
inconsistent and we can't tell whether that's intentional or
inadvertent. Let's not leave the next person who looks at this code
wondering the same thing about whatever changes we make.

Ok. The reason why we haven't have a complain about that would be
that it is rare that pg_wal is wiped out before a standby connects to
a just-promoted primary. I'm not sure about the tool Dilip is using,
though..

So the result is the attached. This would be back-patcheable to 9.3
(or 9.6?) but I doubt that we should do as we don't seem to have had a
complaint on this issue and we're not full faith on this.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v2-0001-Set-expectedTLEs-correctly-based-on-recoveryTarge.patchtext/x-patch; charset=us-asciiDownload+73-3
#17Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#16)
Re: Race condition in recovery?

At Fri, 14 May 2021 14:12:31 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Thu, 13 May 2021 17:07:31 -0400, Robert Haas <robertmhaas@gmail.com> wrote in

On Mon, May 10, 2021 at 4:35 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

It seems to me the issue here is not a race condition but
WaitForWALToBecomeAvailable initializing expectedTLEs with the history
of a improper timeline. So using recoveryTargetTLI instead of
receiveTLI for the case fixes this issue.

I agree.

I believe the 004_timeline_switch.pl detects your issue. And the
attached change fixes it.

So why does this use recoveryTargetTLI instead of receiveTLI only
conditionally? Why not do it all the time?

The commit ee994272ca apparently says that there's a case where primary

This is not an incomplete line but just a garbage.

The hard thing about this code is that the assumptions are not very
clear. If we don't know why something is a certain way, then we might
break things if we change it. Worse yet, if nobody else knows why it's
like that either, then who knows what assumptions they might be
making? It's hard to be sure that any change is safe.

Thanks for the comment.

But that being said, we have a clear definition from the comments for
what expectedTLEs is supposed to contain, and it's only going to end
up with those contents if we initialize it from recoveryTargetTLI. So
I am inclined to think that we ought to do that always, and if it

Yes, I also found it after that, and agreed. Desynchronization
between recoveryTargetTLI and expectedTLEs prevents
rescanLatestTimeline from working.

breaks something, then that's a sign that some other part of the code
also needs fixing, because apparently that hypothetical other part of
the code doesn't work if expctedTLEs contains what the comments say
that it should.

After some more inspection, I'm now also sure that it is a
typo/thinko. Other than while fetching the first checkpoint,
receivedTLI is always in the history of recoveryTargetTLI, otherwise
recoveryTargetTLI is equal to receiveTLI.

I read that the commit message as "waiting for fetching possible
future history files to know if there's the future for the current
timeline. However now I read it as "don't bother expecting for
possiblly-unavailable history files when we're reading the first
checkpoint the timeline for which is already known to us.". If it is
correct we don't bother considering future history files coming from
primary there.

Now maybe that's the wrong idea. But if so, then we're saying that the
definition of expectedTLEs needs to be changed, and we should update
the comments with the new definition, whatever it is. A lot of the
confusion here results from the fact that the code and comments are
inconsistent and we can't tell whether that's intentional or
inadvertent. Let's not leave the next person who looks at this code
wondering the same thing about whatever changes we make.

Ok. The reason why we haven't have a complain about that would be
that it is rare that pg_wal is wiped out before a standby connects to
a just-promoted primary. I'm not sure about the tool Dilip is using,
though..

So the result is the attached. This would be back-patcheable to 9.3
(or 9.6?) but I doubt that we should do as we don't seem to have had a
complaint on this issue and we're not full faith on this.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#18Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#15)
Re: Race condition in recovery?

On Fri, May 14, 2021 at 12:59 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I am not sure that have you noticed the commit id which changed the
definition of expectedTLEs, Heikki has committed that change so adding
him in the list to know his opinion.

I did notice, but keep in mind that this was more than 8 years ago.
Even if Heikki is reading this thread, he may not remember why he
changed 1 line of code one way rather than another in 2013. I mean if
he does that's great, but it's asking a lot.

=====
ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58
Committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2013-01-03 14:11:58

Delay reading timeline history file until it's fetched from master.
.....
Without the timeline history file, recovering that file
will fail as the older timeline ID is not recognized to be an ancestor of
the target timeline. If you try to recover from such a backup, using only
streaming replication to fetch the WAL, this patch is required for that to
work.
=====

Part of this commit message says that it will not identify the
recoveryTargetTLI as the ancestor of the checkpoint timeline (without
history file). I did not understand what it is trying to say. Does
it is trying to say that even though the recoveryTargetTLI is the
ancestor of the checkpoint timeline but we can not track that because
we don't have a history file? So to handle this problem change the
definition of expectedTLEs to directly point to the checkpoint
timeline?

Because before this commit, we were directly initializing expectedTLEs
with the history file of recoveryTargetTLI, we were not even waiting
for reading the checkpoint, but under this commit, it is changed.

Well, I think that is talking about what the commit did in general,
not specifically the one line of code that we think may be incorrect.
As I understand it, the general issue here was that if
XLogFileReadAnyTLI() was called before expectedTLEs got set, then
prior to this commit it would have to fail, because the foreach() loop
in that function would be iterating over an empty list. Heikki tried
to make it not fail in that case, by setting tles =
readTimeLineHistory(recoveryTargetTLI), so that the foreach loop
*wouldn't* get an empty list.

Thinking about this a bit more, I think the idea behind the logic this
commit added to XLogFileReadAnyTLI() is that
XLogFileReadAnyTLI(recoveryTargetTLI) may or may not produce the
correct answer. If the timeline history file exists, it will contain
all the information that we need and will return a complete list of
TLEs. But if the file does not exist yet, then it will return a
1-entry list saying that the TLI in question has no parents. If
readTimeLineHistory() actually reads the file, then it's safe to save
the return value in expectedTLEs, but if it doesn't, then it may or
may not be safe. If XLogFileReadAnyTLI calls XLogFileRead and it
works, then the WAL segment we need exists on our target timeline and
we don't actually need the timeline history for anything because we
can just directly begin replay from the target timeline. But if
XLogFileRead fails with the 1-entry dummy list, then we need the
timeline history and don't have it yet, so we have to retry later,
when the history file will hopefully be present, and then at that
point readTimeLineHistory will return a different and better answer
and hopefully it will all work.

I think this is what the commit message is talking about when it says
that "Without the timeline history file, recovering that file will
fail as the older timeline ID is not recognized to be an ancestor of
the target timeline." Without the timeline history file, we can't know
whether some other timeline is an ancestor or not. But the specific
way that manifests is that XLogFileReadAnyTLI() returns a 1-entry
dummy list instead of the real contents of the timeline history file.
This commit doesn't prevent that from happening, but it does prevent
the 1-entry dummy list from getting stored in the global variable
expectedTLEs, except in the case where no timeline switch is occurring
and the lack of history therefore doesn't matter. Without this commit,
if the call to readTimeLineHistory(recoveryTargetTLI) happens at a
time when the timeline history file is not yet available, the 1-entry
dummy list ends up in the global variable and there's no way for it to
ever be replaced with a real history even if the timeline history file
shows up in the archive later.

As I see it, the question on the table here is whether there's any
justification for the fact that when the second switch in
WaitForWALToBecomeAvailable takes the
XLOG_FROM_ARCHIVE/XLOG_FROM_PG_WAL branch, it calls XLogFileReadAnyTLI
which tries to read the history of recoveryTargetTLI, while when that
same switch takes the XLOG_FROM_STREAM branch, it tries to read the
history of receiveTLI. I tend to think it doesn't make sense. On
general principle, archiving and streaming are supposed to work the
same way, so the idea that they are getting the timeline from
different places is inherently suspect. But also and more
specifically, AFAICS receiveTLI always has to be the same TLI that we
requested from the server, so we're always looking up our own current
TLI here rather than the target TLI, which seems wrong to me, at least
of this moment. :-)

But that having been said, I still don't quite understand the
conditions required to tickle this problem. I spent a long time poking
at it today. It seems to me that it ought somehow to be possible to
recreate the scenario without trying to reuse the old master as a
standby, and also without even needing a WAL archive, but I couldn't
figure out how to do it. I tried setting up a primary and a standby,
and then making a backup from the standby, promoting it, and then
starting what would have been a cascading standby from the backup. But
that doesn't do it. The first mistake I made was creating the standbys
with something like 'pg_basebackup -R', but that's not good enough
because then they have WAL, so I added '-Xnone'. But then I realized
that when a base backup ends, the primary writes an XLOG_SWITCH
record, which means that when the standby is promoted, the promotion
is not in the same WAL segment as the checkpoint from which the
machine that would have been a cascading standby is trying to start. I
worked around that by setting recovery_target='immediate' on the
standby. With that change, I get a WAL file on the new timeline - 2 in
this case - that looks like this:

rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn:
0/19000060, prev 0/19000028, desc: CHECKPOINT_ONLINE redo 0/19000028;
tli 1; prev tli 1; fpw true; xid 0:587; oid 16385; multi 1; offset 0;
oldest xid 579 in DB 1; oldest multi 1 in DB 1; oldest/newest commit
timestamp xid: 0/0; oldest running xid 587; online
rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn:
0/190000D8, prev 0/19000060, desc: BACKUP_END 0/19000028
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn:
0/19000100, prev 0/190000D8, desc: CHECKPOINT_SHUTDOWN redo
0/19000100; tli 2; prev tli 1; fpw true; xid 0:587; oid 16385; multi
1; offset 0; oldest xid 579 in DB 1; oldest multi 1 in DB 1;
oldest/newest commit timestamp xid: 0/0; oldest running xid 0;
shutdown

That sure looks like the right thing to recreate the problem, because
the first checkpoint is from the backup, and the
woulda-been-a-cascading-standby should be starting there, and the
second checkpoint is in the same segment and shows a timeline switch.
But everything worked great:

2021-05-14 17:44:58.684 EDT [5697] DETAIL: End of WAL reached on
timeline 1 at 0/19000100.
2021-05-14 17:44:58.728 EDT [5694] LOG: new target timeline is 2
2021-05-14 17:44:58.746 EDT [5694] LOG: redo starts at 0/19000028
2021-05-14 17:44:58.749 EDT [5694] LOG: consistent recovery state
reached at 0/19000100

I don't understand why that works. It feels to me like it ought to run
smack into the same problem you saw, but it doesn't.

I am referring to the below code which was deleted by this commit:

========
@@ -5279,49 +5299,6 @@ StartupXLOG(void)
*/
readRecoveryCommandFile();

- /* Now we can determine the list of expected TLIs */
- expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
-
- /*
- * If the location of the checkpoint record is not on the expected
- * timeline in the history of the requested timeline, we cannot proceed:
- * the backup is not part of the history of the requested timeline.
- */
- if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
- ControlFile->checkPointCopy.ThisTimeLineID)
- {
=========

I don't think this code is really deleted. The tliOfPointInHistory
check was just moved later in the function. And expectedTLEs is still
supposed to be getting initialized, because just before the new
location of the tliOfPointInHistory check, Heikki added
Assert(expectedTLEs).

--
Robert Haas
EDB: http://www.enterprisedb.com

#19Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#18)
Re: Race condition in recovery?

On Sat, May 15, 2021 at 3:58 AM Robert Haas <robertmhaas@gmail.com> wrote:

I did notice, but keep in mind that this was more than 8 years ago.
Even if Heikki is reading this thread, he may not remember why he
changed 1 line of code one way rather than another in 2013. I mean if
he does that's great, but it's asking a lot.

I agree with your point. But I think that one line is related to the
purpose of this commit and I have explained (in 3rd paragraph below)
why do I think so.

As I understand it, the general issue here was that if

XLogFileReadAnyTLI() was called before expectedTLEs got set, then
prior to this commit it would have to fail, because the foreach() loop
in that function would be iterating over an empty list. Heikki tried
to make it not fail in that case, by setting tles =
readTimeLineHistory(recoveryTargetTLI), so that the foreach loop
*wouldn't* get an empty list.

I might be missing something but I don't agree with this logic. If
you see prior to this commit the code flow was like below[1]StartupXLOG(void) { ..... So my
point is if we are calling XLogFileReadAnyTLI() somewhere while
reading the checkpoint record then note that expectedTLEs were
initialized unconditionally before even try to read that checkpoint
record. So how expectedTLEs could be uninitialized in
LogFileReadAnyTLI?

[1]: StartupXLOG(void) { ....
StartupXLOG(void)
{
....

recoveryTargetTLI = ControlFile->checkPointCopy.ThisTimeLineID;
...
readRecoveryCommandFile();
...
expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
...
..
record = ReadCheckpointRecord(checkPointLoc, 0);

Another point which I am not sure about but still I think that one
line (expectedTLEs = readTimeLineHistory(receiveTLI);), somewhere
related to the purpose of this commit. Let me explain why do I think
so. Basically, before this commit, we were initializing
"expectedTLEs" based on the history file of "recoveryTargetTLI", right
after calling "readRecoveryCommandFile()" (this function will
initialize recoveryTargetTLI based on the recovery target, and it
ensures it read the respective history file). Now, right after this
point, there was a check as shown below[2]if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) != ControlFile->checkPointCopy.ThisTimeLineID) { error() }, which is checking whether
the checkpoint TLI exists in the "expectedTLEs" which is initialized
based on recoveryTargetTLI. And it appeared that this check was
failing in some cases which this commit tried to fix and all other
code is there to support that. Because now before going for reading
the checkpoint we are not initializing "expectedTLEs" so now after
moving this line from here it was possible that "expectedTLEs" is not
initialized in XLogFileReadAnyTLI() and the remaining code in
XLogFileReadAnyTLI() is to handle that part.

Now, coming to my point that why this one line is related, In this
one line (expectedTLEs = readTimeLineHistory(receiveTLI);), we
completely avoiding recoveryTargetTLI and initializing "expectedTLEs"
based on the history file of the TL from which we read the checkpoint,
so now, there is no scope of below[2]if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) != ControlFile->checkPointCopy.ThisTimeLineID) { error() } check to fail because note that
we are not initializing "expectedTLEs" based on the
"recoveryTargetTLI" but we are initializing from the history from
where we read checkpoint.

So I feel if we directly fix this one line to initialize
"expectedTLEs" from "recoveryTargetTLI" then it will expose to the
same problem this commit tried to fix.

[2]: if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) != ControlFile->checkPointCopy.ThisTimeLineID) { error() }
if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
ControlFile->checkPointCopy.ThisTimeLineID)
{
error()
}

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#20Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#19)
Re: Race condition in recovery?

At Sat, 15 May 2021 10:55:05 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in

On Sat, May 15, 2021 at 3:58 AM Robert Haas <robertmhaas@gmail.com> wrote:

I did notice, but keep in mind that this was more than 8 years ago.
Even if Heikki is reading this thread, he may not remember why he
changed 1 line of code one way rather than another in 2013. I mean if
he does that's great, but it's asking a lot.

I agree with your point. But I think that one line is related to the
purpose of this commit and I have explained (in 3rd paragraph below)
why do I think so.

As I understand it, the general issue here was that if

XLogFileReadAnyTLI() was called before expectedTLEs got set, then
prior to this commit it would have to fail, because the foreach() loop
in that function would be iterating over an empty list. Heikki tried
to make it not fail in that case, by setting tles =
readTimeLineHistory(recoveryTargetTLI), so that the foreach loop
*wouldn't* get an empty list.

I might be missing something but I don't agree with this logic. If
you see prior to this commit the code flow was like below[1]. So my
point is if we are calling XLogFileReadAnyTLI() somewhere while
reading the checkpoint record then note that expectedTLEs were
initialized unconditionally before even try to read that checkpoint
record. So how expectedTLEs could be uninitialized in
LogFileReadAnyTLI?

Mmm. I think both of you are right. Before the commit,
XLogFileReadAnyTLI expected that expectedTLEs is initialized. After
the commit it cannot no longer expect that so readTimeLineHistory was
changed to try fetching by itself. *If* an appropriate history file
is found, it *initializes* expectedTLEs with the content.

[1]
StartupXLOG(void)
{
....

recoveryTargetTLI = ControlFile->checkPointCopy.ThisTimeLineID;
...
readRecoveryCommandFile();
...
expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
...
..
record = ReadCheckpointRecord(checkPointLoc, 0);

Another point which I am not sure about but still I think that one
line (expectedTLEs = readTimeLineHistory(receiveTLI);), somewhere
related to the purpose of this commit. Let me explain why do I think
so. Basically, before this commit, we were initializing
"expectedTLEs" based on the history file of "recoveryTargetTLI", right
after calling "readRecoveryCommandFile()" (this function will
initialize recoveryTargetTLI based on the recovery target, and it
ensures it read the respective history file). Now, right after this
point, there was a check as shown below[2], which is checking whether
the checkpoint TLI exists in the "expectedTLEs" which is initialized
based on recoveryTargetTLI. And it appeared that this check was
failing in some cases which this commit tried to fix and all other
code is there to support that. Because now before going for reading
the checkpoint we are not initializing "expectedTLEs" so now after
moving this line from here it was possible that "expectedTLEs" is not
initialized in XLogFileReadAnyTLI() and the remaining code in
XLogFileReadAnyTLI() is to handle that part.

Before the commit expectedTLEs is always initialized with just one
entry for the TLI of the last checkpoint record.

(1) If XLogFileReadAnyTLI() found the segment but no history file
found, that is, using the dummy TLE-list, expectedTLEs is initialized
with the dummy one-entry list. So there's no behavioral change in this
aspect.

(2) If we didn't find the segment for the checkpoint record, it starts
replication and fetches history files and WAL records then revisits
XLogFileReadAnyTLI. Now we have both the history file and segments,
it successfully reads the recood. The difference of expectedTLEs made
by the patch is having just one entry or the all entries for the past.

Assuming that we keep expectedTLEs synced with recoveryTargetTLI,
rescanLatestTimeLine updates the list properly so no need to worry
about the future. So the issue would be in the past timelines. After
reading the checkpoint record, if we need to rewind to the previous
timeline for the REDO point, the dummy list is inconvenient.

So there is a possibility that the patch fixed the case (2), where the
standby doesn't have both the segment for the checkpoint record and
the history file for the checkpoint, and the REDO point is on the last
TLI. If it is correct, the patch still fails for the case (1), that
is, the issue raised here. Anyway it would be useless (and rahter
harmful) to initialize expectedTLEs based on receiveTLI there.

So my resul there is:

The commit fixed the case (2)
The fix caused the issue for the case (1).
The proposed fix fixes the case (1), caused by the commit.

There's another issue in the case (1) and REDO point is back to the
previous timeline, which is in doubt we need to fix..

Now, coming to my point that why this one line is related, In this
one line (expectedTLEs = readTimeLineHistory(receiveTLI);), we
completely avoiding recoveryTargetTLI and initializing "expectedTLEs"
based on the history file of the TL from which we read the checkpoint,
so now, there is no scope of below[2] check to fail because note that
we are not initializing "expectedTLEs" based on the
"recoveryTargetTLI" but we are initializing from the history from
where we read checkpoint.

So I feel if we directly fix this one line to initialize
"expectedTLEs" from "recoveryTargetTLI" then it will expose to the
same problem this commit tried to fix.

[2]
if (tliOfPointInHistory(ControlFile->checkPoint, expectedTLEs) !=
ControlFile->checkPointCopy.ThisTimeLineID)
{
error()
}

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#21Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#20)
#22Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#21)
#23Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#20)
#24Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#19)
#26Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#25)
#27Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#24)
#28Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#27)
#29Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#27)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#26)
#31Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#30)
#32Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#31)
#33Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#30)
#34Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#31)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#33)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#31)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#36)
#38Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#37)
#39Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#38)
#40Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#38)
#41Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#40)
#42Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#41)
#43Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#36)
#44Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#42)
#45Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#44)
#46Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#42)
#47Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#46)
#48Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#47)
#49Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#48)
#50Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#49)
#51Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#50)
#52Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#51)
#53Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#52)
#54Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#48)
#55Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#53)
#56Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#55)
#57Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#56)
#58Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#52)
#59Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Robert Haas (#58)
#60Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#58)
#61Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tatsuro Yamada (#59)
#62Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Kyotaro Horiguchi (#61)
#63Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tatsuro Yamada (#62)
#64Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tatsuro Yamada (#62)
#65Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Kyotaro Horiguchi (#63)
#66Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#60)
#67Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#66)
#68Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#66)
#69Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tatsuro Yamada (#65)
#70Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#54)
#71Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#69)
#72Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#70)
#73Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Dilip Kumar (#72)
#74Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#72)
#75Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#73)
#76Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#75)
#77Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#76)
#78Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Kyotaro Horiguchi (#69)
#79Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Kyotaro Horiguchi (#71)
#80Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tatsuro Yamada (#79)
#81Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tatsuro Yamada (#78)
#82Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Kyotaro Horiguchi (#80)
#83Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tatsuro Yamada (#82)
#84Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#76)
#85Stephen Frost
sfrost@snowman.net
In reply to: Kyotaro Horiguchi (#83)
#86Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#84)
#87Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#84)
#88Tatsuro Yamada
tatsuro.yamada.tf@nttcom.co.jp
In reply to: Kyotaro Horiguchi (#83)
#89Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Stephen Frost (#85)
#90Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#89)
#91Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#86)
#92Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#91)
#93Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#92)
#94Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#93)
#95Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#94)
#96Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#95)
#97Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#96)
#98Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#96)
#99Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#96)
#100Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#99)
#101Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#100)
#102Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#101)
#103Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Robert Haas (#102)
#104Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro Horiguchi (#103)
#105Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#104)
#106Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#105)
#107Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#104)
#108Dilip Kumar
dilipbalaut@gmail.com
In reply to: Kyotaro Horiguchi (#107)
#109Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dilip Kumar (#108)
#110Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro Horiguchi (#106)
#111Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#109)
#112Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#111)
#113Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#112)
#114Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#113)
#115Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#114)
#116Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#115)
#117Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#116)
#118Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#117)
#119Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#118)
#120Mikael Kjellström
mikael.kjellstrom@mksoft.nu
In reply to: Tom Lane (#100)
#121Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#114)
#122Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#121)
#123Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#122)
#124Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#123)
#125Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#124)
#126Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#124)
#127Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#126)
#128Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#109)
#129Andrew Dunstan
andrew@dunslane.net
In reply to: Kyotaro Horiguchi (#128)
#130Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#127)
#131Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andrew Dunstan (#129)