pg_rewind does not rewind diverging timelines
Hi all,
I have been playing around with various promotion scenarios to check if it
is possible to lose writes in more complicated scenarios involving
promotions and uses of synchronous_standby_names and decided to create a
TLA+ model for streaming replication involving promotions and check those
with TLC. You can find the models at [1]https://github.com/mkindahl/tla-postgres if you're interested.
There is one scenario that I assume is known that TLC found, but does not
seem to be fixed. It is a relatively rare case, but since the fix is quite
easy, I thought I'd share it with you and get feedback.
The scenario can occur if you're unlucky and have more than one crash when
promoting standbys to be primaries, and goes like this:
You have three servers, S1, S2, and S3. S1 is primary and S2 and S3 are
standbys. All are on timeline (TLI) 1.
1. S1 crashes
2. S1 recovers and starts promotion. It writes XLOG_END_OF_RECOVERY (EOR)
for TLI 2 to the WAL.
3. S1 It manages to write some records W1 to the WAL.
4. Before the EOR is replicated to any standby, S1 crashes again. It is now
on TLI 2 and has some changes that are not elsewhere.
5. S2 is promoted. It writes an EOR for TLI 2 (since it is not aware of any
other timeline) to the WAL.
6. S2 writes some records W2 to WAL and now S1 has a record of TLI 2
version 1 (TLI 2.1) and S2 is on TLI 2.2.
7. S1 recovers and wants to join as a standby. You run pg_rewind to get rid
of the extra data, but since S2 is also on TLI 2, pg_rewind will happily
assume that both are on the same timeline.
8. S2 is now a standby but has that extra record for W2 both in the WAL and
in the database.
The fix (see attached draft) is quite simple: add a UUID to the EOR and to
the history file. When comparing timelines, don't only check the TLI, also
check the UUID. If not both match, go back further until you find a
timeline where both the TLI and the timeline UUID matches and do the usual
fandango to find the good LSN to rewind to.
Attachments:
0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patch.v1application/octet-stream; name=0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patch.v1Download+349-25
On Thu, Apr 30, 2026 at 10:19 AM Mats Kindahl <mats.kindahl@gmail.com>
wrote:
Hi all,
I have been playing around with various promotion scenarios to check if it
is possible to lose writes in more complicated scenarios involving
promotions and uses of synchronous_standby_names and decided to create a
TLA+ model for streaming replication involving promotions and check those
with TLC. You can find the models at [1] if you're interested.There is one scenario that I assume is known that TLC found, but does not
seem to be fixed. It is a relatively rare case, but since the fix is quite
easy, I thought I'd share it with you and get feedback.The scenario can occur if you're unlucky and have more than one crash when
promoting standbys to be primaries, and goes like this:You have three servers, S1, S2, and S3. S1 is primary and S2 and S3 are
standbys. All are on timeline (TLI) 1.1. S1 crashes
2. S1 recovers and starts promotion. It writes XLOG_END_OF_RECOVERY (EOR)
for TLI 2 to the WAL.
3. S1 It manages to write some records W1 to the WAL.
4. Before the EOR is replicated to any standby, S1 crashes again. It is
now on TLI 2 and has some changes that are not elsewhere.
5. S2 is promoted. It writes an EOR for TLI 2 (since it is not aware of
any other timeline) to the WAL.
6. S2 writes some records W2 to WAL and now S1 has a record of TLI 2
version 1 (TLI 2.1) and S2 is on TLI 2.2.
7. S1 recovers and wants to join as a standby. You run pg_rewind to get
rid of the extra data, but since S2 is also on TLI 2, pg_rewind will
happily assume that both are on the same timeline.
8. S2 is now a standby but has that extra record for W2 both in the WAL
and in the database.The fix (see attached draft) is quite simple: add a UUID to the EOR and to
the history file. When comparing timelines, don't only check the TLI, also
check the UUID. If not both match, go back further until you find a
timeline where both the TLI and the timeline UUID matches and do the usual
fandango to find the good LSN to rewind to.
Here is an updated version of the patch. It seems like it is not necessary
to extend the XLOG_END_OF_RECOVERY record with the UUID, just the history
files. The scenario is still the same though, and can trigger diverging
servers, possibly silent. I have an additional test case using a divergence
going back three promotions.
--
Best wishes,
Mats Kindahl, Multigres Developer, Supabase
Attachments:
v2.0002-pg_rewind-test-rewind-across-UUID-mismatched-TLI.patchtext/x-patch; charset=US-ASCII; name=v2.0002-pg_rewind-test-rewind-across-UUID-mismatched-TLI.patchDownload+121-1
v2.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchtext/x-patch; charset=US-ASCII; name=v2.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchDownload+349-25
Hi Mats,
Thanks for picking this up -- the scenario is a real one and I think the
UUID-tagging approach is a clean way to solve it. v2 applies and builds
without trouble, and the core algorithm reads well to me.
I have a handful of observations that I'd love your thoughts.
Regarding Correctness I have the below thoughts
1. UUIDv7 timestamp epoch.
In StartupXLOG():
TimestampTz now = GetCurrentTimestamp();
generate_uuidv7_r(&uuid_buf, (uint64)(now / 1000),
(uint32)(now % 1000) * 1000);
I think there might be a small mismatch here: GetCurrentTimestamp() returns
microseconds since the Postgres epoch (2000-01-01),
whereas generate_uuidv7_r describes its first argument as milliseconds
since the Unix epoch.
As written that 30-year offset would land in the UUID's timestamp field, so
the resulting UUID wouldn't be a conformant UUIDv7 and wouldn't
time-order against UUIDv7s generated through the SQL functions.
Uniqueness is preserved either way, so the rewind logic still works as
intended but it seemed worth flagging.
I see conversion that's used elsewhere as:
us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE)
* SECS_PER_DAY * USECS_PER_SEC;
Or, since promotion isn't on a hot path, gettimeofday() / time(NULL)
directly would also be fine.
2. EOR-record path, the intent is unclear.
The comment above generate_uuidv7_r() at says:
"The same UUID is written into the history file and later into the
XLOG_END_OF_RECOVERY record so that pg_rewind can distinguish two
servers..."
But from what I can see only the history-file part actually lands.
xl_end_of_recovery is unchanged, CreateEndOfRecoveryRecord() doesn't add
the UUID, and XLogCtl->ThisTimeLineUUID is written under info_lck without a
reader (I couldn't grep it).
The xlog_redo() memset() + Min(rec_len, sizeof(...)) change reads like
preparation for an EOR-struct extension that ended up not being part of the
patch.
Was the EOR-record piece something you intended to keep for a follow-up, or
has it been superseded by the history-file approach?
3. Malformed UUID handling in readTimeLineHistory().
The optional field-4 path is:
if (nfields == 4 && strlen(uuid_str) == UUID_STR_LEN)
{
Datum datum = DirectFunctionCall1(uuid_in,
CStringGetDatum(uuid_str));
...
}
uuid_in() raises ereport(ERROR) on a malformed input, while the surrounding
syntax-error paths in readTimeLineHistory() use FATAL deliberately.
In practice an ERROR during startup ends up being fatal too, so this isn't
strictly a bug but it would be nicer to stay consistent.
Regarding the Tests I have the following thoughts
The two new cases are nice, a few extensions that I think would strengthen
them:
1. A mixed-version case where one side has a zero UUID. That's the path
we're claiming is graceful, but nothing currently exercises it
2. A deeper-divergence case (e.g. TLI1->2->3 vs TLI1->2->3') so that
findCommonAncestorTimeline's loop walks past matching entries
before hitting the mismatch. The 0002 test puts the divergence at
depth 1.
3. A small assertion against the on-disk 00000002.history contents, to pin
down the file format.
4. On 0002 the dependency on restore_command pointing at node_x's pg_wal is
the kind of thing that tends to break under
environment changes. A CHECKPOINT on node_x before the backup, or
wal_keep_size as in 0001, would let the test stand on its own.
I'm happy to keep reviewing/contributing, thanks again for working on it.
Regards,
Surya Poondla
Show quoted text
On Fri, May 22, 2026 at 12:09 AM surya poondla <suryapoondla4@gmail.com>
wrote:
Hi Mats,
Thanks for picking this up -- the scenario is a real one and I think the
UUID-tagging approach is a clean way to solve it. v2 applies and builds
without trouble, and the core algorithm reads well to me.
I have a handful of observations that I'd love your thoughts.
Hi Surya,
Thank you for the review. It is a quite rare scenario, but it is real and
the fix is simple.
Regarding Correctness I have the below thoughts
1. UUIDv7 timestamp epoch.
In StartupXLOG():
TimestampTz now = GetCurrentTimestamp();
generate_uuidv7_r(&uuid_buf, (uint64)(now / 1000),
(uint32)(now % 1000) * 1000);I think there might be a small mismatch here: GetCurrentTimestamp()
returns microseconds since the Postgres epoch (2000-01-01),
whereas generate_uuidv7_r describes its first argument as milliseconds
since the Unix epoch.
As written that 30-year offset would land in the UUID's timestamp field,
so the resulting UUID wouldn't be a conformant UUIDv7 and wouldn't
time-order against UUIDv7s generated through the SQL functions.
Uniqueness is preserved either way, so the rewind logic still works as
intended but it seemed worth flagging.
I see conversion that's used elsewhere as:
us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE)
* SECS_PER_DAY * USECS_PER_SEC;Or, since promotion isn't on a hot path, gettimeofday() / time(NULL)
directly would also be fine.
Yes, the intention was to use a proper timestamp to allow debugging servers
if necessary. Switched to gettimeofday() and used 0 for sub-ms since this
is not going to be critical. (We could use ns here as well, but that would
only solve a race if you have two servers being promoted in the same ms,
which I find unlikely, and there is a random number added for that
situation.)
2. EOR-record path, the intent is unclear.
The comment above generate_uuidv7_r() at says:
"The same UUID is written into the history file and later into the
XLOG_END_OF_RECOVERY record so that pg_rewind can distinguish two
servers..."But from what I can see only the history-file part actually lands.
xl_end_of_recovery is unchanged, CreateEndOfRecoveryRecord() doesn't add
the UUID, and XLogCtl->ThisTimeLineUUID is written under info_lck without a
reader (I couldn't grep it).The xlog_redo() memset() + Min(rec_len, sizeof(...)) change reads like
preparation for an EOR-struct extension that ended up not being part of the
patch.Was the EOR-record piece something you intended to keep for a follow-up,
or has it been superseded by the history-file approach?
No, the EOR changes are not needed for the promotion, contrary to what I
originally thought. Cleaned up the comment and the code and removed all
traces of changes to the EOR (I hope).
3. Malformed UUID handling in readTimeLineHistory().
The optional field-4 path is:
if (nfields == 4 && strlen(uuid_str) == UUID_STR_LEN)
{
Datum datum = DirectFunctionCall1(uuid_in,
CStringGetDatum(uuid_str));
...
}uuid_in() raises ereport(ERROR) on a malformed input, while the
surrounding syntax-error paths in readTimeLineHistory() use FATAL
deliberately.
In practice an ERROR during startup ends up being fatal too, so this isn't
strictly a bug but it would be nicer to stay consistent.
Agree. I added code to capture the error and raise a FATAL instead (with
the error message from the uuid_in, in case it is modified it makes sense
to show this).
Regarding the Tests I have the following thoughts
The two new cases are nice, a few extensions that I think would strengthen
them:
1. A mixed-version case where one side has a zero UUID. That's the path
we're claiming is graceful, but nothing currently exercises it
Yes, that should work regardless of whether the source or the target has
the zero UUID.
I realized one thing: if two timelines have identical TLI but one has zero
UUID and one has not, it seems they could not come from the same promotion
(one promotion happened on an old server and the other one on a new
server), that is, they should be treated as different. Does that make
sense? I made the necessary changes in the attached patches for testing.
Please have a look.
2. A deeper-divergence case (e.g. TLI1->2->3 vs TLI1->2->3') so that
findCommonAncestorTimeline's loop walks past matching entries
before hitting the mismatch. The 0002 test puts the divergence at
depth 1.
I was unsure if this test was necessary or interesting, hence a separate
commit. Since you thought it was useful, it's now rolled into the patch and
I extended the tests with the scenarios you suggested.
I also did some refactorings of the tests to avoid duplication. More below.
3. A small assertion against the on-disk 00000002.history contents, to pin
down the file format.
4. On 0002 the dependency on restore_command pointing at node_x's pg_wal
is the kind of thing that tends to break under
environment changes. A CHECKPOINT on node_x before the backup, or
wal_keep_size as in 0001, would let the test stand on its own.
Good point.
I refactored the code to avoid some duplication and make the test flow
self-explanatory and as part of that I set the wal_keep_size for all nodes.
In the process I noticed that many of the functions in RewindTest.pm do the
same job as the primitives I wrote, but have hard-coded variable names. I
could rewrite them to take parameters, but that would be quite a big patch
to add additional changes to each call site, so I did not do that and
rather added small wrappers specific for the tests in 005_same_timeline.pl.
Attached a new version of the now single patch.
I'm happy to keep reviewing/contributing, thanks again for working on it.
Thank you for reviewing it.
--
Best wishes,
Mats Kindahl, Multigres Developer, Supabase
Attachments:
v3.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchtext/x-patch; charset=US-ASCII; name=v3.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchDownload+613-24
Hi, Mats
On Sun, 24 May 2026 at 20:30, Mats Kindahl <mats.kindahl@gmail.com> wrote:
On Fri, May 22, 2026 at 12:09 AM surya poondla <suryapoondla4@gmail.com> wrote:
Hi Mats,
Thanks for picking this up -- the scenario is a real one and I think the UUID-tagging approach is a clean way to
solve it. v2 applies and builds without trouble, and the core algorithm reads well to me.
I have a handful of observations that I'd love your thoughts.Hi Surya,
Thank you for the review. It is a quite rare scenario, but it is real and the fix is simple.
Regarding Correctness I have the below thoughts
1. UUIDv7 timestamp epoch.
In StartupXLOG():
TimestampTz now = GetCurrentTimestamp();
generate_uuidv7_r(&uuid_buf, (uint64)(now / 1000),
(uint32)(now % 1000) * 1000);I think there might be a small mismatch here: GetCurrentTimestamp() returns microseconds since the Postgres epoch
(2000-01-01),
whereas generate_uuidv7_r describes its first argument as milliseconds since the Unix epoch.
As written that 30-year offset would land in the UUID's timestamp field, so the resulting UUID wouldn't be a
conformant UUIDv7 and wouldn't
time-order against UUIDv7s generated through the SQL functions.Uniqueness is preserved either way, so the rewind logic still works as intended but it seemed worth flagging.
I see conversion that's used elsewhere as:
us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE)
* SECS_PER_DAY * USECS_PER_SEC;Or, since promotion isn't on a hot path, gettimeofday() / time(NULL) directly would also be fine.
Yes, the intention was to use a proper timestamp to allow debugging servers if necessary. Switched to gettimeofday() and
used 0 for sub-ms since this is not going to be critical. (We could use ns here as well, but that would only solve a race
if you have two servers being promoted in the same ms, which I find unlikely, and there is a random number added for that
situation.)2. EOR-record path, the intent is unclear.
The comment above generate_uuidv7_r() at says:
"The same UUID is written into the history file and later into the XLOG_END_OF_RECOVERY record so that pg_rewind can
distinguish two servers..."But from what I can see only the history-file part actually lands.
xl_end_of_recovery is unchanged, CreateEndOfRecoveryRecord() doesn't add the UUID, and XLogCtl->ThisTimeLineUUID is
written under info_lck without a
reader (I couldn't grep it).The xlog_redo() memset() + Min(rec_len, sizeof(...)) change reads like preparation for an EOR-struct extension that
ended up not being part of the patch.Was the EOR-record piece something you intended to keep for a follow-up, or has it been superseded by the
history-file approach?No, the EOR changes are not needed for the promotion, contrary to what I originally thought. Cleaned up the comment and
the code and removed all traces of changes to the EOR (I hope).3. Malformed UUID handling in readTimeLineHistory().
The optional field-4 path is:
if (nfields == 4 && strlen(uuid_str) == UUID_STR_LEN)
{
Datum datum = DirectFunctionCall1(uuid_in,
CStringGetDatum(uuid_str));
...
}uuid_in() raises ereport(ERROR) on a malformed input, while the surrounding syntax-error paths in readTimeLineHistory
() use FATAL deliberately.
In practice an ERROR during startup ends up being fatal too, so this isn't strictly a bug but it would be nicer to
stay consistent.Agree. I added code to capture the error and raise a FATAL instead (with the error message from the uuid_in, in case it
is modified it makes sense to show this).Regarding the Tests I have the following thoughts
The two new cases are nice, a few extensions that I think would strengthen them:
1. A mixed-version case where one side has a zero UUID. That's the path we're claiming is graceful, but nothing
currently exercises itYes, that should work regardless of whether the source or the target has the zero UUID.
I realized one thing: if two timelines have identical TLI but one has zero UUID and one has not, it seems they could not
come from the same promotion (one promotion happened on an old server and the other one on a new server), that is, they
should be treated as different. Does that make sense? I made the necessary changes in the attached patches for testing.
Please have a look.2. A deeper-divergence case (e.g. TLI1->2->3 vs TLI1->2->3') so that findCommonAncestorTimeline's loop walks past
matching entries
before hitting the mismatch. The 0002 test puts the divergence at depth 1.I was unsure if this test was necessary or interesting, hence a separate commit. Since you thought it was useful, it's
now rolled into the patch and I extended the tests with the scenarios you suggested.I also did some refactorings of the tests to avoid duplication. More below.
3. A small assertion against the on-disk 00000002.history contents, to pin down the file format.
4. On 0002 the dependency on restore_command pointing at node_x's pg_wal is the kind of thing that tends to break
under
environment changes. A CHECKPOINT on node_x before the backup, or wal_keep_size as in 0001, would let the test
stand on its own.Good point.
I refactored the code to avoid some duplication and make the test flow self-explanatory and as part of that I set the
wal_keep_size for all nodes.In the process I noticed that many of the functions in RewindTest.pm do the same job as the primitives I wrote, but have
hard-coded variable names. I could rewrite them to take parameters, but that would be quite a big patch to add additional
changes to each call site, so I did not do that and rather added small wrappers specific for the tests in
005_same_timeline.pl⚠️.Attached a new version of the now single patch.
I'm happy to keep reviewing/contributing, thanks again for working on it.
Thank you for reviewing it.
Thank you for your work. I have one comment.
+ a = &tlh->source[tlh->sourceNentries - 2].tluuid;
+ b = &tlh->target[tlh->targetNentries - 2].tluuid;
+
+ if (memcmp(a, &zero, UUID_LEN) == 0 && memcmp(b, &zero, UUID_LEN) == 0)
+ return true;
+
+ return memcmp(a, b, UUID_LEN) == 0;
Since we already have matchingTimelineUUID(), the above code can be simplified
using it.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
Hi Japin,
On Mon, May 25, 2026 at 7:21 AM Japin Li <japinli@hotmail.com> wrote:
Hi, Mats
On Sun, 24 May 2026 at 20:30, Mats Kindahl <mats.kindahl@gmail.com> wrote:
On Fri, May 22, 2026 at 12:09 AM surya poondla <suryapoondla4@gmail.com>
wrote:
Hi Mats,
Thanks for picking this up -- the scenario is a real one and I think
the UUID-tagging approach is a clean way to
solve it. v2 applies and builds without trouble, and the core algorithm
reads well to me.
I have a handful of observations that I'd love your thoughts.
Hi Surya,
Thank you for the review. It is a quite rare scenario, but it is real
and the fix is simple.
Regarding Correctness I have the below thoughts
1. UUIDv7 timestamp epoch.
In StartupXLOG():
TimestampTz now = GetCurrentTimestamp();
generate_uuidv7_r(&uuid_buf, (uint64)(now / 1000),
(uint32)(now % 1000) * 1000);I think there might be a small mismatch here: GetCurrentTimestamp()
returns microseconds since the Postgres epoch
(2000-01-01),
whereas generate_uuidv7_r describes its first argument as millisecondssince the Unix epoch.
As written that 30-year offset would land in the UUID's timestamp
field, so the resulting UUID wouldn't be a
conformant UUIDv7 and wouldn't
time-order against UUIDv7s generated through the SQL functions.Uniqueness is preserved either way, so the rewind logic still works as
intended but it seemed worth flagging.
I see conversion that's used elsewhere as:
us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE)
* SECS_PER_DAY * USECS_PER_SEC;Or, since promotion isn't on a hot path, gettimeofday() / time(NULL)
directly would also be fine.
Yes, the intention was to use a proper timestamp to allow debugging
servers if necessary. Switched to gettimeofday() and
used 0 for sub-ms since this is not going to be critical. (We could use
ns here as well, but that would only solve a race
if you have two servers being promoted in the same ms, which I find
unlikely, and there is a random number added for that
situation.)
2. EOR-record path, the intent is unclear.
The comment above generate_uuidv7_r() at says:
"The same UUID is written into the history file and later into the
XLOG_END_OF_RECOVERY record so that pg_rewind can
distinguish two servers..."
But from what I can see only the history-file part actually lands.
xl_end_of_recovery is unchanged, CreateEndOfRecoveryRecord() doesn'tadd the UUID, and XLogCtl->ThisTimeLineUUID is
written under info_lck without a
reader (I couldn't grep it).The xlog_redo() memset() + Min(rec_len, sizeof(...)) change reads like
preparation for an EOR-struct extension that
ended up not being part of the patch.
Was the EOR-record piece something you intended to keep for a
follow-up, or has it been superseded by the
history-file approach?
No, the EOR changes are not needed for the promotion, contrary to what I
originally thought. Cleaned up the comment and
the code and removed all traces of changes to the EOR (I hope).
3. Malformed UUID handling in readTimeLineHistory().
The optional field-4 path is:
if (nfields == 4 && strlen(uuid_str) == UUID_STR_LEN)
{
Datum datum = DirectFunctionCall1(uuid_in,CStringGetDatum(uuid_str));
...
}uuid_in() raises ereport(ERROR) on a malformed input, while the
surrounding syntax-error paths in readTimeLineHistory
() use FATAL deliberately.
In practice an ERROR during startup ends up being fatal too, so thisisn't strictly a bug but it would be nicer to
stay consistent.
Agree. I added code to capture the error and raise a FATAL instead (with
the error message from the uuid_in, in case it
is modified it makes sense to show this).
Regarding the Tests I have the following thoughts
The two new cases are nice, a few extensions that I think would
strengthen them:
1. A mixed-version case where one side has a zero UUID. That's the path
we're claiming is graceful, but nothing
currently exercises it
Yes, that should work regardless of whether the source or the target has
the zero UUID.
I realized one thing: if two timelines have identical TLI but one has
zero UUID and one has not, it seems they could not
come from the same promotion (one promotion happened on an old server
and the other one on a new server), that is, they
should be treated as different. Does that make sense? I made the
necessary changes in the attached patches for testing.
Please have a look.
2. A deeper-divergence case (e.g. TLI1->2->3 vs TLI1->2->3') so that
findCommonAncestorTimeline's loop walks past
matching entries
before hitting the mismatch. The 0002 test puts the divergence atdepth 1.
I was unsure if this test was necessary or interesting, hence a separate
commit. Since you thought it was useful, it's
now rolled into the patch and I extended the tests with the scenarios
you suggested.
I also did some refactorings of the tests to avoid duplication. More
below.
3. A small assertion against the on-disk 00000002.history contents, to
pin down the file format.
4. On 0002 the dependency on restore_command pointing at node_x's
pg_wal is the kind of thing that tends to break
under
environment changes. A CHECKPOINT on node_x before the backup, orwal_keep_size as in 0001, would let the test
stand on its own.
Good point.
I refactored the code to avoid some duplication and make the test flow
self-explanatory and as part of that I set the
wal_keep_size for all nodes.
In the process I noticed that many of the functions in RewindTest.pm do
the same job as the primitives I wrote, but have
hard-coded variable names. I could rewrite them to take parameters, but
that would be quite a big patch to add additional
changes to each call site, so I did not do that and rather added small
wrappers specific for the tests in
005_same_timeline.pl⚠️.
Attached a new version of the now single patch.
I'm happy to keep reviewing/contributing, thanks again for working on
it.
Thank you for reviewing it.
Thank you for your work. I have one comment.
+ a = &tlh->source[tlh->sourceNentries - 2].tluuid; + b = &tlh->target[tlh->targetNentries - 2].tluuid; + + if (memcmp(a, &zero, UUID_LEN) == 0 && memcmp(b, &zero, UUID_LEN) == 0) + return true; + + return memcmp(a, b, UUID_LEN) == 0;Since we already have matchingTimelineUUID(), the above code can be
simplified
using it.
Thank you for the review. I switched to using the matchingTimelineUUID()
for this part of the code and made some other minor improvements as well.
--
Best wishes,
Mats Kindahl, Multigres Developer, Supabase
Attachments:
v4.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchtext/x-patch; charset=US-ASCII; name=v4.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchDownload+606-24
Hi, Mats
Thanks for updating the patch.
On Mon, 25 May 2026 at 20:59, Mats Kindahl <mats.kindahl@gmail.com> wrote:
Hi Japin,
On Mon, May 25, 2026 at 7:21 AM Japin Li <japinli@hotmail.com> wrote:
Hi, Mats
On Sun, 24 May 2026 at 20:30, Mats Kindahl <mats.kindahl@gmail.com> wrote:
On Fri, May 22, 2026 at 12:09 AM surya poondla <suryapoondla4@gmail.com> wrote:
Hi Mats,
Thanks for picking this up -- the scenario is a real one and I think the UUID-tagging approach is a clean way to
solve it. v2 applies and builds without trouble, and the core algorithm reads well to me.
I have a handful of observations that I'd love your thoughts.Hi Surya,
Thank you for the review. It is a quite rare scenario, but it is real and the fix is simple.
Regarding Correctness I have the below thoughts
1. UUIDv7 timestamp epoch.
In StartupXLOG():
TimestampTz now = GetCurrentTimestamp();
generate_uuidv7_r(&uuid_buf, (uint64)(now / 1000),
(uint32)(now % 1000) * 1000);I think there might be a small mismatch here: GetCurrentTimestamp() returns microseconds since the Postgres epoch
(2000-01-01),
whereas generate_uuidv7_r describes its first argument as milliseconds since the Unix epoch.
As written that 30-year offset would land in the UUID's timestamp field, so the resulting UUID wouldn't be a
conformant UUIDv7 and wouldn't
time-order against UUIDv7s generated through the SQL functions.Uniqueness is preserved either way, so the rewind logic still works as intended but it seemed worth flagging.
I see conversion that's used elsewhere as:
us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE)
* SECS_PER_DAY * USECS_PER_SEC;Or, since promotion isn't on a hot path, gettimeofday() / time(NULL) directly would also be fine.
Yes, the intention was to use a proper timestamp to allow debugging servers if necessary. Switched to gettimeofday
() and
used 0 for sub-ms since this is not going to be critical. (We could use ns here as well, but that would only solve
a race
if you have two servers being promoted in the same ms, which I find unlikely, and there is a random number added
for that
situation.)
2. EOR-record path, the intent is unclear.
The comment above generate_uuidv7_r() at says:
"The same UUID is written into the history file and later into the XLOG_END_OF_RECOVERY record so that pg_rewind
can
distinguish two servers..."
But from what I can see only the history-file part actually lands.
xl_end_of_recovery is unchanged, CreateEndOfRecoveryRecord() doesn't add the UUID, and XLogCtl->ThisTimeLineUUIDis
written under info_lck without a
reader (I couldn't grep it).The xlog_redo() memset() + Min(rec_len, sizeof(...)) change reads like preparation for an EOR-struct extension
that
ended up not being part of the patch.
Was the EOR-record piece something you intended to keep for a follow-up, or has it been superseded by the
history-file approach?No, the EOR changes are not needed for the promotion, contrary to what I originally thought. Cleaned up the comment
and
the code and removed all traces of changes to the EOR (I hope).
3. Malformed UUID handling in readTimeLineHistory().
The optional field-4 path is:
if (nfields == 4 && strlen(uuid_str) == UUID_STR_LEN)
{
Datum datum = DirectFunctionCall1(uuid_in,
CStringGetDatum(uuid_str));
...
}uuid_in() raises ereport(ERROR) on a malformed input, while the surrounding syntax-error paths in
readTimeLineHistory
() use FATAL deliberately.
In practice an ERROR during startup ends up being fatal too, so this isn't strictly a bug but it would be nicer to
stay consistent.Agree. I added code to capture the error and raise a FATAL instead (with the error message from the uuid_in, in
case it
is modified it makes sense to show this).
Regarding the Tests I have the following thoughts
The two new cases are nice, a few extensions that I think would strengthen them:
1. A mixed-version case where one side has a zero UUID. That's the path we're claiming is graceful, but nothing
currently exercises itYes, that should work regardless of whether the source or the target has the zero UUID.
I realized one thing: if two timelines have identical TLI but one has zero UUID and one has not, it seems they
could not
come from the same promotion (one promotion happened on an old server and the other one on a new server), that is,
they
should be treated as different. Does that make sense? I made the necessary changes in the attached patches for
testing.
Please have a look.
2. A deeper-divergence case (e.g. TLI1->2->3 vs TLI1->2->3') so that findCommonAncestorTimeline's loop walks past
matching entries
before hitting the mismatch. The 0002 test puts the divergence at depth 1.I was unsure if this test was necessary or interesting, hence a separate commit. Since you thought it was useful,
it's
now rolled into the patch and I extended the tests with the scenarios you suggested.
I also did some refactorings of the tests to avoid duplication. More below.
3. A small assertion against the on-disk 00000002.history contents, to pin down the file format.
4. On 0002 the dependency on restore_command pointing at node_x's pg_wal is the kind of thing that tends to break
under
environment changes. A CHECKPOINT on node_x before the backup, or wal_keep_size as in 0001, would let thetest
stand on its own.
Good point.
I refactored the code to avoid some duplication and make the test flow self-explanatory and as part of that I set
the
wal_keep_size for all nodes.
In the process I noticed that many of the functions in RewindTest.pm do the same job as the primitives I wrote, but
have
hard-coded variable names. I could rewrite them to take parameters, but that would be quite a big patch to add
additional
changes to each call site, so I did not do that and rather added small wrappers specific for the tests in
005_same_timeline.pl⚠️⚠️.Attached a new version of the now single patch.
I'm happy to keep reviewing/contributing, thanks again for working on it.
Thank you for reviewing it.
Thank you for your work. I have one comment.
+ a = &tlh->source[tlh->sourceNentries - 2].tluuid; + b = &tlh->target[tlh->targetNentries - 2].tluuid; + + if (memcmp(a, &zero, UUID_LEN) == 0 && memcmp(b, &zero, UUID_LEN) == 0) + return true; + + return memcmp(a, b, UUID_LEN) == 0;Since we already have matchingTimelineUUID(), the above code can be simplified
using it.Thank you for the review. I switched to using the matchingTimelineUUID() for this part of the code and made some other
minor improvements as well.
Here are some comments on v4.
1.
+/*
+ * Timeline histories for both clusters, populated by timelines_match().
+ */
I don't see a timelines_match() function. Does this refer to
matchAndFetchTimelines()?
2.
+typedef struct TimelineHistoriesData
+{
+ TimeLineHistoryEntry *source,
+ *target;
+ int sourceNentries,
+ targetNentries;
+} TimelineHistoriesData;
I'd prefer to use TimeLineHistoriesData to stay consistent with
TimeLineHistoryEntry. Anyway I'm not instant on it.
3.
+typedef TimelineHistoriesData * TimelineHistories;
The space between * and TimelineHistories is unnecessary — see
StringInfoData and other typedefs.
4.
+# node_x and node_b both start from the same TLI 1 baseline.
+my ($node_x, $node_b2) =
+ setup_standbys_from_origin($node_origin2, 'node_x', 'node_b2');
There appears to be a typo in the comment. The node_b should be node_b2.
Everything else looks good. Thank you again for updating the patch!
--
Best wishes,
Mats Kindahl, Multigres Developer, Supabase
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
Hi Japin,
Thanks for reviewing the patch.
On Tue, May 26, 2026 at 8:56 AM Japin Li <japinli@hotmail.com> wrote:
Hi, Mats
Thanks for updating the patch.
On Mon, 25 May 2026 at 20:59, Mats Kindahl <mats.kindahl@gmail.com> wrote:
Hi Japin,
On Mon, May 25, 2026 at 7:21 AM Japin Li <japinli@hotmail.com> wrote:
Hi, Mats
On Sun, 24 May 2026 at 20:30, Mats Kindahl <mats.kindahl@gmail.com>
wrote:
On Fri, May 22, 2026 at 12:09 AM surya poondla <
suryapoondla4@gmail.com> wrote:
Hi Mats,
Thanks for picking this up -- the scenario is a real one and I think
the UUID-tagging approach is a clean way to
solve it. v2 applies and builds without trouble, and the core
algorithm reads well to me.
I have a handful of observations that I'd love your thoughts.
Hi Surya,
Thank you for the review. It is a quite rare scenario, but it is real
and the fix is simple.
Regarding Correctness I have the below thoughts
1. UUIDv7 timestamp epoch.
In StartupXLOG():
TimestampTz now = GetCurrentTimestamp();
generate_uuidv7_r(&uuid_buf, (uint64)(now / 1000),
(uint32)(now % 1000) * 1000);I think there might be a small mismatch here: GetCurrentTimestamp()
returns microseconds since the Postgres epoch
(2000-01-01),
whereas generate_uuidv7_r describes its first argument asmilliseconds since the Unix epoch.
As written that 30-year offset would land in the UUID's timestamp
field, so the resulting UUID wouldn't be a
conformant UUIDv7 and wouldn't
time-order against UUIDv7s generated through the SQL functions.Uniqueness is preserved either way, so the rewind logic still works
as intended but it seemed worth flagging.
I see conversion that's used elsewhere as:
us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE)
* SECS_PER_DAY * USECS_PER_SEC;Or, since promotion isn't on a hot path, gettimeofday() / time(NULL)
directly would also be fine.
Yes, the intention was to use a proper timestamp to allow debugging
servers if necessary. Switched to gettimeofday
() and
used 0 for sub-ms since this is not going to be critical. (We could
use ns here as well, but that would only solve
a race
if you have two servers being promoted in the same ms, which I find
unlikely, and there is a random number added
for that
situation.)
2. EOR-record path, the intent is unclear.
The comment above generate_uuidv7_r() at says:
"The same UUID is written into the history file and later into the
XLOG_END_OF_RECOVERY record so that pg_rewind
can
distinguish two servers..."
But from what I can see only the history-file part actually lands.
xl_end_of_recovery is unchanged, CreateEndOfRecoveryRecord() doesn'tadd the UUID, and XLogCtl->ThisTimeLineUUID
is
written under info_lck without a
reader (I couldn't grep it).The xlog_redo() memset() + Min(rec_len, sizeof(...)) change reads
like preparation for an EOR-struct extension
that
ended up not being part of the patch.
Was the EOR-record piece something you intended to keep for a
follow-up, or has it been superseded by the
history-file approach?
No, the EOR changes are not needed for the promotion, contrary to
what I originally thought. Cleaned up the comment
and
the code and removed all traces of changes to the EOR (I hope).
3. Malformed UUID handling in readTimeLineHistory().
The optional field-4 path is:
if (nfields == 4 && strlen(uuid_str) == UUID_STR_LEN)
{
Datum datum = DirectFunctionCall1(uuid_in,CStringGetDatum(uuid_str));
...
}uuid_in() raises ereport(ERROR) on a malformed input, while the
surrounding syntax-error paths in
readTimeLineHistory
() use FATAL deliberately.
In practice an ERROR during startup ends up being fatal too, so thisisn't strictly a bug but it would be nicer to
stay consistent.
Agree. I added code to capture the error and raise a FATAL instead
(with the error message from the uuid_in, in
case it
is modified it makes sense to show this).
Regarding the Tests I have the following thoughts
The two new cases are nice, a few extensions that I think would
strengthen them:
1. A mixed-version case where one side has a zero UUID. That's the
path we're claiming is graceful, but nothing
currently exercises it
Yes, that should work regardless of whether the source or the target
has the zero UUID.
I realized one thing: if two timelines have identical TLI but one has
zero UUID and one has not, it seems they
could not
come from the same promotion (one promotion happened on an old server
and the other one on a new server), that is,
they
should be treated as different. Does that make sense? I made the
necessary changes in the attached patches for
testing.
Please have a look.
2. A deeper-divergence case (e.g. TLI1->2->3 vs TLI1->2->3') so that
findCommonAncestorTimeline's loop walks past
matching entries
before hitting the mismatch. The 0002 test puts the divergenceat depth 1.
I was unsure if this test was necessary or interesting, hence a
separate commit. Since you thought it was useful,
it's
now rolled into the patch and I extended the tests with the scenarios
you suggested.
I also did some refactorings of the tests to avoid duplication. More
below.
3. A small assertion against the on-disk 00000002.history contents,
to pin down the file format.
4. On 0002 the dependency on restore_command pointing at node_x's
pg_wal is the kind of thing that tends to break
under
environment changes. A CHECKPOINT on node_x before the backup,or wal_keep_size as in 0001, would let the
test
stand on its own.
Good point.
I refactored the code to avoid some duplication and make the test
flow self-explanatory and as part of that I set
the
wal_keep_size for all nodes.
In the process I noticed that many of the functions in RewindTest.pm
do the same job as the primitives I wrote, but
have
hard-coded variable names. I could rewrite them to take parameters,
but that would be quite a big patch to add
additional
changes to each call site, so I did not do that and rather added
small wrappers specific for the tests in
005_same_timeline.pl⚠️⚠️.
Attached a new version of the now single patch.
I'm happy to keep reviewing/contributing, thanks again for working
on it.
Thank you for reviewing it.
Thank you for your work. I have one comment.
+ a = &tlh->source[tlh->sourceNentries - 2].tluuid; + b = &tlh->target[tlh->targetNentries - 2].tluuid; + + if (memcmp(a, &zero, UUID_LEN) == 0 && memcmp(b, &zero,UUID_LEN) == 0)
+ return true; + + return memcmp(a, b, UUID_LEN) == 0;Since we already have matchingTimelineUUID(), the above code can be
simplified
using it.
Thank you for the review. I switched to using the matchingTimelineUUID()
for this part of the code and made some other
minor improvements as well.
Here are some comments on v4.
1. +/* + * Timeline histories for both clusters, populated by timelines_match(). + */I don't see a timelines_match() function. Does this refer to
matchAndFetchTimelines()?
Correct. Updated.
2. +typedef struct TimelineHistoriesData +{ + TimeLineHistoryEntry *source, + *target; + int sourceNentries, + targetNentries; +} TimelineHistoriesData;I'd prefer to use TimeLineHistoriesData to stay consistent with
TimeLineHistoryEntry. Anyway I'm not instant on it.
Makes sense to be consistent. Updated.
3.
+typedef TimelineHistoriesData * TimelineHistories;The space between * and TimelineHistories is unnecessary — see
StringInfoData and other typedefs.
My mistake. FIxed.
4. +# node_x and node_b both start from the same TLI 1 baseline. +my ($node_x, $node_b2) = + setup_standbys_from_origin($node_origin2, 'node_x', 'node_b2');There appears to be a typo in the comment. The node_b should be node_b2.
Right. Fixed.
Everything else looks good. Thank you again for updating the patch!
Thank you again for reviewing the patch. :)
Attached a new version of the patch with the changes you suggested.
--
Best wishes,
Mats Kindahl, Multigres Developer, Supabase
Attachments:
v5.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchtext/x-patch; charset=US-ASCII; name=v5.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchDownload+606-24
Hi, Mats
On Tue, 26 May 2026 at 18:03, Mats Kindahl <mats.kindahl@gmail.com> wrote:
Attached a new version of the patch with the changes you suggested.
I found an error on the Windows platform [1]https://cirrus-ci.com/task/6228217159221248.
[07:08:28.538] >>> MALLOC_PERTURB_=168 PG_REGRESS=C:\cirrus\build\src/test\regress\pg_regress.exe REGRESS_SHLIB=C:\cirrus\build\src/test\regress\regress.dll MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 top_builddir=C:\cirrus\build UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 PATH=C:\cirrus\build\tmp_install\usr\local\pgsql\bin;C:\cirrus\build\src\bin\pg_rewind;C:/cirrus/build/src/bin/pg_rewind/test;C:\VS_2019\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64;C:\VS_2019\MSBuild\Current\bin\Roslyn;C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64;C:\Program Files (x86)\Windows Kits\10\bin\x64;C:\VS_2019\\MSBuild\Current\Bin;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\VS_2019\Common7\IDE\;C:\VS_2019\Common7\Tools\;C:\VS_2019\VC\Auxiliary\Build;C:\zstd\zstd-v1.5.2-win64;C:\zlib;C:\lz4;C:\icu;C:\winflexbison;C:\strawberry\5.42.0.1\perl\bin;C:\python\Scripts\;C:\python\;C:\Windows Kits\10\Debuggers\x64;C:\Program Files\Git\usr\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\ProgramData\GooGet;C:\Program Files\Google\Compute Engine\metadata_scripts;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin;C:\Program Files\PowerShell\7\;C:\Program Files\Google\Compute Engine\sysprep;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin;C:\Windows\system32\config\systemprofile\AppData\Local\Microsoft\WindowsApps INITDB_TEMPLATE=C:/cirrus/build/tmp_install/initdb-template ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 share_contrib_dir=C:/cirrus/build/tmp_install//usr/local/pgsql/share/contrib C:\python\python3.EXE C:\cirrus\build\..\src/tools/testwrap --basedir C:\cirrus\build --srcdir C:\cirrus\src\bin\pg_rewind --pg-test-extra --testgroup pg_rewind --testname 005_same_timeline -- C:\strawberry\5.42.0.1\perl\bin\perl.EXE -I C:/cirrus/src/test/perl -I C:\cirrus\src\bin\pg_rewind C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl
[07:08:28.538] ------------------------------------- 8< -------------------------------------
[07:08:28.538] stderr:
[07:08:28.538] # Failed test 'pg_rewind rewinds across mismatched TLI 2 / TLI 2-prime to TLI 1'
[07:08:28.538] # at C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl line 45.
[07:08:28.538] # ---------- command failed ----------
[07:08:28.538] # pg_rewind --debug --source-pgdata C:\cirrus\build/testrun/pg_rewind/005_same_timeline\data/t_005_same_timeline_node_b2_data/pgdata --target-pgdata C:\cirrus\build/testrun/pg_rewind/005_same_timeline\data/t_005_same_timeline_node_a2_data/pgdata --no-sync --config-file C:\cirrus\build\testrun\pg_rewind\005_same_timeline\data\tmp_test_ZCeZ/target-postgresql.conf.tmp --restore-target-wal
[07:08:28.538] # -------------- stderr --------------
[07:08:28.538] # pg_rewind: using for rewind "restore_command = 'cp "C:cirrusuild/testrun/pg_rewind/005_same_timelinedata/t_005_same_timeline_node_x_data/pgdata/pg_wal/%f" "%p"'"
[07:08:28.538] # pg_rewind: Source timeline history:
[07:08:28.538] # pg_rewind: 1: 0/00000000 - 0/040000E0
[07:08:28.538] # pg_rewind: 2: 0/040000E0 - 0/00000000
[07:08:28.538] # pg_rewind: Target timeline history:
[07:08:28.538] # pg_rewind: 1: 0/00000000 - 0/040000E0
[07:08:28.538] # pg_rewind: 2: 0/040000E0 - 0/060000E0
[07:08:28.538] # pg_rewind: 3: 0/060000E0 - 0/00000000
[07:08:28.538] # pg_rewind: servers diverged at WAL location 0/040000E0 on timeline 1
[07:08:28.538] # cp: cannot stat 'C:cirrus'$'\b''uild/testrun/pg_rewind/005_same_timelinedata/t_005_same_timeline_node_x_data/pgdata/pg_wal/000000020000000000000004': No such file or directory
[07:08:28.538] # pg_rewind: error: could not restore file "000000020000000000000004" from archive
[07:08:28.538] # pg_rewind: error: could not find previous WAL record at 0/040000E0
[07:08:28.538] # ------------------------------------
[07:08:28.538] # Failed test 'rewound node reflects source history, not target TLI 2/TLI 3 data'
[07:08:28.538] # at C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl line 260.
[07:08:28.538] # got: 'origin2
[07:08:28.538] # x'
[07:08:28.538] # expected: 'b
[07:08:28.538] # origin2'
[07:08:28.538] # Looks like you failed 2 tests of 11.
[07:08:28.538]
[07:08:28.538] (test program exited with status code 2)
[07:08:28.538] ------------------------------------------------------------------------------
[07:08:28.538]
[1]: https://cirrus-ci.com/task/6228217159221248
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
Hi Japin,
On 5/29/26 04:01, Japin Li wrote:
Hi, Mats
On Tue, 26 May 2026 at 18:03, Mats Kindahl <mats.kindahl@gmail.com> wrote:
Attached a new version of the patch with the changes you suggested.
I found an error on the Windows platform [1].
[07:08:28.538] >>> MALLOC_PERTURB_=168 PG_REGRESS=C:\cirrus\build\src/test\regress\pg_regress.exe REGRESS_SHLIB=C:\cirrus\build\src/test\regress\regress.dll MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 top_builddir=C:\cirrus\build UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 PATH=C:\cirrus\build\tmp_install\usr\local\pgsql\bin;C:\cirrus\build\src\bin\pg_rewind;C:/cirrus/build/src/bin/pg_rewind/test;C:\VS_2019\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64;C:\VS_2019\MSBuild\Current\bin\Roslyn;C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64;C:\Program Files (x86)\Windows Kits\10\bin\x64;C:\VS_2019\\MSBuild\Current\Bin;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\VS_2019\Common7\IDE\;C:\VS_2019\Common7\Tools\;C:\VS_2019\VC\Auxiliary\Build;C:\zstd\zstd-v1.5.2-win64;C:\zlib;C:\lz4;C:\icu;C:\winflexbison;C:\strawberry\5.42.0.1\perl\bin;C:\python\Scripts\;C:\python\;C:\Windows Kits\10\Debuggers\x64;C:\Program Files\Git\usr\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\ProgramData\GooGet;C:\Program Files\Google\Compute Engine\metadata_scripts;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin;C:\Program Files\PowerShell\7\;C:\Program Files\Google\Compute Engine\sysprep;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin;C:\Windows\system32\config\systemprofile\AppData\Local\Microsoft\WindowsApps INITDB_TEMPLATE=C:/cirrus/build/tmp_install/initdb-template ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 share_contrib_dir=C:/cirrus/build/tmp_install//usr/local/pgsql/share/contrib C:\python\python3.EXE C:\cirrus\build\..\src/tools/testwrap --basedir C:\cirrus\build --srcdir C:\cirrus\src\bin\pg_rewind --pg-test-extra --testgroup pg_rewind --testname 005_same_timeline -- C:\strawberry\5.42.0.1\perl\bin\perl.EXE -I C:/cirrus/src/test/perl -I C:\cirrus\src\bin\pg_rewind C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl
[07:08:28.538] ------------------------------------- 8< -------------------------------------
[07:08:28.538] stderr:
[07:08:28.538] # Failed test 'pg_rewind rewinds across mismatched TLI 2 / TLI 2-prime to TLI 1'
[07:08:28.538] # at C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl line 45.
[07:08:28.538] # ---------- command failed ----------
[07:08:28.538] # pg_rewind --debug --source-pgdata C:\cirrus\build/testrun/pg_rewind/005_same_timeline\data/t_005_same_timeline_node_b2_data/pgdata --target-pgdata C:\cirrus\build/testrun/pg_rewind/005_same_timeline\data/t_005_same_timeline_node_a2_data/pgdata --no-sync --config-file C:\cirrus\build\testrun\pg_rewind\005_same_timeline\data\tmp_test_ZCeZ/target-postgresql.conf.tmp --restore-target-wal
[07:08:28.538] # -------------- stderr --------------
[07:08:28.538] # pg_rewind: using for rewind "restore_command = 'cp "C:cirrusuild/testrun/pg_rewind/005_same_timelinedata/t_005_same_timeline_node_x_data/pgdata/pg_wal/%f" "%p"'"
[07:08:28.538] # pg_rewind: Source timeline history:
[07:08:28.538] # pg_rewind: 1: 0/00000000 - 0/040000E0
[07:08:28.538] # pg_rewind: 2: 0/040000E0 - 0/00000000
[07:08:28.538] # pg_rewind: Target timeline history:
[07:08:28.538] # pg_rewind: 1: 0/00000000 - 0/040000E0
[07:08:28.538] # pg_rewind: 2: 0/040000E0 - 0/060000E0
[07:08:28.538] # pg_rewind: 3: 0/060000E0 - 0/00000000
[07:08:28.538] # pg_rewind: servers diverged at WAL location 0/040000E0 on timeline 1
[07:08:28.538] # cp: cannot stat 'C:cirrus'$'\b''uild/testrun/pg_rewind/005_same_timelinedata/t_005_same_timeline_node_x_data/pgdata/pg_wal/000000020000000000000004': No such file or directory
[07:08:28.538] # pg_rewind: error: could not restore file "000000020000000000000004" from archive
[07:08:28.538] # pg_rewind: error: could not find previous WAL record at 0/040000E0
[07:08:28.538] # ------------------------------------
[07:08:28.538] # Failed test 'rewound node reflects source history, not target TLI 2/TLI 3 data'
[07:08:28.538] # at C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl line 260.
[07:08:28.538] # got: 'origin2
[07:08:28.538] # x'
[07:08:28.538] # expected: 'b
[07:08:28.538] # origin2'
[07:08:28.538] # Looks like you failed 2 tests of 11.
[07:08:28.538]
[07:08:28.538] (test program exited with status code 2)
[07:08:28.538] ------------------------------------------------------------------------------
[07:08:28.538]
Thanks for testing it on Windows.
It seems like the path needs to be cleaned on Windows. I checked
Cluster.pm and created a version of that code and added that to the test
that should work. See attached patch.
I noted that many of the paths are not platform-agnostic. It an idea to
switch to use something like File::Spec instead and build paths using
that, but it's out of scope for this patch.
Best wishes,
Mats Kindahl, Multigres Engineer, Supabase
Attachments:
v6.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchtext/x-patch; charset=UTF-8; name=v6.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patchDownload+615-24
Hi, Mats
On Sat, 30 May 2026 at 22:26, Mats Kindahl <mats.kindahl@gmail.com> wrote:
Hi Japin,
On 5/29/26 04:01, Japin Li wrote:
Hi, Mats
On Tue, 26 May 2026 at 18:03, Mats Kindahl <mats.kindahl@gmail.com> wrote:
Attached a new version of the patch with the changes you suggested.
I found an error on the Windows platform [1].
[07:08:28.538] >>> MALLOC_PERTURB_=168
PG_REGRESS=C:\cirrus\build\src/test\regress\pg_regress.exe
REGRESS_SHLIB=C:\cirrus\build\src/test\regress\regress.dll
MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1
top_builddir=C:\cirrus\build
UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1
MESON_TEST_ITERATION=1
PATH=C:\cirrus\build\tmp_install\usr\local\pgsql\bin;C:\cirrus\build\src\bin\pg_rewind;C:/cirrus/build/src/bin/pg_rewind/test;C:\VS_2019\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64;C:\VS_2019\MSBuild\Current\bin\Roslyn;C:\Program
Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64;C:\Program Files
(x86)\Windows
Kits\10\bin\x64;C:\VS_2019\\MSBuild\Current\Bin;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\VS_2019\Common7\IDE\;C:\VS_2019\Common7\Tools\;C:\VS_2019\VC\Auxiliary\Build;C:\zstd\zstd-v1.5.2-win64;C:\zlib;C:\lz4;C:\icu;C:\winflexbison;C:\strawberry\5.42.0.1\perl\bin;C:\python\Scripts\;C:\python\;C:\Windows
Kits\10\Debuggers\x64;C:\Program
Files\Git\usr\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\ProgramData\GooGet;C:\Program
Files\Google\Compute Engine\metadata_scripts;C:\Program Files
(x86)\Google\Cloud SDK\google-cloud-sdk\bin;C:\Program
Files\PowerShell\7\;C:\Program Files\Google\Compute
Engine\sysprep;C:\ProgramData\chocolatey\bin;C:\Program
Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program
Files\Git\usr\bin;C:\Windows\system32\config\systemprofile\AppData\Local\Microsoft\WindowsApps
INITDB_TEMPLATE=C:/cirrus/build/tmp_install/initdb-template
ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1
share_contrib_dir=C:/cirrus/build/tmp_install//usr/local/pgsql/share/contrib
C:\python\python3.EXE C:\cirrus\build\..\src/tools/testwrap
--basedir C:\cirrus\build --srcdir C:\cirrus\src\bin\pg_rewind
--pg-test-extra --testgroup pg_rewind --testname 005_same_timeline
-- C:\strawberry\5.42.0.1\perl\bin\perl.EXE -I
C:/cirrus/src/test/perl -I C:\cirrus\src\bin\pg_rewind
C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl
[07:08:28.538] ------------------------------------- 8< -------------------------------------
[07:08:28.538] stderr:
[07:08:28.538] # Failed test 'pg_rewind rewinds across mismatched TLI 2 / TLI 2-prime to TLI 1'
[07:08:28.538] # at C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl line 45.
[07:08:28.538] # ---------- command failed ----------
[07:08:28.538] # pg_rewind --debug --source-pgdata
C:\cirrus\build/testrun/pg_rewind/005_same_timeline\data/t_005_same_timeline_node_b2_data/pgdata
--target-pgdata
C:\cirrus\build/testrun/pg_rewind/005_same_timeline\data/t_005_same_timeline_node_a2_data/pgdata
--no-sync --config-file
C:\cirrus\build\testrun\pg_rewind\005_same_timeline\data\tmp_test_ZCeZ/target-postgresql.conf.tmp
--restore-target-wal
[07:08:28.538] # -------------- stderr --------------
[07:08:28.538] # pg_rewind: using for rewind "restore_command = 'cp "C:cirrusuild/testrun/pg_rewind/005_same_timelinedata/t_005_same_timeline_node_x_data/pgdata/pg_wal/%f" "%p"'"
[07:08:28.538] # pg_rewind: Source timeline history:
[07:08:28.538] # pg_rewind: 1: 0/00000000 - 0/040000E0
[07:08:28.538] # pg_rewind: 2: 0/040000E0 - 0/00000000
[07:08:28.538] # pg_rewind: Target timeline history:
[07:08:28.538] # pg_rewind: 1: 0/00000000 - 0/040000E0
[07:08:28.538] # pg_rewind: 2: 0/040000E0 - 0/060000E0
[07:08:28.538] # pg_rewind: 3: 0/060000E0 - 0/00000000
[07:08:28.538] # pg_rewind: servers diverged at WAL location 0/040000E0 on timeline 1
[07:08:28.538] # cp: cannot stat 'C:cirrus'$'\b''uild/testrun/pg_rewind/005_same_timelinedata/t_005_same_timeline_node_x_data/pgdata/pg_wal/000000020000000000000004': No such file or directory
[07:08:28.538] # pg_rewind: error: could not restore file "000000020000000000000004" from archive
[07:08:28.538] # pg_rewind: error: could not find previous WAL record at 0/040000E0
[07:08:28.538] # ------------------------------------
[07:08:28.538] # Failed test 'rewound node reflects source history, not target TLI 2/TLI 3 data'
[07:08:28.538] # at C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl line 260.
[07:08:28.538] # got: 'origin2
[07:08:28.538] # x'
[07:08:28.538] # expected: 'b
[07:08:28.538] # origin2'
[07:08:28.538] # Looks like you failed 2 tests of 11.
[07:08:28.538]
[07:08:28.538] (test program exited with status code 2)
[07:08:28.538] ------------------------------------------------------------------------------
[07:08:28.538]Thanks for testing it on Windows.
It seems like the path needs to be cleaned on Windows. I checked
Cluster.pm and created a version of that code and added that to the
test that should work. See attached patch.I noted that many of the paths are not platform-agnostic. It an idea
to switch to use something like File::Spec instead and build paths
using that, but it's out of scope for this patch.
Thanks for updating the patch. LGTM.
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.