Recovery of .partial WAL segments

Started by Stefan Fercotalmost 2 years ago7 messages
#1Stefan Fercot
stefan.fercot@protonmail.com
1 attachment(s)

Dear hackers,

Generating a ".partial" WAL segment is pretty common nowadays (using pg_receivewal or during standby promotion).
However, we currently don't do anything with it unless the user manually removes that ".partial" extension.

The 028_pitr_timelines tests are highlighting that fact: with test data being being in 000000020000000000000003 and 000000010000000000000003.partial, a recovery following the latest timeline (2) will succeed but fail if we follow the current timeline (1).

By simply trying to fetch the ".partial" file in XLogFileRead, we can easily recover more data and also cover that (current timeline) recovery case.

So, this proposed patch makes XLogFileRead try to restore ".partial" WAL archives and adds a test to 028_pitr_timelines using current recovery_target_timeline.

As far as I've seen, the current pg_receivewal tests only seem to cover the archives generation but not actually trying to recover using it. I wasn't sure it was interesting to add such tests right now, so I didn't considered it for this patch.

Many thanks in advance for your feedback and thoughts about this,
Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

Attachments:

v1-0001-Make-XLogFileRead-try-to-restore-.partial-wal-arc.patchtext/x-patch; name=v1-0001-Make-XLogFileRead-try-to-restore-.partial-wal-arc.patchDownload
From 8c73284b72120ddf3537e1632f12455502d47f6d Mon Sep 17 00:00:00 2001
From: Stefan Fercot <pgstef@fercot.be>
Date: Fri, 5 Apr 2024 10:57:03 +0200
Subject: [PATCH v1.] Make XLogFileRead try to restore .partial wal archives

Try to restore the normal archived wal segment first and, if not found, then try to restore the archived .partial wal segment.
This is safe because the next completed wal segment should contain at least the same data.
---
 src/backend/access/transam/xlogrecovery.c | 20 ++++++++++--
 src/test/recovery/t/028_pitr_timelines.pl | 37 ++++++++++++++++++++++-
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index b2fe2d04cc..6d51b6a296 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4206,6 +4206,8 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 	char		activitymsg[MAXFNAMELEN + 16];
 	char		path[MAXPGPATH];
 	int			fd;
+	char	   *partialxlogfname;
+	bool		restoredArchivedFile;
 
 	XLogFileName(xlogfname, tli, segno, wal_segment_size);
 
@@ -4217,10 +4219,24 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
-			if (!RestoreArchivedFile(path, xlogfname,
+			/*
+			 * Try to restore the normal wal segment first and, if not found,
+			 * then try to restore the .partial wal segment.
+			 */
+
+			partialxlogfname = psprintf("%s.partial", xlogfname);
+
+			restoredArchivedFile = !RestoreArchivedFile(path, xlogfname,
+														"RECOVERYXLOG",
+														wal_segment_size,
+														InRedo) &&
+				!RestoreArchivedFile(path, partialxlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
-									 InRedo))
+									 InRedo);
+
+			pfree(partialxlogfname);
+			if (restoredArchivedFile)
 				return -1;
 			break;
 
diff --git a/src/test/recovery/t/028_pitr_timelines.pl b/src/test/recovery/t/028_pitr_timelines.pl
index 4b7d825b71..512695cd0a 100644
--- a/src/test/recovery/t/028_pitr_timelines.pl
+++ b/src/test/recovery/t/028_pitr_timelines.pl
@@ -110,7 +110,7 @@ $node_standby->stop;
 # segment 000000020000000000000003, before the timeline switching
 # record.  (They are also present in the
 # 000000010000000000000003.partial file, but .partial files are not
-# used automatically.)
+# used when recovering along the latest timeline by default.)
 
 # Now test PITR to the recovery target.  It should find the WAL in
 # segment 000000020000000000000003, but not follow the timeline switch
@@ -173,4 +173,39 @@ $node_pitr2->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")
 $result = $node_pitr2->safe_psql('postgres', "SELECT max(i) FROM foo;");
 is($result, qq{3}, "check table contents after point-in-time recovery");
 
+# The 000000010000000000000003.partial file could have been generated
+# by pg_receivewal without any standby node involved. In this case, we
+# wouldn't be able to recover from 000000020000000000000003.
+# Now, test PITR to the initial recovery target staying on the backup's
+# current timeline, trying to fetch the data from the
+# 000000010000000000000003.partial file.
+
+my $node_pitr3 = PostgreSQL::Test::Cluster->new('node_pitr3');
+$node_pitr3->init_from_backup(
+	$node_primary, $backup_name,
+	standby => 0,
+	has_restoring => 1);
+$node_pitr3->append_conf(
+	'postgresql.conf', qq{
+recovery_target_name = 'rp'
+recovery_target_action = 'promote'
+recovery_target_timeline = 'current'
+});
+
+my $log_offset = -s $node_pitr3->logfile;
+$node_pitr3->start;
+
+ok( $node_pitr3->log_contains(
+		"restored log file \"000000010000000000000003.partial\" from archive",
+		$log_offset),
+	"restored 000000010000000000000003.partial");
+
+# Wait until recovery finishes.
+$node_pitr3->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")
+  or die "Timed out while waiting for PITR promotion";
+
+# Check that we see the data we expect.
+$result = $node_pitr3->safe_psql('postgres', "SELECT max(i) FROM foo;");
+is($result, qq{1}, "check table contents after point-in-time recovery");
+
 done_testing();
-- 
2.34.1

#2Stefan Fercot
stefan.fercot@dataegret.com
In reply to: Stefan Fercot (#1)
Re: Recovery of .partial WAL segments

Hi,

I've added a CF entry for this patch:
https://commitfest.postgresql.org/49/5148/

Not sure why CFbot CI fails on macOS/Windows while it works with the Github
CI on my fork (
https://cirrus-ci.com/github/pgstef/postgres/partial-walseg-recovery).

Many thanks in advance for your feedback and thoughts about this patch,
Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

On Thu, Aug 1, 2024 at 10:23 PM Stefan Fercot <stefan.fercot@protonmail.com>
wrote:

Show quoted text

Dear hackers,

Generating a ".partial" WAL segment is pretty common nowadays (using
pg_receivewal or during standby promotion).
However, we currently don't do anything with it unless the user manually
removes that ".partial" extension.

The 028_pitr_timelines tests are highlighting that fact: with test data
being being in 000000020000000000000003 and
000000010000000000000003.partial, a recovery following the latest timeline
(2) will succeed but fail if we follow the current timeline (1).

By simply trying to fetch the ".partial" file in XLogFileRead, we can
easily recover more data and also cover that (current timeline) recovery
case.

So, this proposed patch makes XLogFileRead try to restore ".partial" WAL
archives and adds a test to 028_pitr_timelines using current
recovery_target_timeline.

As far as I've seen, the current pg_receivewal tests only seem to cover
the archives generation but not actually trying to recover using it. I
wasn't sure it was interesting to add such tests right now, so I didn't
considered it for this patch.

Many thanks in advance for your feedback and thoughts about this,
Kind Regards,
--
Stefan FERCOT
Data Egret (https://dataegret.com)

#3Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Stefan Fercot (#2)
Re: Recovery of .partial WAL segments

On Fri, Aug 02, 2024 at 08:47:02AM GMT, Stefan Fercot wrote:

Not sure why CFbot CI fails on macOS/Windows while it works with the Github
CI on my fork (
https://cirrus-ci.com/github/pgstef/postgres/partial-walseg-recovery).

I guess it's because the test has to wait a bit after the node has been
started until the log lines will appear. One can see it in the
node_pitr3 logs, first it was hit by

SELECT pg_is_in_recovery() = 'f'

and only some moments later produced

restored log file "000000010000000000000003.partial" from archive

where the test has those operations in reversed order. Seems like the
retry loop from 019_replslot_limit might help.

#4Stefan Fercot
stefan.fercot@dataegret.com
In reply to: Dmitry Dolgov (#3)
1 attachment(s)
Re: Recovery of .partial WAL segments

Hi,

On Fri, Aug 9, 2024 at 4:29 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Seems like the retry loop from 019_replslot_limit might help.

Thanks for the tip. Attached v2 adds the retry loop in the test which would
hopefully fix the cfbot.

Kind Regards,
Stefan

Attachments:

v2-0001-Make-XLogFileRead-try-to-restore-.partial-wal-arc.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Make-XLogFileRead-try-to-restore-.partial-wal-arc.patchDownload
From 203747394503b479338980948ccb3df84dc8d1a1 Mon Sep 17 00:00:00 2001
From: Stefan Fercot <pgstef@fercot.be>
Date: Fri, 5 Apr 2024 10:57:03 +0200
Subject: [PATCH v2.] Make XLogFileRead try to restore .partial wal archives

Try to restore the normal archived wal segment first and, if not found, then try to restore the archived .partial wal segment.
This is safe because the next completed wal segment should contain at least the same data.
---
 src/backend/access/transam/xlogrecovery.c | 20 +++++++++-
 src/test/recovery/t/028_pitr_timelines.pl | 47 ++++++++++++++++++++++-
 2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 2ed3ea2b45..c8e793e3d9 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4196,6 +4196,8 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 	char		activitymsg[MAXFNAMELEN + 16];
 	char		path[MAXPGPATH];
 	int			fd;
+	char	   *partialxlogfname;
+	bool		restoredArchivedFile;
 
 	XLogFileName(xlogfname, tli, segno, wal_segment_size);
 
@@ -4207,10 +4209,24 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 					 xlogfname);
 			set_ps_display(activitymsg);
 
-			if (!RestoreArchivedFile(path, xlogfname,
+			/*
+			 * Try to restore the normal wal segment first and, if not found,
+			 * then try to restore the .partial wal segment.
+			 */
+
+			partialxlogfname = psprintf("%s.partial", xlogfname);
+
+			restoredArchivedFile = !RestoreArchivedFile(path, xlogfname,
+														"RECOVERYXLOG",
+														wal_segment_size,
+														InRedo) &&
+				!RestoreArchivedFile(path, partialxlogfname,
 									 "RECOVERYXLOG",
 									 wal_segment_size,
-									 InRedo))
+									 InRedo);
+
+			pfree(partialxlogfname);
+			if (restoredArchivedFile)
 				return -1;
 			break;
 
diff --git a/src/test/recovery/t/028_pitr_timelines.pl b/src/test/recovery/t/028_pitr_timelines.pl
index 4b7d825b71..a958a0c6e7 100644
--- a/src/test/recovery/t/028_pitr_timelines.pl
+++ b/src/test/recovery/t/028_pitr_timelines.pl
@@ -110,7 +110,7 @@ $node_standby->stop;
 # segment 000000020000000000000003, before the timeline switching
 # record.  (They are also present in the
 # 000000010000000000000003.partial file, but .partial files are not
-# used automatically.)
+# used when recovering along the latest timeline by default.)
 
 # Now test PITR to the recovery target.  It should find the WAL in
 # segment 000000020000000000000003, but not follow the timeline switch
@@ -173,4 +173,49 @@ $node_pitr2->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")
 $result = $node_pitr2->safe_psql('postgres', "SELECT max(i) FROM foo;");
 is($result, qq{3}, "check table contents after point-in-time recovery");
 
+# The 000000010000000000000003.partial file could have been generated
+# by pg_receivewal without any standby node involved. In this case, we
+# wouldn't be able to recover from 000000020000000000000003.
+# Now, test PITR to the initial recovery target staying on the backup's
+# current timeline, trying to fetch the data from the
+# 000000010000000000000003.partial file.
+
+my $node_pitr3 = PostgreSQL::Test::Cluster->new('node_pitr3');
+$node_pitr3->init_from_backup(
+	$node_primary, $backup_name,
+	standby => 0,
+	has_restoring => 1);
+$node_pitr3->append_conf(
+	'postgresql.conf', qq{
+recovery_target_name = 'rp'
+recovery_target_action = 'promote'
+recovery_target_timeline = 'current'
+});
+
+my $log_offset = -s $node_pitr3->logfile;
+$node_pitr3->start;
+
+my $msg_logged = 0;
+my $max_attempts = $PostgreSQL::Test::Utils::timeout_default;
+while ($max_attempts-- >= 0)
+{
+	if ($node_pitr3->log_contains(
+			"restored log file \"000000010000000000000003.partial\" from archive",
+			$log_offset))
+	{
+		$msg_logged = 1;
+		last;
+	}
+	sleep 1;
+}
+ok($msg_logged, "restored 000000010000000000000003.partial");
+
+# Wait until recovery finishes.
+$node_pitr3->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")
+  or die "Timed out while waiting for PITR promotion";
+
+# Check that we see the data we expect.
+$result = $node_pitr3->safe_psql('postgres', "SELECT max(i) FROM foo;");
+is($result, qq{1}, "check table contents after point-in-time recovery");
+
 done_testing();
-- 
2.34.1

#5Stepan Neretin
sndcppg@gmail.com
In reply to: Stefan Fercot (#4)
Re: Recovery of .partial WAL segments

Hi there! It looks good to me! But I have a question. How is the
partialxlogfname freed in case of an error?
Best regards, Stepan Neretin.

#6Stefan Fercot
stefan.fercot@dataegret.com
In reply to: Stepan Neretin (#5)
Re: Recovery of .partial WAL segments

Hi,

On Fri, Oct 18, 2024 at 11:07 AM Stepan Neretin <sndcppg@gmail.com> wrote:

Hi there! It looks good to me!

Thanks!

But I have a question. How is the partialxlogfname freed in case of an

error?

I'm not sure I understand your question. What kind of error would you
expect exactly?
I mostly added a pfree there because we don't need it further. But why
would it be more of a problem compared to other variables/pointers?

Kind regards,
Stefan

#7Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Stefan Fercot (#1)
Re: Recovery of .partial WAL segments

On Fri, 5 Apr 2024 at 11:45, Stefan Fercot <stefan.fercot@protonmail.com> wrote:

Dear hackers,

Generating a ".partial" WAL segment is pretty common nowadays (using pg_receivewal or during standby promotion).
However, we currently don't do anything with it unless the user manually removes that ".partial" extension.

The 028_pitr_timelines tests are highlighting that fact: with test data being being in 000000020000000000000003 and 000000010000000000000003.partial, a recovery following the latest timeline (2) will succeed but fail if we follow the current timeline (1).

By simply trying to fetch the ".partial" file in XLogFileRead, we can easily recover more data and also cover that (current timeline) recovery case.

So, this proposed patch makes XLogFileRead try to restore ".partial" WAL archives and adds a test to 028_pitr_timelines using current recovery_target_timeline.

Does this path only get hit when we don't already have any WAL
segments (or partial segments) left for that timeline? I'm a bit
worried about overwriting existing (partial) segments that may have
more WAL than what we can get from archives.

(patch v2)

+            restoredArchivedFile = !RestoreArchivedFile(path, xlogfname,
+                                                        "RECOVERYXLOG",
+                                                        wal_segment_size,
+                                                        InRedo) &&
+                !RestoreArchivedFile(path, partialxlogfname,
"RECOVERYXLOG",
wal_segment_size,
-                                     InRedo))
+                                     InRedo);

The value of restoredArchiveFile is inverted with what it indicates:
It is true when we failed to restore an archived xlog segment, and
false if we did succeed.

I'm also not a fan of the additional allocation of partialxlogfname in
this code. It could well do without, by "just" reusing the xlogfname
scratch space when we fail to recover the full segment.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)