Should the archiver process always make sure that the timeline history files exist in the archive?
Hello pgsql-hackers,
While testing out some WAL archiving and PITR scenarios, it was observed that
enabling WAL archiving for the first time on a primary that was on a timeline
higher than 1 would not initially archive the timeline history file for the
timeline it was currently on. While this might be okay for most use cases, there
are scenarios where this leads to unexpected failures that seem to expose some
flaws in the logic.
Scenario 1:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Create a
standby from that backup that continuously restores from the WAL archives;
because the backup excluded pg_wal, the standby will not contain the timeline 2
history file. The standby will operate normally, but if you try to create a
cascading standby off it using streaming replication, the cascading standby's
WAL receiver will continuously FATAL trying to request the timeline 2 history
file that the main standby does not have.
Scenario 2:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Then try
to create a new node by doing PITR with recovery_target_timeline set to
'current' or 'latest', which will succeed. However, doing PITR with
recovery_target_timeline = '2' will fail since it is unable to find the
timeline 2 history file in the WAL archives. This is a bit contradictory: we
allow 'current' and 'latest' to recover, but explicitly setting
recovery_target_timeline to the control file's timeline id ends in failure.
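To make Scenario 2 concrete, here is a rough reproduction sketch (the host
name, data directory, and archive path are placeholders, and the
restore_command mirrors the cp example from the documentation):

    # Base backup of the timeline-2 primary, excluding WAL
    pg_basebackup -h primary.example.com -D /restore/data -Xnone

    # Configure PITR from the WAL archive
    cat >> /restore/data/postgresql.conf <<'EOF'
    restore_command = 'cp /mnt/server/archivedir/%f %p'
    recovery_target_timeline = '2'
    EOF
    touch /restore/data/recovery.signal

    # Fails with "FATAL: recovery target timeline 2 does not exist";
    # substituting recovery_target_timeline = 'current' or 'latest'
    # lets recovery proceed.
    pg_ctl -D /restore/data start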
Attached is a patch containing two TAP tests that demonstrate the scenarios.
My questions are:
1. Why doesn't the archiver process try to archive timeline history files when
WAL archiving is first configured, or at least continually check for them
(perhaps when the archiver process starts, before its main loop)?
2. Why does explicitly setting the recovery_target_timeline to the control
file's timeline id not follow the same logic as recovery_target_timeline set
to 'current'?
3. Why does a cascaded standby require the timeline history file of its control
file's timeline id (startTLI) when the main replica is able to operate fine
without the timeline history file?
Note that my initial observations came from testing with pgBackRest (which
disables copying pg_wal/ during backup by default), but using `pg_basebackup
-Xnone` reproduced the issues similarly and is what I present in the TAP
tests. At the moment, the only workaround I can think of is to manually run
the archive_command on the missing timeline history file(s); a rough sketch
follows.
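As a sketch, assuming the cp-based archive_command example from the
documentation (the archive path is a placeholder):

    # Hand-archive any timeline history files the archiver never picked up
    cd "$PGDATA"/pg_wal
    for f in *.history; do
        [ -f "/mnt/server/archivedir/$f" ] || cp "$f" "/mnt/server/archivedir/$f"
    done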
Are these valid issues that should be looked into, or are they expected?
Scenario 2 seems like it could be fixed easily by checking whether the numeric
recovery_target_timeline value equals the control file's timeline id (compare
rtli and recoveryTargetTLI in validateRecoveryParameters()?). But I wasn't
sure if maybe the opposite was true, and we should instead make 'current' and
'latest' require retrieving the timeline history files, which would help
prevent Scenario 1.
Regards,
Jimmy Yih
Attachments:
0001-TAP-tests-to-show-missing-timeline-history-issues.patch (application/octet-stream)
From 8e77a72089301886667f1761fd99c87f4df3f456 Mon Sep 17 00:00:00 2001
From: Jimmy Yih <jyih@vmware.com>
Date: Wed, 9 Aug 2023 16:50:04 -0700
Subject: [PATCH] TAP tests to show missing timeline history issues
While testing out some WAL archiving and PITR scenarios, it was
observed that enabling WAL archiving for the first time on a primary
that was on a timeline higher than 1 would not initially archive the
timeline history file for the timeline it was currently on. While this
might be okay for most use cases, there are scenarios where this leads
to unexpected failures that seem to expose some flaws in the logic.
This patch contains TAP tests that help demonstrate the issues.
---
.../t/038_cascade_with_no_timeline_history.pl | 143 ++++++++++++++++++
...039_recovery_target_no_timeline_history.pl | 123 +++++++++++++++
2 files changed, 266 insertions(+)
create mode 100644 src/test/recovery/t/038_cascade_with_no_timeline_history.pl
create mode 100644 src/test/recovery/t/039_recovery_target_no_timeline_history.pl
diff --git a/src/test/recovery/t/038_cascade_with_no_timeline_history.pl b/src/test/recovery/t/038_cascade_with_no_timeline_history.pl
new file mode 100644
index 0000000000..9f18d2dd7d
--- /dev/null
+++ b/src/test/recovery/t/038_cascade_with_no_timeline_history.pl
@@ -0,0 +1,143 @@
+# This test showcases a seemingly valid scenario where a primary on
+# timeline 2 has a standby which itself has a cascaded standby. The
+# main standby (created from a backup taken with -Xnone and recovering
+# via WAL archives) does not contain the timeline 2 history file and is
+# unable to serve it to the cascaded standby. The cascaded standby
+# will continuously FATAL trying to request the timeline 2 history
+# file.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use File::Path qw(rmtree);
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to a new timeline
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300)",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node. The timeline 2 history file
+# will not be pushed to the archive.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file has not been
+# archived. Timeline history file archival only happens when the
+# timeline history file is created which only occurs in two areas:
+# 1. When a standby is configured for archiving (archive_mode and
+# archive_command set) and is promoted. A timeline history file for
+# the new timeline will be created and will be immediately marked
+# as ready for archiving.
+# 2. When a standby is configured for archiving (archive_mode set to
+# 'always' and archive_command is set) and receives a timeline
+# history file from the primary via streaming replication. The file
+# will be marked as ready for archiving.
+#
+# Note: This seems to be the root cause of the failures that follow
+# because a lot of recovery logic seems to rely on the timeline
+# history files being retrievable. However, I'm not sure if this logic
+# is intentional or not.
+my $primary_archive = $node_primary_tli2->archive_dir;
+my $result_primary_tli2 =
+ $node_primary_tli2->safe_psql('postgres', "SELECT size IS NULL FROM pg_stat_file('$primary_archive/00000002.history', true)");
+is($result_primary_tli2, qq(t), 'see that the timeline 2 history file was not archived');
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal is empty.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a");
+
+# Create a restore point to later use as the recovery_target_name.
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal()');
+
+# Wait until the WAL segment has been archived.
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will
+# recover onto the same timeline designated in the control file by
+# setting recovery_target_timeline to 'current'. The timeline 2
+# history file is not retrievable but seems to not be required. Note
+# that setting recovery_target_timeline to 'latest' would also create
+# the same scenario but using 'current' helps decrease the scope of
+# the problem.
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = 'current'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+$node_standby->start;
+
+# Sanity check that the node came up and is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+
+# Set up the cascade standby
+my $node_cascade = PostgreSQL::Test::Cluster->new('cascade');
+$node_cascade->init_from_backup($node_primary_tli2, $backup_name,
+ standby => 1);
+$node_cascade->enable_streaming($node_standby);
+# This will fail to start up because the WAL receiver continuously
+# FATALs out. The test will end here in failure.
+$node_cascade->start;
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade);
+
+# Sanity check that the node came up and is queryable
+my $result_cascade =
+ $node_cascade->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_cascade, qq(8), 'check that the node received the streamed WAL data');
+
+$node_standby->teardown_node;
+$node_cascade->teardown_node;
+
+done_testing();
diff --git a/src/test/recovery/t/039_recovery_target_no_timeline_history.pl b/src/test/recovery/t/039_recovery_target_no_timeline_history.pl
new file mode 100644
index 0000000000..7682593bd0
--- /dev/null
+++ b/src/test/recovery/t/039_recovery_target_no_timeline_history.pl
@@ -0,0 +1,123 @@
+# Test that we can do a recovery when the timeline history file is
+# unavailable and the recovery_target_timeline requested is equal to
+# the timeline in the control file.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use File::Path qw(rmtree);
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to a new timeline
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300)",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node. The timeline 2 history file
+# will not be pushed to the archive.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file has not been
+# archived. Timeline history file archival only happens when the
+# timeline history file is created which only occurs in two areas:
+# 1. When a standby is configured for archiving (archive_mode and
+# archive_command set) and is promoted. A timeline history file for
+# the new timeline will be created and will be immediately marked
+# as ready for archiving.
+# 2. When a standby is configured for archiving (archive_mode set to
+# 'always' and archive_command is set) and receives a timeline
+# history file from the primary via streaming replication. The file
+# will be marked as ready for archiving.
+#
+# Note: This seems to be the root cause of the failures that follow
+# because a lot of recovery logic seems to rely on the timeline
+# history files being retrievable. However, I'm not sure if this logic
+# is intentional or not.
+my $primary_archive = $node_primary_tli2->archive_dir;
+my $result_primary_tli2 =
+ $node_primary_tli2->safe_psql('postgres', "SELECT size IS NULL FROM pg_stat_file('$primary_archive/00000002.history', true)");
+is($result_primary_tli2, qq(t), 'see that the timeline 2 history file was not archived');
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal is empty.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a");
+
+# Create a restore point to later use as the recovery_target_name.
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal()');
+
+# Wait until the WAL segment has been archived.
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will start
+# off on timeline 2 according to the control file and will finish
+# recovery onto the same timeline by explicitly setting
+# recovery_target_timeline to '2'. The timeline 2 history file is not
+# retrievable but is required. Shouldn't this scenario act the same as
+# setting recovery_target_timeline to 'current' which does not require
+# a timeline history file to be retrieved?
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = '2'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+# This will fail to start up because the timeline 2 history file is
+# not retrievable from the WAL archive. The test will end here in
+# failure.
+$node_standby->start;
+
+# Sanity check that the node came up and is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+$node_standby->teardown_node;
+
+done_testing();
--
2.24.3 (Apple Git-128)
Hello pgsql-hackers,
After doing some more debugging on the matter, I believe this issue might be a
minor regression from commit 5332b8cec541. Prior to that commit, the archiver
process, when first started on a previously promoted primary, would mark all
the timeline history files as ready for immediate archiving. If that still
happened, none of the failure scenarios I mentioned would be theoretically
possible (barring someone manually deleting the timeline history files). With
that in mind, I looked more into my Question 1 and created a patch proposal.
The attached patch tries to archive the current timeline history file, if it
has not been archived yet, when the archiver process starts up.
Regards,
Jimmy Yih
Attachments:
0001-Archive-current-timeline-history-file-on-archiver-st.patch (application/octet-stream)
From 7e282aea9795cd4749cfd04fb824bb2971206cc3 Mon Sep 17 00:00:00 2001
From: Jimmy Yih <jyih@vmware.com>
Date: Wed, 9 Aug 2023 16:50:04 -0700
Subject: [PATCH] Archive current timeline history file on archiver startup if
needed
Previously, the only time the timeline history file would be archived
was when a standby already configured for WAL archiving was
promoted. If WAL archiving was set up and started after the standby
was promoted, its current timeline history file would not be found in
the archive. This could cause issues when restoring backups that did
not include the timeline history files (e.g. pg_basebackup
-Xnone). Some example failures include failing to restore with
recovery_target_timeline explicitly set to the control file's timeline
id and failing to create cascade standbys after recovering with
recovery_target_timeline set to 'current' or 'latest'. To prevent
these restore issues, we now make sure that the current timeline
history file has been archived right before the archiver loop starts.
---
src/backend/access/transam/xlogarchive.c | 6 +-
src/backend/postmaster/pgarch.c | 26 ++++
.../t/038_archive_current_timeline_history.pl | 134 ++++++++++++++++++
3 files changed, 163 insertions(+), 3 deletions(-)
create mode 100644 src/test/recovery/t/038_archive_current_timeline_history.pl
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index f3fb92c8f9..dee679bd18 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -650,9 +650,9 @@ XLogArchiveIsBusy(const char *xlog)
* This is similar to XLogArchiveIsBusy(), but returns true if the file
* is already archived or is about to be archived.
*
- * This is currently only used at recovery. During normal operation this
- * would be racy: the file might get removed or marked with .ready as we're
- * checking it, or immediately after we return.
+ * This is primarily used at recovery. During normal operation this would be
+ * racy: the file might get removed or marked with .ready as we're checking
+ * it, or immediately after we return.
*/
bool
XLogArchiveIsReadyOrDone(const char *xlog)
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 46af349564..65bd3db635 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -31,6 +31,7 @@
#include "access/xlog.h"
#include "access/xlog_internal.h"
+#include "access/xlogarchive.h"
#include "archive/archive_module.h"
#include "archive/shell_archive.h"
#include "lib/binaryheap.h"
@@ -371,10 +372,35 @@ static void
pgarch_ArchiverCopyLoop(void)
{
char xlog[MAX_XFN_CHARS + 1];
+ TimeLineID current_timeline_id;
/* force directory scan in the first call to pgarch_readyXlog() */
arch_files->arch_files_size = 0;
+	/*
+	 * Make sure the timeline history file has been archived whenever the
+	 * archiver copy loop starts; once it is marked .ready or .done this is
+	 * a no-op. Skip timeline 1 since it has no timeline history file.
+	 */
+ current_timeline_id = GetWALInsertionTimeLine();
+ if (current_timeline_id > 1)
+ {
+ char histfname[MAXFNAMELEN];
+
+ TLHistoryFileName(histfname, current_timeline_id);
+
+ /*
+ * Timeline history .done files do not get removed automatically so
+ * this check should be valid to make sure we don't archive the
+ * timeline history file again on restart. However, if the timeline
+ * history .done file was manually removed for some reason, then we
+ * make the assumption that the archive_command is set up properly to
+ * gracefully handle the re-archiving attempt.
+ */
+ if (!XLogArchiveIsReadyOrDone(histfname))
+ XLogArchiveNotify(histfname);
+ }
+
/*
* loop through all xlogs with archive_status of .ready and archive
* them...mostly we expect this to be a single file, though it is possible
diff --git a/src/test/recovery/t/038_archive_current_timeline_history.pl b/src/test/recovery/t/038_archive_current_timeline_history.pl
new file mode 100644
index 0000000000..41e2b5b2d4
--- /dev/null
+++ b/src/test/recovery/t/038_archive_current_timeline_history.pl
@@ -0,0 +1,134 @@
+# Test that setting up and starting WAL archiving on an
+# already-promoted node will result in the archival of its current
+# timeline history file.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary; it's not needed anymore
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to timeline 2
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300);",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node. The timeline 2 history file
+# will be pushed to the archive.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file has been archived. The file
+# is marked ready and immediately archived when the archiver process
+# starts up, but we loop to make this check deterministic.
+my $primary_archive = $node_primary_tli2->archive_dir;
+$node_primary_tli2->poll_query_until('postgres',
+ "SELECT size IS NOT NULL FROM pg_stat_file('$primary_archive/00000002.history', true);")
+ or die "Timed out while waiting for 00000002.history to be archived";
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal will
+# be empty and restore will retrieve the necessary WAL and timeline
+# history file(s) from the archive.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a;");
+
+# Create a restore point to later use as the recovery_target_name
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find the next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make the WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal();');
+
+# Wait until the WAL segment has been archived
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will start
+# off on timeline 2 according to the control file and will finish
+# recovery onto the same timeline by explicitly setting
+# recovery_target_timeline to '2'. We explicitly set the timeline id
+# because startup will fail if the timeline history file is not
+# retrievable from the archive but will not fail if we use 'current'
+# or 'latest'.
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = '2'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+$node_standby->start;
+
+# Sanity check that the timeline history file was retrieved
+my $log_contents = slurp_file($node_standby->logfile);
+ok( $log_contents =~ qr/restored log file "00000002.history" from archive/,
+ "00000002.history retrieved from the archives");
+ok ( -f $node_standby->data_dir . "/pg_wal/00000002.history",
+ "00000002.history exists in the standby's pg_wal directory");
+
+# Sanity check that the node is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+
+# Set up a cascade standby node to validate that there's no issues
+# since the WAL receiver will request all necessary timeline history
+# files from the standby node's WAL sender.
+my $node_cascade = PostgreSQL::Test::Cluster->new('cascade');
+$node_cascade->init_from_backup($node_primary_tli2, $backup_name,
+ standby => 1);
+$node_cascade->enable_streaming($node_standby);
+$node_cascade->start;
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade);
+
+# Sanity check that the cascade standby node came up and is queryable
+my $result_cascade =
+ $node_cascade->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_cascade, qq(8), 'check that the node received the streamed WAL data');
+
+$node_standby->teardown_node;
+$node_cascade->teardown_node;
+
+done_testing();
--
2.24.3 (Apple Git-128)
At Wed, 16 Aug 2023 07:33:29 +0000, Jimmy Yih <jyih@vmware.com> wrote in
Hello pgsql-hackers,
After doing some more debugging on the matter, I believe this issue might be a
minor regression from commit 5332b8cec541. Prior to that commit, the archiver
process when first started on a previously promoted primary would have all the
timeline history files marked as ready for immediate archiving. If that had
happened, none of my mentioned failure scenarios would be theoretically possible
(barring someone manually deleting the timeline history files). With that in
mind, I decided to look more into my Question 1 and created a patch proposal.
The attached patch will try to archive the current timeline history file if it
has not been archived yet when the archiver process starts up.
In essence, after a series of subtle but not necessarily wrong steps,
there's a case where a primary server lacks the timeline history file
for the current timeline in both pg_wal and the archive, even if that
timeline is greater than 1. This primary can start, but a new standby
created from the primary cannot start streaming, as it can't fetch the
timeline history file for the initial TLI.
A. The OP suggests archiving the timeline history file for the current
timeline every time the archiver starts. However, I don't think we
want to keep archiving the same file over and over. (Granted, we're
not always perfect at avoiding that..)
B. Given that the steps are valid, I concur with what is described in the
test script provided: standbys don't really need that history file
for the initial TLI (though I have yet to fully verify this). If the
walreceiver just overlooks a fetch error for this file, the standby
can successfully start. (Just skipping the first history file seems
to work, but it feels a tad aggressive to me.)
C. If those steps aren't valid, we might want to add a note stating
that -X none basebackups do need the timeline history file for the
initial TLI. And don't forget to enable archive mode before the
latest timeline switch if any.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Thanks for the insightful response! I have attached an updated patch
that moves the proposed logic to the end of StartupXLOG, where it seems
more correct to do this. It also helps with backporting (if that's
needed), since the archiver process only has access to shared memory
starting from Postgres 14.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
A. The OP suggests archiving the timeline history file for the current
timeline every time the archiver starts. However, I don't think we
want to keep archiving the same file over and over. (Granted, we're
not always perfect at avoiding that..)
With the updated proposed patch, we'll be checking if the current
timeline history file needs to be archived at the end of StartupXLOG
if archiving is enabled. If it detects that a .ready or .done file
already exists, then it won't do anything (which will be the common
case). I agree though that this may be an excessive check since it'll
be a no-op the majority of the time. However, it shouldn't execute
often and seems like a quick safe preventive measure. Could you give
more details on why this would be too cumbersome?
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
B. Given that the steps are valid, I concur with what is described in the
test script provided: standbys don't really need that history file
for the initial TLI (though I have yet to fully verify this). If the
walreceiver just overlooks a fetch error for this file, the standby
can successfully start. (Just skipping the first history file seems
to work, but it feels a tad aggressive to me.)
This was my initial thought as well, but I wasn't sure if it was okay
to overlook the fetch error. Initial testing and brainstorming seem
to show that it's okay. I think the main bad thing is that these new
standbys will not have their initial timeline history files, which can
be useful for administration. I've attached a patch that attempts this
approach, in case we want to switch to it as the solution. The patch
contains an updated TAP test as well to better showcase the issue and
fix.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
C. If those steps aren't valid, we might want to add a note stating
that -X none basebackups do need the timeline history file for the
initial TLI.
The difficult thing about only documenting this is that it forces the
user to manually store and track the timeline history files. It can be
a bit cumbersome for WAL archiving users to recognize this scenario
when they're just trying to optimize their base backups by using
-Xnone. But then again, -Xnone does seem to be designed for advanced
users, so this might be okay.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
And don't forget to enable archive mode before the latest timeline
switch if any.
This might not be reasonable, since a user could have been using
streaming replication and doing failovers/failbacks as part of general
high availability without knowing they were going to enable WAL
archiving later on. The user would need to configure archiving and
force a failover, which may not be straightforward.
Regards,
Jimmy Yih
Attachments:
v2-0001-Archive-current-timeline-history-file-after-recovery.patch (application/octet-stream)
From 1b06231da8cd213e30c4fbe5115bfc87c8873754 Mon Sep 17 00:00:00 2001
From: Jimmy Yih <jyih@vmware.com>
Date: Wed, 9 Aug 2023 16:50:04 -0700
Subject: [PATCH] Archive current timeline history file after recovery finishes
if needed
Previously, the only time the timeline history file would be archived
was when a standby already configured for WAL archiving was
promoted. If WAL archiving was set up and started after the standby
was promoted, its current timeline history file would not be found in
the archive. This could cause issues when restoring backups that did
not include the timeline history files (e.g. pg_basebackup
-Xnone). Some example failures include failing to restore with
recovery_target_timeline explicitly set to the control file's timeline
id and failing to create cascade standbys after recovering with
recovery_target_timeline set to 'current' or 'latest'. To prevent
these restore issues, we now ensure that the current timeline history
file has been archived by marking it as ready for archiving after
recovery finishes if it was found that it has not already been
archived or marked ready.
---
src/backend/access/transam/xlog.c | 31 ++++
src/backend/access/transam/xlogarchive.c | 6 +-
.../t/038_archive_current_timeline_history.pl | 134 ++++++++++++++++++
3 files changed, 168 insertions(+), 3 deletions(-)
create mode 100644 src/test/recovery/t/038_archive_current_timeline_history.pl
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..99b53b9a7e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5800,6 +5800,37 @@ StartupXLOG(void)
if (standbyState != STANDBY_DISABLED)
ShutdownRecoveryTransactionEnvironment();
+ /*
+ * It's possible that this could be the first time WAL archiving has been
+ * enabled. If the timeline is greater than 1, we need to check if the
+ * current timeline history file needs to be archived. This will prevent
+ * any PITR-related issues later on where a timeline history file is
+ * required.
+ */
+ if (XLogArchivingActive())
+ {
+ TimeLineID currentTimeLineID;
+
+ currentTimeLineID = GetWALInsertionTimeLine();
+ if (currentTimeLineID > 1) {
+ char histfname[MAXFNAMELEN];
+
+ /*
+ * Timeline history .done files do not get removed automatically
+ * so this check should be valid to make sure we don't archive the
+ * timeline history file again on restart. However, if the
+ * timeline history .done file was manually removed for some
+ * reason, then we make the assumption that the archive_command is
+ * set up properly to gracefully handle the re-archiving attempt.
+ * If there's already a .ready or .done file, then there's nothing
+ * to do.
+ */
+ TLHistoryFileName(histfname, currentTimeLineID);
+ if (!XLogArchiveIsReadyOrDone(histfname))
+ XLogArchiveNotify(histfname);
+ }
+ }
+
/*
* If there were cascading standby servers connected to us, nudge any wal
* sender processes to notice that we've been promoted.
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index f3fb92c8f9..dee679bd18 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -650,9 +650,9 @@ XLogArchiveIsBusy(const char *xlog)
* This is similar to XLogArchiveIsBusy(), but returns true if the file
* is already archived or is about to be archived.
*
- * This is currently only used at recovery. During normal operation this
- * would be racy: the file might get removed or marked with .ready as we're
- * checking it, or immediately after we return.
+ * This is primarily used at recovery. During normal operation this would be
+ * racy: the file might get removed or marked with .ready as we're checking
+ * it, or immediately after we return.
*/
bool
XLogArchiveIsReadyOrDone(const char *xlog)
diff --git a/src/test/recovery/t/038_archive_current_timeline_history.pl b/src/test/recovery/t/038_archive_current_timeline_history.pl
new file mode 100644
index 0000000000..41e2b5b2d4
--- /dev/null
+++ b/src/test/recovery/t/038_archive_current_timeline_history.pl
@@ -0,0 +1,134 @@
+# Test that setting up and starting WAL archiving on an
+# already-promoted node will result in the archival of its current
+# timeline history file.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary; it's not needed anymore
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to timeline 2
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300);",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node. The timeline 2 history file
+# will be pushed to the archive.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file has been archived. The file
+# is marked ready and immediately archived when the archiver process
+# starts up, but we loop to make this check deterministic.
+my $primary_archive = $node_primary_tli2->archive_dir;
+$node_primary_tli2->poll_query_until('postgres',
+ "SELECT size IS NOT NULL FROM pg_stat_file('$primary_archive/00000002.history', true);")
+ or die "Timed out while waiting for 00000002.history to be archived";
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal will
+# be empty and restore will retrieve the necessary WAL and timeline
+# history file(s) from the archive.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a;");
+
+# Create a restore point to later use as the recovery_target_name
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find the next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make the WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal();');
+
+# Wait until the WAL segment has been archived
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will start
+# off on timeline 2 according to the control file and will finish
+# recovery onto the same timeline by explicitly setting
+# recovery_target_timeline to '2'. We explicitly set the timeline id
+# because startup will fail if the timeline history file is not
+# retrievable from the archive but will not fail if we use 'current'
+# or 'latest'.
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = '2'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+$node_standby->start;
+
+# Sanity check that the timeline history file was retrieved
+my $log_contents = slurp_file($node_standby->logfile);
+ok( $log_contents =~ qr/restored log file "00000002.history" from archive/,
+ "00000002.history retrieved from the archives");
+ok ( -f $node_standby->data_dir . "/pg_wal/00000002.history",
+ "00000002.history exists in the standby's pg_wal directory");
+
+# Sanity check that the node is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+
+# Set up a cascade standby node to validate that there's no issues
+# since the WAL receiver will request all necessary timeline history
+# files from the standby node's WAL sender.
+my $node_cascade = PostgreSQL::Test::Cluster->new('cascade');
+$node_cascade->init_from_backup($node_primary_tli2, $backup_name,
+ standby => 1);
+$node_cascade->enable_streaming($node_standby);
+$node_cascade->start;
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade);
+
+# Sanity check that the cascade standby node came up and is queryable
+my $result_cascade =
+ $node_cascade->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_cascade, qq(8), 'check that the node received the streamed WAL data');
+
+$node_standby->teardown_node;
+$node_cascade->teardown_node;
+
+done_testing();
--
2.24.3 (Apple Git-128)
v1-0001-Allow-recovery-to-proceed-when-initial-timeline-hist.patch (application/octet-stream)
From c87f9d7eaf2720944d625cb529b6cc355fec7771 Mon Sep 17 00:00:00 2001
From: Jimmy Yih <jyih@vmware.com>
Date: Wed, 9 Aug 2023 16:50:04 -0700
Subject: [PATCH] Allow recovery to proceed when initial timeline history file
is missing
For WAL archive recovery, setting recovery_target_timeline to 'current'
or 'latest' will only output a WARNING if the initial timeline history
file cannot be found or retrieved, and recovery then proceeds without
any issues. However, setting recovery_target_timeline explicitly to the
current control file's timeline id (similar to what 'current' does)
results in a FATAL when the initial timeline history file cannot be
found or retrieved. Since 'current' and 'latest' work fine, we should
also not FATAL when the timeline history file cannot be found or
retrieved and recovery_target_timeline is explicitly set to the same
timeline id as the control file's.
For WAL streaming, the standby's WAL receiver will FATAL and loop on
trying to retrieve the initial timeline history from the primary (or
standby in the case of cascading). However, it doesn't seem to be
required if the above WAL archive recovery claims are valid. To align
with the same logic, we should also not FATAL when the WAL receiver
cannot find/retrieve the initial timeline history file.
---
src/backend/access/transam/xlogrecovery.c | 2 +-
.../libpqwalreceiver/libpqwalreceiver.c | 26 +++-
src/backend/replication/walreceiver.c | 12 +-
src/include/replication/walreceiver.h | 7 +-
...andbys_with_no_initial_timeline_history.pl | 136 ++++++++++++++++++
5 files changed, 171 insertions(+), 12 deletions(-)
create mode 100644 src/test/recovery/t/038_standbys_with_no_initial_timeline_history.pl
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index becc2bda62..b223a82cfb 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1124,7 +1124,7 @@ validateRecoveryParameters(void)
TimeLineID rtli = recoveryTargetTLIRequested;
/* Timeline 1 does not have a history file, all else should */
- if (rtli != 1 && !existsTimeLineHistory(rtli))
+ if (rtli != 1 && !existsTimeLineHistory(rtli) && rtli != recoveryTargetTLI)
ereport(FATAL,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("recovery target timeline %u does not exist",
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 60d5c1fc40..ba12ed3911 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -61,7 +61,7 @@ static char *libpqrcv_identify_system(WalReceiverConn *conn,
static int libpqrcv_server_version(WalReceiverConn *conn);
static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
TimeLineID tli, char **filename,
- char **content, int *len);
+ char **content, int *len, bool missing_ok);
static bool libpqrcv_startstreaming(WalReceiverConn *conn,
const WalRcvStreamOptions *options);
static void libpqrcv_endstreaming(WalReceiverConn *conn,
@@ -603,7 +603,7 @@ libpqrcv_endstreaming(WalReceiverConn *conn, TimeLineID *next_tli)
static void
libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
TimeLineID tli, char **filename,
- char **content, int *len)
+ char **content, int *len, bool missing_ok)
{
PGresult *res;
char cmd[64];
@@ -618,11 +618,23 @@ libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
PQclear(res);
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg("could not receive timeline history file from "
- "the primary server: %s",
- pchomp(PQerrorMessage(conn->streamConn)))));
+
+ if (missing_ok)
+ {
+ *filename = NULL;
+ *content = NULL;
+ ereport(WARNING,
+ (errmsg("could not receive timeline history file from "
+ "the primary server: %s",
+ pchomp(PQerrorMessage(conn->streamConn)))));
+ return;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("could not receive timeline history file from "
+ "the primary server: %s",
+ pchomp(PQerrorMessage(conn->streamConn)))));
}
if (PQnfields(res) != 2 || PQntuples(res) != 1)
{
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index feff709435..26368fe4f3 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -754,12 +754,22 @@ WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last)
char *content;
int len;
char expectedfname[MAXFNAMELEN];
+ bool missing_ok;
ereport(LOG,
(errmsg("fetching timeline history file for timeline %u from primary server",
tli)));
- walrcv_readtimelinehistoryfile(wrconn, tli, &fname, &content, &len);
+ missing_ok = (tli == first);
+ walrcv_readtimelinehistoryfile(wrconn, tli, &fname, &content, &len, missing_ok);
+
+ /*
+ * If the requested timeline id is the first one, we can overlook
+ * the timeline history file fetch error since it's not required
+ * to start the standby.
+ */
+ if (missing_ok && fname == NULL && content == NULL)
+ continue;
/*
* Check that the filename on the primary matches what we
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 281626fa6f..197c82bb53 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -298,7 +298,8 @@ typedef void (*walrcv_readtimelinehistoryfile_fn) (WalReceiverConn *conn,
TimeLineID tli,
char **filename,
char **content,
- int *size);
+ int *size,
+ bool missing_ok);
/*
* walrcv_startstreaming_fn
@@ -419,8 +420,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
WalReceiverFunctions->walrcv_identify_system(conn, primary_tli)
#define walrcv_server_version(conn) \
WalReceiverFunctions->walrcv_server_version(conn)
-#define walrcv_readtimelinehistoryfile(conn, tli, filename, content, size) \
- WalReceiverFunctions->walrcv_readtimelinehistoryfile(conn, tli, filename, content, size)
+#define walrcv_readtimelinehistoryfile(conn, tli, filename, content, size, missing_ok) \
+ WalReceiverFunctions->walrcv_readtimelinehistoryfile(conn, tli, filename, content, size, missing_ok)
#define walrcv_startstreaming(conn, options) \
WalReceiverFunctions->walrcv_startstreaming(conn, options)
#define walrcv_endstreaming(conn, next_tli) \
diff --git a/src/test/recovery/t/038_standbys_with_no_initial_timeline_history.pl b/src/test/recovery/t/038_standbys_with_no_initial_timeline_history.pl
new file mode 100644
index 0000000000..9f2b33ed18
--- /dev/null
+++ b/src/test/recovery/t/038_standbys_with_no_initial_timeline_history.pl
@@ -0,0 +1,136 @@
+# Test that a standby created from a backup lacking the initial
+# timeline history file can still recover and serve a cascading
+# standby when that file is also absent from the WAL archive.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary; it's not needed anymore
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to timeline 2
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300);",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file was not archived after
+# enabling WAL archiving since timeline history files are only
+# archived at the moment of switching timelines and not any time
+# after.
+my $primary_tli2_archive = $node_primary_tli2->archive_dir;
+my $primary_tli2_datadir = $node_primary_tli2->data_dir;
+ok(-f "$primary_tli2_datadir/pg_wal/00000002.history",
+ 'timeline 2 history file was created');
+ok(! -f "$primary_tli2_datadir/pg_wal/archive_status/00000002.history.ready",
+ 'timeline 2 history file was not marked for WAL archiving');
+ok(! -f "$primary_tli2_datadir/pg_wal/archive_status/00000002.history.done",
+	'timeline 2 history file was not archived');
+ok(! -f "$primary_tli2_archive/00000002.history",
+ 'timeline 2 history file does not exist in the archive');
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal will
+# be empty and restore will retrieve the necessary WAL and timeline
+# history file(s) from the archive.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a;");
+
+# Create a restore point to later use as the recovery_target_name
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find the next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make the WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal();');
+
+# Wait until the WAL segment has been archived
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will start
+# off on timeline 2 according to the control file and will finish
+# recovery onto the same timeline by explicitly setting
+# recovery_target_timeline to '2'. We explicitly set the target
+# timeline to show that it doesn't require the timeline history file
+# and works the same as if we used 'current' or 'latest'.
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = '2'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+$node_standby->start;
+
+# Check that the timeline history file was not retrieved
+ok ( ! -f $node_standby->data_dir . "/pg_wal/00000002.history",
+ "00000002.history does not exist in the standby's pg_wal directory");
+
+# Sanity check that the node is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+
+# Set up a cascade standby node to validate that there's no issues
+# since the WAL receiver will request all necessary timeline history
+# files from the standby node's WAL sender.
+my $node_cascade = PostgreSQL::Test::Cluster->new('cascade');
+$node_cascade->init_from_backup($node_primary_tli2, $backup_name,
+ standby => 1);
+$node_cascade->enable_streaming($node_standby);
+$node_cascade->start;
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade);
+
+# Sanity check that the cascade standby node came up and is queryable
+my $result_cascade =
+ $node_cascade->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_cascade, qq(8), 'check that the node received the streamed WAL data');
+
+$node_standby->teardown_node;
+$node_cascade->teardown_node;
+
+done_testing();
--
2.24.3 (Apple Git-128)
On Mon, Aug 28, 2023 at 8:59 PM Jimmy Yih <jyih@vmware.com> wrote:
Thanks for the insightful response! I have attached an updated patch
that moves the proposed logic to the end of StartupXLOG where it seems
more correct to do this. It also helps with backporting (if it's
needed) since the archiver process only has access to shared memory
starting from Postgres 14.
Hmm. Do I understand correctly that the two patches you attached are
alternatives to each other, i.e. we need one or the other to fix the
issue, but not both?
It seems to me that trying to fetch a timeline history file and then
ignoring any error has got to be wrong. Either the file is needed or
it isn't. If it's needed, then failing to fetch it is a problem. If
it's not needed, there's no reason to try fetching it in the first
place. So I feel like we could try to archive the file at the
end of recovery, as you propose in
v2-0001-Archive-current-timeline-history-file-after-recovery.patch.
Alternatively, we could try to find a way not to request the file in
the first place, if it's not required. But
v1-0001-Allow-recovery-to-proceed-when-initial-timeline-hist.patch
doesn't seem good to me.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, 29 Aug 2023 at 06:29, Jimmy Yih <jyih@vmware.com> wrote:
Thanks for the insightful response! I have attached an updated patch
that moves the proposed logic to the end of StartupXLOG where it seems
more correct to do this. It also helps with backporting (if it's
needed) since the archiver process only has access to shared memory
starting from Postgres 14.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
A. The OP suggests archiving the timeline history file for the current
timeline every time the archiver starts. However, I don't think we
want to keep archiving the same file over and over. (Granted, we're
not always perfect at avoiding that...)
With the updated proposed patch, we'll be checking if the current
timeline history file needs to be archived at the end of StartupXLOG
if archiving is enabled. If it detects that a .ready or .done file
already exists, then it won't do anything (which will be the common
case). I agree though that this may be an excessive check since it'll
be a no-op the majority of the time. However, it shouldn't execute
often and seems like a quick safe preventive measure. Could you give
more details on why this would be too cumbersome?
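Roughly, the check has this shape (a simplified sketch, not the patch
verbatim; TLHistoryFileName(), XLogArchiveIsReadyOrDone() and
XLogArchiveNotify() are existing backend facilities, while the exact
placement and the recoveryTLI variable are assumptions standing in for
the timeline we finished recovery on):

```c
/* Hedged sketch: run once at the very end of StartupXLOG(). */
if (XLogArchivingActive() && recoveryTLI > 1)
{
	char		histfname[MAXFNAMELEN];

	/* Build "0000000N.history" for the timeline we recovered onto. */
	TLHistoryFileName(histfname, recoveryTLI);

	/*
	 * Create the .ready marker only if neither .ready nor .done exists
	 * yet, so the common case is a no-op and the same history file is
	 * never queued for archival twice.
	 */
	if (!XLogArchiveIsReadyOrDone(histfname))
		XLogArchiveNotify(histfname);
}
```

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote: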
B. Given that the steps are valid, I concur with what is described in the
test script provided: standbys don't really need that history file
for the initial TLI (though I have yet to fully verify this). If the
walreceiver just overlooks a fetch error for this file, the standby
can successfully start. (Just skipping the first history file seems
to work, but it feels a tad aggressive to me.)
This was my initial thought as well but I wasn't sure if it was okay
to overlook the fetch error. Initial testing and brainstorming seems
to show that it's okay. I think the main bad thing is that these new
standbys will not have their initial timeline history files which can
be useful for administration. I've attached a patch that attempts this
approach if we want to switch to this approach as the solution. The
patch contains an updated TAP test as well to better showcase the
issue and fix.
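In walreceiver terms, the idea would look something like this (a sketch
modeled on WalRcvFetchTimeLineHistoryFiles() in walreceiver.c, where
wrconn is that file's global connection; skipping the initial
timeline's history file is the illustrated assumption, not existing
behavior, and the attached patch may differ in detail):

```c
static void
WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last)
{
	TimeLineID	tli;

	for (tli = first; tli <= last; tli++)
	{
		/* there's no history file for timeline 1 */
		if (tli == 1)
			continue;

		if (!existsTimeLineHistory(tli))
		{
			char	   *fname;
			char	   *content;
			int			len;

			/*
			 * Illustrated change: the upstream node may itself lack the
			 * history file for our starting timeline (e.g. it was built
			 * from a -Xnone backup), so skip the fetch for it rather
			 * than letting the failed request FATAL out the walreceiver.
			 */
			if (tli == first)
			{
				ereport(LOG,
						(errmsg("skipping history file for initial timeline %u",
								tli)));
				continue;
			}

			ereport(LOG,
					(errmsg("fetching timeline history file for timeline %u from primary server",
							tli)));

			walrcv_readtimelinehistoryfile(wrconn, tli, &fname, &content, &len);
			writeTimeLineHistoryFile(tli, content, len);
			pfree(fname);
			pfree(content);
		}
	}
}
```

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote: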
C. If those steps aren't valid, we might want to add a note stating
that -X none basebackups do need the timeline history file for the
initial TLI.
The difficult thing about only documenting this is that it forces the
user to manually store and track the timeline history files. It can be
a bit cumbersome for WAL archiving users to recognize this scenario
when they're just trying to optimize their basebackups by using
-Xnone. But then again -Xnone does seem like it's designed for
advanced users so this might be okay.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
And don't forget to enable archive mode before the latest timeline
switch, if any.
This might not be reasonable since a user could've been using
streaming replication and doing failover/failbacks as part of general
high availability to manage their Postgres without knowing they were
going to enable WAL archiving later on. The user would need to
configure archiving and force a failover, which may not be
straightforward.
I have changed the status of the patch to "Waiting on Author" as
Robert's suggestions have not yet been addressed. Feel free to address
the suggestions and update the status accordingly.
Regards,
Vignesh
On Thu, 11 Jan 2024 at 20:38, vignesh C <vignesh21@gmail.com> wrote:
I have changed the status of the patch to "Waiting on Author" as
Robert's suggestions have not yet been addressed. Feel free to address
the suggestions and update the status accordingly.
The patch which you submitted has been awaiting your attention for
quite some time now. As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible. Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.
Regards,
Vignesh