Should the archiver process always make sure that the timeline history files exist in the archive?
Hello pgsql-hackers,
While testing out some WAL archiving and PITR scenarios, it was observed that
enabling WAL archiving for the first time on a primary that was on a timeline
higher than 1 would not initially archive the timeline history file for the
timeline it was currently on. While this might be okay for most use cases, there
are scenarios where this leads to unexpected failures that seem to expose some
flaws in the logic.
Scenario 1:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Create a
standby from that backup that continuously restores from the WAL archives;
because the backup excluded pg_wal, the standby will not contain the timeline 2
history file. The standby will operate normally, but if you try to create a
cascading standby off it using streaming replication, the cascading standby's
WAL receiver will continuously FATAL trying to request the timeline 2 history
file that the main standby does not have.
Scenario 2:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Then try
to create a new node by doing PITR with recovery_target_timeline set to
'current' or 'latest', which will succeed. However, doing PITR with
recovery_target_timeline = '2' will fail since it is unable to find the
timeline 2 history file in the WAL archives. This is a bit contradictory: we
allow 'current' and 'latest' to recover, but explicitly setting
recovery_target_timeline to the control file's timeline id ends in failure.
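To make Scenario 2 concrete, here is a rough reproduction sketch (the host
name, data directory, and archive path are placeholders, and the
restore_command mirrors the cp example from the documentation):

    # Base backup of the timeline-2 primary, excluding WAL
    pg_basebackup -h primary.example.com -D /restore/data -Xnone

    # Configure PITR from the WAL archive
    cat >> /restore/data/postgresql.conf <<'EOF'
    restore_command = 'cp /mnt/server/archivedir/%f %p'
    recovery_target_timeline = '2'
    EOF
    touch /restore/data/recovery.signal

    # Fails with "FATAL: recovery target timeline 2 does not exist";
    # substituting recovery_target_timeline = 'current' or 'latest'
    # lets recovery proceed.
    pg_ctl -D /restore/data start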
Attached is a patch containing two TAP tests that demonstrate the scenarios.
My questions are:
1. Why doesn't the archiver process try to archive timeline history files when
WAL archiving is first configured, or at least continually check for them
(perhaps when the archiver process starts, before its main loop)?
2. Why does explicitly setting the recovery_target_timeline to the control
file's timeline id not follow the same logic as recovery_target_timeline set
to 'current'?
3. Why does a cascaded standby require the timeline history file of its control
file's timeline id (startTLI) when the main replica is able to operate fine
without the timeline history file?
Note that my initial observations came from testing with pgBackRest (which
disables copying pg_wal/ during backup by default), but using `pg_basebackup
-Xnone` reproduced the issues similarly and is what I present in the TAP
tests. At the moment, the only workaround I can think of is to manually run
the archive_command on the missing timeline history file(s); a rough sketch
follows.
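As a sketch, assuming the cp-based archive_command example from the
documentation (the archive path is a placeholder):

    # Hand-archive any timeline history files the archiver never picked up
    cd "$PGDATA"/pg_wal
    for f in *.history; do
        [ -f "/mnt/server/archivedir/$f" ] || cp "$f" "/mnt/server/archivedir/$f"
    done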
Are these valid issues that should be looked into, or are they expected?
Scenario 2 seems like it could be fixed easily by checking whether the numeric
recovery_target_timeline value equals the control file's timeline id (compare
rtli and recoveryTargetTLI in validateRecoveryParameters()?). But I wasn't
sure if maybe the opposite was true, and we should instead make 'current' and
'latest' require retrieving the timeline history files, which would help
prevent Scenario 1.
Regards,
Jimmy Yih
Attachments:
0001-TAP-tests-to-show-missing-timeline-history-issues.patch (application/octet-stream)
From 8e77a72089301886667f1761fd99c87f4df3f456 Mon Sep 17 00:00:00 2001
From: Jimmy Yih <jyih@vmware.com>
Date: Wed, 9 Aug 2023 16:50:04 -0700
Subject: [PATCH] TAP tests to show missing timeline history issues
While testing out some WAL archiving and PITR scenarios, it was
observed that enabling WAL archiving for the first time on a primary
that was on a timeline higher than 1 would not initially archive the
timeline history file for the timeline it was currently on. While this
might be okay for most use cases, there are scenarios where this leads
to unexpected failures that seem to expose some flaws in the logic.
This patch contains TAP tests that help demonstrate the issues.
---
.../t/038_cascade_with_no_timeline_history.pl | 143 ++++++++++++++++++
...039_recovery_target_no_timeline_history.pl | 123 +++++++++++++++
2 files changed, 266 insertions(+)
create mode 100644 src/test/recovery/t/038_cascade_with_no_timeline_history.pl
create mode 100644 src/test/recovery/t/039_recovery_target_no_timeline_history.pl
diff --git a/src/test/recovery/t/038_cascade_with_no_timeline_history.pl b/src/test/recovery/t/038_cascade_with_no_timeline_history.pl
new file mode 100644
index 0000000000..9f18d2dd7d
--- /dev/null
+++ b/src/test/recovery/t/038_cascade_with_no_timeline_history.pl
@@ -0,0 +1,143 @@
+# This test showcases a seemingly valid scenario where a primary on
+# timeline 2 has a standby which itself has a cascaded standby. The
+# main standby (created from a backup taken with -Xnone and recovering
+# via WAL archives) does not contain the timeline 2 history file and is
+# unable to serve it to the cascaded standby. The cascaded standby
+# will continuously FATAL trying to request the timeline 2 history
+# file.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use File::Path qw(rmtree);
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to a new timeline
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300)",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node. The timeline 2 history file
+# will not be pushed to the archive.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file has not been
+# archived. Timeline history file archival only happens when the
+# timeline history file is created which only occurs in two areas:
+# 1. When a standby is configured for archiving (archive_mode and
+# archive_command set) and is promoted. A timeline history file for
+# the new timeline will be created and will be immediately marked
+# as ready for archiving.
+# 2. When a standby is configured for archiving (archive_mode set to
+# 'always' and archive_command is set) and receives a timeline
+# history file from the primary via streaming replication. The file
+# will be marked as ready for archiving.
+#
+# Note: This seems to be the root cause of the failures that follow
+# because a lot of recovery logic seems to rely on the timeline
+# history files being retrievable. However, I'm not sure if this logic
+# is intentional or not.
+my $primary_archive = $node_primary_tli2->archive_dir;
+my $result_primary_tli2 =
+ $node_primary_tli2->safe_psql('postgres', "SELECT size IS NULL FROM pg_stat_file('$primary_archive/00000002.history', true)");
+is($result_primary_tli2, qq(t), 'see that the timeline 2 history file was not archived');
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal is empty.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a");
+
+# Create a restore point to later use as the recovery_target_name.
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal()');
+
+# Wait until the WAL segment has been archived.
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will
+# recover onto the same timeline designated in the control file by
+# setting recovery_target_timeline to 'current'. The timeline 2
+# history file is not retrievable but seems to not be required. Note
+# that setting recovery_target_timeline to 'latest' would also create
+# the same scenario but using 'current' helps decrease the scope of
+# the problem.
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = 'current'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+$node_standby->start;
+
+# Sanity check that the node came up and is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+
+# Set up the cascade standby
+my $node_cascade = PostgreSQL::Test::Cluster->new('cascade');
+$node_cascade->init_from_backup($node_primary_tli2, $backup_name,
+ standby => 1);
+$node_cascade->enable_streaming($node_standby);
+# This will fail to start up because the WAL receiver continuously
+# FATALs out. The test will end here in failure.
+$node_cascade->start;
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade);
+
+# Sanity check that the node came up and is queryable
+my $result_cascade =
+ $node_cascade->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_cascade, qq(8), 'check that the node received the streamed WAL data');
+
+$node_standby->teardown_node;
+$node_cascade->teardown_node;
+
+done_testing();
diff --git a/src/test/recovery/t/039_recovery_target_no_timeline_history.pl b/src/test/recovery/t/039_recovery_target_no_timeline_history.pl
new file mode 100644
index 0000000000..7682593bd0
--- /dev/null
+++ b/src/test/recovery/t/039_recovery_target_no_timeline_history.pl
@@ -0,0 +1,123 @@
+# Test that we can do a recovery when the timeline history file is
+# unavailable and the recovery_target_timeline requested is equal to
+# the timeline in the control file.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use File::Path qw(rmtree);
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to a new timeline
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300)",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node. The timeline 2 history file
+# will not be pushed to the archive.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file has not been
+# archived. Timeline history file archival only happens when the
+# timeline history file is created which only occurs in two areas:
+# 1. When a standby is configured for archiving (archive_mode and
+# archive_command set) and is promoted. A timeline history file for
+# the new timeline will be created and will be immediately marked
+# as ready for archiving.
+# 2. When a standby is configured for archiving (archive_mode set to
+# 'always' and archive_command is set) and receives a timeline
+# history file from the primary via streaming replication. The file
+# will be marked as ready for archiving.
+#
+# Note: This seems to be the root cause of the failures that follow
+# because a lot of recovery logic seems to rely on the timeline
+# history files being retrievable. However, I'm not sure if this logic
+# is intentional or not.
+my $primary_archive = $node_primary_tli2->archive_dir;
+my $result_primary_tli2 =
+ $node_primary_tli2->safe_psql('postgres', "SELECT size IS NULL FROM pg_stat_file('$primary_archive/00000002.history', true)");
+is($result_primary_tli2, qq(t), 'see that the timeline 2 history file was not archived');
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal is empty.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a");
+
+# Create a restore point to later use as the recovery_target_name.
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal()');
+
+# Wait until the WAL segment has been archived.
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will start
+# off on timeline 2 according to the control file and will finish
+# recovery onto the same timeline by explicitly setting
+# recovery_target_timeline to '2'. The timeline 2 history file is not
+# retrievable but is required. Shouldn't this scenario act the same as
+# setting recovery_target_timeline to 'current' which does not require
+# a timeline history file to be retrieved?
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = '2'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+# This will fail to start up because the timeline 2 history file is
+# not retrievable from the WAL archive. The test will end here in
+# failure.
+$node_standby->start;
+
+# Sanity check that the node came up and is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+$node_standby->teardown_node;
+
+done_testing();
--
2.24.3 (Apple Git-128)
Hello pgsql-hackers,
After doing some more debugging on the matter, I believe this issue might be a
minor regression from commit 5332b8cec541. Prior to that commit, the archiver
process, when first started on a previously promoted primary, would mark all
the timeline history files as ready for immediate archiving. If that still
happened, none of the failure scenarios I mentioned would be theoretically
possible (barring someone manually deleting the timeline history files). With
that in mind, I looked more into my Question 1 and created a patch proposal.
The attached patch tries to archive the current timeline history file, if it
has not been archived yet, when the archiver process starts up.
Regards,
Jimmy Yih
Attachments:
0001-Archive-current-timeline-history-file-on-archiver-st.patch (application/octet-stream)
From 7e282aea9795cd4749cfd04fb824bb2971206cc3 Mon Sep 17 00:00:00 2001
From: Jimmy Yih <jyih@vmware.com>
Date: Wed, 9 Aug 2023 16:50:04 -0700
Subject: [PATCH] Archive current timeline history file on archiver startup if
needed
Previously, the only time the timeline history file would be archived
was when a standby already configured for WAL archiving was
promoted. If WAL archiving was set up and started after the standby
was promoted, its current timeline history file would not be found in
the archive. This could cause issues when restoring backups that did
not include the timeline history files (e.g. pg_basebackup
-Xnone). Some example failures include failing to restore with
recovery_target_timeline explicitly set to the control file's timeline
id and failing to create cascade standbys after recovering with
recovery_target_timeline set to 'current' or 'latest'. To prevent
these restore issues, we now make sure that the current timeline
history file has been archived right before the archiver loop starts.
---
src/backend/access/transam/xlogarchive.c | 6 +-
src/backend/postmaster/pgarch.c | 26 ++++
.../t/038_archive_current_timeline_history.pl | 134 ++++++++++++++++++
3 files changed, 163 insertions(+), 3 deletions(-)
create mode 100644 src/test/recovery/t/038_archive_current_timeline_history.pl
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index f3fb92c8f9..dee679bd18 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -650,9 +650,9 @@ XLogArchiveIsBusy(const char *xlog)
* This is similar to XLogArchiveIsBusy(), but returns true if the file
* is already archived or is about to be archived.
*
- * This is currently only used at recovery. During normal operation this
- * would be racy: the file might get removed or marked with .ready as we're
- * checking it, or immediately after we return.
+ * This is primarily used at recovery. During normal operation this would be
+ * racy: the file might get removed or marked with .ready as we're checking
+ * it, or immediately after we return.
*/
bool
XLogArchiveIsReadyOrDone(const char *xlog)
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 46af349564..65bd3db635 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -31,6 +31,7 @@
#include "access/xlog.h"
#include "access/xlog_internal.h"
+#include "access/xlogarchive.h"
#include "archive/archive_module.h"
#include "archive/shell_archive.h"
#include "lib/binaryheap.h"
@@ -371,10 +372,35 @@ static void
pgarch_ArchiverCopyLoop(void)
{
char xlog[MAX_XFN_CHARS + 1];
+ TimeLineID current_timeline_id;
/* force directory scan in the first call to pgarch_readyXlog() */
arch_files->arch_files_size = 0;
+	/*
+	 * Make sure the timeline history file has been archived whenever the
+	 * archiver copy loop starts; once it is marked .ready or .done this is
+	 * a no-op. Skip timeline 1 since it has no timeline history file.
+	 */
+ current_timeline_id = GetWALInsertionTimeLine();
+ if (current_timeline_id > 1)
+ {
+ char histfname[MAXFNAMELEN];
+
+ TLHistoryFileName(histfname, current_timeline_id);
+
+ /*
+ * Timeline history .done files do not get removed automatically so
+ * this check should be valid to make sure we don't archive the
+ * timeline history file again on restart. However, if the timeline
+ * history .done file was manually removed for some reason, then we
+ * make the assumption that the archive_command is set up properly to
+ * gracefully handle the re-archiving attempt.
+ */
+ if (!XLogArchiveIsReadyOrDone(histfname))
+ XLogArchiveNotify(histfname);
+ }
+
/*
* loop through all xlogs with archive_status of .ready and archive
* them...mostly we expect this to be a single file, though it is possible
diff --git a/src/test/recovery/t/038_archive_current_timeline_history.pl b/src/test/recovery/t/038_archive_current_timeline_history.pl
new file mode 100644
index 0000000000..41e2b5b2d4
--- /dev/null
+++ b/src/test/recovery/t/038_archive_current_timeline_history.pl
@@ -0,0 +1,134 @@
+# Test that setting up and starting WAL archiving on an
+# already-promoted node will result in the archival of its current
+# timeline history file.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary; it's not needed anymore
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to timeline 2
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300);",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node. The timeline 2 history file
+# will be pushed to the archive.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file has been archived. The file
+# is marked ready and immediately archived when the archiver process
+# starts up, but we loop to make this check deterministic.
+my $primary_archive = $node_primary_tli2->archive_dir;
+$node_primary_tli2->poll_query_until('postgres',
+ "SELECT size IS NOT NULL FROM pg_stat_file('$primary_archive/00000002.history', true);")
+ or die "Timed out while waiting for 00000002.history to be archived";
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal will
+# be empty and restore will retrieve the necessary WAL and timeline
+# history file(s) from the archive.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a;");
+
+# Create a restore point to later use as the recovery_target_name
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find the next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make the WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal();');
+
+# Wait until the WAL segment has been archived
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will start
+# off on timeline 2 according to the control file and will finish
+# recovery onto the same timeline by explicitly setting
+# recovery_target_timeline to '2'. We explicitly set the timeline id
+# because startup will fail if the timeline history file is not
+# retrievable from the archive but will not fail if we use 'current'
+# or 'latest'.
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = '2'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+$node_standby->start;
+
+# Sanity check that the timeline history file was retrieved
+my $log_contents = slurp_file($node_standby->logfile);
+ok( $log_contents =~ qr/restored log file "00000002.history" from archive/,
+ "00000002.history retrieved from the archives");
+ok ( -f $node_standby->data_dir . "/pg_wal/00000002.history",
+ "00000002.history exists in the standby's pg_wal directory");
+
+# Sanity check that the node is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+
+# Set up a cascade standby node to validate that there's no issues
+# since the WAL receiver will request all necessary timeline history
+# files from the standby node's WAL sender.
+my $node_cascade = PostgreSQL::Test::Cluster->new('cascade');
+$node_cascade->init_from_backup($node_primary_tli2, $backup_name,
+ standby => 1);
+$node_cascade->enable_streaming($node_standby);
+$node_cascade->start;
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade);
+
+# Sanity check that the cascade standby node came up and is queryable
+my $result_cascade =
+ $node_cascade->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_cascade, qq(8), 'check that the node received the streamed WAL data');
+
+$node_standby->teardown_node;
+$node_cascade->teardown_node;
+
+done_testing();
--
2.24.3 (Apple Git-128)
At Wed, 16 Aug 2023 07:33:29 +0000, Jimmy Yih <jyih@vmware.com> wrote in
Hello pgsql-hackers,
After doing some more debugging on the matter, I believe this issue might be a
minor regression from commit 5332b8cec541. Prior to that commit, the archiver
process when first started on a previously promoted primary would have all the
timeline history files marked as ready for immediate archiving. If that had
happened, none of my mentioned failure scenarios would be theoretically possible
(barring someone manually deleting the timeline history files). With that in
mind, I decided to look more into my Question 1 and created a patch proposal.
The attached patch will try to archive the current timeline history file if it
has not been archived yet when the archiver process starts up.
In essence, after a series of subtle but not necessarily wrong steps,
there's a case where a primary server lacks the timeline history file
for the current timeline in both pg_wal and the archive, even if that
timeline is greater than 1. This primary can start, but a new standby
created from the primary cannot start streaming, as it can't fetch the
timeline history file for the initial TLI.
A. The OP suggests archiving the timeline history file for the current
timeline every time the archiver starts. However, I don't think we
want to keep archiving the same file over and over. (Granted, we're
not always perfect at avoiding that..)
B. Given that the steps are valid, I concur with what is described in the
test script provided: standbys don't really need that history file
for the initial TLI (though I have yet to fully verify this). If the
walreceiver just overlooks a fetch error for this file, the standby
can successfully start. (Just skipping the first history file seems
to work, but it feels a tad aggressive to me.)
C. If those steps aren't valid, we might want to add a note stating
that -X none basebackups do need the timeline history file for the
initial TLI. And don't forget to enable archive mode before the
latest timeline switch if any.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Thanks for the insightful response! I have attached an updated patch
that moves the proposed logic to the end of StartupXLOG, where it seems
more correct to do this. It also helps with backporting (if that's
needed), since the archiver process only has access to shared memory
starting from Postgres 14.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
A. The OP suggests archiving the timeline history file for the current
timeline every time the archiver starts. However, I don't think we
want to keep archiving the same file over and over. (Granted, we're
not always perfect at avoiding that..)
With the updated proposed patch, we'll be checking if the current
timeline history file needs to be archived at the end of StartupXLOG
if archiving is enabled. If it detects that a .ready or .done file
already exists, then it won't do anything (which will be the common
case). I agree though that this may be an excessive check since it'll
be a no-op the majority of the time. However, it shouldn't execute
often and seems like a quick safe preventive measure. Could you give
more details on why this would be too cumbersome?
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
B. Given that the steps are valid, I concur with what is described in the
test script provided: standbys don't really need that history file
for the initial TLI (though I have yet to fully verify this). If the
walreceiver just overlooks a fetch error for this file, the standby
can successfully start. (Just skipping the first history file seems
to work, but it feels a tad aggressive to me.)
This was my initial thought as well, but I wasn't sure if it was okay
to overlook the fetch error. Initial testing and brainstorming seem
to show that it's okay. I think the main bad thing is that these new
standbys will not have their initial timeline history files, which can
be useful for administration. I've attached a patch that attempts this
approach, in case we want to switch to it as the solution. The patch
contains an updated TAP test as well to better showcase the issue and
fix.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
C. If those steps aren't valid, we might want to add a note stating
that -X none basebackups do need the timeline history file for the
initial TLI.
The difficult thing about only documenting this is that it forces the
user to manually store and track the timeline history files. It can be
a bit cumbersome for WAL archiving users to recognize this scenario
when they're just trying to optimize their base backups by using
-Xnone. But then again, -Xnone does seem to be designed for advanced
users, so this might be okay.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
And don't forget to enable archive mode before the latest timeline
switch if any.
This might not be reasonable, since a user could have been using
streaming replication and doing failovers/failbacks as part of general
high availability without knowing they were going to enable WAL
archiving later on. The user would need to configure archiving and
force a failover, which may not be straightforward.
Regards,
Jimmy Yih
Attachments:
v2-0001-Archive-current-timeline-history-file-after-recovery.patch (application/octet-stream)
From 1b06231da8cd213e30c4fbe5115bfc87c8873754 Mon Sep 17 00:00:00 2001
From: Jimmy Yih <jyih@vmware.com>
Date: Wed, 9 Aug 2023 16:50:04 -0700
Subject: [PATCH] Archive current timeline history file after recovery finishes
if needed
Previously, the only time the timeline history file would be archived
was when a standby already configured for WAL archiving was
promoted. If WAL archiving was set up and started after the standby
was promoted, its current timeline history file would not be found in
the archive. This could cause issues when restoring backups that did
not include the timeline history files (e.g. pg_basebackup
-Xnone). Some example failures include failing to restore with
recovery_target_timeline explicitly set to the control file's timeline
id and failing to create cascade standbys after recovering with
recovery_target_timeline set to 'current' or 'latest'. To prevent
these restore issues, we now ensure that the current timeline history
file has been archived by marking it as ready for archiving after
recovery finishes if it was found that it has not already been
archived or marked ready.
---
src/backend/access/transam/xlog.c | 31 ++++
src/backend/access/transam/xlogarchive.c | 6 +-
.../t/038_archive_current_timeline_history.pl | 134 ++++++++++++++++++
3 files changed, 168 insertions(+), 3 deletions(-)
create mode 100644 src/test/recovery/t/038_archive_current_timeline_history.pl
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..99b53b9a7e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5800,6 +5800,37 @@ StartupXLOG(void)
if (standbyState != STANDBY_DISABLED)
ShutdownRecoveryTransactionEnvironment();
+ /*
+ * It's possible that this could be the first time WAL archiving has been
+ * enabled. If the timeline is greater than 1, we need to check if the
+ * current timeline history file needs to be archived. This will prevent
+ * any PITR-related issues later on where a timeline history file is
+ * required.
+ */
+ if (XLogArchivingActive())
+ {
+ TimeLineID currentTimeLineID;
+
+ currentTimeLineID = GetWALInsertionTimeLine();
+ if (currentTimeLineID > 1) {
+ char histfname[MAXFNAMELEN];
+
+ /*
+ * Timeline history .done files do not get removed automatically
+ * so this check should be valid to make sure we don't archive the
+ * timeline history file again on restart. However, if the
+ * timeline history .done file was manually removed for some
+ * reason, then we make the assumption that the archive_command is
+ * set up properly to gracefully handle the re-archiving attempt.
+ * If there's already a .ready or .done file, then there's nothing
+ * to do.
+ */
+ TLHistoryFileName(histfname, currentTimeLineID);
+ if (!XLogArchiveIsReadyOrDone(histfname))
+ XLogArchiveNotify(histfname);
+ }
+ }
+
/*
* If there were cascading standby servers connected to us, nudge any wal
* sender processes to notice that we've been promoted.
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index f3fb92c8f9..dee679bd18 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -650,9 +650,9 @@ XLogArchiveIsBusy(const char *xlog)
* This is similar to XLogArchiveIsBusy(), but returns true if the file
* is already archived or is about to be archived.
*
- * This is currently only used at recovery. During normal operation this
- * would be racy: the file might get removed or marked with .ready as we're
- * checking it, or immediately after we return.
+ * This is primarily used at recovery. During normal operation this would be
+ * racy: the file might get removed or marked with .ready as we're checking
+ * it, or immediately after we return.
*/
bool
XLogArchiveIsReadyOrDone(const char *xlog)
diff --git a/src/test/recovery/t/038_archive_current_timeline_history.pl b/src/test/recovery/t/038_archive_current_timeline_history.pl
new file mode 100644
index 0000000000..41e2b5b2d4
--- /dev/null
+++ b/src/test/recovery/t/038_archive_current_timeline_history.pl
@@ -0,0 +1,134 @@
+# Test that setting up and starting WAL archiving on an
+# already-promoted node will result in the archival of its current
+# timeline history file.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary; it's not needed anymore
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to timeline 2
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300);",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node. The timeline 2 history file
+# will be pushed to the archive.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file has been archived. The file
+# is marked ready and immediately archived when the archiver process
+# starts up, but we loop to make this check deterministic.
+my $primary_archive = $node_primary_tli2->archive_dir;
+$node_primary_tli2->poll_query_until('postgres',
+ "SELECT size IS NOT NULL FROM pg_stat_file('$primary_archive/00000002.history', true);")
+ or die "Timed out while waiting for 00000002.history to be archived";
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal will
+# be empty and restore will retrieve the necessary WAL and timeline
+# history file(s) from the archive.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a;");
+
+# Create a restore point to later use as the recovery_target_name
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find the next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make the WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal();');
+
+# Wait until the WAL segment has been archived
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will start
+# off on timeline 2 according to the control file and will finish
+# recovery onto the same timeline by explicitly setting
+# recovery_target_timeline to '2'. We explicitly set the timeline id
+# because startup will fail if the timeline history file is not
+# retrievable from the archive but will not fail if we use 'current'
+# or 'latest'.
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = '2'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+$node_standby->start;
+
+# Sanity check that the timeline history file was retrieved
+my $log_contents = slurp_file($node_standby->logfile);
+ok( $log_contents =~ qr/restored log file "00000002.history" from archive/,
+ "00000002.history retrieved from the archives");
+ok ( -f $node_standby->data_dir . "/pg_wal/00000002.history",
+ "00000002.history exists in the standby's pg_wal directory");
+
+# Sanity check that the node is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+
+# Set up a cascade standby node to validate that there's no issues
+# since the WAL receiver will request all necessary timeline history
+# files from the standby node's WAL sender.
+my $node_cascade = PostgreSQL::Test::Cluster->new('cascade');
+$node_cascade->init_from_backup($node_primary_tli2, $backup_name,
+ standby => 1);
+$node_cascade->enable_streaming($node_standby);
+$node_cascade->start;
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade);
+
+# Sanity check that the cascade standby node came up and is queryable
+my $result_cascade =
+ $node_cascade->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_cascade, qq(8), 'check that the node received the streamed WAL data');
+
+$node_standby->teardown_node;
+$node_cascade->teardown_node;
+
+done_testing();
--
2.24.3 (Apple Git-128)
v1-0001-Allow-recovery-to-proceed-when-initial-timeline-hist.patch (application/octet-stream)
From c87f9d7eaf2720944d625cb529b6cc355fec7771 Mon Sep 17 00:00:00 2001
From: Jimmy Yih <jyih@vmware.com>
Date: Wed, 9 Aug 2023 16:50:04 -0700
Subject: [PATCH] Allow recovery to proceed when initial timeline history file
is missing
For WAL archive recovery, setting recovery_target_timeline to 'current'
or 'latest' will only output a WARNING if the initial timeline history
file cannot be found or retrieved, and recovery then proceeds without
any issues. However, setting recovery_target_timeline explicitly to the
current control file's timeline id (similar to what 'current' does)
results in a FATAL when the initial timeline history file cannot be
found or retrieved. Since 'current' and 'latest' work fine, we should
also not FATAL when the timeline history file cannot be found or
retrieved and recovery_target_timeline is explicitly set to the same
timeline id as the control file's.
For WAL streaming, the standby's WAL receiver will FATAL and loop on
trying to retrieve the initial timeline history from the primary (or
standby in the case of cascading). However, it doesn't seem to be
required if the above WAL archive recovery claims are valid. To align
with the same logic, we should also not FATAL when the WAL receiver
cannot find/retrieve the initial timeline history file.
---
src/backend/access/transam/xlogrecovery.c | 2 +-
.../libpqwalreceiver/libpqwalreceiver.c | 26 +++-
src/backend/replication/walreceiver.c | 12 +-
src/include/replication/walreceiver.h | 7 +-
...andbys_with_no_initial_timeline_history.pl | 136 ++++++++++++++++++
5 files changed, 171 insertions(+), 12 deletions(-)
create mode 100644 src/test/recovery/t/038_standbys_with_no_initial_timeline_history.pl
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index becc2bda62..b223a82cfb 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1124,7 +1124,7 @@ validateRecoveryParameters(void)
TimeLineID rtli = recoveryTargetTLIRequested;
/* Timeline 1 does not have a history file, all else should */
- if (rtli != 1 && !existsTimeLineHistory(rtli))
+ if (rtli != 1 && !existsTimeLineHistory(rtli) && rtli != recoveryTargetTLI)
ereport(FATAL,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("recovery target timeline %u does not exist",
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 60d5c1fc40..ba12ed3911 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -61,7 +61,7 @@ static char *libpqrcv_identify_system(WalReceiverConn *conn,
static int libpqrcv_server_version(WalReceiverConn *conn);
static void libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
TimeLineID tli, char **filename,
- char **content, int *len);
+ char **content, int *len, bool missing_ok);
static bool libpqrcv_startstreaming(WalReceiverConn *conn,
const WalRcvStreamOptions *options);
static void libpqrcv_endstreaming(WalReceiverConn *conn,
@@ -603,7 +603,7 @@ libpqrcv_endstreaming(WalReceiverConn *conn, TimeLineID *next_tli)
static void
libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
TimeLineID tli, char **filename,
- char **content, int *len)
+ char **content, int *len, bool missing_ok)
{
PGresult *res;
char cmd[64];
@@ -618,11 +618,23 @@ libpqrcv_readtimelinehistoryfile(WalReceiverConn *conn,
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
PQclear(res);
- ereport(ERROR,
- (errcode(ERRCODE_PROTOCOL_VIOLATION),
- errmsg("could not receive timeline history file from "
- "the primary server: %s",
- pchomp(PQerrorMessage(conn->streamConn)))));
+
+ if (missing_ok)
+ {
+ *filename = NULL;
+ *content = NULL;
+ ereport(WARNING,
+ (errmsg("could not receive timeline history file from "
+ "the primary server: %s",
+ pchomp(PQerrorMessage(conn->streamConn)))));
+ return;
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("could not receive timeline history file from "
+ "the primary server: %s",
+ pchomp(PQerrorMessage(conn->streamConn)))));
}
if (PQnfields(res) != 2 || PQntuples(res) != 1)
{
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index feff709435..26368fe4f3 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -754,12 +754,22 @@ WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last)
char *content;
int len;
char expectedfname[MAXFNAMELEN];
+ bool missing_ok;
ereport(LOG,
(errmsg("fetching timeline history file for timeline %u from primary server",
tli)));
- walrcv_readtimelinehistoryfile(wrconn, tli, &fname, &content, &len);
+ missing_ok = (tli == first);
+ walrcv_readtimelinehistoryfile(wrconn, tli, &fname, &content, &len, missing_ok);
+
+ /*
+ * If the requested timeline id is the first one, we can overlook
+ * the timeline history file fetch error since it's not required
+ * to start the standby.
+ */
+ if (missing_ok && fname == NULL && content == NULL)
+ continue;
/*
* Check that the filename on the primary matches what we
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 281626fa6f..197c82bb53 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -298,7 +298,8 @@ typedef void (*walrcv_readtimelinehistoryfile_fn) (WalReceiverConn *conn,
TimeLineID tli,
char **filename,
char **content,
- int *size);
+ int *size,
+ bool missing_ok);
/*
* walrcv_startstreaming_fn
@@ -419,8 +420,8 @@ extern PGDLLIMPORT WalReceiverFunctionsType *WalReceiverFunctions;
WalReceiverFunctions->walrcv_identify_system(conn, primary_tli)
#define walrcv_server_version(conn) \
WalReceiverFunctions->walrcv_server_version(conn)
-#define walrcv_readtimelinehistoryfile(conn, tli, filename, content, size) \
- WalReceiverFunctions->walrcv_readtimelinehistoryfile(conn, tli, filename, content, size)
+#define walrcv_readtimelinehistoryfile(conn, tli, filename, content, size, missing_ok) \
+ WalReceiverFunctions->walrcv_readtimelinehistoryfile(conn, tli, filename, content, size, missing_ok)
#define walrcv_startstreaming(conn, options) \
WalReceiverFunctions->walrcv_startstreaming(conn, options)
#define walrcv_endstreaming(conn, next_tli) \
diff --git a/src/test/recovery/t/038_standbys_with_no_initial_timeline_history.pl b/src/test/recovery/t/038_standbys_with_no_initial_timeline_history.pl
new file mode 100644
index 0000000000..9f2b33ed18
--- /dev/null
+++ b/src/test/recovery/t/038_standbys_with_no_initial_timeline_history.pl
@@ -0,0 +1,136 @@
+# Test that a standby created from a backup lacking the initial
+# timeline history file can still recover and serve a cascading
+# standby when that file is also absent from the WAL archive.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+$ENV{PGDATABASE} = 'postgres';
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Take a backup
+my $backup_name = 'my_backup_1';
+$node_primary->backup($backup_name);
+
+# Create a standby that will be promoted onto timeline 2
+my $node_primary_tli2 = PostgreSQL::Test::Cluster->new('primary_tli2');
+$node_primary_tli2->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+$node_primary_tli2->start;
+
+# Stop and remove the primary; it's not needed anymore
+$node_primary->teardown_node;
+
+# Promote the standby using "pg_promote", switching it to timeline 2
+my $psql_out = '';
+$node_primary_tli2->psql(
+ 'postgres',
+ "SELECT pg_promote(wait_seconds => 300);",
+ stdout => \$psql_out);
+is($psql_out, 't', "promotion of standby with pg_promote");
+
+# Enable archiving on the promoted node.
+$node_primary_tli2->enable_archiving;
+$node_primary_tli2->restart;
+
+# Check that the timeline 2 history file was not archived after
+# enabling WAL archiving since timeline history files are only
+# archived at the moment of switching timelines and not any time
+# after.
+my $primary_tli2_archive = $node_primary_tli2->archive_dir;
+my $primary_tli2_datadir = $node_primary_tli2->data_dir;
+ok(-f "$primary_tli2_datadir/pg_wal/00000002.history",
+ 'timeline 2 history file was created');
+ok(! -f "$primary_tli2_datadir/pg_wal/archive_status/00000002.history.ready",
+ 'timeline 2 history file was not marked for WAL archiving');
+ok(! -f "$primary_tli2_datadir/pg_wal/archive_status/00000002.history.done",
+	'timeline 2 history file was not archived');
+ok(! -f "$primary_tli2_archive/00000002.history",
+ 'timeline 2 history file does not exist in the archive');
+
+# Take backup of node_primary_tli2 and use -Xnone so that pg_wal will
+# be empty and restore will retrieve the necessary WAL and timeline
+# history file(s) from the archive.
+$backup_name = 'my_backup_2';
+$node_primary_tli2->backup($backup_name, backup_options => ['-Xnone']);
+
+# Create simple WAL that will be archived and restored
+$node_primary_tli2->safe_psql('postgres', "CREATE TABLE tab_int AS SELECT 8 AS a;");
+
+# Create a restore point to later use as the recovery_target_name
+my $recovery_name = "my_target";
+$node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_create_restore_point('$recovery_name');");
+
+# Find the next WAL segment to be archived
+my $walfile_to_be_archived = $node_primary_tli2->safe_psql('postgres',
+ "SELECT pg_walfile_name(pg_current_wal_lsn());");
+
+# Make the WAL segment eligible for archival
+$node_primary_tli2->safe_psql('postgres', 'SELECT pg_switch_wal();');
+
+# Wait until the WAL segment has been archived
+my $archive_wait_query =
+ "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";
+$node_primary_tli2->poll_query_until('postgres', $archive_wait_query)
+ or die "Timed out while waiting for WAL segment to be archived";
+$node_primary_tli2->teardown_node;
+
+# Initialize a new standby node from the backup. This node will start
+# off on timeline 2 according to the control file and will finish
+# recovery onto the same timeline by explicitly setting
+# recovery_target_timeline to '2'. We explicitly set the target
+# timeline to show that it doesn't require the timeline history file
+# and works the same as if we used 'current' or 'latest'.
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary_tli2, $backup_name,
+ has_restoring => 1, standby => 0);
+$node_standby->append_conf('postgresql.conf', qq{
+recovery_target_timeline = '2'
+recovery_target_action = 'pause'
+recovery_target_name = 'my_target'
+archive_mode = 'off'
+primary_conninfo = ''
+});
+$node_standby->start;
+
+# Check that the timeline history file was not retrieved
+ok ( ! -f $node_standby->data_dir . "/pg_wal/00000002.history",
+ "00000002.history does not exist in the standby's pg_wal directory");
+
+# Sanity check that the node is queryable
+my $result_standby =
+ $node_standby->safe_psql('postgres', "SELECT timeline_id FROM pg_control_checkpoint();");
+is($result_standby, qq(2), 'check that the node is on timeline 2');
+$result_standby =
+ $node_standby->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_standby, qq(8), 'check that the node did archive recovery');
+
+# Set up a cascade standby node to validate that there's no issues
+# since the WAL receiver will request all necessary timeline history
+# files from the standby node's WAL sender.
+my $node_cascade = PostgreSQL::Test::Cluster->new('cascade');
+$node_cascade->init_from_backup($node_primary_tli2, $backup_name,
+ standby => 1);
+$node_cascade->enable_streaming($node_standby);
+$node_cascade->start;
+
+# Wait for the replication to catch up
+$node_standby->wait_for_catchup($node_cascade);
+
+# Sanity check that the cascade standby node came up and is queryable
+my $result_cascade =
+ $node_cascade->safe_psql('postgres', "SELECT * FROM tab_int;");
+is($result_cascade, qq(8), 'check that the node received the streamed WAL data');
+
+$node_standby->teardown_node;
+$node_cascade->teardown_node;
+
+done_testing();
--
2.24.3 (Apple Git-128)
On Mon, Aug 28, 2023 at 8:59 PM Jimmy Yih <jyih@vmware.com> wrote:
Thanks for the insightful response! I have attached an updated patch
that moves the proposed logic to the end of StartupXLOG where it seems
more correct to do this. It also helps with backporting (if it's
needed) since the archiver process only has access to shared memory
starting from Postgres 14.
Hmm. Do I understand correctly that the two patches you attached are
alternatives to each other, i.e. we need one or the other to fix the
issue, but not both?
It seems to me that trying to fetch a timeline history file and then
ignoring any error has got to be wrong. Either the file is needed or
it isn't. If it's needed, then failing to fetch it is a problem. If
it's not needed, there's no reason to try fetching it in the first
place. So I feel like we could try to archive the file at the
end of recovery, as you propose in
v2-0001-Archive-current-timeline-history-file-after-recovery.patch.
Alternatively, we could try to find a way not to request the file in
the first place, if it's not required. But
v1-0001-Allow-recovery-to-proceed-when-initial-timeline-hist.patch
doesn't seem good to me.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, 29 Aug 2023 at 06:29, Jimmy Yih <jyih@vmware.com> wrote:
Thanks for the insightful response! I have attached an updated patch
that moves the proposed logic to the end of StartupXLOG where it seems
more correct to do this. It also helps with backporting (if it's
needed) since the archiver process only has access to shared memory
starting from Postgres 14.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
A. The OP suggests archiving the timeline history file for the current
timeline every time the archiver starts. However, I don't think we
want to keep archiving the same file over and over. (Granted, we're
not always perfect at avoiding that...)
With the updated proposed patch, we'll be checking if the current
timeline history file needs to be archived at the end of StartupXLOG
if archiving is enabled. If it detects that a .ready or .done file
already exists, then it won't do anything (which will be the common
case). I agree though that this may be an excessive check since it'll
be a no-op the majority of the time. However, it shouldn't execute
often and seems like a quick safe preventive measure. Could you give
more details on why this would be too cumbersome?
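Roughly, the check has this shape (a simplified sketch, not the patch
verbatim; TLHistoryFileName(), XLogArchiveIsReadyOrDone() and
XLogArchiveNotify() are existing backend facilities, while the exact
placement and the recoveryTLI variable are assumptions standing in for
the timeline we finished recovery on):

```c
/* Hedged sketch: run once at the very end of StartupXLOG(). */
if (XLogArchivingActive() && recoveryTLI > 1)
{
	char		histfname[MAXFNAMELEN];

	/* Build "0000000N.history" for the timeline we recovered onto. */
	TLHistoryFileName(histfname, recoveryTLI);

	/*
	 * Create the .ready marker only if neither .ready nor .done exists
	 * yet, so the common case is a no-op and the same history file is
	 * never queued for archival twice.
	 */
	if (!XLogArchiveIsReadyOrDone(histfname))
		XLogArchiveNotify(histfname);
}
```

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote: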
B. Given that the steps are valid, I concur with what is described in the
test script provided: standbys don't really need that history file
for the initial TLI (though I have yet to fully verify this). If the
walreceiver just overlooks a fetch error for this file, the standby
can successfully start. (Just skipping the first history file seems
to work, but it feels a tad aggressive to me.)
This was my initial thought as well but I wasn't sure if it was okay
to overlook the fetch error. Initial testing and brainstorming seems
to show that it's okay. I think the main bad thing is that these new
standbys will not have their initial timeline history files which can
be useful for administration. I've attached a patch that attempts this
approach if we want to switch to this approach as the solution. The
patch contains an updated TAP test as well to better showcase the
issue and fix.
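In walreceiver terms, the idea would look something like this (a sketch
modeled on WalRcvFetchTimeLineHistoryFiles() in walreceiver.c, where
wrconn is that file's global connection; skipping the initial
timeline's history file is the illustrated assumption, not existing
behavior, and the attached patch may differ in detail):

```c
static void
WalRcvFetchTimeLineHistoryFiles(TimeLineID first, TimeLineID last)
{
	TimeLineID	tli;

	for (tli = first; tli <= last; tli++)
	{
		/* there's no history file for timeline 1 */
		if (tli == 1)
			continue;

		if (!existsTimeLineHistory(tli))
		{
			char	   *fname;
			char	   *content;
			int			len;

			/*
			 * Illustrated change: the upstream node may itself lack the
			 * history file for our starting timeline (e.g. it was built
			 * from a -Xnone backup), so skip the fetch for it rather
			 * than letting the failed request FATAL out the walreceiver.
			 */
			if (tli == first)
			{
				ereport(LOG,
						(errmsg("skipping history file for initial timeline %u",
								tli)));
				continue;
			}

			ereport(LOG,
					(errmsg("fetching timeline history file for timeline %u from primary server",
							tli)));

			walrcv_readtimelinehistoryfile(wrconn, tli, &fname, &content, &len);
			writeTimeLineHistoryFile(tli, content, len);
			pfree(fname);
			pfree(content);
		}
	}
}
```

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote: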
C. If those steps aren't valid, we might want to add a note stating
that -X none basebackups do need the timeline history file for the
initial TLI.
The difficult thing about only documenting this is that it forces the
user to manually store and track the timeline history files. It can be
a bit cumbersome for WAL archiving users to recognize this scenario
when they're just trying to optimize their basebackups by using
-Xnone. But then again -Xnone does seem like it's designed for
advanced users so this might be okay.
Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
And don't forget to enable archive mode before the latest timeline
switch, if any.
This might not be reasonable since a user could've been using
streaming replication and doing failover/failbacks as part of general
high availability to manage their Postgres without knowing they were
going to enable WAL archiving later on. The user would need to
configure archiving and force a failover, which may not be
straightforward.
I have changed the status of the patch to "Waiting on Author" as
Robert's suggestions have not yet been addressed. Feel free to address
the suggestions and update the status accordingly.
Regards,
Vignesh
On Thu, 11 Jan 2024 at 20:38, vignesh C <vignesh21@gmail.com> wrote:
I have changed the status of the patch to "Waiting on Author" as
Robert's suggestions have not yet been addressed. Feel free to address
the suggestions and update the status accordingly.
The patch which you submitted has been awaiting your attention for
quite some time now. As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible. Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.
Regards,
Vignesh