TAP test for recovery_end_command
Hi,
The attached patch adds a small test for recovery_end_command execution.
Currently, the patch tests execution of recovery_end_command by creating a
dummy file. I am not wedded to this approach; other suggestions are also
welcome.
We also don't have a good test for archive_cleanup_command, and I am not
sure how we could test a command that executes with every restartpoint.
Thanks to my colleague Neha Sharma for confirming the test execution on Windows.
Regards,
Amul
Attachments:
TAP-test_recovery_end_command.patch
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index ce60159f036..f90c698ba47 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -6,7 +6,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 3;
+use Test::More tests => 4;
use File::Copy;
# Initialize primary node, doing archives
@@ -65,13 +65,23 @@ $node_standby->promote;
my $node_standby2 = PostgresNode->new('standby2');
$node_standby2->init_from_backup($node_primary, $backup_name,
has_restoring => 1);
+my $node_standby2_data = $node_standby2->data_dir;
+
+# Also, test recovery_end_command by creating empty file.
+my $recovery_end_command_file = "$node_standby2_data/recovery_end_command.done";
+$node_standby2->append_conf('postgresql.conf',
+ "recovery_end_command='echo recovery_ended > $recovery_end_command_file'");
+
$node_standby2->start;
# Now promote standby2, and check that temporary files specifically
# generated during archive recovery are removed by the end of recovery.
$node_standby2->promote;
-my $node_standby2_data = $node_standby2->data_dir;
ok( !-f "$node_standby2_data/pg_wal/RECOVERYHISTORY",
"RECOVERYHISTORY removed after promotion");
ok( !-f "$node_standby2_data/pg_wal/RECOVERYXLOG",
"RECOVERYXLOG removed after promotion");
+
+# Also, check recovery_end_command execution.
+ok(-f "$recovery_end_command_file",
+ 'recovery_end_command executed after promotion');
On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote:
The attached patch adds a small test for recovery_end_command execution.
Additional coverage is always a good thing.
Currently, the patch tests execution of recovery_end_command by creating a
dummy file. I am not wedded to this approach; other suggestions are also
welcome.
This test file is for archiving only. It seems 020_archive_status.pl is more
appropriate for testing this parameter.
We also don't have a good test for archive_cleanup_command, and I am not
sure how we could test a command that executes with every restartpoint.
Set up a replica and stop it. That triggers a restartpoint during the shutdown.
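For illustration, a rough, untested sketch of that idea (node and file names
here are hypothetical) could look like:
# The shutdown checkpoint on a standby becomes a restartpoint, which is
# when archive_cleanup_command runs.
my $standby_cleanup = PostgresNode->new('standby_cleanup');
$standby_cleanup->init_from_backup($primary, 'backup', has_restoring => 1);
my $cleanup_done_file =
  $standby_cleanup->data_dir . '/archive_cleanup_command.done';
$standby_cleanup->append_conf('postgresql.conf',
  "archive_cleanup_command = 'echo done > $cleanup_done_file'");
$standby_cleanup->start;
$standby_cleanup->stop;
ok(-f $cleanup_done_file,
  'archive_cleanup_command executed at shutdown restartpoint');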
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Sun, Sep 12, 2021 at 09:25:32PM -0300, Euler Taveira wrote:
On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote:
We also don't have a good test for archive_cleanup_command, and I am not
sure how we could test a command that executes with every restartpoint.
Set up a replica and stop it. That triggers a restartpoint during the shutdown.
+$node_standby2->append_conf('postgresql.conf',
+ "recovery_end_command='echo recovery_ended > $recovery_end_command_file'");
This is not going to work on Windows.
--
Michael
On Mon, Sep 13, 2021 at 8:44 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Sep 12, 2021 at 09:25:32PM -0300, Euler Taveira wrote:
On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote:
We also don't have a good test for archive_cleanup_command, and I am not
sure how we could test a command that executes with every restartpoint.
Set up a replica and stop it. That triggers a restartpoint during the shutdown.
+$node_standby2->append_conf('postgresql.conf',
+ "recovery_end_command='echo recovery_ended > $recovery_end_command_file'");
This is not going to work on Windows.
Unfortunately, I don't have a Windows machine, but my colleague Neha Sharma
has confirmed that it works there.
Regards,
Amul
On Mon, Sep 13, 2021 at 5:56 AM Euler Taveira <euler@eulerto.com> wrote:
On Thu, Sep 9, 2021, at 8:18 AM, Amul Sul wrote:
The attached patch adds a small test for recovery_end_command execution.
Additional coverage is always a good thing.
Thanks for the confirmation.
Currently, the patch tests execution of recovery_end_command by creating a
dummy file. I am not wedded to this approach; other suggestions are also
welcome.
This test file is for archiving only. It seems 020_archive_status.pl is more
appropriate for testing this parameter.
Ok, moved to 020_archive_status.pl in the attached version.
We also don't have a good test for archive_cleanup_command, and I am not
sure how we could test a command that executes with every restartpoint.
Set up a replica and stop it. That triggers a restartpoint during the shutdown.
Yeah, added that test too. I triggered the restartpoint via a
CHECKPOINT command in the attached version.
Note that I haven't tested the current version on Windows; I will
cross-check that tomorrow.
Regards,
Amul
Attachments:
TAP-test_recovery_end_command_v2.patch
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index cea65735a39..a89baa93401 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -8,7 +8,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 16;
+use Test::More tests => 20;
use Config;
my $primary = PostgresNode->new('primary');
@@ -234,3 +234,36 @@ ok( -f "$standby2_data/$segment_path_1_done"
&& -f "$standby2_data/$segment_path_2_done",
".done files created after archive success with archive_mode=always on standby"
);
+
+# Test archive_cleanup_command and recovery_end_command execution
+my $standby3 = PostgresNode->new('standby3');
+$standby3->init_from_backup($primary, 'backup', has_restoring => 1);
+my $standby3_data = $standby3->data_dir;
+my $archive_cleanup_command_file = "$standby3_data/archive_cleanup_command.done";
+my $recovery_end_command_file = "$standby3_data/recovery_end_command.done";
+$standby3->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'");
+$standby3->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'");
+
+$standby3->start;
+$primary_lsn = $primary->lsn('write');
+$standby2->poll_query_until('postgres',
+ qq{ SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '$primary_lsn') >= 0 }
+) or die "Timed out while waiting for xlog replay on standby2";
+
+# archive_cleanup_command executed with every restart points
+ok( !-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command not executed yet');
+# Checkpoint will trigger restart point on standby.
+$standby3->safe_psql('postgres', q{CHECKPOINT});
+ok(-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command executed on checkpoint');
+
+# recovery_end_command_file executed only on recovery end which can happen on
+# promotion.
+ok( !-f "$recovery_end_command_file",
+ 'recovery_end_command not executed yet');
+$standby3->promote;
+ok(-f "$recovery_end_command_file",
+ 'recovery_end_command executed after promotion');
On Mon, Sep 13, 2021, at 10:09 AM, Amul Sul wrote:
Yeah, added that test too. I triggered the restartpoint via a
CHECKPOINT command in the attached version.
+# archive_cleanup_command executed with every restart points
+ok( !-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command not executed yet');
Why are you including a test whose result is known? A fresh cluster does
not contain archive_cleanup_command.done or recovery_end_command.done.
+# Checkpoint will trigger restart point on standby.
+$standby3->safe_psql('postgres', q{CHECKPOINT});
+ok(-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command executed on checkpoint');
Is this test reliable?
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Mon, Sep 13, 2021 at 8:39 PM Euler Taveira <euler@eulerto.com> wrote:
On Mon, Sep 13, 2021, at 10:09 AM, Amul Sul wrote:
Yeah, added that test too. I triggered the restartpoint via a
CHECKPOINT command in the attached version.
+# archive_cleanup_command executed with every restart points
+ok( !-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command not executed yet');
Why are you including a test whose result is known? A fresh cluster does
not contain archive_cleanup_command.done or recovery_end_command.done.
Makes sense; removed in the attached version.
+# Checkpoint will trigger restart point on standby.
+$standby3->safe_psql('postgres', q{CHECKPOINT});
+ok(-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command executed on checkpoint');
Is this test reliable?
I think yes; it will be the same as the shutdown you are suggesting, which
eventually performs a checkpoint. On a server in recovery, that checkpoint
performs a restart point instead. Still, archive_cleanup_command execution
could be skipped if no record has been replayed since the previous restart
point, which could happen in the shutdown case as well. Hope that makes sense.
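The attached version tries to keep that deterministic by waiting for the
standby to replay past the primary's current write position before issuing
the CHECKPOINT, roughly (adapted from the attached patch; using the
standby-under-test handle throughout):
# Make sure the standby has replayed up to the primary's write LSN, so the
# restartpoint triggered by the CHECKPOINT below has replayed WAL to act on
# and archive_cleanup_command is not skipped.
my $primary_lsn = $primary->lsn('write');
$standby3->poll_query_until('postgres',
  qq{SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '$primary_lsn') >= 0})
  or die "timed out waiting for standby3 to catch up";
$standby3->safe_psql('postgres', q{CHECKPOINT});
ok(-f $archive_cleanup_command_file,
  'archive_cleanup_command executed on checkpoint');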
Regards,
Amul
Attachments:
TAP-test_recovery_end_command_v3.patch
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index cea65735a39..2277d09a306 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -8,7 +8,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 16;
+use Test::More tests => 18;
use Config;
my $primary = PostgresNode->new('primary');
@@ -234,3 +234,32 @@ ok( -f "$standby2_data/$segment_path_1_done"
&& -f "$standby2_data/$segment_path_2_done",
".done files created after archive success with archive_mode=always on standby"
);
+
+# Test archive_cleanup_command and recovery_end_command execution
+my $standby3 = PostgresNode->new('standby3');
+$standby3->init_from_backup($primary, 'backup', has_restoring => 1);
+my $standby3_data = $standby3->data_dir;
+my $archive_cleanup_command_file = "$standby3_data/archive_cleanup_command.done";
+my $recovery_end_command_file = "$standby3_data/recovery_end_command.done";
+$standby3->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'");
+$standby3->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'");
+
+$standby3->start;
+$primary_lsn = $primary->lsn('write');
+$standby2->poll_query_until('postgres',
+ qq{ SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '$primary_lsn') >= 0 }
+) or die "Timed out while waiting for xlog replay on standby2";
+
+# archive_cleanup_command executed with every restart point and that can be
+# trigger using checkpoint
+$standby3->safe_psql('postgres', q{CHECKPOINT});
+ok(-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command executed on checkpoint');
+
+# recovery_end_command_file executed only on recovery end which can happen on
+# promotion.
+$standby3->promote;
+ok(-f "$recovery_end_command_file",
+ 'recovery_end_command executed after promotion');
Hi,
On 2021-09-14 10:34:09 +0530, Amul Sul wrote:
+# recovery_end_command_file executed only on recovery end which can happen on
+# promotion.
+$standby3->promote;
+ok(-f "$recovery_end_command_file",
+ 'recovery_end_command executed after promotion');
It'd be good to test what happens when recovery_end_command fails...
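For instance, one possible shape for such a test (a rough sketch with a
hypothetical node name; the command points at a directory that does not
exist, so it is guaranteed to fail):
# Point recovery_end_command at a missing directory so it always fails.
my $standby_fail = PostgresNode->new('standby_fail');
$standby_fail->init_from_backup($primary, 'backup', has_restoring => 1);
$standby_fail->append_conf('postgresql.conf',
  "recovery_end_command = 'echo oops > /non_existing_dir/oops.file'");
$standby_fail->start;
$standby_fail->promote;
# A non-zero exit from recovery_end_command is reported as a WARNING, so
# promotion should still complete.
is($standby_fail->safe_psql('postgres', q{SELECT pg_is_in_recovery()}),
  'f', 'promotion succeeds despite failing recovery_end_command');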
Greetings,
Andres Freund
On Wed, Oct 6, 2021 at 1:40 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-09-14 10:34:09 +0530, Amul Sul wrote:
+# recovery_end_command_file executed only on recovery end which can happen on
+# promotion.
+$standby3->promote;
+ok(-f "$recovery_end_command_file",
+ 'recovery_end_command executed after promotion');
It'd be good to test what happens when recovery_end_command fails...
Thanks for the suggestion, added the same in the attached version.
Regards,
Amul
Attachments:
TAP-test_recovery_end_command_v4.patch
From d325bf8a50923cc90df3cfbab77c496617d11c2c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 6 Oct 2021 09:16:45 -0400
Subject: [PATCH] TAP-test_recovery_end_command_v4
---
src/test/recovery/t/020_archive_status.pl | 52 ++++++++++++++++++++++-
1 file changed, 51 insertions(+), 1 deletion(-)
diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl
index cea65735a39..9ec5a2fff93 100644
--- a/src/test/recovery/t/020_archive_status.pl
+++ b/src/test/recovery/t/020_archive_status.pl
@@ -8,7 +8,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 16;
+use Test::More tests => 19;
use Config;
my $primary = PostgresNode->new('primary');
@@ -234,3 +234,53 @@ ok( -f "$standby2_data/$segment_path_1_done"
&& -f "$standby2_data/$segment_path_2_done",
".done files created after archive success with archive_mode=always on standby"
);
+
+# Test archive_cleanup_command and recovery_end_command execution
+my $standby3 = PostgresNode->new('standby3');
+$standby3->init_from_backup($primary, 'backup', has_restoring => 1);
+my $standby3_data = $standby3->data_dir;
+my $archive_cleanup_command_file = "$standby3_data/archive_cleanup_command.done";
+my $recovery_end_command_file = "$standby3_data/recovery_end_command.done";
+$standby3->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'");
+$standby3->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'");
+
+$standby3->start;
+$primary_lsn = $primary->lsn('write');
+$standby3->poll_query_until('postgres',
+ qq{ SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '$primary_lsn') >= 0 }
+) or die "Timed out while waiting for xlog replay on standby3";
+
+# archive_cleanup_command executed with every restart point and that can be
+# trigger using checkpoint
+$standby3->safe_psql('postgres', q{CHECKPOINT});
+ok(-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command executed on checkpoint');
+
+# recovery_end_command_file executed only on recovery end which can happen on
+# promotion.
+$standby3->promote;
+ok(-f "$recovery_end_command_file",
+ 'recovery_end_command executed after promotion');
+
+# Test effect of failing archive_cleanup_command and recovery_end_command execution
+my $standby4 = PostgresNode->new('standby4');
+$standby4->init_from_backup($primary, 'backup', has_restoring => 1);
+$standby4->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo archive_cleanuped > /non_existing_dir/file'");
+$standby4->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo recovery_ended > /non_existing_dir/file'");
+
+$standby4->start;
+$primary_lsn = $primary->lsn('write');
+$standby4->poll_query_until('postgres',
+ qq{ SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '$primary_lsn') >= 0 }
+) or die "Timed out while waiting for xlog replay on standby4";
+
+# Failing to execute archive_cleanup_command and/or recovery_end_command does
+# not affect promotion.
+$standby4->safe_psql('postgres', q{CHECKPOINT});
+$standby4->promote;
+is($standby4->safe_psql( 'postgres', q{SELECT pg_is_in_recovery()}), 'f',
+ "standby promoted successfully despite incorrect archive_cleanup_command and recovery_end_command");
--
2.18.0
On Wed, Oct 06, 2021 at 06:49:10PM +0530, Amul Sul wrote:
Thanks for the suggestion, added the same in the attached version.
Hmm. The run-time of 020_archive_status.pl bumps from 4.7s to 5.8s on
my laptop, so the change is noticeable. I agree that it would be good
to have more coverage for those commands, but I also think that we
should make things cheaper if we can, particularly knowing that those
commands just touch a file. This patch creates two standbys for its
purpose, but do we need that many?
On top of that, 020_archive_status.pl does not look like the correct
place for this set of tests. 002_archiving.pl would be a better
candidate, where we already have two standbys that get promoted, so
you could have both the failure and success cases there. There should
be no need for extra wait phases either.
+$standby4->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo recovery_ended > /non_existing_dir/file'");
I am wondering how this finishes on Windows.
--
Michael
On Wed, Oct 20, 2021 at 11:09 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Oct 06, 2021 at 06:49:10PM +0530, Amul Sul wrote:
Thanks for the suggestion, added the same in the attached version.
Hmm. The run-time of 020_archive_status.pl bumps from 4.7s to 5.8s on
my laptop, so the change is noticeable. I agree that it would be good
to have more coverage for those commands, but I also think that we
should make things cheaper if we can, particularly knowing that those
commands just touch a file. This patch creates two standbys for its
purpose, but do we need that many?
On top of that, 020_archive_status.pl does not look like the correct
place for this set of tests. 002_archiving.pl would be a better
candidate, where we already have two standbys that get promoted, so
you could have both the failure and success cases there. There should
be no need for extra wait phases either.
Understood, moved tests to 002_archiving.pl in the attached version.
+$standby4->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo recovery_ended > /non_existing_dir/file'");
I am wondering how this finishes on Windows.
My colleague Neha Sharma has confirmed that the attached version passes
on Windows.
Regards,
Amul
Attachments:
v5-0001-TAP-test-for-recovery_end_command-and-archive_cle.patch
From a340aaaf9298e6adfd322afb82f825f296650375 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Mon, 25 Oct 2021 03:38:25 -0400
Subject: [PATCH v5] TAP test for recovery_end_command and
archive_cleanup_command
---
src/test/recovery/t/002_archiving.pl | 48 +++++++++++++++++++++++-----
1 file changed, 40 insertions(+), 8 deletions(-)
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index 24852c97fda..de7b10c8f85 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -6,7 +6,7 @@ use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 3;
+use Test::More tests => 6;
use File::Copy;
# Initialize primary node, doing archives
@@ -28,11 +28,22 @@ $node_standby->init_from_backup($node_primary, $backup_name,
has_restoring => 1);
$node_standby->append_conf('postgresql.conf',
"wal_retrieve_retry_interval = '100ms'");
+
+# Set archive_cleanup_command and recovery_end_command
+my $data_dir = $node_standby->data_dir;
+my $archive_cleanup_command_file = "$data_dir/archive_cleanup_command.done";
+my $recovery_end_command_file = "$data_dir/recovery_end_command.done";
+$node_standby->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'");
+$node_standby->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'");
+
$node_standby->start;
# Create some content on primary
$node_primary->safe_psql('postgres',
"CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
+$node_primary->safe_psql('postgres', "CHECKPOINT");
my $current_lsn =
$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
@@ -53,18 +64,34 @@ my $result =
$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
is($result, qq(1000), 'check content from archives');
-# Check the presence of temporary files specifically generated during
-# archive recovery. To ensure the presence of the temporary history
-# file, switch to a timeline large enough to allow a standby to recover
-# a history file from an archive. As this requires at least two timeline
-# switches, promote the existing standby first. Then create a second
-# standby based on the promoted one. Finally, the second standby is
-# promoted.
+# archive_cleanup_command executed with every restart point and that can be
+# trigger using checkpoint
+$node_standby->safe_psql('postgres', q{CHECKPOINT});
+ok(-f "$archive_cleanup_command_file",
+ 'archive_cleanup_command executed on checkpoint');
+
+# Check recovery_end_command executed on recovery end which on promotion and the
+# presence of temporary files specifically generated during archive recovery.
+# To ensure the presence of the temporary history file, switch to a timeline
+# large enough to allow a standby to recover a history file from an archive. As
+# this requires at least two timeline switches, promote the existing standby
+# first. Then create a second standby based on the promoted one. Finally, the
+# second standby is promoted.
$node_standby->promote;
+ok(-f "$recovery_end_command_file",
+ 'recovery_end_command executed after promotion');
my $node_standby2 = PostgreSQL::Test::Cluster->new('standby2');
$node_standby2->init_from_backup($node_primary, $backup_name,
has_restoring => 1);
+
+# Check effect of failing archive_cleanup_command and recovery_end_command
+# execution on the promotion by setting wrong command.
+$node_standby2->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
+$node_standby2->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
+
$node_standby2->start;
# Now promote standby2, and check that temporary files specifically
@@ -75,3 +102,8 @@ ok( !-f "$node_standby2_data/pg_wal/RECOVERYHISTORY",
"RECOVERYHISTORY removed after promotion");
ok( !-f "$node_standby2_data/pg_wal/RECOVERYXLOG",
"RECOVERYXLOG removed after promotion");
+
+# Failing to execute archive_cleanup_command and/or recovery_end_command does
+# not affect promotion.
+is($node_standby2->safe_psql( 'postgres', q{SELECT pg_is_in_recovery()}), 'f',
+ "standby promoted successfully despite incorrect archive_cleanup_command and recovery_end_command");
--
2.18.0
On Mon, Oct 25, 2021 at 02:42:28PM +0530, Amul Sul wrote:
Understood, moved tests to 002_archiving.pl in the attached version.
Thanks for the new patch. I have reviewed its contents, and there
were a couple of things that caught my attention while putting my
hands on it.
+$node_standby->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'");
+$node_standby->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'");
This can be formatted with a single append_conf() call and qq() to
have the correct set of quotes.
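That is, roughly (as done in the attached version):
$node_standby->append_conf(
  'postgresql.conf', qq(
archive_cleanup_command = 'echo archive_cleanup_done > $archive_cleanup_command_file'
recovery_end_command = 'echo recovery_ended_done > $recovery_end_command_file'
));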
+$node_primary->safe_psql('postgres', "CHECKPOINT");
my $current_lsn =
$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
This had better say that the checkpoint is necessary because we need
one before switching to a new segment on the primary, as much as the
checkpoint on the first standby is needed to trigger the command whose
execution is checked.
+$node_standby2->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
+$node_standby2->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
[...]
+# Failing to execute archive_cleanup_command and/or recovery_end_command does
+# not affect promotion.
+is($node_standby2->safe_psql( 'postgres', q{SELECT pg_is_in_recovery()}), 'f',
+ "standby promoted successfully despite incorrect archive_cleanup_command and recovery_end_command");
This SQL test is mostly useless IMO, as the promote() call done above
ensures that this state is reached properly, and the same could be said
about the removals of RECOVERYHISTORY and RECOVERYXLOG. I think
that it would be better to check directly if the commands are run or
not. This is simple to test: look at the logs from a position just
before the promotion, slurp the log file of $standby2 from this
position, and finally compare its contents with a regex of your
choice. I have chosen a simple "qr/WARNING:.*recovery_end_command/s"
for the purpose of this test. Having a test for
archive_cleanup_command here would be nice, but that would be much
more costly than the end-of-recovery command, so I have left that
out. Perhaps we could just append it in the conf as a dummy, as you
did, though, but its execution is not deterministic in this test so we
are better without for now IMO.
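In code, the pattern boils down to this (taken from the attached version):
# Remember the size of the log file just before promotion...
my $log_location = -s $node_standby2->logfile;
$node_standby2->promote;
# ...then read only what was appended since, and look for the WARNING.
my $log_contents = slurp_file($node_standby2->logfile, $log_location);
like(
  $log_contents,
  qr/WARNING:.*recovery_end_command/s,
  "recovery_end_command failure detected in logs after promotion");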
perltidy was also complaining a bit; this is fixed as of the attached.
I have checked things on my own Windows dev box, while on it.
--
Michael
Attachments:
v6-0001-TAP-test-for-recovery_end_command-and-archive_cle.patch
From 01629266d79bca20b3b653ce8c59062ef7187515 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 27 Oct 2021 12:59:13 +0900
Subject: [PATCH v6] TAP test for recovery_end_command and
archive_cleanup_command
---
src/test/recovery/t/002_archiving.pl | 47 +++++++++++++++++++++++++++-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index 24852c97fd..7dc2f4e4c8 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -6,7 +6,7 @@ use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 3;
+use Test::More tests => 7;
use File::Copy;
# Initialize primary node, doing archives
@@ -28,6 +28,17 @@ $node_standby->init_from_backup($node_primary, $backup_name,
has_restoring => 1);
$node_standby->append_conf('postgresql.conf',
"wal_retrieve_retry_interval = '100ms'");
+
+# Set archive_cleanup_command and recovery_end_command, checking their
+# execution by the backend with dummy commands.
+my $data_dir = $node_standby->data_dir;
+my $archive_cleanup_command_file = "$data_dir/archive_cleanup_command.done";
+my $recovery_end_command_file = "$data_dir/recovery_end_command.done";
+$node_standby->append_conf(
+ 'postgresql.conf', qq(
+archive_cleanup_command = 'echo archive_cleanup_done > $archive_cleanup_command_file'
+recovery_end_command = 'echo recovery_ended_done > $recovery_end_command_file'
+));
$node_standby->start;
# Create some content on primary
@@ -36,6 +47,10 @@ $node_primary->safe_psql('postgres',
my $current_lsn =
$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+# Note the presence of this checkpoint for the archive_cleanup_command
+# check done below, before switching to a new segment.
+$node_primary->safe_psql('postgres', "CHECKPOINT");
+
# Force archiving of WAL file to make it present on primary
$node_primary->safe_psql('postgres', "SELECT pg_switch_wal()");
@@ -53,6 +68,14 @@ my $result =
$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
is($result, qq(1000), 'check content from archives');
+# archive_cleanup_command is executed after generating a restart point,
+# with a checkpoint.
+$node_standby->safe_psql('postgres', q{CHECKPOINT});
+ok(-f $archive_cleanup_command_file,
+ 'archive_cleanup_command executed on checkpoint');
+ok(!-f $recovery_end_command_file,
+ 'recovery_end_command executed after promotion');
+
# Check the presence of temporary files specifically generated during
# archive recovery. To ensure the presence of the temporary history
# file, switch to a timeline large enough to allow a standby to recover
@@ -62,11 +85,26 @@ is($result, qq(1000), 'check content from archives');
# promoted.
$node_standby->promote;
+# recovery_end_command should have been triggered on promotion.
+ok(-f $recovery_end_command_file,
+ 'recovery_end_command executed after promotion');
+
my $node_standby2 = PostgreSQL::Test::Cluster->new('standby2');
$node_standby2->init_from_backup($node_primary, $backup_name,
has_restoring => 1);
+
+# Check that failures when executing and recovery_end_command at
+# promotion has no effect, but its failure should be logged.
+$node_standby2->append_conf(
+ 'postgresql.conf', qq(
+recovery_end_command = 'echo recovery_end_failed > $data_dir/missing_dir/xyz.file'
+));
+
$node_standby2->start;
+# Save the log location, to see the failure of recovery_end_command.
+my $log_location = -s $node_standby2->logfile;
+
# Now promote standby2, and check that temporary files specifically
# generated during archive recovery are removed by the end of recovery.
$node_standby2->promote;
@@ -75,3 +113,10 @@ ok( !-f "$node_standby2_data/pg_wal/RECOVERYHISTORY",
"RECOVERYHISTORY removed after promotion");
ok( !-f "$node_standby2_data/pg_wal/RECOVERYXLOG",
"RECOVERYXLOG removed after promotion");
+
+# Check the logs of the standby to see that the commands have failed.
+my $log_contents = slurp_file($node_standby2->logfile, $log_location);
+like(
+ $log_contents,
+ qr/WARNING:.*recovery_end_command/s,
+ "recovery_end_command failure detected in logs after promotion");
--
2.33.0
On Wed, Oct 27, 2021 at 9:37 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Oct 25, 2021 at 02:42:28PM +0530, Amul Sul wrote:
Understood, moved tests to 002_archiving.pl in the attached version.
Thanks for the new patch. I have reviewed its contents, and there
were a couple of things that caught my attention while putting my
hands on it.
+$node_standby->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo archive_cleanuped > $archive_cleanup_command_file'");
+$node_standby->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo recovery_ended > $recovery_end_command_file'");
This can be formatted with a single append_conf() call and qq() to
have the correct set of quotes.
+$node_primary->safe_psql('postgres', "CHECKPOINT");
my $current_lsn =
$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
This had better say that the checkpoint is necessary because we need
one before switching to a new segment on the primary, as much as the
checkpoint on the first standby is needed to trigger the command whose
execution is checked.
+$node_standby2->append_conf('postgresql.conf',
+ "archive_cleanup_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
+$node_standby2->append_conf('postgresql.conf',
+ "recovery_end_command = 'echo xyz > $data_dir/unexisting_dir/xyz.file'");
[...]
+# Failing to execute archive_cleanup_command and/or recovery_end_command does
+# not affect promotion.
+is($node_standby2->safe_psql( 'postgres', q{SELECT pg_is_in_recovery()}), 'f',
+ "standby promoted successfully despite incorrect archive_cleanup_command and recovery_end_command");
This SQL test is mostly useless IMO, as the promote() call done above
ensures that this state is reached properly, and the same could be said
about the removals of RECOVERYHISTORY and RECOVERYXLOG. I think
that it would be better to check directly if the commands are run or
not. This is simple to test: look at the logs from a position just
before the promotion, slurp the log file of $standby2 from this
position, and finally compare its contents with a regex of your
choice. I have chosen a simple "qr/WARNING:.*recovery_end_command/s"
for the purpose of this test. Having a test for
archive_cleanup_command here would be nice, but that would be much
more costly than the end-of-recovery command, so I have left that
out. Perhaps we could just append it in the conf as a dummy, as you
did, though, but its execution is not deterministic in this test so we
are better without for now IMO.
perltidy was also complaining a bit; this is fixed as of the attached.
I have checked things on my own Windows dev box, while on it.
Thanks for the updated version. The patch is much better than before,
except that the description of the test checking recovery_end_command_file
before promotion needed a minor change; I did that in the attached
version.
Regards,
Amul
Attachments:
v7-0001-TAP-test-for-recovery_end_command-and-archive_cle.patch
From b077f3077c7549a3759721ccae812e815f968bb5 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 27 Oct 2021 00:57:57 -0400
Subject: [PATCH v7] TAP test for recovery_end_command and
archive_cleanup_command
---
src/test/recovery/t/002_archiving.pl | 47 +++++++++++++++++++++++++++-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index 24852c97fda..f64a9c6ee46 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -6,7 +6,7 @@ use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
-use Test::More tests => 3;
+use Test::More tests => 7;
use File::Copy;
# Initialize primary node, doing archives
@@ -28,6 +28,17 @@ $node_standby->init_from_backup($node_primary, $backup_name,
has_restoring => 1);
$node_standby->append_conf('postgresql.conf',
"wal_retrieve_retry_interval = '100ms'");
+
+# Set archive_cleanup_command and recovery_end_command, checking their
+# execution by the backend with dummy commands.
+my $data_dir = $node_standby->data_dir;
+my $archive_cleanup_command_file = "$data_dir/archive_cleanup_command.done";
+my $recovery_end_command_file = "$data_dir/recovery_end_command.done";
+$node_standby->append_conf(
+ 'postgresql.conf', qq(
+archive_cleanup_command = 'echo archive_cleanup_done > $archive_cleanup_command_file'
+recovery_end_command = 'echo recovery_ended_done > $recovery_end_command_file'
+));
$node_standby->start;
# Create some content on primary
@@ -36,6 +47,10 @@ $node_primary->safe_psql('postgres',
my $current_lsn =
$node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+# Note the presence of this checkpoint for the archive_cleanup_command
+# check done below, before switching to a new segment.
+$node_primary->safe_psql('postgres', "CHECKPOINT");
+
# Force archiving of WAL file to make it present on primary
$node_primary->safe_psql('postgres', "SELECT pg_switch_wal()");
@@ -53,6 +68,14 @@ my $result =
$node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
is($result, qq(1000), 'check content from archives');
+# archive_cleanup_command is executed after generating a restart point,
+# with a checkpoint.
+$node_standby->safe_psql('postgres', q{CHECKPOINT});
+ok(-f $archive_cleanup_command_file,
+ 'archive_cleanup_command executed on checkpoint');
+ok(!-f $recovery_end_command_file,
+ 'recovery_end_command not executed yet');
+
# Check the presence of temporary files specifically generated during
# archive recovery. To ensure the presence of the temporary history
# file, switch to a timeline large enough to allow a standby to recover
@@ -62,11 +85,26 @@ is($result, qq(1000), 'check content from archives');
# promoted.
$node_standby->promote;
+# recovery_end_command should have been triggered on promotion.
+ok(-f $recovery_end_command_file,
+ 'recovery_end_command executed after promotion');
+
my $node_standby2 = PostgreSQL::Test::Cluster->new('standby2');
$node_standby2->init_from_backup($node_primary, $backup_name,
has_restoring => 1);
+
+# Check that failures when executing and recovery_end_command at
+# promotion has no effect, but its failure should be logged.
+$node_standby2->append_conf(
+ 'postgresql.conf', qq(
+recovery_end_command = 'echo recovery_end_failed > $data_dir/missing_dir/xyz.file'
+));
+
$node_standby2->start;
+# Save the log location, to see the failure of recovery_end_command.
+my $log_location = -s $node_standby2->logfile;
+
# Now promote standby2, and check that temporary files specifically
# generated during archive recovery are removed by the end of recovery.
$node_standby2->promote;
@@ -75,3 +113,10 @@ ok( !-f "$node_standby2_data/pg_wal/RECOVERYHISTORY",
"RECOVERYHISTORY removed after promotion");
ok( !-f "$node_standby2_data/pg_wal/RECOVERYXLOG",
"RECOVERYXLOG removed after promotion");
+
+# Check the logs of the standby to see that the commands have failed.
+my $log_contents = slurp_file($node_standby2->logfile, $log_location);
+like(
+ $log_contents,
+ qr/WARNING:.*recovery_end_command/s,
+ "recovery_end_command failure detected in logs after promotion");
--
2.18.0
On Wed, Oct 27, 2021 at 10:32:22AM +0530, Amul Sul wrote:
Thanks for the updated version. The patch is much better than before,
except that the description of the test checking recovery_end_command_file
before promotion needed a minor change; I did that in the attached
version.
ok(!-f $recovery_end_command_file,
- 'recovery_end_command executed after promotion');
+ 'recovery_end_command not executed yet');
Indeed :p
--
Michael
On Wed, Oct 27, 2021 at 02:20:50PM +0900, Michael Paquier wrote:
ok(!-f $recovery_end_command_file,
- 'recovery_end_command executed after promotion');
+ 'recovery_end_command not executed yet');
Indeed :p
While looking at that this morning, I have noticed an extra bug. If
the path of the data folder included a space, the command would have
failed. I have to wonder if we should do something like
cp_history_files, but that did not seem necessary to me once we rely
on each command being executed from the root of the data folder.
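Purely as an illustration of the quoting problem (not what was done here):
with the path interpolated unquoted, a space in the data directory splits
the shell redirection target. Wrapping the interpolated path in double
quotes inside the GUC value would avoid that:
# Hypothetical sketch only: the inner double quotes end up in
# postgresql.conf and keep the shell redirection intact even if
# $data_dir contains a space.
$node_standby2->append_conf(
  'postgresql.conf', qq(
recovery_end_command = 'echo recovery_end_failed > "$data_dir/missing_dir/xyz.file"'
));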
Anyway, I am curious to see what the buildfarm thinks about all that,
particularly with Msys, so I have applied the patch. I am keeping an
eye on things, though.
--
Michael
On Thu, Oct 28, 2021 at 7:24 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Oct 27, 2021 at 02:20:50PM +0900, Michael Paquier wrote:
ok(!-f $recovery_end_command_file,
- 'recovery_end_command executed after promotion');
+ 'recovery_end_command not executed yet');
Indeed :p
While looking at that this morning, I have noticed an extra bug. If
the path of the data folder included a space, the command would have
failed. I have to wonder if we should do something like
cp_history_files, but that did not seem necessary to me once we rely
on each command being executed from the root of the data folder.
Anyway, I am curious to see what the buildfarm thinks about all that,
particularly with Msys, so I have applied the patch. I am keeping an
eye on things, though.
Thanks a lot, Michael.
Regards,
Amul
On Thu, Oct 28, 2021 at 09:25:43AM +0530, Amul Sul wrote:
Thanks a lot, Michael.
So... The buildfarm members that run on Windows and run the recovery
tests are jacana (MinGW) and fairywren (Msys), and both report green.
drongo (MSVC) skips those tests, but I have tested MSVC myself, so I
think we are good here.
--
Michael