[PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

Started by Craig Ringerabout 9 years ago10 messages
#1Craig Ringer
craig@2ndquadrant.com
3 attachment(s)

Every test I write with the TAP framework for recovery seems to need
to wait for one node to catch up to another or examine replication
slot state. So: attached is a patch to add helper methods for these
tasks.

Also add a new pg_lsn.pm helper class to deal with PostgreSQL's
awkward representation of LSNs in perl. Made extra fun by our use of
Perl 5.8.8 with no new modules, so we can't rely on having 64-bit
integers. Provides sensible LSN comparisons. I'll be using this in
coming patches that have to make assertions about differences between
LSNs, and I think it's sensible to have anyway. Incorporates tests for
the class.

Finally, add some coverage of physical replication slots to recovery tests.

Backpatch to 9.6 desirable, since we seem to be doing that for TAP
infrastructure.

These three are related enough, and all only touch the TAP framework,
so I've bundled them in a series.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0003-Expand-streaming-replication-tests-to-cover-hot-stan.patchtext/x-patch; charset=US-ASCII; name=0003-Expand-streaming-replication-tests-to-cover-hot-stan.patchDownload
From f01790bb3f47f7fabf64328f8c9a01e8f3cf837c Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Wed, 9 Nov 2016 13:44:04 +0800
Subject: [PATCH 3/3] Expand streaming replication tests to cover hot standby
 feedback and physical replication slots

---
 src/test/recovery/t/001_stream_rep.pl | 105 +++++++++++++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 5ce69bb..ef29892 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 22;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -58,3 +58,106 @@ is($node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 1');
 is($node_standby_2->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 2');
+
+diag "switching to physical replication slot";
+# Switch to using a physical replication slot. We can do this without a new
+# backup since physical slots can go backwards if needed. Do so on both
+# standbys. Since we're going to be testing things that affect the slot state,
+# also increase the standby feedback interval to ensure timely updates.
+my ($slotname_1, $slotname_2) = ('standby_1', 'standby_2');
+$node_master->append_conf('postgresql.conf', "max_replication_slots = 4\n");
+$node_master->restart;
+is($node_master->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_1');]), 0, 'physical slot created on master');
+$node_standby_1->append_conf('recovery.conf', "primary_slot_name = $slotname_1\n");
+$node_standby_1->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n");
+$node_standby_1->append_conf('postgresql.conf', "max_replication_slots = 4\n");
+$node_standby_1->restart;
+is($node_standby_1->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_2');]), 0, 'physical slot created on intermediate replica');
+$node_standby_2->append_conf('recovery.conf', "primary_slot_name = $slotname_2\n");
+$node_standby_2->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n");
+$node_standby_2->restart;
+
+sub get_slot_xmins
+{
+	my ($node, $slotname) = @_;
+	my $slotinfo = $node->slot($slotname);
+	return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+# There's no hot standby feedback and there are no logical slots on either peer
+# so xmin and catalog_xmin should be null on both slots.
+my ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
+is($xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
+is($catalog_xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+is($xmin, '', 'cascaded slot xmin null with no hs_feedback');
+is($catalog_xmin, '', 'cascaded slot xmin null with no hs_feedback');
+
+# Replication still works?
+$node_master->safe_psql('postgres', 'CREATE TABLE replayed(val integer);');
+
+sub replay_check
+{
+	my $newval = $node_master->safe_psql('postgres', 'INSERT INTO replayed(val) SELECT coalesce(max(val),0) + 1 AS newval FROM replayed RETURNING val');
+	$node_master->wait_for_catchup($node_standby_1);
+	$node_standby_1->wait_for_catchup($node_standby_2);
+	$node_standby_1->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval])
+		or die "standby_1 didn't replay master value $newval";
+	$node_standby_2->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval])
+		or die "standby_2 didn't replay standby_1 value $newval";
+}
+
+replay_check();
+
+diag "enabling hot_standby_feedback";
+# Enable hs_feedback. The slot should gain an xmin. We set the status interval
+# so we'll see the results promptly.
+$node_standby_1->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = on;');
+$node_standby_1->reload;
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = on;');
+$node_standby_2->reload;
+replay_check();
+sleep(2);
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
+isnt($xmin, '', 'non-cascaded slot xmin non-null with hs feedback');
+is($catalog_xmin, '', 'non-cascaded slot xmin still null with hs_feedback');
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+isnt($xmin, '', 'cascaded slot xmin non-null with hs feedback');
+is($catalog_xmin, '', 'cascaded slot xmin still null with hs_feedback');
+
+diag "doing some work to advance xmin";
+for my $i (10000..11000) {
+	$node_master->safe_psql('postgres', qq[INSERT INTO tab_int VALUES ($i);]);
+}
+$node_master->safe_psql('postgres', 'VACUUM;');
+$node_master->safe_psql('postgres', 'CHECKPOINT;');
+
+my ($xmin2, $catalog_xmin2) = get_slot_xmins($node_master, $slotname_1);
+diag "new xmin $xmin2, old xmin $xmin";
+isnt($xmin2, $xmin, 'non-cascaded slot xmin with hs feedback has changed');
+is($catalog_xmin2, '', 'non-cascaded slot xmin still null with hs_feedback unchanged');
+
+($xmin2, $catalog_xmin2) = get_slot_xmins($node_standby_1, $slotname_2);
+diag "new xmin $xmin2, old xmin $xmin";
+isnt($xmin2, $xmin, 'cascaded slot xmin with hs feedback has changed');
+is($catalog_xmin2, '', 'cascaded slot xmin still null with hs_feedback unchanged');
+
+diag "disabling hot_standby_feedback";
+# Disable hs_feedback. Xmin should be cleared.
+$node_standby_1->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;');
+$node_standby_1->reload;
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;');
+$node_standby_2->reload;
+replay_check();
+sleep(2);
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
+is($xmin, '', 'non-cascaded slot xmin null with hs feedback reset');
+is($catalog_xmin, '', 'non-cascaded slot xmin still null with hs_feedback reset');
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+is($xmin, '', 'cascaded slot xmin null with hs feedback reset');
+is($catalog_xmin, '', 'cascaded slot xmin still null with hs_feedback reset');
-- 
2.5.5

0002-Create-new-pg_lsn-class-to-deal-with-awkward-LSNs-in.patchtext/x-patch; charset=US-ASCII; name=0002-Create-new-pg_lsn-class-to-deal-with-awkward-LSNs-in.patchDownload
From 4044ad6a8b85bdf4c3bfab30ae9d5965ad5dfadb Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Mon, 14 Nov 2016 12:19:35 +0800
Subject: [PATCH 2/3] Create new pg_lsn class to deal with awkward LSNs in
 tests

---
 src/test/perl/Makefile        |   3 +
 src/test/perl/pg_lsn.pm       | 144 ++++++++++++++++++++++++++++++++++++++++++
 src/test/perl/t/001_load.pl   |   9 +++
 src/test/perl/t/002_pg_lsn.pl |  68 ++++++++++++++++++++
 4 files changed, 224 insertions(+)
 create mode 100644 src/test/perl/pg_lsn.pm
 create mode 100644 src/test/perl/t/001_load.pl
 create mode 100644 src/test/perl/t/002_pg_lsn.pl

diff --git a/src/test/perl/Makefile b/src/test/perl/Makefile
index 8ab60fc..cdc38f4 100644
--- a/src/test/perl/Makefile
+++ b/src/test/perl/Makefile
@@ -15,6 +15,9 @@ include $(top_builddir)/src/Makefile.global
 
 ifeq ($(enable_tap_tests),yes)
 
+check:
+	$(prove_check)
+
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(pgxsdir)/$(subdir)'
 
diff --git a/src/test/perl/pg_lsn.pm b/src/test/perl/pg_lsn.pm
new file mode 100644
index 0000000..777b3df
--- /dev/null
+++ b/src/test/perl/pg_lsn.pm
@@ -0,0 +1,144 @@
+package pg_lsn;
+
+use strict;
+use warnings;
+
+our (@ISA, @EXPORT_OK);
+BEGIN {
+	require Exporter;
+	@ISA = qw(Exporter);
+	@EXPORT_OK = qw(parse_lsn);
+}
+
+use Scalar::Util qw(blessed looks_like_number);
+use Carp;
+
+use overload
+	'""' => \&Str,
+	'<=>' => \&NumCmp,
+	'bool' => \&Bool,
+	'-' => \&Negate,
+	fallback => 1;
+
+=pod package pg_lsn
+
+A class to encapsulate a PostgreSQL log-sequence number (LSN) and handle conversion
+of its hex representation.
+
+Provides equality and inequality operators.
+
+Calling 'new' on undef or empty string argument returns undef, not an instance.
+
+=cut
+
+sub new_num
+{
+	my ($class, $high, $low) = @_;
+	my $self = bless { '_low' => $low, '_high' => $high } => $class;
+	$self->_constraint;
+	return $self;
+}
+
+sub new
+{
+	my ($class, $lsn_str) = @_;
+	return undef if !defined($lsn_str) || $lsn_str eq '';
+	my ($high, $low) = split('/', $lsn_str, 2);
+	die "malformed LSN" if ($high eq '' || $low eq '');
+	return $class->new_num(hex($high), hex($low));
+}
+
+sub NumCmp
+{
+	my ($self, $other, $swap) = @_;
+	$self->_constraint;
+	die "comparison with undef" unless defined($other);
+	if (!blessed($other))
+	{
+		# coerce from string if needed. Try to coerce any non-object.
+		$other = pg_lsn->new($other) if !blessed($other);
+	}
+	$other->_constraint;
+	# and compare
+	my $ret;
+	if ($self->{'_high'} < $other->{'_high'})
+	{
+		$ret = -1;
+	}
+	elsif ($self->{'_high'} == $other->{'_high'})
+	{
+		if ($self->{'_low'} < $other->{'_low'})
+		{
+			$ret = -1;
+		}
+		elsif ($self->{'_low'} == $other->{'_low'})
+		{
+			$ret = 0;
+		}
+		else
+		{
+			$ret = 1;
+		}
+	}
+	else
+	{
+		$ret = 1;
+	}
+	$ret = -$ret if $swap;
+	return $ret;
+}
+
+sub _constraint
+{
+	my $self = shift;
+	die "high word must be defined" unless (defined($self->{'_high'}));
+	die "high word must be numeric" unless (looks_like_number($self->{'_high'}));
+	die "high word must be in uint32 range" unless ($self->{'_high'} >= 0 && $self->{'_high'} <= 0xFFFFFFFF);
+	die "low word must be defined" unless (defined($self->{'_low'}));
+	die "low word must be numeric" unless (looks_like_number($self->{'_low'}));
+	die "low word must be in uint32 range" unless ($self->{'_low'} >= 0 && $self->{'_low'} <= 0xFFFFFFFF);
+}
+
+sub Bool
+{
+	my $self = shift;
+	$self->_constraint;
+	return $self->{'_high'} || $self->{'_low'};
+}
+
+sub Negate
+{
+	die "cannot negate pg_lsn";
+}
+
+sub Str
+{
+	my $self = shift;
+	return sprintf("%X/%X", $self->high, $self->low);
+}
+
+sub high
+{
+	my $self = shift;
+	return $self->{'_high'};
+}
+
+sub low
+{
+	my $self = shift;
+	return $self->{'_low'};
+}
+
+# Todo: addition/subtraction. Needs to handle wraparound and carrying.
+
+=pod parse_lsn(lsn)
+
+Returns a 2-array of the high and low words of the passed LSN as numbers,
+or undef if argument is the empty string or undef.
+
+=cut 
+
+sub parse_lsn
+{
+	return pg_lsn->new($_[0]);
+}
diff --git a/src/test/perl/t/001_load.pl b/src/test/perl/t/001_load.pl
new file mode 100644
index 0000000..53a39af
--- /dev/null
+++ b/src/test/perl/t/001_load.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+use Test::More tests => 5;
+
+require_ok 'RecursiveCopy';
+require_ok 'SimpleTee';
+require_ok 'TestLib';
+require_ok 'PostgresNode';
+require_ok 'pg_lsn';
diff --git a/src/test/perl/t/002_pg_lsn.pl b/src/test/perl/t/002_pg_lsn.pl
new file mode 100644
index 0000000..73e3d65
--- /dev/null
+++ b/src/test/perl/t/002_pg_lsn.pl
@@ -0,0 +1,68 @@
+use strict;
+use warnings;
+use Test::More tests => 42;
+use Scalar::Util qw(blessed);
+
+use pg_lsn qw(parse_lsn);
+
+ok(!defined(parse_lsn('')), 'parse_lsn of empty string is undef');
+ok(!defined(parse_lsn(undef)), 'parse_lsn of undef is undef');
+
+my $zero_lsn = parse_lsn('0/0');
+ok(blessed($zero_lsn), 'zero lsn blessed');
+ok($zero_lsn->isa("pg_lsn"), 'zero lsn isa pg_lsn');
+is($zero_lsn->{'_high'}, 0, 'zero lsn high word zero');
+is($zero_lsn->{'_low'}, 0, 'zero lsn low word zero');
+cmp_ok($zero_lsn, "==", pg_lsn->new_num(0, 0), 'parse_lsn of 0/0');
+
+cmp_ok(parse_lsn('0/FFFFFFFF'), "==", pg_lsn->new_num(0, 0xFFFFFFFF), 'parse_lsn of 0/FFFFFFFF');
+cmp_ok(parse_lsn('FFFFFFFF/0'), "==", pg_lsn->new_num(0xFFFFFFFF, 0), 'parse_lsn of FFFFFFFF/0');
+cmp_ok(parse_lsn('FFFFFFFF/FFFFFFFF'), "==", pg_lsn->new_num(0xFFFFFFFF, 0xFFFFFFFF), 'parse_lsn of 0xFFFFFFFF/0xFFFFFFFF');
+
+is(parse_lsn('2/2') <=> parse_lsn('2/3'), -1);
+is(parse_lsn('2/2') <=> parse_lsn('2/2'), 0);
+is(parse_lsn('2/2') <=> parse_lsn('2/1'), 1);
+is(parse_lsn('2/2') <=> parse_lsn('3/2'), -1);
+is(parse_lsn('2/2') <=> parse_lsn('1/2'), 1);
+
+cmp_ok(parse_lsn('0/1'), "==", parse_lsn('0/1'));
+ok(!(parse_lsn('0/1') == parse_lsn('0/2')), "! 0/1 == 0/2");
+ok(!(parse_lsn('0/1') == parse_lsn('0/0')), "! 0/1 == 0/0");
+cmp_ok(parse_lsn('1/0'), "==", parse_lsn('1/0'));
+cmp_ok(parse_lsn('1/0'), "!=", parse_lsn('1/1'));
+cmp_ok(parse_lsn('1/0'), "!=", parse_lsn('2/0'));
+cmp_ok(parse_lsn('1/0'), "!=", parse_lsn('0/0'));
+cmp_ok(parse_lsn('1/0'), "!=", parse_lsn('0/1'));
+
+cmp_ok(parse_lsn('0/1'), ">=", parse_lsn('0/1'));
+cmp_ok(parse_lsn('0/1'), "<=", parse_lsn('0/1'));
+cmp_ok(parse_lsn('0/1'), "<=", parse_lsn('0/2'));
+cmp_ok(parse_lsn('0/1'), ">=", parse_lsn('0/0'));
+cmp_ok(parse_lsn('1/0'), ">=", parse_lsn('1/0'));
+cmp_ok(parse_lsn('1/0'), "<=", parse_lsn('1/0'));
+cmp_ok(parse_lsn('1/0'), "<=", parse_lsn('2/0'));
+cmp_ok(parse_lsn('1/0'), ">=", parse_lsn('0/0'));
+cmp_ok(parse_lsn('1/1'), ">=", parse_lsn('1/1'));
+cmp_ok(parse_lsn('1/1'), "<=", parse_lsn('1/1'));
+cmp_ok(parse_lsn('1/1'), "<=", parse_lsn('1/2'));
+cmp_ok(parse_lsn('1/2'), ">=", parse_lsn('1/1'));
+
+ok(parse_lsn('1/1'), 'bool conversion');
+ok(! $zero_lsn, 'bool negation');
+
+# implicit string conversions
+cmp_ok(parse_lsn('0/0'), "==", "0/0");
+cmp_ok(parse_lsn('FFFFFFFF/FFFFFFFF'), "==", "FFFFFFFF/FFFFFFFF");
+# swapped string conversions
+cmp_ok("0/0", "==", parse_lsn('0/0'));
+cmp_ok("FFFFFFFF/FFFFFFFF", "==", parse_lsn('FFFFFFFF/FFFFFFFF'));
+
+# negation makes no sense for a uint64
+eval {
+	- parse_lsn('0/1');
+};
+if ($@) {
+	ok('negation raises error');
+} else {
+	fail('negation did not raise error');
+}
-- 
2.5.5

0001-PostgresNode-methods-to-wait-for-node-catchup.patchtext/x-patch; charset=US-ASCII; name=0001-PostgresNode-methods-to-wait-for-node-catchup.patchDownload
From 0dc9e4c93e2090ea52812628acc7d27f2ae47fb4 Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Mon, 14 Nov 2016 12:27:17 +0800
Subject: [PATCH 1/3] PostgresNode methods to wait for node catchup

---
 src/test/perl/PostgresNode.pm         | 121 +++++++++++++++++++++++++++++++++-
 src/test/recovery/t/001_stream_rep.pl |  12 +---
 2 files changed, 121 insertions(+), 12 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index c1b16ca..4ce0bf2 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -93,6 +93,7 @@ use RecursiveCopy;
 use Socket;
 use Test::More;
 use TestLib ();
+use pg_lsn qw(parse_lsn);
 use Scalar::Util qw(blessed);
 
 our @EXPORT = qw(
@@ -1121,7 +1122,6 @@ sub psql
 		my $exc_save = $@;
 		if ($exc_save)
 		{
-
 			# IPC::Run::run threw an exception. re-throw unless it's a
 			# timeout, which we'll handle by testing is_expired
 			die $exc_save
@@ -1173,7 +1173,7 @@ sub psql
 		  if $ret == 1;
 		die "connection error: '$$stderr'\nwhile running '@psql_params'"
 		  if $ret == 2;
-		die "error running SQL: '$$stderr'\nwhile running '@psql_params'"
+		die "error running SQL: '$$stderr'\nwhile running '@psql_params' with sql '$sql'"
 		  if $ret == 3;
 		die "psql returns $ret: '$$stderr'\nwhile running '@psql_params'";
 	}
@@ -1325,6 +1325,123 @@ sub run_log
 	TestLib::run_log(@_);
 }
 
+=pod $node->lsn
+
+Return pg_current_xlog_insert_location() or, on a replica,
+pg_last_xlog_replay_location().
+
+=cut
+
+sub lsn
+{
+	my $self = shift;
+	return $self->safe_psql('postgres', 'select case when pg_is_in_recovery() then pg_last_xlog_replay_location() else pg_current_xlog_insert_location() end as lsn;');
+}
+
+=pod $node->wait_for_catchup(standby_name, mode, target_lsn)
+
+Wait for the node with application_name standby_name (usually from node->name)
+until its replication equals or passes the upstream's xlog insert point at the
+time this function is called. By default the replay_location is waited for,
+but 'mode' may be specified to wait for any of sent|write|flush|replay.
+
+If there is no active replication connection from this peer, waits until
+poll_query_until timeout.
+
+Requires that the 'postgres' db exists and is accessible.
+
+If pos is passed, use that xlog position instead of the server's current
+xlog insert position.
+
+This is not a test. It die()s on failure.
+
+Returns the LSN caught up to.
+
+=cut
+
+sub wait_for_catchup
+{
+	my ($self, $standby_name, $mode, $target_lsn) = @_;
+	$mode = defined($mode) ? $mode : 'replay';
+	my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, 'replay' => 1 );
+	die "valid modes are " . join(', ', keys(%valid_modes)) unless exists($valid_modes{$mode});
+	if ( blessed( $standby_name ) && $standby_name->isa("PostgresNode") ) {
+		$standby_name = $standby_name->name;
+	}
+	if (!defined($target_lsn)) {
+		$target_lsn = $self->lsn;
+	}
+	diag "waiting for $standby_name to catch up to $target_lsn";
+	$self->poll_query_until('postgres', qq[SELECT '$target_lsn' <= ${mode}_location FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';])
+		or die "timed out waiting for catchup";
+	return $target_lsn;
+}
+
+=pod $node->wait_for_slot_catchup(slot_name, mode, target_lsn)
+
+Wait for the named replication slot to equal or pass the xlog position of the
+server, or the supplied target_lsn if given. The position used is the
+restart_lsn unless mode is given, in which case it may be 'restart' or
+'confirmed_flush'.
+
+Requires that the 'postgres' db exists and is accessible.
+
+This is not a test. It die()s on failure.
+
+If the slot is not active, will time out after poll_query_until's timeout.
+
+Note that for logical slots, restart_lsn is held down by the oldest in progress tx.
+
+Returns the LSN caught up to.
+
+=cut
+
+sub wait_for_slot_catchup
+{
+	my ($self, $slot_name, $mode, $target_lsn) = @_;
+	$mode = defined($mode) ? $mode : 'restart';
+	if (!($mode eq 'restart' || $mode eq 'confirmed_flush')) {
+		die "valid modes are restart, confirmed_flush";
+	}
+	if (!defined($target_lsn)) {
+		$target_lsn = $self->lsn;
+	}
+	$self->poll_query_until('postgres', qq[SELECT '$target_lsn' <= ${mode}_lsn FROM pg_catalog.pg_replication_slots WHERE slot_name = '$slot_name';])
+		or die "timed out waiting for catchup";
+	return $target_lsn;
+}
+
+=pod $node->slot(slot_name)
+
+Return hash-ref of replication slot data for the named slot, or a hash-ref with
+all values '' if not found. Does not differentiate between null and empty string
+for fields, no field is ever undef.
+
+The restart_lsn and confirmed_flush_lsn fields are returned verbatim, and also
+as a 2-list of [highword, lowword] integer. Since we rely on Perl 5.8.8 we can't
+"use bigint", it's from 5.20, and we can't assume we have Math::Bigint from CPAN
+either.
+
+=cut
+
+sub slot
+{
+	my ($self, $slot_name) = @_;
+	my @fields = ('plugin', 'slot_type', 'datoid', 'database', 'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn');
+	my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ', @fields) . " FROM pg_catalog.pg_replication_slots WHERE slot_name = '$slot_name'");
+	$result = undef if $result eq '';
+	# hash slice, see http://stackoverflow.com/a/16755894/398670 .
+	#
+	# Fills the hash with empty strings produced by x-operator element
+	# duplication if result is an empty row
+	#
+	my %val;
+	@val{@fields} = $result ne '' ? split(qr/\|/, $result) : ('',) x scalar(@fields);
+	$val{'restart_lsn_arr'} = parse_lsn($val{'restart_lsn'});
+	$val{'confirmed_flush_lsn_arr'} = parse_lsn($val{'confirmed_flush_lsn'});
+	return \%val;
+}
+
 =pod
 
 =back
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 981c00b..5ce69bb 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -40,16 +40,8 @@ $node_master->safe_psql('postgres',
 	"CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
 
 # Wait for standbys to catch up
-my $applname_1 = $node_standby_1->name;
-my $applname_2 = $node_standby_2->name;
-my $caughtup_query =
-"SELECT pg_current_xlog_location() <= replay_location FROM pg_stat_replication WHERE application_name = '$applname_1';";
-$node_master->poll_query_until('postgres', $caughtup_query)
-  or die "Timed out while waiting for standby 1 to catch up";
-$caughtup_query =
-"SELECT pg_last_xlog_replay_location() <= replay_location FROM pg_stat_replication WHERE application_name = '$applname_2';";
-$node_standby_1->poll_query_until('postgres', $caughtup_query)
-  or die "Timed out while waiting for standby 2 to catch up";
+$node_master->wait_for_catchup($node_standby_1);
+$node_standby_1->wait_for_catchup($node_standby_2);
 
 my $result =
   $node_standby_1->safe_psql('postgres', "SELECT count(*) FROM tab_int");
-- 
2.5.5

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#1)
Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

On 14 November 2016 at 15:01, Craig Ringer <craig@2ndquadrant.com> wrote:

These three are related enough, and all only touch the TAP framework,
so I've bundled them in a series.

CF entry https://commitfest.postgresql.org/12/883/

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Aleksander Alekseev
a.alekseev@postgrespro.ru
In reply to: Craig Ringer (#1)
Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

Hello, Craig.

I noticed that these patches have a "Needs review" status and decided to
take a look. Surprisingly I didn't manage to find many defects.

* I like this idea in general;
* Code is test covered and documented well;
* All tests pass;
* No warnings during compilation and/or tests run;
* I didn't find any typos;
* Code style seems to be OK;

Though are you sure you want to name a class like_this instead of LikeThis?
It's quite uncommon in Perl code and usually used only for packages like
"strict" and "warnings". However I prefer no to insist on changes like
this since it's a matter of taste.

The bottom line --- this code looks good to me. If there is nothing you
would like to change in the last moment I suggest to change the status
to "Ready for committer".

--
Best regards,
Aleksander Alekseev

#4Simon Riggs
simon@2ndquadrant.com
In reply to: Craig Ringer (#1)
Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

On 14 November 2016 at 07:01, Craig Ringer <craig@2ndquadrant.com> wrote:

Every test I write with the TAP framework for recovery seems to need
to wait for one node to catch up to another or examine replication
slot state. So: attached is a patch to add helper methods for these
tasks.

Also add a new pg_lsn.pm helper class to deal with PostgreSQL's
awkward representation of LSNs in perl. Made extra fun by our use of
Perl 5.8.8 with no new modules, so we can't rely on having 64-bit
integers. Provides sensible LSN comparisons. I'll be using this in
coming patches that have to make assertions about differences between
LSNs, and I think it's sensible to have anyway. Incorporates tests for
the class.

Finally, add some coverage of physical replication slots to recovery tests.

Backpatch to 9.6 desirable, since we seem to be doing that for TAP
infrastructure.

These three are related enough, and all only touch the TAP framework,
so I've bundled them in a series.

Good to see more tests going in.

Bit confused... can't see a caller for wait_for_slot_catchup() and the
slot tests don't call it AFAICS.

Also can't see anywhere the LSN stuff is used either.

No specific problems with the code, just don't want to commit code
that isn't used anywhere, yet.

I want to commit the extra tests soon, as a stronger test for my
recovery.conf changes.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Craig Ringer
craig@2ndquadrant.com
In reply to: Simon Riggs (#4)
Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

On 2 January 2017 at 20:17, Simon Riggs <simon@2ndquadrant.com> wrote:

Bit confused... can't see a caller for wait_for_slot_catchup() and the
slot tests don't call it AFAICS.

The recovery tests for decoding on standby will use it. I can delay
adding it until then.

Also can't see anywhere the LSN stuff is used either.

Removed after discussion with Michael in the logical decoding on standby thread.

I'll be posting a new patch series there shortly, which removes
pg_lsn.pm and modifies the PostgresNode.pm changes per discussion. If
you're able to commit the updated PostgresNode.pm and new standby
tests that'd be very handy.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#5)
Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

On 3 January 2017 at 12:36, Craig Ringer <craig@2ndquadrant.com> wrote:

On 2 January 2017 at 20:17, Simon Riggs <simon@2ndquadrant.com> wrote:

Bit confused... can't see a caller for wait_for_slot_catchup() and the
slot tests don't call it AFAICS.

The recovery tests for decoding on standby will use it. I can delay
adding it until then.

Also can't see anywhere the LSN stuff is used either.

Removed after discussion with Michael in the logical decoding on standby thread.

I'll be posting a new patch series there shortly, which removes
pg_lsn.pm and modifies the PostgresNode.pm changes per discussion. If
you're able to commit the updated PostgresNode.pm and new standby
tests that'd be very handy.

To keep things together, I've followed up on the logical decoding on
standby thread that now incorporates all these patches.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#6)
2 attachment(s)
Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

On 4 January 2017 at 12:39, Craig Ringer <craig@2ndquadrant.com> wrote:

To keep things together, I've followed up on the logical decoding on
standby thread that now incorporates all these patches.

Attached are the two patches discussed upthread, in their
proposed-for-commit form, as requested by Simon.

These correspond to patches 0001 and 0004 of the logical decoding on
standby series at
/messages/by-id/CAMsr+YEzC=-+eV09A=ra150FjtkmTqT5Q70PiqBwytbOR3cshg@mail.gmail.com
.

This subset is tracked as https://commitfest.postgresql.org/12/883/ .

When committed I will update the decoding on standby series to omit
these pre-requisite patches.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-PostgresNode-methods-to-wait-for-node-catchup.patchtext/x-patch; charset=US-ASCII; name=0001-PostgresNode-methods-to-wait-for-node-catchup.patchDownload
From 79241c8052fbd1ecd079e98fd8564e4b2fcf797b Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Mon, 14 Nov 2016 12:27:17 +0800
Subject: [PATCH 1/2] PostgresNode methods to wait for node catchup

---
 src/test/perl/PostgresNode.pm              | 172 ++++++++++++++++++++++++++++-
 src/test/recovery/t/001_stream_rep.pl      |  12 +-
 src/test/recovery/t/004_timeline_switch.pl |  16 +--
 3 files changed, 175 insertions(+), 25 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index c1b16ca..2f009d4 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1121,7 +1121,6 @@ sub psql
 		my $exc_save = $@;
 		if ($exc_save)
 		{
-
 			# IPC::Run::run threw an exception. re-throw unless it's a
 			# timeout, which we'll handle by testing is_expired
 			die $exc_save
@@ -1173,7 +1172,7 @@ sub psql
 		  if $ret == 1;
 		die "connection error: '$$stderr'\nwhile running '@psql_params'"
 		  if $ret == 2;
-		die "error running SQL: '$$stderr'\nwhile running '@psql_params'"
+		die "error running SQL: '$$stderr'\nwhile running '@psql_params' with sql '$sql'"
 		  if $ret == 3;
 		die "psql returns $ret: '$$stderr'\nwhile running '@psql_params'";
 	}
@@ -1325,6 +1324,175 @@ sub run_log
 	TestLib::run_log(@_);
 }
 
+=pod $node->lsn(mode)
+
+Look up xlog positions on the server:
+
+* insert position (master only, error on replica)
+* write position (master only, error on replica)
+* flush position
+* receive position (always undef on master)
+* replay position
+
+mode must be specified.
+
+=cut
+
+sub lsn
+{
+	my ($self, $mode) = @_;
+	my %modes = ('insert' => 'pg_current_xlog_insert_location()',
+				 'flush' => 'pg_current_xlog_flush_location()',
+				 'write' => 'pg_current_xlog_location()',
+				 'receive' => 'pg_last_xlog_receive_location()',
+				 'replay' => 'pg_last_xlog_replay_location()');
+
+	$mode = '<undef>' if !defined($mode);
+	die "unknown mode for 'lsn': '$mode', valid modes are " . join(', ', keys %modes)
+		if !defined($modes{$mode});
+
+	my $result = $self->safe_psql('postgres', "SELECT $modes{$mode}");
+	chomp($result);
+	if ($result eq '')
+	{
+		return undef;
+	}
+	else
+	{
+		return $result;
+	}
+}
+
+=pod $node->wait_for_catchup(standby_name, mode, target_lsn)
+
+Wait for the node with application_name standby_name (usually from node->name)
+until its replication position in pg_stat_replication equals or passes the
+upstream's xlog insert point at the time this function is called. By default
+the replay_location is waited for, but 'mode' may be specified to wait for any
+of sent|write|flush|replay.
+
+If there is no active replication connection from this peer, waits until
+poll_query_until timeout.
+
+Requires that the 'postgres' db exists and is accessible.
+
+target_lsn may be any arbitrary lsn, but is typically $master_node->lsn('insert').
+
+This is not a test. It die()s on failure.
+
+=cut
+
+sub wait_for_catchup
+{
+	my ($self, $standby_name, $mode, $target_lsn) = @_;
+	$mode = defined($mode) ? $mode : 'replay';
+	my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, 'replay' => 1 );
+	die "unknown mode $mode for 'wait_for_catchup', valid modes are " . join(', ', keys(%valid_modes)) unless exists($valid_modes{$mode});
+	# Allow passing of a PostgresNode instance as shorthand
+	if ( blessed( $standby_name ) && $standby_name->isa("PostgresNode") )
+	{
+		$standby_name = $standby_name->name;
+	}
+	die 'target_lsn must be specified' unless defined($target_lsn);
+	print "Waiting for replication conn " . $standby_name . "'s " . $mode . "_location to pass " . $target_lsn . " on " . $self->name . "\n";
+	my $query = qq[SELECT '$target_lsn' <= ${mode}_location FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
+	$self->poll_query_until('postgres', $query)
+		or die "timed out waiting for catchup, current position is " . ($self->safe_psql('postgres', $query) || '(unknown)');
+	print "done\n";
+}
+
+=pod $node->wait_for_slot_catchup(slot_name, mode, target_lsn)
+
+Wait for the named replication slot to equal or pass the supplied target_lsn.
+The position used is the restart_lsn unless mode is given, in which case it may
+be 'restart' or 'confirmed_flush'.
+
+Requires that the 'postgres' db exists and is accessible.
+
+This is not a test. It die()s on failure.
+
+If the slot is not active, will time out after poll_query_until's timeout.
+
+target_lsn may be any arbitrary lsn, but is typically $master_node->lsn('insert').
+
+Note that for logical slots, restart_lsn is held down by the oldest in-progress tx.
+
+=cut
+
+sub wait_for_slot_catchup
+{
+	my ($self, $slot_name, $mode, $target_lsn) = @_;
+	$mode = defined($mode) ? $mode : 'restart';
+	if (!($mode eq 'restart' || $mode eq 'confirmed_flush'))
+	{
+		die "valid modes are restart, confirmed_flush";
+	}
+	die 'target lsn must be specified' unless defined($target_lsn);
+	print "Waiting for replication slot " . $slot_name . "'s " . $mode . "_lsn to pass " . $target_lsn . " on " . $self->name . "\n";
+	my $query = qq[SELECT '$target_lsn' <= ${mode}_lsn FROM pg_catalog.pg_replication_slots WHERE slot_name = '$slot_name';];
+	$self->poll_query_until('postgres', $query)
+		or die "timed out waiting for catchup, current position is " . ($self->safe_psql('postgres', $query) || '(unknown)');
+	print "done\n";
+}
+
+=pod $node->query_hash($dbname, $query, @columns)
+
+Execute $query on $dbname, replacing any appearance of the string __COLUMNS__
+within the query with a comma-separated list of @columns.
+
+If __COLUMNS__ does not appear in the query, its result columns must EXACTLY
+match the order and number (but not necessarily alias) of supplied @columns.
+
+The query must return zero or one rows.
+
+Return a hash-ref representation of the results of the query, with any empty
+or null results as defined keys with an empty-string value. There is no way
+to differentiate between null and empty-string result fields.
+
+If the query returns zero rows, return a hash with all columns empty. There
+is no way to differentiate between zero rows returned and a row with only
+null columns.
+
+=cut
+
+sub query_hash
+{
+	my ($self, $dbname, $query, @columns) = @_;
+	die 'calls in array context for multi-row results not supported yet' if (wantarray);
+	# Replace __COLUMNS__ if found
+	substr($query, index($query, '__COLUMNS__'), length('__COLUMNS__')) = join(', ', @columns)
+		if index($query, '__COLUMNS__') >= 0;
+	my $result = $self->safe_psql($dbname, $query);
+	# hash slice, see http://stackoverflow.com/a/16755894/398670 .
+	#
+	# Fills the hash with empty strings produced by x-operator element
+	# duplication if result is an empty row
+	#
+	my %val;
+	@val{@columns} = $result ne '' ? split(qr/\|/, $result) : ('',) x scalar(@columns);
+	return \%val;
+}
+
+=pod $node->slot(slot_name)
+
+Return hash-ref of replication slot data for the named slot, or a hash-ref with
+all values '' if not found. Does not differentiate between null and empty string
+for fields, no field is ever undef.
+
+The restart_lsn and confirmed_flush_lsn fields are returned verbatim, and also
+as a 2-list of [highword, lowword] integer. Since we rely on Perl 5.8.8 we can't
+"use bigint", it's from 5.20, and we can't assume we have Math::Bigint from CPAN
+either.
+
+=cut
+
+sub slot
+{
+	my ($self, $slot_name) = @_;
+	my @columns = ('plugin', 'slot_type', 'datoid', 'database', 'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn');
+	return $self->query_hash('postgres', "SELECT __COLUMNS__ FROM pg_catalog.pg_replication_slots WHERE slot_name = '$slot_name'", @columns);
+}
+
 =pod
 
 =back
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 981c00b..ba1da8c 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -40,16 +40,8 @@ $node_master->safe_psql('postgres',
 	"CREATE TABLE tab_int AS SELECT generate_series(1,1002) AS a");
 
 # Wait for standbys to catch up
-my $applname_1 = $node_standby_1->name;
-my $applname_2 = $node_standby_2->name;
-my $caughtup_query =
-"SELECT pg_current_xlog_location() <= replay_location FROM pg_stat_replication WHERE application_name = '$applname_1';";
-$node_master->poll_query_until('postgres', $caughtup_query)
-  or die "Timed out while waiting for standby 1 to catch up";
-$caughtup_query =
-"SELECT pg_last_xlog_replay_location() <= replay_location FROM pg_stat_replication WHERE application_name = '$applname_2';";
-$node_standby_1->poll_query_until('postgres', $caughtup_query)
-  or die "Timed out while waiting for standby 2 to catch up";
+$node_master->wait_for_catchup($node_standby_1, 'replay', $node_master->lsn('insert'));
+$node_standby_1->wait_for_catchup($node_standby_2, 'replay', $node_standby_1->lsn('replay'));
 
 my $result =
   $node_standby_1->safe_psql('postgres', "SELECT count(*) FROM tab_int");
diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl
index 5f3b2fe..7c6587a 100644
--- a/src/test/recovery/t/004_timeline_switch.pl
+++ b/src/test/recovery/t/004_timeline_switch.pl
@@ -32,14 +32,9 @@ $node_standby_2->start;
 # Create some content on master
 $node_master->safe_psql('postgres',
 	"CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
-my $until_lsn =
-  $node_master->safe_psql('postgres', "SELECT pg_current_xlog_location();");
 
 # Wait until standby has replayed enough data on standby 1
-my $caughtup_query =
-  "SELECT '$until_lsn'::pg_lsn <= pg_last_xlog_replay_location()";
-$node_standby_1->poll_query_until('postgres', $caughtup_query)
-  or die "Timed out while waiting for standby to catch up";
+$node_master->wait_for_catchup($node_standby_1, 'replay', $node_master->lsn('write'));
 
 # Stop and remove master, and promote standby 1, switching it to a new timeline
 $node_master->teardown_node;
@@ -50,7 +45,7 @@ rmtree($node_standby_2->data_dir . '/recovery.conf');
 my $connstr_1 = $node_standby_1->connstr;
 $node_standby_2->append_conf(
 	'recovery.conf', qq(
-primary_conninfo='$connstr_1'
+primary_conninfo='$connstr_1 application_name=@{[$node_standby_2->name]}'
 standby_mode=on
 recovery_target_timeline='latest'
 ));
@@ -60,12 +55,7 @@ $node_standby_2->restart;
 # to ensure that the timeline switch has been done.
 $node_standby_1->safe_psql('postgres',
 	"INSERT INTO tab_int VALUES (generate_series(1001,2000))");
-$until_lsn = $node_standby_1->safe_psql('postgres',
-	"SELECT pg_current_xlog_location();");
-$caughtup_query =
-  "SELECT '$until_lsn'::pg_lsn <= pg_last_xlog_replay_location()";
-$node_standby_2->poll_query_until('postgres', $caughtup_query)
-  or die "Timed out while waiting for standby to catch up";
+$node_standby_1->wait_for_catchup($node_standby_2, 'replay', $node_standby_1->lsn('write'));
 
 my $result =
   $node_standby_2->safe_psql('postgres', "SELECT count(*) FROM tab_int");
-- 
2.5.5

0002-Expand-streaming-replication-tests-to-cover-hot-stan.patchtext/x-patch; charset=US-ASCII; name=0002-Expand-streaming-replication-tests-to-cover-hot-stan.patchDownload
From 859591607f203bb986c8eaa6acf4b3841324f5a6 Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Wed, 9 Nov 2016 13:44:04 +0800
Subject: [PATCH 2/2] Expand streaming replication tests to cover hot standby
 feedback and physical replication slots

---
 src/test/recovery/t/001_stream_rep.pl | 105 +++++++++++++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index ba1da8c..eef512d 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 22;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -58,3 +58,106 @@ is($node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 1');
 is($node_standby_2->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 2');
+
+diag "switching to physical replication slot";
+# Switch to using a physical replication slot. We can do this without a new
+# backup since physical slots can go backwards if needed. Do so on both
+# standbys. Since we're going to be testing things that affect the slot state,
+# also increase the standby feedback interval to ensure timely updates.
+my ($slotname_1, $slotname_2) = ('standby_1', 'standby_2');
+$node_master->append_conf('postgresql.conf', "max_replication_slots = 4\n");
+$node_master->restart;
+is($node_master->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_1');]), 0, 'physical slot created on master');
+$node_standby_1->append_conf('recovery.conf', "primary_slot_name = $slotname_1\n");
+$node_standby_1->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n");
+$node_standby_1->append_conf('postgresql.conf', "max_replication_slots = 4\n");
+$node_standby_1->restart;
+is($node_standby_1->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_2');]), 0, 'physical slot created on intermediate replica');
+$node_standby_2->append_conf('recovery.conf', "primary_slot_name = $slotname_2\n");
+$node_standby_2->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n");
+$node_standby_2->restart;
+
+sub get_slot_xmins
+{
+	my ($node, $slotname) = @_;
+	my $slotinfo = $node->slot($slotname);
+	return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+# There's no hot standby feedback and there are no logical slots on either peer
+# so xmin and catalog_xmin should be null on both slots.
+my ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
+is($xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
+is($catalog_xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+is($xmin, '', 'cascaded slot xmin null with no hs_feedback');
+is($catalog_xmin, '', 'cascaded slot xmin null with no hs_feedback');
+
+# Replication still works?
+$node_master->safe_psql('postgres', 'CREATE TABLE replayed(val integer);');
+
+sub replay_check
+{
+	my $newval = $node_master->safe_psql('postgres', 'INSERT INTO replayed(val) SELECT coalesce(max(val),0) + 1 AS newval FROM replayed RETURNING val');
+	$node_master->wait_for_catchup($node_standby_1, 'replay', $node_master->lsn('insert'));
+	$node_standby_1->wait_for_catchup($node_standby_2, 'replay', $node_standby_1->lsn('replay'));
+	$node_standby_1->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval])
+		or die "standby_1 didn't replay master value $newval";
+	$node_standby_2->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval])
+		or die "standby_2 didn't replay standby_1 value $newval";
+}
+
+replay_check();
+
+diag "enabling hot_standby_feedback";
+# Enable hs_feedback. The slot should gain an xmin. We set the status interval
+# so we'll see the results promptly.
+$node_standby_1->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = on;');
+$node_standby_1->reload;
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = on;');
+$node_standby_2->reload;
+replay_check();
+sleep(2);
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
+isnt($xmin, '', 'non-cascaded slot xmin non-null with hs feedback');
+is($catalog_xmin, '', 'non-cascaded slot xmin still null with hs_feedback');
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+isnt($xmin, '', 'cascaded slot xmin non-null with hs feedback');
+is($catalog_xmin, '', 'cascaded slot xmin still null with hs_feedback');
+
+diag "doing some work to advance xmin";
+for my $i (10000..11000) {
+	$node_master->safe_psql('postgres', qq[INSERT INTO tab_int VALUES ($i);]);
+}
+$node_master->safe_psql('postgres', 'VACUUM;');
+$node_master->safe_psql('postgres', 'CHECKPOINT;');
+
+my ($xmin2, $catalog_xmin2) = get_slot_xmins($node_master, $slotname_1);
+diag "new xmin $xmin2, old xmin $xmin";
+isnt($xmin2, $xmin, 'non-cascaded slot xmin with hs feedback has changed');
+is($catalog_xmin2, '', 'non-cascaded slot xmin still null with hs_feedback unchanged');
+
+($xmin2, $catalog_xmin2) = get_slot_xmins($node_standby_1, $slotname_2);
+diag "new xmin $xmin2, old xmin $xmin";
+isnt($xmin2, $xmin, 'cascaded slot xmin with hs feedback has changed');
+is($catalog_xmin2, '', 'cascaded slot xmin still null with hs_feedback unchanged');
+
+diag "disabling hot_standby_feedback";
+# Disable hs_feedback. Xmin should be cleared.
+$node_standby_1->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;');
+$node_standby_1->reload;
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;');
+$node_standby_2->reload;
+replay_check();
+sleep(2);
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
+is($xmin, '', 'non-cascaded slot xmin null with hs feedback reset');
+is($catalog_xmin, '', 'non-cascaded slot xmin still null with hs_feedback reset');
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+is($xmin, '', 'cascaded slot xmin null with hs feedback reset');
+is($catalog_xmin, '', 'cascaded slot xmin still null with hs_feedback reset');
-- 
2.5.5

#8Simon Riggs
simon@2ndquadrant.com
In reply to: Craig Ringer (#7)
Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

On 4 January 2017 at 08:16, Craig Ringer <craig@2ndquadrant.com> wrote:

On 4 January 2017 at 12:39, Craig Ringer <craig@2ndquadrant.com> wrote:

To keep things together, I've followed up on the logical decoding on
standby thread that now incorporates all these patches.

Attached are the two patches discussed upthread, in their
proposed-for-commit form, as requested by Simon.

These correspond to patches 0001 and 0004 of the logical decoding on
standby series at
/messages/by-id/CAMsr+YEzC=-+eV09A=ra150FjtkmTqT5Q70PiqBwytbOR3cshg@mail.gmail.com
.

This subset is tracked as https://commitfest.postgresql.org/12/883/ .

When committed I will update the decoding on standby series to omit
these pre-requisite patches.

Committed, thanks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Michael Paquier
michael.paquier@gmail.com
In reply to: Simon Riggs (#8)
1 attachment(s)
Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

On Thu, Jan 5, 2017 at 1:58 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 4 January 2017 at 08:16, Craig Ringer <craig@2ndquadrant.com> wrote:

When committed I will update the decoding on standby series to omit
these pre-requisite patches.

Committed, thanks.

I was planning a round of reviews of those patches, but you were faster than me.

So now looking at what has been committed.

The perldoc documentation is broken for the new routines. The commits
have added patterns like that:
=pod $node->routine_name
But what needs to be done is to use =pod and =item, like that:
=pod
=item $node->routine_name

+Look up xlog positions on the server:
This has better be "*WAL* positions". There is another reference with
xlog (see recent threads about renaming those functions for example).

+* insert position (master only, error on replica)
+* write position (master only, error on replica)
+* flush position
+* receive position (always undef on master)
+* replay position
Replay position is always undefined on primary, let's document it in
the description of the routine. And flush position generates an error
for a standby.

The documentation of $node->lsn is generated like that, with a very
unfriendly list of modes:
$node->lsn(mode)
Look up xlog positions on the server:

* insert position (master only, error on replica) * write position
(master only, error on replica) * flush position * receive position
(always undef on master) * replay position
A trick that I have found here is to add a space before the '*'.

It may be a good idea to run perltidy on top of that to be honest...

Attached is a patch to fix all those small issues.
--
Michael

Attachments:

tap-doc-fix.patchtext/plain; charset=US-ASCII; name=tap-doc-fix.patchDownload
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 2a4ceb3d42..053c5ea787 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1324,15 +1324,17 @@ sub run_log
 	TestLib::run_log(@_);
 }
 
-=pod $node->lsn(mode)
+=pod
+
+=item $node->lsn(mode)
 
-Look up xlog positions on the server:
+Look up WAL positions on the server:
 
-* insert position (master only, error on replica)
-* write position (master only, error on replica)
-* flush position
-* receive position (always undef on master)
-* replay position
+ * insert position (master only, error on replica)
+ * write position (master only, error on replica)
+ * flush position (master only, error on replica)
+ * receive position (always undef on master)
+ * replay position (always undef on master)
 
 mode must be specified.
 
@@ -1363,11 +1365,13 @@ sub lsn
 	}
 }
 
-=pod $node->wait_for_catchup(standby_name, mode, target_lsn)
+=pod
+
+=item $node->wait_for_catchup(standby_name, mode, target_lsn)
 
 Wait for the node with application_name standby_name (usually from node->name)
 until its replication position in pg_stat_replication equals or passes the
-upstream's xlog insert point at the time this function is called. By default
+upstream's WAL insert point at the time this function is called. By default
 the replay_location is waited for, but 'mode' may be specified to wait for any
 of sent|write|flush|replay.
 
@@ -1401,7 +1405,9 @@ sub wait_for_catchup
 	print "done\n";
 }
 
-=pod $node->wait_for_slot_catchup(slot_name, mode, target_lsn)
+=pod
+
+=item $node->wait_for_slot_catchup(slot_name, mode, target_lsn)
 
 Wait for the named replication slot to equal or pass the supplied target_lsn.
 The position used is the restart_lsn unless mode is given, in which case it may
@@ -1435,7 +1441,9 @@ sub wait_for_slot_catchup
 	print "done\n";
 }
 
-=pod $node->query_hash($dbname, $query, @columns)
+=pod
+
+=item $node->query_hash($dbname, $query, @columns)
 
 Execute $query on $dbname, replacing any appearance of the string __COLUMNS__
 within the query with a comma-separated list of @columns.
@@ -1473,7 +1481,9 @@ sub query_hash
 	return \%val;
 }
 
-=pod $node->slot(slot_name)
+=pod
+
+=item $node->slot(slot_name)
 
 Return hash-ref of replication slot data for the named slot, or a hash-ref with
 all values '' if not found. Does not differentiate between null and empty string
#10Simon Riggs
simon@2ndquadrant.com
In reply to: Michael Paquier (#9)
Re: [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

On 5 January 2017 at 04:50, Michael Paquier <michael.paquier@gmail.com> wrote:

The perldoc documentation is broken for the new routines.

...

Attached is a patch to fix all those small issues.

Thanks, looks good, will apply.

It may be a good idea to run perltidy on top of that to be honest...

Should we add that to the makefile? I guess start a new thread if you think so.

Anything that helps me check its correct is a good thing AFAICS.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers