pgsql: Add regression tests for multiple synchronous standbys.

Started by Fujii Masaoalmost 10 years ago11 messages
#1Fujii Masao
fujii@postgresql.org

Add regression tests for multiple synchronous standbys.

Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
Reviewed-By: Kyotaro Horiguchi

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/196b72fb9a5556c66f2b012cc4e869175a3049fa

Modified Files
--------------
src/test/perl/PostgresNode.pm | 18 +++++
src/test/recovery/t/007_sync_rep.pl | 149 ++++++++++++++++++++++++++++++++++++
2 files changed, 167 insertions(+)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#2Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#1)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Fri, Apr 8, 2016 at 4:49 PM, Fujii Masao <fujii@postgresql.org> wrote:

Add regression tests for multiple synchronous standbys.

Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
Reviewed-By: Kyotaro Horiguchi

Well, we are not quite there yet:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&amp;dt=2016-04-12%2016%3A00%3A06

# Running: pg_ctl -D
/home/buildfarm/data/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/data_master_Qmuz/pgdata
reload
server signaled
not ok 2 - asterisk in synchronous_standby_names

# Failed test 'asterisk in synchronous_standby_names'
# at t/007_sync_rep.pl line 26.
# got: 'standby1|1|sync
# standby2|1|potential
# standby3|0|async'
# expected: 'standby1|1|sync
# standby2|1|potential
# standby3|1|potential'

I am adding an open item for now, I guess that's on my plate, as
co-authorm and one who has the environment to trigger that (didn't see
it 10 times in a row in my tests though).
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#2)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Wed, Apr 13, 2016 at 4:54 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Apr 8, 2016 at 4:49 PM, Fujii Masao <fujii@postgresql.org> wrote:

Add regression tests for multiple synchronous standbys.

Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
Reviewed-By: Kyotaro Horiguchi

Well, we are not quite there yet:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&amp;dt=2016-04-12%2016%3A00%3A06

# Running: pg_ctl -D
/home/buildfarm/data/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/data_master_Qmuz/pgdata
reload
server signaled
not ok 2 - asterisk in synchronous_standby_names

# Failed test 'asterisk in synchronous_standby_names'
# at t/007_sync_rep.pl line 26.
# got: 'standby1|1|sync
# standby2|1|potential
# standby3|0|async'
# expected: 'standby1|1|sync
# standby2|1|potential
# standby3|1|potential'

This seems to be a timing issue.

There can be small window after SIGHUP is sent before walsender updates
its priority based on new s_s_names. If pg_stat_replication is checked
before that update, it displays unexpected output. Probably we need to
sleep a few second after pg_ctl reload before pg_stat_replication.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#3)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Thu, Apr 14, 2016 at 11:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Apr 13, 2016 at 4:54 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Apr 8, 2016 at 4:49 PM, Fujii Masao <fujii@postgresql.org> wrote:

Add regression tests for multiple synchronous standbys.

Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
Reviewed-By: Kyotaro Horiguchi

Well, we are not quite there yet:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&amp;dt=2016-04-12%2016%3A00%3A06

# Running: pg_ctl -D
/home/buildfarm/data/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/data_master_Qmuz/pgdata
reload
server signaled
not ok 2 - asterisk in synchronous_standby_names

# Failed test 'asterisk in synchronous_standby_names'
# at t/007_sync_rep.pl line 26.
# got: 'standby1|1|sync
# standby2|1|potential
# standby3|0|async'
# expected: 'standby1|1|sync
# standby2|1|potential
# standby3|1|potential'

This seems to be a timing issue.

There can be small window after SIGHUP is sent before walsender updates
its priority based on new s_s_names. If pg_stat_replication is checked
before that update, it displays unexpected output. Probably we need to
sleep a few second after pg_ctl reload before pg_stat_replication.

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
test_passed = 0;
while ($timeout < 30)
{
$result = $node->psql('SELECT blah FROM pg_stat_replication');
if ($result eq $expected)
{
test_passed = 1;
break;
}
sleep 1;
$timeout++;
}
ok($test_passed, $test_name);

What do you think?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#4)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Thu, Apr 14, 2016 at 1:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Apr 14, 2016 at 11:29 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Apr 13, 2016 at 4:54 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Apr 8, 2016 at 4:49 PM, Fujii Masao <fujii@postgresql.org> wrote:

Add regression tests for multiple synchronous standbys.

Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
Reviewed-By: Kyotaro Horiguchi

Well, we are not quite there yet:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&amp;dt=2016-04-12%2016%3A00%3A06

# Running: pg_ctl -D
/home/buildfarm/data/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/data_master_Qmuz/pgdata
reload
server signaled
not ok 2 - asterisk in synchronous_standby_names

# Failed test 'asterisk in synchronous_standby_names'
# at t/007_sync_rep.pl line 26.
# got: 'standby1|1|sync
# standby2|1|potential
# standby3|0|async'
# expected: 'standby1|1|sync
# standby2|1|potential
# standby3|1|potential'

This seems to be a timing issue.

There can be small window after SIGHUP is sent before walsender updates
its priority based on new s_s_names. If pg_stat_replication is checked
before that update, it displays unexpected output. Probably we need to
sleep a few second after pg_ctl reload before pg_stat_replication.

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
test_passed = 0;
while ($timeout < 30)
{
$result = $node->psql('SELECT blah FROM pg_stat_replication');
if ($result eq $expected)
{
test_passed = 1;
break;
}
sleep 1;
$timeout++;
}
ok($test_passed, $test_name);

What do you think?

Look good to me. +1.

Regards,

--
Masahiko Sawada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Michael Paquier
michael.paquier@gmail.com
In reply to: Masahiko Sawada (#5)
1 attachment(s)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Thu, Apr 14, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 14, 2016 at 1:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
What do you think?

Look good to me. +1.

And so here we go...
--
Michael

Attachments:

nsync-test-fix.patchtext/x-patch; charset=US-ASCII; name=nsync-test-fix.patchDownload
diff --git a/src/test/recovery/t/007_sync_rep.pl b/src/test/recovery/t/007_sync_rep.pl
index c257b6e..908fe49 100644
--- a/src/test/recovery/t/007_sync_rep.pl
+++ b/src/test/recovery/t/007_sync_rep.pl
@@ -22,8 +22,28 @@ sub test_sync_state
 		$self->reload;
 	}
 
-	my $result = $self->safe_psql('postgres', $check_sql);
-	is($result, $expected, $msg);
+	my $timeout_max = 30;
+	my $timeout = 0;
+	my $test_passed = 0;
+
+	# A reload may take some time to take effect on busy machines,
+	# hence use a loop with a timeout to give some room for the test
+	# to pass.
+	while ($timeout < $timeout_max)
+	{
+		my $result = $self->safe_psql('postgres', $check_sql);
+
+		if ($result eq $expected)
+		{
+			$test_passed = 1;
+			last;
+		}
+
+		$timeout++;
+		sleep 1;
+	}
+
+	ok($test_passed, $msg);
 }
 
 # Initialize master node
#7Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#6)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Thu, Apr 14, 2016 at 8:41 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Apr 14, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 14, 2016 at 1:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
What do you think?

Look good to me. +1.

+1 from me.

And so here we go...

+ ok($test_passed, $msg);

ISTM that this change prevents the test from outputting the difference
of expected and actual results when the test fails. Which would make
the diagnosis of the test failure more difficult, I'm afraid.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#7)
1 attachment(s)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Fri, Apr 15, 2016 at 12:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Apr 14, 2016 at 8:41 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Apr 14, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 14, 2016 at 1:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
What do you think?

Look good to me. +1.

+1 from me.

And so here we go...

+ ok($test_passed, $msg);

ISTM that this change prevents the test from outputting the difference
of expected and actual results when the test fails. Which would make
the diagnosis of the test failure more difficult, I'm afraid.

Well, then, it is just a matter of saving the result in a variable
defined out of the loop, and use is() for the test. This way, after
the timeout it is possible to check if the expected result and the
fetched result match properly or not. In other words see attached.
--
Michael

Attachments:

nsync-test-fix-v2.patchtext/x-patch; charset=US-ASCII; name=nsync-test-fix-v2.patchDownload
diff --git a/src/test/recovery/t/007_sync_rep.pl b/src/test/recovery/t/007_sync_rep.pl
index c257b6e..d551954 100644
--- a/src/test/recovery/t/007_sync_rep.pl
+++ b/src/test/recovery/t/007_sync_rep.pl
@@ -22,7 +22,23 @@ sub test_sync_state
 		$self->reload;
 	}
 
-	my $result = $self->safe_psql('postgres', $check_sql);
+	my $timeout_max = 30;
+	my $timeout = 0;
+	my $result;
+
+	# A reload may take some time to take effect on busy machines,
+	# hence use a loop with a timeout to give some room for the test
+	# to pass.
+	while ($timeout < $timeout_max)
+	{
+		$result = $self->safe_psql('postgres', $check_sql);
+
+		last if ($result eq $expected);
+
+		$timeout++;
+		sleep 1;
+	}
+
 	is($result, $expected, $msg);
 }
 
#9Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#8)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Fri, Apr 15, 2016 at 12:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Apr 15, 2016 at 12:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Apr 14, 2016 at 8:41 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Apr 14, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 14, 2016 at 1:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
What do you think?

Look good to me. +1.

+1 from me.

And so here we go...

+ ok($test_passed, $msg);

ISTM that this change prevents the test from outputting the difference
of expected and actual results when the test fails. Which would make
the diagnosis of the test failure more difficult, I'm afraid.

Well, then, it is just a matter of saving the result in a variable
defined out of the loop, and use is() for the test. This way, after
the timeout it is possible to check if the expected result and the
fetched result match properly or not. In other words see attached.

The patch looks good to me.

+ my $timeout_max = 30;

One comment is; isn't 30 (seconds) too large value for the timeout?
What about just, say, 5?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#9)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Fri, Apr 15, 2016 at 12:57 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Apr 15, 2016 at 12:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Apr 15, 2016 at 12:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Apr 14, 2016 at 8:41 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Apr 14, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 14, 2016 at 1:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
What do you think?

Look good to me. +1.

+1 from me.

And so here we go...

+ ok($test_passed, $msg);

ISTM that this change prevents the test from outputting the difference
of expected and actual results when the test fails. Which would make
the diagnosis of the test failure more difficult, I'm afraid.

Well, then, it is just a matter of saving the result in a variable
defined out of the loop, and use is() for the test. This way, after
the timeout it is possible to check if the expected result and the
fetched result match properly or not. In other words see attached.

The patch looks good to me.

+ my $timeout_max = 30;

One comment is; isn't 30 (seconds) too large value for the timeout?
What about just, say, 5?

hamster can become easily quite busy to be honest.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#10)
Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

On Fri, Apr 15, 2016 at 1:12 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Apr 15, 2016 at 12:57 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Apr 15, 2016 at 12:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Apr 15, 2016 at 12:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Apr 14, 2016 at 8:41 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Apr 14, 2016 at 5:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 14, 2016 at 1:22 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
What do you think?

Look good to me. +1.

+1 from me.

And so here we go...

+ ok($test_passed, $msg);

ISTM that this change prevents the test from outputting the difference
of expected and actual results when the test fails. Which would make
the diagnosis of the test failure more difficult, I'm afraid.

Well, then, it is just a matter of saving the result in a variable
defined out of the loop, and use is() for the test. This way, after
the timeout it is possible to check if the expected result and the
fetched result match properly or not. In other words see attached.

The patch looks good to me.

+ my $timeout_max = 30;

One comment is; isn't 30 (seconds) too large value for the timeout?
What about just, say, 5?

hamster can become easily quite busy to be honest.

Okay, since the test would fail very rarely (I hope),
probably we can live with 30 seconds timeout (it happens
only when the test fails).

I pushed your latest patch. Thanks!

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers