pgsql: Improve stability of TAP test for synchronous replication

Started by Michael Paquier almost 7 years ago · 3 messages · committers
#1Michael Paquier
michael@paquier.xyz

Improve stability of TAP test for synchronous replication

Slow buildfarm machines have run into issues with this TAP test caused
by a race condition during the startup of a set of standbys, where the
standbys can end up in an unexpected order in the WAL sender array of
the primary.

This closes the race condition by checking pg_stat_replication to make
sure that each standby started is registered in the WAL sender array of
the primary before the next one is started.

Backpatch down to 9.6, where the test was introduced.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Noah Misch
Discussion: /messages/by-id/20190617055145.GB18917@paquier.xyz
Backpatch-through: 9.6

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/7d81bdc8c0ce838efa248928065e9b2da829f981

Modified Files
--------------
src/test/recovery/t/007_sync_rep.pl | 42 +++++++++++++++++++++++++++++--------
1 file changed, 33 insertions(+), 9 deletions(-)
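As a rough illustration of the fix described in the commit message above, the following self-contained Python sketch models the race and the wait-for-registration step. It is not the Perl code in 007_sync_rep.pl: `FakePrimary`, `start_standby`, `poll_until` and the per-standby connection delays are hypothetical stand-ins for a running primary and a query against its pg_stat_replication view. Only the name `start_standby_and_wait` is borrowed from the thread.

```python
import threading
import time


class FakePrimary:
    """Hypothetical stand-in for a primary server and its WAL sender array."""

    def __init__(self):
        self._lock = threading.Lock()
        # application_name of each registered standby, in slot order
        self.walsenders = []

    def register(self, name):
        """Called when a standby's connection reaches the primary."""
        with self._lock:
            self.walsenders.append(name)

    def count_registered(self, name):
        """Rough analogue of counting rows in pg_stat_replication
        WHERE application_name = name (an assumption, not a real query)."""
        with self._lock:
            return self.walsenders.count(name)


def start_standby(primary, name, connect_delay):
    """Start a standby asynchronously; it registers after connect_delay seconds."""

    def connect():
        time.sleep(connect_delay)
        primary.register(name)

    thread = threading.Thread(target=connect)
    thread.start()
    return thread


def poll_until(predicate, timeout=5.0, interval=0.01):
    """Re-check predicate() until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False


def start_standby_and_wait(primary, name, connect_delay):
    """Start a standby, then block until the primary reports it as registered."""
    start_standby(primary, name, connect_delay)
    if not poll_until(lambda: primary.count_registered(name) == 1):
        raise TimeoutError(f"standby {name!r} never registered on the primary")


if __name__ == "__main__":
    # Later standbys connect faster, so start order and registration order
    # can diverge, as on a slow buildfarm machine.
    delays = {"standby1": 0.3, "standby2": 0.2, "standby3": 0.1}

    racy = FakePrimary()
    for thread in [start_standby(racy, n, d) for n, d in delays.items()]:
        thread.join()
    print("started back to back:", racy.walsenders)

    ordered = FakePrimary()
    for n, d in delays.items():
        start_standby_and_wait(ordered, n, d)
    print("waited for each one: ", ordered.walsenders)
```

Starting every standby back to back leaves the final order up to timing. Waiting for each standby to appear on the primary before starting the next makes the registration order match the start order, which is the property the test depends on.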

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#1)
Re: pgsql: Improve stability of TAP test for synchronous replication

On 7/23/19 9:55 PM, Michael Paquier wrote:

This broke our Perl coding rules:

./src/test/recovery/t/007_sync_rep.pl: Subroutine "start_standby_and_wait" does not end with "return" at line 33, column 1. See page 197 of PBP. (Severity: 5)

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com

#3Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#2)
Re: pgsql: Improve stability of TAP test for synchronous replication

On Wed, Jul 24, 2019 at 02:17:14PM -0400, Andrew Dunstan wrote:

This broke our perl coding rules:

./src/test/recovery/t/007_sync_rep.pl: Subroutine "start_standby_and_wait" does not end with "return" at line 33, column 1. See page 197 of PBP. (Severity: 5)

Fixed, thanks. Indeed I can see that pgperlcritic complains here, and
I have added a pgperlcritic call to my pre-commit scripts for the future.
--
Michael