pgsql: Add regression test for recovery pause.
Add regression test for recovery pause.
Previously there was no regression test for recovery pause feature.
This commit adds the test that checks
- recovery can be paused or resumed expectedly
- pg_get_wal_replay_pause_state() reports the correct pause state
- the paused state ends and promotion continues if a promotion
is triggered while recovery is paused
Suggested-by: Michael Paquier
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi, Dilip Kumar
Discussion: /messages/by-id/YKNirzqM1HYyk5h4@paquier.xyz
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/6bbc5c5e96b08f6b8c7d28d10ed8dfe6c49dca30
Modified Files
--------------
src/test/recovery/t/005_replay_delay.pl | 59 +++++++++++++++++++++++++++++++--
1 file changed, 57 insertions(+), 2 deletions(-)
Fujii Masao <fujii@postgresql.org> writes:
Add regression test for recovery pause.
Buildfarm member jacana doesn't like this patch:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2021-06-02%2012%3A00%3A44
the symptom being
Jun 02 09:05:17 t/005_replay_delay..................# poll_query_until timed out executing this query:
Jun 02 09:05:17 # SELECT '0/3002A20'::pg_lsn < pg_last_wal_receive_lsn()
Jun 02 09:05:17 # expecting this output:
Jun 02 09:05:17 # t
Jun 02 09:05:17 # last actual query output:
Jun 02 09:05:17 #
Jun 02 09:05:17 # with stderr:
Jun 02 09:05:17 # ERROR: syntax error at or near "pg_lsn"
Jun 02 09:05:17 # LINE 1: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn()
Jun 02 09:05:17 # ^
Checking the postmaster log confirms that what the backend is getting is
2021-06-02 08:58:01.073 EDT [60b78059.f84:4] 005_replay_delay.pl ERROR: syntax error at or near "pg_lsn" at character 20
2021-06-02 08:58:01.073 EDT [60b78059.f84:5] 005_replay_delay.pl STATEMENT: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn()
It sort of looks like something has decided that the pg_lsn constant
is a search path and made a lame attempt to convert it to Windows
style. I doubt our own code is doing that, so I'm inclined to blame
IPC::Run thinking it can mangle the command string it's given.
I wonder whether jacana has got a freshly-installed version of IPC::Run.
Another interesting question is how come we managed to get this far
in the tests. There is a nearly, but not quite, identical delay
query in 002_archiving.pl, which already ran successfully:
# Wait until necessary replay has been done on standby
my $caughtup_query =
"SELECT '$current_lsn'::pg_lsn <= pg_last_wal_replay_lsn()";
$node_standby->poll_query_until('postgres', $caughtup_query)
or die "Timed out while waiting for standby to catch up";
I wonder whether the fact that 002 uses '<=' not '<' could be
at all related. (I also wonder which one is correct as a means
of waiting for replay; they are not both correct.)
In any case, letting IPC::Run munge SQL commands seems completely
unacceptable. We can't plan on working around that every time.
regards, tom lane
On 6/2/21 5:26 PM, Tom Lane wrote:
Fujii Masao <fujii@postgresql.org> writes:
Add regression test for recovery pause.
Buildfarm member jacana doesn't like this patch:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2021-06-02%2012%3A00%3A44
the symptom being
Jun 02 09:05:17 t/005_replay_delay..................# poll_query_until timed out executing this query:
Jun 02 09:05:17 # SELECT '0/3002A20'::pg_lsn < pg_last_wal_receive_lsn()
Jun 02 09:05:17 # expecting this output:
Jun 02 09:05:17 # t
Jun 02 09:05:17 # last actual query output:
Jun 02 09:05:17 #
Jun 02 09:05:17 # with stderr:
Jun 02 09:05:17 # ERROR: syntax error at or near "pg_lsn"
Jun 02 09:05:17 # LINE 1: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn()
Jun 02 09:05:17 # ^Checking the postmaster log confirms that what the backend is getting is
2021-06-02 08:58:01.073 EDT [60b78059.f84:4] 005_replay_delay.pl ERROR: syntax error at or near "pg_lsn" at character 20
2021-06-02 08:58:01.073 EDT [60b78059.f84:5] 005_replay_delay.pl STATEMENT: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn()It sort of looks like something has decided that the pg_lsn constant
is a search path and made a lame attempt to convert it to Windows
style. I doubt our own code is doing that, so I'm inclined to blame
IPC::Run thinking it can mangle the command string it's given.
I wonder whether jacana has got a freshly-installed version of IPC::Run.Another interesting question is how come we managed to get this far
in the tests. There is a nearly, but not quite, identical delay
query in 002_archiving.pl, which already ran successfully:# Wait until necessary replay has been done on standby
my $caughtup_query =
"SELECT '$current_lsn'::pg_lsn <= pg_last_wal_replay_lsn()";
$node_standby->poll_query_until('postgres', $caughtup_query)
or die "Timed out while waiting for standby to catch up";I wonder whether the fact that 002 uses '<=' not '<' could be
at all related. (I also wonder which one is correct as a means
of waiting for replay; they are not both correct.)In any case, letting IPC::Run munge SQL commands seems completely
unacceptable. We can't plan on working around that every time.
Looks to me like we're getting munged by the msys shell, and unlike on
msys2 there isn't a way to disable it:
https://stackoverflow.com/questions/7250130/how-to-stop-mingw-and-msys-from-mangling-path-names-given-at-the-command-line
c.f. commit 73ff3a0abbb
Maybe a robust solution would be to have the query piped to psql on its
stdin rather than on the command line. poll_query_until looks on a quick
check like the only place in PostgresNode where we use "psql -c"
I'll experiment a bit tomorrow.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 6/2/21 6:25 PM, Andrew Dunstan wrote:
Looks to me like we're getting munged by the msys shell, and unlike on
msys2 there isn't a way to disable it:
https://stackoverflow.com/questions/7250130/how-to-stop-mingw-and-msys-from-mangling-path-names-given-at-the-command-linec.f. commit 73ff3a0abbb
Maybe a robust solution would be to have the query piped to psql on its
stdin rather than on the command line. poll_query_until looks on a quick
check like the only place in PostgresNode where we use "psql -c"I'll experiment a bit tomorrow.
My suspicion was correct. Fix pushed.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
My suspicion was correct. Fix pushed.
Great, thanks.
Do we need to worry about back-patching that? It seems only
accidental if no existing back-branch test cases hit this.
regards, tom lane
On 6/3/21 4:45 PM, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
My suspicion was correct. Fix pushed.
Great, thanks.
Do we need to worry about back-patching that? It seems only
accidental if no existing back-branch test cases hit this.
Well, we haven't had breakage, but its also useful to keep things in
sync as much as possible. Ill do it shortly.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com