pgsql: Unbreak recovery test on Windows

Started by Andrew Dunstanover 5 years ago6 messagescomitters
Jump to latest
#1Andrew Dunstan
andrew@dunslane.net

Unbreak recovery test on Windows

On Windows we need to send explicit quit messages to psql or the TAP tests
can hang.

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/677271a3a125e294b33b891669f594a2c8cb36ce

Modified Files
--------------
src/test/recovery/t/022_crash_temp_files.pl | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: pgsql: Unbreak recovery test on Windows

Andrew Dunstan <andrew@dunslane.net> writes:

Unbreak recovery test on Windows

Hmm, looks like this broke things on other machines.
crake and sifaka (so far) are showing

# Running: pg_ctl kill KILL 2237831
ok 3 - killed process with KILL
ack Broken pipe: write( 13, '\\q
' ) at /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

which looks like a race condition: did we manage to stuff the "\q"
command down the pipe before killing the psql process, or not?

regards, tom lane

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: pgsql: Unbreak recovery test on Windows

On 3/21/21 12:38 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Unbreak recovery test on Windows

Hmm, looks like this broke things on other machines.
crake and sifaka (so far) are showing

# Running: pg_ctl kill KILL 2237831
ok 3 - killed process with KILL
ack Broken pipe: write( 13, '\\q
' ) at /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

which looks like a race condition: did we manage to stuff the "\q"
command down the pipe before killing the psql process, or not?

Ugh.  Is there any reason we need to do those kills before we end the
psql processes? If not I'm tempted just to move them - after the psql's
are finished it should be safe. At any rate I'll go and test that.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#3)
Re: pgsql: Unbreak recovery test on Windows

On 3/21/21 1:06 PM, Andrew Dunstan wrote:

On 3/21/21 12:38 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Unbreak recovery test on Windows

Hmm, looks like this broke things on other machines.
crake and sifaka (so far) are showing

# Running: pg_ctl kill KILL 2237831
ok 3 - killed process with KILL
ack Broken pipe: write( 13, '\\q
' ) at /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

which looks like a race condition: did we manage to stuff the "\q"
command down the pipe before killing the psql process, or not?

Ugh.  Is there any reason we need to do those kills before we end the
psql processes? If not I'm tempted just to move them - after the psql's
are finished it should be safe. At any rate I'll go and test that.

Well, no good deed goes unpunished. That makes things hang on my Linux
box. Maybe we should just send the \q on Windows.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: pgsql: Unbreak recovery test on Windows

Andrew Dunstan <andrew@dunslane.net> writes:

Ugh.  Is there any reason we need to do those kills before we end the
psql processes? If not I'm tempted just to move them - after the psql's
are finished it should be safe. At any rate I'll go and test that.

IIUC, that'd completely destroy the point of the test, which is to make
sure we clean up when a backend dies unexpectedly. If we allow the
psql's to exit then the backend might be able to complete normal cleanup
before we can SIGKILL it.

Probably the best thing for now is to revert your change and instead
just skip the whole test script on Windows.

regards, tom lane

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#5)
Re: pgsql: Unbreak recovery test on Windows

On 3/21/21 2:16 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Ugh.  Is there any reason we need to do those kills before we end the
psql processes? If not I'm tempted just to move them - after the psql's
are finished it should be safe. At any rate I'll go and test that.

IIUC, that'd completely destroy the point of the test, which is to make
sure we clean up when a backend dies unexpectedly. If we allow the
psql's to exit then the backend might be able to complete normal cleanup
before we can SIGKILL it.

Probably the best thing for now is to revert your change and instead
just skip the whole test script on Windows.

OK, will do that.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com