pg_regress: promptly detect failed postmaster startup

Started by Noah Misch over 7 years ago (2 messages, hackers)
#1Noah Misch
noah@leadboat.com

When "make check TEMP_CONFIG=<(echo break_me=on)" spawns a postmaster that
fails startup, we detect that with "pg_regress: postmaster did not respond
within 60 seconds". pg_regress has a kill(postmaster_pid, 0) intended to
detect this case faster. Since kill(ZOMBIE-PID, 0) succeeds[1], that test is
ineffective. The fix, attached, is to instead test waitpid(), like pg_ctl's
wait_for_postmaster() does.

[1]: Search for "zombie" in http://pubs.opengroup.org/onlinepubs/9699919799/functions/kill.html

Attachments:

pg_regress-dead-postmaster-v1.patch (text/x-diff; charset=us-ascii; +1 -1)
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#1)
Re: pg_regress: promptly detect failed postmaster startup

Noah Misch <noah@leadboat.com> writes:

> When "make check TEMP_CONFIG=<(echo break_me=on)" spawns a postmaster that
> fails startup, we detect that with "pg_regress: postmaster did not respond
> within 60 seconds". pg_regress has a kill(postmaster_pid, 0) intended to
> detect this case faster. Since kill(ZOMBIE-PID, 0) succeeds[1], that test is
> ineffective.

Ooops.

> The fix, attached, is to instead test waitpid(), like pg_ctl's
> wait_for_postmaster() does.

+1. This leaves postmaster_pid as a dangling pointer, but since
we just exit immediately, that seems fine. (If we continued, and
arrived at the "kill(postmaster_pid, SIGKILL)" below, it would not
be fine.)

regards, tom lane