pg_regress: promptly detect failed postmaster startup
When "make check TEMP_CONFIG=<(echo break_me=on)" spawns a postmaster that
fails startup, we detect that with "pg_regress: postmaster did not respond
within 60 seconds". pg_regress has a kill(postmaster_pid, 0) intended to
detect this case faster. Since kill(ZOMBIE-PID, 0) succeeds[1], that test is
ineffective. The fix, attached, is to instead test waitpid(), like pg_ctl's
wait_for_postmaster() does.
[1]: Search for "zombie" in http://pubs.opengroup.org/onlinepubs/9699919799/functions/kill.html
Attachment: pg_regress-dead-postmaster-v1.patch (text/x-diff)
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 63fe689..2c46941 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2414,7 +2414,7 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
* Fail immediately if postmaster has exited
*/
#ifndef WIN32
- if (kill(postmaster_pid, 0) != 0)
+ if (waitpid(postmaster_pid, NULL, WNOHANG) == postmaster_pid)
#else
if (WaitForSingleObject(postmaster_pid, 0) == WAIT_OBJECT_0)
#endif
Noah Misch <noah@leadboat.com> writes:
> When "make check TEMP_CONFIG=<(echo break_me=on)" spawns a postmaster that
> fails startup, we detect that with "pg_regress: postmaster did not respond
> within 60 seconds". pg_regress has a kill(postmaster_pid, 0) intended to
> detect this case faster. Since kill(ZOMBIE-PID, 0) succeeds[1], that test is
> ineffective.

Ooops.

> The fix, attached, is to instead test waitpid(), like pg_ctl's
> wait_for_postmaster() does.

+1. This leaves postmaster_pid as a dangling reference (the process has
been reaped, so the PID could be reused), but since we just exit
immediately, that seems fine. (If we continued, and arrived at the
"kill(postmaster_pid, SIGKILL)" below, it would not be fine.)
regards, tom lane