pg_regress: promptly detect failed postmaster startup

Started by Noah Misch about 7 years ago, 2 messages
#1 Noah Misch
noah@leadboat.com
1 attachment(s)

When "make check TEMP_CONFIG=<(echo break_me=on)" spawns a postmaster that
fails startup, we detect that with "pg_regress: postmaster did not respond
within 60 seconds". pg_regress has a kill(postmaster_pid, 0) intended to
detect this case faster. Since kill(ZOMBIE-PID, 0) succeeds[1], that test is
ineffective. The fix, attached, is to instead test waitpid(), like pg_ctl's
wait_for_postmaster() does.

[1]: Search for "zombie" in http://pubs.opengroup.org/onlinepubs/9699919799/functions/kill.html
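
The difference between the two probes can be reproduced outside pg_regress. The following is a minimal sketch, not code from the patch: the helper names spawn_exited_child(), looks_dead_by_kill() and looks_dead_by_waitpid() are hypothetical, and the sketch assumes a POSIX system where SIGCHLD is not ignored, so that an exited child stays around as a zombie until it is reaped.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits at once with exit_status,
 * then block until it has exited.  WNOWAIT leaves the child unreaped, so
 * on return it is a zombie.  Returns the child's PID, or -1 on failure.
 */
static pid_t
spawn_exited_child(int exit_status)
{
	siginfo_t	info;
	pid_t		pid;

	assert(exit_status >= 0 && exit_status <= 255);

	pid = fork();
	if (pid == 0)
		_exit(exit_status);		/* child: stand-in for a failed startup */
	if (pid < 0)
		return -1;

	if (waitid(P_PID, (id_t) pid, &info, WEXITED | WNOWAIT) != 0)
		return -1;
	return pid;
}

/* Probe in the style of the old test: 1 if kill(pid, 0) fails. */
static int
looks_dead_by_kill(pid_t pid)
{
	return kill(pid, 0) != 0;
}

/*
 * Probe in the style of the patched test: 1 if the child has exited.
 * Note that a successful call also reaps the child.
 */
static int
looks_dead_by_waitpid(pid_t pid)
{
	return waitpid(pid, NULL, WNOHANG) == pid;
}
```

For a PID returned by spawn_exited_child(), looks_dead_by_kill() is expected to report 0 because the zombie still counts as an existing process, while looks_dead_by_waitpid() reports 1. Because waitpid() reaps the child, the second probe only answers "exited" once per child.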

Attachments:

pg_regress-dead-postmaster-v1.patch (text/x-diff; charset=us-ascii)
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 63fe689..2c46941 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2414,7 +2414,7 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 			 * Fail immediately if postmaster has exited
 			 */
 #ifndef WIN32
-			if (kill(postmaster_pid, 0) != 0)
+			if (waitpid(postmaster_pid, NULL, WNOHANG) == postmaster_pid)
 #else
 			if (WaitForSingleObject(postmaster_pid, 0) == WAIT_OBJECT_0)
 #endif

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#1)
Re: pg_regress: promptly detect failed postmaster startup

Noah Misch <noah@leadboat.com> writes:

> When "make check TEMP_CONFIG=<(echo break_me=on)" spawns a postmaster that
> fails startup, we detect that with "pg_regress: postmaster did not respond
> within 60 seconds". pg_regress has a kill(postmaster_pid, 0) intended to
> detect this case faster. Since kill(ZOMBIE-PID, 0) succeeds[1], that test is
> ineffective.

Ooops.

> The fix, attached, is to instead test waitpid(), like pg_ctl's
> wait_for_postmaster() does.

+1. This leaves postmaster_pid as a dangling pointer, but since
we just exit immediately, that seems fine. (If we continued, and
arrived at the "kill(postmaster_pid, SIGKILL)" below, it would not
be fine.)

regards, tom lane
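
The caveat about the stale PID can be made concrete. Once waitpid() has reaped the child, the kernel is free to hand the same PID to an unrelated process, so a later kill() on the saved value could hit the wrong target. The following is a minimal sketch of one way to guard against that if the code did continue; it is not part of the patch, and NO_CHILD, reap_if_exited() and kill_child_if_owned() are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sentinel meaning "no child to signal".  It must never be
 * passed to kill(): kill(-1, sig) signals every process the caller is
 * allowed to signal.
 */
#define NO_CHILD ((pid_t) -1)

/*
 * Non-blocking check.  If the child has exited, reap it and forget its
 * PID so that no later code can signal a recycled PID.
 * Returns 1 if the child is gone, 0 if it is still running.
 */
static int
reap_if_exited(pid_t *child)
{
	assert(child != NULL);

	if (*child == NO_CHILD)
		return 1;				/* already reaped earlier */
	if (waitpid(*child, NULL, WNOHANG) == *child)
	{
		*child = NO_CHILD;
		return 1;
	}
	return 0;
}

/*
 * Send SIGKILL only while we still hold an unreaped child.
 * Returns 0 if the signal was sent, -1 if it was skipped or kill() failed.
 */
static int
kill_child_if_owned(pid_t child)
{
	if (child == NO_CHILD)
		return -1;
	return kill(child, SIGKILL);
}
```

Since pg_regress exits right after detecting the dead postmaster, none of this is needed there; the sketch only shows what "would not be fine" refers to.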