pg_ctl start may return 0 even if the postmaster has been already started on Windows

Started by Hayato Kuroda (Fujitsu)over 2 years ago41 messages
#1Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com

Dear hackers,

While investigating the cfbot failure [1]https://cirrus-ci.com/task/4634769732927488, I found a strange behavior of pg_ctl
command. How do you think? Is this a bug to be fixed or in the specification?

# Problem

The "pg_ctl start" command returns 0 (succeeded) even if the cluster has
already been started. This occurs on Windows environment, and when the command
is executed just after postmaster starts.

# Analysis

The primal reason is in wait_for_postmaster_start(). In this function the
postmaster.pid file is read and checked whether the start command is
successfully done or not.

Check (1) requires that the postmaster must be started after the our pg_ctl
command, but 2 seconds delay is accepted.

In the linux mode, the check (2) is also executed to ensures that the forked
process modified the file, so this time window is not so problematic.
But in the windows system, (2) is ignored, *so the pg_ctl command may be
succeeded if the postmaster is started within 2 seconds.*

```
if ((optlines = readfile(pid_file, &numlines)) != NULL &&
numlines >= LOCK_FILE_LINE_PM_STATUS)
{
/* File is complete enough for us, parse it */
pid_t pmpid;
time_t pmstart;

/*
* Make sanity checks. If it's for the wrong PID, or the recorded
* start time is before pg_ctl started, then either we are looking
* at the wrong data directory, or this is a pre-existing pidfile
* that hasn't (yet?) been overwritten by our child postmaster.
* Allow 2 seconds slop for possible cross-process clock skew.
*/
pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
if (pmstart >= start_time - 2 && // (1)
#ifndef WIN32
pmpid == pm_pid // (2)
#else
/* Windows can only reject standalone-backend PIDs */
pmpid > 0
#endif

```

# Appendix - how do I found?

I found it while investigating the failure. In the test "pg_upgrade --check"
is executed just after old cluster has been started. I checked the output file [2]https://api.cirrus-ci.com/v1/artifact/task/4634769732927488/testrun/build/testrun/pg_upgrade/003_logical_replication_slots/data/t_003_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20230905T080645.548/log/pg_upgrade_internal.log
and found that the banner says "Performing Consistency Checks", which meant that
the parameter live_check was set to false (see output_check_banner()). This
parameter is set to true when the postmaster has been started at that time and
the pg_ctl start fails. That's how I find.

[1]: https://cirrus-ci.com/task/4634769732927488
[2]: https://api.cirrus-ci.com/v1/artifact/task/4634769732927488/testrun/build/testrun/pg_upgrade/003_logical_replication_slots/data/t_003_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20230905T080645.548/log/pg_upgrade_internal.log

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#2Michael Paquier
michael@paquier.xyz
In reply to: Hayato Kuroda (Fujitsu) (#1)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Thu, Sep 07, 2023 at 07:07:36AM +0000, Hayato Kuroda (Fujitsu) wrote:

# Problem

The "pg_ctl start" command returns 0 (succeeded) even if the cluster has
already been started. This occurs on Windows environment, and when the command
is executed just after postmaster starts.

Not failing on `pg_ctl start` if the command is run on a data folder
that has already been started previously by a different command with a
postmaster still alive feels like cheating, because pg_ctl is lying
about its result. If pg_ctl wants to start a cluster but is not able
to do it, either because the postmaster failed at startup or because
the cluster has already started, it should report a failure. Now, I
also recall that the processes spawned by pg_ctl on Windows make the
status handling rather tricky to reason about..
--
Michael

#3Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Michael Paquier (#2)
1 attachment(s)
RE: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Dear Michael,

Thank you for replying!

Not failing on `pg_ctl start` if the command is run on a data folder
that has already been started previously by a different command with a
postmaster still alive feels like cheating, because pg_ctl is lying
about its result. If pg_ctl wants to start a cluster but is not able
to do it, either because the postmaster failed at startup or because
the cluster has already started, it should report a failure.

I have a same feelings as you. Users may use the return code in their batch file
and they may decide what to do based on the wrong status. Reporting the status
more accurately is nice.

My first idea is that to move the checking part to above, but this may not handle
the case the postmaster is still alive (now sure this is real issue). Do we have to
add a new indicator which ensures the identity of processes for windows?
Please tell me how you feel.

Now, I
also recall that the processes spawned by pg_ctl on Windows make the
status handling rather tricky to reason about..

Did you say about the below comment? Currently I have no idea to make
codes more proper, sorry.

```
* On Windows, we may be checking the postmaster's parent shell, but
* that's fine for this purpose.
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

move_checking.patchapplication/octet-stream; name=move_checking.patchDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..07b3f0c9c8 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -599,6 +599,25 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 		char	  **optlines;
 		int			numlines;
 
+		/*
+		 * Check whether the child postmaster process is still alive.  This
+		 * lets us exit early if the postmaster fails during startup.
+		 *
+		 * On Windows, we may be checking the postmaster's parent shell, but
+		 * that's fine for this purpose.
+		 */
+#ifndef WIN32
+		{
+			int			exitstatus;
+
+			if (waitpid(pm_pid, &exitstatus, WNOHANG) == pm_pid)
+				return POSTMASTER_FAILED;
+		}
+#else
+		if (WaitForSingleObject(postmasterProcess, 0) == WAIT_OBJECT_0)
+			return POSTMASTER_FAILED;
+#endif
+
 		/*
 		 * Try to read the postmaster.pid file.  If it's not valid, or if the
 		 * status line isn't there yet, just keep waiting.
@@ -619,6 +638,7 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
+
 			if (pmstart >= start_time - 2 &&
 #ifndef WIN32
 				pmpid == pm_pid
@@ -651,25 +671,6 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 		 */
 		free_readfile(optlines);
 
-		/*
-		 * Check whether the child postmaster process is still alive.  This
-		 * lets us exit early if the postmaster fails during startup.
-		 *
-		 * On Windows, we may be checking the postmaster's parent shell, but
-		 * that's fine for this purpose.
-		 */
-#ifndef WIN32
-		{
-			int			exitstatus;
-
-			if (waitpid(pm_pid, &exitstatus, WNOHANG) == pm_pid)
-				return POSTMASTER_FAILED;
-		}
-#else
-		if (WaitForSingleObject(postmasterProcess, 0) == WAIT_OBJECT_0)
-			return POSTMASTER_FAILED;
-#endif
-
 		/* Startup still in process; wait, printing a dot once per second */
 		if (i % WAITS_PER_SEC == 0)
 		{
#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#3)
1 attachment(s)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Thu, 7 Sep 2023 10:53:41 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in

My first idea is that to move the checking part to above, but this may not handle
the case the postmaster is still alive (now sure this is real issue). Do we have to
add a new indicator which ensures the identity of processes for windows?
Please tell me how you feel.

It doesn't seem to work as expected. We still lose the relationship
between the PID file and the launched postmaster.

Now, I
also recall that the processes spawned by pg_ctl on Windows make the
status handling rather tricky to reason about..

Did you say about the below comment? Currently I have no idea to make
codes more proper, sorry.

```
* On Windows, we may be checking the postmaster's parent shell, but
* that's fine for this purpose.
```

Ditching cmd.exe seems like a big hassle. So, on the flip side, I
tried to identify the postmaster PID using the shell's PID, and it
seem to work. The APIs used are avaiable from XP/2003 onwards.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

pg_ctl_waits_for_postmasters_pid_on_windows.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..d54cc6ab54 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -32,6 +32,9 @@
 #ifdef WIN32					/* on Unix, we don't need libpq */
 #include "pqexpbuffer.h"
 #endif
+#ifdef WIN32
+#include <tlhelp32.h>
+#endif
 
 
 typedef enum
@@ -425,6 +428,91 @@ free_readfile(char **optlines)
  * start/test/stop routines
  */
 
+#ifdef WIN32
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ */
+DWORD
+win32_find_postmaster_pid(int shell_pid)
+{
+	HANDLE hSnapshot;
+	DWORD flags;
+	DWORD p_pid = shell_pid;
+	PROCESSENTRY32 ppe;
+	DWORD pm_pid = 0;
+	bool shell_exists = false;
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* Create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, shell_pid);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+					 progname);
+		return;
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08x\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
+	/* iterate over the snapshot */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		if (ppe.th32ParentProcessID == shell_pid)
+		{
+			if (pm_pid > 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08x\n"),
+					 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: lancher shell executed multiple processes\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* Check if the process is gone for any reason */
+	if (!shell_exists)
+	{
+		/* Report the condition as server start failure */
+		write_stderr(_("%s: launcher shell died\n"), progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
+#endif
+
 /*
  * Start the postmaster and return its PID.
  *
@@ -609,7 +697,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = win32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,21 +711,15 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
 				 * status line (this assumes a v10 or later server).
 				 */
 				char	   *pmstatus = optlines[LOCK_FILE_LINE_PM_STATUS - 1];
-
+				
 				if (strcmp(pmstatus, PM_STATUS_READY) == 0 ||
 					strcmp(pmstatus, PM_STATUS_STANDBY) == 0)
 				{
#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#4)
1 attachment(s)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Fri, 08 Sep 2023 14:17:16 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

Ditching cmd.exe seems like a big hassle. So, on the flip side, I
tried to identify the postmaster PID using the shell's PID, and it
seem to work. The APIs used are avaiable from XP/2003 onwards.

Cleaned it up a bit.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

pg_ctl_waits_for_postmasters_pid_on_windows_2.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..25d3354dc6 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -32,6 +32,9 @@
 #ifdef WIN32					/* on Unix, we don't need libpq */
 #include "pqexpbuffer.h"
 #endif
+#ifdef WIN32
+#include <tlhelp32.h>
+#endif
 
 
 typedef enum
@@ -425,6 +428,90 @@ free_readfile(char **optlines)
  * start/test/stop routines
  */
 
+#ifdef WIN32
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ */
+DWORD
+win32_find_postmaster_pid(int shell_pid)
+{
+	HANDLE hSnapshot;
+	DWORD flags;
+	DWORD p_pid = shell_pid;
+	PROCESSENTRY32 ppe;
+	DWORD pm_pid = 0;
+	bool shell_exists = false;
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* Create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, shell_pid);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+					 progname);
+		return;
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08x\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
+	/* iterate over the snapshot */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		if (ppe.th32ParentProcessID == shell_pid)
+		{
+			if (pm_pid > 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08x\n"),
+					 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: launcher shell executed multiple processes\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* check if the process is still alive */
+	if (!shell_exists)
+	{
+		write_stderr(_("%s: launcher shell died\n"), progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
+#endif
+
 /*
  * Start the postmaster and return its PID.
  *
@@ -609,7 +696,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = win32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,14 +710,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
#6Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Kyotaro Horiguchi (#5)
RE: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Dear Hoiguchi-san,

Thank you for making the patch!

It doesn't seem to work as expected. We still lose the relationship
between the PID file and the launched postmaster.

Yes, I did not expect that the relationship can be kept.
Conceptually +1 for your approach.

Ditching cmd.exe seems like a big hassle. So, on the flip side, I
tried to identify the postmaster PID using the shell's PID, and it
seem to work. The APIs used are avaiable from XP/2003 onwards.

According to 495ed0ef2, Windows 10 seems the minimal requirement for using
the postgres. So the approach seems OK.

Followings are my comment, but I can say only cosmetic ones because I do not have
windows machine which can run postgres.

1.
Forward declaration seems missing. In the pg_ctl.c, the static function seems to
be declared even if there is only one caller (c.f., GetPrivilegesToDelete).

2.
I think the argument should be pid_t.

3.
I'm not sure the return type of the function should be pid_t or not. According
to the document, DWORD corrresponds to the pid_t. In win32_port.h, the pid_t is
defiend as int (_MSC_VER seems to be defined when the VisualStduio is used). It
is harmless, but I perfer to match the interface between caller/callee. IIUC we
can add just a cast.

```
#ifdef _MSC_VER
typedef int pid_t;
#endif
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#6)
1 attachment(s)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Fri, 8 Sep 2023 08:02:57 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in

Ditching cmd.exe seems like a big hassle. So, on the flip side, I
tried to identify the postmaster PID using the shell's PID, and it
seem to work. The APIs used are avaiable from XP/2003 onwards.

According to 495ed0ef2, Windows 10 seems the minimal requirement for using
the postgres. So the approach seems OK.

Followings are my comment, but I can say only cosmetic ones because I do not have
windows machine which can run postgres.

Thank you for the comment!

1.
Forward declaration seems missing. In the pg_ctl.c, the static function seems to
be declared even if there is only one caller (c.f., GetPrivilegesToDelete).

Agreed.

2.
I think the argument should be pid_t.

Yeah, I didn't linger on that detail earlier. But revisiting it, I
coucur it is best suited since it is a local function in
pg_ctl.c. I've now positioned it at the end of a WIN32 section
defining other win32-specific functions. Hence, a forward declaration
became necessary:p

3.
I'm not sure the return type of the function should be pid_t or not. According
to the document, DWORD corrresponds to the pid_t. In win32_port.h, the pid_t is
defiend as int (_MSC_VER seems to be defined when the VisualStduio is used). It
is harmless, but I perfer to match the interface between caller/callee. IIUC we
can add just a cast.

For the reason previously stated, I've adjusted the type for both the
parameter and the return value to pid_t. start_postmaster() already
assumed that pid_t is wider than DWORD.

I noticed that PID 0 is valid on Windows. However, it is consistently
the PID for the system idle process, so it can't be associated with
cmd.exe or postgres. I've added a comment noting that premise. Also I
did away with an unused variable. For the CreateToolhelp32Snapshot
function, I changed the second parameter to 0 from shell_pid, since it
is not used when using TH32CS_SNAPPROCESS. I changed the comparison
operator for pid_t from > to !=, ensuring correct behavior even with
negative values.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

pg_ctl_waits_for_postmasters_pid_on_windows_2.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..68ab83779c 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -132,6 +132,7 @@ static void adjust_data_dir(void);
 
 #ifdef WIN32
 #include <versionhelpers.h>
+#include <tlhelp32.h>
 static bool pgwin32_IsInstalled(SC_HANDLE);
 static char *pgwin32_CommandLine(bool);
 static void pgwin32_doRegister(void);
@@ -142,6 +143,7 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static pid_t pgwin32_find_postmaster_pid(pid_t shell_pid);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
@@ -609,7 +611,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = pgwin32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,14 +625,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
@@ -1950,6 +1950,90 @@ GetPrivilegesToDelete(HANDLE hToken)
 
 	return tokenPrivs;
 }
+
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ *
+ * This function uses PID 0 as an invalid value, assuming the system idle
+ * process occupies it and it won't be a PID for a shell or postmaster.
+ */
+pid_t
+pgwin32_find_postmaster_pid(pid_t shell_pid)
+{
+	HANDLE hSnapshot;
+	DWORD flags;
+	PROCESSENTRY32 ppe;
+	pid_t pm_pid = 0;			/* abusing 0 as an invalid value */
+	bool shell_exists = false;
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+					 progname);
+		return;
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08x\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
+	/* iterate over the snapshot */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		else if (ppe.th32ParentProcessID == shell_pid)
+		{
+			if (pm_pid != 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08x\n"),
+					 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: launcher shell executed multiple processes\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* check if the process is still alive */
+	if (!shell_exists)
+	{
+		write_stderr(_("%s: launcher shell died\n"), progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
 #endif							/* WIN32 */
 
 static void
#8Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Kyotaro Horiguchi (#7)
RE: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Dear Horiguchi-san,

I have tested your patch on my CI, but several test could not patch with error:
"pg_ctl: launcher shell executed multiple processes".

I added the thread to next CF entry, so let's see the how cfbot says.

[1]: https://commitfest.postgresql.org/45/4573/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#9Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Hayato Kuroda (Fujitsu) (#8)
1 attachment(s)
RE: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Dear Horiguchi-san,

I added the thread to next CF entry, so let's see the how cfbot says.

At least there are several compiler warnings. E.g.,

* pgwin32_find_postmaster_pid() has "return;", but IIUC it should be "exit(1)"
* When DWORD is printed, "%lx" should be used.
* The variable "flags" seems not needed.

Here is a patch which suppresses warnings, whereas test would fail...
You can use it if acceptable.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

pg_ctl_waits_for_postmasters_pid_on_windows_3.patchapplication/octet-stream; name=pg_ctl_waits_for_postmasters_pid_on_windows_3.patchDownload
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..82fa70972c 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -132,6 +132,7 @@ static void adjust_data_dir(void);
 
 #ifdef WIN32
 #include <versionhelpers.h>
+#include <tlhelp32.h>
 static bool pgwin32_IsInstalled(SC_HANDLE);
 static char *pgwin32_CommandLine(bool);
 static void pgwin32_doRegister(void);
@@ -142,6 +143,7 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static pid_t pgwin32_find_postmaster_pid(pid_t shell_pid);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
@@ -609,7 +611,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = pgwin32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,14 +625,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
@@ -1950,6 +1950,89 @@ GetPrivilegesToDelete(HANDLE hToken)
 
 	return tokenPrivs;
 }
+
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ *
+ * This function uses PID 0 as an invalid value, assuming the system idle
+ * process occupies it and it won't be a PID for a shell or postmaster.
+ */
+pid_t
+pgwin32_find_postmaster_pid(pid_t shell_pid)
+{
+	HANDLE hSnapshot;
+	PROCESSENTRY32 ppe;
+	pid_t pm_pid = 0;			/* abusing 0 as an invalid value */
+	bool shell_exists = false;
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08lx\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
+	/* iterate over the snapshot */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		else if (ppe.th32ParentProcessID == shell_pid)
+		{
+			if (pm_pid != 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08lx\n"),
+					 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: launcher shell executed multiple processes\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* check if the process is still alive */
+	if (!shell_exists)
+	{
+		write_stderr(_("%s: launcher shell died\n"), progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
 #endif							/* WIN32 */
 
 static void
#10Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#9)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Tue, 19 Sep 2023 13:48:55 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in

Dear Horiguchi-san,

I added the thread to next CF entry, so let's see the how cfbot says.

At least there are several compiler warnings. E.g.,

* pgwin32_find_postmaster_pid() has "return;", but IIUC it should be "exit(1)"
* When DWORD is printed, "%lx" should be used.
* The variable "flags" seems not needed.

Yeah, I thought that they all have been fixed but.. you are right in
every respect.

Here is a patch which suppresses warnings, whereas test would fail...
You can use it if acceptable.

I was able to see the trouble in the CI environment, but not
locally. I'll delve deeper into this. Thanks you for bringing it to my
attention.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#11Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#10)
2 attachment(s)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Wed, 20 Sep 2023 14:18:41 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

I was able to see the trouble in the CI environment, but not
locally. I'll delve deeper into this. Thanks you for bringing it to my
attention.

I found two instances with multiple child processes.

# child-pid / parent-pid / given-pid : exec name
process parent PID child PID target PID exec file
shell 1228 6472 1228 cmd.exe
child 5184 1228 1228 cmd.exe
child 6956 1228 1228 postgres.exe

launcher shell executed multiple processes

process parent PID child PID target PID exec file
shell 4296 5880 4296 cmd.exe
child 5156 4296 4296 agent.exe
child 5640 4296 4296 postgres.exe

launcher shell executed multiple processes

It looks like the environment has autorun setups for cmd.exe. There's
another known issue related to auto-launching chcp at
startup. Ideally, we would avoid such behavior in the
postmaster-launcher shell. I think we should add "/D" flag to cmd.exe
command line, perhaps in a separate patch.

Even after making that change, I still see something being launched from the launcher cmd.exe...

process parent PID child PID target PID exec file
shell 2784 6668 2784 cmd.exe
child 6140 2784 2784 MicrosoftEdgeUpdate.exe
child 6260 2784 2784 postgres.exe

launcher shell executed multiple processes

I'm not sure what triggers this; perhaps some other kind of hooks? If
we cannot avoid this behavior, we'll have to verify the executable
file name. It should be fine, given that the file name is constant,
but I'm not fully convinced that this is the ideal solution.

Another issue is.. that I haven't been able to cause the false
positive of pg_ctl start.. Do you have a concise reproducer of the
issue?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Disable-autoruns-for-cmd.exe-on-Windows.patchtext/x-patch; charset=us-asciiDownload
From c6de7b541be55b01bbd0d2e8e8d95aac6799a82c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 21 Sep 2023 16:54:33 +0900
Subject: [PATCH 1/3] Disable autoruns for cmd.exe on Windows

On Windows, we use cmd.exe to launch the postmaster process for easy
redirection setup. However, cmd.exe may execute other programs at
startup due to autorun configurations. This patch adds /D flag to the
launcher cmd.exe command line to disable autorun settings written in
the registry.
---
 src/bin/pg_ctl/pg_ctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..3ac2fcc004 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -553,11 +553,11 @@ start_postmaster(void)
 		else
 			close(fd);
 
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
 	}
 	else
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	if (!CreateRestrictedProcess(cmd, &pi, false))
-- 
2.39.3

0002-Improve-pg_ctl-postmaster-process-check-on-Windows.patchtext/x-patch; charset=us-asciiDownload
From f1484974f5bcf14d5ac3c6a702dcb02d0eb45f9f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 20 Sep 2023 13:16:35 +0900
Subject: [PATCH 2/3] Improve pg_ctl postmaster process check on Windows

Currently pg_ctl on Windows does not verify that it actually executed
a postmaster process due to lack of process ID knowledge. This can
lead to false positives in cases where another pg_ctl instance starts
a different server simultaneously.
This patch adds the capability to identify the process ID of the
launched postmaster on Windows, similar to other OS versions, ensuring
more reliable detection of concurrent server startups.
---
 src/bin/pg_ctl/pg_ctl.c | 109 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 100 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 3ac2fcc004..ed1b0c43fc 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -132,6 +132,7 @@ static void adjust_data_dir(void);
 
 #ifdef WIN32
 #include <versionhelpers.h>
+#include <tlhelp32.h>
 static bool pgwin32_IsInstalled(SC_HANDLE);
 static char *pgwin32_CommandLine(bool);
 static void pgwin32_doRegister(void);
@@ -142,6 +143,7 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static pid_t pgwin32_find_postmaster_pid(pid_t shell_pid);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
@@ -609,7 +611,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = pgwin32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,14 +625,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
@@ -1950,6 +1950,97 @@ GetPrivilegesToDelete(HANDLE hToken)
 
 	return tokenPrivs;
 }
+
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ *
+ * This function uses PID 0 as an invalid value, assuming the system idle
+ * process occupies it and it won't be a PID for a shell or postmaster.
+ */
+pid_t
+pgwin32_find_postmaster_pid(pid_t shell_pid)
+{
+	HANDLE hSnapshot;
+	PROCESSENTRY32 ppe;
+	pid_t pm_pid = 0;			/* abusing 0 as an invalid value */
+	bool shell_exists = false;
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08lx\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
+	/*
+	 * Iterate over the snapshot
+	 *
+	 * Check for shell existence and duplicate processes for reliability.
+	 *
+	 * The launcher shell may start other instances of cmd.exe or programs
+	 * besides postgres.exe. It's important to verify the program file name.
+	 */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		else if (ppe.th32ParentProcessID == shell_pid &&
+				 strcmp("postgres.exe", ppe.szExeFile) == 0)
+		{
+			if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08lx\n"),
+					 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: launcher shell executed multiple processes\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* check if the process is still alive */
+	if (!shell_exists)
+	{
+		write_stderr(_("%s: launcher shell died\n"), progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
 #endif							/* WIN32 */
 
 static void
-- 
2.39.3

#12Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Kyotaro Horiguchi (#11)
3 attachment(s)
RE: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Dear Horiguchi-san,

Thank you for making a patch! They can pass ci.
I'm still not sure what should be, but I can respond a part.

Another issue is.. that I haven't been able to cause the false
positive of pg_ctl start.. Do you have a concise reproducer of the
issue?

I found a short sleep in pg_ctl/t/001_start_stop.pl. This was introduced in
6bcce2580 to ensure waiting more than 2 seconds. I've tested on my CI and
found that removing the sleep can trigger the failure. Also, I confirmed your patch
fixes the problem. PSA the small patch for cfbot. 0001 and 0002 were not changed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v2-0001-Disable-autoruns-for-cmd.exe-on-Windows.patchapplication/octet-stream; name=v2-0001-Disable-autoruns-for-cmd.exe-on-Windows.patchDownload
From 2ef4825cac9013321cfa39b3271622a059b6410c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 21 Sep 2023 16:54:33 +0900
Subject: [PATCH v2 1/3] Disable autoruns for cmd.exe on Windows

On Windows, we use cmd.exe to launch the postmaster process for easy
redirection setup. However, cmd.exe may execute other programs at
startup due to autorun configurations. This patch adds /D flag to the
launcher cmd.exe command line to disable autorun settings written in
the registry.
---
 src/bin/pg_ctl/pg_ctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..3ac2fcc004 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -553,11 +553,11 @@ start_postmaster(void)
 		else
 			close(fd);
 
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
 	}
 	else
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	if (!CreateRestrictedProcess(cmd, &pi, false))
-- 
2.27.0

v2-0002-Improve-pg_ctl-postmaster-process-check-on-Window.patchapplication/octet-stream; name=v2-0002-Improve-pg_ctl-postmaster-process-check-on-Window.patchDownload
From f24c26bb1fcaa18481ad8041a29dd3db85a8cf30 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 20 Sep 2023 13:16:35 +0900
Subject: [PATCH v2 2/3] Improve pg_ctl postmaster process check on Windows

Currently pg_ctl on Windows does not verify that it actually executed
a postmaster process due to lack of process ID knowledge. This can
lead to false positives in cases where another pg_ctl instance starts
a different server simultaneously.
This patch adds the capability to identify the process ID of the
launched postmaster on Windows, similar to other OS versions, ensuring
more reliable detection of concurrent server startups.
---
 src/bin/pg_ctl/pg_ctl.c | 109 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 100 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 3ac2fcc004..ed1b0c43fc 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -132,6 +132,7 @@ static void adjust_data_dir(void);
 
 #ifdef WIN32
 #include <versionhelpers.h>
+#include <tlhelp32.h>
 static bool pgwin32_IsInstalled(SC_HANDLE);
 static char *pgwin32_CommandLine(bool);
 static void pgwin32_doRegister(void);
@@ -142,6 +143,7 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static pid_t pgwin32_find_postmaster_pid(pid_t shell_pid);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
@@ -609,7 +611,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = pgwin32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,14 +625,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
@@ -1950,6 +1950,97 @@ GetPrivilegesToDelete(HANDLE hToken)
 
 	return tokenPrivs;
 }
+
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ *
+ * This function uses PID 0 as an invalid value, assuming the system idle
+ * process occupies it and it won't be a PID for a shell or postmaster.
+ */
+pid_t
+pgwin32_find_postmaster_pid(pid_t shell_pid)
+{
+	HANDLE hSnapshot;
+	PROCESSENTRY32 ppe;
+	pid_t pm_pid = 0;			/* abusing 0 as an invalid value */
+	bool shell_exists = false;
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08lx\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
+	/*
+	 * Iterate over the snapshot
+	 *
+	 * Check for shell existence and duplicate processes for reliability.
+	 *
+	 * The launcher shell may start other instances of cmd.exe or programs
+	 * besides postgres.exe. It's important to verify the program file name.
+	 */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		else if (ppe.th32ParentProcessID == shell_pid &&
+				 strcmp("postgres.exe", ppe.szExeFile) == 0)
+		{
+			if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08lx\n"),
+					 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: launcher shell executed multiple processes\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* check if the process is still alive */
+	if (!shell_exists)
+	{
+		write_stderr(_("%s: launcher shell died\n"), progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
 #endif							/* WIN32 */
 
 static void
-- 
2.27.0

v2-0003-Remove-short-sleep-from-001_start_stop.pl.patchapplication/octet-stream; name=v2-0003-Remove-short-sleep-from-001_start_stop.pl.patchDownload
From c8287327dedbd032aa86800cf7c2f5050313638b Mon Sep 17 00:00:00 2001
From: HayaKuro <activegalacticnuclei.grb@gmail.com>
Date: Fri, 22 Sep 2023 19:09:24 +0900
Subject: [PATCH v2 3/3] Remove short sleep from 001_start_stop.pl

Previous commits ensures reliable detection whether the postmaster has really
started or not, so the short sleep in a test is not needed anymore. It was
originally introduced by 6bcce258.
---
 src/bin/pg_ctl/t/001_start_stop.pl | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index f019fe1703..590a82338f 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -46,11 +46,6 @@ my $ctlcmd = [
 ];
 command_like($ctlcmd, qr/done.*server started/s, 'pg_ctl start');
 
-# sleep here is because Windows builds can't check postmaster.pid exactly,
-# so they may mistake a pre-existing postmaster.pid for one created by the
-# postmaster they start.  Waiting more than the 2 seconds slop time allowed
-# by wait_for_postmaster() prevents that mistake.
-sleep 3 if ($windows_os);
 command_fails([ 'pg_ctl', 'start', '-D', "$tempdir/data" ],
 	'second pg_ctl start fails');
 command_ok([ 'pg_ctl', 'stop', '-D', "$tempdir/data" ], 'pg_ctl stop');
-- 
2.27.0

#13Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#12)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Fri, 6 Oct 2023 at 11:38, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Dear Horiguchi-san,

Thank you for making a patch! They can pass ci.
I'm still not sure what should be, but I can respond a part.

Another issue is.. that I haven't been able to cause the false
positive of pg_ctl start.. Do you have a concise reproducer of the
issue?

I found a short sleep in pg_ctl/t/001_start_stop.pl. This was introduced in
6bcce2580 to ensure waiting more than 2 seconds. I've tested on my CI and
found that removing the sleep can trigger the failure. Also, I confirmed your patch
fixes the problem. PSA the small patch for cfbot. 0001 and 0002 were not changed.

I have tested the patches on my windows setup.
I am trying to start two postgres servers with an interval of 5 secs.

with HEAD (when same server is started after an interval of 5 secs):
D:\project\pg\bin>pg_ctl -D ../data -l data2.log start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start.... stopped waiting
pg_ctl: could not start server
Examine the log output.

with Patch:(when same server is started after an interval of 5 secs)
D:\project\pg_dev\bin>pg_ctl -D ../data -l data2.log start
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....pg_ctl: launcher shell died

The output message after patch is different from the HEAD. I felt that
with patch as well we should get the message "pg_ctl: could not start
server".
Is this message change intentional?

Thanks,
Shlok Kumar Kyal

#14Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Shlok Kyal (#13)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Thank you for testing this!

At Fri, 6 Oct 2023 12:28:32 +0530, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote i> D:\project\pg_dev\bin>pg_ctl -D ../data -l data2.log start

pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....pg_ctl: launcher shell died

The output message after patch is different from the HEAD. I felt that
with patch as well we should get the message "pg_ctl: could not start
server".
Is this message change intentional?

Partly no, partly yes. My focus was on verifying the accuracy of
identifying the actual postmaster PID on Windows. The current patch
provides a detailed description of the events, primarily because I
lack a comprehensive understanding of both the behavior of Windows
APIs and the associated processes. Given that context, the messages
essentially serve debugging purposes.

I agree with your suggestion. Ultimately, if there's a possibility
for this to be committed, the message will be consolidated to "could
not start server".

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#15Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Kyotaro Horiguchi (#14)
3 attachment(s)
RE: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Dear Horiguchi-san, Shlok,

At Fri, 6 Oct 2023 12:28:32 +0530, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote
i> D:\project\pg_dev\bin>pg_ctl -D ../data -l data2.log start

pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....pg_ctl: launcher shell died

The output message after patch is different from the HEAD. I felt that
with patch as well we should get the message "pg_ctl: could not start
server".
Is this message change intentional?

Partly no, partly yes. My focus was on verifying the accuracy of
identifying the actual postmaster PID on Windows. The current patch
provides a detailed description of the events, primarily because I
lack a comprehensive understanding of both the behavior of Windows
APIs and the associated processes. Given that context, the messages
essentially serve debugging purposes.

I agree with your suggestion. Ultimately, if there's a possibility
for this to be committed, the message will be consolidated to "could
not start server".

Based on the suggestion, I tried to update the patch.
A new argument is_valid is added for reporting callee. Also, reporting formats
are adjusted based on other functions. How do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:

v3-0001-Disable-autoruns-for-cmd.exe-on-Windows.patchapplication/octet-stream; name=v3-0001-Disable-autoruns-for-cmd.exe-on-Windows.patchDownload
From 8e70c43087e05d9b5111f0200024dd77e64f8c2b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 21 Sep 2023 16:54:33 +0900
Subject: [PATCH v3 1/3] Disable autoruns for cmd.exe on Windows

On Windows, we use cmd.exe to launch the postmaster process for easy
redirection setup. However, cmd.exe may execute other programs at
startup due to autorun configurations. This patch adds /D flag to the
launcher cmd.exe command line to disable autorun settings written in
the registry.
---
 src/bin/pg_ctl/pg_ctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..3ac2fcc004 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -553,11 +553,11 @@ start_postmaster(void)
 		else
 			close(fd);
 
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
 	}
 	else
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	if (!CreateRestrictedProcess(cmd, &pi, false))
-- 
2.27.0

v3-0002-Improve-pg_ctl-postmaster-process-check-on-Window.patchapplication/octet-stream; name=v3-0002-Improve-pg_ctl-postmaster-process-check-on-Window.patchDownload
From 870bd8a117042ffe28818b9669448d2ff0984414 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 20 Sep 2023 13:16:35 +0900
Subject: [PATCH v3 2/3] Improve pg_ctl postmaster process check on Windows

Currently pg_ctl on Windows does not verify that it actually executed
a postmaster process due to lack of process ID knowledge. This can
lead to false positives in cases where another pg_ctl instance starts
a different server simultaneously.
This patch adds the capability to identify the process ID of the
launched postmaster on Windows, similar to other OS versions, ensuring
more reliable detection of concurrent server startups.
---
 src/bin/pg_ctl/pg_ctl.c | 107 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 99 insertions(+), 8 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 3ac2fcc004..2563f15376 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -132,6 +132,7 @@ static void adjust_data_dir(void);
 
 #ifdef WIN32
 #include <versionhelpers.h>
+#include <tlhelp32.h>
 static bool pgwin32_IsInstalled(SC_HANDLE);
 static char *pgwin32_CommandLine(bool);
 static void pgwin32_doRegister(void);
@@ -142,6 +143,7 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static pid_t pgwin32_find_postmaster_pid(pid_t shell_pid, bool *is_valid);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
@@ -609,7 +611,20 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			bool		is_valid;
+
+			pid_t		wait_pid = pgwin32_find_postmaster_pid(pm_pid, &is_valid);
 
+			/* Quick exit if something is wrong */
+			if (!is_valid)
+			{
+				free_readfile(optlines);
+				return POSTMASTER_FAILED;
+			}
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,14 +634,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
@@ -1950,6 +1959,88 @@ GetPrivilegesToDelete(HANDLE hToken)
 
 	return tokenPrivs;
 }
+
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. If the shell executes several processes or the process
+ * is dead, is_valid flag is set to false.
+ *
+ * This function uses PID 0 as an invalid value, assuming the system idle
+ * process occupies it and it won't be a PID for a shell or postmaster.
+ */
+pid_t
+pgwin32_find_postmaster_pid(pid_t shell_pid, bool *is_valid)
+{
+	HANDLE hSnapshot;
+	PROCESSENTRY32 ppe;
+	pid_t pm_pid = 0;			/* abusing 0 as an invalid value */
+	bool shell_exists = false;
+	DWORD last_error;
+	*is_valid = true;
+
+	/* create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: could not create a snapshot: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: cound not retrieve information about the process: error code %lu\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
+	/*
+	 * Iterate over the snapshot.
+	 *
+	 * Check for shell existence and duplicate processes for reliability.
+	 *
+	 * The launcher shell may start other instances of cmd.exe or programs
+	 * besides postgres.exe. It's important to verify the program file name.
+	 */
+	do
+	{
+		if (ppe.th32ProcessID == shell_pid)
+			shell_exists = true;
+		else if (ppe.th32ParentProcessID == shell_pid &&
+				 strcmp("postgres.exe", ppe.szExeFile) == 0)
+		{
+			/* assuming the launching shell executes a single process */
+			if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
+				*is_valid = false;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: cound not retrieve information about the process: error code %lu\n"),
+					 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* check if the process is still alive */
+	if (!shell_exists)
+		*is_valid = false;
+
+	return pm_pid;
+}
 #endif							/* WIN32 */
 
 static void
-- 
2.27.0

v3-0003-Remove-short-sleep-from-001_start_stop.pl.patchapplication/octet-stream; name=v3-0003-Remove-short-sleep-from-001_start_stop.pl.patchDownload
From 0175f0c50407f4d2e0529bc7c6e07d9102043354 Mon Sep 17 00:00:00 2001
From: HayaKuro <activegalacticnuclei.grb@gmail.com>
Date: Fri, 22 Sep 2023 19:09:24 +0900
Subject: [PATCH v3 3/3] Remove short sleep from 001_start_stop.pl

Previous commits ensures reliable detection whether the postmaster has really
started or not, so the short sleep in a test is not needed anymore. It was
originally introduced by 6bcce258.
---
 src/bin/pg_ctl/t/001_start_stop.pl | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index f019fe1703..590a82338f 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -46,11 +46,6 @@ my $ctlcmd = [
 ];
 command_like($ctlcmd, qr/done.*server started/s, 'pg_ctl start');
 
-# sleep here is because Windows builds can't check postmaster.pid exactly,
-# so they may mistake a pre-existing postmaster.pid for one created by the
-# postmaster they start.  Waiting more than the 2 seconds slop time allowed
-# by wait_for_postmaster() prevents that mistake.
-sleep 3 if ($windows_os);
 command_fails([ 'pg_ctl', 'start', '-D', "$tempdir/data" ],
 	'second pg_ctl start fails');
 command_ok([ 'pg_ctl', 'stop', '-D', "$tempdir/data" ], 'pg_ctl stop');
-- 
2.27.0

#16Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#15)
3 attachment(s)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Mon, 23 Oct 2023 08:57:19 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in

I agree with your suggestion. Ultimately, if there's a possibility
for this to be committed, the message will be consolidated to "could
not start server".

Based on the suggestion, I tried to update the patch.
A new argument is_valid is added for reporting callee. Also, reporting formats
are adjusted based on other functions. How do you think?

An equivalent check is already done shortly afterward in the calling
function. Therefore, we can simply remove the code path for "launcher
shell died", and it will work the same way. Please find the attached.

Other error cases will fit to "shouldn't occur under normal
conditions" errors.

There is a possibility that the launcher shell terminates while
postmaster is running. Even in such a case, the server continue
working without any problems. I contemplated accomodating this case
but the effort required seemed disproportionate to the possibility.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v4-0001-Disable-autoruns-for-cmd.exe-on-Windows.patchtext/x-patch; charset=us-asciiDownload
From 530dc21be72e4e7f5ea199a9fceee00bbb88cf75 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 24 Oct 2023 11:43:57 +0900
Subject: [PATCH v4 1/3] Disable autoruns for cmd.exe on Windows

On Windows, we use cmd.exe to launch the postmaster process for easy
redirection setup. However, cmd.exe may execute other programs at
startup due to autorun configurations. This patch adds /D flag to the
launcher cmd.exe command line to disable autorun settings written in
the registry.
---
 src/bin/pg_ctl/pg_ctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..3ac2fcc004 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -553,11 +553,11 @@ start_postmaster(void)
 		else
 			close(fd);
 
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
 	}
 	else
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	if (!CreateRestrictedProcess(cmd, &pi, false))
-- 
2.39.3

v4-0002-Improve-pg_ctl-postmaster-process-check-on-Window.patchtext/x-patch; charset=us-asciiDownload
From 913a0fbe0937331fca6ea4099834ed3001714339 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 24 Oct 2023 14:46:50 +0900
Subject: [PATCH v4 2/3] Improve pg_ctl postmaster process check on Windows

Currently pg_ctl on Windows does not verify that it actually executed
a postmaster process due to lack of process ID knowledge. This can
lead to false positives in cases where another pg_ctl instance starts
a different server simultaneously.
This patch adds the capability to identify the process ID of the
launched postmaster on Windows, similar to other OS versions, ensuring
more reliable detection of concurrent server startups.
---
 src/bin/pg_ctl/pg_ctl.c | 102 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 93 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 3ac2fcc004..9c0168b075 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -132,6 +132,7 @@ static void adjust_data_dir(void);
 
 #ifdef WIN32
 #include <versionhelpers.h>
+#include <tlhelp32.h>
 static bool pgwin32_IsInstalled(SC_HANDLE);
 static char *pgwin32_CommandLine(bool);
 static void pgwin32_doRegister(void);
@@ -142,6 +143,7 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static pid_t pgwin32_find_postmaster_pid(pid_t shell_pid);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
@@ -609,7 +611,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = pgwin32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,14 +625,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
@@ -1950,6 +1950,90 @@ GetPrivilegesToDelete(HANDLE hToken)
 
 	return tokenPrivs;
 }
+
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ *
+ * This function uses PID 0 as an invalid value, assuming the system idle
+ * process occupies it and it won't be a PID for a shell or postmaster.
+ */
+static pid_t
+pgwin32_find_postmaster_pid(pid_t shell_pid)
+{
+	HANDLE hSnapshot;
+	PROCESSENTRY32 ppe;
+	pid_t pm_pid = 0;			/* abusing 0 as an invalid value */
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed\n"),
+					 progname);
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: errcode=%08lx\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
+	/*
+	 * Iterate over the snapshot
+	 *
+	 * Check for duplicate processes to ensure reliability.
+	 *
+	 * The launcher shell might start other cmd.exe instances or programs
+	 * besides postgres.exe. Veryfying the program file name is essential.
+	 *
+	 * The launcher shell process isn't checked in this function.  It will be
+	 * checked by the caller.
+	 */
+	do
+	{
+		if (ppe.th32ParentProcessID == shell_pid &&
+			strcmp("postgres.exe", ppe.szExeFile) == 0)
+		{
+			if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: errcode=%08lx\n"),
+					 progname, last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: multiple postmasters found\n"),
+					 progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
 #endif							/* WIN32 */
 
 static void
-- 
2.39.3

v4-0003-Remove-short-sleep-from-001_start_stop.pl.patchtext/x-patch; charset=us-asciiDownload
From 77229eb99fbf67013a153828f8845db9550fccc5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 24 Oct 2023 14:47:24 +0900
Subject: [PATCH v4 3/3] Remove short sleep from 001_start_stop.pl

Previous commits ensures reliable detection whether the postmaster has really
started or not, so the short sleep in a test is not needed anymore. It was
originally introduced by 6bcce258.
---
 src/bin/pg_ctl/t/001_start_stop.pl | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index f019fe1703..590a82338f 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -46,11 +46,6 @@ my $ctlcmd = [
 ];
 command_like($ctlcmd, qr/done.*server started/s, 'pg_ctl start');
 
-# sleep here is because Windows builds can't check postmaster.pid exactly,
-# so they may mistake a pre-existing postmaster.pid for one created by the
-# postmaster they start.  Waiting more than the 2 seconds slop time allowed
-# by wait_for_postmaster() prevents that mistake.
-sleep 3 if ($windows_os);
 command_fails([ 'pg_ctl', 'start', '-D', "$tempdir/data" ],
 	'second pg_ctl start fails');
 command_ok([ 'pg_ctl', 'stop', '-D', "$tempdir/data" ], 'pg_ctl stop');
-- 
2.39.3

#17Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Kyotaro Horiguchi (#16)
RE: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Dear Horiguchi-san,

Thanks for updates! I was quite not sure the Windows env, but I can post comments.
(We need reviews by windows-friendly developers...)

Other error cases will fit to "shouldn't occur under normal
conditions" errors.

Formatting of messages for write_stderr() seem different from others. In v3,
I slightly modified for readability like below. I wanted to let you know just in case
because you did not say anything about these changes...

```
+	/* create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: could not create a snapshot: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: cound not retrieve information about the process: error code %lu\n"),
+					 progname, GetLastError());
+		exit(1);
+	}
+
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

#18Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#17)
3 attachment(s)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Tue, 24 Oct 2023 07:37:22 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in

Dear Horiguchi-san,

Thanks for updates! I was quite not sure the Windows env, but I can post comments.
(We need reviews by windows-friendly developers...)

Indeed, I haven't managed to successfully build using Meson on
Windows...

Formatting of messages for write_stderr() seem different from others. In v3,
I slightly modified for readability like below. I wanted to let you know just in case
because you did not say anything about these changes...

Ah. Sorry, I was lazy about the messages because I didn't regard this
to be at that stage yet.

In the attached, fixed the existing two messages, and adjusted one
message to display an error code, all in the consistent format.

Thanks!

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v5-0001-Disable-autoruns-for-cmd.exe-on-Windows.patchtext/x-patch; charset=us-asciiDownload
From 65013b5ccebd101f91be0e46b5209dc9cfcc7d5f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 24 Oct 2023 11:43:57 +0900
Subject: [PATCH v5 1/3] Disable autoruns for cmd.exe on Windows

On Windows, we use cmd.exe to launch the postmaster process for easy
redirection setup. However, cmd.exe may execute other programs at
startup due to autorun configurations. This patch adds /D flag to the
launcher cmd.exe command line to disable autorun settings written in
the registry.
---
 src/bin/pg_ctl/pg_ctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 807d7023a9..3ac2fcc004 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -553,11 +553,11 @@ start_postmaster(void)
 		else
 			close(fd);
 
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
 	}
 	else
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
+		cmd = psprintf("\"%s\" /D /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
 					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	if (!CreateRestrictedProcess(cmd, &pi, false))
-- 
2.39.3

v5-0002-Improve-pg_ctl-postmaster-process-check-on-Window.patchtext/x-patch; charset=us-asciiDownload
From 18c9a319b5160721d08cc1d3a3df770a63756f65 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 24 Oct 2023 14:46:50 +0900
Subject: [PATCH v5 2/3] Improve pg_ctl postmaster process check on Windows

Currently pg_ctl on Windows does not verify that it actually executed
a postmaster process due to lack of process ID knowledge. This can
lead to false positives in cases where another pg_ctl instance starts
a different server simultaneously.
This patch adds the capability to identify the process ID of the
launched postmaster on Windows, similar to other OS versions, ensuring
more reliable detection of concurrent server startups.
---
 src/bin/pg_ctl/pg_ctl.c | 102 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 93 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 3ac2fcc004..221049db0e 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -132,6 +132,7 @@ static void adjust_data_dir(void);
 
 #ifdef WIN32
 #include <versionhelpers.h>
+#include <tlhelp32.h>
 static bool pgwin32_IsInstalled(SC_HANDLE);
 static char *pgwin32_CommandLine(bool);
 static void pgwin32_doRegister(void);
@@ -142,6 +143,7 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static pid_t pgwin32_find_postmaster_pid(pid_t shell_pid);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
@@ -609,7 +611,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = pgwin32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -619,14 +625,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
@@ -1950,6 +1950,90 @@ GetPrivilegesToDelete(HANDLE hToken)
 
 	return tokenPrivs;
 }
+
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ *
+ * This function uses PID 0 as an invalid value, assuming the system idle
+ * process occupies it and it won't be a PID for a shell or postmaster.
+ */
+static pid_t
+pgwin32_find_postmaster_pid(pid_t shell_pid)
+{
+	HANDLE hSnapshot;
+	PROCESSENTRY32 ppe;
+	pid_t pm_pid = 0;			/* abusing 0 as an invalid value */
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		exit(1);
+	}
+
+	/*
+	 * Iterate over the snapshot
+	 *
+	 * Check for duplicate processes to ensure reliability.
+	 *
+	 * The launcher shell might start other cmd.exe instances or programs
+	 * besides postgres.exe. Veryfying the program file name is essential.
+	 *
+	 * The launcher shell process isn't checked in this function.  It will be
+	 * checked by the caller.
+	 */
+	do
+	{
+		if (ppe.th32ParentProcessID == shell_pid &&
+			strcmp("postgres.exe", ppe.szExeFile) == 0)
+		{
+			if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: error code %lu\n"),
+					 progname, (unsigned long) last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: multiple postmasters found\n"),
+					 progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
 #endif							/* WIN32 */
 
 static void
-- 
2.39.3

v5-0003-Remove-short-sleep-from-001_start_stop.pl.patchtext/x-patch; charset=us-asciiDownload
From 5ee4384af0edef313452ebd8babd628ddbb64dc9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 24 Oct 2023 14:47:24 +0900
Subject: [PATCH v5 3/3] Remove short sleep from 001_start_stop.pl

Previous commits ensures reliable detection whether the postmaster has really
started or not, so the short sleep in a test is not needed anymore. It was
originally introduced by 6bcce258.
---
 src/bin/pg_ctl/t/001_start_stop.pl | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index f019fe1703..590a82338f 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -46,11 +46,6 @@ my $ctlcmd = [
 ];
 command_like($ctlcmd, qr/done.*server started/s, 'pg_ctl start');
 
-# sleep here is because Windows builds can't check postmaster.pid exactly,
-# so they may mistake a pre-existing postmaster.pid for one created by the
-# postmaster they start.  Waiting more than the 2 seconds slop time allowed
-# by wait_for_postmaster() prevents that mistake.
-sleep 3 if ($windows_os);
 command_fails([ 'pg_ctl', 'start', '-D', "$tempdir/data" ],
 	'second pg_ctl start fails');
 command_ok([ 'pg_ctl', 'stop', '-D', "$tempdir/data" ], 'pg_ctl stop');
-- 
2.39.3

#19Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#18)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Tue, Oct 24, 2023 at 4:28 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

In the attached, fixed the existing two messages, and adjusted one
message to display an error code, all in the consistent format.

Hi,

I'm not a Windows expert, but my guess is that 0001 is a very good
idea. I hope someone who is a Windows expert will comment on that.

0002 seems problematic to me. One potential issue is that it would
break if someone renamed postgres.exe to something else -- although
that's probably not really a serious problem. A bigger issue is that
it seems like it would break if someone used pg_ctl to start several
instances in different data directories on the same machine. If I'm
understanding correctly, that currently mostly works, and this would
break it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#19)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote:

I'm not a Windows expert, but my guess is that 0001 is a very good
idea. I hope someone who is a Windows expert will comment on that.

I am +1 on 0001. It is just something we've never anticipated when
these wrappers around cmd in pg_ctl were written.

0002 seems problematic to me. One potential issue is that it would
break if someone renamed postgres.exe to something else -- although
that's probably not really a serious problem.

We do a find_other_exec_or_die() on "postgres" with what could be a
custom execution path. So we're actually sure that the binary will be
there in the start path, no? I don't like much the hardcoded
dependency to .exe here.

A bigger issue is that
it seems like it would break if someone used pg_ctl to start several
instances in different data directories on the same machine. If I'm
understanding correctly, that currently mostly works, and this would
break it.

Not having the guarantee that a single shell_pid is associated to a
single postgres.exe would be a problem.  Now the patch includes this
code:
+		if (ppe.th32ParentProcessID == shell_pid &&
+			strcmp("postgres.exe", ppe.szExeFile) == 0)
+		{
+			if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}

Which is basically giving this guarantee? multiple_children should
never happen once the autorun part is removed. Is that right?

+        * The launcher shell might start other cmd.exe instances or programs
+        * besides postgres.exe. Veryfying the program file name is essential.

With the autorun part of cmd.exe removed, what's still relevant here?
s/Veryfying/Verifying/.

Perhaps 0002 should make more efforts in documenting things like
th32ProcessID and th32ParentProcessID.
--
Michael

#21Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#20)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Tue, Jan 09, 2024 at 09:40:23AM +0900, Michael Paquier wrote:

On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote:

I'm not a Windows expert, but my guess is that 0001 is a very good
idea. I hope someone who is a Windows expert will comment on that.

I am +1 on 0001. It is just something we've never anticipated when
these wrappers around cmd in pg_ctl were written.

I have now applied 0001 for pg_ctl.

While reviewing that, I have also noticed spawn_process() in
pg_regress.c that includes direct command invocations with cmd.exe /c.
Could it make sense to append an extra /d for this case as well?
--
Michael

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#21)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Michael Paquier <michael@paquier.xyz> writes:

I have now applied 0001 for pg_ctl.

While reviewing that, I have also noticed spawn_process() in
pg_regress.c that includes direct command invocations with cmd.exe /c.
Could it make sense to append an extra /d for this case as well?

No Windows expert here, but it does seem like the same argument
applies.

regards, tom lane

#23Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#22)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Tue, Jan 09, 2024 at 09:40:12PM -0500, Tom Lane wrote:

No Windows expert here, but it does seem like the same argument
applies.

Yeah, I've applied the same restriction for pg_regress to avoid
similar problems as we spawn a postgres process in this case. I've
tested it and it was not causing issues in my own setup or the CI.

I am wondering if we'd better backpatch all that, TBH.
--
Michael

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#23)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Michael Paquier <michael@paquier.xyz> writes:

I am wondering if we'd better backpatch all that, TBH.

Seems like a good idea to me.

regards, tom lane

#25Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#20)
2 attachment(s)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Thanks for restarting this thread.

At Tue, 9 Jan 2024 09:40:23 +0900, Michael Paquier <michael@paquier.xyz> wrote in

On Fri, Jan 05, 2024 at 02:58:55PM -0500, Robert Haas wrote:

I'm not a Windows expert, but my guess is that 0001 is a very good
idea. I hope someone who is a Windows expert will comment on that.

I am +1 on 0001. It is just something we've never anticipated when
these wrappers around cmd in pg_ctl were written.

Thanks for committing it.

0002 seems problematic to me. One potential issue is that it would
break if someone renamed postgres.exe to something else -- although
that's probably not really a serious problem.

We do a find_other_exec_or_die() on "postgres" with what could be a
custom execution path. So we're actually sure that the binary will be
there in the start path, no? I don't like much the hardcoded
dependency to .exe here.

The patch doesn't care of the path for postgres.exe. If you are referring to the code you cited below, it's for another reason. I'll describe that there.

A bigger issue is that
it seems like it would break if someone used pg_ctl to start several
instances in different data directories on the same machine. If I'm
understanding correctly, that currently mostly works, and this would
break it.

Not having the guarantee that a single shell_pid is associated to a
single postgres.exe would be a problem.  Now the patch includes this
code:
+		if (ppe.th32ParentProcessID == shell_pid &&
+			strcmp("postgres.exe", ppe.szExeFile) == 0)
+		{
+			if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}

Which is basically giving this guarantee? multiple_children should
never happen once the autorun part is removed. Is that right?

The patch indeed ensures the relationship between the parent
pg_ctl.exe and postgres.exe. However, for some reason, in my Windows
11 environment with the /D option specified, I observed that another
cmd.exe is spawned as the second child process of the parent
cmd.exe. This is why there is a need to verify the executable file
name. I have no idea how the second cmd.exe is being spawned.

+        * The launcher shell might start other cmd.exe instances or programs
+        * besides postgres.exe. Veryfying the program file name is essential.

With the autorun part of cmd.exe removed, what's still relevant here?

Yes, if the strcmp() is commented out, multiple_children sometimes
becomes true..

s/Veryfying/Verifying/.

Oops!

Perhaps 0002 should make more efforts in documenting things like
th32ProcessID and th32ParentProcessID.

Is it correct to understand that you are requesting changes as follows?

--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1995,11 +1995,14 @@ pgwin32_find_postmaster_pid(pid_t shell_pid)
 	 *
 	 * Check for duplicate processes to ensure reliability.
 	 *
- 	 * The launcher shell might start other cmd.exe instances or programs
-	 * besides postgres.exe. Verifying the program file name is essential.
-	 *
-	 * The launcher shell process isn't checked in this function.  It will be
-	 * checked by the caller.
+	 * The ppe entry to be examined is identified by th32ParentProcessID, which
+	 * should correspond to the cmd.exe process that executes the postgres.exe
+	 * binary. Additionally, th32ProcessID in the same entry should be the PID
+	 * of the launched postgres.exe. However, even though we have launched the
+	 * parent cmd.exe with the /D option specified, it is sometimes observed
+	 * that another cmd.exe is launched for unknown reasons. Therefore, it is
+	 * crucial to verify the program file name to avoid returning the wrong
+	 * PID.
 	 */

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v6-0001-Improve-pg_ctl-postmaster-process-check-on-Window.patchtext/x-patch; charset=us-asciiDownload
From f9f2486f18502357fb76f2f1da00e19db40b2bc9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 24 Oct 2023 14:46:50 +0900
Subject: [PATCH v6 1/2] Improve pg_ctl postmaster process check on Windows

Currently pg_ctl on Windows does not verify that it actually executed
a postmaster process due to lack of process ID knowledge. This can
lead to false positives in cases where another pg_ctl instance starts
a different server simultaneously.
This patch adds the capability to identify the process ID of the
launched postmaster on Windows, similar to other OS versions, ensuring
more reliable detection of concurrent server startups.
---
 src/bin/pg_ctl/pg_ctl.c | 105 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 96 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 6900b27675..dec0afdb29 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -131,6 +131,7 @@ static void adjust_data_dir(void);
 
 #ifdef WIN32
 #include <versionhelpers.h>
+#include <tlhelp32.h>
 static bool pgwin32_IsInstalled(SC_HANDLE);
 static char *pgwin32_CommandLine(bool);
 static void pgwin32_doRegister(void);
@@ -141,6 +142,7 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static pid_t pgwin32_find_postmaster_pid(pid_t shell_pid);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
@@ -608,7 +610,11 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
-
+#ifndef WIN32
+			pid_t		wait_pid = pm_pid;
+#else
+			pid_t		wait_pid = pgwin32_find_postmaster_pid(pm_pid);
+#endif
 			/*
 			 * Make sanity checks.  If it's for the wrong PID, or the recorded
 			 * start time is before pg_ctl started, then either we are looking
@@ -618,14 +624,8 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atol(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == wait_pid)
 			{
 				/*
 				 * OK, seems to be a valid pidfile from our child.  Check the
@@ -1949,6 +1949,93 @@ GetPrivilegesToDelete(HANDLE hToken)
 
 	return tokenPrivs;
 }
+
+/*
+ * Find the PID of the launched postmaster.
+ *
+ * On Windows, the cmd.exe doesn't support the exec command. As a result, we
+ * don't directly get the postmaster's PID. This function identifies the PID of
+ * the postmaster started by the child cmd.exe.
+ *
+ * Returns the postmaster's PID. If the shell is alive but the postmaster is
+ * missing, returns 0. Otherwise terminates this command with an error.
+ *
+ * This function uses PID 0 as an invalid value, assuming the system idle
+ * process occupies it and it won't be a PID for a shell or postmaster.
+ */
+static pid_t
+pgwin32_find_postmaster_pid(pid_t shell_pid)
+{
+	HANDLE hSnapshot;
+	PROCESSENTRY32 ppe;
+	pid_t pm_pid = 0;			/* abusing 0 as an invalid value */
+	bool multiple_children = false;
+	DWORD last_error;
+
+	/* create a process snapshot */
+	hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
+	if (hSnapshot == INVALID_HANDLE_VALUE)
+	{
+		write_stderr(_("%s: CreateToolhelp32Snapshot failed: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		exit(1);
+	}
+
+	/* start iterating on the snapshot */
+	ppe.dwSize = sizeof(PROCESSENTRY32);
+	if (!Process32First(hSnapshot, &ppe))
+	{
+		write_stderr(_("%s: Process32First failed: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		exit(1);
+	}
+
+	/*
+	 * Iterate over the snapshot
+	 *
+	 * Check for duplicate processes to ensure reliability.
+	 *
+	 * The ppe entry to be examined is identified by th32ParentProcessID, which
+	 * should correspond to the cmd.exe process that executes the postgres.exe
+	 * binary. Additionally, th32ProcessID in the same entry should be the PID
+	 * of the launched postgres.exe. However, even though we have launched the
+	 * parent cmd.exe with the /D option specified, it is sometimes observed
+	 * that another cmd.exe is launched for unknown reasons. Therefore, it is
+	 * crucial to verify the program file name to avoid returning the wrong
+	 * PID.
+	 */
+	do
+	{
+		if (ppe.th32ParentProcessID == shell_pid &&
+			strcmp("postgres.exe", ppe.szExeFile) == 0)
+		{
+			if (pm_pid != ppe.th32ProcessID && pm_pid != 0)
+				multiple_children = true;
+			pm_pid = ppe.th32ProcessID;
+		}
+	}
+	while (Process32Next(hSnapshot, &ppe));
+
+	/* avoid multiple calls primary for clarity, not out of necessity */
+	last_error = GetLastError();
+	if (last_error != ERROR_NO_MORE_FILES)
+	{
+		write_stderr(_("%s: Process32Next failed: error code %lu\n"),
+					 progname, (unsigned long) last_error);
+		exit(1);
+	}
+	CloseHandle(hSnapshot);
+
+	/* assuming the launching shell executes a single process */
+	if (multiple_children)
+	{
+		write_stderr(_("%s: multiple postmasters found\n"),
+					 progname);
+		exit(1);
+	}
+
+	return pm_pid;
+}
 #endif							/* WIN32 */
 
 static void
-- 
2.39.3

v6-0002-Remove-short-sleep-from-001_start_stop.pl.patchtext/x-patch; charset=us-asciiDownload
From b453e1f2d1e2fefbee93c4573e6ffcf8897bf319 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 24 Oct 2023 14:47:24 +0900
Subject: [PATCH v6 2/2] Remove short sleep from 001_start_stop.pl

Previous commits ensures reliable detection whether the postmaster has really
started or not, so the short sleep in a test is not needed anymore. It was
originally introduced by 6bcce258.
---
 src/bin/pg_ctl/t/001_start_stop.pl | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index fd56bf7706..b50bd30a12 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -46,11 +46,6 @@ my $ctlcmd = [
 ];
 command_like($ctlcmd, qr/done.*server started/s, 'pg_ctl start');
 
-# sleep here is because Windows builds can't check postmaster.pid exactly,
-# so they may mistake a pre-existing postmaster.pid for one created by the
-# postmaster they start.  Waiting more than the 2 seconds slop time allowed
-# by wait_for_postmaster() prevents that mistake.
-sleep 3 if ($windows_os);
 command_fails([ 'pg_ctl', 'start', '-D', "$tempdir/data" ],
 	'second pg_ctl start fails');
 command_ok([ 'pg_ctl', 'stop', '-D', "$tempdir/data" ], 'pg_ctl stop');
-- 
2.39.3

#26Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#25)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Thu, Jan 11, 2024 at 3:33 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Is it correct to understand that you are requesting changes as follows?

--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1995,11 +1995,14 @@ pgwin32_find_postmaster_pid(pid_t shell_pid)
*
* Check for duplicate processes to ensure reliability.
*
-        * The launcher shell might start other cmd.exe instances or programs
-        * besides postgres.exe. Verifying the program file name is essential.
-        *
-        * The launcher shell process isn't checked in this function.  It will be
-        * checked by the caller.
+        * The ppe entry to be examined is identified by th32ParentProcessID, which
+        * should correspond to the cmd.exe process that executes the postgres.exe
+        * binary. Additionally, th32ProcessID in the same entry should be the PID
+        * of the launched postgres.exe. However, even though we have launched the
+        * parent cmd.exe with the /D option specified, it is sometimes observed
+        * that another cmd.exe is launched for unknown reasons. Therefore, it is
+        * crucial to verify the program file name to avoid returning the wrong
+        * PID.
*/

This kind of change looks massively helpful to me. I don't know if it
is exactly right or not, but it would have been a big help to me when
writing my previous review, so +1 for some change of this general
type.

--
Robert Haas
EDB: http://www.enterprisedb.com

#27Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#26)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Mon, Jan 15, 2024 at 01:34:46PM -0500, Robert Haas wrote:

This kind of change looks massively helpful to me. I don't know if it
is exactly right or not, but it would have been a big help to me when
writing my previous review, so +1 for some change of this general
type.

During a live review of this patch last week, as part of the Advanced
Patch Workshop of pgconf.dev, it has been mentioned by Tom that we may
be able to simplify the check on pmstart if the detection of the
postmaster PID started by pg_ctl is more stable using the WIN32
internals that this patch relies on. I am not sure that this
suggestion is right, though, because we should still care about the
clock skew case as written in the surrounding comments? Even if
that's OK, I would assume that this should be an independent patch,
written on top of the proposed v6-0001.

Tom, could you comment about that? Perhaps my notes did not catch
what you meant.
--
Michael

#28Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#27)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Tue, 4 Jun 2024 08:30:19 +0900, Michael Paquier <michael@paquier.xyz> wrote in

On Mon, Jan 15, 2024 at 01:34:46PM -0500, Robert Haas wrote:

This kind of change looks massively helpful to me. I don't know if it
is exactly right or not, but it would have been a big help to me when
writing my previous review, so +1 for some change of this general
type.

During a live review of this patch last week, as part of the Advanced
Patch Workshop of pgconf.dev, it has been mentioned by Tom that we may
be able to simplify the check on pmstart if the detection of the
postmaster PID started by pg_ctl is more stable using the WIN32
internals that this patch relies on. I am not sure that this
suggestion is right, though, because we should still care about the
clock skew case as written in the surrounding comments? Even if
that's OK, I would assume that this should be an independent patch,
written on top of the proposed v6-0001.

Tom, could you comment about that? Perhaps my notes did not catch
what you meant.

Thank you for the follow-up.

I have been thinking about this since then. At first, I thought it
referred to FindFirstChangeNotification() and friends, and inotify on
Linux. However, I haven't found a way to simplify the specified code
area using those APIs.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#29Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#28)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Thu, 06 Jun 2024 16:45:00 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

I have been thinking about this since then. At first, I thought it
referred to FindFirstChangeNotification() and friends, and inotify on
Linux. However, I haven't found a way to simplify the specified code
area using those APIs.

By the way, the need to shift by 2 seconds to tolerate clock skew
suggests that the current launcher-postmaster association mechanism is
somewhat unreliable. Couldn't we add a command line option to
postmaster to explicitly pass a unique identifier (say, pid itself) of
the launcher? If it is not specified, the number should be the PID of
the immediate parent process.

This change avoids the need for the special treatment for Windows.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#30Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#29)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

At Thu, 06 Jun 2024 17:15:15 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Thu, 06 Jun 2024 16:45:00 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

I have been thinking about this since then. At first, I thought it
referred to FindFirstChangeNotification() and friends, and inotify on
Linux. However, I haven't found a way to simplify the specified code
area using those APIs.

By the way, the need to shift by 2 seconds to tolerate clock skew
suggests that the current launcher-postmaster association mechanism is
somewhat unreliable. Couldn't we add a command line option to
postmaster to explicitly pass a unique identifier (say, pid itself) of
the launcher? If it is not specified, the number should be the PID of
the immediate parent process.

No. The combination of pg_ctl's pid and timestamp, to avoid false
matching during reboot.

This change avoids the need for the special treatment for Windows.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#31Andrew Dunstan
andrew@dunslane.net
In reply to: Kyotaro Horiguchi (#29)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On 2024-06-06 Th 04:15, Kyotaro Horiguchi wrote:

At Thu, 06 Jun 2024 16:45:00 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

I have been thinking about this since then. At first, I thought it
referred to FindFirstChangeNotification() and friends, and inotify on
Linux. However, I haven't found a way to simplify the specified code
area using those APIs.

By the way, the need to shift by 2 seconds to tolerate clock skew
suggests that the current launcher-postmaster association mechanism is
somewhat unreliable. Couldn't we add a command line option to
postmaster to explicitly pass a unique identifier (say, pid itself) of
the launcher? If it is not specified, the number should be the PID of
the immediate parent process.

This change avoids the need for the special treatment for Windows.

Looks good generally. I assume iterating over the process table to find
the right pid will be pretty quick.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#32Noah Misch
noah@leadboat.com
In reply to: Kyotaro Horiguchi (#30)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Thu, Jun 06, 2024 at 05:21:46PM +0900, Kyotaro Horiguchi wrote:

At Thu, 06 Jun 2024 17:15:15 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

By the way, the need to shift by 2 seconds to tolerate clock skew
suggests that the current launcher-postmaster association mechanism is
somewhat unreliable. Couldn't we add a command line option to
postmaster to explicitly pass a unique identifier (say, pid itself) of
the launcher? If it is not specified, the number should be the PID of
the immediate parent process.

No. The combination of pg_ctl's pid and timestamp, to avoid false
matching during reboot.

This change avoids the need for the special treatment for Windows.

Regarding your "unique identifier" idea, pg_ctl could generate an 8-byte
random value for the postmaster to write to postmaster.pid. That would be
enough for wait_for_postmaster_start() to ignore PIDs and achieve its mission
without OS-specific code.

Commits 9886744 and b83747a added /D to two %comspec% callers. I gather they
arose to make particular cmd.exe invocations have just one child. However,
/messages/by-id/20240111.173322.1809044112677090191.horikyota.ntt@gmail.com
reports multiple children remain possible. v17 is currently in a weird state
where most Windows subprocess invocation routes through pgwin32_system() and
does not add /D, while these two callers add /D. I suspect we should either
(1) prepend /D in pgwin32_system() and other %comspec% usage or (2) revert
prepending it in the callers from this thread's commits. While
"Software\Microsoft\Command Processor\AutoRun" is hard to use without breaking
things, it's not PostgreSQL's job to second-guess the user in that respect.
Hence, I lean toward (2). What do you think?

#33Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#32)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Sat, Jun 29, 2024 at 07:12:11PM -0700, Noah Misch wrote:

Commits 9886744 and b83747a added /D to two %comspec% callers. I gather they
arose to make particular cmd.exe invocations have just one child. However,
/messages/by-id/20240111.173322.1809044112677090191.horikyota.ntt@gmail.com
reports multiple children remain possible. v17 is currently in a weird state
where most Windows subprocess invocation routes through pgwin32_system() and
does not add /D, while these two callers add /D. I suspect we should either
(1) prepend /D in pgwin32_system() and other %comspec% usage or (2) revert
prepending it in the callers from this thread's commits. While
"Software\Microsoft\Command Processor\AutoRun" is hard to use without breaking
things, it's not PostgreSQL's job to second-guess the user in that respect.
Hence, I lean toward (2). What do you think?

Thanks for the ping.

As of this stage of the game for v17, I am going to agree with (2) to
remove this inconsistency rather than experiment with new things. We
could always study more in v18 what could be done with the /D switches
and the other patch, though that will unlikely be something I'll be
able to look at in the short term.
--
Michael

#34Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#33)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Fri, Jul 05, 2024 at 02:19:54PM +0900, Michael Paquier wrote:

As of this stage of the game for v17, I am going to agree with (2) to
remove this inconsistency rather than experiment with new things. We
could always study more in v18 what could be done with the /D switches
and the other patch, though that will unlikely be something I'll be
able to look at in the short term.

As I am not tempted to play the apprentice sorcerer on a stable
branch, just reverted these two things with 74b8e6a69802, and
backpatched the change down to 17.
--
Michael

#35Sutou Kouhei
kou@clear-code.com
In reply to: Michael Paquier (#27)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

Hi,

I'm reviewing patches in Commitfest 2024-07 from top to bottom:
https://commitfest.postgresql.org/48/

This is the 2st patch:
https://commitfest.postgresql.org/48/4573/

FYI: https://commitfest.postgresql.org/48/4681/ is my patch.

In <Zl5SC_kboY7i0AS8@paquier.xyz>
"Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows" on Tue, 4 Jun 2024 08:30:19 +0900,
Michael Paquier <michael@paquier.xyz> wrote:

During a live review of this patch last week, as part of the Advanced
Patch Workshop of pgconf.dev, it has been mentioned by Tom that we may
be able to simplify the check on pmstart if the detection of the
postmaster PID started by pg_ctl is more stable using the WIN32
internals that this patch relies on. I am not sure that this
suggestion is right, though, because we should still care about the
clock skew case as written in the surrounding comments? Even if
that's OK, I would assume that this should be an independent patch,
written on top of the proposed v6-0001.

I reviewed the latest patch set and I felt different
impression.

start_postmaster() on Windows uses cmd.exe for redirection
based on the comment in the function:

/*
* As with the Unix case, it's easiest to use the shell (CMD.EXE) to
* handle redirection etc. Unfortunately CMD.EXE lacks any equivalent of
* "exec", so we don't get to find out the postmaster's PID immediately.
*/

It seems that we can use redirection by CreateProcess()
family functions without cmd.exe based on the following
documentation:

https://learn.microsoft.com/en-us/windows/win32/procthread/creating-a-child-process-with-redirected-input-and-output

How about changing start_postmaster() for Windows to start
postgres.exe directly so that it returns the postgres.exe's
PID not cmd.exe's PID? If we can do it, we don't need
pgwin32_find_postmaster_pid() in the patch set.

Thanks,
--
kou

#36Yasir Shah
yasir.hussain.shah@gmail.com
In reply to: Sutou Kouhei (#35)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: not tested
Documentation: not tested

Hi,

I have verified following:
- Bug exits in PG17. I also checked it in PG16 but it does not exits there.
- After applying your patch, I can confirm that bug get fixed.
- no regression found. I ran "meson test".
- I would like to suggest you that #includes should be included at appropriate location keeping the #includes alphabetically sorted, what I observed in the PG code as a standard:
Your patch:
#include <versionhelpers.h>
#include <tlhelp32.h>

It should be like:
#include <tlhelp32.h>
#include <versionhelpers.h>

Regards...

Yasir Hussain
Bitnine Global Inc.

#37Yasir
yasir.hussain.shah@gmail.com
In reply to: Yasir Shah (#36)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Tue, Jul 16, 2024 at 4:58 PM Yasir Shah <yasir.hussain.shah@gmail.com>
wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed (meson test, passed)
Implements feature: tested, failed (tested, passed)
Spec compliant: not tested (tested, passed
with suggestion)
Documentation: not tested

Please ignore the above 4 lines in my review. See my comments in blue.

Show quoted text

Hi,

I have verified following:
- Bug exits in PG17. I also checked it in PG16 but it does not exits
there.
- After applying your patch, I can confirm that bug get fixed.
- no regression found. I ran "meson test".
- I would like to suggest you that #includes should be included at
appropriate location keeping the #includes alphabetically sorted, what I
observed in the PG code as a standard:
Your patch:
#include <versionhelpers.h>
#include <tlhelp32.h>

It should be like:
#include <tlhelp32.h>
#include <versionhelpers.h>

Regards...

Yasir Hussain
Bitnine Global Inc.

#38Robert Haas
robertmhaas@gmail.com
In reply to: Yasir (#37)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Tue, Jul 16, 2024 at 8:04 AM Yasir <yasir.hussain.shah@gmail.com> wrote:

Please ignore the above 4 lines in my review. See my comments in blue.

OK, so I think it's unclear what the next steps are for this patch.

1. On June 3rd, Michael Paquier said that Tom Lane proposed that,
after doing what the patch currently does, we could simplify some
other stuff. The details are unclear, and Tom hasn't commented.

2. On June 29th, Noah Misch proposed a platform-independent way of
solving the problem.

3. On July 12th, Sutou Kouhei proposed using CreateProcess() to start
the postmaster instead of cmd.exe.

4. On July 16th, Yasir Shah said that he tested the patch and found
that the problem only exists in v17, not any prior release, which is
contrary to my understanding of the situation. He also proposed a
minor tweak to the patch itself.

So, as I see it, we have three possible ways forward here. First, we
could stick with the current patch, possibly with further work as per
[1]: or adjustments as per [4]. Second, we could abandon the current approach and adopt Noah's proposal in [2]. Third, we could possibly abandon the current approach and adopt Sutou's proposal in [3]. I say "possibly" because I can't personally assess whether this approach is feasible.
approach and adopt Noah's proposal in [2]. Third, we could possibly
abandon the current approach and adopt Sutou's proposal in [3]. I say
"possibly" because I can't personally assess whether this approach is
feasible.

I have some bias toward thinking that real patches are better than
imaginary ones, and that we ought to therefore think about committing
Horiguchi-san's actual patch to fix the actual problem rather than
worrying much about other hypothetical things that we could do. On the
other hand, I'm also not volunteering, among other reasons because I
am not knowledgeable enough about Windows. And, certainly, there is
some appeal to a platform-independent approach. But I feel like we're
not doing ourselves any favors by letting this patch sit for (checks
thread) 10 months when according to multiple reviewers, it works.

--
Robert Haas
EDB: http://www.enterprisedb.com

#39vignesh C
vignesh21@gmail.com
In reply to: Robert Haas (#38)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On Sat, 20 Jul 2024 at 00:03, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jul 16, 2024 at 8:04 AM Yasir <yasir.hussain.shah@gmail.com> wrote:

Please ignore the above 4 lines in my review. See my comments in blue.

OK, so I think it's unclear what the next steps are for this patch.

1. On June 3rd, Michael Paquier said that Tom Lane proposed that,
after doing what the patch currently does, we could simplify some
other stuff. The details are unclear, and Tom hasn't commented.

2. On June 29th, Noah Misch proposed a platform-independent way of
solving the problem.

3. On July 12th, Sutou Kouhei proposed using CreateProcess() to start
the postmaster instead of cmd.exe.

4. On July 16th, Yasir Shah said that he tested the patch and found
that the problem only exists in v17, not any prior release, which is
contrary to my understanding of the situation. He also proposed a
minor tweak to the patch itself.

So, as I see it, we have three possible ways forward here. First, we
could stick with the current patch, possibly with further work as per
[1] or adjustments as per [4]. Second, we could abandon the current
approach and adopt Noah's proposal in [2]. Third, we could possibly
abandon the current approach and adopt Sutou's proposal in [3]. I say
"possibly" because I can't personally assess whether this approach is
feasible.

Thank you very much, Robert, for summarizing this. If anyone has
suggestions on which approach might work best, please share them to
help move this discussion forward.

Regards,
Vignesh

#40Bryan Green
dbryan.green@gmail.com
In reply to: vignesh C (#39)
1 attachment(s)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On 3/26/2025 3:18 AM, vignesh C wrote:

On Sat, 20 Jul 2024 at 00:03, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jul 16, 2024 at 8:04 AM Yasir <yasir.hussain.shah@gmail.com> wrote:

Please ignore the above 4 lines in my review. See my comments in blue.

OK, so I think it's unclear what the next steps are for this patch.

1. On June 3rd, Michael Paquier said that Tom Lane proposed that,
after doing what the patch currently does, we could simplify some
other stuff. The details are unclear, and Tom hasn't commented.

2. On June 29th, Noah Misch proposed a platform-independent way of
solving the problem.

3. On July 12th, Sutou Kouhei proposed using CreateProcess() to start
the postmaster instead of cmd.exe.

4. On July 16th, Yasir Shah said that he tested the patch and found
that the problem only exists in v17, not any prior release, which is
contrary to my understanding of the situation. He also proposed a
minor tweak to the patch itself.

So, as I see it, we have three possible ways forward here. First, we
could stick with the current patch, possibly with further work as per
[1] or adjustments as per [4]. Second, we could abandon the current
approach and adopt Noah's proposal in [2]. Third, we could possibly
abandon the current approach and adopt Sutou's proposal in [3]. I say
"possibly" because I can't personally assess whether this approach is
feasible.

Thank you very much, Robert, for summarizing this. If anyone has
suggestions on which approach might work best, please share them to
help move this discussion forward.

Regards,
Vignesh

Hello everyone,

I've spent the last few days implementing Sutou-san's CreateProcess()
proposal as a proof-of-concept. With all three approaches now available
for comparison, I wanted to share my assessment and recommendation.

The core issue, as we've established, is that cmd.exe intermediation
prevents pg_ctl from getting the real postgres.exe PID, creating race
conditions where we can't distinguish a newly-started postmaster from a
pre-existing one. This causes pg_ctl start to incorrectly report success
when attempting to start an already-running cluster.

Horiguchi-san's process tree walking patch works and is ready to ship.
It has real merit as a minimal-change solution. However, it keeps
cmd.exe and adds process enumeration complexity to work around that
decision. We're fixing symptoms rather than the root cause.

Noah's token proposal is architecturally elegant and
platform-independent. My concern is that it solves a Windows-specific
problem by adding complexity to all platforms. Making other platforms
adopt token-passing infrastructure for Windows's problems feels wrong.
It also requires postmaster changes for what is fundamentally a pg_ctl
issue, raising compatibility questions.

The CreateProcess() approach eliminates cmd.exe entirely and gives us
the actual postgres.exe PID directly. I've implemented this in the
attached patch (about 250 lines of Windows-specific code). The
implementation uses CreateProcessAsUser() with a restricted security
token to handle the case where pg_ctl runs with elevated privileges—this
drops Administrator group membership and dangerous privileges before
launching the postmaster, matching the behavior of the existing
CreateRestrictedProcess() function.

For I/O redirection, we use Windows's native handle-based APIs. When a
log file is specified, we open it with CreateFile() and pass it through
STARTUPINFO. When there's no log file (interactive use and postgres -C
queries), we inherit standard handles. One detail worth noting: we wait
2 seconds for no-log-file launches to distinguish postgres -C (which
exits immediately) from actual server startup. This is long enough to
catch quick exits even on loaded CI systems without impacting test suite
performance.

The implementation passes all regression tests on Windows, including CI
environments with elevated privileges. It handles all the problem
scenarios correctly: normal startup, detecting already-running clusters,
postgres -C queries, pg_upgrade with multiple instances, and concurrent
start attempts.

I prefer the CreateProcess() approach. It fixes the root cause, not the
symptoms. We get the real PID immediately with no process tree walking
or PID verification complexity. It's a Windows-specific solution to a
Windows-specific problem.

If the CreateProcess() approach is deemed unacceptable, I would support
Horiguchi-san's process tree walking patch as a pragmatic fallback. It
fixes the bug and is tested. However, I believe we'd be choosing a
workaround over a proper fix and adding maintenance burden we don't need.

I'm opposed to the token approach for the reasons stated above—it's
architecturally appealing but touches all platforms for a Windows-only
problem.

With this implementation complete, we now have all three options on the
table for evaluation. I believe the CreateProcess() approach is
technically sound and the right path forward, but I'm open to discussion
and happy to address any concerns about the implementation.

Regardless of which way we choose to go, I'll happily help review and
test the solution.

Best regards,
BG

Attachments:

0001-Fix-pg_ctl-on-Windows-to-reliably-detect-already-run.patchtext/plain; charset=UTF-8; name=0001-Fix-pg_ctl-on-Windows-to-reliably-detect-already-run.patchDownload
From 2f45e8dff815ad08118b68dcf435948f1d92b419 Mon Sep 17 00:00:00 2001
From: Bryan Green <dbryan.green@gmail.com>
Date: Thu, 23 Oct 2025 13:52:51 -0500
Subject: [PATCH] Fix pg_ctl on Windows to reliably detect already-running
 postmaster.

The previous implementation invoked postgres.exe via cmd.exe, which meant
pg_ctl got the PID of the shell wrapper rather than the actual postmaster
process.  This caused problems in wait_for_postmaster_start(), which tries
to verify that the PID in postmaster.pid matches the one we started.  The
mismatch led to a timing window where pg_ctl could incorrectly report
success when attempting to start an already-running cluster.

Fix by replacing the cmd.exe wrapper with a direct CreateProcess() call.
This gives us the real postgres.exe PID immediately, eliminating the
need for process tree walking or other heuristics to find the actual
postmaster.  We handle I/O redirection using Windows handle-based APIs
instead of shell syntax, which is more reliable anyway.
---
 src/bin/pg_ctl/pg_ctl.c | 528 ++++++++++++++++++++++++++++------------
 1 file changed, 373 insertions(+), 155 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 8a405ff122..6158fa3b99 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -72,6 +72,7 @@ typedef enum
 
 #define WAITS_PER_SEC	10		/* should divide USEC_PER_SEC evenly */
 
+static pid_t pm_pid = 0;
 static bool do_wait = true;
 static int	wait_seconds = DEFAULT_WAIT;
 static bool wait_seconds_arg = false;
@@ -99,7 +100,6 @@ static char version_file[MAXPGPATH];
 static char pid_file[MAXPGPATH];
 static char promote_file[MAXPGPATH];
 static char logrotate_file[MAXPGPATH];
-
 static volatile pid_t postmasterPID = -1;
 
 #ifdef WIN32
@@ -142,12 +142,14 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static void InheritStdHandles(STARTUPINFO *si);
+static HANDLE create_restricted_token(void);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
 static char **readfile(const char *path, int *numlines);
 static void free_readfile(char **optlines);
-static pid_t start_postmaster(void);
+static void start_postmaster(void);
 static void read_post_opts(void);
 
 static WaitPMResult wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint);
@@ -421,56 +423,321 @@ free_readfile(char **optlines)
 	free(optlines);
 }
 
+
 /*
- * start/test/stop routines
+ * start/test/stop routines  
  */
+#ifdef WIN32
+/*
+ * Helper function to drop privileges before launching postgres.
+ */
+static HANDLE
+create_restricted_token(void)
+{
+	HANDLE		origToken;
+	HANDLE		restrictedToken;
+	SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY};
+	SID_AND_ATTRIBUTES dropSids[2];
+	PTOKEN_PRIVILEGES delPrivs;
+	BOOL		b;
+
+	if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ALL_ACCESS, &origToken))
+	{
+		write_stderr(_("%s: could not open process token: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		return NULL;
+	}
+
+	ZeroMemory(&dropSids, sizeof(dropSids));
+	if (!AllocateAndInitializeSid(&NtAuthority, 2,
+								  SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_ADMINS, 0, 0, 0, 0, 0,
+								  0, &dropSids[0].Sid))
+	{
+		write_stderr(_("%s: could not allocate SIDs: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		CloseHandle(origToken);
+		return NULL;
+	}
+
+	delPrivs = GetPrivilegesToDelete(origToken);
+	if (delPrivs == NULL)
+	{
+		FreeSid(dropSids[1].Sid);
+		FreeSid(dropSids[0].Sid);
+		CloseHandle(origToken);
+		return NULL;
+	}
+
+	b = CreateRestrictedToken(origToken,
+							  0,
+							  sizeof(dropSids) / sizeof(dropSids[0]),
+							  dropSids,
+							  delPrivs->PrivilegeCount, delPrivs->Privileges,
+							  0, NULL,
+							  &restrictedToken);
+
+	free(delPrivs);
+	FreeSid(dropSids[1].Sid);
+	FreeSid(dropSids[0].Sid);
+	CloseHandle(origToken);
+
+	if (!b)
+	{
+		write_stderr(_("%s: could not create restricted token: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		return NULL;
+	}
+
+	return restrictedToken;
+}
+
+
+static pid_t
+start_postmaster_win32(void)
+{
+	HANDLE		hOutputFile = INVALID_HANDLE_VALUE;
+	HANDLE		hErrorFile = INVALID_HANDLE_VALUE;
+	HANDLE		hInputFile = INVALID_HANDLE_VALUE;
+	HANDLE		restrictedToken = NULL;
+	STARTUPINFO si;
+	PROCESS_INFORMATION pi;
+	char		cmd[MAXPGPATH * 2 + 256];
+	SECURITY_ATTRIBUTES sa;
+	DWORD		creation_flags;
+	DWORD		create_error;
+	BOOL		ret;
+	char	   *cmdline;
+	int			cmdlen;
+	BOOL		own_handles = FALSE;
+
+	cmdlen = snprintf(cmd, sizeof(cmd), "\"%s\"%s%s%s%s",
+					  exec_path,
+					  pgdata_opt ? " " : "",
+					  pgdata_opt ? pgdata_opt : "",
+					  post_opts ? " " : "",
+					  post_opts ? post_opts : "");
+
+	if (cmdlen >= sizeof(cmd))
+	{
+		write_stderr(_("%s: command line too long\n"), progname);
+		return 0;
+	}
+
+	cmdline = pg_strdup(cmd);
+
+	restrictedToken = create_restricted_token();
+	if (restrictedToken == NULL)
+	{
+		free(cmdline);
+		exit(1);
+	}
+
+	ZeroMemory(&si, sizeof(si));
+	si.cb = sizeof(si);
+
+	if (log_file != NULL)
+	{
+		ZeroMemory(&sa, sizeof(sa));
+		sa.nLength = sizeof(sa);
+		sa.bInheritHandle = TRUE;
+		sa.lpSecurityDescriptor = NULL;
+
+		hInputFile = CreateFile("NUL",
+								GENERIC_READ,
+								FILE_SHARE_READ | FILE_SHARE_WRITE,
+								&sa,
+								OPEN_EXISTING,
+								FILE_ATTRIBUTE_NORMAL,
+								NULL);
+
+		if (hInputFile == INVALID_HANDLE_VALUE)
+		{
+			write_stderr(_("%s: could not open NUL device: error code %lu\n"),
+						 progname, GetLastError());
+			CloseHandle(restrictedToken);
+			free(cmdline);
+			return 0;
+		}
+
+		hOutputFile = CreateFile(log_file,
+								 GENERIC_WRITE,
+								 FILE_SHARE_READ | FILE_SHARE_WRITE,
+								 &sa,
+								 OPEN_ALWAYS,
+								 FILE_ATTRIBUTE_NORMAL,
+								 NULL);
+
+		if (hOutputFile == INVALID_HANDLE_VALUE)
+		{
+			write_stderr(_("%s: could not open log file \"%s\": error code %lu\n"),
+						 progname, log_file, GetLastError());
+			CloseHandle(hInputFile);
+			CloseHandle(restrictedToken);
+			free(cmdline);
+			return 0;
+		}
+
+		if (SetFilePointer(hOutputFile, 0, NULL, FILE_END) == INVALID_SET_FILE_POINTER &&
+			GetLastError() != NO_ERROR)
+		{
+			write_stderr(_("%s: could not seek to end of log file: error code %lu\n"),
+						 progname, GetLastError());
+			CloseHandle(hOutputFile);
+			CloseHandle(hInputFile);
+			CloseHandle(restrictedToken);
+			free(cmdline);
+			return 0;
+		}
+
+		hErrorFile = hOutputFile;
+
+		si.dwFlags = STARTF_USESTDHANDLES;
+		si.hStdInput = hInputFile;
+		si.hStdOutput = hOutputFile;
+		si.hStdError = hErrorFile;
+
+		creation_flags = CREATE_NO_WINDOW | CREATE_NEW_PROCESS_GROUP | CREATE_SUSPENDED;
+
+		own_handles = TRUE;
+	}
+	else
+	{
+		InheritStdHandles(&si);
+		creation_flags = CREATE_SUSPENDED;
+	}
+
+	ZeroMemory(&pi, sizeof(pi));
+
+	ret = CreateProcessAsUser(restrictedToken,		/* restricted security token */
+							  NULL,					/* application name */
+							  cmdline,				/* command line */
+							  NULL,					/* process security attributes */
+							  NULL,					/* thread security attributes */
+							  TRUE,					/* inherit handles */
+							  creation_flags,		/* creation flags */
+							  NULL,					/* environment */
+							  pg_data,				/* current directory */
+							  &si,					/* startup info */
+							  &pi);					/* process info */
+
+	create_error = GetLastError();
+
+	if (own_handles)
+	{
+		if (hInputFile != INVALID_HANDLE_VALUE)
+			CloseHandle(hInputFile);
+		if (hOutputFile != INVALID_HANDLE_VALUE)
+			CloseHandle(hOutputFile);
+	}
+	
+	CloseHandle(restrictedToken);
+	free(cmdline);
+
+	if (!ret)
+	{
+		write_stderr(_("%s: could not start server: error code %lu\n"),
+					 progname, create_error);
+		return 0;
+	}
+
+	ResumeThread(pi.hThread);
+	CloseHandle(pi.hThread);
+	
+	/*
+	 * When there's no log file, wait briefly to see if the process exits
+	 * immediately. This distinguishes postgres -C queries (which print a
+	 * from actual server startup. Two seconds is long enough to catch 
+	 * quick exits even on slow CI systems, but still fast enough not to 
+	 * be annoying for normal server startup.
+	 */
+	if (log_file == NULL)
+	{
+		DWORD wait_result;
+		DWORD exit_code;
+		
+		wait_result = WaitForSingleObject(pi.hProcess, 2000);
+		
+		if (wait_result == WAIT_TIMEOUT)
+		{
+			/* Still running after 2 seconds - assume real server startup */
+			postmasterProcess = pi.hProcess;
+			return (pid_t) pi.dwProcessId;
+		}
+		else if (wait_result == WAIT_OBJECT_0)
+		{
+			/* Process exited quickly - server didn't stay running */
+			exit_code = 0;
+			GetExitCodeProcess(pi.hProcess, &exit_code);
+			CloseHandle(pi.hProcess);
+			write_stderr(_("%s: could not start server\n"), progname);
+			return 0;
+		}
+		else
+		{
+			/* Wait failed */
+			write_stderr(_("%s: error waiting for server: error code %lu\n"),
+						 progname, GetLastError());
+			CloseHandle(pi.hProcess);
+			return 0;
+		}
+	}
+
+	postmasterProcess = pi.hProcess;
+	return (pid_t) pi.dwProcessId;
+}
+#endif
+
 
 /*
- * Start the postmaster and return its PID.
+ * start_postmaster
+ *
+ * Wrapper around the platform-specific code to launch the postmaster.
  *
- * Currently, on Windows what we return is the PID of the shell process
- * that launched the postmaster (and, we trust, is waiting for it to exit).
- * So the PID is usable for "is the postmaster still running" checks,
- * but cannot be compared directly to postmaster.pid.
+ * On Unix, we fork and exec. There's no extra process layer;
+ * we get the postgres PID directly.
  *
- * On Windows, we also save aside a handle to the shell process in
- * "postmasterProcess", which the caller should close when done with it.
+ * On Windows, we use CreateProcess to launch postgres.exe directly, which
+ * gives us its PID immediately.
  */
-static pid_t
+static void
 start_postmaster(void)
 {
+#ifdef WIN32
+	pm_pid = start_postmaster_win32();
+	if (pm_pid == 0)
+		exit(1);
+#else
 	char	   *cmd;
+	int			cmdlen;
+	pid_t		fork_pid;
 
-#ifndef WIN32
-	pid_t		pm_pid;
-
-	/* Flush stdio channels just before fork, to avoid double-output problems */
+	/* Flush stdio to avoid double-output problems after fork */
 	fflush(NULL);
 
 #ifdef EXEC_BACKEND
 	pg_disable_aslr();
 #endif
 
-	pm_pid = fork();
-	if (pm_pid < 0)
+	fork_pid = fork();
+	if (fork_pid < 0)
 	{
-		/* fork failed */
 		write_stderr(_("%s: could not start server: %m\n"),
 					 progname);
 		exit(1);
 	}
-	if (pm_pid > 0)
+	if (fork_pid > 0)
 	{
-		/* fork succeeded, in parent */
-		return pm_pid;
+		/* Parent process */
+		pm_pid = fork_pid;
+		return;
 	}
 
-	/* fork succeeded, in child */
+	/* Child process */
 
 	/*
-	 * If possible, detach the postmaster process from the launching process
-	 * group and make it a group leader, so that it doesn't get signaled along
-	 * with the current group that launched it.
+	 * Detach from the parent's process group so we don't get signaled
+	 * along with it.  This is the equivalent of what CREATE_NEW_PROCESS_GROUP
+	 * does on Windows.
 	 */
 #ifdef HAVE_SETSID
 	if (setsid() < 0)
@@ -482,112 +749,72 @@ start_postmaster(void)
 #endif
 
 	/*
-	 * Since there might be quotes to handle here, it is easier simply to pass
-	 * everything to a shell to process them.  Use exec so that the postmaster
-	 * has the same PID as the current child process.
+	 * Build the shell command.  We use "exec" so the shell replaces itself
+	 * with postgres, rather than keeping the shell process around.
 	 */
 	if (log_file != NULL)
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
-					   exec_path, pgdata_opt, post_opts,
-					   DEVNULL, log_file);
+	{
+		cmdlen = snprintf(NULL, 0, "exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
+						  exec_path,
+						  pgdata_opt ? pgdata_opt : "",
+						  post_opts ? post_opts : "",
+						  DEVNULL, log_file);
+	}
 	else
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1",
-					   exec_path, pgdata_opt, post_opts, DEVNULL);
-
-	(void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
-
-	/* exec failed */
-	write_stderr(_("%s: could not start server: %m\n"),
-				 progname);
-	exit(1);
-
-	return 0;					/* keep dumb compilers quiet */
-
-#else							/* WIN32 */
-
-	/*
-	 * As with the Unix case, it's easiest to use the shell (CMD.EXE) to
-	 * handle redirection etc.  Unfortunately CMD.EXE lacks any equivalent of
-	 * "exec", so we don't get to find out the postmaster's PID immediately.
-	 */
-	PROCESS_INFORMATION pi;
-	const char *comspec;
+	{
+		cmdlen = snprintf(NULL, 0, "exec \"%s\" %s%s < \"%s\" 2>&1",
+						  exec_path,
+						  pgdata_opt ? pgdata_opt : "",
+						  post_opts ? post_opts : "",
+						  DEVNULL);
+	}
 
-	/* Find CMD.EXE location using COMSPEC, if it's set */
-	comspec = getenv("COMSPEC");
-	if (comspec == NULL)
-		comspec = "CMD";
+	cmd = pg_malloc(cmdlen + 1);
 
 	if (log_file != NULL)
 	{
-		/*
-		 * First, open the log file if it exists.  The idea is that if the
-		 * file is still locked by a previous postmaster run, we'll wait until
-		 * it comes free, instead of failing with ERROR_SHARING_VIOLATION.
-		 * (It'd be better to open the file in a sharing-friendly mode, but we
-		 * can't use CMD.EXE to do that, so work around it.  Note that the
-		 * previous postmaster will still have the file open for a short time
-		 * after removing postmaster.pid.)
-		 *
-		 * If the log file doesn't exist, we *must not* create it here.  If we
-		 * were launched with higher privileges than the restricted process
-		 * will have, the log file might end up with permissions settings that
-		 * prevent the postmaster from writing on it.
-		 */
-		int			fd = open(log_file, O_RDWR, 0);
-
-		if (fd == -1)
-		{
-			/*
-			 * ENOENT is expectable since we didn't use O_CREAT.  Otherwise
-			 * complain.  We could just fall through and let CMD.EXE report
-			 * the problem, but its error reporting is pretty miserable.
-			 */
-			if (errno != ENOENT)
-			{
-				write_stderr(_("%s: could not open log file \"%s\": %m\n"),
-							 progname, log_file);
-				exit(1);
-			}
-		}
-		else
-			close(fd);
-
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
-					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
+		snprintf(cmd, cmdlen + 1, "exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
+				 exec_path,
+				 pgdata_opt ? pgdata_opt : "",
+				 post_opts ? post_opts : "",
+				 DEVNULL, log_file);
 	}
 	else
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
-					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
-
-	if (!CreateRestrictedProcess(cmd, &pi, false))
 	{
-		write_stderr(_("%s: could not start server: error code %lu\n"),
-					 progname, (unsigned long) GetLastError());
-		exit(1);
+		snprintf(cmd, cmdlen + 1, "exec \"%s\" %s%s < \"%s\" 2>&1",
+				 exec_path,
+				 pgdata_opt ? pgdata_opt : "",
+				 post_opts ? post_opts : "",
+				 DEVNULL);
 	}
-	/* Don't close command process handle here; caller must do so */
-	postmasterProcess = pi.hProcess;
-	CloseHandle(pi.hThread);
-	return pi.dwProcessId;		/* Shell's PID, not postmaster's! */
+
+	(void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
+
+	/* exec failed */
+	write_stderr(_("%s: could not start server: %m\n"),
+				 progname);
+	exit(1);
+
 #endif							/* WIN32 */
 }
 
 
-
 /*
- * Wait for the postmaster to become ready.
+ * wait_for_postmaster_start
+ *
+ * Wait for the postmaster to finish starting up and become ready to
+ * accept connections.
  *
- * On Unix, pm_pid is the PID of the just-launched postmaster.  On Windows,
- * it may be the PID of an ancestor shell process, so we can't check the
- * contents of postmaster.pid quite as carefully.
+ * On entry, pm_pid should be the PID of the postmaster we just launched.
+ * We poll postmaster.pid to see when it's been created and whether the
+ * PID in it matches pm_pid.  Once we see a matching PID and the status
+ * line says "ready", we're done.
  *
- * On Windows, the static variable postmasterProcess is an implicit argument
- * to this routine; it contains a handle to the postmaster process or an
- * ancestor shell process thereof.
+ * On Windows, we also use the postmasterProcess handle (set by
+ * start_postmaster_win32) to detect if the process dies unexpectedly.
  *
- * Note that the checkpoint parameter enables a Windows service control
- * manager checkpoint, it's got nothing to do with database checkpoints!!
+ * Returns POSTMASTER_READY if all is well, POSTMASTER_STILL_STARTING if
+ * we ran out of time, or POSTMASTER_FAILED if the postmaster died.
  */
 static WaitPMResult
 wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
@@ -600,63 +827,48 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 		int			numlines;
 
 		/*
-		 * Try to read the postmaster.pid file.  If it's not valid, or if the
-		 * status line isn't there yet, just keep waiting.
+		 * Try to read postmaster.pid.  If it's not there yet, or doesn't
+		 * have enough lines, just move on to the next iteration.
 		 */
 		if ((optlines = readfile(pid_file, &numlines)) != NULL &&
 			numlines >= LOCK_FILE_LINE_PM_STATUS)
 		{
-			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
 
 			/*
-			 * Make sanity checks.  If it's for the wrong PID, or the recorded
-			 * start time is before pg_ctl started, then either we are looking
-			 * at the wrong data directory, or this is a pre-existing pidfile
-			 * that hasn't (yet?) been overwritten by our child postmaster.
-			 * Allow 2 seconds slop for possible cross-process clock skew.
+			 * Check that the PID and start time in the file match what we
+			 * expect.  A mismatch means either we're looking at the wrong
+			 * data directory, or this is a stale pidfile left over from a
+			 * previous postmaster.  We allow 2 seconds slop in the timestamp
+			 * comparison to handle clock skew between processes.
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atoll(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+			
+			if (pmstart >= start_time - 2 && pmpid == pm_pid)
 			{
 				/*
-				 * OK, seems to be a valid pidfile from our child.  Check the
-				 * status line (this assumes a v10 or later server).
+				 * It's the right pidfile.  Check whether the postmaster is
+				 * ready to accept connections.
 				 */
 				char	   *pmstatus = optlines[LOCK_FILE_LINE_PM_STATUS - 1];
 
 				if (strcmp(pmstatus, PM_STATUS_READY) == 0 ||
 					strcmp(pmstatus, PM_STATUS_STANDBY) == 0)
 				{
-					/* postmaster is done starting up */
 					free_readfile(optlines);
 					return POSTMASTER_READY;
 				}
 			}
+			free_readfile(optlines);
 		}
 
-		/*
-		 * Free the results of readfile.
-		 *
-		 * This is safe to call even if optlines is NULL.
-		 */
-		free_readfile(optlines);
+		
 
 		/*
-		 * Check whether the child postmaster process is still alive.  This
-		 * lets us exit early if the postmaster fails during startup.
-		 *
-		 * On Windows, we may be checking the postmaster's parent shell, but
-		 * that's fine for this purpose.
+		 * Check whether the postmaster process is still alive.  If it's
+		 * gone, there's no point in waiting any longer.
 		 */
 		{
 			bool		pm_died;
@@ -669,7 +881,10 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 #endif
 			if (pm_died)
 			{
-				/* See if postmaster terminated intentionally */
+				/*
+				 * Postmaster died.  Check if it was an intentional shutdown
+				 * during recovery (which is OK) or an actual failure.
+				 */
 				if (get_control_dbstate() == DB_SHUTDOWNED_IN_RECOVERY)
 					return POSTMASTER_SHUTDOWN_IN_RECOVERY;
 				else
@@ -677,19 +892,18 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			}
 		}
 
-		/* Startup still in process; wait, printing a dot once per second */
+		/* Still starting up.  Print a dot once per second for user feedback */
 		if (i % WAITS_PER_SEC == 0)
 		{
 #ifdef WIN32
+			/*
+			 * On Windows, if we're running as a service, increment the
+			 * checkpoint counter so the service control manager doesn't
+			 * think we've hung.  (This has nothing to do with database
+			 * checkpoints, it's a Windows service thing.)
+			 */
 			if (do_checkpoint)
 			{
-				/*
-				 * Increment the wait hint by 6 secs (connection timeout +
-				 * sleep).  We must do this to indicate to the SCM that our
-				 * startup time is changing, otherwise it'll usually send a
-				 * stop signal after 20 seconds, despite incrementing the
-				 * checkpoint counter.
-				 */
 				status.dwWaitHint += 6000;
 				status.dwCheckPoint++;
 				SetServiceStatus(hStatus, (LPSERVICE_STATUS) &status);
@@ -702,7 +916,7 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 		pg_usleep(USEC_PER_SEC / WAITS_PER_SEC);
 	}
 
-	/* out of patience; report that postmaster is still starting up */
+	/* Ran out of time */
 	return POSTMASTER_STILL_STARTING;
 }
 
@@ -931,8 +1145,7 @@ static void
 do_start(void)
 {
 	pid_t		old_pid = 0;
-	pid_t		pm_pid;
-
+	
 	if (ctl_command != RESTART_COMMAND)
 	{
 		old_pid = get_pgpid(false);
@@ -970,7 +1183,12 @@ do_start(void)
 	}
 #endif
 
-	pm_pid = start_postmaster();
+	start_postmaster();
+	if (pm_pid == 0)
+	{
+		write_stderr(_("%s: could not start server\n"), progname);
+		exit(1);
+	}
 
 	if (do_wait)
 	{
-- 
2.46.0.windows.1

#41Bryan Green
dbryan.green@gmail.com
In reply to: Bryan Green (#40)
1 attachment(s)
Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows

On 10/23/2025 11:02 PM, Bryan Green wrote:

On 3/26/2025 3:18 AM, vignesh C wrote:

On Sat, 20 Jul 2024 at 00:03, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jul 16, 2024 at 8:04 AM Yasir <yasir.hussain.shah@gmail.com>
wrote:

Please ignore the above 4 lines in my review. See my comments in blue.

OK, so I think it's unclear what the next steps are for this patch.

1. On June 3rd, Michael Paquier said that Tom Lane proposed that,
after doing what the patch currently does, we could simplify some
other stuff. The details are unclear, and Tom hasn't commented.

2. On June 29th, Noah Misch proposed a platform-independent way of
solving the problem.

3. On July 12th, Sutou Kouhei proposed using CreateProcess() to start
the postmaster instead of cmd.exe.

4. On July 16th, Yasir Shah said that he tested the patch and found
that the problem only exists in v17, not any prior release, which is
contrary to my understanding of the situation. He also proposed a
minor tweak to the patch itself.

So, as I see it, we have three possible ways forward here. First, we
could stick with the current patch, possibly with further work as per
[1] or adjustments as per [4]. Second, we could abandon the current
approach and adopt Noah's proposal in [2]. Third, we could possibly
abandon the current approach and adopt Sutou's proposal in [3]. I say
"possibly" because I can't personally assess whether this approach is
feasible.

Thank you very much, Robert, for summarizing this. If anyone has
suggestions on which approach might work best, please share them to
help move this discussion forward.

Regards,
Vignesh

Hello everyone,

I've spent the last few days implementing Sutou-san's CreateProcess()
proposal as a proof-of-concept. With all three approaches now available
for comparison, I wanted to share my assessment and recommendation.

The core issue, as we've established, is that cmd.exe intermediation
prevents pg_ctl from getting the real postgres.exe PID, creating race
conditions where we can't distinguish a newly-started postmaster from a
pre-existing one. This causes pg_ctl start to incorrectly report success
when attempting to start an already-running cluster.

Horiguchi-san's process tree walking patch works and is ready to ship.
It has real merit as a minimal-change solution. However, it keeps
cmd.exe and adds process enumeration complexity to work around that
decision. We're fixing symptoms rather than the root cause.

Noah's token proposal is architecturally elegant and platform-
independent. My concern is that it solves a Windows-specific problem by
adding complexity to all platforms. Making other platforms adopt token-
passing infrastructure for Windows's problems feels wrong. It also
requires postmaster changes for what is fundamentally a pg_ctl issue,
raising compatibility questions.

The CreateProcess() approach eliminates cmd.exe entirely and gives us
the actual postgres.exe PID directly. I've implemented this in the
attached patch (about 250 lines of Windows-specific code). The
implementation uses CreateProcessAsUser() with a restricted security
token to handle the case where pg_ctl runs with elevated privileges—this
drops Administrator group membership and dangerous privileges before
launching the postmaster, matching the behavior of the existing
CreateRestrictedProcess() function.

For I/O redirection, we use Windows's native handle-based APIs. When a
log file is specified, we open it with CreateFile() and pass it through
STARTUPINFO. When there's no log file (interactive use and postgres -C
queries), we inherit standard handles. One detail worth noting: we wait
2 seconds for no-log-file launches to distinguish postgres -C (which
exits immediately) from actual server startup. This is long enough to
catch quick exits even on loaded CI systems without impacting test suite
performance.

The implementation passes all regression tests on Windows, including CI
environments with elevated privileges. It handles all the problem
scenarios correctly: normal startup, detecting already-running clusters,
postgres -C queries, pg_upgrade with multiple instances, and concurrent
start attempts.

I prefer the CreateProcess() approach. It fixes the root cause, not the
symptoms. We get the real PID immediately with no process tree walking
or PID verification complexity. It's a Windows-specific solution to a
Windows-specific problem.

If the CreateProcess() approach is deemed unacceptable, I would support
Horiguchi-san's process tree walking patch as a pragmatic fallback. It
fixes the bug and is tested. However, I believe we'd be choosing a
workaround over a proper fix and adding maintenance burden we don't need.

I'm opposed to the token approach for the reasons stated above—it's
architecturally appealing but touches all platforms for a Windows-only
problem.

With this implementation complete, we now have all three options on the
table for evaluation. I believe the CreateProcess() approach is
technically sound and the right path forward, but I'm open to discussion
and happy to address any concerns about the implementation.

Regardless of which way we choose to go, I'll happily help review and
test the solution.

Best regards,
BG

I realized the patch had an issue after sending. Here is the updated
patch. Again, this patch is just for an example of Sutou-san's suggestion.

Attachments:

v2-0001-Fix-pg_ctl-on-Windows-to-reliably-detect-already-run.patchtext/plain; charset=UTF-8; name=v2-0001-Fix-pg_ctl-on-Windows-to-reliably-detect-already-run.patchDownload
From 12b661ec428edf8b93f722229cf18cd32559f746 Mon Sep 17 00:00:00 2001
From: Bryan Green <dbryan.green@gmail.com>
Date: Thu, 23 Oct 2025 13:52:51 -0500
Subject: [PATCH] Fix pg_ctl on Windows to reliably detect already-running
 postmaster.

The previous implementation invoked postgres.exe via cmd.exe, which meant
pg_ctl got the PID of the shell wrapper rather than the actual postmaster
process.  This caused problems in wait_for_postmaster_start(), which tries
to verify that the PID in postmaster.pid matches the one we started.  The
mismatch led to a timing window where pg_ctl could incorrectly report
success when attempting to start an already-running cluster.

Fix by replacing the cmd.exe wrapper with a direct CreateProcess() call.
This gives us the real postgres.exe PID immediately, eliminating the
need for process tree walking or other heuristics to find the actual
postmaster.  We handle I/O redirection using Windows handle-based APIs
instead of shell syntax, which is more reliable anyway.
---
 src/bin/pg_ctl/pg_ctl.c | 526 ++++++++++++++++++++++++++++------------
 1 file changed, 371 insertions(+), 155 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 8a405ff122..e0338ccc51 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -72,6 +72,7 @@ typedef enum
 
 #define WAITS_PER_SEC	10		/* should divide USEC_PER_SEC evenly */
 
+static pid_t pm_pid = 0;
 static bool do_wait = true;
 static int	wait_seconds = DEFAULT_WAIT;
 static bool wait_seconds_arg = false;
@@ -99,7 +100,6 @@ static char version_file[MAXPGPATH];
 static char pid_file[MAXPGPATH];
 static char promote_file[MAXPGPATH];
 static char logrotate_file[MAXPGPATH];
-
 static volatile pid_t postmasterPID = -1;
 
 #ifdef WIN32
@@ -142,12 +142,14 @@ static void WINAPI pgwin32_ServiceMain(DWORD, LPTSTR *);
 static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 static PTOKEN_PRIVILEGES GetPrivilegesToDelete(HANDLE hToken);
+static void InheritStdHandles(STARTUPINFO *si);
+static HANDLE create_restricted_token(void);
 #endif
 
 static pid_t get_pgpid(bool is_status_request);
 static char **readfile(const char *path, int *numlines);
 static void free_readfile(char **optlines);
-static pid_t start_postmaster(void);
+static void start_postmaster(void);
 static void read_post_opts(void);
 
 static WaitPMResult wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint);
@@ -421,56 +423,319 @@ free_readfile(char **optlines)
 	free(optlines);
 }
 
+
 /*
- * start/test/stop routines
+ * start/test/stop routines
  */
+#ifdef WIN32
+/*
+ * Helper function to drop privileges before launching postgres.
+ */
+static HANDLE
+create_restricted_token(void)
+{
+	HANDLE		origToken;
+	HANDLE		restrictedToken;
+	SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY};
+	SID_AND_ATTRIBUTES dropSid;
+	PTOKEN_PRIVILEGES delPrivs;
+	BOOL		b;
+
+	if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ALL_ACCESS, &origToken))
+	{
+		write_stderr(_("%s: could not open process token: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		return NULL;
+	}
+
+	ZeroMemory(&dropSid, sizeof(dropSid));
+	if (!AllocateAndInitializeSid(&NtAuthority, 2,
+								  SECURITY_BUILTIN_DOMAIN_RID, DOMAIN_ALIAS_RID_ADMINS, 0, 0, 0, 0, 0,
+								  0, &dropSid.Sid))
+	{
+		write_stderr(_("%s: could not allocate SIDs: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		CloseHandle(origToken);
+		return NULL;
+	}
+
+	delPrivs = GetPrivilegesToDelete(origToken);
+	if (delPrivs == NULL)
+	{
+		FreeSid(dropSid.Sid);
+		CloseHandle(origToken);
+		return NULL;
+	}
+
+	b = CreateRestrictedToken(origToken,
+							  0,
+							  1,
+							  &dropSid,
+							  delPrivs->PrivilegeCount, delPrivs->Privileges,
+							  0, NULL,
+							  &restrictedToken);
+
+	free(delPrivs);
+	FreeSid(dropSid.Sid);
+	CloseHandle(origToken);
+
+	if (!b)
+	{
+		write_stderr(_("%s: could not create restricted token: error code %lu\n"),
+					 progname, (unsigned long) GetLastError());
+		return NULL;
+	}
+
+	return restrictedToken;
+}
+
+
+static pid_t
+start_postmaster_win32(void)
+{
+	HANDLE		hOutputFile = INVALID_HANDLE_VALUE;
+	HANDLE		hErrorFile = INVALID_HANDLE_VALUE;
+	HANDLE		hInputFile = INVALID_HANDLE_VALUE;
+	HANDLE		restrictedToken = NULL;
+	STARTUPINFO si;
+	PROCESS_INFORMATION pi;
+	char		cmd[MAXPGPATH * 2 + 256];
+	SECURITY_ATTRIBUTES sa;
+	DWORD		creation_flags;
+	DWORD		create_error;
+	BOOL		ret;
+	char	   *cmdline;
+	int			cmdlen;
+	BOOL		own_handles = FALSE;
+
+	cmdlen = snprintf(cmd, sizeof(cmd), "\"%s\"%s%s%s%s",
+					  exec_path,
+					  pgdata_opt ? " " : "",
+					  pgdata_opt ? pgdata_opt : "",
+					  post_opts ? " " : "",
+					  post_opts ? post_opts : "");
+
+	if (cmdlen >= sizeof(cmd))
+	{
+		write_stderr(_("%s: command line too long\n"), progname);
+		return 0;
+	}
+
+	cmdline = pg_strdup(cmd);
+
+	restrictedToken = create_restricted_token();
+	if (restrictedToken == NULL)
+	{
+		free(cmdline);
+		exit(1);
+	}
+
+	ZeroMemory(&si, sizeof(si));
+	si.cb = sizeof(si);
+
+	if (log_file != NULL)
+	{
+		ZeroMemory(&sa, sizeof(sa));
+		sa.nLength = sizeof(sa);
+		sa.bInheritHandle = TRUE;
+		sa.lpSecurityDescriptor = NULL;
+
+		hInputFile = CreateFile("NUL",
+								GENERIC_READ,
+								FILE_SHARE_READ | FILE_SHARE_WRITE,
+								&sa,
+								OPEN_EXISTING,
+								FILE_ATTRIBUTE_NORMAL,
+								NULL);
+
+		if (hInputFile == INVALID_HANDLE_VALUE)
+		{
+			write_stderr(_("%s: could not open NUL device: error code %lu\n"),
+						 progname, GetLastError());
+			CloseHandle(restrictedToken);
+			free(cmdline);
+			return 0;
+		}
+
+		hOutputFile = CreateFile(log_file,
+								 GENERIC_WRITE,
+								 FILE_SHARE_READ | FILE_SHARE_WRITE,
+								 &sa,
+								 OPEN_ALWAYS,
+								 FILE_ATTRIBUTE_NORMAL,
+								 NULL);
+
+		if (hOutputFile == INVALID_HANDLE_VALUE)
+		{
+			write_stderr(_("%s: could not open log file \"%s\": error code %lu\n"),
+						 progname, log_file, GetLastError());
+			CloseHandle(hInputFile);
+			CloseHandle(restrictedToken);
+			free(cmdline);
+			return 0;
+		}
+
+		if (SetFilePointer(hOutputFile, 0, NULL, FILE_END) == INVALID_SET_FILE_POINTER &&
+			GetLastError() != NO_ERROR)
+		{
+			write_stderr(_("%s: could not seek to end of log file: error code %lu\n"),
+						 progname, GetLastError());
+			CloseHandle(hOutputFile);
+			CloseHandle(hInputFile);
+			CloseHandle(restrictedToken);
+			free(cmdline);
+			return 0;
+		}
+
+		hErrorFile = hOutputFile;
+
+		si.dwFlags = STARTF_USESTDHANDLES;
+		si.hStdInput = hInputFile;
+		si.hStdOutput = hOutputFile;
+		si.hStdError = hErrorFile;
+
+		creation_flags = CREATE_NO_WINDOW | CREATE_NEW_PROCESS_GROUP | CREATE_SUSPENDED;
+
+		own_handles = TRUE;
+	}
+	else
+	{
+		InheritStdHandles(&si);
+		creation_flags = CREATE_SUSPENDED;
+	}
+
+	ZeroMemory(&pi, sizeof(pi));
+
+	ret = CreateProcessAsUser(restrictedToken,		/* restricted security token */
+							  NULL,					/* application name */
+							  cmdline,				/* command line */
+							  NULL,					/* process security attributes */
+							  NULL,					/* thread security attributes */
+							  TRUE,					/* inherit handles */
+							  creation_flags,		/* creation flags */
+							  NULL,					/* environment */
+							  pg_data,				/* current directory */
+							  &si,					/* startup info */
+							  &pi);					/* process info */
+
+	create_error = GetLastError();
+
+	if (own_handles)
+	{
+		if (hInputFile != INVALID_HANDLE_VALUE)
+			CloseHandle(hInputFile);
+		if (hOutputFile != INVALID_HANDLE_VALUE)
+			CloseHandle(hOutputFile);
+	}
+
+	CloseHandle(restrictedToken);
+	free(cmdline);
+
+	if (!ret)
+	{
+		write_stderr(_("%s: could not start server: error code %lu\n"),
+					 progname, create_error);
+		return 0;
+	}
+
+	ResumeThread(pi.hThread);
+	CloseHandle(pi.hThread);
+
+	/*
+	 * When there's no log file, wait briefly to see if the process exits
+	 * immediately. This distinguishes postgres -C queries from actual
+	 * server startup. Two seconds is long enough to catch quick exits
+	 * even on slow CI systems, but still fast enough not to be annoying
+	 * for normal server startup.
+	 */
+	if (log_file == NULL)
+	{
+		DWORD wait_result;
+		DWORD exit_code;
+
+		wait_result = WaitForSingleObject(pi.hProcess, 2000);
+
+		if (wait_result == WAIT_TIMEOUT)
+		{
+			/* Still running after 2 seconds - assume real server startup */
+			postmasterProcess = pi.hProcess;
+			return (pid_t) pi.dwProcessId;
+		}
+		else if (wait_result == WAIT_OBJECT_0)
+		{
+			/* Process exited quickly - server didn't stay running */
+			exit_code = 0;
+			GetExitCodeProcess(pi.hProcess, &exit_code);
+			CloseHandle(pi.hProcess);
+			write_stderr(_("%s: could not start server\n"), progname);
+			return 0;
+		}
+		else
+		{
+			/* Wait failed */
+			write_stderr(_("%s: error waiting for server: error code %lu\n"),
+						 progname, GetLastError());
+			CloseHandle(pi.hProcess);
+			return 0;
+		}
+	}
+
+	postmasterProcess = pi.hProcess;
+	return (pid_t) pi.dwProcessId;
+}
+#endif
+

 /*
- * Start the postmaster and return its PID.
+ * start_postmaster
+ *
+ * Wrapper around the platform-specific code to launch the postmaster.
  *
- * Currently, on Windows what we return is the PID of the shell process
- * that launched the postmaster (and, we trust, is waiting for it to exit).
- * So the PID is usable for "is the postmaster still running" checks,
- * but cannot be compared directly to postmaster.pid.
+ * On Unix, we fork and exec. There's no extra process layer;
+ * we get the postgres PID directly.
  *
- * On Windows, we also save aside a handle to the shell process in
- * "postmasterProcess", which the caller should close when done with it.
+ * On Windows, we use CreateProcess to launch postgres.exe directly, which
+ * gives us its PID immediately.
  */
-static pid_t
+static void
 start_postmaster(void)
 {
+#ifdef WIN32
+	pm_pid = start_postmaster_win32();
+	if (pm_pid == 0)
+		exit(1);
+#else
 	char	   *cmd;
+	int			cmdlen;
+	pid_t		fork_pid;
 
-#ifndef WIN32
-	pid_t		pm_pid;
-
-	/* Flush stdio channels just before fork, to avoid double-output problems */
+	/* Flush stdio to avoid double-output problems after fork */
 	fflush(NULL);
 
 #ifdef EXEC_BACKEND
 	pg_disable_aslr();
 #endif
 
-	pm_pid = fork();
-	if (pm_pid < 0)
+	fork_pid = fork();
+	if (fork_pid < 0)
 	{
-		/* fork failed */
 		write_stderr(_("%s: could not start server: %m\n"),
 					 progname);
 		exit(1);
 	}
-	if (pm_pid > 0)
+	if (fork_pid > 0)
 	{
-		/* fork succeeded, in parent */
-		return pm_pid;
+		/* Parent process */
+		pm_pid = fork_pid;
+		return;
 	}
 
-	/* fork succeeded, in child */
+	/* Child process */
 
 	/*
-	 * If possible, detach the postmaster process from the launching process
-	 * group and make it a group leader, so that it doesn't get signaled along
-	 * with the current group that launched it.
+	 * Detach from the parent's process group so we don't get signaled
+	 * along with it.  This is the equivalent of what CREATE_NEW_PROCESS_GROUP
+	 * does on Windows.
 	 */
 #ifdef HAVE_SETSID
 	if (setsid() < 0)
@@ -482,112 +747,72 @@ start_postmaster(void)
 #endif
 
 	/*
-	 * Since there might be quotes to handle here, it is easier simply to pass
-	 * everything to a shell to process them.  Use exec so that the postmaster
-	 * has the same PID as the current child process.
+	 * Build the shell command.  We use "exec" so the shell replaces itself
+	 * with postgres, rather than keeping the shell process around.
 	 */
 	if (log_file != NULL)
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
-					   exec_path, pgdata_opt, post_opts,
-					   DEVNULL, log_file);
+	{
+		cmdlen = snprintf(NULL, 0, "exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
+						  exec_path,
+						  pgdata_opt ? pgdata_opt : "",
+						  post_opts ? post_opts : "",
+						  DEVNULL, log_file);
+	}
 	else
-		cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1",
-					   exec_path, pgdata_opt, post_opts, DEVNULL);
-
-	(void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
-
-	/* exec failed */
-	write_stderr(_("%s: could not start server: %m\n"),
-				 progname);
-	exit(1);
-
-	return 0;					/* keep dumb compilers quiet */
-
-#else							/* WIN32 */
-
-	/*
-	 * As with the Unix case, it's easiest to use the shell (CMD.EXE) to
-	 * handle redirection etc.  Unfortunately CMD.EXE lacks any equivalent of
-	 * "exec", so we don't get to find out the postmaster's PID immediately.
-	 */
-	PROCESS_INFORMATION pi;
-	const char *comspec;
+	{
+		cmdlen = snprintf(NULL, 0, "exec \"%s\" %s%s < \"%s\" 2>&1",
+						  exec_path,
+						  pgdata_opt ? pgdata_opt : "",
+						  post_opts ? post_opts : "",
+						  DEVNULL);
+	}
 
-	/* Find CMD.EXE location using COMSPEC, if it's set */
-	comspec = getenv("COMSPEC");
-	if (comspec == NULL)
-		comspec = "CMD";
+	cmd = pg_malloc(cmdlen + 1);
 
 	if (log_file != NULL)
 	{
-		/*
-		 * First, open the log file if it exists.  The idea is that if the
-		 * file is still locked by a previous postmaster run, we'll wait until
-		 * it comes free, instead of failing with ERROR_SHARING_VIOLATION.
-		 * (It'd be better to open the file in a sharing-friendly mode, but we
-		 * can't use CMD.EXE to do that, so work around it.  Note that the
-		 * previous postmaster will still have the file open for a short time
-		 * after removing postmaster.pid.)
-		 *
-		 * If the log file doesn't exist, we *must not* create it here.  If we
-		 * were launched with higher privileges than the restricted process
-		 * will have, the log file might end up with permissions settings that
-		 * prevent the postmaster from writing on it.
-		 */
-		int			fd = open(log_file, O_RDWR, 0);
-
-		if (fd == -1)
-		{
-			/*
-			 * ENOENT is expectable since we didn't use O_CREAT.  Otherwise
-			 * complain.  We could just fall through and let CMD.EXE report
-			 * the problem, but its error reporting is pretty miserable.
-			 */
-			if (errno != ENOENT)
-			{
-				write_stderr(_("%s: could not open log file \"%s\": %m\n"),
-							 progname, log_file);
-				exit(1);
-			}
-		}
-		else
-			close(fd);
-
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
-					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
+		snprintf(cmd, cmdlen + 1, "exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
+				 exec_path,
+				 pgdata_opt ? pgdata_opt : "",
+				 post_opts ? post_opts : "",
+				 DEVNULL, log_file);
 	}
 	else
-		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
-					   comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
-
-	if (!CreateRestrictedProcess(cmd, &pi, false))
 	{
-		write_stderr(_("%s: could not start server: error code %lu\n"),
-					 progname, (unsigned long) GetLastError());
-		exit(1);
+		snprintf(cmd, cmdlen + 1, "exec \"%s\" %s%s < \"%s\" 2>&1",
+				 exec_path,
+				 pgdata_opt ? pgdata_opt : "",
+				 post_opts ? post_opts : "",
+				 DEVNULL);
 	}
-	/* Don't close command process handle here; caller must do so */
-	postmasterProcess = pi.hProcess;
-	CloseHandle(pi.hThread);
-	return pi.dwProcessId;		/* Shell's PID, not postmaster's! */
+
+	(void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
+
+	/* exec failed */
+	write_stderr(_("%s: could not start server: %m\n"),
+				 progname);
+	exit(1);
+
 #endif							/* WIN32 */
 }
 
 
-
 /*
- * Wait for the postmaster to become ready.
+ * wait_for_postmaster_start
+ *
+ * Wait for the postmaster to finish starting up and become ready to
+ * accept connections.
  *
- * On Unix, pm_pid is the PID of the just-launched postmaster.  On Windows,
- * it may be the PID of an ancestor shell process, so we can't check the
- * contents of postmaster.pid quite as carefully.
+ * On entry, pm_pid should be the PID of the postmaster we just launched.
+ * We poll postmaster.pid to see when it's been created and whether the
+ * PID in it matches pm_pid.  Once we see a matching PID and the status
+ * line says "ready", we're done.
  *
- * On Windows, the static variable postmasterProcess is an implicit argument
- * to this routine; it contains a handle to the postmaster process or an
- * ancestor shell process thereof.
+ * On Windows, we also use the postmasterProcess handle (set by
+ * start_postmaster_win32) to detect if the process dies unexpectedly.
  *
- * Note that the checkpoint parameter enables a Windows service control
- * manager checkpoint, it's got nothing to do with database checkpoints!!
+ * Returns POSTMASTER_READY if all is well, POSTMASTER_STILL_STARTING if
+ * we ran out of time, or POSTMASTER_FAILED if the postmaster died.
  */
 static WaitPMResult
 wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
@@ -600,63 +825,48 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 		int			numlines;
 
 		/*
-		 * Try to read the postmaster.pid file.  If it's not valid, or if the
-		 * status line isn't there yet, just keep waiting.
+		 * Try to read postmaster.pid.  If it's not there yet, or doesn't
+		 * have enough lines, just move on to the next iteration.
 		 */
 		if ((optlines = readfile(pid_file, &numlines)) != NULL &&
 			numlines >= LOCK_FILE_LINE_PM_STATUS)
 		{
-			/* File is complete enough for us, parse it */
 			pid_t		pmpid;
 			time_t		pmstart;
 
 			/*
-			 * Make sanity checks.  If it's for the wrong PID, or the recorded
-			 * start time is before pg_ctl started, then either we are looking
-			 * at the wrong data directory, or this is a pre-existing pidfile
-			 * that hasn't (yet?) been overwritten by our child postmaster.
-			 * Allow 2 seconds slop for possible cross-process clock skew.
+			 * Check that the PID and start time in the file match what we
+			 * expect.  A mismatch means either we're looking at the wrong
+			 * data directory, or this is a stale pidfile left over from a
+			 * previous postmaster.  We allow 2 seconds slop in the timestamp
+			 * comparison to handle clock skew between processes.
 			 */
 			pmpid = atol(optlines[LOCK_FILE_LINE_PID - 1]);
 			pmstart = atoll(optlines[LOCK_FILE_LINE_START_TIME - 1]);
-			if (pmstart >= start_time - 2 &&
-#ifndef WIN32
-				pmpid == pm_pid
-#else
-			/* Windows can only reject standalone-backend PIDs */
-				pmpid > 0
-#endif
-				)
+
+			if (pmstart >= start_time - 2 && pmpid == pm_pid)
 			{
 				/*
-				 * OK, seems to be a valid pidfile from our child.  Check the
-				 * status line (this assumes a v10 or later server).
+				 * It's the right pidfile.  Check whether the postmaster is
+				 * ready to accept connections.
 				 */
 				char	   *pmstatus = optlines[LOCK_FILE_LINE_PM_STATUS - 1];
 
 				if (strcmp(pmstatus, PM_STATUS_READY) == 0 ||
 					strcmp(pmstatus, PM_STATUS_STANDBY) == 0)
 				{
-					/* postmaster is done starting up */
 					free_readfile(optlines);
 					return POSTMASTER_READY;
 				}
 			}
+			free_readfile(optlines);
 		}
 
-		/*
-		 * Free the results of readfile.
-		 *
-		 * This is safe to call even if optlines is NULL.
-		 */
-		free_readfile(optlines);
+
 
 		/*
-		 * Check whether the child postmaster process is still alive.  This
-		 * lets us exit early if the postmaster fails during startup.
-		 *
-		 * On Windows, we may be checking the postmaster's parent shell, but
-		 * that's fine for this purpose.
+		 * Check whether the postmaster process is still alive.  If it's
+		 * gone, there's no point in waiting any longer.
 		 */
 		{
 			bool		pm_died;
@@ -669,7 +879,10 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 #endif
 			if (pm_died)
 			{
-				/* See if postmaster terminated intentionally */
+				/*
+				 * Postmaster died.  Check if it was an intentional shutdown
+				 * during recovery (which is OK) or an actual failure.
+				 */
 				if (get_control_dbstate() == DB_SHUTDOWNED_IN_RECOVERY)
 					return POSTMASTER_SHUTDOWN_IN_RECOVERY;
 				else
@@ -677,19 +890,18 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 			}
 		}
 
-		/* Startup still in process; wait, printing a dot once per second */
+		/* Still starting up.  Print a dot once per second for user feedback */
 		if (i % WAITS_PER_SEC == 0)
 		{
 #ifdef WIN32
+			/*
+			 * On Windows, if we're running as a service, increment the
+			 * checkpoint counter so the service control manager doesn't
+			 * think we've hung.  (This has nothing to do with database
+			 * checkpoints, it's a Windows service thing.)
+			 */
 			if (do_checkpoint)
 			{
-				/*
-				 * Increment the wait hint by 6 secs (connection timeout +
-				 * sleep).  We must do this to indicate to the SCM that our
-				 * startup time is changing, otherwise it'll usually send a
-				 * stop signal after 20 seconds, despite incrementing the
-				 * checkpoint counter.
-				 */
 				status.dwWaitHint += 6000;
 				status.dwCheckPoint++;
 				SetServiceStatus(hStatus, (LPSERVICE_STATUS) &status);
@@ -702,7 +914,7 @@ wait_for_postmaster_start(pid_t pm_pid, bool do_checkpoint)
 		pg_usleep(USEC_PER_SEC / WAITS_PER_SEC);
 	}
 
-	/* out of patience; report that postmaster is still starting up */
+	/* Ran out of time */
 	return POSTMASTER_STILL_STARTING;
 }
 
@@ -931,8 +1143,7 @@ static void
 do_start(void)
 {
 	pid_t		old_pid = 0;
-	pid_t		pm_pid;
-
+
 	if (ctl_command != RESTART_COMMAND)
 	{
 		old_pid = get_pgpid(false);
@@ -970,7 +1181,12 @@ do_start(void)
 	}
 #endif
 
-	pm_pid = start_postmaster();
+	start_postmaster();
+	if (pm_pid == 0)
+	{
+		write_stderr(_("%s: could not start server\n"), progname);
+		exit(1);
+	}
 
 	if (do_wait)
 	{
-- 
2.46.0.windows.1