pg_regress: lookup shellprog in $PATH

Started by Gurjeet Singhover 3 years ago10 messages
#1Gurjeet Singh
gurjeet@singh.im
1 attachment(s)

I'm trying to build Postgres using the Nix language and the Nix package
manager on macOS (see [1]https://github.com/DrPostgres/HaathiFarm/blob/master/default.nix). After some work I was able to build, and even
run Postgres. But `make check` failed with the error

pg_regress: could not exec "sh": No such file or directory

The reason is that pg_regress uses execl() function to execute the shell
recorded in its shellprog variable, whose value is provided using the SHELL
variable via the makefile. Specifically, this error is because execl()
function expects the full path of the executable as the first parameter,
and it does _not_ perform a lookup in $PATH.

Using execl() in pg_regress has worked in the past, because the default
value of SHELL used by GNUmake (and I presume other make implementations)
is /bin/sh, and /bin/sh expected to be present on any Unix-like system.

But in Nixpkgs (the default and the central package repository of Nix/NixOS
distro), they have chosen to patch GNU make, and turn the default value of
SHELL from '/bin/sh' to just 'sh'; see their patch [2]https://github.com/NixOS/nixpkgs/blob/release-22.05/pkgs/development/tools/build-managers/gnumake/0001-No-impure-bin-sh.patch. They did this
because Nixpkgs consider any files outside the Nix Store (the path
/nix/store/, by default) to be "impure". They want the packagers to use
$PATH (consisting solely of paths that begin with /nix/store/...), to
lookup their binaries and other files.

So when pg_regress tries to run a program (the postmaster, in this case),
the execl() function complains that it could not find 'sh', since there's
no file ./sh in the directory where pg_regress is being run.

Please see attached the one-letter patch that fixes this problem. I have
chosen to replace the execl() call with execlp(), which performs a lookup
in $PATH, and finds the 'sh' to use for running the postmaster. This patch
does _not_ cause 'make check' or any other failures when Postgres is built
with non-Nix build tools available on macOS.

There is one other use of execl(), in pg_ctl.c, but that is safe from the
behaviour introduced by Nixpkgs, since that call site uses the absolute
path /bin/sh, and hence there's no ambiguity in where to look for the
executable.

There are no previous uses of execlp() in Postgres, which made me rethink
this patch. But I think it's safe to use execlp() since it's part of POSIX,
and it's also supported by Windows (see [3]https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/execlp?view=msvc-170; they say the function name is
"deprecated" but function is "supported" in the same paragraph!!).

There's one mention of execl in src/pl/plperl/ppport.h, and since it's a
generated file, I believe now execlp also needs to be added to that list.
But I'm not sure how to generate that file, or if it really needs to be
generated and included in this patch; maybe the file is re-generated during
a release process. Need advice on that.

GNU make's docs clearly explain (see [4]https://www.gnu.org/software/make/manual/html_node/Choosing-the-Shell.html) the special handling of variable
SHELL, and it seems impossible to pass this variable from an env variable
down to the GNUmakefile of interest. The only alternative I see for us
being able to pass a custom value via SHELL, is to detect and declare the
SHELL variable in one of our higher-level files; and I don't think that'd
be a good idea.

We could propose to Nixpkgs community that they stop patching make, and
leave the default SHELL value alone. But I see very low likelihood of them
accepting our arguments, or changing their ways.

It took many days of debugging, troubleshooting etc, to get to this
root-cause. I first tried to coax autoconf, make, etc. to pass my custom
SHELL through to pg_regress' makefile. Changing CONFIG_SHELL, or SHELL does
not have any impact. Then I had to read the Nixpkgs code, and work out the
archaic ways the packages are defined, and after much code wrangling I was
able to find out that _they_ changed the default value of SHELL by patching
the make sources.

The Nix language is not so bad, but the way it's used to write code in the
Nix community leaves a lot to be desired; ad-hoc environment variable
naming, polluting the built environment with all kinds of variables, almost
non-existent comments, no standards on indentation, etc. These reasons made
me decide to use the plain Nix language as much as possible, and not rely
on Nixpkgs, whenever I can avoid it.

The Nixpkgs and NixOS distro includes all the supported versions of
Postgres, so one would assume they would've also encountered, and solved,
this problem. But they didn't. My best guess as to why, is, I believe they
never bothered to run `make check` on their built binaries.

[1]: https://github.com/DrPostgres/HaathiFarm/blob/master/default.nix
[2]: https://github.com/NixOS/nixpkgs/blob/release-22.05/pkgs/development/tools/build-managers/gnumake/0001-No-impure-bin-sh.patch
https://github.com/NixOS/nixpkgs/blob/release-22.05/pkgs/development/tools/build-managers/gnumake/0001-No-impure-bin-sh.patch
[3]: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/execlp?view=msvc-170
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/execlp?view=msvc-170
[4]: https://www.gnu.org/software/make/manual/html_node/Choosing-the-Shell.html
https://www.gnu.org/software/make/manual/html_node/Choosing-the-Shell.html

Best regards,
Gurjeet
http://Gurje.et

Attachments:

pg_regress_use_execlp_to_lookup_shellprog.patchapplication/octet-stream; name=pg_regress_use_execlp_to_lookup_shellprog.patchDownload
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 9ca1a8d906..e2e88f621f 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1095,3 +1095,3 @@ spawn_process(const char *cmdline)
 		cmdline2 = psprintf("exec %s", cmdline);
-		execl(shellprog, shellprog, "-c", cmdline2, (char *) NULL);
+		execlp(shellprog, shellprog, "-c", cmdline2, (char *) NULL);
 		fprintf(stderr, _("%s: could not exec \"%s\": %s\n"),
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gurjeet Singh (#1)
Re: pg_regress: lookup shellprog in $PATH

Gurjeet Singh <gurjeet@singh.im> writes:

Please see attached the one-letter patch that fixes this problem. I have
chosen to replace the execl() call with execlp(), which performs a lookup
in $PATH, and finds the 'sh' to use for running the postmaster.

I can't say that I think this is a great fix. It creates security
questions that did not exist before, even without the point you
make about Windows considering execlp deprecated.

Given the lack of complaints about how pg_ctl works, I'd be inclined
to follow its lead and just hard-wire "/bin/sh", removing the whole
SHELLPROG/shellprog dance. I have not heard of anyone using the
theoretical ability to compile pg_regress with some other value.

The Nixpkgs and NixOS distro includes all the supported versions of
Postgres, so one would assume they would've also encountered, and solved,
this problem. But they didn't. My best guess as to why, is, I believe they
never bothered to run `make check` on their built binaries.

TBH, it's not clear to me that that project is competent enough to
be something we should take into account. But in any case, I'd
rather see us using fewer ways to do this, not more.

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: pg_regress: lookup shellprog in $PATH

I wrote:

Given the lack of complaints about how pg_ctl works, I'd be inclined
to follow its lead and just hard-wire "/bin/sh", removing the whole
SHELLPROG/shellprog dance. I have not heard of anyone using the
theoretical ability to compile pg_regress with some other value.

git blame blames that whole mechanism on me: 60cfe25e68d. It looks
like the reason I did it like that is that I was replacing use of
system(3) with execl(), and system(3) is defined thus by POSIX:

execl(<shell path>, "sh", "-c", command, (char *)0);

where <shell path> is an unspecified pathname for the sh utility.

Using SHELL for the "unspecified path" is already a bit of a leap
of faith, since users are allowed to make that point at a non-Bourne
shell. I don't see any strong reason to preserve that behavior.

regards, tom lane

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: pg_regress: lookup shellprog in $PATH

On Wed, Aug 24, 2022 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

git blame blames that whole mechanism on me: 60cfe25e68d. It looks
like the reason I did it like that is that I was replacing use of
system(3) with execl(), and system(3) is defined thus by POSIX:

execl(<shell path>, "sh", "-c", command, (char *)0);

where <shell path> is an unspecified pathname for the sh utility.

Using SHELL for the "unspecified path" is already a bit of a leap
of faith, since users are allowed to make that point at a non-Bourne
shell. I don't see any strong reason to preserve that behavior.

It seems weird that you use any arbitrary shell to run 'sh', but I
guess the point is that your shell command, whatever it is, is
supposed to be a full pathname, and then it can do pathname resolution
to figure out where you 'sh' executable is. So that makes me think
that the problem Gurjeet is reporting is an issue with Nix rather than
an issue with PostgreSQL.

But what we've got is:

[rhaas pgsql]$ git grep execl\(
src/bin/pg_ctl/pg_ctl.c: (void) execl("/bin/sh", "/bin/sh", "-c", cmd,
(char *) NULL);
src/test/regress/pg_regress.c: execl(shellprog, shellprog, "-c",
cmdline2, (char *) NULL);

And surely that's stupid. The whole point here has to be that if you
want to run something called 'sh' but don't know where it is, you need
to execute a shell at a known pathname to figure it out for you.

We could do as you propose and I don't think we would be worse off
than we are today. But I'm confused why the correct formulation
wouldn't be exactly what POSIX specifies, namely execl(shellprog,
"sh", "-c", ...). That way, if somebody has a system where they do set
$SHELL properly but do not have /bin/sh, things would still work.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#4)
Re: pg_regress: lookup shellprog in $PATH

Robert Haas <robertmhaas@gmail.com> writes:

But what we've got is:

[rhaas pgsql]$ git grep execl\(
src/bin/pg_ctl/pg_ctl.c: (void) execl("/bin/sh", "/bin/sh", "-c", cmd,
(char *) NULL);
src/test/regress/pg_regress.c: execl(shellprog, shellprog, "-c",
cmdline2, (char *) NULL);

Right. I wouldn't really feel a need to change anything, except
that we have this weird inconsistency between the way pg_ctl does
it and the way pg_regress does it. I think we should settle on
just one way.

We could do as you propose and I don't think we would be worse off
than we are today. But I'm confused why the correct formulation
wouldn't be exactly what POSIX specifies, namely execl(shellprog,
"sh", "-c", ...). That way, if somebody has a system where they do set
$SHELL properly but do not have /bin/sh, things would still work.

My point is that that *isn't* what POSIX specifies. They say in so
many words that the path actually used by system(3) is unspecified.
They do NOT say that it's the value of $SHELL, and given that you're
allowed to set $SHELL to a non-POSIX-compatible shell, using that
is really wrong. We've gotten away with it so far because we
resolve $SHELL at build time not run time, but it's still shaky.

Interestingly, if you look at concrete man pages, you tend to find
something else. Linux says

The system() library function uses fork(2) to create a child process
that executes the shell command specified in command using execl(3) as
follows:
execl("/bin/sh", "sh", "-c", command, (char *) 0);

My BSD machines say "the command is handed to sh(1)", without committing
to just how that's found ... but guess what, "which sh" finds /bin/sh.

In any case, I can't find any system(3) that relies on $SHELL,
so my translation wasn't correct according to either the letter
of POSIX or common practice. It's supposed to be more or less
a hard-wired path, they just don't want to commit to which path.

Moreover, leaving aside the question of whether pg_regress'
current behavior is actually bug-compatible with system(3),
what is the argument that it needs to be? We have at this
point sufficient experience with pg_ctl's use of /bin/sh
to be pretty confident that that works everywhere. So let's
standardize on the simpler way, not the more complex way.

(It looks like pg_ctl has used /bin/sh since 6bcce25801c3f
of Oct 2015.)

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: pg_regress: lookup shellprog in $PATH

On Thu, Aug 25, 2022 at 10:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

My point is that that *isn't* what POSIX specifies. They say in so
many words that the path actually used by system(3) is unspecified.
They do NOT say that it's the value of $SHELL, and given that you're
allowed to set $SHELL to a non-POSIX-compatible shell, using that
is really wrong. We've gotten away with it so far because we
resolve $SHELL at build time not run time, but it's still shaky.

Interestingly, if you look at concrete man pages, you tend to find
something else. Linux says

The system() library function uses fork(2) to create a child process
that executes the shell command specified in command using execl(3) as
follows:
execl("/bin/sh", "sh", "-c", command, (char *) 0);

My BSD machines say "the command is handed to sh(1)", without committing
to just how that's found ... but guess what, "which sh" finds /bin/sh.

In any case, I can't find any system(3) that relies on $SHELL,
so my translation wasn't correct according to either the letter
of POSIX or common practice. It's supposed to be more or less
a hard-wired path, they just don't want to commit to which path.

Moreover, leaving aside the question of whether pg_regress'
current behavior is actually bug-compatible with system(3),
what is the argument that it needs to be? We have at this
point sufficient experience with pg_ctl's use of /bin/sh
to be pretty confident that that works everywhere. So let's
standardize on the simpler way, not the more complex way.

I mean, I can see you're on the warpath here and I don't care enough
to fight about it very much, but as a matter of theory, I believe that
hard-coded pathnames suck. Giving the user a way to survive if /bin/sh
doesn't exist on their system or isn't the path they want to use seems
fundamentally sensible to me. Now if system() doesn't do that anyhow,
well then there is no such mechanism in such cases, and so the benefit
of providing one in the tiny number of other cases that we have may
not be there. But if you're trying to convince me that hard-coded
paths are as a theoretical matter brilliant, I'm not buying it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: pg_regress: lookup shellprog in $PATH

Robert Haas <robertmhaas@gmail.com> writes:

I mean, I can see you're on the warpath here and I don't care enough
to fight about it very much, but as a matter of theory, I believe that
hard-coded pathnames suck. Giving the user a way to survive if /bin/sh
doesn't exist on their system or isn't the path they want to use seems
fundamentally sensible to me. Now if system() doesn't do that anyhow,
well then there is no such mechanism in such cases, and so the benefit
of providing one in the tiny number of other cases that we have may
not be there. But if you're trying to convince me that hard-coded
paths are as a theoretical matter brilliant, I'm not buying it.

If we were executing a program that the user needs to have some control
over, sure, but what we have here is an implementation detail that I
doubt anyone cares about. The fact that we're using a shell at all is
only because nobody has cared to manually implement I/O redirection logic
in these places; otherwise we'd be execl()'ing the server or psql directly.
Maybe the best answer would be to do that, and get out of the business
of knowing where the shell is?

regards, tom lane

#8Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#7)
Re: pg_regress: lookup shellprog in $PATH

On Thu, Aug 25, 2022 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If we were executing a program that the user needs to have some control
over, sure, but what we have here is an implementation detail that I
doubt anyone cares about. The fact that we're using a shell at all is
only because nobody has cared to manually implement I/O redirection logic
in these places; otherwise we'd be execl()'ing the server or psql directly.
Maybe the best answer would be to do that, and get out of the business
of knowing where the shell is?

Well that also would not be crazy.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#8)
1 attachment(s)
Re: pg_regress: lookup shellprog in $PATH

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 25, 2022 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If we were executing a program that the user needs to have some control
over, sure, but what we have here is an implementation detail that I
doubt anyone cares about. The fact that we're using a shell at all is
only because nobody has cared to manually implement I/O redirection logic
in these places; otherwise we'd be execl()'ing the server or psql directly.
Maybe the best answer would be to do that, and get out of the business
of knowing where the shell is?

Well that also would not be crazy.

I experimented with this, and it seems like it might not be as awful as
we've always assumed it would be. Attached is an incomplete POC that
converts pg_regress proper to doing things this way. (isolationtester
and pg_regress_ecpg are outright broken by this patch, because they rely
on pg_regress' spawn_process and I didn't fix them yet. But you can run
the core regression tests to see it works.)

The Windows side of this is completely untested and may be broken; also,
perhaps Windows has something more nearly equivalent to execvp() that we
could use instead of reconstructing a command line? It's annoying that
the patch removes all shell-quoting hazards on the Unix side but they
are still there on the Windows side.

regards, tom lane

Attachments:

avoid-using-shell-to-launch-processes-wip.patchtext/x-diff; charset=us-ascii; name=avoid-using-shell-to-launch-processes-wip.patchDownload
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index a803355f8e..e1e0d5f7a2 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -19,6 +19,7 @@
 #include "postgres_fe.h"
 
 #include <ctype.h>
+#include <fcntl.h>
 #include <sys/resource.h>
 #include <sys/stat.h>
 #include <sys/time.h>
@@ -51,10 +52,6 @@ typedef struct _resultmap
  */
 char	   *host_platform = HOST_TUPLE;
 
-#ifndef WIN32					/* not used in WIN32 case */
-static char *shellprog = SHELLPROG;
-#endif
-
 /*
  * On Windows we use -w in diff switches to avoid problems with inconsistent
  * newline representation.  The actual result files will generally have
@@ -73,7 +70,7 @@ _stringlist *dblist = NULL;
 bool		debug = false;
 char	   *inputdir = ".";
 char	   *outputdir = ".";
-char       *expecteddir = ".";
+char	   *expecteddir = ".";
 char	   *bindir = PGBINDIR;
 char	   *launcher = NULL;
 static _stringlist *loadextension = NULL;
@@ -1052,12 +1049,71 @@ psql_end_command(StringInfo buf, const char *database)
 	} while (0)
 
 /*
- * Spawn a process to execute the given shell command; don't wait for it
+ * Redirect f to the file specified by fpath, opened with given flags and mode
+ * Does not return upon failure
+ */
+static void
+redirect_to(FILE *f, const char *fpath, int flags, int mode)
+{
+	int			fh;
+
+	/* Let's just be sure the FILE is idle */
+	fflush(f);
+
+	fh = open(fpath, flags, mode);
+	if (fh < 0)
+	{
+		fprintf(stderr, "could not open file \"%s\": %m\n", fpath);
+		_exit(1);
+	}
+	if (dup2(fh, fileno(f)) < 0)
+	{
+		fprintf(stderr, "could not duplicate descriptor for file \"%s\": %m\n",
+				fpath);
+		_exit(1);
+	}
+	(void) close(fh);
+}
+
+/*
+ * Redirect f to the same file used by fref
+ * Does not return upon failure
+ */
+static void
+redirect_link(FILE *f, FILE *fref)
+{
+	/* Let's just be sure the FILE is idle */
+	fflush(f);
+
+	if (dup2(fileno(fref), fileno(f)) < 0)
+	{
+		fprintf(stderr, "could not duplicate file descriptor: %m\n");
+		_exit(1);
+	}
+}
+
+/*
+ * Spawn a process to execute a sub-command; don't wait for it
  *
  * Returns the process ID (or HANDLE) so we can wait for it later
+ * Does not return upon failure
+ *
+ * Arguments:
+ * file: program to be executed (may be a full path, or just program name)
+ * argv: NULL-terminated array of argument strings, as for execvp(2);
+ *       argv[0] should normally be the same as file
+ * proc_stdin: filename to make child's stdin read from, or NULL
+ *       for no redirection
+ * proc_stdout: filename to make child's stdout write to, or NULL
+ *       for no redirection
+ * proc_stderr: filename to make child's stderr write to, or NULL
+ *       for no redirection, or "&1" to link it to child's stdout
  */
 PID_TYPE
-spawn_process(const char *cmdline)
+spawn_process(const char *file, char *argv[],
+			  const char *proc_stdin,
+			  const char *proc_stdout,
+			  const char *proc_stderr)
 {
 #ifndef WIN32
 	pid_t		pid;
@@ -1085,35 +1141,65 @@ spawn_process(const char *cmdline)
 	if (pid == 0)
 	{
 		/*
-		 * In child
-		 *
-		 * Instead of using system(), exec the shell directly, and tell it to
-		 * "exec" the command too.  This saves two useless processes per
-		 * parallel test case.
+		 * In child: redirect stdio as requested, then execvp()
 		 */
-		char	   *cmdline2;
-
-		cmdline2 = psprintf("exec %s", cmdline);
-		execl(shellprog, shellprog, "-c", cmdline2, (char *) NULL);
+		if (proc_stdin)
+			redirect_to(stdin, proc_stdin, O_RDONLY, 0);
+		if (proc_stdout)
+			redirect_to(stdout, proc_stdout, O_WRONLY | O_CREAT | O_TRUNC,
+						S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
+		if (proc_stderr)
+		{
+			if (strcmp(proc_stderr, "&1") == 0)
+				redirect_link(stderr, stdout);
+			else
+				redirect_to(stderr, proc_stderr, O_WRONLY | O_CREAT | O_TRUNC,
+							S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
+		}
+		execvp(file, argv);
+		/* must have failed */
 		fprintf(stderr, _("%s: could not exec \"%s\": %s\n"),
-				progname, shellprog, strerror(errno));
+				progname, file, strerror(errno));
 		_exit(1);				/* not exit() here... */
 	}
 	/* in parent */
 	return pid;
 #else
 	PROCESS_INFORMATION pi;
+	StringInfoData cmdline;
 	char	   *cmdline2;
 	HANDLE		restrictedToken;
 	const char *comspec;
 
+	/*
+	 * Construct the equivalent command string.  We assume simplistic double
+	 * quoting of the arguments is sufficient.
+	 */
+	initStringInfo(&cmdline);
+	appendStringInfo(&cmdline, "\"%s\"", path);
+	for (int i = 1; argv[i] != NULL; i++)
+		appendStringInfo(&cmdline, " \"%s\"", argv[i]);
+	if (proc_stdin)
+		appendStringInfo(&cmdline, " <\"%s\"", proc_stdin);
+	if (proc_stdout)
+		appendStringInfo(&cmdline, " >\"%s\"", proc_stdout);
+	if (proc_stderr)
+	{
+		if (strcmp(proc_stderr, "&1") == 0)
+			appendStringInfo(&cmdline, " 2>&1");
+		else
+			appendStringInfo(&cmdline, " 2>\"%s\"", proc_stderr);
+	}
+
 	/* Find CMD.EXE location using COMSPEC, if it's set */
 	comspec = getenv("COMSPEC");
 	if (comspec == NULL)
 		comspec = "CMD";
 
+	/* Prepare command for CMD.EXE */
+	cmdline2 = psprintf("\"%s\" /c \"%s\"", comspec, cmdline.data);
+
 	memset(&pi, 0, sizeof(pi));
-	cmdline2 = psprintf("\"%s\" /c \"%s\"", comspec, cmdline);
 
 	if ((restrictedToken =
 		 CreateRestrictedProcess(cmdline2, &pi)) == 0)
@@ -2228,6 +2314,8 @@ regression_main(int argc, char *argv[],
 		FILE	   *pg_conf;
 		const char *env_wait;
 		int			wait_seconds;
+		char	   *pmargv[20];
+		char	   *outputname;
 
 		/*
 		 * Prepare the temp instance
@@ -2361,16 +2449,30 @@ regression_main(int argc, char *argv[],
 		 * Start the temp postmaster
 		 */
 		header(_("starting postmaster"));
-		snprintf(buf, sizeof(buf),
-				 "\"%s%spostgres\" -D \"%s/data\" -F%s "
-				 "-c \"listen_addresses=%s\" -k \"%s\" "
-				 "> \"%s/log/postmaster.log\" 2>&1",
-				 bindir ? bindir : "",
-				 bindir ? "/" : "",
-				 temp_instance, debug ? " -d 5" : "",
-				 hostname ? hostname : "", sockdir ? sockdir : "",
-				 outputdir);
-		postmaster_pid = spawn_process(buf);
+		i = 0;
+		pmargv[i++] = psprintf("%s%spostgres",
+							   bindir ? bindir : "",
+							   bindir ? "/" : "");
+		pmargv[i++] = "-D";
+		pmargv[i++] = psprintf("%s/data", temp_instance);
+		pmargv[i++] = "-F";
+		if (debug)
+		{
+			pmargv[i++] = "-d";
+			pmargv[i++] = "5";
+		}
+		pmargv[i++] = "-c";
+		pmargv[i++] = psprintf("listen_addresses=%s",
+							   hostname ? hostname : "");
+		pmargv[i++] = "-k";
+		pmargv[i++] = psprintf("%s", sockdir ? sockdir : "");
+		pmargv[i++] = NULL;
+		Assert(i <= lengthof(pmargv));
+		outputname = psprintf("%s/log/postmaster.log", outputdir);
+		postmaster_pid = spawn_process(pmargv[0], pmargv,
+									   NULL,
+									   outputname,
+									   "&1");
 		if (postmaster_pid == INVALID_PID)
 		{
 			fprintf(stderr, _("\n%s: could not spawn postmaster: %s\n"),
diff --git a/src/test/regress/pg_regress.h b/src/test/regress/pg_regress.h
index d8772fec8e..a9eb1a3ba1 100644
--- a/src/test/regress/pg_regress.h
+++ b/src/test/regress/pg_regress.h
@@ -65,5 +65,8 @@ int			regression_main(int argc, char *argv[],
 							postprocess_result_function postfunc);
 
 void		add_stringlist_item(_stringlist **listhead, const char *str);
-PID_TYPE	spawn_process(const char *cmdline);
+PID_TYPE	spawn_process(const char *file, char *argv[],
+						  const char *proc_stdin,
+						  const char *proc_stdout,
+						  const char *proc_stderr);
 bool		file_exists(const char *file);
diff --git a/src/test/regress/pg_regress_main.c b/src/test/regress/pg_regress_main.c
index a4b354c9e6..0b5ac124e9 100644
--- a/src/test/regress/pg_regress_main.c
+++ b/src/test/regress/pg_regress_main.c
@@ -34,8 +34,8 @@ psql_start_test(const char *testname,
 	char		infile[MAXPGPATH];
 	char		outfile[MAXPGPATH];
 	char		expectfile[MAXPGPATH];
-	char		psql_cmd[MAXPGPATH * 3];
-	size_t		offset = 0;
+	char	   *psqlargv[20];
+	int			i = 0;
 	char	   *appnameenv;
 
 	/*
@@ -63,40 +63,30 @@ psql_start_test(const char *testname,
 	add_stringlist_item(expectfiles, expectfile);
 
 	if (launcher)
-	{
-		offset += snprintf(psql_cmd + offset, sizeof(psql_cmd) - offset,
-						   "%s ", launcher);
-		if (offset >= sizeof(psql_cmd))
-		{
-			fprintf(stderr, _("command too long\n"));
-			exit(2);
-		}
-	}
-
-	/*
-	 * Use HIDE_TABLEAM to hide different AMs to allow to use regression tests
-	 * against different AMs without unnecessary differences.
-	 */
-	offset += snprintf(psql_cmd + offset, sizeof(psql_cmd) - offset,
-					   "\"%s%spsql\" -X -a -q -d \"%s\" %s < \"%s\" > \"%s\" 2>&1",
-					   bindir ? bindir : "",
-					   bindir ? "/" : "",
-					   dblist->str,
-					   "-v HIDE_TABLEAM=on -v HIDE_TOAST_COMPRESSION=on",
-					   infile,
-					   outfile);
-	if (offset >= sizeof(psql_cmd))
-	{
-		fprintf(stderr, _("command too long\n"));
-		exit(2);
-	}
+		psqlargv[i++] = launcher;
+
+	psqlargv[i++] = psprintf("%s%spsql",
+							 bindir ? bindir : "",
+							 bindir ? "/" : "");
+	psqlargv[i++] = "-Xaq";
+	psqlargv[i++] = "-d";
+	psqlargv[i++] = dblist->str;
+	psqlargv[i++] = "-v";
+	/* Hide TABLEAM and compression to allow tests against different AMs */
+	psqlargv[i++] = "HIDE_TABLEAM=on";
+	psqlargv[i++] = "-v";
+	psqlargv[i++] = "HIDE_TOAST_COMPRESSION=on";
+	psqlargv[i++] = NULL;
+	Assert(i <= lengthof(psqlargv));
 
 	appnameenv = psprintf("pg_regress/%s", testname);
 	setenv("PGAPPNAME", appnameenv, 1);
 	free(appnameenv);
 
-	pid = spawn_process(psql_cmd);
-
+	pid = spawn_process(psqlargv[0], psqlargv,
+						infile,
+						outfile,
+						"&1");
 	if (pid == INVALID_PID)
 	{
 		fprintf(stderr, _("could not start process for test %s\n"),
#10Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#9)
Re: pg_regress: lookup shellprog in $PATH

On Thu, Aug 25, 2022 at 04:04:39PM -0400, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Aug 25, 2022 at 10:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If we were executing a program that the user needs to have some control
over, sure, but what we have here is an implementation detail that I
doubt anyone cares about. The fact that we're using a shell at all is
only because nobody has cared to manually implement I/O redirection logic
in these places; otherwise we'd be execl()'ing the server or psql directly.
Maybe the best answer would be to do that, and get out of the business
of knowing where the shell is?

The Windows side of this is completely untested and may be broken; also,
perhaps Windows has something more nearly equivalent to execvp() that we
could use instead of reconstructing a command line? It's annoying that

Windows has nothing like execvp(), unfortunately.

the patch removes all shell-quoting hazards on the Unix side but they
are still there on the Windows side.

It's feasible to take cmd.exe out of the loop. One could then eliminate
cmd.exe quoting (the "^" characters). Can't avoid the rest of the quoting
(https://docs.microsoft.com/en-us/cpp/cpp/main-function-command-line-args#parsing-c-command-line-arguments).
Bypassing cmd.exe would also make it easy to remove the ban on newlines and
carriage returns in arguments.