Fix errno handling in popen_check() to avoid false error reports

Started by ma lzabout 1 month ago2 messageshackers
Jump to latest
#1ma lz
ma100@hotmail.com

Hi Hackers,
I noticed a potential issue in popen_check() (src/common/wait_error.c) regarding errno handling when popen() succeeds but leaves a non-zero errno behind.
Problem
In certain environments (e.g., CentOS 7 host with Docker container running Euler 24.03, older kernel), popen() internally first tries clone3, which fails with ENOSYS (38), then falls back to traditional clone and succeeds. However, the errno value from the failed clone3 attempt (38) is not cleared, so after a successful popen(), errno remains 38.
Current code in popen_check():
c

errno = 0;
cmdfd = popen(command, mode);
if (cmdfd == NULL)
pg_log_error("could not execute command \"%s\": %m", command);
return cmdfd;

If a caller (incorrectly) checks errno even when popen() succeeds, they might see errno == ENOSYS and mistakenly believe an error occurred, even though the command executed successfully.
Proposed Fix
Move errno = 0 to the success path, so errno is only cleared when popen() actually succeeds:
c

cmdfd = popen(command, mode);
if (cmdfd == NULL) {
pg_log_error("could not execute command \"%s\": %m", command);
} else {
/* Success: clear any stale errno from internal fallback attempts */
errno = 0;
}
return cmdfd;

This ensures that after a successful popen(), errno is zero, preventing any confusion in upper layers that might (for historical reasons) inspect errno even on success.
Testing
I tested this change in the environment where the issue was observed:

*
Host: CentOS 7
*
Container: Euler 24.03 (glibc compiled with __ASSUME_CLONE3)
*
Before the fix: errno remained 38 after successful popen()
*
After the fix: errno is properly reset to 0

All regression tests pass.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: ma lz (#1)
Re: Fix errno handling in popen_check() to avoid false error reports

ma lz <ma100@hotmail.com> writes:

In certain environments (e.g., CentOS 7 host with Docker container running Euler 24.03, older kernel), popen() internally first tries clone3, which fails with ENOSYS (38), then falls back to traditional clone and succeeds. However, the errno value from the failed clone3 attempt (38) is not cleared, so after a successful popen(), errno remains 38.

Sure.

If a caller (incorrectly) checks errno even when popen() succeeds, they might see errno == ENOSYS and mistakenly believe an error occurred, even though the command executed successfully.

That would be a a bug in that caller (and I don't see any such
caller). It is *extremely* common for system functions to trash errno
in non-failure paths. That variable is only promised to be meaningful
after a failure return.

The point of the errno = 0 step in popen_check() is so that we don't
print an unrelated error code if popen fails without setting errno.
POSIX doesn't require it to set errno, oddly enough:

Upon successful completion, popen() shall return a pointer to an
open stream that can be used to read or write to the
pipe. Otherwise, it shall return a null pointer and may set errno
to indicate the error.

Proposed Fix
Move errno = 0 to the success path, so errno is only cleared when popen() actually succeeds:

This is based on a complete misunderstanding of what that step is for.
It would break the case we're protecting against without fixing
anything.

regards, tom lane