report on not thread-safe functions

Started by Peter Eisentrautalmost 2 years ago9 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

In the context of the multithreaded-server project, I looked into
potentially not thread-safe functions.

(See proposed next steps at the end of this message.)

Here is a list of functions in POSIX that are possibly not thread-safe:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01

I checked those against the PostgreSQL server source code (backend +
common + timezone), and found that the following from those are in
use:

- dlerror()
- getenv()
- getgrnam()
- getopt()
- getpwuid()
- localeconv()
- localtime()
- nl_langinfo()
- readdir()
- setenv()
- setlocale()
- strerror()
- strsignal()
- strtok()
- system()
- unsetenv()

Additionally, there are non-standard functions that are not
thread-safe, such as getopt_long().

Also, there are replacement functions such as pg_gmtime() and
pg_localtime() that mirror standard thread-unsafe functions and that
appear to be equally unsafe. (Note to those looking into annotating
global variables: You also need to check static local variables.)

Conversely, some of the above might actually be thread-safe in
some/many/all implementations. For example, strerror() and system()
are thread-safe in glibc. So you might actually get a multithreaded
server running in that environment with fewer source code changes but
have it fail in others. Just something to keep in mind.

I also tried the concurrency-mt-unsafe check from clang-tidy
(https://clang.llvm.org/extra/clang-tidy/checks/concurrency/mt-unsafe.html).
Run it for example like this:

clang-tidy -p build --quiet --checks='-*,concurrency-mt-unsafe'
src/backend/main/*.c

(needs a compilation database in the build directory)

(You can't just run it like src/backend/**/*.c because some .c files
don't compile standalone, and then the whole thing aborts with too
many errors. Maybe with a judicious exclusion list, this can be
achieved. However, it's also not good dealing with compilation
options like FRONTEND. So it can't easily serve as an automated
checker, but it's okay as a manual exploration tool.)

In addition to the POSIX list above, this also flagged:

- exit()
- sigprocmask()

Allegedly, you can toggle it between posix and glibc modes, but I
haven't succeeded with that. So for example, it does not actually
flag strerror() out of the box, presumably because that is not in its
glibc list.

Now some more detailed comments on these functions:

- dlerror()

dlerror() gets the error from the last dlopen() call, which is
obviously not thread-safe. This might require some deeper
investigation of the whole dfmgr.c mechanism. (Which might be
appropriate in any case, since in multithreaded environments, you
don't need to load a library into each session separately.)

- exit()

Most of the exit() calls happen where there are not multiple threads
active. But some emergency exit calls like in elog.c might more
correctly use _exit()?

- getenv()
- setenv()
- unsetenv()

getenv() is unsafe if there are concurrent setenv() or unsetenv()
calls. We should try to move all those to early in the program
startup. This seems doable. Some calls are related to locale stuff,
which is a separate subproject to clean up. There are some calls to
setenv("KRB5*"), which would need to be fixed. The source code
comments nearby already contain ideas how to.

- getgrnam()
- getpwuid()
- localtime()

These have _r replacements.

- getopt()

This needs a custom replacement. (There is no getopt_r() because
programs usually don't call getopt() after startup.)

(Note: This is also called during session startup, not only during
initial postmaster start. So we definitely need something here, if we
want to, like, start more than one session concurrently.)

- localeconv()
- nl_langinfo()
- setlocale()

The locale business needs to be reworked to use locale_t and _l
functions. This is already being discussed for other reasons.

- readdir()

This is listed as possibly thread-unsafe, but I think it is
thread-safe in practice. You just can't work on the same DIR handle
from multiple threads. There is a readdir_r(), but that's already
deprecated. I think we can ignore this one.

- sigprocmask()

It looks like this is safe in practice. Also, there is
pthread_sigmask() if needed.

- strerror()

Use strerror_r(). There are very few calls of this, actually, since
most potential users use printf %m.

- strsignal()

Use strsignal_r(). These calls are already wrapped in pg_strsignal()
for Windows portability, so it's easy to change.

But this also led me to think that it is potentially dangerous to have
different standards of thread-safety across the tree. pg_strsignal()
is used by wait_result_to_str() which is used by pclose_check()
... and at that point you have completely lost track of what you are
dealing with underneath. So if someone were to put, say,
pclose_check() into pgbench, it could be broken.

- strtok()

Use strtok_r() or maybe even strsep() (but there are small semantic
differences with the latter).

- system()

As mentioned above, this actually safe on some systems. If there are
systems where it's not safe, then this could require some nontrivial
work.

Suggested next steps:

- The locale API business is already being worked on under separate
cover.

- Getting rid of the setenv("KRB5*") calls is a small independently
doable project.

- Check if we can get rid of the getopt() calls at session startup.
Else figure out a thread-safe replacement.

- Replace remaining strtok() with strsep(). I think the semantics of
strsep() are actually more correct for the uses I found. (strtok()
skips over multiple adjacent separators, but strsep() would return
empty fields.)

After those, the remaining issues seem less complicated.

#2Jeff Davis
pgsql@j-davis.com
In reply to: Peter Eisentraut (#1)
Re: report on not thread-safe functions

On Thu, 2024-06-06 at 16:34 +0200, Peter Eisentraut wrote:

- setlocale()

The locale business needs to be reworked to use locale_t and _l
functions.  This is already being discussed for other reasons.

I posted a few patches to do this for collation:

https://commitfest.postgresql.org/48/5023/

Regards,
Jeff Davis

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Eisentraut (#1)
Thread-safe getopt() (was: report on not thread-safe functions)

On 06/06/2024 17:34, Peter Eisentraut wrote:

Additionally, there are non-standard functions that are not
thread-safe, such as getopt_long().

getopt_long() is not used in the server, only in client programs. The
server binary does actually accept a few "long" arguments, like --single
and --describe-config, but it has special code to handle them and
doesn't use getopt_long(). So we can leave getopt_long() alone for now.

- getopt()

This needs a custom replacement.  (There is no getopt_r() because
programs usually don't call getopt() after startup.)

(Note: This is also called during session startup, not only during
initial postmaster start.  So we definitely need something here, if we
want to, like, start more than one session concurrently.)

Here's a patch for a thread-safe version of getopt(). I implemented it
as two functions, pg_getopt_start() and pg_getopt_next(). Since there's
no standard to follow, we have freedom on how to implement it, and IMHO
that feels more natural than the single getopt() function. I took the
implementation from the getopt() replacement we already had for
platforms that don't have getopt(), moving all the global variables it
used to a struct.

The last patch attached replaces all calls in the server to use the new
variant, but leaves all the calls in client programs alone. I considered
changing the client programs as well, but there's no immediate need, and
it seems nice to use OS functions when possible.

(The first patch fixes a little harmless bug in get_stats_option_name()
that's gone unnoticed since 2006 but got in the way now.)

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Fix-latent-bug-in-get_stats_option_name.patchtext/x-patch; charset=UTF-8; name=0001-Fix-latent-bug-in-get_stats_option_name.patchDownload+2-3
0002-Invent-custom-pg_getopt_ctx-that-is-thread-safe.patchtext/x-patch; charset=UTF-8; name=0002-Invent-custom-pg_getopt_ctx-that-is-thread-safe.patchDownload+189-71
0003-Replace-getopt-with-our-re-entrant-variant-in-the-ba.patchtext/x-patch; charset=UTF-8; name=0003-Replace-getopt-with-our-re-entrant-variant-in-the-ba.patchDownload+77-92
#4Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#3)
Re: Thread-safe getopt()

On 19.05.25 22:22, Heikki Linnakangas wrote:

- getopt()

This needs a custom replacement.  (There is no getopt_r() because
programs usually don't call getopt() after startup.)

(Note: This is also called during session startup, not only during
initial postmaster start.  So we definitely need something here, if we
want to, like, start more than one session concurrently.)

Here's a patch for a thread-safe version of getopt(). I implemented it
as two functions, pg_getopt_start() and pg_getopt_next(). Since there's
no standard to follow, we have freedom on how to implement it, and IMHO
that feels more natural than the single getopt() function. I took the
implementation from the getopt() replacement we already had for
platforms that don't have getopt(), moving all the global variables it
used to a struct.

The last patch attached replaces all calls in the server to use the new
variant, but leaves all the calls in client programs alone. I considered
changing the client programs as well, but there's no immediate need, and
it seems nice to use OS functions when possible.

(The first patch fixes a little harmless bug in get_stats_option_name()
that's gone unnoticed since 2006 but got in the way now.)

That first patch seems like a genuine latent bug that should be fixed.

The API you have created here looks pretty good to me.

I don't think we need to apply this to BootstrapModeMain() or
PostmasterMain(), it would be sufficient to use it in
process_postgres_switches() in tcop/postgres.c. It don't see any
advantage in using it where not needed (let alone client programs).

(I don't suppose there is a way to get rid of the need to do
command-line option parsing for the startup packet. It seems to be too
widely used.)

#5Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Eisentraut (#4)
Re: Thread-safe getopt()

Thanks for the review!

On 27/03/2026 16:29, Peter Eisentraut wrote:

On 19.05.25 22:22, Heikki Linnakangas wrote:

- getopt()

This needs a custom replacement.  (There is no getopt_r() because
programs usually don't call getopt() after startup.)

(Note: This is also called during session startup, not only during
initial postmaster start.  So we definitely need something here, if we
want to, like, start more than one session concurrently.)

Here's a patch for a thread-safe version of getopt(). I implemented it
as two functions, pg_getopt_start() and pg_getopt_next(). Since
there's no standard to follow, we have freedom on how to implement it,
and IMHO that feels more natural than the single getopt() function. I
took the implementation from the getopt() replacement we already had
for platforms that don't have getopt(), moving all the global
variables it used to a struct.

The last patch attached replaces all calls in the server to use the
new variant, but leaves all the calls in client programs alone. I
considered changing the client programs as well, but there's no
immediate need, and it seems nice to use OS functions when possible.

(The first patch fixes a little harmless bug in
get_stats_option_name() that's gone unnoticed since 2006 but got in
the way now.)

That first patch seems like a genuine latent bug that should be fixed.

The API you have created here looks pretty good to me.

I don't think we need to apply this to BootstrapModeMain() or
PostmasterMain(), it would be sufficient to use it in
process_postgres_switches() in tcop/postgres.c.  It don't see any
advantage in using it where not needed (let alone client programs).

I'd prefer to change those too. It's nice to be able to say "there are
no getopt() calls in the backend binary", and not have to scrutinize
which ones are OK. Also I actually like the new API better than plain
getopt().

Agreed on the client programs.

(I don't suppose there is a way to get rid of the need to do command-
line option parsing for the startup packet.  It seems to be too widely
used.)

Right..

Attached is a rebased version of the patches, no material changes.
Barring objections, I'll commit later tonight or tomorrow.

- Heikki

Attachments:

v2-0001-Fix-latent-bug-in-get_stats_option_name.patchtext/x-patch; charset=UTF-8; name=v2-0001-Fix-latent-bug-in-get_stats_option_name.patchDownload+2-3
v2-0002-Invent-custom-pg_getopt_ctx-that-is-thread-safe.patchtext/x-patch; charset=UTF-8; name=v2-0002-Invent-custom-pg_getopt_ctx-that-is-thread-safe.patchDownload+189-71
v2-0003-Replace-getopt-with-our-re-entrant-variant-in-the.patchtext/x-patch; charset=UTF-8; name=v2-0003-Replace-getopt-with-our-re-entrant-variant-in-the.patchDownload+77-92
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#5)
Re: Thread-safe getopt()

Heikki Linnakangas <hlinnaka@iki.fi> writes:

Here's a patch for a thread-safe version of getopt(). I implemented it
as two functions, pg_getopt_start() and pg_getopt_next(). Since
there's no standard to follow, we have freedom on how to implement it,

Why in the world would we invent our own getopt API rather than
just replacing the relevant state variables with thread-local ones,
entirely internally to getopt.c?

It's not like this is a performance-critical code path. I think
the cognitive load for developers of having to learn YA API should
weigh against doing things this way.

regards, tom lane

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#6)
Re: Thread-safe getopt()

On 30/03/2026 17:34, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

Here's a patch for a thread-safe version of getopt(). I implemented it
as two functions, pg_getopt_start() and pg_getopt_next(). Since
there's no standard to follow, we have freedom on how to implement it,

Why in the world would we invent our own getopt API rather than
just replacing the relevant state variables with thread-local ones,
entirely internally to getopt.c?

It's not like this is a performance-critical code path. I think
the cognitive load for developers of having to learn YA API should
weigh against doing things this way.

The new API is very similar to the same as the plain getopt() API, it'll
be quickly recognizable to anyone who's familiar with getopt().

Thread-local variables would work too, but I think we'd need to give
them different names anyway to avoid conflicting with the plain getopt()
global variables.

- Heikki

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#7)
Re: Thread-safe getopt()

Heikki Linnakangas <hlinnaka@iki.fi> writes:

Thread-local variables would work too, but I think we'd need to give
them different names anyway to avoid conflicting with the plain getopt()
global variables.

Oh, excellent point. I was thinking the change could be purely local to
getopt.c, but that probably wouldn't work because some of them are in
the visible API.

regards, tom lane

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#8)
Re: Thread-safe getopt()

On 30/03/2026 17:54, Tom Lane wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

Thread-local variables would work too, but I think we'd need to give
them different names anyway to avoid conflicting with the plain getopt()
global variables.

Oh, excellent point. I was thinking the change could be purely local to
getopt.c, but that probably wouldn't work because some of them are in
the visible API.

Committed, thanks.

- Heikki