Use of signal-unsafe functions from signal handlers
Hi all,
Typically, signal-unsafe functions should not be called from signal
handlers. In particular, calling malloc() directly or indirectly can cause
deadlocks, making PostgreSQL unresponsive to signals.
Unless I am missing something, bgworker_die uses ereport, which indirectly
calls printf-like functions, which are not signal-safe since they use
malloc(). In rare cases, this can lead to deadlocks with stacks that look
like this (from https://github.com/timescale/timescaledb/issues/4200):
#0 0x00007f0e4d1040eb in __lll_lock_wait_private () from
target:/lib/x86_64-linux-gnu/libc.so.6
[...]
#3 malloc (size=53)
[...]
#7 0x000055b9212235b1 in errmsg ()
#8 0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at
/build/timescaledb/src/bgw/scheduler.c:841
#9 <signal handler called>
[...]
#13 free (ptr=<optimized out>)
#14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from
target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
[...]
A simple fix for this is to introduce a signal-safe version of write_stderr
and use that from bgworker_die, which the attached patch does. Am I missing
something or is this a bug?
Best wishes,
Mats Kindahl (Timescale)
Attachments:
0001-Add-signal-safe-function-to-write-to-log.patch (text/x-patch, charset US-ASCII, +34/-10)
Hi,
On Tue, May 24, 2022 at 11:42:35AM +0200, Mats Kindahl wrote:
Typically, signal-unsafe functions should not be called from signal
handlers. In particular, calling malloc() directly or indirectly can cause
deadlocks, making PostgreSQL unresponsive to signals.
Unless I am missing something, bgworker_die uses ereport, which indirectly
calls printf-like functions, which are not signal-safe since they use
malloc(). In rare cases, this can lead to deadlocks with stacks that look
like this (from https://github.com/timescale/timescaledb/issues/4200):
#0 0x00007f0e4d1040eb in __lll_lock_wait_private () from
target:/lib/x86_64-linux-gnu/libc.so.6
[...]
#3 malloc (size=53)
[...]
#7 0x000055b9212235b1 in errmsg ()
#8 0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at
/build/timescaledb/src/bgw/scheduler.c:841
#9 <signal handler called>
[...]
#13 free (ptr=<optimized out>)
#14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from
target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
[...]
As far as I can see the problem comes from your handle_sigterm:
static void handle_sigterm(SIGNAL_ARGS)
{
	/*
	 * do not use a level >= ERROR because we don't want to exit here but
	 * rather only during CHECK_FOR_INTERRUPTS
	 */
	ereport(LOG,
			(errcode(ERRCODE_ADMIN_SHUTDOWN),
			 errmsg("terminating TimescaleDB job scheduler due to administrator command")));
	die(postgres_signal_arg);
}
As mentioned in the topmost comment of elog.c, there's an escape hatch for
out of memory situations, to make sure that a small enough message can be
displayed without trying to allocate memory. But this is only for ERROR or
higher levels. Using an ereport(LOG, ...) level in a sigterm handler indeed
isn't safe.
On Tue, May 24, 2022 at 12:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,
On Tue, May 24, 2022 at 11:42:35AM +0200, Mats Kindahl wrote:
Typically, signal-unsafe functions should not be called from signal
handlers. In particular, calling malloc() directly or indirectly can cause
deadlocks, making PostgreSQL unresponsive to signals.
Unless I am missing something, bgworker_die uses ereport, which
indirectly
calls printf-like functions, which are not signal-safe since they use
malloc(). In rare cases, this can lead to deadlocks with stacks that look
like this (from https://github.com/timescale/timescaledb/issues/4200):
#0 0x00007f0e4d1040eb in __lll_lock_wait_private () from
target:/lib/x86_64-linux-gnu/libc.so.6
[...]
#3 malloc (size=53)
[...]
#7 0x000055b9212235b1 in errmsg ()
#8 0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at
/build/timescaledb/src/bgw/scheduler.c:841
#9 <signal handler called>
[...]
#13 free (ptr=<optimized out>)
#14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from
target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
[...]
As far as I can see the problem comes from your handle_sigterm:
static void handle_sigterm(SIGNAL_ARGS)
{
	/*
	 * do not use a level >= ERROR because we don't want to exit here but
	 * rather only during CHECK_FOR_INTERRUPTS
	 */
	ereport(LOG,
			(errcode(ERRCODE_ADMIN_SHUTDOWN),
			 errmsg("terminating TimescaleDB job scheduler due to administrator command")));
	die(postgres_signal_arg);
}
As mentioned in the topmost comment of elog.c, there's an escape hatch for
out of memory situations, to make sure that a small enough message can be
displayed without trying to allocate memory. But this is only for ERROR or
higher levels. Using an ereport(LOG, ...) level in a sigterm handler indeed
isn't safe.
Yes, and we have already fixed this in the TimescaleDB code, so this is not
an issue for us (https://github.com/timescale/timescaledb/pull/4323). (We
actually replace the bgworker_die with just die as signal handler.)
However, bgworker_die()---which is part of PostgreSQL and is the default
signal handler for background workers---is using ereport() and I think this
should be a problem for all error levels since the problem is in the usage
of the formatting function to format the error message, not where you write
it. This would mean that any existing background workers would have a
window for triggering this issue on shutdown.
On Tue, May 24, 2022 at 02:10:36PM +0200, Mats Kindahl wrote:
On Tue, May 24, 2022 at 12:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
As mentioned in the topmost comment of elog.c, there's an escape hatch for
out of memory situations, to make sure that a small enough message can be
displayed without trying to allocate memory. But this is only for ERROR or
higher levels. Using an ereport(LOG, ...) level in a sigterm handler indeed
isn't safe.
Yes, and we have already fixed this in the TimescaleDB code, so this is not
an issue for us (https://github.com/timescale/timescaledb/pull/4323). (We
actually replace the bgworker_die with just die as signal handler.)
However, bgworker_die()---which is part of PostgreSQL and is the default
signal handler for background workers---is using ereport() and I think this
should be a problem for all error levels since the problem is in the usage
of the formatting function to format the error message, not where you write
it. This would mean that any existing background workers would have a
window for triggering this issue on shutdown.
Yes, but it's using ereport with FATAL level, so if it's the top level message
the ErrorContext should be in initial state or have been reset previously, and
if it's not then the escape hatch will reset the context. So in any case there
will be a guarantee to have at least 8kB available in that context, that any
palloc will be able to use to format the message.
Julien Rouhaud <rjuju123@gmail.com> writes:
Yes, but it's using ereport with FATAL level, so if it's the top level message
the ErrorContext should be in initial state or have been reset previously, and
if it's not then the escape hatch will reset the context. So in any case there
will be a guarantee to have at least 8kB available in that context, that any
palloc will be able to use to format the message.
ereport() itself is just the tip of the iceberg; even if it's safe
(which I concur it isn't), there's also the atexit/on_proc_exit
functions that are likely to be called during shutdown. So yeah,
this coding is not too safe. I'm not sure that getting rid of it
would be a net win though, as we'd replace it-might-crash hazards
with it-might-never-exit hazards, from bgworkers that neglect to
respond to ShutdownRequestPending.
regards, tom lane
On Tue, May 24, 2022 at 10:15:55AM -0400, Tom Lane wrote:
ereport() itself is just the tip of the iceberg; even if it's safe
(which I concur it isn't), there's also the atexit/on_proc_exit
functions that are likely to be called during shutdown. So yeah,
this coding is not too safe. I'm not sure that getting rid of it
would be a net win though, as we'd replace it-might-crash hazards
with it-might-never-exit hazards, from bgworkers that neglect to
respond to ShutdownRequestPending.
Hmm. Shouldn't we worry about FloatExceptionHandler() that gets used
on SIGFPE?
--
Michael
Hi,
On 2022-05-25 09:45:31 +0900, Michael Paquier wrote:
On Tue, May 24, 2022 at 10:15:55AM -0400, Tom Lane wrote:
ereport() itself is just the tip of the iceberg; even if it's safe
(which I concur it isn't), there's also the atexit/on_proc_exit
functions that are likely to be called during shutdown. So yeah,
this coding is not too safe. I'm not sure that getting rid of it
would be a net win though, as we'd replace it-might-crash hazards
with it-might-never-exit hazards, from bgworkers that neglect to
respond to ShutdownRequestPending.
IMO the it-might-never-exit hazard is lower, and can be addressed by the
authors of bgworkers by adding the checks. Whereas the it-might-crash can't
really be addressed well (except reimplementing the signal handler).
It might be worth adding some debugging infrastructure to make it easier to
find spots that don't check various forms of interrupts. If we required a
function to check for things like ShutdownRequestPending, we could check in
that function how long it's been since the last check, subtracting time spent
in WaitLatch etc.
Hmm. Shouldn't we worry about FloatExceptionHandler() that gets used on
SIGFPE?
That probably is comparatively low-risk, because SIGFPE is a synchronous
signal. So it'll be triggered in the middle of a floating point math operation
where divide-by-zero wouldn't be a bug - which we really really shouldn't
have while holding a spinlock or such.
Greetings,
Andres Freund
Michael Paquier <michael@paquier.xyz> writes:
Hmm. Shouldn't we worry about FloatExceptionHandler() that gets used
on SIGFPE?
SIGFPE is generated synchronously, so that one's not really a problem,
I think: it can only interrupt code that is doing trap-prone arithmetic.
Anyway that code's been like that for ages and nobody has reported an
issue there.
regards, tom lane