Preventing hangups in bgworker start/stop during DB shutdown
Here's an attempt at closing the race condition discussed in [1]
(and in some earlier threads, though I'm too lazy to find them).
The core problem is that the bgworker management APIs were designed
without any thought for exception conditions, notably "we're not
gonna launch any more workers because we're shutting down the database".
A process waiting for a worker in WaitForBackgroundWorkerStartup or
WaitForBackgroundWorkerShutdown will wait forever, so that the database
fails to shut down without manual intervention.
I'd supposed that we would need some incompatible changes in those APIs
in order to fix this, but after further study it seems like we could
hack things by just acting as though a request that won't be serviced
has already run to completion. I'm not terribly satisfied with that
as a long-term solution --- it seems to me that callers should be told
that there was a failure. But this looks to be enough to solve the
lockup condition for existing callers, and it seems like it'd be okay
to backpatch.
Thoughts?
regards, tom lane
[1] https://www.postgresql.org/message-id/16785-c0207d8c67fb5f25@postgresql.org
Attachment: cancel-unserviced-worker-requests-1.patch (text/x-diff; charset=us-ascii) [+70, -8]
On Wed, 23 Dec 2020 at 05:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Here's an attempt at closing the race condition discussed in [1]
> (and in some earlier threads, though I'm too lazy to find them).
>
> The core problem is that the bgworker management APIs were designed
> without any thought for exception conditions, notably "we're not
> gonna launch any more workers because we're shutting down the database".
> A process waiting for a worker in WaitForBackgroundWorkerStartup or
> WaitForBackgroundWorkerShutdown will wait forever, so that the database
> fails to shut down without manual intervention.
>
> I'd supposed that we would need some incompatible changes in those APIs
> in order to fix this, but after further study it seems like we could
> hack things by just acting as though a request that won't be serviced
> has already run to completion. I'm not terribly satisfied with that
> as a long-term solution --- it seems to me that callers should be told
> that there was a failure. But this looks to be enough to solve the
> lockup condition for existing callers, and it seems like it'd be okay
> to backpatch.
>
> Thoughts?
Callers who launch bgworkers already have to cope with conditions such as
the worker failing immediately after launch, or before attaching to the
shmem segment used for worker management by whatever extension is launching
it.
So I think it's reasonable to lie and say we launched it. The caller must
already cope with this case to behave correctly.
Patch specifics:
> This function should only be called from the postmaster

It'd be good to

    Assert(IsPostmasterEnvironment && !IsUnderPostmaster)

in these functions.
Otherwise at first read the patch and rationale looks sensible to me.
(When it comes to the bgw APIs in general I have a laundry list of things
I'd like to change or improve around signal handling, error trapping and
recovery, and lots more, but that's for another thread.)
On Wed, Dec 23, 2020 at 3:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Here's an attempt at closing the race condition discussed in [1]
> (and in some earlier threads, though I'm too lazy to find them).
>
> The core problem is that the bgworker management APIs were designed
> without any thought for exception conditions, notably "we're not
> gonna launch any more workers because we're shutting down the database".
> A process waiting for a worker in WaitForBackgroundWorkerStartup or
> WaitForBackgroundWorkerShutdown will wait forever, so that the database
> fails to shut down without manual intervention.
>
> I'd supposed that we would need some incompatible changes in those APIs
> in order to fix this, but after further study it seems like we could
> hack things by just acting as though a request that won't be serviced
> has already run to completion. I'm not terribly satisfied with that
> as a long-term solution --- it seems to me that callers should be told
> that there was a failure. But this looks to be enough to solve the
> lockup condition for existing callers, and it seems like it'd be okay
> to backpatch.
>
> Thoughts?
1) Yeah, the postmaster will not be able to start the bg workers in
the following cases, i.e. when bgworker_should_start_now returns
false. So we might encounter the hang issue.
static bool
bgworker_should_start_now(BgWorkerStartTime start_time)
{
    switch (pmState)
    {
        case PM_NO_CHILDREN:
        case PM_WAIT_DEAD_END:
        case PM_SHUTDOWN_2:
        case PM_SHUTDOWN:
        case PM_WAIT_BACKENDS:
        case PM_STOP_BACKENDS:
            break;
2) What if the postmaster enters a pmState >= PM_STOP_BACKENDS state
after it calls BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS)?
First of all, is it possible? I think yes, because we are in
sigusr1_handler(), and I don't see us blocking the signals that set
pmState >= PM_STOP_BACKENDS either in sigusr1_handler or in
BackgroundWorkerStateChange(). Though it's a small window, might we
still get into the hangup issue? If yes, can we check pmState in the
for loop in BackgroundWorkerStateChange()?
  if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
  {
-     BackgroundWorkerStateChange();
+     /* Accept new dynamic worker requests only if not stopping. */
+     BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS);
      StartWorkerNeeded = true;
  }
3) Can we always say that if bgw_restart_time is BGW_NEVER_RESTART,
then it's a dynamic bg worker? I think we can also have normal
bgworkers with the BGW_NEVER_RESTART flag (I see one in worker_spi.c's
_PG_init()); will there be any problem? Or do we need some comment
tweaking?
+     /*
+      * If this is a dynamic worker request, and we aren't allowing new
+      * dynamic workers, then immediately mark it for termination; the next
+      * stanza will take care of cleaning it up.
+      */
+     if (slot->worker.bgw_restart_time == BGW_NEVER_RESTART &&
+         !allow_new_workers)
+         slot->terminate = true;
4) IIUC, in the patch we mark slot->terminate = true only for
BGW_NEVER_RESTART kind bg workers. What happens if a bg worker has
bgw_restart_time set --- don't we hit the hanging issue (that we are
trying to solve here) for those bg workers?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> On Wed, Dec 23, 2020 at 3:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Here's an attempt at closing the race condition discussed in [1]
>> (and in some earlier threads, though I'm too lazy to find them).

> 2) What if the postmaster enters a pmState >= PM_STOP_BACKENDS state
> after it calls BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS)?
> First of all, is it possible? I think yes, because we are in
> sigusr1_handler(), and I don't see us blocking the signals that set
> pmState >= PM_STOP_BACKENDS either in sigusr1_handler or in
> BackgroundWorkerStateChange().
If you're asking whether the postmaster's signal handlers can interrupt
each other, they can't; see comment at the start of each one. If you're
wondering about the order of operations in sigusr1_handler, I agree that
seems wrong now. I'm inclined to move the BackgroundWorkerStateChange
call to be just before maybe_start_bgworkers(). That way it's after the
possible entry to PM_HOT_STANDBY state. The later steps can't change
pmState, except for PostmasterStateMachine, which would be responsible for
anything that needs to be done to bgworkers as a result of making a state
change.
> 3) Can we always say that if bgw_restart_time is BGW_NEVER_RESTART,
> then it's a dynamic bg worker?
That assumption's already baked into ResetBackgroundWorkerCrashTimes,
for one. Personally I'd have designed things with some more-explicit
indicator, but I'm not interested in revisiting those API decisions now;
any cleanup we might undertake would result in a non-back-patchable fix.
As far as whether it's formally correct to do this given the current
APIs, I think it is. We're essentially pretending that the worker
got launched and instantly exited. If it's BGW_NEVER_RESTART then that
would result in deregistration in any case, while if it's not that,
then the worker record should get kept for a possible later restart.
> I think we can also have normal
> bgworkers with the BGW_NEVER_RESTART flag (I see one in worker_spi.c's
> _PG_init()),
That might be a bug in worker_spi.c, but since that's only test code,
I don't care about it too much. Nobody's really going to care what
that module does in a postmaster shutdown.
> 4) IIUC, in the patch we mark slot->terminate = true only for
> BGW_NEVER_RESTART kind bg workers. What happens if a bg worker has
> bgw_restart_time set --- don't we hit the hanging issue (that we are
> trying to solve here) for those bg workers?
The hang problem is not with the worker itself, it's with anything
that might be waiting around for the worker to finish. It doesn't
seem to me to make a whole lot of sense to wait for a restartable
worker; what would that mean?
There's definitely room to revisit and clarify these APIs, and maybe
(if we don't change them altogether) add some Asserts about what are
sane combinations of properties. But my purpose today is just to get
a back-patchable bug fix, and that sort of change wouldn't fit in.
regards, tom lane
I wrote:
> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
>> 4) IIUC, in the patch we mark slot->terminate = true only for
>> BGW_NEVER_RESTART kind bg workers. What happens if a bg worker has
>> bgw_restart_time set --- don't we hit the hanging issue (that we are
>> trying to solve here) for those bg workers?

> The hang problem is not with the worker itself, it's with anything
> that might be waiting around for the worker to finish. It doesn't
> seem to me to make a whole lot of sense to wait for a restartable
> worker; what would that mean?
Upon further looking around, I noted that autoprewarm's
autoprewarm_start_worker() function does that, so we can't really
dismiss it.
However, what we can do instead is to change the condition to be
"cancel pending bgworker requests if there is a waiting process".
Almost all of the time, that means it's a dynamic bgworker with
BGW_NEVER_RESTART, so there's no difference. In the exceptional
cases like autoprewarm_start_worker, this would result in removing
a bgworker registration record for a restartable worker ... but
since we're shutting down, that record would have no effect before
the postmaster exits, anyway. I think we can live with that, at
least till such time as somebody redesigns this in a cleaner way.
I pushed a fix along those lines.
regards, tom lane
On Fri, 25 Dec 2020 at 06:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
>>> 4) IIUC, in the patch we mark slot->terminate = true only for
>>> BGW_NEVER_RESTART kind bg workers. What happens if a bg worker has
>>> bgw_restart_time set --- don't we hit the hanging issue (that we are
>>> trying to solve here) for those bg workers?

>> The hang problem is not with the worker itself, it's with anything
>> that might be waiting around for the worker to finish. It doesn't
>> seem to me to make a whole lot of sense to wait for a restartable
>> worker; what would that mean?

> Upon further looking around, I noted that autoprewarm's
> autoprewarm_start_worker() function does that, so we can't really
> dismiss it.
>
> However, what we can do instead is to change the condition to be
> "cancel pending bgworker requests if there is a waiting process".
> Almost all of the time, that means it's a dynamic bgworker with
> BGW_NEVER_RESTART, so there's no difference. In the exceptional
> cases like autoprewarm_start_worker, this would result in removing
> a bgworker registration record for a restartable worker ... but
> since we're shutting down, that record would have no effect before
> the postmaster exits, anyway. I think we can live with that, at
> least till such time as somebody redesigns this in a cleaner way.
>
> I pushed a fix along those lines.
Thanks for the change.
Cleanups like this in the BGW API definitely make life easier.