EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS
Hi,
While looking at a patch I noticed that SubPostmasterMain() for bgworkers
unconditionally does
/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
InitProcess();
which presents a problem, because StartBackgroundWorker() then does
/*
* If we're not supposed to have shared memory access, then detach from
* shared memory. If we didn't request shared memory access, the
* postmaster won't force a cluster-wide restart if we exit unexpectedly,
* so we'd better make sure that we don't mess anything up that would
* require that sort of cleanup.
*/
if ((worker->bgw_flags & BGWORKER_SHMEM_ACCESS) == 0)
{
ShutdownLatchSupport();
dsm_detach_all();
PGSharedMemoryDetach();
}
which presents a problem: We've initialized all kind of references to shared
memory, own a PGPROC, but have detached from shared memory.
In practice this will lead pretty quickly to a segfault, because process exit
will run proc_exit callbacks, which in turn will try to do a ProcKill(). Or
logging dereferences MyProc, or ...
It seems the above code block would need to at least do shmem_exit() before
the PGSharedMemoryDetach()?
This code has been introduced in
commit 4d155d8b08fe08c1a1649fdbad61c6dcf4a8671f
Author: Robert Haas <rhaas@postgresql.org>
Date: 2014-05-07 14:54:43 -0400
Detach shared memory from bgworkers without shmem access.
Since the postmaster won't perform a crash-and-restart sequence
for background workers which don't request shared memory access,
we'd better make sure that they can't corrupt shared memory.
Patch by me, review by Tom Lane.
but before that things were just slightly differently broken...
I tested both the crash and that a shmem_exit() fixes it with an ugly hack in
regress.c. I don't really know how to write a good testcase for this, given
that the communication facilities of a unconnected bgworker are quite
limited... I guess the bgworker could create files or something :(.
Greetings,
Andres Freund
Hi,
On 2021-08-01 23:51:16 -0700, Andres Freund wrote:
In practice this will lead pretty quickly to a segfault, because process exit
will run proc_exit callbacks, which in turn will try to do a ProcKill(). Or
logging dereferences MyProc, or ...It seems the above code block would need to at least do shmem_exit() before
the PGSharedMemoryDetach()?
I tested both the crash and that a shmem_exit() fixes it with an ugly hack in
regress.c. I don't really know how to write a good testcase for this, given
that the communication facilities of a unconnected bgworker are quite
limited... I guess the bgworker could create files or something :(.
Hm. There may be another way: Because BackgroundWorkerEntry() does not
actually need a lock, we could transport BackgroundWorkerData via
backend_save_variables(). To make the connected case work, we'd of course
continue to call CreateSharedMemoryAndSemaphores (and thus InitProcess()) for
those.
But that doesn't really seem better? Sure, it's nice to not need a pgproc
entry unnecessarily, but it's not like unconnected bgworkers are commonly
used and it does seem like it'd end up with more complicated code?
Greetings,
Andres Freund
On Mon, Aug 2, 2021 at 2:51 AM Andres Freund <andres@anarazel.de> wrote:
which presents a problem: We've initialized all kind of references to shared
memory, own a PGPROC, but have detached from shared memory.In practice this will lead pretty quickly to a segfault, because process exit
will run proc_exit callbacks, which in turn will try to do a ProcKill(). Or
logging dereferences MyProc, or ...It seems the above code block would need to at least do shmem_exit() before
the PGSharedMemoryDetach()?This code has been introduced in
commit 4d155d8b08fe08c1a1649fdbad61c6dcf4a8671f
Author: Robert Haas <rhaas@postgresql.org>
Date: 2014-05-07 14:54:43 -0400Detach shared memory from bgworkers without shmem access.
Since the postmaster won't perform a crash-and-restart sequence
for background workers which don't request shared memory access,
we'd better make sure that they can't corrupt shared memory.Patch by me, review by Tom Lane.
but before that things were just slightly differently broken...
If you're saying that this code has been 100% broken for 7 years and
nobody's noticed until now, then that suggests that nobody actually
uses non-shmem-connected bgworkers. I sort of hate to give up on that
concept but if we've really gone that many years without anyone
noticing obvious breakage then maybe we should.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
If you're saying that this code has been 100% broken for 7 years and
nobody's noticed until now, then that suggests that nobody actually
uses non-shmem-connected bgworkers. I sort of hate to give up on that
concept but if we've really gone that many years without anyone
noticing obvious breakage then maybe we should.
Well, the problem only exists on Windows so maybe this indeed
escaped notice. Still, this is good evidence that the case isn't
used *much*, and TBH I don't see many applications for it.
I can't say I'm excited about putting effort into fixing it.
regards, tom lane
Hi,
On 2021-08-02 11:00:49 -0400, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
If you're saying that this code has been 100% broken for 7 years and
nobody's noticed until now, then that suggests that nobody actually
uses non-shmem-connected bgworkers. I sort of hate to give up on that
concept but if we've really gone that many years without anyone
noticing obvious breakage then maybe we should.Well, the problem only exists on Windows so maybe this indeed
escaped notice.
Right. I did briefly look around and I didn't find bgworkers without
shmem attachement...
Still, this is good evidence that the case isn't used *much*, and TBH
I don't see many applications for it. I can't say I'm excited about
putting effort into fixing it.
Yea, I don't think it adds that much - without e.g. sharing a file
descriptor with the unconnected bgworker one can't implement something
like syslogger.
Greetings,
Andres Freund
On 2021-Aug-02, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
If you're saying that this code has been 100% broken for 7 years and
nobody's noticed until now, then that suggests that nobody actually
uses non-shmem-connected bgworkers. I sort of hate to give up on that
concept but if we've really gone that many years without anyone
noticing obvious breakage then maybe we should.Well, the problem only exists on Windows so maybe this indeed
escaped notice. Still, this is good evidence that the case isn't
used *much*, and TBH I don't see many applications for it.
I can't say I'm excited about putting effort into fixing it.
When I included this case I was thinking in tasks which would just run
stuff not directly connected to data. Something like a sub-daemon: say
a connection pooler, which is a bgworker just so that it starts and
stops together with postmaster, and share facilities like GUC
configuration and SIGHUP handling, etc. It doesn't look like anybody
has had an interest in developing such a thing, so if this is
obstructing your work, I don't object to removing the no-shmem case.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Hi,
On Mon, Aug 2, 2021, at 11:08, Alvaro Herrera wrote:
On 2021-Aug-02, Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
If you're saying that this code has been 100% broken for 7 years and
nobody's noticed until now, then that suggests that nobody actually
uses non-shmem-connected bgworkers. I sort of hate to give up on that
concept but if we've really gone that many years without anyone
noticing obvious breakage then maybe we should.Well, the problem only exists on Windows so maybe this indeed
escaped notice. Still, this is good evidence that the case isn't
used *much*, and TBH I don't see many applications for it.
I can't say I'm excited about putting effort into fixing it.When I included this case I was thinking in tasks which would just run
stuff not directly connected to data. Something like a sub-daemon: say
a connection pooler, which is a bgworker just so that it starts and
stops together with postmaster, and share facilities like GUC
configuration and SIGHUP handling, etc.
I think nearly all such cases are going to want some monitoring from within the database - which then needs shared memory.
And even if not - it's not really that useful to avoid a crash-restart if your worker dies with a segfault. There's no really legitimate reasons for completely unhandled errors even if not connected to shmem.
It doesn't look like anybody
has had an interest in developing such a thing, so if this is
obstructing your work, I don't object to removing the no-shmem case.
It's not obstructing me right now. I noticed it'd crash on EXEC_BACKEND when I tried to understand some code (see the nearby thread about straightening out process startup).
I do think there's some potential gains in simplicity and robustness that are made mildly harder by a subprocess that first attaches and detaches from shm (it's the only case where we can't easily unify the place InitProcess() is called between EB and ! EB right now). There's several ways that could be tackled. Removing the need to have that if obviously one of them.
Regards,
Andres
On 2021-Aug-02, Andres Freund wrote:
When I included this case I was thinking in tasks which would just run
stuff not directly connected to data. Something like a sub-daemon: say
a connection pooler, which is a bgworker just so that it starts and
stops together with postmaster, and share facilities like GUC
configuration and SIGHUP handling, etc.I think nearly all such cases are going to want some monitoring from
within the database - which then needs shared memory.
True. Observability for such things is critical (pgbouncer goes quite
some trouble to offer SQL-queryable views into its metrics), which kills
the argument.
I do think there's some potential gains in simplicity and robustness
that are made mildly harder by a subprocess that first attaches and
detaches from shm (it's the only case where we can't easily unify the
place InitProcess() is called between EB and ! EB right now). There's
several ways that could be tackled. Removing the need to have that if
obviously one of them.
Hmm, I don't remember that an shmem-unconnected bgworker first connected
to it and then let go. It seems weird to do it that way. My intention,
as far as I recall, is that they would just never connect to shmem,
period.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)
Hi,
On Mon, Aug 2, 2021, at 12:12, Alvaro Herrera wrote:
On 2021-Aug-02, Andres Freund wrote:
I do think there's some potential gains in simplicity and robustness
that are made mildly harder by a subprocess that first attaches and
detaches from shm (it's the only case where we can't easily unify the
place InitProcess() is called between EB and ! EB right now). There's
several ways that could be tackled. Removing the need to have that if
obviously one of them.Hmm, I don't remember that an shmem-unconnected bgworker first connected
to it and then let go. It seems weird to do it that way. My intention,
as far as I recall, is that they would just never connect to shmem,
period.
They currently do for EXEC_BACKEND. See SubPostmasterMain(). There the definition of the worker is passed via shared memory. So it does the full reattach thing, which requires lwlock, which requires PGPROC. We could get away without that by passing more through the variables file (either the worker definition or the address of the bgworker shmem piece).
Greetings,
Andres
On 2021-Aug-02, Andres Freund wrote:
On Mon, Aug 2, 2021, at 12:12, Alvaro Herrera wrote:
Hmm, I don't remember that an shmem-unconnected bgworker first connected
to it and then let go. It seems weird to do it that way. My intention,
as far as I recall, is that they would just never connect to shmem,
period.They currently do for EXEC_BACKEND. See SubPostmasterMain(). There the
definition of the worker is passed via shared memory. So it does the
full reattach thing, which requires lwlock, which requires PGPROC. We
could get away without that by passing more through the variables file
(either the worker definition or the address of the bgworker shmem
piece).
Ah, that makes sense. That doesn't sound super fragile, but it is odd
and it's probably a good argument for removing the feature, particularly
since nobody seems to be using it.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)
Hi,
On 2021-08-02 15:34:07 -0400, Alvaro Herrera wrote:
Ah, that makes sense. That doesn't sound super fragile, but it is odd
and it's probably a good argument for removing the feature, particularly
since nobody seems to be using it.
ISTM we concluded that we should remove unconnected workers. Writing a patch
to do so left me with two questions:
First, what do we want to do with BGWORKER_SHMEM_ACCESS? I'm inclined to treat
it as a required flag going forward. That way we don't silently start being
attached to shared memory in case somebody actually has a unattached
worker. And if we ever wanted to add the ability to have unattached workers
back, it'll also be easier this way. Perhaps it also has a small amount of
signalling value reminding people that they need to be careful...
The second question is what we want to do in the backbranches. I think the
reasonable options are to do nothing, or to make !BGWORKER_SHMEM_ACCESS an
error in SanityCheckBackgroundWorker() if EXEC_BACKEND is used.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
First, what do we want to do with BGWORKER_SHMEM_ACCESS? I'm inclined to treat
it as a required flag going forward.
+1
The second question is what we want to do in the backbranches. I think the
reasonable options are to do nothing, or to make !BGWORKER_SHMEM_ACCESS an
error in SanityCheckBackgroundWorker() if EXEC_BACKEND is used.
I think doing nothing is fine. Given the lack of complaints, we're
more likely to break something than fix anything useful.
regards, tom lane
Hi,
On 2021-08-05 20:02:02 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
First, what do we want to do with BGWORKER_SHMEM_ACCESS? I'm inclined to treat
it as a required flag going forward.+1
The second question is what we want to do in the backbranches. I think the
reasonable options are to do nothing, or to make !BGWORKER_SHMEM_ACCESS an
error in SanityCheckBackgroundWorker() if EXEC_BACKEND is used.I think doing nothing is fine. Given the lack of complaints, we're
more likely to break something than fix anything useful.
Done in the attached patch. I don't think we need to add more to the docs than
the flag being required?
Greetings,
Andres Freund
Attachments:
0001-Remove-support-for-background-workers-without-BGWORK.patchtext/x-diff; charset=us-asciiDownload+38-64
On Thu, Aug 5, 2021 at 8:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think doing nothing is fine. Given the lack of complaints, we're
more likely to break something than fix anything useful.
+1.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Aug 09, 2021 at 11:07:14AM -0400, Robert Haas wrote:
On Thu, Aug 5, 2021 at 8:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think doing nothing is fine. Given the lack of complaints, we're
more likely to break something than fix anything useful.+1.
FWIW, the only interesting case I have in my plugin box for a
background worker that does not attach to shared memory is a template
of worker able to catch signals, to be used as a base for simple
actions. So that's not really interesting. Making the SHMEM flag be
something mandatory on HEAD while doing nothing in the back-branches
sounds good to me, so +1.
--
Michael
On 2021-08-05 19:56:49 -0700, Andres Freund wrote:
Done in the attached patch. I don't think we need to add more to the docs than
the flag being required?
Pushed that patch now. If we want further additions to the docs we can
do so separately.