Non-robustness in pmsignal.c
As I mentioned in another thread, I came across a reproducible
situation in which a memory clobber in a child backend crashes
the postmaster too, at least on FreeBSD/arm64. Needless to say,
this is Not Cool. I've now traced down what is happening,
and it's this:
1. Careless coding in aset.c causes it to decide to wipe_mem
the universe. (I'll have more to say about that separately;
the point of this thread is keeping the postmaster alive
afterwards.) Apparently, there's not any non-live memory
space between process-local memory and shared memory on this
platform, so the failing backend manages to trash shared memory
too before it finally hits SIGSEGV.
2. Most of the background processes die on something like
TRAP: FailedAssertion("latch->owner_pid == MyProcPid", File: "latch.c", Line: 686, PID: 5916)
or they encounter what seems to be a stuck spinlock. The postmaster,
however, SIGSEGVs. It's not supposed to do that; it is supposed to
be sufficiently arms-length from shared memory that it can recover
despite a backend trashing shared memory contents.
3. The cause of the SIGSEGV is that AssignPostmasterChildSlot
naively believes that it can trust PMSignalState->next_child_flag
to be a valid array index, so after that's been clobbered with
something like 0x7f7f7f7f we index off the end of memory.
I see no good reason for that state variable to be in shared memory
at all, so the attached patch just moves it to postmaster static
data. We also need a less-exposed copy of the array size variable.
4. That's enough to stop the SIGSEGV crash, but the postmaster
still fails to recover, because then it hits
elog(FATAL, "no free slots in PMChildFlags array");
since all of the array entries have been clobbered as well.
In the attached patch, I fixed this by treating the case similarly
to failure to fork a new child process. This seems to be enough
to let the postmaster survive, and recover after it starts noticing
crashing children.  (A rough sketch of both changes appears below, after
point 5.)
5. It's possible that we should take some proactive steps to get out
of the "no free slots" situation, rather than just wait for some
child to crash. I'm inclined not to, however. It'd be hard-to-test
corner-case code, and given the lack of field reports like this,
the situation must be awfully rare.
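To make points 3 and 4 concrete, here is a rough sketch of the intended
shape of the change -- illustrative only, not the attached patch.  The
names num_child_inuse and next_child_inuse are placeholders for the
postmaster-local copies, and returning zero for "no slot" is just one
possible convention for letting the caller treat it like a failed fork:

/*
 * Sketch only.  The postmaster keeps its own copy of the array size and
 * of the search cursor, so nothing a child can clobber is ever used as
 * an array index.
 */
static int	num_child_inuse;	/* postmaster-local copy of array size */
static int	next_child_inuse;	/* postmaster-local search cursor */

int
AssignPostmasterChildSlot(void)
{
	int			slot = next_child_inuse;
	int			n;

	/* scan for a free slot, using only local state for the indexing */
	for (n = num_child_inuse; n > 0; n--)
	{
		if (--slot < 0)
			slot = num_child_inuse - 1;
		if (PMSignalState->PMChildFlags[slot] == PM_CHILD_UNUSED)
		{
			PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
			next_child_inuse = slot;
			return slot + 1;
		}
	}

	/*
	 * Out of slots, possibly because shared memory was clobbered.  Report
	 * failure instead of elog(FATAL), so the postmaster can treat it like
	 * a failed fork and keep running.
	 */
	return 0;
}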
Thoughts?
regards, tom lane
Hi,
On 2022-10-07 19:57:35 -0400, Tom Lane wrote:
As I mentioned in another thread, I came across a reproducible
situation in which a memory clobber in a child backend crashes
the postmaster too, at least on FreeBSD/arm64. Needless to say,
this is Not Cool.
Ugh.
I've now traced down what is happening, and it's this:
1. Careless coding in aset.c causes it to decide to wipe_mem
the universe. (I'll have more to say about that separately;
the point of this thread is keeping the postmaster alive
afterwards.) Apparently, there's not any non-live memory
space between process-local memory and shared memory on this
platform, so the failing backend manages to trash shared memory
too before it finally hits SIGSEGV.
Perhaps it'd be worth marking a page or two inaccessible, via
mprotect(PROT_NONE), at the start and end of shared memory. I've wondered
about a debugging mode where we do that after separate shared memory
allocations even. But start/end would be something we could conceivably always
enable.
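A minimal standalone sketch of that guard-page idea, for illustration
only (the mmap-based setup and the function name are made up here;
PostgreSQL's actual shared-memory allocation works differently):

#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve one inaccessible page on each side of a region, so a runaway
 * memset walking off either end faults immediately instead of silently
 * scribbling past the boundary.
 */
static void *
alloc_region_with_guards(size_t region_size)
{
	size_t		pagesize = (size_t) sysconf(_SC_PAGESIZE);
	char	   *base;

	/* round the usable region up to a page multiple so the guards align */
	region_size = (region_size + pagesize - 1) & ~(pagesize - 1);

	base = mmap(NULL, region_size + 2 * pagesize, PROT_READ | PROT_WRITE,
				MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* first and last pages become PROT_NONE guard pages */
	if (mprotect(base, pagesize, PROT_NONE) != 0 ||
		mprotect(base + pagesize + region_size, pagesize, PROT_NONE) != 0)
	{
		munmap(base, region_size + 2 * pagesize);
		return NULL;
	}

	/* callers see only the usable region between the guards */
	return base + pagesize;
}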
2. Most of the background processes die on something like
TRAP: FailedAssertion("latch->owner_pid == MyProcPid", File: "latch.c", Line: 686, PID: 5916)
or they encounter what seems to be a stuck spinlock. The postmaster,
however, SIGSEGVs. It's not supposed to do that; it is supposed to
be sufficiently arms-length from shared memory that it can recover
despite a backend trashing shared memory contents.
3. The cause of the SIGSEGV is that AssignPostmasterChildSlot
naively believes that it can trust PMSignalState->next_child_flag
to be a valid array index, so after that's been clobbered with
something like 0x7f7f7f7f we index off the end of memory.
I see no good reason for that state variable to be in shared memory
at all, so the attached patch just moves it to postmaster static
data. We also need a less-exposed copy of the array size variable.
Those make sense to me.
4. That's enough to stop the SIGSEGV crash, but the postmaster
still fails to recover, because then it hits
elog(FATAL, "no free slots in PMChildFlags array");
since all of the array entries have been clobbered as well.
In the attached patch, I fixed this by treating the case similarly
to failure to fork a new child process. This seems to be enough
to let the postmaster survive, and recover after it starts noticing
crashing children.
Why are we even tracking PM_CHILD_UNUSED / PM_CHILD_ASSIGNED in shared memory?
ISTM those should live in postmaster local memory (maybe copied to shared
memory). PM_CHILD_ACTIVE and PM_CHILD_WALSENDER do have to live in shared
memory, but ...
Your fix seems ok. We really ought to deduplicate the way we start postmaster
children, but that's obviously work for another day.
5. It's possible that we should take some proactive steps to get out
of the "no free slots" situation, rather than just wait for some
child to crash. I'm inclined not to, however. It'd be hard-to-test
corner-case code, and given the lack of field reports like this,
the situation must be awfully rare.
Agreed.
Are you thinking these should be backpatched?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
Why are we even tracking PM_CHILD_UNUSED / PM_CHILD_ASSIGNED in shared memory?
Because those flags are set by the child processes too, cf
MarkPostmasterChildActive and MarkPostmasterChildInactive.
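(For reference, roughly what those two functions do -- a simplified
paraphrase of pmsignal.c, not verbatim:)

void
MarkPostmasterChildActive(void)
{
	int			slot = MyPMChildSlot;

	Assert(slot > 0 && slot <= PMSignalState->num_child_flags);
	slot--;
	Assert(PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED);
	/* the child itself flips its slot from ASSIGNED to ACTIVE ... */
	PMSignalState->PMChildFlags[slot] = PM_CHILD_ACTIVE;
}

void
MarkPostmasterChildInactive(void)
{
	int			slot = MyPMChildSlot;

	Assert(slot > 0 && slot <= PMSignalState->num_child_flags);
	slot--;
	Assert(PMSignalState->PMChildFlags[slot] == PM_CHILD_ACTIVE ||
		   PMSignalState->PMChildFlags[slot] == PM_CHILD_WALSENDER);
	/* ... and back again on exit, which is why the flags are shared */
	PMSignalState->PMChildFlags[slot] = PM_CHILD_ASSIGNED;
}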
Are you thinking these should be backpatched?
I am, but I'm not inclined to push this immediately before a wrap.
If we intend to wrap 15.0 on Monday then I'll wait till after that.
OTOH, if we slip that a week, I'd be okay with pushing in the
next day or two.
regards, tom lane
Hi,
On 2022-10-07 20:35:58 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Why are we even tracking PM_CHILD_UNUSED / PM_CHILD_ASSIGNED in shared memory?
Because those flags are set by the child processes too, cf
MarkPostmasterChildActive and MarkPostmasterChildInactive.
Only PM_CHILD_ACTIVE and PM_CHILD_WALSENDER though. We could afford another
MaxLivePostmasterChildren() sized array...
Are you thinking these should be backpatched?
I am, but I'm not inclined to push this immediately before a wrap.
+1
If we intend to wrap 15.0 on Monday then I'll wait till after that.
OTOH, if we slip that a week, I'd be okay with pushing in the
next day or two.
Makes sense.
- Andres
Andres Freund <andres@anarazel.de> writes:
On 2022-10-07 20:35:58 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Why are we even tracking PM_CHILD_UNUSED / PM_CHILD_ASSIGNED in shared memory?
Because those flags are set by the child processes too, cf
MarkPostmasterChildActive and MarkPostmasterChildInactive.
Only PM_CHILD_ACTIVE and PM_CHILD_WALSENDER though. We could afford another
MaxLivePostmasterChildren() sized array...
Oh, I see what you mean --- one private and one public array.
Maybe that makes more sense than what I did, not sure.
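(As an illustration of that split, with hypothetical names and not
necessarily the final patch: the postmaster would trust only its private
PMChildInUse[] array for the is-this-slot-free bookkeeping, and read the
shared flags merely to report whether the child shut down cleanly:)

static bool *PMChildInUse;		/* postmaster-private: slot handed out? */
static int	num_child_inuse;	/* postmaster-private copy of array size */

bool
ReleasePostmasterChildSlot(int slot)
{
	bool		cleanexit;

	Assert(slot > 0 && slot <= num_child_inuse);
	slot--;

	/*
	 * The shared flag tells us whether the child reset it to ASSIGNED
	 * before exiting; a clobbered or still-ACTIVE value just means an
	 * unclean exit, not something the postmaster needs to trust.
	 */
	cleanexit = (PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED);
	PMSignalState->PMChildFlags[slot] = PM_CHILD_UNUSED;

	/* the private array is what actually frees the slot for reuse */
	PMChildInUse[slot] = false;

	return cleanexit;
}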
I am, but I'm not inclined to push this immediately before a wrap.
+1
OK, I'll take a little more time on this and maybe code it up as
you suggest.
regards, tom lane
I wrote:
Andres Freund <andres@anarazel.de> writes:
Only PM_CHILD_ACTIVE and PM_CHILD_WALSENDER though. We could afford another
MaxLivePostmasterChildren() sized array...
Oh, I see what you mean --- one private and one public array.
Maybe that makes more sense than what I did, not sure.
Yeah, that's definitely a better way. I'll push this after the
release freeze lifts.
regards, tom lane
Hi,
On 2022-10-08 13:15:07 -0400, Tom Lane wrote:
I wrote:
Andres Freund <andres@anarazel.de> writes:
Only PM_CHILD_ACTIVE and PM_CHILD_WALSENDER though. We could afford another
MaxLivePostmasterChildren() sized array...
Oh, I see what you mean --- one private and one public array.
Maybe that makes more sense than what I did, not sure.
Yeah, that's definitely a better way. I'll push this after the
release freeze lifts.
Cool, thanks for exploring.
 /*
  * Signal handler to be notified if postmaster dies.
  */
@@ -142,7 +152,25 @@ PMSignalShmemInit(void)
 	{
 		/* initialize all flags to zeroes */
 		MemSet(unvolatize(PMSignalData *, PMSignalState), 0, PMSignalShmemSize());
-		PMSignalState->num_child_flags = MaxLivePostmasterChildren();
+		num_child_inuse = MaxLivePostmasterChildren();
+		PMSignalState->num_child_flags = num_child_inuse;
+
+		/*
+		 * Also allocate postmaster's private PMChildInUse[] array.  We
+		 * might've already done that in a previous shared-memory creation
+		 * cycle, in which case free the old array to avoid a leak.  (Do it
+		 * like this to support the possibility that MaxLivePostmasterChildren
+		 * changed.)  In a standalone backend, we do not need this.
+		 */
+		if (PostmasterContext != NULL)
+		{
+			if (PMChildInUse)
+				pfree(PMChildInUse);
+			PMChildInUse = (bool *)
+				MemoryContextAllocZero(PostmasterContext,
+									   num_child_inuse * sizeof(bool));
+		}
+		next_child_inuse = 0;
 	}
 }
When can PostmasterContext be NULL here, and why can we just continue without
(re-)allocating PMChildInUse?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
When can PostmasterContext be NULL here, and why can we just continue without
(re-)allocating PMChildInUse?
We'd only get into the !found stanza in a postmaster or a
standalone backend. A standalone backend isn't ever going to call
AssignPostmasterChildSlot or ReleasePostmasterChildSlot, so it
does not need the array; and it also doesn't have a PostmasterContext,
so there's not a good place to allocate the array either.
Perhaps there's a better way to distinguish am-I-a-postmaster,
but I thought checking PostmasterContext is fine since that ties
directly to what the code needs to do.
Yes, the code would malfunction if the PostmasterContext != NULL
condition changed from one cycle to the next, but that shouldn't
happen.
regards, tom lane