Memory context can be its own parent and child in replication command
Hi,
While running some tests with logical replication, I've run in a
situation where a walsender process was stuck in an infinite loop with
the following backtrace:
#0 in MemoryContextDelete (...) at ../src/backend/utils/mmgr/mcxt.c:474
#1 in exec_replication_command (cmd_string=... "BEGIN") at
../src/backend/replication/walsender.c:2005
Which matches the following while loop:
while (curr->firstchild != NULL)
curr = curr->firstchild;
Inspecting the memory context, I have the following:
$9 = (MemoryContext) 0xafb741c35360
(gdb) p *curr
$10 = {type = T_AllocSetContext, isReset = false, allowInCritSection =
false, mem_allocated = 8192, methods = 0xafb7278a82d8
<mcxt_methods+216>, parent = 0xafb741c35360, firstchild =
0xafb741c35360, prevchild = 0x0, nextchild = 0x0, name =
0xafb7275f68a8 "Replication command context", ident = 0x0, reset_cbs =
0x0}
So the memory context is 0xafb741c35360, which is also the same value
for parent and firstchild. This explains the infinite loop as
MemoryContextDelete tries to find the leaf with no children.
I was able to get a rr recording of the issue and trace how this
happened. This can be reproduced by triggering 2 replication commands,
with the first one doing a snapshot export:
CREATE_REPLICATION_SLOT "test_slot" LOGICAL "test_decoding" ( SNAPSHOT
"export");
DROP_REPLICATION_SLOT "test_slot";
- CreateReplicationSlot will start a new transaction to handle the
snapshot export.
- This transaction will save the replication command context (let's
call it ctx1) in its state.
- ctx1 is deleted at the end of exec_replication_command
- During the next replication command, the transaction is aborted in
SnapBuildClearExportedSnapshot
- The transaction restores ctx1 as the CurrentMemoryContext
- AllocSetContextCreate is called with ctx1 as a parent, and it will
pull ctx1 from the freelist
- ctx1's parent and child will be set to ctx1 and returned
- During ctx1 deletion, it will be stuck in an infinite loop
I've added a tap test to reproduce the issue, along with an assertion
during context creation to check the parent and returned context are
not the same so the test would immediately abort and not stay stuck.
To fix this, it seems like saving and restoring the memory context
after the call AbortCurrentTransaction was the best approach. It is
similar to what's done with CurrentResourceOwner. I've thought of
switching to the TopMemoryContext before exporting the snapshot so
aborting the transaction will switch back to TopMemoryContext, but
this would still require restoring the memory context after the
transaction is aborted.
Regards,
Anthonin Bonnefoy
Attachments:
v01-0001-Avoid-using-deleted-context-with-replication-com.patchapplication/octet-stream; name=v01-0001-Avoid-using-deleted-context-with-replication-com.patchDownload+47-1
The issue seems to have been introduced by 2129d184a206c when
transactionState->priorContext was added and is only present in HEAD.
On Mon, Mar 10, 2025 at 09:07:59AM +0100, Anthonin Bonnefoy wrote:
The issue seems to have been introduced by 2129d184a206c when
transactionState->priorContext was added and is only present in HEAD.
It seems to me that you mean 1afe31f03cd2, no?
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Mon, Mar 10, 2025 at 09:07:59AM +0100, Anthonin Bonnefoy wrote:
The issue seems to have been introduced by 2129d184a206c when
transactionState->priorContext was added and is only present in HEAD.
It seems to me that you mean 1afe31f03cd2, no?
I dunno about this patch: it seems to me it's doing things exactly
backwards, namely trying to restore the CurrentMemoryContext after
a transaction abort to what it was just beforehand. With only
a slightly different set of suppositions, this is introducing
use of a deleted context rather than removing it.
I'm inclined to suspect that the real problem is somebody installed
the wrong context as current just before the transaction was started.
I don't have the energy to look closer right now, though.
Independently of where the actual bug is there, it seems not
nice that it's so easy to bollix the memory context data
structures. I wonder if we can make that more detectable
at reasonable cost.
regards, tom lane
On Tue, Mar 11, 2025 at 1:09 AM Michael Paquier <michael@paquier.xyz> wrote:
It seems to me that you mean 1afe31f03cd2, no?
Yes, that was a bad copy/paste from me.
On Tue, Mar 11, 2025 at 2:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I dunno about this patch: it seems to me it's doing things exactly
backwards, namely trying to restore the CurrentMemoryContext after
a transaction abort to what it was just beforehand. With only
a slightly different set of suppositions, this is introducing
use of a deleted context rather than removing it.
Yeah, I'm not super happy about the fix either. This leaves the issue
that CurrentMemoryContext is set to a deleted context, even if
temporarily.
I'm inclined to suspect that the real problem is somebody installed
the wrong context as current just before the transaction was started.
I don't have the energy to look closer right now, though.
The context installed is the replication command context created at
the beginning of exec_replication_command. This context is deleted at
the end of the function while the transaction is still running. Maybe
a better fix would be to not delete the Replication command context
when a transaction was started to handle the export, and switch back
to this context at the start of the next exec_replication_command?
This way, the memory context referenced by priorContext would still be
valid when the transaction is aborted.
This would also require the Replication command context to not be
attached under the MessageContext to avoid being deleted at the end of
the query cycle.
Independently of where the actual bug is there, it seems not
nice that it's so easy to bollix the memory context data
structures. I wonder if we can make that more detectable
at reasonable cost.
One thing that made it hard to trace the issue was that there's no way
to know if a MemoryContext was deleted or not. Setting the
MemoryContext's node type to T_Invalid during reset could be a way to
tag the context as deleted. And this will be able to leverage the
existing MemoryContextIsValid check and additional
MemoryContextIsValid checks could be added before restoring the
priorContext.
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -527,6 +527,7 @@ MemoryContextDeleteOnly(MemoryContext context)
context->methods->delete_context(context);
+ context->type = T_Invalid;
VALGRIND_DESTROY_MEMPOOL(context);
}
However, when testing this on my mac, it seems to trigger a heap
corruption during initdb.
postgres(9057,0x200690840) malloc: Heap corruption detected, free list
is damaged at 0x60000322bb10
*** Incorrect guard value: 276707563012096
postgres(9057,0x200690840) malloc: *** set a breakpoint in
malloc_error_break to debug
While it ran fine on linux. I didn't have time to check why yet.
On Tue, Mar 11, 2025 at 10:26 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
--- a/src/backend/utils/mmgr/mcxt.c +++ b/src/backend/utils/mmgr/mcxt.c @@ -527,6 +527,7 @@ MemoryContextDeleteOnly(MemoryContext context)context->methods->delete_context(context);
+ context->type = T_Invalid;
VALGRIND_DESTROY_MEMPOOL(context);
}However, when testing this on my mac, it seems to trigger a heap
corruption during initdb.
Which is normal since this would only work with AllocSet as other
delelte_context methods would just free the context... A better spot
would be just before putting the context in the freelist in
AllocSetDelete.
I've updated the patch with another approach:
0001: This sets the MemoryContext type to T_Invalid just before adding
it to the aset freelist, allowing to distinguish between a
MemoryContext that was placed in the freelist and shouldn't be used,
and a valid one. The existing MemoryContextIsValid checks will be
triggered if a MemoryContext in the freelist is used.
I've also added additional MemoryContextIsValid to make sure the
MemoryContext restored by a transaction is valid.
0002: Keep the replication command alive when there's an ongoing
snapshot export. This way, the context restored is valid and can be
reused. This requires the replication command context to be created
under the TopMemoryContext.
Regards,
Anthonin Bonnefoy
Attachments:
v02-0001-Add-additional-memory-context-checks.patchapplication/octet-stream; name=v02-0001-Add-additional-memory-context-checks.patchDownload+13-3
v02-0002-Avoid-using-deleted-context-with-replication-com.patchapplication/octet-stream; name=v02-0002-Avoid-using-deleted-context-with-replication-com.patchDownload+89-15
Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> writes:
I've updated the patch with another approach:
I got back to looking at this today. I still don't like very much
about it. After thinking for awhile, I concur that we want to create
the cmd_context under TopMemoryContext, but I think we can make things
a great deal simpler and less fragile by just keeping it in existence
indefinitely, as attached. The fundamental problem here is that
exec_replication_command is creating and deleting the context without
regard to where transaction boundaries are. I don't think the best
way to deal with that is to require it to know where those boundaries
are.
I also don't like the proposed test script. The test case alone would
be fine, but spinning up an entire new instance seems a rather huge
overhead to carry forevermore for a single test that likely will never
find anything again. I tried to cram the test case into one of the
existing scripts, but it was hard to find one where it would fit
naturally. An obvious candidate is subscription/t/100_bugs.pl, but
the test failed there for lack of access to the test_decoding plugin.
Perhaps we could use another plugin, but I didn't try hard.
As for the memory-context-bug-catching changes, I'm not very convinced
by the idea of setting context->type = T_Invalid. That would help
if someone tries to reference the context while it's still in the
freelist, but not if the reference is from re-use of a stale pointer
after the context has been reassigned (which IIUC is the case here).
I also think that decorating a few MemoryContextSwitchTos with
assertions looks pretty random. I wonder if we should content
ourselves with adding some assertions that catch attempts to make
a context be its own parent. It looks like MemoryContextSetParent
already does that, so the only new assert would be in
MemoryContextCreate.
regards, tom lane
Attachments:
v3-preserve-replication-cmd-context.patchtext/x-diff; charset=us-ascii; name=v3-preserve-replication-cmd-context.patchDownload+33-9
I wrote:
... An obvious candidate is subscription/t/100_bugs.pl, but
the test failed there for lack of access to the test_decoding plugin.
Perhaps we could use another plugin, but I didn't try hard.
Ah, I realized we could just use pgoutput, since the test doesn't
actually do anything that would exercise the plugin. So here's
a v4 with the test restored.
regards, tom lane
Attachments:
v4-0001-Use-the-same-cmd_context-throughout-a-walsender-s.patchtext/x-diff; charset=us-ascii; name*0=v4-0001-Use-the-same-cmd_context-throughout-a-walsender-s.p; name*1=atchDownload+51-10
Hi,
So here's a v4 with the test restored.
I tested this patch, it fixes the issue reported. It passes Github CI tests.
already does that, so the only new assert would be in
MemoryContextCreate.
+1 for adding the assertion to increase the chances of this bug being
caught by memory context infrastructure.
I had the following comment.
Why do we do this:
- MemoryContext old_context;
+ MemoryContext old_context = CurrentMemoryContext;
Instead of implementing it as done in the previous version of this code,
i.e.
old_context = MemoryContextSwitchTo(cmd_context);
Thank you,
Rahila Syed
Rahila Syed <rahilasyed90@gmail.com> writes:
+1 for adding the assertion to increase the chances of this bug being
caught by memory context infrastructure.
I verified that an assertion inside MemoryContextCreate would catch
this bug, so I added one (in 5ec8b01c3), and then pushed the main
fix at 80b727eb9.
I had the following comment.
Why do we do this: - MemoryContext old_context; + MemoryContext old_context = CurrentMemoryContext; Instead of implementing it as done in the previous version of this code, i.e. old_context = MemoryContextSwitchTo(cmd_context);
Because capturing old_context at that point is too late. The crux
of the bug is exactly that SnapBuildClearExportedSnapshot may have
changed CurrentMemoryContext. We could have avoided that by
reordering the code, or done something like what Anthonin did in v02.
But this way is perfectly idiomatic IMO. There are some dozens of
similar usages elsewhere in our code, and it makes very sure that
old_context captures the caller's context even if we rearrange things
some more inside exec_replication_command.
regards, tom lane