reorderbuffer: memory overconsumption with medium-size subxacts
Hello
Found this on Postgres 9.6, but I think it affects back to 9.4.
I've seen a case where reorderbuffer keeps very large amounts of memory
in use, without spilling to disk, if the main transaction makes few or
no changes and many subtransactions each make changes just below the
threshold to spill to disk.
The particular case we've seen is that the main transaction does one
UPDATE, and then each of many subtransactions does somewhere between 300
and 4000 changes.  Since all of these are below max_changes_in_memory,
nothing gets spilled to disk.  (To make matters worse: even if some
subxacts do more than max_changes_in_memory, only those subxacts are
spilled, not the whole transaction.)  This was causing a 16 GB machine
to die, unable to process the long transaction; we had to add an
additional 16 GB of physical RAM for the machine to be able to process
the transaction.
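To put rough numbers on it, here's a small standalone simulation of that
per-(sub)xact accounting; the 4096 threshold is meant to mirror
max_changes_in_memory, but the subxact count and bytes-per-change below
are invented purely for illustration:

#include <stdio.h>

/*
 * Toy standalone model of the per-(sub)xact accounting: each (sub)xact
 * has its own nentries_mem-style counter and is compared against the
 * spill threshold in isolation.  The 4096 threshold mirrors
 * max_changes_in_memory; the subxact count and bytes-per-change are
 * made-up illustrative figures, not measurements.
 */
#define SPILL_THRESHOLD		4096
#define CHANGES_PER_SUBXACT	4000	/* just below the threshold */
#define N_SUBXACTS			25000	/* one long xact with many subxacts */
#define BYTES_PER_CHANGE	200		/* rough, invented average */

int
main(void)
{
	long long	kept_in_memory = 0;
	int			spilled_subxacts = 0;
	int			i;

	for (i = 0; i < N_SUBXACTS; i++)
	{
		int			nentries_mem = CHANGES_PER_SUBXACT;

		/* The spill check only ever sees this one subxact's counter. */
		if (nentries_mem >= SPILL_THRESHOLD)
			spilled_subxacts++;
		else
			kept_in_memory += nentries_mem;
	}

	printf("subxacts spilled: %d\n", spilled_subxacts);
	printf("changes kept in memory: %lld (~%.1f GB at %d bytes each)\n",
		   kept_in_memory,
		   kept_in_memory * (double) BYTES_PER_CHANGE / (1024.0 * 1024 * 1024),
		   BYTES_PER_CHANGE);
	return 0;
}

With those made-up figures it reports zero spilled subxacts while keeping
about 100 million changes (roughly 19 GB at that invented change size)
in memory.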
I think there's a one-line fix, attached: just add the number of changes
in a subxact to the parent's nentries_mem when the subtransaction is
assigned to the parent.  Since a WAL ASSIGNMENT record is emitted once
every 32 subxacts, this accumulates only that many subxacts' changes in
memory before spilling, which is much more reasonable.  (Hmm, I wonder
why this happens every 32 subxacts, when the code seems to be using
PGPROC_MAX_CACHED_SUBXIDS, which is 64.)
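For comparison, the bound this is aiming for, with the same illustrative
figures (back-of-the-envelope only, not measured):

#include <stdio.h>

/*
 * Rough bound for the patched accounting, per the reasoning above:
 * between ASSIGNMENT records, up to ~32 subxacts, each just under the
 * 4096-change threshold, get added to the toplevel counter, so that is
 * roughly what accumulates in memory before spilling.  200 bytes per
 * change is an invented figure.
 */
int
main(void)
{
	long		worst_case = 32L * 4096;	/* subxacts per ASSIGNMENT * threshold */

	printf("worst case on the toplevel counter: %ld changes (~%.0f MB at 200 bytes each)\n",
		   worst_case, worst_case * 200.0 / (1024 * 1024));
	return 0;
}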
Hmm, while writing this I wonder whether this affects cases with many
levels of subtransactions.  I'm not sure how nested subxacts are handled
by reorderbuffer.c, but reading the code I think it is okay.
Of course, there's Tomas' logical_work_mem patch too, but that's too
invasive to backpatch.
--
Álvaro Herrera        PostgreSQL Expert, https://www.2ndQuadrant.com/
Attachments:
less-memory.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index f6ea315422..d5fdd7c6a6 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -869,6 +869,8 @@ ReorderBufferAssignChild(ReorderBuffer *rb, TransactionId xid,
 	/* Possibly transfer the subtxn's snapshot to its top-level txn. */
 	ReorderBufferTransferSnapToParent(txn, subtxn);
 
+	txn->nentries_mem += subtxn->nentries_mem;
+
 	/* Verify LSN-ordering invariant */
 	AssertTXNLsnOrder(rb);
 }
@@ -2333,7 +2335,6 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		spilled++;
 	}
 
-	Assert(spilled == txn->nentries_mem);
 	Assert(dlist_is_empty(&txn->changes));
 	txn->nentries_mem = 0;
 	txn->serialized = true;
On 12/16/18 4:06 PM, Alvaro Herrera wrote:
> Hello
> Found this on Postgres 9.6, but I think it affects back to 9.4.
> I've seen a case where reorderbuffer keeps very large amounts of memory
> in use, without spilling to disk, if the main transaction makes few or
> no changes and many subtransactions each make changes just below the
> threshold to spill to disk.
> The particular case we've seen is that the main transaction does one
> UPDATE, and then each of many subtransactions does somewhere between
> 300 and 4000 changes.  Since all of these are below
> max_changes_in_memory, nothing gets spilled to disk.  (To make matters
> worse: even if some subxacts do more than max_changes_in_memory, only
> those subxacts are spilled, not the whole transaction.)  This was
> causing a 16 GB machine to die, unable to process the long transaction;
> we had to add an additional 16 GB of physical RAM for the machine to be
> able to process the transaction.
Yeah.  We do the check for each xact separately, so it's vulnerable to
this scenario (each subxact staying just below max_changes_in_memory).
> I think there's a one-line fix, attached: just add the number of changes
> in a subxact to the parent's nentries_mem when the subtransaction is
> assigned to the parent.  Since a WAL ASSIGNMENT record is emitted once
> every 32 subxacts, this accumulates only that many subxacts' changes in
> memory before spilling, which is much more reasonable.  (Hmm, I wonder
> why this happens every 32 subxacts, when the code seems to be using
> PGPROC_MAX_CACHED_SUBXIDS, which is 64.)
Not sure, for a couple of reasons ...
Doesn't that essentially mean we'd always evict the toplevel xact(s),
including all subxacts, no matter how tiny those subxacts are?  That
seems like it might easily cause regressions for the common case with
many small subxacts and one huge subxact (which is the only one we'd
currently spill, I think).  That seems annoying.
But even if we decide it's the right approach, isn't the proposed patch
a couple of bricks shy?  It redefines the nentries_mem field from "per
(sub)xact" to "total" for the toplevel xact, but then updates it only
when assigning the child.  But the subxacts may receive changes after
that, so ReorderBufferQueueChange() probably needs to update the value
for the toplevel xact too, I guess.  And ReorderBufferCheckSerializeTXN()
should probably check the toplevel xact too ...
Maybe a simpler solution would be to simply track total number of
changes in memory (from all xacts), and then evict the largest one. But
I doubt that would be backpatchable - it's pretty much what the
logical_work_mem patch does.
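To sketch what that would mean, here's a toy standalone model of such a
scheme, i.e. one global counter plus evict-the-largest; the names,
budget and numbers are invented for illustration, and this is not the
logical_work_mem patch itself:

#include <stdio.h>

#define MEMORY_BUDGET	4096	/* total changes allowed in memory */
#define N_TXNS			8

typedef struct ToyTXN
{
	int			xid;
	long		nentries_mem;	/* changes currently held in memory */
} ToyTXN;

static ToyTXN txns[N_TXNS];
static long total_in_memory = 0;

/* Spill the single transaction currently holding the most changes. */
static void
spill_largest(void)
{
	ToyTXN	   *largest = &txns[0];
	int			i;

	for (i = 1; i < N_TXNS; i++)
		if (txns[i].nentries_mem > largest->nentries_mem)
			largest = &txns[i];

	printf("spilling xid %d (%ld changes)\n",
		   largest->xid, largest->nentries_mem);
	total_in_memory -= largest->nentries_mem;
	largest->nentries_mem = 0;
}

/* Queue one change and apply the global (not per-xact) memory check. */
static void
queue_change(ToyTXN *txn)
{
	txn->nentries_mem++;
	total_in_memory++;

	if (total_in_memory >= MEMORY_BUDGET)
		spill_largest();
}

int
main(void)
{
	long		c;
	int			i;

	for (i = 0; i < N_TXNS; i++)
		txns[i].xid = 1000 + i;

	/* Many medium-sized (sub)xacts: none would cross 4096 on its own. */
	for (c = 0; c < 20000; c++)
		queue_change(&txns[c % N_TXNS]);

	printf("changes left in memory: %ld\n", total_in_memory);
	return 0;
}

The point of the toy is only that the check is driven by the global
total, so it fires regardless of how the changes are spread over
(sub)xacts.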
And of course, addressing this on pg11 is a bit more complicated due to
the behavior of Generation context (see the logical_work_mem thread for
details).
> Hmm, while writing this I wonder whether this affects cases with many
> levels of subtransactions.  I'm not sure how nested subxacts are
> handled by reorderbuffer.c, but reading the code I think it is okay.
That should be OK. The assignments don't care about the intermediate
subxacts, they only care about the toplevel xact and current subxact.
> Of course, there's Tomas' logical_work_mem patch too, but that's too
> invasive to backpatch.
Yep.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2018-12-16 12:06:16 -0300, Alvaro Herrera wrote:
> Found this on Postgres 9.6, but I think it affects back to 9.4.
> I've seen a case where reorderbuffer keeps very large amounts of memory
> in use, without spilling to disk, if the main transaction makes few or
> no changes and many subtransactions each make changes just below the
> threshold to spill to disk.
> The particular case we've seen is that the main transaction does one
> UPDATE, and then each of many subtransactions does somewhere between
> 300 and 4000 changes.  Since all of these are below
> max_changes_in_memory, nothing gets spilled to disk.  (To make matters
> worse: even if some subxacts do more than max_changes_in_memory, only
> those subxacts are spilled, not the whole transaction.)  This was
> causing a 16 GB machine to die, unable to process the long transaction;
> we had to add an additional 16 GB of physical RAM for the machine to be
> able to process the transaction.
> I think there's a one-line fix, attached: just add the number of changes
> in a subxact to the parent's nentries_mem when the subtransaction is
> assigned to the parent.
Isn't this going to cause significant breakage, because we rely on
nentries_mem to be accurate?
/* try to load changes from disk */
if (entry->txn->nentries != entry->txn->nentries_mem)
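Purely as an illustration of that dependency, with made-up numbers: the
proposed one-liner does not touch the parent's nentries, so inflating
its nentries_mem makes the two diverge and the branch above would fire
even though nothing was spilled for that xact.

#include <stdio.h>

/*
 * Hypothetical illustration: "nentries != nentries_mem" is read as
 * "some of this txn's changes were spilled to disk".  Adding a
 * subxact's in-memory count to the parent's nentries_mem (but not to
 * its nentries) breaks that reading.
 */
int
main(void)
{
	long		parent_nentries = 1;		/* the single UPDATE */
	long		parent_nentries_mem = 1;
	long		subxact_nentries_mem = 4000;

	/* what the proposed one-liner does at assignment time */
	parent_nentries_mem += subxact_nentries_mem;

	if (parent_nentries != parent_nentries_mem)
		printf("would try to load changes from disk, though none were spilled\n");
	return 0;
}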
Greetings,
Andres Freund
On 2018-Dec-16, Andres Freund wrote:
> > I think there's a one-line fix, attached: just add the number of changes
> > in a subxact to the parent's nentries_mem when the subtransaction is
> > assigned to the parent.
> Isn't this going to cause significant breakage, because we rely on
> nentries_mem to be accurate?
> 	/* try to load changes from disk */
> 	if (entry->txn->nentries != entry->txn->nentries_mem)
Bahh.
Are you suggesting I should try a more thorough rewrite of
reorderbuffer.c?
--
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2018-12-16 17:30:30 -0300, Alvaro Herrera wrote:
> On 2018-Dec-16, Andres Freund wrote:
> > > I think there's a one-line fix, attached: just add the number of changes
> > > in a subxact to the parent's nentries_mem when the subtransaction is
> > > assigned to the parent.
> > Isn't this going to cause significant breakage, because we rely on
> > nentries_mem to be accurate?
> > 	/* try to load changes from disk */
> > 	if (entry->txn->nentries != entry->txn->nentries_mem)
> Bahh.
> Are you suggesting I should try a more thorough rewrite of
> reorderbuffer.c?
I'm saying you can't just randomly poke at one variable without actually
checking how it's used. I kinda doubt you're going to find something
that's immediately backpatchable here. Perhaps something for master
that's then backpatched after it has matured for a while.
Greetings,
Andres Freund