FailedAssertion at ReorderBufferCheckMemoryLimit()

Started by Fujii Masaoover 5 years ago5 messages
#1Fujii Masao
masao.fujii@oss.nttdata.com

Hi,

I encountered the following assertion failure when I changed
logical_decoding_work_mem to lower value while logical replication
is running. This happend in the master branch.

TRAP: FailedAssertion("rb->size < logical_decoding_work_mem * 1024L", File: "reorderbuffer.c", Line: 2403)
0 postgres 0x000000010755bf80 ExceptionalCondition + 160
1 postgres 0x00000001072d9f81 ReorderBufferCheckMemoryLimit + 257
2 postgres 0x00000001072d9b74 ReorderBufferQueueChange + 228
3 postgres 0x00000001072cd107 DecodeInsert + 391
4 postgres 0x00000001072cc4ef DecodeHeapOp + 335
5 postgres 0x00000001072cb9e4 LogicalDecodingProcessRecord + 196
6 postgres 0x000000010730bf06 XLogSendLogical + 166
7 postgres 0x000000010730b409 WalSndLoop + 217
8 postgres 0x00000001073075fc StartLogicalReplication + 716
9 postgres 0x0000000107305d38 exec_replication_command + 1192
10 postgres 0x000000010737ea7f PostgresMain + 2463
11 postgres 0x00000001072a9c4a BackendRun + 570
12 postgres 0x00000001072a907b BackendStartup + 475
13 postgres 0x00000001072a7fe1 ServerLoop + 593
14 postgres 0x00000001072a5a5a PostmasterMain + 5898
15 postgres 0x0000000107187b59 main + 761
16 libdyld.dylib 0x00007fff6c00c3d5 start + 1

ReorderBufferCheckMemoryLimit() explains that it relies on
the following (commented) assumption. But this seems incorrect
when logical_decoding_work_mem is decreased. I wonder if we may
need to keep evicting the transactions until we don't exceed
memory limit.

/*
* And furthermore, evicting the transaction should get us below the
* memory limit again - it is not possible that we're still exceeding the
* memory limit after evicting the transaction.
*
* This follows from the simple fact that the selected transaction is at
* least as large as the most recent change (which caused us to go over
* the memory limit). So by evicting it we're definitely back below the
* memory limit.
*/
Assert(rb->size < logical_decoding_work_mem * 1024L);

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Fujii Masao (#1)
Re: FailedAssertion at ReorderBufferCheckMemoryLimit()

On Tue, Jun 9, 2020 at 1:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

Hi,

I encountered the following assertion failure when I changed
logical_decoding_work_mem to lower value while logical replication
is running. This happend in the master branch.

TRAP: FailedAssertion("rb->size < logical_decoding_work_mem * 1024L", File: "reorderbuffer.c", Line: 2403)

..

ReorderBufferCheckMemoryLimit() explains that it relies on
the following (commented) assumption. But this seems incorrect
when logical_decoding_work_mem is decreased.

Yeah, that could be a problem.

I wonder if we may
need to keep evicting the transactions until we don't exceed
memory limit.

Yes, that should be the right fix here.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#2)
1 attachment(s)
Re: FailedAssertion at ReorderBufferCheckMemoryLimit()

On Tue, Jun 9, 2020 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 9, 2020 at 1:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I wonder if we may
need to keep evicting the transactions until we don't exceed
memory limit.

Yes, that should be the right fix here.

Can you please check whether the attached patch fixes the problem for you?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

fix_rb_mem_ovfl_1.patchapplication/octet-stream; name=fix_rb_mem_ovfl_1.patchDownload
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 84fcd29..8579794 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2359,7 +2359,8 @@ ReorderBufferLargestTXN(ReorderBuffer *rb)
 
 /*
  * Check whether the logical_decoding_work_mem limit was reached, and if yes
- * pick the transaction to evict and spill the changes to disk.
+ * pick the largest (sub)transaction  at-a-time to evict and spill its changes to
+ * disk until we reach under the memory limit.
  *
  * XXX At this point we select just a single (largest) transaction, but
  * we might also adapt a more elaborate eviction strategy - for example
@@ -2376,30 +2377,33 @@ ReorderBufferCheckMemoryLimit(ReorderBuffer *rb)
 		return;
 
 	/*
-	 * Pick the largest transaction (or subtransaction) and evict it from
-	 * memory by serializing it to disk.
+	 * Loop until we reach under the memory limit.  One might think that just
+	 * by evicting the largest (sub)transaction we will come under the memory
+	 * limit based on assumption that the selected transaction is at least as
+	 * large as the most recent change (which caused us to go over the memory
+	 * limit). However, that is not true because a user can reduce the
+	 * logical_decoding_work_mem to a smaller value before the most recent
+	 * change.
 	 */
-	txn = ReorderBufferLargestTXN(rb);
+	while (rb->size >= logical_decoding_work_mem * 1024L)
+	{
+		/*
+		 * Pick the largest transaction (or subtransaction) and evict it from
+		 * memory by serializing it to disk.
+		 */
+		txn = ReorderBufferLargestTXN(rb);
 
-	ReorderBufferSerializeTXN(rb, txn);
+		ReorderBufferSerializeTXN(rb, txn);
 
-	/*
-	 * After eviction, the transaction should have no entries in memory, and
-	 * should use 0 bytes for changes.
-	 */
-	Assert(txn->size == 0);
-	Assert(txn->nentries_mem == 0);
+		/*
+		 * After eviction, the transaction should have no entries in memory,
+		 * and should use 0 bytes for changes.
+		 */
+		Assert(txn->size == 0);
+		Assert(txn->nentries_mem == 0);
+	}
 
-	/*
-	 * And furthermore, evicting the transaction should get us below the
-	 * memory limit again - it is not possible that we're still exceeding the
-	 * memory limit after evicting the transaction.
-	 *
-	 * This follows from the simple fact that the selected transaction is at
-	 * least as large as the most recent change (which caused us to go over
-	 * the memory limit). So by evicting it we're definitely back below the
-	 * memory limit.
-	 */
+	/* We must be under the memory limit now. */
 	Assert(rb->size < logical_decoding_work_mem * 1024L);
 }
 
#4Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Amit Kapila (#3)
Re: FailedAssertion at ReorderBufferCheckMemoryLimit()

On 2020/06/10 12:00, Amit Kapila wrote:

On Tue, Jun 9, 2020 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 9, 2020 at 1:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I wonder if we may
need to keep evicting the transactions until we don't exceed
memory limit.

Yes, that should be the right fix here.

Can you please check whether the attached patch fixes the problem for you?

Thanks for the patch! The patch looks good to me.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Fujii Masao (#4)
Re: FailedAssertion at ReorderBufferCheckMemoryLimit()

On Wed, Jun 10, 2020 at 9:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/06/10 12:00, Amit Kapila wrote:

On Tue, Jun 9, 2020 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 9, 2020 at 1:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

I wonder if we may
need to keep evicting the transactions until we don't exceed
memory limit.

Yes, that should be the right fix here.

Can you please check whether the attached patch fixes the problem for you?

Thanks for the patch! The patch looks good to me.

Thanks, pushed!

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com