[BUG] Assert failure in ReorderBufferReturnTXN during logical decoding due to leaked specinsert change
Hi,
We recently encountered an issue with the Postgres server. After we dropped a publication
that was being used by a subscriber, the walsender process on the publisher started crashing
with an assertion failure in `ReorderBufferReturnTXN()` while decoding changes from the replication slot.
The error message:
"TRAP: failed Assert("txn->size == 0"), File: "reorderbuffer.c", Line: 494"
Issue observed in PostgreSQL 17.7 with assert-enabled builds.
TRAP: failed Assert("txn->size == 0"), File: "reorderbuffer.c", Line: 494, PID: 96941
0 postgres ExceptionalCondition + 216
1 postgres ReorderBufferReturnTXN + 284
2 postgres ReorderBufferCleanupTXN + 1292
3 postgres ReorderBufferProcessTXN + 4304
4 postgres ReorderBufferReplay + 284
5 postgres ReorderBufferCommit + 128
6 postgres DecodeCommit + 584
7 postgres xact_decode + 408
8 postgres LogicalDecodingProcessRecord + 192
9 postgres XLogSendLogical + 204
10 postgres WalSndLoop + 256
11 postgres StartLogicalReplication + 636
12 postgres exec_replication_command + 1384
13 postgres PostgresMain + 2476
14 postgres BackendInitialize + 0
15 postgres postmaster_child_launch + 304
16 postgres BackendStartup + 448
17 postgres ServerLoop + 372
18 postgres PostmasterMain + 6396
19 postgres startup_hacks + 0
20 dyld start + 7184
Simplified steps to reproduce:
Step 1: Create a table, set up a logical replication slot, and execute an `INSERT ... ON CONFLICT ... DO UPDATE` statement.
```
CREATE TABLE test_updates (id INT PRIMARY KEY, value TEXT);
SELECT * FROM pg_create_logical_replication_slot('testing_slot', 'pgoutput');
INSERT INTO test_updates (id, value) VALUES (1, 'first_insert')
ON CONFLICT(id) DO UPDATE SET value = excluded.value;
```
Step 2: Start logical decoding using `pg_logical_slot_peek_binary_changes` on the slot,
referencing a publication (`pub_1`) that does not exist.
```
SELECT *
FROM pg_logical_slot_peek_binary_changes('testing_slot', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub_1');
```
After Step 2, the Postgres server crashes with the assertion failure shown above.
Reason for server crash:
1. In `ReorderBufferProcessTXN()`, when a `REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT`
is encountered, the change is unlinked from the list by `dlist_delete()` and stored in `specinsert`.
2. Next, when a `REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM` is encountered,
the stored `specinsert` change is retrieved and applied as a regular insert change.
3. Since publication `pub_1` does not exist, an error is raised when applying the change
and caught by the `PG_CATCH` block in `ReorderBufferProcessTXN()`. The error call stack is:
```
ReorderBufferProcessTXN()
ReorderBufferApplyChange()
→ rb->apply_change() [pgoutput_change()]
→ get_rel_sync_entry()
→ LoadPublications()
→ GetPublicationByName()
→ get_publication_oid() ← throws ERROR here
```
4. In the `PG_CATCH` block, the `specinsert` change is cleaned up only for
streaming/prepared transactions (via `ReorderBufferResetTXN()`) but not for non-streaming transactions.
In the `else` branch at line `2697`, `ReorderBufferCleanupTXN()` is called, which iterates through
the remaining changes and frees them. However, since the `specinsert` change was already unlinked
from the list, it is never found and never freed. This leaked change is still accounted for in `txn->size`,
causing the assertion failure.
Proposed Solution: Free the pending `specinsert` change in the `else` branch before calling `ReorderBufferCleanupTXN()`.
Since the `specinsert` change has already been unlinked, `ReorderBufferCleanupTXN()` cannot find or free it.
Explicitly freeing it here prevents the `Assert(txn->size == 0)` failure in `ReorderBufferReturnTXN()`.
```
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2691,12 +2691,18 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
ReorderBufferResetTXN(rb, txn, snapshot_now,
command_id, prev_lsn,
specinsert);
}
else
{
+ /* Return the spec insert change before cleaning up the transaction */
+ if (specinsert != NULL)
+ {
+ ReorderBufferReturnChange(rb, specinsert, true);
+ specinsert = NULL;
+ }
ReorderBufferCleanupTXN(rb, txn);
MemoryContextSwitchTo(ecxt);
PG_RE_THROW();
}
}
PG_END_TRY();
```
Along with this fix, we have also added a TAP test to verify in patch.
Additional Suggestion:
Currently in `PG_CATCH` block, `specinsert` is only freed in the `ERRCODE_TRANSACTION_ROLLBACK` branch
for streaming or prepared transactions, via `ReorderBufferResetTXN()` at line 2691.
Would it make sense to move the freeing of `specinsert` before the if/else branch,
so that it is always freed regardless of the error path? This would avoid duplication and ensure
that `specinsert` is always cleaned up.
Regards,
Vishal Prasanna
Zoho Corporation
Attachments:
0001-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error.patchapplication/octet-stream; name=0001-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error.patchDownload+44-1
Dear Vishal,
Thanks for reporting the issue.
Your analysis looks correct. ReorderBufferProcessTXN() rarely fails, thus the
issue may be missed. Looks like the code path you pointed out exists from the
beginning, but I found your workload could reproduce only on PG17.
Regarding the PG16 and earlier versions, the problematic Assert() in
ReorderBufferReturnTXN() does not exist. The commit dbed2e3 added it, and
it was backpatched till PG17. We deallocate ReorderBufferTXN even if the size
is not zero for such versions.
Regarding PG18 and master, the provided workload did not cause an ERROR.
7c99dc5 changed the behavior to report a WARNING instead.
Anyway, I agreed that the leak could happen across all versions.
How about fixing the issue as shown below? Thought?
1. Add an Assert() for PG16-.
2. Make sure specinsert is released for all supported versions.
3. Consider a workload that could reproduce on PG18 and master.
3-1. If found, the test code can be added for all versions.
3-2. Otherwise, the test code can be added for PG17-.
Additional Suggestion:
Currently in `PG_CATCH` block, `specinsert` is only freed in the `ERRCODE_TRANSACTION_ROLLBACK` branch
for streaming or prepared transactions, via `ReorderBufferResetTXN()` at line 2691.
Would it make sense to move the freeing of `specinsert` before the if/else branch,
so that it is always freed regardless of the error path? This would avoid duplication and ensure
that `specinsert` is always cleaned up.
It looks OK for me. In this case an argument should be reduced from
ReorderBufferResetTXN(), right? It is harmless because the function is static one.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hi,
Thanks for the response.
Anyway, I agreed that the leak could happen across all versions.
How about fixing the issue as shown below? Thought?
1. Add an Assert() for PG16-.
Yeah, Adding `Assert(txn->size == 0)` to PG 14, 15, 16 would ensure everything
is cleaned up before freeing the `ReorderBufferTXN`. Consider adding this assert as well.
2. Make sure specinsert is released for all supported versions.
In PG 14 to 17, `ReorderBufferReturnChange()` is used to free the `ReorderBufferChange`.
In PG 18, this function was renamed to `ReorderBufferFreeChange()`.
Refer:
PG 14, 15, 16, 17: PG14-17-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patch
PG 18: PG18-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patch
The above changes cover the `specinsert` cleanup in the error path for all supported versions.
3. Consider a workload that could reproduce on PG18 and master.
3-1. If found, the test code can be added for all versions.
3-2. Otherwise, the test code can be added for PG17-.
Found a workload that can reproduce the issue across all supported versions, except PG 14.
For PG 14, since row_filter is not supported, so we can go with the 'publication does not exist' error instead.
Refer: PG14-Test-specinsert-cleanup-in-ReorderBufferProcessTXN-error-path.patch
For PG 15 - 18, using the row_filter option we can cause an error in the logical decoder.
Refer: PG15-18-Test-specinsert-cleanup-in-ReorderBufferProcessTXN-error-path.patch
Additional Suggestion:
Currently in `PG_CATCH` block, `specinsert` is only freed in the `ERRCODE_TRANSACTION_ROLLBACK` branch
for streaming or prepared transactions, via `ReorderBufferResetTXN()` at line 2691.
Would it make sense to move the freeing of `specinsert` before the if/else branch,
so that it is always freed regardless of the error path? This would avoid duplication and ensure
that `specinsert` is always cleaned up.
It looks OK for me. In this case an argument should be reduced from
ReorderBufferResetTXN(), right? It is harmless because the function is a static one.
Yes, the `specinsert` is no longer needed in `ReorderBufferResetTXN()`. Updated the patch where `specinsert` cleanup
is now handled in the `PG_CATCH()` block of `ReorderBufferProcessTXN()`, so it is always freed before the if/else branch.
Refer:
PG14-17-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patch
PG18-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patch
Regards,
Vishal Prasanna
Zoho Corporation
Attachments:
PG14-17-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchapplication/octet-stream; name=PG14-17-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchDownload+9-12
PG14-Test-specinsert-cleanup-in-ReorderBufferProcessTXN-error-path.patchapplication/octet-stream; name=PG14-Test-specinsert-cleanup-in-ReorderBufferProcessTXN-error-path.patchDownload+39-2
PG15-18-Test-specinsert-cleanup-in-ReorderBufferProcessTXN-error-path.patchapplication/octet-stream; name=PG15-18-Test-specinsert-cleanup-in-ReorderBufferProcessTXN-error-path.patchDownload+41-1
PG18-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchapplication/octet-stream; name=PG18-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchDownload+9-12
On 2026-Feb-25, Vishal Prasanna wrote:
It looks OK for me. In this case an argument should be reduced from
ReorderBufferResetTXN(), right? It is harmless because the function is a static one.Yes, the `specinsert` is no longer needed in
`ReorderBufferResetTXN()`. Updated the patch where `specinsert`
cleanup is now handled in the `PG_CATCH()` block of
`ReorderBufferProcessTXN()`, so it is always freed before the if/else
branch.
Please don't do this. Changing the argument list of an exported
function is an ABI break. That's an OK change to do in branch master
(to keep the interface clean), but for released branches it is not
welcome, because it causes problems for users that have extensions that
call the function and were compiled with its older definition.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Dear Álvaro, Vishal,
Yes, the `specinsert` is no longer needed in
`ReorderBufferResetTXN()`. Updated the patch where `specinsert`
cleanup is now handled in the `PG_CATCH()` block of
`ReorderBufferProcessTXN()`, so it is always freed before the if/else
branch.Please don't do this. Changing the argument list of an exported
function is an ABI break. That's an OK change to do in branch master
(to keep the interface clean), but for released branches it is not
welcome, because it causes problems for users that have extensions that
call the function and were compiled with its older definition.
To confirm, ReorderBufferResetTXN() seems a static function and proposed patch
does not modify *.h files. So they do not break the ABI and OK to remove, right?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Hayato, Álvaro,
Please don't do this. Changing the argument list of an exported
function is an ABI break. That's an OK change to do in branch master
(to keep the interface clean), but for released branches it is not
welcome, because it causes problems for users that have extensions that
call the function and were compiled with its older definition.To confirm, ReorderBufferResetTXN() seems a static function and proposed patch
does not modify *.h files. So they do not break the ABI and OK to remove, right?
```
File: reorderbuffer.c
2156: /*
2157: * Helper function for ReorderBufferProcessTXN to handle the concurrent
2158: * abort of the streaming transaction. This resets the TXN such that it
2159: * can be used to stream the remaining data of transaction being processed.
2160: * This can happen when the subtransaction is aborted and we still want to
2161: * continue processing the main or other subtransactions data.
2162: */
2163: static void
2164: ReorderBufferResetTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
```
Confirmed, `ReorderBufferResetTXN()` is a static function used only by ReorderBufferProcessTXN().
It is not exposed outside reorderbuffer.c, and the patch does not affect any public exported function.
Regards,
Vishal Prasanna
Zoho Corporation
Dear Vishal,
Thanks for the patch.
Found a workload that can reproduce the issue across all supported versions, except PG 14.
For PG 14, since row_filter is not supported, so we can go with the 'publication does not exist' error instead.
Refer: PG14-Test-specinsert-cleanup-in-ReorderBufferProcessTXN-error-path.patchFor PG 15 - 18, using the row_filter option we can cause an error in the logical decoder.
Per my understanding, for PG16-, the provided workload cannot cause an assertion
failure because it misses the Assert() in the ReorderBufferReturnTXN(), right?
Adding the line is essential, otherwise the test could pass even without the fix.
Below contains more comments. I mainly checked on PG18, so something might not
be suitable for others.
01.
I think it is better to combine tests actual code patches into one. Because they
would be done when patches are committed.
02.
This issue can happen even on HEAD, but PG18-Fix-specinsert... cannot be applied
atop the branch. Can you create it as well?
03. 100_bugs.pl
Other tests start from the comment like "The bug...", but it does not follow.
Can we update?
04. 100_bugs.pl
Can we just rotate a log instead of starting new instance? It might be faster.
05. 100_bugs.pl for PG
```
# The publication row filter WHERE ((a / 0) > 0) will trigger a division by zero error.
```
I think the comment can be improved like:
Create a publication with the zero-division row filter. It always throws an
ERROR before publishing changes, when the filter is evaluated.
Please see attached my top-up patch for PG18, it addresses comments 03-05.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
kuroda.diffsapplication/octet-stream; name=kuroda.diffsDownload+8-5
Hi Hayato,
Found a workload that can reproduce the issue across all supported versions, except PG 14.
For PG 14, since row_filter is not supported, so we can go with the 'publication does not exist' error instead.
Refer: PG14-Test-specinsert-cleanup-in-ReorderBufferProcessTXN-error-path.patchFor PG 15 - 18, using the row_filter option we can cause an error in the logical decoder.
Per my understanding, for PG16-, the provided workload cannot cause an assertion
failure because it misses the Assert() in the ReorderBufferReturnTXN(), right?
Adding the line is essential, otherwise the test could pass even without the fix.
Right. Added `Assert(txn->size == 0)` in `ReorderBufferReturnTXN()` for PG14, 15, 16 patches.
01.
I think it is better to combine tests actual code patches into one. Because they
would be done when patches are committed.
combined the fix and test into a single patch. since some changes are version specific
(like add assert check for PG14-16, different function names from PG17 -> 18, test case)
separate patches are provided per version. PG15 and PG16 share the same patch.
02.
This issue can happen even on HEAD, but PG18-Fix-specinsert... cannot be applied
atop the branch. Can you create it as well?
created a separate patch for master.
03. 100_bugs.pl
Other tests start from the comment like "The bug...", but it does not follow.
Can we update?04. 100_bugs.pl
Can we just rotate a log instead of starting new instance? It might be faster.
05. 100_bugs.pl for PG
```
# The publication row filter WHERE ((a / 0) > 0) will trigger a division by zero error.
```I think the comment can be improved like:
Create a publication with the zero-division row filter. It always throws an
ERROR before publishing changes, when the filter is evaluated.Please see attached my top-up patch for PG18, it addresses comments 03-05.
Applied your suggestions across all versions. Thanks.
Overall:
PG14: Assert check + fix + test (uses 'pub does not exist' error)
PG15-16: Assert check + fix + test (uses row_filter error)
PG17: fix + test (uses row_filter error)
PG18, master: fix + test (uses row_filter error)
Regards,
Vishal Prasanna
Zoho Corporation
Attachments:
master-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchapplication/octet-stream; name=master-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchDownload+54-12
PG14-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchapplication/octet-stream; name=PG14-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchDownload+54-13
PG15-16-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchapplication/octet-stream; name=PG15-16-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchDownload+57-12
PG17-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchapplication/octet-stream; name=PG17-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchDownload+54-12
PG18-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchapplication/octet-stream; name=PG18-Fix-specinsert-leak-in-ReorderBufferProcessTXN-error-path.patchDownload+54-12
Dear Vishal,
Thanks for updating the patch. They look good to me except one minor point.
```
@@ -2663,6 +2655,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
if (using_subtxn)
RollbackAndReleaseCurrentSubTransaction();
+ /* Free the specinsert change before freeing the ReorderBufferTXN */
+ if (specinsert != NULL)
+ {
+ ReorderBufferReturnChange(rb, specinsert, true);
+ specinsert = NULL;
+ }
```
In PG17-, we seem to use the term "return" to deallocate the change. Should we follow that?
I have no strong opinion for it.
I have no comments anymore. I registered your patch in commitfest [1]https://commitfest.postgresql.org/patch/6569/: and marked as
"Ready for Committer" not to forget. If needed you can register your name as author.
[1]: https://commitfest.postgresql.org/patch/6569/
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hi Hayato,
```
@@ -2663,6 +2655,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
if (using_subtxn)
RollbackAndReleaseCurrentSubTransaction();+ /* Free the specinsert change before freeing the ReorderBufferTXN */ + if (specinsert != NULL) + { + ReorderBufferReturnChange(rb, specinsert, true); + specinsert = NULL; + } ```In PG17-, we seem to use the term "return" to deallocate the change. Should we follow that?
I have no strong opinion for it.
Internally, `ReorderBufferReturnChange()` frees the change, which is why comment uses "Free".
Either term is fine for me.
Thanks for the review and for registering the patch.
Regards,
Vishal Prasanna
Zoho Corporation
Dear Vishal,
I noticed commitfest app got angry your patch, but your patch could still be applied.
Per my understanding, cfbot tried to apply all patches to master but failed.
Here are fixed version, only suffix is changed for some patches.
Best regards,
Hayato Kuroda
FUJITSU LIMITED