[PATCH] Release replication slot on error in SQL-callable slot functions
Hi Hackers,
SQL-callable replication slot functions acquire a slot (setting
the process-global MyReplicationSlot) but can then ERROR before reaching
ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL
EXCEPTION block (which uses a subtransaction), MyReplicationSlot remains
set because there is no subtransaction-level cleanup hook for replication
slots.
Any subsequent slot operation in the same session then hits
Assert(MyReplicationSlot == NULL) and crashes the backend on
assert-enabled builds. In release builds the stale MyReplicationSlot is
silently overwritten, permanently orphaning the old slot as "active".
The orphaned slot prevents any other session from acquiring it and
holds back vacuum and WAL removal.
Repro:
SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
DO $$ BEGIN
PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn);
EXCEPTION WHEN others THEN
RAISE NOTICE 'caught: %', SQLERRM;
END $$;
SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL);
2026-05-09 19:45:06.619 UTC [1096805] STATEMENT: SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805
Attached is a patch that addresses this by wrapping the error-prone paths
in PG_TRY/PG_CATCH blocks that call ReplicationSlotRelease().
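The failure and the proposed fix can be modeled outside the server with setjmp()/longjmp(). PG_TRY()/PG_CATCH() are built on roughly the same mechanism: the real macros in elog.h use sigsetjmp() and PG_exception_stack. This is a minimal sketch under that assumption. Slot, acquire_slot(), release_slot(), raise_error(), run_in_subxact() and the two advance_*() functions are hypothetical stand-ins, not PostgreSQL APIs.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for slot.c state; not the PostgreSQL API. */
typedef struct Slot
{
    bool active;                        /* models a set active_pid */
} Slot;

static Slot *MyReplicationSlot = NULL;
static jmp_buf *exception_stack = NULL; /* models PG_exception_stack */

static void
raise_error(void)                       /* models ereport(ERROR) */
{
    longjmp(*exception_stack, 1);
}

static void
acquire_slot(Slot *slot)                /* models ReplicationSlotAcquire() */
{
    assert(MyReplicationSlot == NULL);  /* the Assert from the TRAP */
    slot->active = true;
    MyReplicationSlot = slot;
}

static void
release_slot(void)                      /* models ReplicationSlotRelease() */
{
    MyReplicationSlot->active = false;
    MyReplicationSlot = NULL;
}

/* Unpatched shape: the ERROR escapes between acquire and release. */
static void
advance_unpatched(Slot *slot)
{
    acquire_slot(slot);
    raise_error();                      /* e.g. "cannot be advanced" */
    release_slot();                     /* never reached */
}

/* Patched shape: the PG_CATCH() equivalent releases, then re-throws. */
static void
advance_patched(Slot *slot)
{
    jmp_buf local;
    jmp_buf *saved = exception_stack;

    acquire_slot(slot);
    exception_stack = &local;           /* PG_TRY() */
    if (setjmp(local) == 0)
    {
        raise_error();
        release_slot();
    }
    else
    {
        exception_stack = saved;        /* PG_CATCH() */
        release_slot();
        raise_error();                  /* PG_RE_THROW() */
    }
    exception_stack = saved;            /* PG_END_TRY() */
}

/* Models a PL/pgSQL EXCEPTION block: swallows the ERROR, resets nothing. */
static bool
run_in_subxact(void (*fn) (Slot *), Slot *slot)
{
    jmp_buf local;
    jmp_buf *saved = exception_stack;
    bool caught = false;

    exception_stack = &local;
    if (setjmp(local) == 0)
        fn(slot);
    else
        caught = true;
    exception_stack = saved;
    return caught;
}
```

In the unpatched shape, the slot stays marked active after the EXCEPTION block returns. A second advance_unpatched() call would then trip the assert() in acquire_slot(), mirroring the TRAP in the log output. The patched shape clears MyReplicationSlot before re-throwing, so the next call starts clean.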
Thanks,
Satya
Attachments:
v1-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch (application/octet-stream, +94 -67)
On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
Thanks for the report and the patch!
I think wrapping the slot-processing code in PG_TRY()/PG_CATCH() is
a good direction for addressing the issue you reported.
+ PG_CATCH();
+ {
+ ReplicationSlotRelease();
When create_logical_replication_slot() is called with temporary = true,
the created logical replication slot has RS_TEMPORARY persistency. Such a slot
is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot is
dropped via ReplicationSlotDropAcquired().
So even with the v1 patch, a temporary logical replication slot can remain
unexpectedly if pg_create_logical_replication_slot() throws an error.
In this case, should create_logical_replication_slot() explicitly drop the slot
with ReplicationSlotDropAcquired(), or temporarily change the slot persistency
to RS_EPHEMERAL before calling ReplicationSlotRelease()?
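One possible answer to the RS_TEMPORARY question is to have the PG_CATCH() branch drop the just-created slot itself, instead of relying on ReplicationSlotRelease(). The sketch below models that option with the same kind of setjmp()/longjmp() stand-in for PG_TRY(). The RS_* names follow the mail. Slot, the_slot, create_slot(), drop_acquired(), release_slot() and call_in_subxact() are hypothetical. In this model, switching the persistency to RS_EPHEMERAL and then releasing would reach the same end state.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Persistency names follow the thread; everything else is a hypothetical model. */
typedef enum Persistency
{
    RS_PERSISTENT,
    RS_EPHEMERAL,
    RS_TEMPORARY
} Persistency;

typedef struct Slot
{
    bool        in_use;                 /* the slot exists */
    bool        active;                 /* acquired by this backend */
    Persistency persistency;
} Slot;

static Slot the_slot;                   /* a one-entry "slot array" */
static Slot *MyReplicationSlot = NULL;
static jmp_buf *exception_stack = NULL;

static void
raise_error(void)
{
    longjmp(*exception_stack, 1);
}

static void
drop_acquired(void)                     /* models ReplicationSlotDropAcquired() */
{
    MyReplicationSlot->in_use = false;
    MyReplicationSlot->active = false;
    MyReplicationSlot = NULL;
}

static void
release_slot(void)                      /* models ReplicationSlotRelease() */
{
    if (MyReplicationSlot->persistency == RS_EPHEMERAL)
    {
        drop_acquired();                /* ephemeral slots are dropped... */
        return;
    }
    MyReplicationSlot->active = false;  /* ...temporary ones are kept */
    MyReplicationSlot = NULL;
}

/* Models pg_create_logical_replication_slot(); "fail" injects an ERROR
 * after the slot exists, e.g. while loading the output plugin. */
static void
create_slot(bool temporary, bool fail)
{
    jmp_buf local;
    jmp_buf *saved = exception_stack;

    assert(MyReplicationSlot == NULL);
    the_slot = (Slot) {true, true, temporary ? RS_TEMPORARY : RS_EPHEMERAL};
    MyReplicationSlot = &the_slot;

    exception_stack = &local;           /* PG_TRY() */
    if (setjmp(local) == 0)
    {
        if (fail)
            raise_error();
        if (!temporary)
            MyReplicationSlot->persistency = RS_PERSISTENT; /* ReplicationSlotPersist() */
        release_slot();
    }
    else
    {
        exception_stack = saved;        /* PG_CATCH() */
        /* release_slot() would leave an RS_TEMPORARY slot behind. */
        if (MyReplicationSlot != NULL)
            drop_acquired();
        raise_error();                  /* PG_RE_THROW() */
    }
    exception_stack = saved;
}

/* Models a PL/pgSQL EXCEPTION block around the call. */
static bool
call_in_subxact(bool temporary, bool fail)
{
    jmp_buf local;
    jmp_buf *saved = exception_stack;
    bool caught = false;

    exception_stack = &local;
    if (setjmp(local) == 0)
        create_slot(temporary, fail);
    else
        caught = true;
    exception_stack = saved;
    return caught;
}
```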
Does a logical replication slot newly created by
pg_copy_logical_replication_slot() have the same issue?
+ PG_CATCH();
+ {
+ ReplicationSlotRelease();
Should ReplicationSlotRelease() be called only when MyReplicationSlot
is not NULL?
/* Acquire the slot so we "own" it */
ReplicationSlotAcquire(NameStr(*slotname), true, true);
- /* A slot whose restart_lsn has never been reserved cannot be advanced */
- if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))
- ereport(ERROR,
- (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("replication slot \"%s\" cannot be advanced",
- NameStr(*slotname)),
- errdetail("This slot has never previously reserved WAL, or it has been invalidated.")));
+ PG_TRY();
+ {
+ /* A slot whose restart_lsn has never been reserved cannot be advanced */
+ if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))
Shouldn't ReplicationSlotAcquire() also be moved inside the PG_TRY() block?
Because it can throw an error after setting MyReplicationSlot.
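Moving the acquire inside PG_TRY() and guarding the release addresses both questions. An ERROR raised before MyReplicationSlot is set leaves nothing to release, while one raised after it still needs the release. This is a minimal sketch using the same setjmp()/longjmp() model. The two failure points in acquire_slot() (slot owned by another backend, slot invalidated) are hypothetical simplifications, not the exact checks in ReplicationSlotAcquire(). Slot, advance() and run_in_subxact() are also illustrative names.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not the real slot.c structures or checks. */
typedef struct Slot
{
    bool busy;                          /* owned by another backend */
    bool invalidated;
    bool active;                        /* owned by this backend */
} Slot;

static Slot *MyReplicationSlot = NULL;
static jmp_buf *exception_stack = NULL;

static void
raise_error(void)
{
    longjmp(*exception_stack, 1);
}

/* Models ReplicationSlotAcquire(): can ERROR before or after it sets
 * MyReplicationSlot. */
static void
acquire_slot(Slot *slot)
{
    assert(MyReplicationSlot == NULL);
    if (slot->busy)
        raise_error();                  /* nothing owned yet */
    slot->active = true;
    MyReplicationSlot = slot;
    if (slot->invalidated)
        raise_error();                  /* the slot is now ours */
}

static void
release_slot(void)                      /* models ReplicationSlotRelease() */
{
    MyReplicationSlot->active = false;  /* dereferences NULL if nothing is held */
    MyReplicationSlot = NULL;
}

/* Models pg_replication_slot_advance() with the acquire inside PG_TRY(). */
static void
advance(Slot *slot)
{
    jmp_buf local;
    jmp_buf *saved = exception_stack;

    exception_stack = &local;           /* PG_TRY() */
    if (setjmp(local) == 0)
    {
        acquire_slot(slot);
        /* ... restart_lsn check and the advance itself, which may ERROR ... */
        release_slot();
    }
    else
    {
        exception_stack = saved;        /* PG_CATCH() */
        if (MyReplicationSlot != NULL)  /* acquire may have failed early */
            release_slot();
        raise_error();                  /* PG_RE_THROW() */
    }
    exception_stack = saved;
}

static bool
run_in_subxact(Slot *slot)              /* models a PL/pgSQL EXCEPTION block */
{
    jmp_buf local;
    jmp_buf *saved = exception_stack;
    bool caught = false;

    exception_stack = &local;
    if (setjmp(local) == 0)
        advance(slot);
    else
        caught = true;
    exception_stack = saved;
    return caught;
}
```

Without the NULL check, the busy case would reach release_slot() with MyReplicationSlot == NULL and dereference it.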
Regards,
--
Fujii Masao
On Mon, 11 May 2026 at 08:31, Fujii Masao <masao.fujii@gmail.com> wrote:
Additionally, pg_logical_slot_get_changes() has the same issue. It
can be reproduced with the following:
SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding');
DO $$
BEGIN
-- This ERRORs because the output plugin rejects the nonexistent option.
PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL, 'nonexistent-option', 'val');
EXCEPTION WHEN others THEN
RAISE NOTICE 'caught: %', SQLERRM;
END $$;
SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL);
TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 80308
postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba) [0x642e7b2ebae1]
postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e) [0x642e7b00d732]
Regards,
Vignesh
Hi
On Wed, May 20, 2026 at 11:49 PM vignesh C <vignesh21@gmail.com> wrote:
Thank you for letting me know. I will fix these cases in the next version
and send it shortly.
Thanks,
Satya
Thanks for reporting the issue. I could also reproduce it with all of
these:
pg_logical_slot_peek_changes
pg_logical_slot_get_binary_changes
pg_logical_slot_peek_binary_changes
thanks
Shveta
Hi
On Fri, May 22, 2026 at 2:16 AM shveta malik <shveta.malik@gmail.com> wrote:
Please find attached the v2 patch, which addresses these three cases as well.
Thanks,
Satya
Attachments:
v2-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch (application/octet-stream, +242 -72)
On Mon, May 25, 2026 at 12:42 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
Thank you for addressing these cases. A few comments:
1)
+-- Test 2: session remains usable after the error (MyReplicationSlot cleared)
It should be part of 'Test 1' itself, and thus should not be named 'Test 2'.
2)
--------
+-- Test 4: copy_replication_slot with max_replication_slots exceeded.
+-- We reduce max_replication_slots artificially by filling all remaining slots.
+-- Instead, trigger an error by copying to an already-existing name.
+DO $$
+BEGIN
+ PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_t3');
+EXCEPTION WHEN OTHERS THEN
+ RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+-- The original slot must still exist and be usable
+SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots
+ WHERE slot_name = 'regression_slot_t3';
-----------
I don't think the above test can hit the Assert (at least, I could
not make it do so). Creating the slot itself fails because a slot with the
same name already exists, so MyReplicationSlot is never set and the
Assert is not hit. The test case below is better. It fails in
LoadOutputPlugin() after the slot has been created and MyReplicationSlot
is already set.
SELECT pg_create_logical_replication_slot('src_slot', 'test_decoding');
DO $$
BEGIN
PERFORM pg_copy_logical_replication_slot('src_slot', 'dst_slot',
false, 'nonexistent_plugin');
EXCEPTION WHEN others THEN
RAISE NOTICE 'caught: %', SQLERRM;
END $$;
SELECT count(*) FROM pg_logical_slot_get_changes('src_slot', NULL, NULL);
3)
So overall these are the problematic APIs:
pg_create_logical_replication_slot
pg_replication_slot_advance
pg_copy_logical_replication_slot
pg_logical_slot_peek_binary_changes
pg_logical_slot_peek_changes
pg_logical_slot_get_changes
pg_logical_slot_get_binary_changes
The first three have independent fixes, for which we have added
test cases. The last four are addressed by fixing the common function
pg_logical_slot_get_changes_guts(). I think we should add a test case
for at least one of these four APIs to cover
pg_logical_slot_get_changes_guts().
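The four get/peek functions share pg_logical_slot_get_changes_guts(), so a single PG_TRY()/PG_CATCH() there covers all of them. One regression test through any wrapper then exercises that path. The following sketches this structure using the same setjmp()/longjmp() model. get_changes_guts(), the wrapper names and run_in_subxact() are hypothetical. The option check is also hypothetical: it stands in for test_decoding rejecting 'nonexistent-option' in the repro above.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model; not the real logicalfuncs.c code. */
typedef struct Slot
{
    bool active;
} Slot;

static Slot *MyReplicationSlot = NULL;
static jmp_buf *exception_stack = NULL;

static void
raise_error(void)
{
    longjmp(*exception_stack, 1);
}

static void
acquire_slot(Slot *slot)
{
    assert(MyReplicationSlot == NULL);
    slot->active = true;
    MyReplicationSlot = slot;
}

static void
release_slot(void)
{
    MyReplicationSlot->active = false;
    MyReplicationSlot = NULL;
}

/* Shared worker in the role of pg_logical_slot_get_changes_guts(). */
static void
get_changes_guts(Slot *slot, const char *option, bool confirm, bool binary)
{
    jmp_buf local;
    jmp_buf *saved = exception_stack;

    (void) confirm;                     /* get (true) vs. peek (false) */
    (void) binary;                      /* binary vs. text output */

    exception_stack = &local;           /* PG_TRY() */
    if (setjmp(local) == 0)
    {
        acquire_slot(slot);
        /* The output plugin rejects unknown options after acquisition. */
        if (option != NULL && strcmp(option, "include-xids") != 0)
            raise_error();
        release_slot();
    }
    else
    {
        exception_stack = saved;        /* PG_CATCH() */
        if (MyReplicationSlot != NULL)
            release_slot();
        raise_error();                  /* PG_RE_THROW() */
    }
    exception_stack = saved;
}

/* The four SQL-callable wrappers share the one cleanup path. */
static void get_changes(Slot *s, const char *o)         { get_changes_guts(s, o, true, false); }
static void peek_changes(Slot *s, const char *o)        { get_changes_guts(s, o, false, false); }
static void get_binary_changes(Slot *s, const char *o)  { get_changes_guts(s, o, true, true); }
static void peek_binary_changes(Slot *s, const char *o) { get_changes_guts(s, o, false, true); }

typedef void (*SlotFn) (Slot *, const char *);

/* Models a PL/pgSQL EXCEPTION block around one call. */
static bool
run_in_subxact(SlotFn fn, Slot *slot, const char *option)
{
    jmp_buf local;
    jmp_buf *saved = exception_stack;
    bool caught = false;

    exception_stack = &local;
    if (setjmp(local) == 0)
        fn(slot, option);
    else
        caught = true;
    exception_stack = saved;
    return caught;
}
```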
Thanks.
Shveta