Fix race in ReplicationSlotRelease for ephemeral slots

Started by Zhijie Hou (Fujitsu)17 days ago15 messageshackers
Jump to latest
#1Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com

Hi,

While testing the slot release logic, I noticed a bug in
ReplicationSlotRelease() where it may access a replication slot array entry that
has already been released by itself.

The detail is: When releasing an ephemeral replication slot,
ReplicationSlotRelease() first drops the slot via ReplicationSlotDropAcquired().
After this point, the slot's shared memory slot array entry can be immediately
reused by another backend creating a new slot.

However, ReplicationSlotRelease() continued executing common cleanup code that
still dereferenced the old slot pointer and updated shared memory fields such as
effective_xmin. If the slot array entry had already been reallocated, these
writes could inadvertently affect a different, unrelated slot.

I am attaching a patch that avoids touching slot shared-memory state after
dropping an ephemeral slot. Keep the post-release shared-memory updates only for
non-ephemeral slots, where the slot remains valid after release.

To reproduce, we can use the following steps:

1. Attach gdb to the backend and set a breakpoint in ReplicationSlotRelease()
right after ReplicationSlotDropAcquired() is called.
2. Create an ephemeral slot in the above backend with an invalid output plugin:
SELECT pg_create_logical_replication_slot('test_slot_dropped', 'pgoutput2', false, false, true);
3. Once the breakpoint is hit, start another backend and create a new slot
named 'test_slot_created'.
4. Release the breakpoint and allow the first backend to continue. At this
point, you will see it updating the new slot 'test_slot_created' -> active_proc
(and effective_xmin, if a snapshot is being exported) to invalid values.
5. Start a third backend and attempt to acquire the same slot
'test_slot_created' ? this should not be possible under normal circumstances,
but the bug allows it.

I haven't attached a test for this fix, as the change is straightforward and the
likelihood of encountering this bug is low, so it may not be worth adding test
cycles for it. However, if others feel differently, I'm OK to add one.

Best Regards,
Hou zj

Attachments:

v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.patchapplication/octet-stream; name=v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.patchDownload+35-34
#2Fujii Masao
masao.fujii@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#1)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

On Wed, May 27, 2026 at 8:50 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

Hi,

While testing the slot release logic, I noticed a bug in
ReplicationSlotRelease() where it may access a replication slot array entry that
has already been released by itself.

The detail is: When releasing an ephemeral replication slot,
ReplicationSlotRelease() first drops the slot via ReplicationSlotDropAcquired().
After this point, the slot's shared memory slot array entry can be immediately
reused by another backend creating a new slot.

However, ReplicationSlotRelease() continued executing common cleanup code that
still dereferenced the old slot pointer and updated shared memory fields such as
effective_xmin. If the slot array entry had already been reallocated, these
writes could inadvertently affect a different, unrelated slot.

Good catch!

I am attaching a patch that avoids touching slot shared-memory state after
dropping an ephemeral slot. Keep the post-release shared-memory updates only for
non-ephemeral slots, where the slot remains valid after release.

Thanks for the patch! It looks good to me.
Barring any objections, I will commit it.

I haven't attached a test for this fix, as the change is straightforward and the
likelihood of encountering this bug is low, so it may not be worth adding test
cycles for it.

+1

Regards,

--
Fujii Masao

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#2)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

On Fri, May 29, 2026 at 5:09 PM Fujii Masao <masao.fujii@gmail.com> wrote:

I am attaching a patch that avoids touching slot shared-memory state after
dropping an ephemeral slot. Keep the post-release shared-memory updates only for
non-ephemeral slots, where the slot remains valid after release.

Thanks for the patch! It looks good to me.
Barring any objections, I will commit it.

Since this fix should be backpatched to all supported branches,
I prepared patches for them. Attached.

Regards,

--
Fujii Masao

Attachments:

pg18-v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.txttext/plain; charset=US-ASCII; name=pg18-v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.txtDownload+35-34
pg14-pg16-v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.txttext/plain; charset=US-ASCII; name=pg14-pg16-v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.txtDownload+27-26
pg17-v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.txttext/plain; charset=US-ASCII; name=pg17-v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.txtDownload+39-38
#4Srinath Reddy Sadipiralla
srinath2133@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#1)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

Hi,

On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
wrote:

Hi,

While testing the slot release logic, I noticed a bug in
ReplicationSlotRelease() where it may access a replication slot array
entry that
has already been released by itself.

The detail is: When releasing an ephemeral replication slot,
ReplicationSlotRelease() first drops the slot via
ReplicationSlotDropAcquired().
After this point, the slot's shared memory slot array entry can be
immediately
reused by another backend creating a new slot.

However, ReplicationSlotRelease() continued executing common cleanup code
that
still dereferenced the old slot pointer and updated shared memory fields
such as
effective_xmin. If the slot array entry had already been reallocated, these
writes could inadvertently affect a different, unrelated slot.

I am attaching a patch that avoids touching slot shared-memory state after
dropping an ephemeral slot. Keep the post-release shared-memory updates
only for
non-ephemeral slots, where the slot remains valid after release.

To reproduce, we can use the following steps:

1. Attach gdb to the backend and set a breakpoint in
ReplicationSlotRelease()
right after ReplicationSlotDropAcquired() is called.
2. Create an ephemeral slot in the above backend with an invalid output
plugin:
SELECT pg_create_logical_replication_slot('test_slot_dropped',
'pgoutput2', false, false, true);
3. Once the breakpoint is hit, start another backend and create a new slot
named 'test_slot_created'.
4. Release the breakpoint and allow the first backend to continue. At this
point, you will see it updating the new slot 'test_slot_created' ->
active_proc
(and effective_xmin, if a snapshot is being exported) to invalid values.
5. Start a third backend and attempt to acquire the same slot
'test_slot_created' ? this should not be possible under normal
circumstances,
but the bug allows it.

patch LGTM.

I haven't attached a test for this fix, as the change is straightforward
and the
likelihood of encountering this bug is low, so it may not be worth adding
test
cycles for it. However, if others feel differently, I'm OK to add one.

+1 for a test. The fix is just an else, so a future refactor could change
it and silently
reintroduce the corruption, since it scribbles on an unrelated reused slot,
nothing
would catch it. Injection points make it deterministic; I've attached a
diff patch that adds
a test that fails without the fix and passes with it.

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

Attachments:

nocfbot-test.patchapplication/octet-stream; name=nocfbot-test.patchDownload+82-0
#5Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Srinath Reddy Sadipiralla (#4)
RE: Fix race in ReplicationSlotRelease for ephemeral slots

On Saturday, May 30, 2026 1:44 AM Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:

On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) <mailto:houzj.fnst@fujitsu.com> wrote:
I haven't attached a test for this fix, as the change is straightforward and the
Likelihood of encountering this bug is low, so it may not be worth adding test
cycles for it. However, if others feel differently, I'm OK to add one.

+1 for a test. The fix is just an else, so a future refactor could change it and silently
reintroduce the corruption, since it scribbles on an unrelated reused slot, nothing
would catch it. Injection points make it deterministic; I've attached a diff patch that adds
a test that fails without the fix and passes with it.

Thanks for the test.

I'm not sure if adding an injection point for this rare case is worthwhile. Even
if we were to add one, future refactoring of that function could shift the
position of the injection point, so its long-term usefulness is uncertain. I
don't have a strong opinion on this, so I'll leave it to Fujii-San to decide.

Best Regards,
Hou zj

#6Xuneng Zhou
xunengzhou@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#5)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

Hi,

On Sat, May 30, 2026 at 4:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Saturday, May 30, 2026 1:44 AM Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:

On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) <mailto:houzj.fnst@fujitsu.com> wrote:
I haven't attached a test for this fix, as the change is straightforward and the
Likelihood of encountering this bug is low, so it may not be worth adding test
cycles for it. However, if others feel differently, I'm OK to add one.

+1 for a test. The fix is just an else, so a future refactor could change it and silently
reintroduce the corruption, since it scribbles on an unrelated reused slot, nothing
would catch it. Injection points make it deterministic; I've attached a diff patch that adds
a test that fails without the fix and passes with it.

Thanks for the test.

I'm not sure if adding an injection point for this rare case is worthwhile. Even
if we were to add one, future refactoring of that function could shift the
position of the injection point, so its long-term usefulness is uncertain. I
don't have a strong opinion on this, so I'll leave it to Fujii-San to decide.

Thanks for reporting this issue. I can reproduce it with lldb on Mac.

postgres=# SELECT pg_create_logical_replication_slot(
postgres(# 'test_slot_dropped',
postgres(# 'pgoutput2',
postgres(# false,
postgres(# false,
postgres(# true
postgres(# );
ERROR: could not access file "pgoutput2": No such file or directory

787 * decoding be disabled.
788 */
789 ReplicationSlotDropAcquired(is_logical);
-> 790 }
791
792 /*
793 * If slot needed to temporarily restrain both data and catalog xmin to
Target 0: (postgres) stopped.
(lldb) expr -- slot->data.name.data
(char[64]) $0 = "test_slot_created"
(lldb) expr -- slot->data.persistency
(ReplicationSlotPersistency) $1 = RS_PERSISTENT
(lldb) expr -- slot->active_proc
(ProcNumber) $2 = 126

The fix looks good to me.

There's an adjacent bug around drop_local_obsolete_slots. The root
cause of them looks similar -- ReplicationSlot * is a pointer to a
reusable shared-memory array cell, not a durable identity for the same
slot. In drop_local_obsolete_slots, the issue is that the slot has
been freed after ReplicationSlotDropAcquired(false); however, another
backend may reuse the same cell before the unlock/log reads. This
seems less severe -- it does not normally corrupt slot state, because
the code only read after the drop. But it can still unlock/log
misusing the identity of a different slot. Attached a test using
injection point to reproduce it and a patch to fix it.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

Attachments:

repro_slotsync_stale_pointer.pltext/x-perl-script; charset=US-ASCII; name=repro_slotsync_stale_pointer.plDownload
v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patchapplication/octet-stream; name=v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patchDownload+16-10
#7Fujii Masao
masao.fujii@gmail.com
In reply to: Xuneng Zhou (#6)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

On Tue, Jun 2, 2026 at 3:00 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

Hi,

On Sat, May 30, 2026 at 4:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

I'm not sure if adding an injection point for this rare case is worthwhile. Even
if we were to add one, future refactoring of that function could shift the
position of the injection point, so its long-term usefulness is uncertain. I
don't have a strong opinion on this, so I'll leave it to Fujii-San to decide.

I've pushed the patch. Thanks!

IMO the proposed test looks a bit too narrow, so I'm not sure it's worth
adding at this point. For now, I've committed only the code fix.

There's an adjacent bug around drop_local_obsolete_slots. The root
cause of them looks similar -- ReplicationSlot * is a pointer to a
reusable shared-memory array cell, not a durable identity for the same
slot. In drop_local_obsolete_slots, the issue is that the slot has
been freed after ReplicationSlotDropAcquired(false); however, another
backend may reuse the same cell before the unlock/log reads. This
seems less severe -- it does not normally corrupt slot state, because
the code only read after the drop. But it can still unlock/log
misusing the identity of a different slot. Attached a test using
injection point to reproduce it and a patch to fix it.

Thanks again for the report and patch!

  /* Drop the local slot if it is not required to be retained. */
  if (!local_sync_slot_required(local_slot, remote_slot_list))
  {
+ bool dropped = false;
+ NameData slot_name = {0};
+ Oid slot_database = local_slot->data.database;
  bool synced_slot;

Is it really safe to read slot_database before acquiring the database lock?

BTW, I'm also wondering whether it's safe for
local_sync_slot_required() to access local_slot without holding a lock.

Regards,

--
Fujii Masao

#8Xuneng Zhou
xunengzhou@gmail.com
In reply to: Fujii Masao (#7)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

Hi Fujii-san,

On Wed, Jun 3, 2026 at 8:03 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Jun 2, 2026 at 3:00 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

Hi,

On Sat, May 30, 2026 at 4:14 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

I'm not sure if adding an injection point for this rare case is worthwhile. Even
if we were to add one, future refactoring of that function could shift the
position of the injection point, so its long-term usefulness is uncertain. I
don't have a strong opinion on this, so I'll leave it to Fujii-San to decide.

I've pushed the patch. Thanks!

IMO the proposed test looks a bit too narrow, so I'm not sure it's worth
adding at this point. For now, I've committed only the code fix.

There's an adjacent bug around drop_local_obsolete_slots. The root
cause of them looks similar -- ReplicationSlot * is a pointer to a
reusable shared-memory array cell, not a durable identity for the same
slot. In drop_local_obsolete_slots, the issue is that the slot has
been freed after ReplicationSlotDropAcquired(false); however, another
backend may reuse the same cell before the unlock/log reads. This
seems less severe -- it does not normally corrupt slot state, because
the code only read after the drop. But it can still unlock/log
misusing the identity of a different slot. Attached a test using
injection point to reproduce it and a patch to fix it.

Thanks again for the report and patch!

/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list))
{
+ bool dropped = false;
+ NameData slot_name = {0};
+ Oid slot_database = local_slot->data.database;
bool synced_slot;

Is it really safe to read slot_database before acquiring the database lock?

Reading slot_database before taking the database lock seems not
inherently unsafe by itself. The comment suggests that the lock is
primarily used to prevent conflicts with the startup process running
ReplicationSlotsDropDBSlots() during db-drop replay; it does not
protect replication slot array reuse.

The unsafe part could be reading slot_database from local_slot after
ReplicationSlotControlLock has been released. At this point, the slot
array cell may already have been freed and reused, so the value read
may no longer belong to the slot that get_local_synced_slots()
originally collected. As a result, we could end up locking the wrong
database.

There seems to be two related issues:

1) Before drop: reading local_slot->data.database /
local_slot->data.name after the slot-array lock was released, before
verifying the cell still represents the same synced slot.

2) After drop: calling ReplicationSlotDropAcquired(false) and then
reading local_slot->data.database / local_slot->data.name for
unlocking/logging after the cell may have been reused by another
backend.

The prior patch targets to fix 2), but leaves 1) unfixed.

BTW, I'm also wondering whether it's safe for
local_sync_slot_required() to access local_slot without holding a lock.

I share the same concern here. The issue smells similar to the one
discussed above -- reading values from the array cell directly without
the protection of array lock. To solve them altogether, it might be
better to stop carrying raw ReplicationSlot * values across
unprotected windows. We can get the snapshot values such as slot name
& database oid from get_local_synced_slots() instead. Then
local_sync_slot_required() works from the snapshot, and
drop_local_obsolete_slots() uses the copied database OID to take the
database lock. Before dropping, it should revalidate by copied
name/database that the slot is still a synced logical slot, then
acquire/drop by copied name, and use only copied values for
unlock/logging. I plan to prepare a refactoring patch for this. Does
that seem like the right direction to you?

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

#9Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Xuneng Zhou (#8)
RE: Fix race in ReplicationSlotRelease for ephemeral slots

On Friday, June 5, 2026 8:45 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

On Wed, Jun 3, 2026 at 8:03 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Jun 2, 2026 at 3:00 PM Xuneng Zhou <xunengzhou@gmail.com>

wrote:

/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list))
{
+ bool dropped = false;
+ NameData slot_name = {0};
+ Oid slot_database = local_slot->data.database;
bool synced_slot;

Is it really safe to read slot_database before acquiring the database lock?

Reading slot_database before taking the database lock seems not
inherently unsafe by itself. The comment suggests that the lock is
primarily used to prevent conflicts with the startup process running
ReplicationSlotsDropDBSlots() during db-drop replay; it does not
protect replication slot array reuse.

The unsafe part could be reading slot_database from local_slot after
ReplicationSlotControlLock has been released. At this point, the slot
array cell may already have been freed and reused, so the value read
may no longer belong to the slot that get_local_synced_slots()
originally collected. As a result, we could end up locking the wrong
database.

There seems to be two related issues:

1) Before drop: reading local_slot->data.database /
local_slot->data.name after the slot-array lock was released, before
verifying the cell still represents the same synced slot.

I recall condition (1) is considered acceptable, since the database lock is
released immediately after re-verifying that the slot is no longer the original
'synced' one anyway. Additionally, this race can only occur when replaying a
DROP DATABASE, which is rare in practice. Since we only take a shared lock, it
does not seem to cause real issues.

2) After drop: calling ReplicationSlotDropAcquired(false) and then
reading local_slot->data.database / local_slot->data.name for
unlocking/logging after the cell may have been reused by another
backend.

Right. I missed this race condition during implementation and agree it's a real
issue. An even bigger problem is that we could release a lock on the wrong
database if the slot entry is reused after being dropped. I think we should fix
this by saving the database OID at the beginning, as your patch does.

The prior patch targets to fix 2), but leaves 1) unfixed.

BTW, I'm also wondering whether it's safe for
local_sync_slot_required() to access local_slot without holding a lock.

I share the same concern here. The issue smells similar to the one
discussed above -- reading values from the array cell directly without
the protection of array lock.

To solve them altogether, it might be
better to stop carrying raw ReplicationSlot * values across
unprotected windows. We can get the snapshot values such as slot name
& database oid from get_local_synced_slots() instead. Then
local_sync_slot_required() works from the snapshot, and
drop_local_obsolete_slots() uses the copied database OID to take the
database lock. Before dropping, it should revalidate by copied
name/database that the slot is still a synced logical slot, then
acquire/drop by copied name, and use only copied values for
unlock/logging. I plan to prepare a refactoring patch for this. Does
that seem like the right direction to you?

Saving only the name and database OID would force us to search for the slot
again in the loop, which was one reason we didn't implement it that way.

Best Regards,
Hou zj

#10Xuneng Zhou
xunengzhou@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#9)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

On Sat, Jun 6, 2026 at 5:35 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Friday, June 5, 2026 8:45 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

On Wed, Jun 3, 2026 at 8:03 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Jun 2, 2026 at 3:00 PM Xuneng Zhou <xunengzhou@gmail.com>

wrote:

/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list))
{
+ bool dropped = false;
+ NameData slot_name = {0};
+ Oid slot_database = local_slot->data.database;
bool synced_slot;

Is it really safe to read slot_database before acquiring the database lock?

Reading slot_database before taking the database lock seems not
inherently unsafe by itself. The comment suggests that the lock is
primarily used to prevent conflicts with the startup process running
ReplicationSlotsDropDBSlots() during db-drop replay; it does not
protect replication slot array reuse.

The unsafe part could be reading slot_database from local_slot after
ReplicationSlotControlLock has been released. At this point, the slot
array cell may already have been freed and reused, so the value read
may no longer belong to the slot that get_local_synced_slots()
originally collected. As a result, we could end up locking the wrong
database.

There seems to be two related issues:

1) Before drop: reading local_slot->data.database /
local_slot->data.name after the slot-array lock was released, before
verifying the cell still represents the same synced slot.

I recall condition (1) is considered acceptable, since the database lock is
released immediately after re-verifying that the slot is no longer the original
'synced' one anyway. Additionally, this race can only occur when replaying a
DROP DATABASE, which is rare in practice. Since we only take a shared lock, it
does not seem to cause real issues.

2) After drop: calling ReplicationSlotDropAcquired(false) and then
reading local_slot->data.database / local_slot->data.name for
unlocking/logging after the cell may have been reused by another
backend.

Right. I missed this race condition during implementation and agree it's a real
issue. An even bigger problem is that we could release a lock on the wrong
database if the slot entry is reused after being dropped. I think we should fix
this by saving the database OID at the beginning, as your patch does.

The prior patch targets to fix 2), but leaves 1) unfixed.

BTW, I'm also wondering whether it's safe for
local_sync_slot_required() to access local_slot without holding a lock.

I share the same concern here. The issue smells similar to the one
discussed above -- reading values from the array cell directly without
the protection of array lock.

To solve them altogether, it might be
better to stop carrying raw ReplicationSlot * values across
unprotected windows. We can get the snapshot values such as slot name
& database oid from get_local_synced_slots() instead. Then
local_sync_slot_required() works from the snapshot, and
drop_local_obsolete_slots() uses the copied database OID to take the
database lock. Before dropping, it should revalidate by copied
name/database that the slot is still a synced logical slot, then
acquire/drop by copied name, and use only copied values for
unlock/logging. I plan to prepare a refactoring patch for this. Does
that seem like the right direction to you?

Saving only the name and database OID would force us to search for the slot
again in the loop, which was one reason we didn't implement it that way.

The extra search may not be ideal, especially for the worker with a
large number of slots to sync. But the cost could be avoided with an
extra field like slotno. Were there other blocking issues discussed
before?

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#9)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

On Sat, Jun 6, 2026 at 3:05 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Friday, June 5, 2026 8:45 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

On Wed, Jun 3, 2026 at 8:03 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Jun 2, 2026 at 3:00 PM Xuneng Zhou <xunengzhou@gmail.com>

wrote:

/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list))
{
+ bool dropped = false;
+ NameData slot_name = {0};
+ Oid slot_database = local_slot->data.database;
bool synced_slot;

Is it really safe to read slot_database before acquiring the database lock?

Reading slot_database before taking the database lock seems not
inherently unsafe by itself. The comment suggests that the lock is
primarily used to prevent conflicts with the startup process running
ReplicationSlotsDropDBSlots() during db-drop replay; it does not
protect replication slot array reuse.

The unsafe part could be reading slot_database from local_slot after
ReplicationSlotControlLock has been released. At this point, the slot
array cell may already have been freed and reused, so the value read
may no longer belong to the slot that get_local_synced_slots()
originally collected. As a result, we could end up locking the wrong
database.

There seems to be two related issues:

1) Before drop: reading local_slot->data.database /
local_slot->data.name after the slot-array lock was released, before
verifying the cell still represents the same synced slot.

I recall condition (1) is considered acceptable, since the database lock is
released immediately after re-verifying that the slot is no longer the original
'synced' one anyway. Additionally, this race can only occur when replaying a
DROP DATABASE, which is rare in practice. Since we only take a shared lock, it
does not seem to cause real issues.

It seems that (1) is talking about the access to local_slot->data.name
before we acquire database lock in local_sync_slot_required() whereas
your response doesn't seem to address that concern. If not, then how
exactly does the database lock protect what we are doing in
local_sync_slot_required()?

--
With Regards,
Amit Kapila.

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#11)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

On Thu, Jun 11, 2026 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Jun 6, 2026 at 3:05 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Friday, June 5, 2026 8:45 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

On Wed, Jun 3, 2026 at 8:03 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Tue, Jun 2, 2026 at 3:00 PM Xuneng Zhou <xunengzhou@gmail.com>

wrote:

/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list))
{
+ bool dropped = false;
+ NameData slot_name = {0};
+ Oid slot_database = local_slot->data.database;
bool synced_slot;

Is it really safe to read slot_database before acquiring the database lock?

Reading slot_database before taking the database lock seems not
inherently unsafe by itself. The comment suggests that the lock is
primarily used to prevent conflicts with the startup process running
ReplicationSlotsDropDBSlots() during db-drop replay; it does not
protect replication slot array reuse.

The unsafe part could be reading slot_database from local_slot after
ReplicationSlotControlLock has been released. At this point, the slot
array cell may already have been freed and reused, so the value read
may no longer belong to the slot that get_local_synced_slots()
originally collected. As a result, we could end up locking the wrong
database.

There seems to be two related issues:

1) Before drop: reading local_slot->data.database /
local_slot->data.name after the slot-array lock was released, before
verifying the cell still represents the same synced slot.

I recall condition (1) is considered acceptable, since the database lock is
released immediately after re-verifying that the slot is no longer the original
'synced' one anyway. Additionally, this race can only occur when replaying a
DROP DATABASE, which is rare in practice. Since we only take a shared lock, it
does not seem to cause real issues.

It seems that (1) is talking about the access to local_slot->data.name
before we acquire database lock in local_sync_slot_required() whereas
your response doesn't seem to address that concern. If not, then how
exactly does the database lock protect what we are doing in
local_sync_slot_required()?

I re-analyzed this case and found the 'Before-drop' case is safe. In
the gap between get_local_synced_slots() releasing
ReplicationSlotControlLock and LockSharedObject,
ReplicationSlotsDropDBSlots() can run and free a synced slot cell
because the slotsync worker holds no database lock yet. The cell can
then be reused by any user-created (non-synced) slot. It could lead to
following risks which I think are already addressed due to recheck of
sync flag.

1. Stale name read in local_sync_slot_required(): The reused cell
holds a different name. local_sync_slot_required() might return false
(drop needed). But then the in_use && synced spinlock check sees
synced = false and skips the actual drop. The wrong decision is
caught.
2. Wrong database OID read at line 551: The reused cell holds OID_B
from the new slot. We lock OID_B, then at lines 563–565 we see synced
= false, skip the drop, and unlock OID_B at line 579. Since no drop
occurred, the cell is still the same non-synced slot, so the lock and
unlock see the same OID_B. Symmetric — no lock leak.
3. Acquiring the wrong slot at line 575: Once the shared database lock
is held at line 551, ReplicationSlotsDropDBSlots() is blocked from
freeing the cell. The slotsync worker itself won't free a synced slot
from any other code path while inside this function. So, this should
not happen.

Does this match your analysis? If so, After-drop case is still a risk,
and for that, the patch proposed in email [1]/messages/by-id/CABPTF7VyH1-W2xnDspECDEzFGQj=WTFpZBCqKfM11OAZa6gQHQ@mail.gmail.com seems to address it.

[1]: /messages/by-id/CABPTF7VyH1-W2xnDspECDEzFGQj=WTFpZBCqKfM11OAZa6gQHQ@mail.gmail.com

--
With Regards,
Amit Kapila.

#13Fujii Masao
masao.fujii@gmail.com
In reply to: Amit Kapila (#12)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

On Thu, Jun 11, 2026 at 8:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

1. Stale name read in local_sync_slot_required(): The reused cell
holds a different name. local_sync_slot_required() might return false
(drop needed). But then the in_use && synced spinlock check sees
synced = false and skips the actual drop. The wrong decision is
caught.

Yes, we could skip the actual drop. But then wouldn't we still emit
the log message "dropped replication slot ..." even though no slot was
actually dropped?

2. Wrong database OID read at line 551: The reused cell holds OID_B
from the new slot. We lock OID_B, then at lines 563–565 we see synced
= false, skip the drop, and unlock OID_B at line 579. Since no drop
occurred, the cell is still the same non-synced slot, so the lock and
unlock see the same OID_B. Symmetric — no lock leak.

What happens if the slot for OID_B is dropped after we lock
OID_B, and then a new slot for OID_C reuses the same array entry? In
that case, wouldn't the later unlock read OID_C from
local_slot->data.database even though the lock was originally taken on
OID_B?

Regards,

--
Fujii Masao

#14Xuneng Zhou
xunengzhou@gmail.com
In reply to: Fujii Masao (#13)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

Hi Fujii-san, Amit,

The issues look real to me and could be dealt with patch v1 partially.

On Thu, Jun 11, 2026 at 9:19 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Jun 11, 2026 at 8:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

1. Stale name read in local_sync_slot_required(): The reused cell
holds a different name. local_sync_slot_required() might return false
(drop needed). But then the in_use && synced spinlock check sees
synced = false and skips the actual drop. The wrong decision is
caught.

Yes, we could skip the actual drop. But then wouldn't we still emit
the log message "dropped replication slot ..." even though no slot was
actually dropped?

With v1, we won't emit the log message unless the log is factually
dropped. However it did not prevent the stale read in
local_sync_slot_required().

2. Wrong database OID read at line 551: The reused cell holds OID_B
from the new slot. We lock OID_B, then at lines 563–565 we see synced
= false, skip the drop, and unlock OID_B at line 579. Since no drop
occurred, the cell is still the same non-synced slot, so the lock and
unlock see the same OID_B. Symmetric — no lock leak.

What happens if the slot for OID_B is dropped after we lock
OID_B, and then a new slot for OID_C reuses the same array entry? In
that case, wouldn't the later unlock read OID_C from
local_slot->data.database even though the lock was originally taken on
OID_B?

V1 stops doing the venerable second read of local_slot->data.database.
So if the copied value was already stale and points to OID_B, v1 is at
least symmetric:

read OID_B once
lock OID_B
cell reused as OID_C
unlock OID_B

But v1 seems not to fully solve issue 1.

It can still do this:

cell already reused before slot_database is copied
v1 copies OID_B from replacement slot
locks OID_B
recheck sees synced=false
skips drop
unlocks OID_B

That is still a stale read and possibly a wasted/wrong database lock,
but it doesn't leak the lock, unlocks the wrong object, logs a false
drop, or drops the replacement slot.

In an off-list chat with Zhijie, we kinda thought that holding the
lock of a wrong db for a brief time doesn't seem to harm a lot. The
concurrent dropping-db operation leads to this issue seems rare in
practice. He stated that the deletion of the slot seems unavoidable
because we have to acquire the database lock after releasing the
replication slot lock to avoid the deadlock with the startup/drop db
operation. Therefore, he prefered keeping the design simple and
avoiding the fatal issue over doing a broader refactoring work. I
don't have a strong opinion on this. Still attaching the refactoring
patch to do some clean-up in case someone thinks it's worthwhile.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

Attachments:

v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patchapplication/octet-stream; name=v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patchDownload+16-10
v1-0002-Avoid-stale-slot-pointers-in-slotsync-cleanup.patchapplication/octet-stream; name=v1-0002-Avoid-stale-slot-pointers-in-slotsync-cleanup.patchDownload+167-40
#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Xuneng Zhou (#14)
Re: Fix race in ReplicationSlotRelease for ephemeral slots

On Fri, Jun 12, 2026 at 8:22 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:

The issues look real to me and could be dealt with patch v1 partially.

On Thu, Jun 11, 2026 at 9:19 PM Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Jun 11, 2026 at 8:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

1. Stale name read in local_sync_slot_required(): The reused cell
holds a different name. local_sync_slot_required() might return false
(drop needed). But then the in_use && synced spinlock check sees
synced = false and skips the actual drop. The wrong decision is
caught.

Yes, we could skip the actual drop. But then wouldn't we still emit
the log message "dropped replication slot ..." even though no slot was
actually dropped?

With v1, we won't emit the log message unless the log is factually
dropped. However it did not prevent the stale read in
local_sync_slot_required().

2. Wrong database OID read at line 551: The reused cell holds OID_B
from the new slot. We lock OID_B, then at lines 563–565 we see synced
= false, skip the drop, and unlock OID_B at line 579. Since no drop
occurred, the cell is still the same non-synced slot, so the lock and
unlock see the same OID_B. Symmetric — no lock leak.

What happens if the slot for OID_B is dropped after we lock
OID_B, and then a new slot for OID_C reuses the same array entry? In
that case, wouldn't the later unlock read OID_C from
local_slot->data.database even though the lock was originally taken on
OID_B?

V1 stops doing the venerable second read of local_slot->data.database.
So if the copied value was already stale and points to OID_B, v1 is at
least symmetric:

read OID_B once
lock OID_B
cell reused as OID_C
unlock OID_B

But v1 seems not to fully solve issue 1.

It can still do this:

cell already reused before slot_database is copied
v1 copies OID_B from replacement slot
locks OID_B
recheck sees synced=false
skips drop
unlocks OID_B

That is still a stale read and possibly a wasted/wrong database lock,
but it doesn't leak the lock, unlocks the wrong object, logs a false
drop, or drops the replacement slot.

In an off-list chat with Zhijie, we kinda thought that holding the
lock of a wrong db for a brief time doesn't seem to harm a lot. The
concurrent dropping-db operation leads to this issue seems rare in
practice. He stated that the deletion of the slot seems unavoidable
because we have to acquire the database lock after releasing the
replication slot lock to avoid the deadlock with the startup/drop db
operation. Therefore, he prefered keeping the design simple and
avoiding the fatal issue over doing a broader refactoring work.

+1. I also think this change is not worth it.

I

don't have a strong opinion on this. Still attaching the refactoring
patch to do some clean-up in case someone thinks it's worthwhile.

I feel even if there is an argument to do such a refactoring, it can
be done separately. We can push forward with 0001 and then do more
discussion for 0002, if required. I can take care of 0001 unless
Fujii-San wishes to take care of it?

--
With Regards,
Amit Kapila.