There is a defect in the ReplicationSlotCreate() function where it iterates through ReplicationSlotCtl->replication_slots[max_replication_slots] to find a slot but does not break out of the loop when a slot is found.
On 14 Jan 2025, at 05:24, lxiaogang5 <lxiaogang5@gmail.com> wrote:
There is a logical defect in the function ReplicationSlotCreate (creating a new slot) when it internally traverses the ReplicationSlotCtl->replication_slots[max_replication_slots] array. When an available slot is found (slot = s), it should break the current for loop. This issue still exists in the latest code, resulting in wasted CPU resources.
We could exit early, but any system which max_replication_slots set high enough
that the savings are measurable in a non-hot codepath is likely having other
performance problems as well (max_replication_slots is by default 10).
If you feel strongly about I suggest creating a patch and submitting. Also,
please don't send 6Mb images of text when simply copy/paste of the text would
have worked even better.
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
On 14 Jan 2025, at 05:24, lxiaogang5 <lxiaogang5@gmail.com> wrote:
There is a logical defect in the function ReplicationSlotCreate (creating a new slot) when it internally traverses the ReplicationSlotCtl->replication_slots[max_replication_slots] array. When an available slot is found (slot = s), it should break the current for loop. This issue still exists in the latest code, resulting in wasted CPU resources.
We could exit early, but any system which max_replication_slots set high enough
that the savings are measurable in a non-hot codepath is likely having other
performance problems as well (max_replication_slots is by default 10).
Are we reading the same code?
for (i = 0; i < max_replication_slots; i++)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
if (s->in_use && strcmp(name, NameStr(s->data.name)) == 0)
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("replication slot \"%s\" already exists", name)));
if (!s->in_use && slot == NULL)
slot = s;
}
In the first place, breaking early would be wrong: we would fail to
scan all slots to check for duplicate name. In the second place,
the code does choose the first not-in-use slot, because of the
check for "slot == NULL" before changing the value of "slot".
If we intended to support hundreds or thousands of replication
slots, it'd perhaps be worthwhile to replace this with a hashtable
lookup. But that doesn't seem like a very credible use-case.
For typical numbers of slots this way is likely faster than
messing with a hashtable.
regards, tom lane
On 14 Jan 2025, at 15:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
On 14 Jan 2025, at 05:24, lxiaogang5 <lxiaogang5@gmail.com> wrote:
There is a logical defect in the function ReplicationSlotCreate (creating a new slot) when it internally traverses the ReplicationSlotCtl->replication_slots[max_replication_slots] array. When an available slot is found (slot = s), it should break the current for loop. This issue still exists in the latest code, resulting in wasted CPU resources.
We could exit early, but any system which max_replication_slots set high enough
that the savings are measurable in a non-hot codepath is likely having other
performance problems as well (max_replication_slots is by default 10).Are we reading the same code?
for (i = 0; i < max_replication_slots; i++)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];if (s->in_use && strcmp(name, NameStr(s->data.name)) == 0)
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("replication slot \"%s\" already exists", name)));
if (!s->in_use && slot == NULL)
slot = s;
}In the first place, breaking early would be wrong: we would fail to
scan all slots to check for duplicate name. In the second place,
the code does choose the first not-in-use slot, because of the
check for "slot == NULL" before changing the value of "slot".
My interpretation of the proposal was that once the first not-in-use slot
chosen and slot set to s, the for loop can be exited since no further slot
assignment will happen. You are right though that we still need to inspect all
replication slots to check for duplicates.
--
Daniel Gustafsson
<div style="min-height:22px;margin-bottom:8px;"><br></div><div style="min-height:22px;margin-bottom:8px;">Yes, it's this piece of code. Thank you for your patient explanation. What you said makes a lot of sense. Even after obtaining the available slots, we still need to traverse and check if there are any duplicate slots.</div><div id="QQMailSignature" class="mail-footer" aria-hidden="true"><hr style="margin: 0 0 10px 0;border: 0;border-bottom:1px solid #E6E8EB;height:0;line-height:0;font-size:0;padding: 20px 0 0 0;width: 50px;">发自我的iPhone</div><div id="original-content"><br><br><div class="xm_mail_oringinal_describe"><div style="font-size:70%;padding:2px 0;">------------------ Original ------------------</div><div style="font-size:70%;background:#f0f0f0;color:#212121;padding:8px;border-radius:4px"><div><b>From:</b> Daniel Gustafsson <daniel@yesql.se></div><div><b>Date:</b> Tue,Jan 14,2025 6:57 PM</div><div><b>To:</b> lxiaogang5 <lxiaogang5@gmail.com></div><div><b>Cc:</b> pgsql-bugs <pgsql-bugs@postgresql.org></div><div><b>Subject:</b> Re: There is a defect in the ReplicationSlotCreate() function whereit iterates throughReplicationSlotCtl->replication_slots[max_replication_slots] to find a slotbut does not break out of the loop when a slot is found.</div></div></div><br><pre style="white-space:pre-wrap;">> On 14 Jan 2025, at 05:24, lxiaogang5 <lxiaogang5@gmail.com> wrote:
> There is a logical defect in the function ReplicationSlotCreate (creating a new slot) when it internally traverses the ReplicationSlotCtl->replication_slots[max_replication_slots] array. When an available slot is found (slot = s), it should break the current for loop. This issue still exists in the latest code, resulting in wasted CPU resources.
We could exit early, but any system which max_replication_slots set high enough
that the savings are measurable in a non-hot codepath is likely having other
performance problems as well (max_replication_slots is by default 10).
If you feel strongly about I suggest creating a patch and submitting. Also,
please don't send 6Mb images of text when simply copy/paste of the text would
have worked even better.
--
Daniel Gustafsson
</pre></div>