Replication slot stats misgivings
Hi,
I started to write this as a reply to
/messages/by-id/20210318015105.dcfa4ceybdjubf2i@alap3.anarazel.de
but I think it doesn't really fit under that header anymore.
On 2021-03-17 18:51:05 -0700, Andres Freund wrote:
It does make it easier for the shared memory stats patch, because if
there's a fixed number + location, the relevant stats reporting doesn't
need to go through a hashtable with the associated locking. I guess
that may have colored my perception that it's better to just have a
statically sized memory allocation for this. Noteworthy that SLRU stats
are done in a fixed size allocation as well...
As part of reviewing the replication slot stats patch I looked at
replication slot stats a fair bit, and I've a few misgivings. First,
about the pgstat.c side of things:
- If somehow slot stat drop messages got lost (remember pgstat
communication is lossy!), we'll just stop maintaining stats for slots
created later, because there'll eventually be no space for keeping
stats for another slot.
- If max_replication_slots was lowered across a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.
- pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
think pgstat.c has absolutely no business doing things on that level.
- We do a linear search through all replication slots whenever receiving
stats for a slot. Even though there'd be a perfectly good index to
just use all throughout - the slots index itself. It looks to me like
slots stat reports can be fairly frequent in some workloads, so that
doesn't seem great.
- PgStat_ReplSlotStats etc use slotname[NAMEDATALEN]. Why not just NameData?
- pgstat_report_replslot() already has a lot of stats parameters, it
seems likely that we'll get more. Seems like we should just use a
struct of stats updates.
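As a rough illustration of that last point, a struct-based reporting interface might look like the following minimal C sketch. The type and function names here are illustrative assumptions, not the actual pgstat.c API, and the fields just mirror the existing logical-decoding counters:

```c
#include <stdint.h>

/* Hypothetical bundle of per-slot counter deltas, so a reporting
 * function takes one struct instead of a growing parameter list. */
typedef struct ReplSlotStatUpdate
{
    int64_t spill_txns;
    int64_t spill_count;
    int64_t spill_bytes;
    int64_t stream_txns;
    int64_t stream_count;
    int64_t stream_bytes;
} ReplSlotStatUpdate;

/* Accumulate a delta into a running total.  Adding a new counter later
 * only touches the struct and this function, not every call site. */
static void
replslot_stat_add(ReplSlotStatUpdate *total, const ReplSlotStatUpdate *delta)
{
    total->spill_txns += delta->spill_txns;
    total->spill_count += delta->spill_count;
    total->spill_bytes += delta->spill_bytes;
    total->stream_txns += delta->stream_txns;
    total->stream_count += delta->stream_count;
    total->stream_bytes += delta->stream_bytes;
}
```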
And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.
- It seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.
I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.
That removes the risk of running out of "replication slot stat slots":
If we lose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.
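A minimal, self-contained sketch of that addressing scheme (the types and names here are hypothetical, not the real pgstat.c code): stats live in an array indexed like ReplicationSlotCtl->replication_slots, and an incoming report whose slot name doesn't match the stored one resets the entry before accumulating, so a lost drop message at worst costs one reset when the index is reused:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NAMEDATALEN 64
#define MAX_SLOTS   8           /* stands in for max_replication_slots */

typedef struct SlotStatsEntry
{
    bool    in_use;
    char    name[NAMEDATALEN];
    int64_t total_txns;
} SlotStatsEntry;

static SlotStatsEntry slot_stats[MAX_SLOTS];

/* Receive a stats report addressed by slot index.  If the name doesn't
 * match what is stored for that index, the old entry must belong to a
 * dropped slot whose drop message was lost: reset instead of mixing. */
static void
slot_stats_report(int index, const char *slotname, int64_t txns)
{
    SlotStatsEntry *e = &slot_stats[index];

    if (!e->in_use || strcmp(e->name, slotname) != 0)
    {
        memset(e, 0, sizeof(*e));
        strncpy(e->name, slotname, NAMEDATALEN - 1);
        e->in_use = true;
    }
    e->total_txns += txns;
}
```

The lookup is O(1) by index instead of a linear scan by name, and the name only serves as a staleness check.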
It also makes it easy to handle the issue of max_replication_slots being
lowered and there still being stats for a slot - we simply can skip
restoring that slots data, because we know the relevant slot can't exist
anymore. And we can make the initial pgstat_report_replslot() during
slot creation use a
I'm wondering if we should just remove the slot name entirely from the
pgstat.c side of things, and have pg_stat_get_replication_slots()
inquire about slots by index as well and get the list of slots to report
stats for from slot.c infrastructure.
Greetings,
Andres Freund
On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:
And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.
We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.
- It seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.
Won't spill_bytes and stream_bytes give you the amount of data sent?
I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.
That removes the risk of running out of "replication slot stat slots":
If we lose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.
This idea is worth exploring to address the complaints, but what do we
do when we detect that the stats are from a different slot? It would
have a mix of stats from the old and new slot. We probably need to
reset it after we detect that. What if, at some frequency (say whenever
we run out of indexes), we check whether the slots we are maintaining
in pgstat.c have some stale slot entry (the entry exists but the actual
slot is dropped)?
It also makes it easy to handle the issue of max_replication_slots being
lowered and there still being stats for a slot - we simply can skip
restoring that slots data, because we know the relevant slot can't exist
anymore. And we can make the initial pgstat_report_replslot() during
slot creation use a
Here, your last sentence seems to be incomplete.
I'm wondering if we should just remove the slot name entirely from the
pgstat.c side of things, and have pg_stat_get_replication_slots()
inquire about slots by index as well and get the list of slots to report
stats for from slot.c infrastructure.
But how will you detect in your idea that some of the stats are from
an already dropped slot?
I'll create an entry for this in PG14 Open items wiki.
--
With Regards,
Amit Kapila.
On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:
And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.
We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.
- It seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.
Won't spill_bytes and stream_bytes give you the amount of data sent?
I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.
That removes the risk of running out of "replication slot stat slots":
If we lose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.
This idea is worth exploring to address the complaints, but what do we
do when we detect that the stats are from a different slot? It would
have a mix of stats from the old and new slot. We probably need to
reset it after we detect that.
What if the user created a slot with the same name after dropping the
slot and it has used the same index? I think chances are less but
still a possibility, but maybe that is okay.
What if, at some frequency (say whenever we run out of indexes), we
check whether the slots we are maintaining in pgstat.c have some stale
slot entry (the entry exists but the actual slot is dropped)?
A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.
--
With Regards,
Amit Kapila.
Hi,
On 2021-03-20 09:25:40 +0530, Amit Kapila wrote:
On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:
And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.
We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.
It seems like the obvious answer here is to sync stats when releasing
the slot?
- It seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.
Won't spill_bytes and stream_bytes give you the amount of data sent?
I don't think either tracks changes that were neither spilled nor
streamed? And if they are, they're terribly misnamed?
I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.
That removes the risk of running out of "replication slot stat slots":
If we lose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.
This idea is worth exploring to address the complaints, but what do we
do when we detect that the stats are from a different slot?
I think it's pretty easy to make that bulletproof. Add a
pgstat_report_replslot_create(), and use that in
ReplicationSlotCreate(). That is called with
ReplicationSlotAllocationLock held, so it can just safely zero out stats.
I don't think:
It would have a mix of stats from the old and new slot.
Can happen in that scenario.
It also makes it easy to handle the issue of max_replication_slots being
lowered and there still being stats for a slot - we simply can skip
restoring that slots data, because we know the relevant slot can't exist
anymore. And we can make the initial pgstat_report_replslot() during
slot creation use a
Here, your last sentence seems to be incomplete.
Oops, I was planning to suggest adding pgstat_report_replslot_create()
that zeroes out the pre-existing stats (or a parameter to
pgstat_report_replslot(), but I don't think that's better).
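A minimal sketch of that idea, with made-up names rather than the real API: slot creation explicitly tells the stats side to zero the entry for its index, so stale counters left over from a lost drop message can't leak into the new slot's stats. In PostgreSQL the create report would run under ReplicationSlotAllocationLock, so no other report for this index could race with the reset:

```c
#include <stdint.h>
#include <string.h>

#define MAX_SLOTS 8

typedef struct SlotStats
{
    char    name[64];
    int64_t stream_bytes;
} SlotStats;

static SlotStats slot_stats[MAX_SLOTS];

/* Hypothetical create report: zero whatever was stored for this index
 * before the new slot starts reporting. */
static void
report_replslot_create(int index, const char *slotname)
{
    memset(&slot_stats[index], 0, sizeof(SlotStats));
    strncpy(slot_stats[index].name, slotname,
            sizeof(slot_stats[index].name) - 1);
}

/* Hypothetical per-report accumulation for an already-created slot. */
static void
report_replslot_stream(int index, int64_t nbytes)
{
    slot_stats[index].stream_bytes += nbytes;
}
```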
I'm wondering if we should just remove the slot name entirely from the
pgstat.c side of things, and have pg_stat_get_replication_slots()
inquire about slots by index as well and get the list of slots to report
stats for from slot.c infrastructure.
But how will you detect in your idea that some of the stats are from
an already dropped slot?
I don't think that is possible with my sketch?
Greetings,
Andres Freund
Hi,
On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:
On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
This idea is worth exploring to address the complaints, but what do we
do when we detect that the stats are from a different slot? It would
have a mix of stats from the old and new slot. We probably need to
reset it after we detect that.
What if the user created a slot with the same name after dropping the
slot and it has used the same index? I think chances are less but
still a possibility, but maybe that is okay.
What if, at some frequency (say whenever we run out of indexes), we
check whether the slots we are maintaining in pgstat.c have some stale
slot entry (the entry exists but the actual slot is dropped)?
A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.
pgstat_report_replslot_drop() already prevents that, no?
Greetings,
Andres Freund
On Sun, Mar 21, 2021 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:
On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
This idea is worth exploring to address the complaints, but what do we
do when we detect that the stats are from a different slot? It would
have a mix of stats from the old and new slot. We probably need to
reset it after we detect that.
What if the user created a slot with the same name after dropping the
slot and it has used the same index? I think chances are less but
still a possibility, but maybe that is okay.
What if, at some frequency (say whenever we run out of indexes), we
check whether the slots we are maintaining in pgstat.c have some stale
slot entry (the entry exists but the actual slot is dropped)?
A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.
pgstat_report_replslot_drop() already prevents that, no?
Yeah, normally it would prevent that but what if a drop message is lost?
--
With Regards,
Amit Kapila.
On Sun, Mar 21, 2021 at 2:56 AM Andres Freund <andres@anarazel.de> wrote:
On 2021-03-20 09:25:40 +0530, Amit Kapila wrote:
On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:
And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.
We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.
It seems like the obvious answer here is to sync stats when releasing
the slot?
Okay, that makes sense.
- It seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.
Won't spill_bytes and stream_bytes give you the amount of data sent?
I don't think either tracks changes that were neither spilled nor
streamed? And if they are, they're terribly misnamed?
Right, it won't track such changes but we can track that as well and I
understand it will be good to track that information. I think we were
too focused on stats for newly introduced features that we forgot
about the non-spilled and non-streamed xacts.
Note - I have now created an entry for this in PG14 Open Items [1].
[1]: https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items
--
With Regards,
Amit Kapila.
Hi,
On 2021-03-21 16:08:00 +0530, Amit Kapila wrote:
On Sun, Mar 21, 2021 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:
On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:
On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
This idea is worth exploring to address the complaints, but what do we
do when we detect that the stats are from a different slot? It would
have a mix of stats from the old and new slot. We probably need to
reset it after we detect that.
What if the user created a slot with the same name after dropping the
slot and it has used the same index? I think chances are less but
still a possibility, but maybe that is okay.
What if, at some frequency (say whenever we run out of indexes), we
check whether the slots we are maintaining in pgstat.c have some stale
slot entry (the entry exists but the actual slot is dropped)?
A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.
pgstat_report_replslot_drop() already prevents that, no?
Yeah, normally it would prevent that but what if a drop message is lost?
That already exists as a danger, no? pgstat_recv_replslot() uses
pgstat_replslot_index() to find the slot by name. So if a drop message
is lost we'd potentially accumulate into stats of an older slot. It'd
probably be a lower risk with what I suggested, because the initial stat
report from slot.c would use something like pgstat_report_replslot_create(),
which the stats collector can use to reset the stats to 0?
If we do it right the lossiness will be removed via shared memory stats
patch... But architecturally the name based lookup and unpredictable
number of stats doesn't fit in super well.
Greetings,
Andres Freund
On Mon, Mar 22, 2021 at 3:10 AM Andres Freund <andres@anarazel.de> wrote:
On 2021-03-21 16:08:00 +0530, Amit Kapila wrote:
On Sun, Mar 21, 2021 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:
On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:
On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
This idea is worth exploring to address the complaints, but what do we
do when we detect that the stats are from a different slot? It would
have a mix of stats from the old and new slot. We probably need to
reset it after we detect that.
What if the user created a slot with the same name after dropping the
slot and it has used the same index? I think chances are less but
still a possibility, but maybe that is okay.
What if, at some frequency (say whenever we run out of indexes), we
check whether the slots we are maintaining in pgstat.c have some stale
slot entry (the entry exists but the actual slot is dropped)?
A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.
pgstat_report_replslot_drop() already prevents that, no?
Yeah, normally it would prevent that but what if a drop message is lost?
That already exists as a danger, no? pgstat_recv_replslot() uses
pgstat_replslot_index() to find the slot by name. So if a drop message
is lost we'd potentially accumulate into stats of an older slot. It'd
probably be a lower risk with what I suggested, because the initial stat
report from slot.c would use something like pgstat_report_replslot_create(),
which the stats collector can use to reset the stats to 0?
Okay, but I guess if we miss the create message as well then we will
have a similar danger. I think the benefit your idea will bring is to
use index-based lookup instead of name-based lookup. IIRC, we
initially used the name here because we thought there is nothing like
an OID for slots, but your suggestion of using
ReplicationSlotCtl->replication_slots can address that.
If we do it right the lossiness will be removed via shared memory stats
patch...
Okay.
--
With Regards,
Amit Kapila.
On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:
- If max_replication_slots was lowered across a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.
I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?
- pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
think pgstat.c has absolutely no business doing things on that level.
Agreed.
- PgStat_ReplSlotStats etc use slotname[NAMEDATALEN]. Why not just NameData?
That's because we followed other definitions in pgstat.h that use
char[NAMEDATALEN]. I'm okay with using NameData.
- pgstat_report_replslot() already has a lot of stats parameters, it
seems likely that we'll get more. Seems like we should just use a
struct of stats updates.
Agreed.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:
- If max_replication_slots was lowered across a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.
I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?
This problem happens in the case where max_replication_slots is
lowered and there still are stats for a slot.
I understood the risk of running out of replSlotStats. If we use the
index in replSlotStats instead, IIUC we need to somehow synchronize
the indexes in between replSlotStats and
ReplicationSlotCtl->replication_slots. The order of replSlotStats is
preserved across restarting whereas the order of
ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
StartupReplicationSlots() doesn’t guarantee the order of the returned
entries in the directory). Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message was dropped, so we can create the new
stats for the slot. Is that what you mean, Andres?
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:
- If max_replication_slots was lowered across a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.
I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?
This problem happens in the case where max_replication_slots is
lowered and there still are stats for a slot.
I think this can happen only if the drop message is lost, right?
I understood the risk of running out of replSlotStats. If we use the
index in replSlotStats instead, IIUC we need to somehow synchronize
the indexes in between replSlotStats and
ReplicationSlotCtl->replication_slots. The order of replSlotStats is
preserved across restarting whereas the order of
ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
StartupReplicationSlots() doesn’t guarantee the order of the returned
entries in the directory). Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message was dropped, so we can create the new
stats for the slot. Is that what you mean, Andres?
I wonder how in this scheme, we will remove the risk of running out of
'replSlotStats' and still restore correct stats assuming the drop
message is lost? Do we want to check after restoring each slot info
whether the slot with that name exists?
--
With Regards,
Amit Kapila.
On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:
- If max_replication_slots was lowered across a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.
I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?
This problem happens in the case where max_replication_slots is
lowered and there still are stats for a slot.
I think this can happen only if the drop message is lost, right?
Yes, I think you're right. In that case, the stats file could have
more slots statistics than the lowered max_replication_slots.
I understood the risk of running out of replSlotStats. If we use the
index in replSlotStats instead, IIUC we need to somehow synchronize
the indexes in between replSlotStats and
ReplicationSlotCtl->replication_slots. The order of replSlotStats is
preserved across restarting whereas the order of
ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
StartupReplicationSlots() doesn’t guarantee the order of the returned
entries in the directory). Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message was dropped, so we can create the new
stats for the slot. Is that what you mean, Andres?
I wonder how in this scheme, we will remove the risk of running out of
'replSlotStats' and still restore correct stats assuming the drop
message is lost? Do we want to check after restoring each slot info
whether the slot with that name exists?
Yeah, I think we need such a check at least if the number of slot
stats in the stats file is larger than max_replication_slots. Or we
can do that at every startup to remove orphaned slot stats.
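That startup check could look roughly like the following self-contained sketch (types and names are made up for illustration): after restoring the stats file, invalidate any entry whose name doesn't correspond to a slot that actually exists, and any entry beyond the current max_replication_slots:

```c
#include <stdbool.h>
#include <string.h>

#define NAMEDATALEN 64

typedef struct RestoredSlotStats
{
    bool valid;
    char name[NAMEDATALEN];
} RestoredSlotStats;

/* Invalidate restored stats entries that are orphaned: either past the
 * (possibly lowered) max_replication_slots, or naming a slot that no
 * longer exists.  live_names/nlive stand in for the slots found at
 * startup.  Returns how many entries were kept. */
static int
prune_orphaned_slot_stats(RestoredSlotStats *stats, int nstats,
                          int max_replication_slots,
                          const char *const *live_names, int nlive)
{
    int kept = 0;

    for (int i = 0; i < nstats; i++)
    {
        bool found = false;

        if (i < max_replication_slots)
        {
            for (int j = 0; j < nlive; j++)
            {
                if (strcmp(stats[i].name, live_names[j]) == 0)
                {
                    found = true;
                    break;
                }
            }
        }
        stats[i].valid = found;
        if (found)
            kept++;
    }
    return kept;
}
```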
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Hi,
On 2021-03-23 23:37:14 +0900, Masahiko Sawada wrote:
On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:
- If max_replication_slots was lowered across a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.
I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?
This problem happens in the case where max_replication_slots is
lowered and there still are stats for a slot.
I think this can happen only if the drop message is lost, right?
Yes, I think you're right. In that case, the stats file could have
more slots statistics than the lowered max_replication_slots.
Or if slots are deleted on the file-system while the cluster is
shut down. Which obviously is at best a semi-supported thing, but it
normally does work.
I understood the risk of running out of replSlotStats. If we use the
index in replSlotStats instead, IIUC we need to somehow synchronize
the indexes in between replSlotStats and
ReplicationSlotCtl->replication_slots. The order of replSlotStats is
preserved across restarting whereas the order of
ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
StartupReplicationSlots() doesn’t guarantee the order of the returned
entries in the directory).
Very good point. Even if readdir() order were fixed, we'd still have the
problem because there can be "gaps" in the indexes for slots
(e.g. create slot_a, create slot_b, create slot_c, drop slot_b, leaving
you with index 0 and 2 used, and 1 unused).
Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message was dropped, so we can create the new
stats for the slot. Is that what you mean, Andres?
That doesn't seem great. Slot names are imo a poor identifier for
something happening asynchronously. The stats collector regularly
doesn't process incoming messages for periods of time because it is busy
writing out the stats file. That's also when messages to it are most
likely to be dropped (likely because the incoming buffer is full).
Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?
Greetings,
Andres Freund
On Tue, Mar 23, 2021 at 10:54 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-03-23 23:37:14 +0900, Masahiko Sawada wrote:
Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message was dropped, so we can create the new
stats for the slot. Is that what you mean, Andres?
That doesn't seem great. Slot names are imo a poor identifier for
something happening asynchronously. The stats collector regularly
doesn't process incoming messages for periods of time because it is busy
writing out the stats file. That's also when messages to it are most
likely to be dropped (likely because the incoming buffer is full).
Leaving aside the restart case, without some sort of such sanity checking,
if both drop (of old slot) and create (of new slot) messages are lost
then we will start accumulating stats in old slots. However, if only
one of them is lost then there won't be any such problem.
Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?
Say if we send just the index location of each slot then probably we
can set up replSlotStats. Now say before the restart if one of the drop
messages was missed (by stats collector) and that happens to be at
some middle location, then we would end up restoring some already
dropped slot, leaving some of the still required ones. However, if
there is some sanity identifier like name along with the index, then I
think that would have worked for such a case.
I think it would have been easier if we had some OID-like identifier
for each slot. But without that, maybe the combination of the index
location in ReplicationSlotCtl->replication_slots and the slot name can
reduce the chances of slot stats going wrong, even if not to zero.
If not name, do we have anything else in a slot that can be used for
some sort of sanity checking?
--
With Regards,
Amit Kapila.
On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 23, 2021 at 10:54 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-03-23 23:37:14 +0900, Masahiko Sawada wrote:
Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message was dropped, so we can create the new
stats for the slot. Is that what you mean, Andres?
That doesn't seem great. Slot names are imo a poor identifier for
something happening asynchronously. The stats collector regularly
doesn't process incoming messages for periods of time because it is busy
writing out the stats file. That's also when messages to it are most
likely to be dropped (likely because the incoming buffer is full).
Leaving aside the restart case, without some sort of such sanity checking,
if both drop (of old slot) and create (of new slot) messages are lost
then we will start accumulating stats in old slots. However, if only
one of them is lost then there won't be any such problem.
Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?
Say if we send just the index location of each slot then probably we
can set up replSlotStats. Now say before the restart if one of the drop
messages was missed (by stats collector) and that happens to be at
some middle location, then we would end up restoring some already
dropped slot, leaving some of the still required ones. However, if
there is some sanity identifier like name along with the index, then I
think that would have worked for such a case.
But couldn't such messages also be lost? Given that any message could be
lost under a UDP connection, I think we cannot rely on a single
message. Instead, I think we need to loosely synchronize the indexes
while assuming the indexes in replSlotStats and
ReplicationSlotCtl->replication_slots are not synchronized.
I think it would have been easier if we had some OID-style identifier
for each slot. But without that, maybe the combination of the index
location in ReplicationSlotCtl->replication_slots and the slot name can
reduce the chances of slot stats going wrong, even if not to zero.
If not the name, do we have anything else in a slot that can be used for
some sort of sanity checking?
I don't see any useful information in a slot for sanity checking.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
If not the name, do we have anything else in a slot that can be used for
some sort of sanity checking?

I don't see any useful information in a slot for sanity checking.
In that case, can we do a hard check for which slots exist if
replSlotStats runs out of space (that can probably happen only after
restart and when we lost some drop messages)?
--
With Regards,
Amit Kapila.
Hi,
On 2021-03-25 17:12:31 +0530, Amit Kapila wrote:
On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't see any useful information in a slot for sanity checking.
In that case, can we do a hard check for which slots exist if
replSlotStats runs out of space (that can probably happen only after
restart and when we lost some drop messages)?
I suggest we hold off doing anything about this until we know whether
the shared stats patch gets in or not (I'd give it 50% maybe). If it
does get in, things get a good bit easier, because we don't have to deal
with the message loss issues anymore.
Greetings,
Andres Freund
On Fri, Mar 26, 2021 at 1:17 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-03-25 17:12:31 +0530, Amit Kapila wrote:
On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I suggest we hold off doing anything about this until we know whether
the shared stats patch gets in or not (I'd give it 50% maybe). If it
does get in, things get a good bit easier, because we don't have to deal
with the message loss issues anymore.
Okay, that makes sense.
--
With Regards,
Amit Kapila.
Hi,
On 2021-03-26 07:58:58 +0530, Amit Kapila wrote:
On Fri, Mar 26, 2021 at 1:17 AM Andres Freund <andres@anarazel.de> wrote:
Okay, that makes sense.
Any chance you could write a TAP test exercising a few of these cases?
E.g. things like:
- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up
IMO, independent of the shutdown / startup issue, it'd be worth writing
a patch tracking the bytes sent independently of the slot stats storage
issues. That would also make the testing for the above cheaper...
Greetings,
Andres Freund