Replication slot stats misgivings

Started by Andres Freundalmost 5 years ago201 messages
#1Andres Freund
andres@anarazel.de

Hi,

I started to write this as a reply to
/messages/by-id/20210318015105.dcfa4ceybdjubf2i@alap3.anarazel.de
but I think it doesn't really fit under that header anymore.

On 2021-03-17 18:51:05 -0700, Andres Freund wrote:

It does make it easier for the shared memory stats patch, because if
there's a fixed number + location, the relevant stats reporting doesn't
need to go through a hashtable with the associated locking. I guess
that may have colored my perception that it's better to just have a
statically sized memory allocation for this. Noteworthy that SLRU stats
are done in a fixed size allocation as well...

As part of reviewing the replication slot stats patch I looked at
replication slot stats a fair bit, and I've a few misgivings. First,
about the pgstat.c side of things:

- If somehow slot stat drop messages got lost (remember pgstat
communication is lossy!), we'll just stop maintaining stats for slots
created later, because there'll eventually be no space for keeping
stats for another slot.

- If max_replication_slots was lowered between a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.

- pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
think pgstat.c has absolutely no business doing things on that level.

- We do a linear search through all replication slots whenever receiving
stats for a slot. Even though there'd be a perfectly good index to
just use all throughout - the slots index itself. It looks to me like
slots stat reports can be fairly frequent in some workloads, so that
doesn't seem great.

- PgStat_ReplSlotStats etc use slotname[NAMEDATALEN]. Why not just NameData?

- pgstat_report_replslot() already has a lot of stats parameters, it
seems likely that we'll get more. Seems like we should just use a
struct of stats updates.

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.
- I seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.

I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.

That removes the risk of running out of "replication slot stat slots":
If we loose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.

It also makes it easy to handle the issue of max_replication_slots being
lowered and there still being stats for a slot - we simply can skip
restoring that slots data, because we know the relevant slot can't exist
anymore. And we can make the initial pgstat_report_replslot() during
slot creation use a

I'm wondering if we should just remove the slot name entirely from the
pgstat.c side of things, and have pg_stat_get_replication_slots()
inquire about slots by index as well and get the list of slots to report
stats for from slot.c infrastructure.

Greetings,

Andres Freund

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#1)
Re: Replication slot stats misgivings

On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.

- I seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.

Won't spill_bytes and stream_bytes will give you the amount of data sent?

I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.

That removes the risk of running out of "replication slot stat slots":
If we loose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.

This idea is worth exploring to address the complaints but what do we
do when we detect that the stats are from the different slot? It has
mixed of stats from the old and new slot. We need to probably reset it
after we detect that. What if after some frequency (say whenever we
run out of indexes) we check whether the slots we are maintaining is
pgstat.c have some stale slot entry (entry exists but the actual slot
is dropped)?

It also makes it easy to handle the issue of max_replication_slots being
lowered and there still being stats for a slot - we simply can skip
restoring that slots data, because we know the relevant slot can't exist
anymore. And we can make the initial pgstat_report_replslot() during
slot creation use a

Here, your last sentence seems to be incomplete.

I'm wondering if we should just remove the slot name entirely from the
pgstat.c side of things, and have pg_stat_get_replication_slots()
inquire about slots by index as well and get the list of slots to report
stats for from slot.c infrastructure.

But how will you detect in your idea that some of the stats from the
already dropped slot?

I'll create an entry for this in PG14 Open items wiki.

--
With Regards,
Amit Kapila.

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#2)
Re: Replication slot stats misgivings

On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.

- I seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.

Won't spill_bytes and stream_bytes will give you the amount of data sent?

I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.

That removes the risk of running out of "replication slot stat slots":
If we loose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.

This idea is worth exploring to address the complaints but what do we
do when we detect that the stats are from the different slot? It has
mixed of stats from the old and new slot. We need to probably reset it
after we detect that.

What if the user created a slot with the same name after dropping the
slot and it has used the same index. I think chances are less but
still a possibility, but maybe that is okay.

What if after some frequency (say whenever we
run out of indexes) we check whether the slots we are maintaining is
pgstat.c have some stale slot entry (entry exists but the actual slot
is dropped)?

A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.

--
With Regards,
Amit Kapila.

#4Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#2)
Re: Replication slot stats misgivings

Hi,

On 2021-03-20 09:25:40 +0530, Amit Kapila wrote:

On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.

It seems like the obvious answer here is to sync stats when releasing
the slot?

- I seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.

Won't spill_bytes and stream_bytes will give you the amount of data sent?

I don't think either tracks changes that were neither spilled nor
streamed? And if they are, they're terribly misnamed?

I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.

That removes the risk of running out of "replication slot stat slots":
If we loose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.

This idea is worth exploring to address the complaints but what do we
do when we detect that the stats are from the different slot?

I think it's pretty easy to make that bulletproof. Add a
pgstat_report_replslot_create(), and use that in
ReplicationSlotCreate(). That is called with
ReplicationSlotAllocationLock held, so it can just safely zero out stats.

I don't think:

It has mixed of stats from the old and new slot.

Can happen in that scenario.

It also makes it easy to handle the issue of max_replication_slots being
lowered and there still being stats for a slot - we simply can skip
restoring that slots data, because we know the relevant slot can't exist
anymore. And we can make the initial pgstat_report_replslot() during
slot creation use a

Here, your last sentence seems to be incomplete.

Oops, I was planning to suggest adding pgstat_report_replslot_create()
that zeroes out the pre-existing stats (or a parameter to
pgstat_report_replslot(), but I don't think that's better).

I'm wondering if we should just remove the slot name entirely from the
pgstat.c side of things, and have pg_stat_get_replication_slots()
inquire about slots by index as well and get the list of slots to report
stats for from slot.c infrastructure.

But how will you detect in your idea that some of the stats from the
already dropped slot?

I don't think that is possible with my sketch?

Greetings,

Andres Freund

#5Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#3)
Re: Replication slot stats misgivings

Hi,

On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:

On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

This idea is worth exploring to address the complaints but what do we
do when we detect that the stats are from the different slot? It has
mixed of stats from the old and new slot. We need to probably reset it
after we detect that.

What if the user created a slot with the same name after dropping the
slot and it has used the same index. I think chances are less but
still a possibility, but maybe that is okay.

What if after some frequency (say whenever we
run out of indexes) we check whether the slots we are maintaining is
pgstat.c have some stale slot entry (entry exists but the actual slot
is dropped)?

A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.

pgstat_report_replslot_drop() already prevents that, no?

Greetings,

Andres Freund

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#5)
Re: Replication slot stats misgivings

On Sun, Mar 21, 2021 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:

On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

This idea is worth exploring to address the complaints but what do we
do when we detect that the stats are from the different slot? It has
mixed of stats from the old and new slot. We need to probably reset it
after we detect that.

What if the user created a slot with the same name after dropping the
slot and it has used the same index. I think chances are less but
still a possibility, but maybe that is okay.

What if after some frequency (say whenever we
run out of indexes) we check whether the slots we are maintaining is
pgstat.c have some stale slot entry (entry exists but the actual slot
is dropped)?

A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.

pgstat_report_replslot_drop() already prevents that, no?

Yeah, normally it would prevent that but what if a drop message is lost?

--
With Regards,
Amit Kapila.

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#4)
Re: Replication slot stats misgivings

On Sun, Mar 21, 2021 at 2:56 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-03-20 09:25:40 +0530, Amit Kapila wrote:

On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.

It seems like the obvious answer here is to sync stats when releasing
the slot?

Okay, that makes sense.

- I seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.

Won't spill_bytes and stream_bytes will give you the amount of data sent?

I don't think either tracks changes that were neither spilled nor
streamed? And if they are, they're terribly misnamed?

Right, it won't track such changes but we can track that as well and I
understand it will be good to track that information. I think we were
too focused on stats for newly introduced features that we forget
about the non-spilled and non-streamed xacts.

Note - I have now created an entry for this in PG14 Open Items [1]https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items.

[1]: https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items

--
With Regards,
Amit Kapila.

#8Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#6)
Re: Replication slot stats misgivings

Hi,

On 2021-03-21 16:08:00 +0530, Amit Kapila wrote:

On Sun, Mar 21, 2021 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:

On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

This idea is worth exploring to address the complaints but what do we
do when we detect that the stats are from the different slot? It has
mixed of stats from the old and new slot. We need to probably reset it
after we detect that.

What if the user created a slot with the same name after dropping the
slot and it has used the same index. I think chances are less but
still a possibility, but maybe that is okay.

What if after some frequency (say whenever we
run out of indexes) we check whether the slots we are maintaining is
pgstat.c have some stale slot entry (entry exists but the actual slot
is dropped)?

A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.

pgstat_report_replslot_drop() already prevents that, no?

Yeah, normally it would prevent that but what if a drop message is lost?

That already exists as a danger, no? pgstat_recv_replslot() uses
pgstat_replslot_index() to find the slot by name. So if a drop message
is lost we'd potentially accumulate into stats of an older slot. It'd
probably a lower risk with what I suggested, because the initial stat
report slot.c would use something like pgstat_report_replslot_create(),
which the stats collector can use to reset the stats to 0?

If we do it right the lossiness will be removed via shared memory stats
patch... But architecturally the name based lookup and unpredictable
number of stats doesn't fit in super well.

Greetings,

Andres Freund

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#8)
Re: Replication slot stats misgivings

On Mon, Mar 22, 2021 at 3:10 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-03-21 16:08:00 +0530, Amit Kapila wrote:

On Sun, Mar 21, 2021 at 2:57 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:

On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

This idea is worth exploring to address the complaints but what do we
do when we detect that the stats are from the different slot? It has
mixed of stats from the old and new slot. We need to probably reset it
after we detect that.

What if the user created a slot with the same name after dropping the
slot and it has used the same index. I think chances are less but
still a possibility, but maybe that is okay.

What if after some frequency (say whenever we
run out of indexes) we check whether the slots we are maintaining is
pgstat.c have some stale slot entry (entry exists but the actual slot
is dropped)?

A similar drawback (the user created a slot with the same name after
dropping it) exists with this as well.

pgstat_report_replslot_drop() already prevents that, no?

Yeah, normally it would prevent that but what if a drop message is lost?

That already exists as a danger, no? pgstat_recv_replslot() uses
pgstat_replslot_index() to find the slot by name. So if a drop message
is lost we'd potentially accumulate into stats of an older slot. It'd
probably a lower risk with what I suggested, because the initial stat
report slot.c would use something like pgstat_report_replslot_create(),
which the stats collector can use to reset the stats to 0?

okay, but I guess if we miss the create message as well then we will
have a similar danger. I think the benefit your idea will bring is to
use index-based lookup instead of name-based lookup. IIRC, we have
initially used the name here because we thought there is nothing like
OID for slots but your suggestion of using
ReplicationSlotCtl->replication_slots can address that.

If we do it right the lossiness will be removed via shared memory stats
patch...

Okay.

--
With Regards,
Amit Kapila.

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Andres Freund (#1)
Re: Replication slot stats misgivings

On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:

- If max_replication_slots was lowered between a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.

I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?

- pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
think pgstat.c has absolutely no business doing things on that level.

Agreed.

- PgStat_ReplSlotStats etc use slotname[NAMEDATALEN]. Why not just NameData?

That's because we followed other definitions in pgstat.h that use
char[NAMEDATALEN]. I'm okay with using NameData.

- pgstat_report_replslot() already has a lot of stats parameters, it
seems likely that we'll get more. Seems like we should just use a
struct of stats updates.

Agreed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#11Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#10)
Re: Replication slot stats misgivings

On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:

- If max_replication_slots was lowered between a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.

I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?

This problem happens in the case where max_replication_slots is
lowered and there still are stats for a slot.

I understood the risk of running out of replSlotStats. If we use the
index in replSlotStats instead, IIUC we need to somehow synchronize
the indexes in between replSlotStats and
ReplicationSlotCtl->replication_slots. The order of replSlotStats is
preserved across restarting whereas the order of
ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
StartupReplicationSlots() doesn’t guarantee the order of the returned
entries in the directory). Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message is dropped, we can create the new
stats for the slot. Is that what you mean, Andres?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#11)
Re: Replication slot stats misgivings

On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:

- If max_replication_slots was lowered between a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.

I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?

This problem happens in the case where max_replication_slots is
lowered and there still are stats for a slot.

I think this can happen only if the drop message is lost, right?

I understood the risk of running out of replSlotStats. If we use the
index in replSlotStats instead, IIUC we need to somehow synchronize
the indexes in between replSlotStats and
ReplicationSlotCtl->replication_slots. The order of replSlotStats is
preserved across restarting whereas the order of
ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
StartupReplicationSlots() doesn’t guarantee the order of the returned
entries in the directory). Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message is dropped, we can create the new
stats for the slot. Is that what you mean, Andres?

I wonder how in this scheme, we will remove the risk of running out of
'replSlotStats' and still restore correct stats assuming the drop
message is lost? Do we want to check after restoring each slot info
whether the slot with that name exists?

--
With Regards,
Amit Kapila.

#13Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#12)
Re: Replication slot stats misgivings

On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:

- If max_replication_slots was lowered between a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.

I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?

This problem happens in the case where max_replication_slots is
lowered and there still are stats for a slot.

I think this can happen only if the drop message is lost, right?

Yes, I think you're right. In that case, the stats file could have
more slots statistics than the lowered max_replication_slots.

I understood the risk of running out of replSlotStats. If we use the
index in replSlotStats instead, IIUC we need to somehow synchronize
the indexes in between replSlotStats and
ReplicationSlotCtl->replication_slots. The order of replSlotStats is
preserved across restarting whereas the order of
ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
StartupReplicationSlots() doesn’t guarantee the order of the returned
entries in the directory). Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message is dropped, we can create the new
stats for the slot. Is that what you mean, Andres?

I wonder how in this scheme, we will remove the risk of running out of
'replSlotStats' and still restore correct stats assuming the drop
message is lost? Do we want to check after restoring each slot info
whether the slot with that name exists?

Yeah, I think we need such a check at least if the number of slot
stats in the stats file is larger than max_replication_slots. Or we
can do that at every startup to remove orphaned slot stats.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#14Andres Freund
andres@anarazel.de
In reply to: Masahiko Sawada (#13)
Re: Replication slot stats misgivings

Hi,

On 2021-03-23 23:37:14 +0900, Masahiko Sawada wrote:

On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:

- If max_replication_slots was lowered between a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.

I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?

This problem happens in the case where max_replication_slots is
lowered and there still are stats for a slot.

I think this can happen only if the drop message is lost, right?

Yes, I think you're right. In that case, the stats file could have
more slots statistics than the lowered max_replication_slots.

Or if slots are deleted on the file-system while the cluster is
shutdown. Which obviously is at best a semi-supported thing, but it
normally does work.

I understood the risk of running out of replSlotStats. If we use the
index in replSlotStats instead, IIUC we need to somehow synchronize
the indexes in between replSlotStats and
ReplicationSlotCtl->replication_slots. The order of replSlotStats is
preserved across restarting whereas the order of
ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
StartupReplicationSlots() doesn’t guarantee the order of the returneda
entries in the directory).

Very good point. Even if readdir() order were fixed, we'd still have the
problem because there can be "gaps" in the indexes for slots
(e.g. create slot_a, create slot_b, create slot_c, drop slot_b, leaving
you with index 0 and 2 used, and 1 unused).

Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message is dropped, we can create the new
stats for the slot. Is that what you mean, Andres?

That doesn't seem great. Slot names are imo a poor identifier for
something happening asynchronously. The stats collector regularly
doesn't process incoming messages for periods of time because it is busy
writing out the stats file. That's also when messages to it are most
likely to be dropped (likely because the incoming buffer is full).

Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?

Greetings,

Andres Freund

#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#14)
Re: Replication slot stats misgivings

On Tue, Mar 23, 2021 at 10:54 PM Andres Freund <andres@anarazel.de> wrote:

On 2021-03-23 23:37:14 +0900, Masahiko Sawada wrote:

Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message is dropped, we can create the new
stats for the slot. Is that what you mean, Andres?

That doesn't seem great. Slot names are imo a poor identifier for
something happening asynchronously. The stats collector regularly
doesn't process incoming messages for periods of time because it is busy
writing out the stats file. That's also when messages to it are most
likely to be dropped (likely because the incoming buffer is full).

Leaving aside restart case, without some sort of such sanity checking,
if both drop (of old slot) and create (of new slot) messages are lost
then we will start accumulating stats in old slots. However, if only
one of them is lost then there won't be any such problem.

Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?

Say if we send just the index location of each slot then probably we
can setup replSlotStats. Now say before the restart if one of the drop
messages was missed (by stats collector) and that happens to be at
some middle location, then we would end up restoring some already
dropped slot, leaving some of the still required ones. However, if
there is some sanity identifier like name along with the index, then I
think that would have worked for such a case.

I think it would have been easier if we would have some OID type of
identifier for each slot. But, without that may be index location of
ReplicationSlotCtl->replication_slots and slotname combination can
reduce the chances of slot stats go wrong quite less even if not zero.
If not name, do we have anything else in a slot that can be used for
some sort of sanity checking?

--
With Regards,
Amit Kapila.

#16Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#15)
Re: Replication slot stats misgivings

On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 23, 2021 at 10:54 PM Andres Freund <andres@anarazel.de> wrote:

On 2021-03-23 23:37:14 +0900, Masahiko Sawada wrote:

Maybe we can compare the slot name in the
received message to the name in the element of replSlotStats. If they
don’t match, we swap entries in replSlotStats to synchronize the index
of the replication slot in ReplicationSlotCtl->replication_slots and
replSlotStats. If we cannot find the entry in replSlotStats that has
the name in the received message, it probably means either it's a new
slot or the previous create message is dropped, we can create the new
stats for the slot. Is that what you mean, Andres?

That doesn't seem great. Slot names are imo a poor identifier for
something happening asynchronously. The stats collector regularly
doesn't process incoming messages for periods of time because it is busy
writing out the stats file. That's also when messages to it are most
likely to be dropped (likely because the incoming buffer is full).

Leaving aside restart case, without some sort of such sanity checking,
if both drop (of old slot) and create (of new slot) messages are lost
then we will start accumulating stats in old slots. However, if only
one of them is lost then there won't be any such problem.

Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?

Say if we send just the index location of each slot then probably we
can setup replSlotStats. Now say before the restart if one of the drop
messages was missed (by stats collector) and that happens to be at
some middle location, then we would end up restoring some already
dropped slot, leaving some of the still required ones. However, if
there is some sanity identifier like name along with the index, then I
think that would have worked for such a case.

Even such messages could also be lost? Given that any message could be
lost under a UDP connection, I think we cannot rely on a single
message. Instead, I think we need to loosely synchronize the indexes
while assuming the indexes in replSlotStats and
ReplicationSlotCtl->replication_slots are not synchronized.

I think it would have been easier if we would have some OID type of
identifier for each slot. But, without that may be index location of
ReplicationSlotCtl->replication_slots and slotname combination can
reduce the chances of slot stats go wrong quite less even if not zero.
If not name, do we have anything else in a slot that can be used for
some sort of sanity checking?

I don't see any useful information in a slot for sanity checking.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#16)
Re: Replication slot stats misgivings

On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Leaving aside restart case, without some sort of such sanity checking,
if both drop (of old slot) and create (of new slot) messages are lost
then we will start accumulating stats in old slots. However, if only
one of them is lost then there won't be any such problem.

Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?

Say if we send just the index location of each slot then probably we
can setup replSlotStats. Now say before the restart if one of the drop
messages was missed (by stats collector) and that happens to be at
some middle location, then we would end up restoring some already
dropped slot, leaving some of the still required ones. However, if
there is some sanity identifier like name along with the index, then I
think that would have worked for such a case.

Even such messages could also be lost? Given that any message could be
lost under a UDP connection, I think we cannot rely on a single
message. Instead, I think we need to loosely synchronize the indexes
while assuming the indexes in replSlotStats and
ReplicationSlotCtl->replication_slots are not synchronized.

I think it would have been easier if we would have some OID type of
identifier for each slot. But, without that may be index location of
ReplicationSlotCtl->replication_slots and slotname combination can
reduce the chances of slot stats go wrong quite less even if not zero.
If not name, do we have anything else in a slot that can be used for
some sort of sanity checking?

I don't see any useful information in a slot for sanity checking.

In that case, can we do a hard check for which slots exist if
replSlotStats runs out of space (that can probably happen only after
restart and when we lost some drop messages)?

--
With Regards,
Amit Kapila.

#18Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#17)
Re: Replication slot stats misgivings

Hi,

On 2021-03-25 17:12:31 +0530, Amit Kapila wrote:

On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Leaving aside restart case, without some sort of such sanity checking,
if both drop (of old slot) and create (of new slot) messages are lost
then we will start accumulating stats in old slots. However, if only
one of them is lost then there won't be any such problem.

Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?

Say if we send just the index location of each slot then probably we
can setup replSlotStats. Now say before the restart if one of the drop
messages was missed (by stats collector) and that happens to be at
some middle location, then we would end up restoring some already
dropped slot, leaving some of the still required ones. However, if
there is some sanity identifier like name along with the index, then I
think that would have worked for such a case.

Even such messages could also be lost? Given that any message could be
lost under a UDP connection, I think we cannot rely on a single
message. Instead, I think we need to loosely synchronize the indexes
while assuming the indexes in replSlotStats and
ReplicationSlotCtl->replication_slots are not synchronized.

I think it would have been easier if we would have some OID type of
identifier for each slot. But, without that may be index location of
ReplicationSlotCtl->replication_slots and slotname combination can
reduce the chances of slot stats go wrong quite less even if not zero.
If not name, do we have anything else in a slot that can be used for
some sort of sanity checking?

I don't see any useful information in a slot for sanity checking.

In that case, can we do a hard check for which slots exist if
replSlotStats runs out of space (that can probably happen only after
restart and when we lost some drop messages)?

I suggest we wait doing anything about this until we know if the shared
stats patch gets in or not (I'd give it 50% maybe). If it does get in
things get a good bit easier, because we don't have to deal with the
message loss issues anymore.

Greetings,

Andres Freund

#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#18)
Re: Replication slot stats misgivings

On Fri, Mar 26, 2021 at 1:17 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-25 17:12:31 +0530, Amit Kapila wrote:

On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Leaving aside restart case, without some sort of such sanity checking,
if both drop (of old slot) and create (of new slot) messages are lost
then we will start accumulating stats in old slots. However, if only
one of them is lost then there won't be any such problem.

Perhaps we could have RestoreSlotFromDisk() send something to the stats
collector ensuring the mapping makes sense?

Say if we send just the index location of each slot then probably we
can setup replSlotStats. Now say before the restart if one of the drop
messages was missed (by stats collector) and that happens to be at
some middle location, then we would end up restoring some already
dropped slot, leaving some of the still required ones. However, if
there is some sanity identifier like name along with the index, then I
think that would have worked for such a case.

Even such messages could also be lost? Given that any message could be
lost under a UDP connection, I think we cannot rely on a single
message. Instead, I think we need to loosely synchronize the indexes
while assuming the indexes in replSlotStats and
ReplicationSlotCtl->replication_slots are not synchronized.

I think it would have been easier if we would have some OID type of
identifier for each slot. But, without that may be index location of
ReplicationSlotCtl->replication_slots and slotname combination can
reduce the chances of slot stats go wrong quite less even if not zero.
If not name, do we have anything else in a slot that can be used for
some sort of sanity checking?

I don't see any useful information in a slot for sanity checking.

In that case, can we do a hard check for which slots exist if
replSlotStats runs out of space (that can probably happen only after
restart and when we lost some drop messages)?

I suggest we wait doing anything about this until we know if the shared
stats patch gets in or not (I'd give it 50% maybe). If it does get in
things get a good bit easier, because we don't have to deal with the
message loss issues anymore.

Okay, that makes sense.

--
With Regards,
Amit Kapila.

#20Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#19)
Re: Replication slot stats misgivings

Hi,

On 2021-03-26 07:58:58 +0530, Amit Kapila wrote:

On Fri, Mar 26, 2021 at 1:17 AM Andres Freund <andres@anarazel.de> wrote:

I suggest we wait doing anything about this until we know if the shared
stats patch gets in or not (I'd give it 50% maybe). If it does get in
things get a good bit easier, because we don't have to deal with the
message loss issues anymore.

Okay, that makes sense.

Any chance you could write a tap test exercising a few of these cases?
E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

IMO, independent of the shutdown / startup issue, it'd be worth writing
a patch tracking the bytes sent independently of the slot stats storage
issues. That would also make the testing for the above cheaper...

Greetings,

Andres Freund

#21vignesh C
vignesh21@gmail.com
In reply to: Andres Freund (#20)
Re: Replication slot stats misgivings

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-26 07:58:58 +0530, Amit Kapila wrote:

On Fri, Mar 26, 2021 at 1:17 AM Andres Freund <andres@anarazel.de> wrote:

I suggest we wait doing anything about this until we know if the shared
stats patch gets in or not (I'd give it 50% maybe). If it does get in
things get a good bit easier, because we don't have to deal with the
message loss issues anymore.

Okay, that makes sense.

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

IMO, independent of the shutdown / startup issue, it'd be worth writing
a patch tracking the bytes sent independently of the slot stats storage
issues. That would also make the testing for the above cheaper...

I can try to write a patch for this if nobody objects.

Regards,
Vignesh

#22Andres Freund
andres@anarazel.de
In reply to: vignesh C (#21)
Re: Replication slot stats misgivings

Hi,

On 2021-03-30 10:13:29 +0530, vignesh C wrote:

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

Cool!

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

Yep - thereby allowing max_replication_slots after the shutdown/start to
be lower than the number of slots-stats objects.

Greetings,

Andres Freund

#23vignesh C
vignesh21@gmail.com
In reply to: Andres Freund (#22)
1 attachment(s)
Re: Replication slot stats misgivings

On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-30 10:13:29 +0530, vignesh C wrote:

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

Cool!

Attached a patch which has the test for the first scenario.

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

Yep - thereby allowing max_replication_slots after the shutdown/start to
be lower than the number of slots-stats objects.

I have not included the 2nd test in the patch as the test fails with
following warnings and also displays the statistics of the removed
slot:
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438

This happens because the statistics file has an additional slot
present even though the replication slot was removed. I felt this
issue should be fixed. I will try to fix this issue and send the
second test along with the fix.

Regards,
Vignesh

Attachments:

v1-0001-Added-tests-for-verification-of-logical-replicati.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Added-tests-for-verification-of-logical-replicati.patchDownload
From c3aded40e0516dbd42d52c72c5b95ec45f991923 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Wed, 31 Mar 2021 11:02:40 +0530
Subject: [PATCH v1] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 src/test/subscription/t/022_repl_stats.pl | 161 ++++++++++++++++++++++
 1 file changed, 161 insertions(+)
 create mode 100644 src/test/subscription/t/022_repl_stats.pl

diff --git a/src/test/subscription/t/022_repl_stats.pl b/src/test/subscription/t/022_repl_stats.pl
new file mode 100644
index 0000000000..7755a447c0
--- /dev/null
+++ b/src/test/subscription/t/022_repl_stats.pl
@@ -0,0 +1,161 @@
+# Test replication statistics data gets updated in pg_stat_replication_slots
+# view for logical replication.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 5;
+
+# Create publisher node.
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', 'logical_decoding_work_mem = 64kB');
+$node_publisher->start;
+
+# Create subscriber nodes.
+my $node_subscriber1 = get_new_node('subscriber1');
+$node_subscriber1->init(allows_streaming => 'logical');
+$node_subscriber1->start;
+
+my $node_subscriber2 = get_new_node('subscriber2');
+$node_subscriber2->init(allows_streaming => 'logical');
+$node_subscriber2->start;
+
+my $node_subscriber3 = get_new_node('subscriber3');
+$node_subscriber3->init(allows_streaming => 'logical');
+$node_subscriber3->start;
+
+# Create table on publisher.
+$node_publisher->safe_psql('postgres',
+        "CREATE TABLE test_tab (a int primary key, b varchar)");
+
+# Create table on subscribers.
+$node_subscriber1->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+$node_subscriber2->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+$node_subscriber3->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+
+# Setup logical replication.
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname1 = 'tap_sub1';
+$node_subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub1 
+	CONNECTION '$publisher_connstr 
+	application_name=$appname1' 
+	PUBLICATION tap_pub WITH (streaming = on)"
+);
+
+my $appname2 = 'tap_sub2';
+$node_subscriber2->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub2
+	CONNECTION '$publisher_connstr
+	application_name=$appname2'
+	PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+my $appname3 = 'tap_sub3';
+$node_subscriber3->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub3
+	CONNECTION '$publisher_connstr
+	application_name=$appname3'
+	PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+$node_publisher->wait_for_catchup($appname1);
+$node_publisher->wait_for_catchup($appname2);
+$node_publisher->wait_for_catchup($appname3);
+
+# Interleave a pair of transactions, each exceeding the 64kB limit.
+my $in  = '';
+my $out = '';
+
+my $timer = IPC::Run::timeout(180);
+
+my $h = $node_publisher->background_psql('postgres', \$in, \$out, $timer,
+        on_error_stop => 0);
+
+$in .= q{
+BEGIN;
+INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+DELETE FROM test_tab WHERE mod(a,3) = 0;
+};
+$h->pump_nb;
+
+$node_publisher->safe_psql(
+        'postgres', q{
+BEGIN;
+INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(5001, 9999) s(i);
+DELETE FROM test_tab WHERE a > 5000;
+COMMIT;
+});
+
+$in .= q{
+COMMIT;
+\q
+};
+$h->finish;    # errors make the next test fail, so ignore them here
+
+$node_publisher->wait_for_catchup($appname1);
+$node_publisher->wait_for_catchup($appname2);
+$node_publisher->wait_for_catchup($appname3);
+
+# Verify data is replicated to the subscribers.
+my $result =
+  $node_subscriber1->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
+$result =
+  $node_subscriber2->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
+$result =
+  $node_subscriber3->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
+# Test to verify replication statistics data is updated in
+# pg_stat_replication_slots statistics view.
+$result = $node_publisher->safe_psql('postgres', 
+	"SELECT slot_name,
+		spill_txns > 0 AS spill_txns,
+		spill_count > 0 AS spill_count,
+		spill_bytes > 0 AS spill_bytes,
+		stream_txns > 0 AS stream_txns,
+		stream_count > 0 AS stream_count,
+		stream_bytes > 0 AS stream_bytes,
+		stats_reset
+	FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(tap_sub1|f|f|f|t|t|t|
+tap_sub2|t|t|t|f|f|f|
+tap_sub3|t|t|t|f|f|f|), 'check replication statistics are updated');
+
+# Test to drop one of the subscribers and verify replication statistics data is
+# fine after publisher is restarted.
+$node_subscriber3->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub3;");
+
+$node_publisher->stop;
+$node_publisher->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node_publisher->safe_psql('postgres',
+        "SELECT slot_name,
+                spill_txns > 0 AS spill_txns,
+                spill_count > 0 AS spill_count,
+                spill_bytes > 0 AS spill_bytes,
+                stream_txns > 0 AS stream_txns,
+                stream_count > 0 AS stream_count,
+                stream_bytes > 0 AS stream_bytes,
+                stats_reset
+        FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(tap_sub1|f|f|f|t|t|t|
+tap_sub2|t|t|t|f|f|f|), 'check replication statistics are updated');
+
+# shutdown
+$node_subscriber1->stop;
+$node_subscriber2->stop;
+$node_subscriber3->stop;
+$node_publisher->stop;
-- 
2.25.1

#24vignesh C
vignesh21@gmail.com
In reply to: vignesh C (#23)
Re: Replication slot stats misgivings

On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-30 10:13:29 +0530, vignesh C wrote:

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

Cool!

Attached a patch which has the test for the first scenario.

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

Yep - thereby allowing max_replication_slots after the shutdown/start to
be lower than the number of slots-stats objects.

I have not included the 2nd test in the patch as the test fails with
following warnings and also displays the statistics of the removed
slot:
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438

This happens because the statistics file has an additional slot
present even though the replication slot was removed. I felt this
issue should be fixed. I will try to fix this issue and send the
second test along with the fix.

I felt from the statistics collector process, there is no way in which
we can identify if the replication slot is present or not because the
statistic collector process does not have access to shared memory.
Anything that the statistic collector process does independently by
traversing and removing the statistics of the replication slot
exceeding the max_replication_slot has its drawback of removing some
valid replication slot's statistics data.
Any thoughts on how we can identify the replication slot which has been dropped?
Can someone point me to the shared stats patch link with which message
loss can be avoided. I wanted to see a scenario where something like
the slot is dropped but the statistics are not updated because of an
immediate shutdown or server going down abruptly can occur or not with
the shared stats patch.

Regards,
Vignesh

#25Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#24)
Re: Replication slot stats misgivings

On Thu, Apr 1, 2021 at 3:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-30 10:13:29 +0530, vignesh C wrote:

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

Cool!

Attached a patch which has the test for the first scenario.

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

Yep - thereby allowing max_replication_slots after the shutdown/start to
be lower than the number of slots-stats objects.

I have not included the 2nd test in the patch as the test fails with
following warnings and also displays the statistics of the removed
slot:
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438

This happens because the statistics file has an additional slot
present even though the replication slot was removed. I felt this
issue should be fixed. I will try to fix this issue and send the
second test along with the fix.

I felt from the statistics collector process, there is no way in which
we can identify if the replication slot is present or not because the
statistic collector process does not have access to shared memory.
Anything that the statistic collector process does independently by
traversing and removing the statistics of the replication slot
exceeding the max_replication_slot has its drawback of removing some
valid replication slot's statistics data.
Any thoughts on how we can identify the replication slot which has been dropped?
Can someone point me to the shared stats patch link with which message
loss can be avoided. I wanted to see a scenario where something like
the slot is dropped but the statistics are not updated because of an
immediate shutdown or server going down abruptly can occur or not with
the shared stats patch.

I don't think it is easy to simulate a scenario where the 'drop'
message is dropped and I think that is why the test contains the step
to manually remove the slot. At this stage, you can probably provide a
test patch and a code-fix patch where it just drops the extra slots
from the stats file. That will allow us to test it with a shared
memory stats patch on which Andres and Horiguchi-San are working. If
we still continue to pursue with current approach then as Andres
suggested we might send additional information from
RestoreSlotFromDisk to keep it in sync.

--
With Regards,
Amit Kapila.

#26Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Andres Freund (#20)
Re: Replication slot stats misgivings

On Tue, Mar 30, 2021 at 9:58 AM Andres Freund <andres@anarazel.de> wrote:

IMO, independent of the shutdown / startup issue, it'd be worth writing
a patch tracking the bytes sent independently of the slot stats storage
issues. That would also make the testing for the above cheaper...

Agreed.

I think the bytes sent should be recorded by the decoding plugin, not
by the core side. Given that table filtering and row filtering,
tracking the bytes passed to the decoding plugin would not help gauge
the actual network I/O. In that sense, the description of stream_bytes
in the doc seems not accurate:

---
This and other streaming counters for this slot can be used to gauge
the network I/O which occurred during logical decoding and allow
tuning logical_decoding_work_mem.
---

It can surely be used to allow tuning logical_decoding_work_mem but it
could not be true for gauging the network I/O which occurred during
logical decoding.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#27vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#25)
4 attachment(s)
Re: Replication slot stats misgivings

On Thu, Apr 1, 2021 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 1, 2021 at 3:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-30 10:13:29 +0530, vignesh C wrote:

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

Cool!

Attached a patch which has the test for the first scenario.

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

Yep - thereby allowing max_replication_slots after the shutdown/start to
be lower than the number of slots-stats objects.

I have not included the 2nd test in the patch as the test fails with
following warnings and also displays the statistics of the removed
slot:
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438

This happens because the statistics file has an additional slot
present even though the replication slot was removed. I felt this
issue should be fixed. I will try to fix this issue and send the
second test along with the fix.

I felt from the statistics collector process, there is no way in which
we can identify if the replication slot is present or not because the
statistic collector process does not have access to shared memory.
Anything that the statistic collector process does independently by
traversing and removing the statistics of the replication slot
exceeding the max_replication_slot has its drawback of removing some
valid replication slot's statistics data.
Any thoughts on how we can identify the replication slot which has been dropped?
Can someone point me to the shared stats patch link with which message
loss can be avoided. I wanted to see a scenario where something like
the slot is dropped but the statistics are not updated because of an
immediate shutdown or server going down abruptly can occur or not with
the shared stats patch.

I don't think it is easy to simulate a scenario where the 'drop'
message is dropped and I think that is why the test contains the step
to manually remove the slot. At this stage, you can probably provide a
test patch and a code-fix patch where it just drops the extra slots
from the stats file. That will allow us to test it with a shared
memory stats patch on which Andres and Horiguchi-San are working. If
we still continue to pursue with current approach then as Andres
suggested we might send additional information from
RestoreSlotFromDisk to keep it in sync.

Thanks for your comments, Attached patch has the fix for the same.
Also attached a couple of more patches which addresses the comments
which Andres had listed i.e changing char to NameData type and also to
display the unspilled/unstreamed transaction information in the
replication statistics.
Thoughts?

Regards,
Vignesh

Attachments:

v1-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchDownload
From 53d58eeca5335485aba96b57bea68b5807c7bcc4 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Thu, 1 Apr 2021 21:15:47 +0530
Subject: [PATCH v1 1/4] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c     | 15 ++++++++-------
 src/backend/utils/adt/pgstatfuncs.c |  2 +-
 src/include/pgstat.h                |  6 +++---
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4b9bcd2b41..132fdef78c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "utils/ascii.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1555,7 +1556,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1782,7 +1783,7 @@ pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = false;
 	msg.m_spill_txns = spilltxns;
 	msg.m_spill_count = spillcount;
@@ -1805,7 +1806,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -6758,7 +6759,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(msg->m_slotname.data, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -7056,7 +7057,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(msg->m_slotname.data, !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -7325,7 +7326,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -7338,7 +7339,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5102227a60..2ce8e1fd2d 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2323,7 +2323,7 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));
 
-		values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+		values[0] = PointerGetDatum(cstring_to_text(s->slotname.data));
 		values[1] = Int64GetDatum(s->spill_txns);
 		values[2] = Int64GetDatum(s->spill_count);
 		values[3] = Int64GetDatum(s->spill_bytes);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d699502cd9..10784f947e 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -379,7 +379,7 @@ typedef struct PgStat_MsgResetslrucounter
 typedef struct PgStat_MsgResetreplslotcounter
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData      m_slotname;
 	bool		clearall;
 } PgStat_MsgResetreplslotcounter;
 
@@ -507,7 +507,7 @@ typedef struct PgStat_MsgSLRU
 typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData	m_slotname;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -872,7 +872,7 @@ typedef struct PgStat_SLRUStats
  */
 typedef struct PgStat_ReplSlotStats
 {
-	char		slotname[NAMEDATALEN];
+	NameData	slotname;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
-- 
2.25.1

v1-0002-Added-tests-for-verification-of-logical-replicati.patchtext/x-patch; charset=US-ASCII; name=v1-0002-Added-tests-for-verification-of-logical-replicati.patchDownload
From 2a0e168194fa446600e84301f94d8ee2b4b39221 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Thu, 1 Apr 2021 19:47:01 +0530
Subject: [PATCH v1 2/4] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 src/test/subscription/t/020_repl_stats.pl | 161 ++++++++++++++++++++++
 1 file changed, 161 insertions(+)
 create mode 100644 src/test/subscription/t/020_repl_stats.pl

diff --git a/src/test/subscription/t/020_repl_stats.pl b/src/test/subscription/t/020_repl_stats.pl
new file mode 100644
index 0000000000..4be58d417a
--- /dev/null
+++ b/src/test/subscription/t/020_repl_stats.pl
@@ -0,0 +1,161 @@
+# Test replication statistics data gets updated in pg_stat_replication_slots
+# view for logical replication.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 5;
+
+# Create publisher node.
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', 'logical_decoding_work_mem = 64kB');
+$node_publisher->start;
+
+# Create subscriber nodes.
+my $node_subscriber1 = get_new_node('subscriber1');
+$node_subscriber1->init(allows_streaming => 'logical');
+$node_subscriber1->start;
+
+my $node_subscriber2 = get_new_node('subscriber2');
+$node_subscriber2->init(allows_streaming => 'logical');
+$node_subscriber2->start;
+
+my $node_subscriber3 = get_new_node('subscriber3');
+$node_subscriber3->init(allows_streaming => 'logical');
+$node_subscriber3->start;
+
+# Create table on publisher.
+$node_publisher->safe_psql('postgres',
+        "CREATE TABLE test_tab (a int primary key, b varchar)");
+
+# Create table on subscribers.
+$node_subscriber1->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+$node_subscriber2->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+$node_subscriber3->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+
+# Setup logical replication.
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname1 = 'tap_sub1';
+$node_subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub1
+	CONNECTION '$publisher_connstr
+	application_name=$appname1'
+	PUBLICATION tap_pub WITH (streaming = on)"
+);
+
+my $appname2 = 'tap_sub2';
+$node_subscriber2->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub2
+	CONNECTION '$publisher_connstr
+	application_name=$appname2'
+	PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+my $appname3 = 'tap_sub3';
+$node_subscriber3->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub3
+	CONNECTION '$publisher_connstr
+	application_name=$appname3'
+	PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+$node_publisher->wait_for_catchup($appname1);
+$node_publisher->wait_for_catchup($appname2);
+$node_publisher->wait_for_catchup($appname3);
+
+# Interleave a pair of transactions, each exceeding the 64kB limit.
+my $in  = '';
+my $out = '';
+
+my $timer = IPC::Run::timeout(180);
+
+my $h = $node_publisher->background_psql('postgres', \$in, \$out, $timer,
+        on_error_stop => 0);
+
+$in .= q{
+BEGIN;
+INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+DELETE FROM test_tab WHERE mod(a,3) = 0;
+};
+$h->pump_nb;
+
+$node_publisher->safe_psql(
+        'postgres', q{
+BEGIN;
+INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(5001, 9999) s(i);
+DELETE FROM test_tab WHERE a > 5000;
+COMMIT;
+});
+
+$in .= q{
+COMMIT;
+\q
+};
+$h->finish;    # errors make the next test fail, so ignore them here
+
+$node_publisher->wait_for_catchup($appname1);
+$node_publisher->wait_for_catchup($appname2);
+$node_publisher->wait_for_catchup($appname3);
+
+# Verify data is replicated to the subscribers.
+my $result =
+  $node_subscriber1->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
+$result =
+  $node_subscriber2->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
+$result =
+  $node_subscriber3->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
+# Test to verify replication statistics data is updated in
+# pg_stat_replication_slots statistics view.
+$result = $node_publisher->safe_psql('postgres',
+	"SELECT slot_name,
+		spill_txns > 0 AS spill_txns,
+		spill_count > 0 AS spill_count,
+		spill_bytes > 0 AS spill_bytes,
+		stream_txns > 0 AS stream_txns,
+		stream_count > 0 AS stream_count,
+		stream_bytes > 0 AS stream_bytes,
+		stats_reset
+	FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(tap_sub1|f|f|f|t|t|t|
+tap_sub2|t|t|t|f|f|f|
+tap_sub3|t|t|t|f|f|f|), 'check replication statistics are updated');
+
+# Test to drop one of the subscribers and verify replication statistics data is
+# fine after publisher is restarted.
+$node_subscriber3->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub3;");
+
+$node_publisher->stop;
+$node_publisher->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node_publisher->safe_psql('postgres',
+        "SELECT slot_name,
+                spill_txns > 0 AS spill_txns,
+                spill_count > 0 AS spill_count,
+                spill_bytes > 0 AS spill_bytes,
+                stream_txns > 0 AS stream_txns,
+                stream_count > 0 AS stream_count,
+                stream_bytes > 0 AS stream_bytes,
+                stats_reset
+        FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(tap_sub1|f|f|f|t|t|t|
+tap_sub2|t|t|t|f|f|f|), 'check replication statistics are updated');
+
+# shutdown
+$node_subscriber1->stop;
+$node_subscriber2->stop;
+$node_subscriber3->stop;
+$node_publisher->stop;
-- 
2.25.1

v1-0003-Add-total-txns-and-total-txn-bytes-for-replicatio.patchtext/x-patch; charset=US-ASCII; name=v1-0003-Add-total-txns-and-total-txn-bytes-for-replicatio.patchDownload
From 93c65c7e942e60c1b4f8363cb2fc521c49e47ee7 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Thu, 1 Apr 2021 11:02:54 +0530
Subject: [PATCH v1 3/4] Add total txns and total txn bytes for replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
---
 doc/src/sgml/monitoring.sgml                  | 24 +++++++++++++++++++
 src/backend/catalog/system_views.sql          |  2 ++
 src/backend/postmaster/pgstat.c               |  9 ++++++-
 src/backend/replication/logical/logical.c     | 16 ++++++++-----
 .../replication/logical/reorderbuffer.c       |  5 ++++
 src/backend/replication/slot.c                |  2 +-
 src/backend/utils/adt/pgstatfuncs.c           |  8 ++++---
 src/include/catalog/pg_proc.dat               |  6 ++---
 src/include/pgstat.h                          |  7 +++++-
 src/include/replication/reorderbuffer.h       |  4 ++++
 src/test/regress/expected/rules.out           |  4 +++-
 11 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index af540fb02f..d816c0525a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2689,6 +2689,30 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of in-progress transactions replicated to the decoding output
+        plugin.  This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded in-progress transaction data replicated to the decoding
+        output plugin while decoding changes from WAL for this slot. This and other
+        counters for this slot can be used to gauge the network I/O which occurred
+        during logical decoding and allow tuning <literal>logical_decoding_work_mem</literal>.
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..e3991eb6f6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -874,6 +874,8 @@ CREATE VIEW pg_stat_replication_slots AS
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
             s.stats_reset
     FROM pg_stat_get_replication_slots() AS s;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 132fdef78c..b497fbed37 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1775,7 +1775,8 @@ pgstat_report_tempfile(size_t filesize)
  */
 void
 pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-					   int spillbytes, int streamtxns, int streamcount, int streambytes)
+					   int spillbytes, int streamtxns, int streamcount,
+					   int streambytes, int totaltxns, int totalbytes)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1791,6 +1792,8 @@ pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
 	msg.m_stream_txns = streamtxns;
 	msg.m_stream_count = streamcount;
 	msg.m_stream_bytes = streambytes;
+	msg.m_total_txns = totaltxns;
+	msg.m_total_bytes = totalbytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -7089,6 +7092,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		replSlotStats[idx].stream_txns += msg->m_stream_txns;
 		replSlotStats[idx].stream_count += msg->m_stream_count;
 		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		replSlotStats[idx].total_txns += msg->m_total_txns;
+		replSlotStats[idx].total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -7360,6 +7365,8 @@ pgstat_reset_replslot(int i, TimestampTz ts)
 	replSlotStats[i].stream_txns = 0;
 	replSlotStats[i].stream_count = 0;
 	replSlotStats[i].stream_bytes = 0;
+	replSlotStats[i].total_txns = 0;
+	replSlotStats[i].total_bytes = 0;
 	replSlotStats[i].stat_reset_timestamp = ts;
 }
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 2f6803637b..3a178bfc3d 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1765,28 +1765,32 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	ReorderBuffer *rb = ctx->reorder;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
 
 	pgstat_report_replslot(NameStr(ctx->slot->data.name),
 						   rb->spillTxns, rb->spillCount, rb->spillBytes,
-						   rb->streamTxns, rb->streamCount, rb->streamBytes);
+						   rb->streamTxns, rb->streamCount, rb->streamBytes,
+						   rb->totalTxns, rb->totalBytes);
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..5bc0b05a0e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb, TransactionId xid, bool create,
 			dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
 			AssertTXNLsnOrder(rb);
 		}
+
+		rb->totalTxns++;
 	}
 	else
 		txn = NULL;				/* not found and not asked to create */
@@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 	{
 		txn->size += sz;
 		rb->size += sz;
+		rb->totalBytes += sz;
 
 		/* Update the total size in the top transaction. */
 		if (toptxn)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 75a087c2f9..7dba2298b4 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,7 +328,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0);
+		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0, 0, 0);
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2ce8e1fd2d..c229b37abd 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2279,7 +2279,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -2330,11 +2330,13 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		values[4] = Int64GetDatum(s->stream_txns);
 		values[5] = Int64GetDatum(s->stream_count);
 		values[6] = Int64GetDatum(s->stream_bytes);
+		values[7] = Int64GetDatum(s->total_txns);
+		values[8] = Int64GetDatum(s->total_bytes);
 
 		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
+			nulls[9] = true;
 		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 69ffd0c3f4..8b2071c77b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5296,9 +5296,9 @@
   proname => 'pg_stat_get_replication_slots', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slots' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 10784f947e..46445c3cee 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -515,6 +515,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 
@@ -879,6 +881,8 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;
 
@@ -1451,7 +1455,8 @@ extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
 extern void pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-								   int spillbytes, int streamtxns, int streamcount, int streambytes);
+								   int spillbytes, int streamtxns, int streamcount,
+								   int streambytes, int totaltxns, int totalbytes);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b59a7b4a5..f0eea8b18f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2056,8 +2056,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
+    s.total_txns,
+    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v1-0004-Handle-overwriting-of-replication-slot-statistic-.patchtext/x-patch; charset=US-ASCII; name=v1-0004-Handle-overwriting-of-replication-slot-statistic-.patchDownload
From 1924142ec8194dc661fd1f230e77bed6d3f9d201 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Thu, 1 Apr 2021 20:50:10 +0530
Subject: [PATCH v1 4/4] Handle overwriting of replication slot statistic
 issue.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the publisher is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, fixed it by skipping the
replication slot statistic that are after max_replication_slot.
---
 src/backend/postmaster/pgstat.c           | 24 +++++++++-
 src/test/subscription/t/020_repl_stats.pl | 57 +++++++++++++++++++++--
 2 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b497fbed37..c65f57889e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -5699,6 +5699,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	for (;;)
 	{
+		PgStat_ReplSlotStats replSlotStat;
 		switch (fgetc(fpin))
 		{
 				/*
@@ -5787,15 +5788,34 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
+				if (fread(&replSlotStat, 1, sizeof(PgStat_ReplSlotStats), fpin)
 					!= sizeof(PgStat_ReplSlotStats))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
+
+				/*
+				 * There is a remote scenario where one of the replication slots
+				 * is dropped and the drop slot statistics message is not
+				 * received by the statistic collector process, now if the
+				 * max_replication_slots is reduced to the actual number of
+				 * replication slots that are in use and the publisher is
+				 * re-started then the statistics process will not be aware of
+				 * this, to avoid writing beyond the max_replication_slots
+				 * these replication slot statistic information will be skipped.
+				 */
+				if (max_replication_slots == nReplSlotStats)
+				{
+					ereport(pgStatRunningInCollector ? LOG : WARNING,
+							(errmsg("skipping \"%s\" replication slot's statistic as the statistic collector process does not have enough statistic slots",
+									replSlotStat.slotname)));
+					goto done;
+				}
+
+				memcpy(&replSlotStats[nReplSlotStats], &replSlotStat, sizeof(PgStat_ReplSlotStats));
 				nReplSlotStats++;
 				break;
 
diff --git a/src/test/subscription/t/020_repl_stats.pl b/src/test/subscription/t/020_repl_stats.pl
index 4be58d417a..b0674f2d41 100644
--- a/src/test/subscription/t/020_repl_stats.pl
+++ b/src/test/subscription/t/020_repl_stats.pl
@@ -2,9 +2,10 @@
 # view for logical replication.
 use strict;
 use warnings;
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 5;
+use Test::More tests => 7;
 
 # Create publisher node.
 my $node_publisher = get_new_node('publisher');
@@ -25,6 +26,10 @@ my $node_subscriber3 = get_new_node('subscriber3');
 $node_subscriber3->init(allows_streaming => 'logical');
 $node_subscriber3->start;
 
+my $node_subscriber4 = get_new_node('subscriber4');
+$node_subscriber4->init(allows_streaming => 'logical');
+$node_subscriber4->start;
+
 # Create table on publisher.
 $node_publisher->safe_psql('postgres',
         "CREATE TABLE test_tab (a int primary key, b varchar)");
@@ -33,6 +38,7 @@ $node_publisher->safe_psql('postgres',
 $node_subscriber1->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
 $node_subscriber2->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
 $node_subscriber3->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+$node_subscriber4->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
 
 # Setup logical replication.
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
@@ -62,9 +68,18 @@ $node_subscriber3->safe_psql('postgres',
 	PUBLICATION tap_pub WITH (streaming = off)"
 );
 
+my $appname4 = 'tap_sub4';
+$node_subscriber4->safe_psql('postgres',
+        "CREATE SUBSCRIPTION tap_sub4
+        CONNECTION '$publisher_connstr
+        application_name=$appname4'
+        PUBLICATION tap_pub WITH (streaming = off)"
+);
+
 $node_publisher->wait_for_catchup($appname1);
 $node_publisher->wait_for_catchup($appname2);
 $node_publisher->wait_for_catchup($appname3);
+$node_publisher->wait_for_catchup($appname4);
 
 # Interleave a pair of transactions, each exceeding the 64kB limit.
 my $in  = '';
@@ -100,6 +115,7 @@ $h->finish;    # errors make the next test fail, so ignore them here
 $node_publisher->wait_for_catchup($appname1);
 $node_publisher->wait_for_catchup($appname2);
 $node_publisher->wait_for_catchup($appname3);
+$node_publisher->wait_for_catchup($appname4);
 
 # Verify data is replicated to the subscribers.
 my $result =
@@ -114,6 +130,10 @@ $result =
   $node_subscriber3->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
 is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
 
+$result =
+  $node_subscriber4->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
 # Test to verify replication statistics data is updated in
 # pg_stat_replication_slots statistics view.
 $result = $node_publisher->safe_psql('postgres',
@@ -129,13 +149,43 @@ $result = $node_publisher->safe_psql('postgres',
 );
 is($result, qq(tap_sub1|f|f|f|t|t|t|
 tap_sub2|t|t|t|f|f|f|
-tap_sub3|t|t|t|f|f|f|), 'check replication statistics are updated');
+tap_sub3|t|t|t|f|f|f|
+tap_sub4|t|t|t|f|f|f|), 'check replication statistics are updated');
 
 # Test to drop one of the subscribers and verify replication statistics data is
 # fine after publisher is restarted.
-$node_subscriber3->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub3;");
+$node_subscriber4->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub4;");
+
+$node_publisher->stop;
+$node_publisher->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node_publisher->safe_psql('postgres',
+        "SELECT slot_name,
+                spill_txns > 0 AS spill_txns,
+                spill_count > 0 AS spill_count,
+                spill_bytes > 0 AS spill_bytes,
+                stream_txns > 0 AS stream_txns,
+                stream_count > 0 AS stream_count,
+                stream_bytes > 0 AS stream_bytes,
+                stats_reset
+        FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(tap_sub1|f|f|f|t|t|t|
+tap_sub2|t|t|t|f|f|f|
+tap_sub3|t|t|t|f|f|f|), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust max_replication_slots
+# accordingly to the number of slots and verify replication statistics data is
+# fine after publisher is restarted.
 $node_publisher->stop;
+my $publisher_data = $node_publisher->data_dir;
+my $subscriber3_replslotdir = "$publisher_data/pg_replslot/tap_sub3";
+
+rmtree($subscriber3_replslotdir);
+
+$node_publisher->append_conf('postgresql.conf', 'max_replication_slots = 2');
 $node_publisher->start;
 
 # Verify statistics data present in pg_stat_replication_slots are sane after
@@ -158,4 +208,5 @@ tap_sub2|t|t|t|f|f|f|), 'check replication statistics are updated');
 $node_subscriber1->stop;
 $node_subscriber2->stop;
 $node_subscriber3->stop;
+$node_subscriber4->stop;
 $node_publisher->stop;
-- 
2.25.1

#28Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#27)
Re: Replication slot stats misgivings

On Fri, Apr 2, 2021 at 1:55 AM vignesh C <vignesh21@gmail.com> wrote:

On Thu, Apr 1, 2021 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 1, 2021 at 3:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-30 10:13:29 +0530, vignesh C wrote:

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

Cool!

Attached a patch which has the test for the first scenario.

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

Yep - thereby allowing max_replication_slots after the shutdown/start to
be lower than the number of slots-stats objects.

I have not included the 2nd test in the patch as the test fails with
following warnings and also displays the statistics of the removed
slot:
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438

This happens because the statistics file has an additional slot
present even though the replication slot was removed. I felt this
issue should be fixed. I will try to fix this issue and send the
second test along with the fix.

I felt from the statistics collector process, there is no way in which
we can identify if the replication slot is present or not because the
statistic collector process does not have access to shared memory.
Anything that the statistic collector process does independently by
traversing and removing the statistics of the replication slot
exceeding the max_replication_slot has its drawback of removing some
valid replication slot's statistics data.
Any thoughts on how we can identify the replication slot which has been dropped?
Can someone point me to the shared stats patch link with which message
loss can be avoided. I wanted to see a scenario where something like
the slot is dropped but the statistics are not updated because of an
immediate shutdown or server going down abruptly can occur or not with
the shared stats patch.

I don't think it is easy to simulate a scenario where the 'drop'
message is dropped and I think that is why the test contains the step
to manually remove the slot. At this stage, you can probably provide a
test patch and a code-fix patch where it just drops the extra slots
from the stats file. That will allow us to test it with a shared
memory stats patch on which Andres and Horiguchi-San are working. If
we still continue to pursue with current approach then as Andres
suggested we might send additional information from
RestoreSlotFromDisk to keep it in sync.

Thanks for your comments, Attached patch has the fix for the same.
Also attached a couple of more patches which addresses the comments
which Andres had listed i.e changing char to NameData type and also to
display the unspilled/unstreamed transaction information in the
replication statistics.
Thoughts?

Thank you for the patches!

I've looked at those patches and here are some comments on 0001, 0002,
and 0003 patch:

0001 patch:

-       values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+       values[0] = PointerGetDatum(cstring_to_text(s->slotname.data));

We can use NameGetDatum() instead.

---
0002 patch:

The patch uses logical replication to test replication slots
statistics but I think it's necessarily necessary. It would be more
simple to use logical decoding. Maybe we can add TAP tests to
contrib/test_decoding.

---
0003 patch:

 void
 pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-                      int spillbytes, int streamtxns, int
streamcount, int streambytes)
+                      int spillbytes, int streamtxns, int streamcount,
+                      int streambytes, int totaltxns, int totalbytes)
 {

As Andreas pointed out, we should use a struct of stats updates rather
than adding more arguments to pgstat_report_replslot().

---
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded in-progress transaction data replicated to
the decoding
+        output plugin while decoding changes from WAL for this slot.
This and other
+        counters for this slot can be used to gauge the network I/O
which occurred
+        during logical decoding and allow tuning
<literal>logical_decoding_work_mem</literal>.
+       </para>
+      </entry>
+     </row>

As I mentioned in another reply, I think users should not gauge the
network I/O which occurred during logical decoding using by those
counters since the actual amount of network I/O is affected by table
filtering and row filtering discussed on another thread[1]/messages/by-id/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com. Also,
since this is total bytes I'm not sure how users can use this value to
tune logical_decoding_work_mem. I agree to track both the total bytes
and the total number of transactions passed to the decoding plugin but
I think the description needs to be updated. How about the following
description for example?

Amount of decoded transaction data sent to the decoding output plugin
while decoding changes from WAL for this slot. This and total_txn for
this slot can be used to gauge the total amount of data during logical
decoding.

---
I think we can merge 0001 and 0003 patches.

[1]: /messages/by-id/CAHE3wggb715X+mK_DitLXF25B=jE6xyNCH4YOwM860JR7HarGQ@mail.gmail.com

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#29vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#28)
Re: Replication slot stats misgivings

On Fri, Apr 2, 2021 at 9:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 2, 2021 at 1:55 AM vignesh C <vignesh21@gmail.com> wrote:

On Thu, Apr 1, 2021 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 1, 2021 at 3:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-30 10:13:29 +0530, vignesh C wrote:

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

Cool!

Attached a patch which has the test for the first scenario.

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

Yep - thereby allowing max_replication_slots after the shutdown/start to
be lower than the number of slots-stats objects.

I have not included the 2nd test in the patch as the test fails with
following warnings and also displays the statistics of the removed
slot:
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438

This happens because the statistics file has an additional slot
present even though the replication slot was removed. I felt this
issue should be fixed. I will try to fix this issue and send the
second test along with the fix.

I felt from the statistics collector process, there is no way in which
we can identify if the replication slot is present or not because the
statistic collector process does not have access to shared memory.
Anything that the statistic collector process does independently by
traversing and removing the statistics of the replication slot
exceeding the max_replication_slot has its drawback of removing some
valid replication slot's statistics data.
Any thoughts on how we can identify the replication slot which has been dropped?
Can someone point me to the shared stats patch link with which message
loss can be avoided. I wanted to see a scenario where something like
the slot is dropped but the statistics are not updated because of an
immediate shutdown or server going down abruptly can occur or not with
the shared stats patch.

I don't think it is easy to simulate a scenario where the 'drop'
message is dropped and I think that is why the test contains the step
to manually remove the slot. At this stage, you can probably provide a
test patch and a code-fix patch where it just drops the extra slots
from the stats file. That will allow us to test it with a shared
memory stats patch on which Andres and Horiguchi-San are working. If
we still continue to pursue with current approach then as Andres
suggested we might send additional information from
RestoreSlotFromDisk to keep it in sync.

Thanks for your comments, Attached patch has the fix for the same.
Also attached a couple of more patches which addresses the comments
which Andres had listed i.e changing char to NameData type and also to
display the unspilled/unstreamed transaction information in the
replication statistics.
Thoughts?

Thank you for the patches!

I've looked at those patches and here are some comments on 0001, 0002,
and 0003 patch:

0001 patch:

-       values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+       values[0] = PointerGetDatum(cstring_to_text(s->slotname.data));

We can use NameGetDatum() instead.

---
0002 patch:

The patch uses logical replication to test replication slots
statistics but I think it's necessarily necessary. It would be more
simple to use logical decoding. Maybe we can add TAP tests to
contrib/test_decoding.

---
0003 patch:

void
pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-                      int spillbytes, int streamtxns, int
streamcount, int streambytes)
+                      int spillbytes, int streamtxns, int streamcount,
+                      int streambytes, int totaltxns, int totalbytes)
{

As Andreas pointed out, we should use a struct of stats updates rather
than adding more arguments to pgstat_report_replslot().

---
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded in-progress transaction data replicated to
the decoding
+        output plugin while decoding changes from WAL for this slot.
This and other
+        counters for this slot can be used to gauge the network I/O
which occurred
+        during logical decoding and allow tuning
<literal>logical_decoding_work_mem</literal>.
+       </para>
+      </entry>
+     </row>

As I mentioned in another reply, I think users should not gauge the
network I/O which occurred during logical decoding using by those
counters since the actual amount of network I/O is affected by table
filtering and row filtering discussed on another thread[1]. Also,
since this is total bytes I'm not sure how users can use this value to
tune logical_decoding_work_mem. I agree to track both the total bytes
and the total number of transactions passed to the decoding plugin but
I think the description needs to be updated. How about the following
description for example?

Amount of decoded transaction data sent to the decoding output plugin
while decoding changes from WAL for this slot. This and total_txn for
this slot can be used to gauge the total amount of data during logical
decoding.

---
I think we can merge 0001 and 0003 patches.

Thanks for the comments, I will fix the comments and provide a patch
for this soon.

Regards,
Vignesh

#30Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#26)
Re: Replication slot stats misgivings

On Thu, Apr 1, 2021 at 6:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Mar 30, 2021 at 9:58 AM Andres Freund <andres@anarazel.de> wrote:

IMO, independent of the shutdown / startup issue, it'd be worth writing
a patch tracking the bytes sent independently of the slot stats storage
issues. That would also make the testing for the above cheaper...

Agreed.

I think the bytes sent should be recorded by the decoding plugin, not
by the core side. Given that table filtering and row filtering,
tracking the bytes passed to the decoding plugin would not help gauge
the actual network I/O. In that sense, the description of stream_bytes
in the doc seems not accurate:

---
This and other streaming counters for this slot can be used to gauge
the network I/O which occurred during logical decoding and allow
tuning logical_decoding_work_mem.
---

It can surely be used to allow tuning logical_decoding_work_mem but it
could not be true for gauging the network I/O which occurred during
logical decoding.

Agreed. I think we can adjust the wording accordingly.

--
With Regards,
Amit Kapila.

#31Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: vignesh C (#29)
Re: Replication slot stats misgivings

On Fri, Apr 2, 2021 at 9:57 AM vignesh C <vignesh21@gmail.com> wrote:

Thanks for the comments, I will fix the comments and provide a patch
for this soon.

Here are some comments:
1) How about something like below
+                            (errmsg("skipping \"%s\" replication slot
statistics as the statistic collector process does not have enough
statistic slots",
instead of
+                            (errmsg("skipping \"%s\" replication
slot's statistic as the statistic collector process does not have
enough statistic slots",

2) Does it mean "pg_statistic slots" when we say "statistic slots" in
the above warning? If yes, why can't we use "pg_statistic slots"
instead of "statistic slots" as with another existing message
"insufficient pg_statistic slots for array stats"?

3) Should we change the if condition to max_replication_slots <=
nReplSlotStats instead of max_replication_slots == nReplSlotStats? In
the scenario, it is mentioned that "one of the replication slots is
dropped", will this issue occur when multiple replication slots are
dropped?

4) Let's end the statement after this and start a new one, something like below
+                 * this. To avoid writing beyond the max_replication_slots
instead of
+                 * this, to avoid writing beyond the max_replication_slots
5) How about something like below
+                 * this. To avoid writing beyond the max_replication_slots,
+                 * this replication slot statistics information will
be skipped.
+                 */
instead of
+                 * this, to avoid writing beyond the max_replication_slots
+                 * these replication slot statistic information will
be skipped.
+                 */
6) Any specific reason to use a new local variable replSlotStat and
later memcpy into replSlotStats[nReplSlotStats]? Instead we could
directly fread into &replSlotStats[nReplSlotStats] and do
memset(&replSlotStats[nReplSlotStats], 0,
sizeof(PgStat_ReplSlotStats)); before the warnings. As warning
scenarios seem to be less frequent, we could avoid doing memcpy
always.
-                if (fread(&replSlotStats[nReplSlotStats], 1,
sizeof(PgStat_ReplSlotStats), fpin)
+                if (fread(&replSlotStat, 1, sizeof(PgStat_ReplSlotStats), fpin)

+ memcpy(&replSlotStats[nReplSlotStats], &replSlotStat,
sizeof(PgStat_ReplSlotStats));

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#32vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#28)
3 attachment(s)
Re: Replication slot stats misgivings

On Fri, Apr 2, 2021 at 9:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 2, 2021 at 1:55 AM vignesh C <vignesh21@gmail.com> wrote:

On Thu, Apr 1, 2021 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 1, 2021 at 3:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-30 10:13:29 +0530, vignesh C wrote:

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

Cool!

Attached a patch which has the test for the first scenario.

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

Yep - thereby allowing max_replication_slots after the shutdown/start to
be lower than the number of slots-stats objects.

I have not included the 2nd test in the patch as the test fails with
following warnings and also displays the statistics of the removed
slot:
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438

This happens because the statistics file has an additional slot
present even though the replication slot was removed. I felt this
issue should be fixed. I will try to fix this issue and send the
second test along with the fix.

I felt from the statistics collector process, there is no way in which
we can identify if the replication slot is present or not because the
statistic collector process does not have access to shared memory.
Anything that the statistic collector process does independently by
traversing and removing the statistics of the replication slot
exceeding the max_replication_slot has its drawback of removing some
valid replication slot's statistics data.
Any thoughts on how we can identify the replication slot which has been dropped?
Can someone point me to the shared stats patch link with which message
loss can be avoided. I wanted to see a scenario where something like
the slot is dropped but the statistics are not updated because of an
immediate shutdown or server going down abruptly can occur or not with
the shared stats patch.

I don't think it is easy to simulate a scenario where the 'drop'
message is dropped and I think that is why the test contains the step
to manually remove the slot. At this stage, you can probably provide a
test patch and a code-fix patch where it just drops the extra slots
from the stats file. That will allow us to test it with a shared
memory stats patch on which Andres and Horiguchi-San are working. If
we still continue to pursue with current approach then as Andres
suggested we might send additional information from
RestoreSlotFromDisk to keep it in sync.

Thanks for your comments, Attached patch has the fix for the same.
Also attached a couple of more patches which addresses the comments
which Andres had listed i.e changing char to NameData type and also to
display the unspilled/unstreamed transaction information in the
replication statistics.
Thoughts?

Thank you for the patches!

I've looked at those patches and here are some comments on 0001, 0002,
and 0003 patch:

Thanks for the comments.

0001 patch:

-       values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+       values[0] = PointerGetDatum(cstring_to_text(s->slotname.data));

We can use NameGetDatum() instead.

I felt we will not be able to use NameGetDatum because this function
will not have access to the value throughout the loop and NameGetDatum
must ensure the pointed-to value has adequate lifetime.

---
0002 patch:

The patch uses logical replication to test replication slots
statistics but I think it's necessarily necessary. It would be more
simple to use logical decoding. Maybe we can add TAP tests to
contrib/test_decoding.

I will try to change it to test_decoding if feasible and post in the
next version.

---
0003 patch:

void
pgstat_report_replslot(const char *slotname, int spilltxns, int spillcount,
-                      int spillbytes, int streamtxns, int
streamcount, int streambytes)
+                      int spillbytes, int streamtxns, int streamcount,
+                      int streambytes, int totaltxns, int totalbytes)
{

As Andreas pointed out, we should use a struct of stats updates rather
than adding more arguments to pgstat_report_replslot().

Modified as suggested.

---
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded in-progress transaction data replicated to
the decoding
+        output plugin while decoding changes from WAL for this slot.
This and other
+        counters for this slot can be used to gauge the network I/O
which occurred
+        during logical decoding and allow tuning
<literal>logical_decoding_work_mem</literal>.
+       </para>
+      </entry>
+     </row>

As I mentioned in another reply, I think users should not gauge the
network I/O which occurred during logical decoding using by those
counters since the actual amount of network I/O is affected by table
filtering and row filtering discussed on another thread[1]. Also,
since this is total bytes I'm not sure how users can use this value to
tune logical_decoding_work_mem. I agree to track both the total bytes
and the total number of transactions passed to the decoding plugin but
I think the description needs to be updated. How about the following
description for example?

Amount of decoded transaction data sent to the decoding output plugin
while decoding changes from WAL for this slot. This and total_txn for
this slot can be used to gauge the total amount of data during logical
decoding.

Modified as suggested.

---
I think we can merge 0001 and 0003 patches.

I have merged them.
Attached V2 patch which has the fixes for the same.
Thoughts?

Regards,
Vignesh

Attachments:

v2-0003-Handle-overwriting-of-replication-slot-statistic-.patchtext/x-patch; charset=US-ASCII; name=v2-0003-Handle-overwriting-of-replication-slot-statistic-.patchDownload
From 846b0fea8b1c88d3b818f37cd382cc8a2e3fb57d Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Thu, 1 Apr 2021 20:50:10 +0530
Subject: [PATCH v2 3/3] Handle overwriting of replication slot statistic
 issue.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the publisher is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, fixed it by skipping the
replication slot statistic that are after max_replication_slot.
---
 src/backend/postmaster/pgstat.c           | 20 ++++++++
 src/test/subscription/t/020_repl_stats.pl | 57 +++++++++++++++++++++--
 2 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 1a54477c83..e27cd4d339 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -5122,6 +5122,26 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
+
+				/*
+				 * There is a remote scenario where one of the replication slots
+				 * is dropped and the drop slot statistics message is not
+				 * received by the statistic collector process, now if the
+				 * max_replication_slots is reduced to the actual number of
+				 * replication slots that are in use and the publisher is
+				 * re-started then the statistics process will not be aware of
+				 * this. To avoid writing beyond the max_replication_slots
+				 * this replication slot statistic information will be skipped.
+				 */
+				if (max_replication_slots == nReplSlotStats)
+				{
+					ereport(pgStatRunningInCollector ? LOG : WARNING,
+							(errmsg("skipping \"%s\" replication slot statistics as pg_stat_replication_slots does not have enough slots",
+									NameStr(replSlotStats[nReplSlotStats].slotname))));
+					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
+					goto done;
+				}
+
 				nReplSlotStats++;
 				break;
 
diff --git a/src/test/subscription/t/020_repl_stats.pl b/src/test/subscription/t/020_repl_stats.pl
index 4be58d417a..b0674f2d41 100644
--- a/src/test/subscription/t/020_repl_stats.pl
+++ b/src/test/subscription/t/020_repl_stats.pl
@@ -2,9 +2,10 @@
 # view for logical replication.
 use strict;
 use warnings;
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 5;
+use Test::More tests => 7;
 
 # Create publisher node.
 my $node_publisher = get_new_node('publisher');
@@ -25,6 +26,10 @@ my $node_subscriber3 = get_new_node('subscriber3');
 $node_subscriber3->init(allows_streaming => 'logical');
 $node_subscriber3->start;
 
+my $node_subscriber4 = get_new_node('subscriber4');
+$node_subscriber4->init(allows_streaming => 'logical');
+$node_subscriber4->start;
+
 # Create table on publisher.
 $node_publisher->safe_psql('postgres',
         "CREATE TABLE test_tab (a int primary key, b varchar)");
@@ -33,6 +38,7 @@ $node_publisher->safe_psql('postgres',
 $node_subscriber1->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
 $node_subscriber2->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
 $node_subscriber3->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+$node_subscriber4->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
 
 # Setup logical replication.
 my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
@@ -62,9 +68,18 @@ $node_subscriber3->safe_psql('postgres',
 	PUBLICATION tap_pub WITH (streaming = off)"
 );
 
+my $appname4 = 'tap_sub4';
+$node_subscriber4->safe_psql('postgres',
+        "CREATE SUBSCRIPTION tap_sub4
+        CONNECTION '$publisher_connstr
+        application_name=$appname4'
+        PUBLICATION tap_pub WITH (streaming = off)"
+);
+
 $node_publisher->wait_for_catchup($appname1);
 $node_publisher->wait_for_catchup($appname2);
 $node_publisher->wait_for_catchup($appname3);
+$node_publisher->wait_for_catchup($appname4);
 
 # Interleave a pair of transactions, each exceeding the 64kB limit.
 my $in  = '';
@@ -100,6 +115,7 @@ $h->finish;    # errors make the next test fail, so ignore them here
 $node_publisher->wait_for_catchup($appname1);
 $node_publisher->wait_for_catchup($appname2);
 $node_publisher->wait_for_catchup($appname3);
+$node_publisher->wait_for_catchup($appname4);
 
 # Verify data is replicated to the subscribers.
 my $result =
@@ -114,6 +130,10 @@ $result =
   $node_subscriber3->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
 is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
 
+$result =
+  $node_subscriber4->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
 # Test to verify replication statistics data is updated in
 # pg_stat_replication_slots statistics view.
 $result = $node_publisher->safe_psql('postgres',
@@ -129,13 +149,43 @@ $result = $node_publisher->safe_psql('postgres',
 );
 is($result, qq(tap_sub1|f|f|f|t|t|t|
 tap_sub2|t|t|t|f|f|f|
-tap_sub3|t|t|t|f|f|f|), 'check replication statistics are updated');
+tap_sub3|t|t|t|f|f|f|
+tap_sub4|t|t|t|f|f|f|), 'check replication statistics are updated');
 
 # Test to drop one of the subscribers and verify replication statistics data is
 # fine after publisher is restarted.
-$node_subscriber3->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub3;");
+$node_subscriber4->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub4;");
+
+$node_publisher->stop;
+$node_publisher->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node_publisher->safe_psql('postgres',
+        "SELECT slot_name,
+                spill_txns > 0 AS spill_txns,
+                spill_count > 0 AS spill_count,
+                spill_bytes > 0 AS spill_bytes,
+                stream_txns > 0 AS stream_txns,
+                stream_count > 0 AS stream_count,
+                stream_bytes > 0 AS stream_bytes,
+                stats_reset
+        FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(tap_sub1|f|f|f|t|t|t|
+tap_sub2|t|t|t|f|f|f|
+tap_sub3|t|t|t|f|f|f|), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust max_replication_slots
+# accordingly to the number of slots and verify replication statistics data is
+# fine after publisher is restarted.
 $node_publisher->stop;
+my $publisher_data = $node_publisher->data_dir;
+my $subscriber3_replslotdir = "$publisher_data/pg_replslot/tap_sub3";
+
+rmtree($subscriber3_replslotdir);
+
+$node_publisher->append_conf('postgresql.conf', 'max_replication_slots = 2');
 $node_publisher->start;
 
 # Verify statistics data present in pg_stat_replication_slots are sane after
@@ -158,4 +208,5 @@ tap_sub2|t|t|t|f|f|f|), 'check replication statistics are updated');
 $node_subscriber1->stop;
 $node_subscriber2->stop;
 $node_subscriber3->stop;
+$node_subscriber4->stop;
 $node_publisher->stop;
-- 
2.25.1

v2-0002-Added-tests-for-verification-of-logical-replicati.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Added-tests-for-verification-of-logical-replicati.patchDownload
From 0db04f815adcdda32355c71a8ea03c71e1665e87 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Thu, 1 Apr 2021 19:47:01 +0530
Subject: [PATCH v2 2/3] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 src/test/subscription/t/020_repl_stats.pl | 161 ++++++++++++++++++++++
 1 file changed, 161 insertions(+)
 create mode 100644 src/test/subscription/t/020_repl_stats.pl

diff --git a/src/test/subscription/t/020_repl_stats.pl b/src/test/subscription/t/020_repl_stats.pl
new file mode 100644
index 0000000000..4be58d417a
--- /dev/null
+++ b/src/test/subscription/t/020_repl_stats.pl
@@ -0,0 +1,161 @@
+# Test replication statistics data gets updated in pg_stat_replication_slots
+# view for logical replication.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 5;
+
+# Create publisher node.
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf', 'logical_decoding_work_mem = 64kB');
+$node_publisher->start;
+
+# Create subscriber nodes.
+my $node_subscriber1 = get_new_node('subscriber1');
+$node_subscriber1->init(allows_streaming => 'logical');
+$node_subscriber1->start;
+
+my $node_subscriber2 = get_new_node('subscriber2');
+$node_subscriber2->init(allows_streaming => 'logical');
+$node_subscriber2->start;
+
+my $node_subscriber3 = get_new_node('subscriber3');
+$node_subscriber3->init(allows_streaming => 'logical');
+$node_subscriber3->start;
+
+# Create table on publisher.
+$node_publisher->safe_psql('postgres',
+        "CREATE TABLE test_tab (a int primary key, b varchar)");
+
+# Create table on subscribers.
+$node_subscriber1->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+$node_subscriber2->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+$node_subscriber3->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999)");
+
+# Setup logical replication.
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname1 = 'tap_sub1';
+$node_subscriber1->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub1
+	CONNECTION '$publisher_connstr
+	application_name=$appname1'
+	PUBLICATION tap_pub WITH (streaming = on)"
+);
+
+my $appname2 = 'tap_sub2';
+$node_subscriber2->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub2
+	CONNECTION '$publisher_connstr
+	application_name=$appname2'
+	PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+my $appname3 = 'tap_sub3';
+$node_subscriber3->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub3
+	CONNECTION '$publisher_connstr
+	application_name=$appname3'
+	PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+$node_publisher->wait_for_catchup($appname1);
+$node_publisher->wait_for_catchup($appname2);
+$node_publisher->wait_for_catchup($appname3);
+
+# Interleave a pair of transactions, each exceeding the 64kB limit.
+my $in  = '';
+my $out = '';
+
+my $timer = IPC::Run::timeout(180);
+
+my $h = $node_publisher->background_psql('postgres', \$in, \$out, $timer,
+        on_error_stop => 0);
+
+$in .= q{
+BEGIN;
+INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
+UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
+DELETE FROM test_tab WHERE mod(a,3) = 0;
+};
+$h->pump_nb;
+
+$node_publisher->safe_psql(
+        'postgres', q{
+BEGIN;
+INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(5001, 9999) s(i);
+DELETE FROM test_tab WHERE a > 5000;
+COMMIT;
+});
+
+$in .= q{
+COMMIT;
+\q
+};
+$h->finish;    # errors make the next test fail, so ignore them here
+
+$node_publisher->wait_for_catchup($appname1);
+$node_publisher->wait_for_catchup($appname2);
+$node_publisher->wait_for_catchup($appname3);
+
+# Verify data is replicated to the subscribers.
+my $result =
+  $node_subscriber1->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
+$result =
+  $node_subscriber2->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
+$result =
+  $node_subscriber3->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(3332|3332|3332), 'check publisher data is replicated to the subscriber');
+
+# Test to verify replication statistics data is updated in
+# pg_stat_replication_slots statistics view.
+$result = $node_publisher->safe_psql('postgres',
+	"SELECT slot_name,
+		spill_txns > 0 AS spill_txns,
+		spill_count > 0 AS spill_count,
+		spill_bytes > 0 AS spill_bytes,
+		stream_txns > 0 AS stream_txns,
+		stream_count > 0 AS stream_count,
+		stream_bytes > 0 AS stream_bytes,
+		stats_reset
+	FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(tap_sub1|f|f|f|t|t|t|
+tap_sub2|t|t|t|f|f|f|
+tap_sub3|t|t|t|f|f|f|), 'check replication statistics are updated');
+
+# Test to drop one of the subscribers and verify replication statistics data is
+# fine after publisher is restarted.
+$node_subscriber3->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub3;");
+
+$node_publisher->stop;
+$node_publisher->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node_publisher->safe_psql('postgres',
+        "SELECT slot_name,
+                spill_txns > 0 AS spill_txns,
+                spill_count > 0 AS spill_count,
+                spill_bytes > 0 AS spill_bytes,
+                stream_txns > 0 AS stream_txns,
+                stream_count > 0 AS stream_count,
+                stream_bytes > 0 AS stream_bytes,
+                stats_reset
+        FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(tap_sub1|f|f|f|t|t|t|
+tap_sub2|t|t|t|f|f|f|), 'check replication statistics are updated');
+
+# shutdown
+$node_subscriber1->stop;
+$node_subscriber2->stop;
+$node_subscriber3->stop;
+$node_publisher->stop;
-- 
2.25.1

v2-0001-Added-total-txns-and-total-txn-bytes-to-replicati.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Added-total-txns-and-total-txn-bytes-to-replicati.patchDownload
From ed3af8884463d6858f8b880661d38dfc574a6514 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Thu, 1 Apr 2021 21:15:47 +0530
Subject: [PATCH v2 1/3] Added total txns and total txn bytes to replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
This also includes changing char datatype to NameData datatype for slotname.
---
 doc/src/sgml/monitoring.sgml                  | 24 ++++++++++++
 src/backend/catalog/system_views.sql          |  2 +
 src/backend/postmaster/pgstat.c               | 38 ++++++++++---------
 src/backend/replication/logical/logical.c     | 30 ++++++++++-----
 .../replication/logical/reorderbuffer.c       |  5 +++
 src/backend/replication/slot.c                |  7 +++-
 src/backend/utils/adt/pgstatfuncs.c           | 10 +++--
 src/include/catalog/pg_proc.dat               |  6 +--
 src/include/pgstat.h                          | 15 ++++----
 src/include/replication/reorderbuffer.h       |  4 ++
 src/test/regress/expected/rules.out           |  4 +-
 11 files changed, 103 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 56018745c8..25975d2228 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2689,6 +2689,30 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This and total_txns
+        for this slot can be used to gauge the total amount of data during
+        logical decoding.
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..e3991eb6f6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -874,6 +874,8 @@ CREATE VIEW pg_stat_replication_slots AS
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
             s.stats_reset
     FROM pg_stat_get_replication_slots() AS s;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 498d6ee123..1a54477c83 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -65,6 +65,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "utils/ascii.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1550,7 +1551,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1768,10 +1769,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-					   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-					   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-					   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1779,14 +1777,16 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1802,7 +1802,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -6088,7 +6088,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -6386,7 +6386,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -6418,6 +6418,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		replSlotStats[idx].stream_txns += msg->m_stream_txns;
 		replSlotStats[idx].stream_count += msg->m_stream_count;
 		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		replSlotStats[idx].total_txns += msg->m_total_txns;
+		replSlotStats[idx].total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -6655,7 +6657,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -6668,7 +6670,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
@@ -6689,6 +6691,8 @@ pgstat_reset_replslot(int i, TimestampTz ts)
 	replSlotStats[i].stream_txns = 0;
 	replSlotStats[i].stream_count = 0;
 	replSlotStats[i].stream_bytes = 0;
+	replSlotStats[i].total_txns = 0;
+	replSlotStats[i].total_bytes = 0;
 	replSlotStats[i].stat_reset_timestamp = ts;
 }
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 2f6803637b..d0ad694477 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1763,30 +1763,42 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
+	PgStat_ReplSlotStats repSlotStat;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
-
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
-						   rb->spillTxns, rb->spillCount, rb->spillBytes,
-						   rb->streamTxns, rb->streamCount, rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
+
+	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	repSlotStat.spill_txns = rb->spillTxns;
+	repSlotStat.spill_count = rb->spillCount;
+	repSlotStat.spill_bytes = rb->spillBytes;
+	repSlotStat.stream_txns = rb->streamTxns;
+	repSlotStat.stream_count = rb->streamCount;
+	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
+
+	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..5bc0b05a0e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb, TransactionId xid, bool create,
 			dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
 			AssertTXNLsnOrder(rb);
 		}
+
+		rb->totalTxns++;
 	}
 	else
 		txn = NULL;				/* not found and not asked to create */
@@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 	{
 		txn->size += sz;
 		rb->size += sz;
+		rb->totalBytes += sz;
 
 		/* Update the total size in the top transaction. */
 		if (toptxn)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 75a087c2f9..f61b163f78 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,7 +328,12 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0);
+	{
+		PgStat_ReplSlotStats repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+		pgstat_report_replslot(&repSlotStat);
+	}
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9ffbca685c..54eb3d1538 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2279,7 +2279,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -2323,18 +2323,20 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));
 
-		values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+		values[0] = CStringGetTextDatum(NameStr(s->slotname));
 		values[1] = Int64GetDatum(s->spill_txns);
 		values[2] = Int64GetDatum(s->spill_count);
 		values[3] = Int64GetDatum(s->spill_bytes);
 		values[4] = Int64GetDatum(s->stream_txns);
 		values[5] = Int64GetDatum(s->stream_count);
 		values[6] = Int64GetDatum(s->stream_bytes);
+		values[7] = Int64GetDatum(s->total_txns);
+		values[8] = Int64GetDatum(s->total_bytes);
 
 		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
+			nulls[9] = true;
 		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 69ffd0c3f4..8b2071c77b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5296,9 +5296,9 @@
   proname => 'pg_stat_get_replication_slots', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slots' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3247c7b8ad..f794dadffc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -379,7 +379,7 @@ typedef struct PgStat_MsgResetslrucounter
 typedef struct PgStat_MsgResetreplslotcounter
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData      m_slotname;
 	bool		clearall;
 } PgStat_MsgResetreplslotcounter;
 
@@ -507,7 +507,7 @@ typedef struct PgStat_MsgSLRU
 typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData	m_slotname;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -515,6 +515,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 
@@ -872,13 +874,15 @@ typedef struct PgStat_SLRUStats
  */
 typedef struct PgStat_ReplSlotStats
 {
-	char		slotname[NAMEDATALEN];
+	NameData	slotname;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;
 
@@ -1234,10 +1238,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-								   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-								   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-								   PgStat_Counter streambytes);
+extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b59a7b4a5..f0eea8b18f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2056,8 +2056,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
+    s.total_txns,
+    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

#33vignesh C
vignesh21@gmail.com
In reply to: Bharath Rupireddy (#31)
Re: Replication slot stats misgivings

On Fri, Apr 2, 2021 at 11:28 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Apr 2, 2021 at 9:57 AM vignesh C <vignesh21@gmail.com> wrote:

Thanks for the comments, I will fix the comments and provide a patch
for this soon.

Thanks for the comments.

Here are some comments:
1) How about something like below
+                            (errmsg("skipping \"%s\" replication slot
statistics as the statistic collector process does not have enough
statistic slots",
instead of
+                            (errmsg("skipping \"%s\" replication
slot's statistic as the statistic collector process does not have
enough statistic slots",

Modified.

2) Does it mean "pg_statistic slots" when we say "statistic slots" in
the above warning? If yes, why can't we use "pg_statistic slots"
instead of "statistic slots" as with another existing message
"insufficient pg_statistic slots for array stats"?

Here pg_stat_replication_slots will not have enought slots. I changed
it to below:
errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots"
Thoughts?

3) Should we change the if condition to max_replication_slots <=
nReplSlotStats instead of max_replication_slots == nReplSlotStats? In
the scenario, it is mentioned that "one of the replication slots is
dropped", will this issue occur when multiple replication slots are
dropped?

I felt it should be max_replication_slots == nReplSlotStats, if
max_replication_slots = 5, we will be able to store 5 replication slot
statistics from 0,1..4, from 5th we will not have space. I think this
need not be changed.

4) Let's end the statement after this and start a new one, something like below
+                 * this. To avoid writing beyond the max_replication_slots
instead of
+                 * this, to avoid writing beyond the max_replication_slots

Changed it.

5) How about something like below
+                 * this. To avoid writing beyond the max_replication_slots,
+                 * this replication slot statistics information will
be skipped.
+                 */
instead of
+                 * this, to avoid writing beyond the max_replication_slots
+                 * these replication slot statistic information will
be skipped.
+                 */

Changed it.

6) Any specific reason to use a new local variable replSlotStat and
later memcpy into replSlotStats[nReplSlotStats]? Instead we could
directly fread into &replSlotStats[nReplSlotStats] and do
memset(&replSlotStats[nReplSlotStats], 0,
sizeof(PgStat_ReplSlotStats)); before the warnings. As warning
scenarios seem to be less frequent, we could avoid doing memcpy
always.
-                if (fread(&replSlotStats[nReplSlotStats], 1,
sizeof(PgStat_ReplSlotStats), fpin)
+                if (fread(&replSlotStat, 1, sizeof(PgStat_ReplSlotStats), fpin)

+ memcpy(&replSlotStats[nReplSlotStats], &replSlotStat,
sizeof(PgStat_ReplSlotStats));

I wanted to avoid the memcpy instructions multiple times, but your
explanation makes sense to keep the memcpy in failure path so that the
positive flow can be faster. Changed it.
These comments are fixed in the v2 patch posted in my previous mail.

Regards,
Vignesh

#34Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: vignesh C (#33)
Re: Replication slot stats misgivings

On Sat, Apr 3, 2021 at 11:12 PM vignesh C <vignesh21@gmail.com> wrote:

Here pg_stat_replication_slots will not have enought slots. I changed
it to below:
errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots"
Thoughts?

WFM.

3) Should we change the if condition to max_replication_slots <=
nReplSlotStats instead of max_replication_slots == nReplSlotStats? In
the scenario, it is mentioned that "one of the replication slots is
dropped", will this issue occur when multiple replication slots are
dropped?

I felt it should be max_replication_slots == nReplSlotStats, if
max_replication_slots = 5, we will be able to store 5 replication slot
statistics from 0,1..4, from 5th we will not have space. I think this
need not be changed.

I'm not sure whether we can have a situation where
max_replication_slots < nReplSlotStats i.e. max_replication_slots
getting set to lesser than nReplSlotStats. I think I didn't get the
above mentioned scenario i.e. max_replication_slots == nReplSlotStats
correctly. It will be great if you could throw some light on that
scenario and ensure that it's not possible to reach a situation where
max_replication_slots < nReplSlotStats.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#35vignesh C
vignesh21@gmail.com
In reply to: Bharath Rupireddy (#34)
Re: Replication slot stats misgivings

On Mon, Apr 5, 2021 at 12:44 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Sat, Apr 3, 2021 at 11:12 PM vignesh C <vignesh21@gmail.com> wrote:

Here pg_stat_replication_slots will not have enought slots. I changed
it to below:
errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots"
Thoughts?

WFM.

3) Should we change the if condition to max_replication_slots <=
nReplSlotStats instead of max_replication_slots == nReplSlotStats? In
the scenario, it is mentioned that "one of the replication slots is
dropped", will this issue occur when multiple replication slots are
dropped?

I felt it should be max_replication_slots == nReplSlotStats, if
max_replication_slots = 5, we will be able to store 5 replication slot
statistics from 0,1..4, from 5th we will not have space. I think this
need not be changed.

I'm not sure whether we can have a situation where
max_replication_slots < nReplSlotStats i.e. max_replication_slots
getting set to lesser than nReplSlotStats. I think I didn't get the
above mentioned scenario i.e. max_replication_slots == nReplSlotStats
correctly. It will be great if you could throw some light on that
scenario and ensure that it's not possible to reach a situation where
max_replication_slots < nReplSlotStats.

Usually this will not happen, there is a remote chance that will
happen in the below scenario:
When a replication slot is created, a slot is created in
pg_stat_replication_slot also by the statistics collector, this cannot
exceed max_replication_slots. Whenever a slot is dropped, the
corresponding slot will be deleted from pg_stat_replication_slot. The
statistics collector uses UDP protocol for communication hence there
is no guarantee that the message is received by the statistic
collector process. After the user has dropped the replication slot, if
the server is stopped(here statistic collector process has not yet
received the drop replication slot statistic). Then the user reduces
the max_repllication_slot and starts the server. In this scenario the
statistic collector process will have more replication slot
statistics(as the drop was not received) than max_replication_slot.

Regards,
Vignesh

#36vignesh C
vignesh21@gmail.com
In reply to: vignesh C (#32)
3 attachment(s)
Re: Replication slot stats misgivings

On Sat, Apr 3, 2021 at 11:07 PM vignesh C <vignesh21@gmail.com> wrote:

On Fri, Apr 2, 2021 at 9:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 2, 2021 at 1:55 AM vignesh C <vignesh21@gmail.com> wrote:

On Thu, Apr 1, 2021 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 1, 2021 at 3:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Mar 31, 2021 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Mar 30, 2021 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-30 10:13:29 +0530, vignesh C wrote:

On Tue, Mar 30, 2021 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:

Any chance you could write a tap test exercising a few of these cases?

I can try to write a patch for this if nobody objects.

Cool!

Attached a patch which has the test for the first scenario.

E.g. things like:

- create a few slots, drop one of them, shut down, start up, verify
stats are still sane
- create a few slots, shut down, manually remove a slot, lower
max_replication_slots, start up

Here by "manually remove a slot", do you mean to remove the slot
manually from the pg_replslot folder?

Yep - thereby allowing max_replication_slots after the shutdown/start to
be lower than the number of slots-stats objects.

I have not included the 2nd test in the patch as the test fails with
following warnings and also displays the statistics of the removed
slot:
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438
WARNING: problem in alloc set Statistics snapshot: detected write
past chunk end in block 0x55d038b8e410, chunk 0x55d038b8e438

This happens because the statistics file has an additional slot
present even though the replication slot was removed. I felt this
issue should be fixed. I will try to fix this issue and send the
second test along with the fix.

I felt from the statistics collector process, there is no way in which
we can identify if the replication slot is present or not because the
statistic collector process does not have access to shared memory.
Anything that the statistic collector process does independently by
traversing and removing the statistics of the replication slot
exceeding the max_replication_slot has its drawback of removing some
valid replication slot's statistics data.
Any thoughts on how we can identify the replication slot which has been dropped?
Can someone point me to the shared stats patch link with which message
loss can be avoided. I wanted to see a scenario where something like
the slot is dropped but the statistics are not updated because of an
immediate shutdown or server going down abruptly can occur or not with
the shared stats patch.

I don't think it is easy to simulate a scenario where the 'drop'
message is dropped and I think that is why the test contains the step
to manually remove the slot. At this stage, you can probably provide a
test patch and a code-fix patch where it just drops the extra slots
from the stats file. That will allow us to test it with a shared
memory stats patch on which Andres and Horiguchi-San are working. If
we still continue to pursue with current approach then as Andres
suggested we might send additional information from
RestoreSlotFromDisk to keep it in sync.

Thanks for your comments, Attached patch has the fix for the same.
Also attached a couple of more patches which addresses the comments
which Andres had listed i.e changing char to NameData type and also to
display the unspilled/unstreamed transaction information in the
replication statistics.
Thoughts?

Thank you for the patches!

I've looked at those patches and here are some comments on 0001, 0002,
and 0003 patch:

Thanks for the comments.

0001 patch:

-       values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+       values[0] = PointerGetDatum(cstring_to_text(s->slotname.data));

We can use NameGetDatum() instead.

I felt we will not be able to use NameGetDatum because this function
will not have access to the value throughout the loop and NameGetDatum
must ensure the pointed-to value has adequate lifetime.

---
0002 patch:

The patch uses logical replication to test replication slots
statistics but I think it's necessarily necessary. It would be more
simple to use logical decoding. Maybe we can add TAP tests to
contrib/test_decoding.

I will try to change it to test_decoding if feasible and post in the
next version.

I have modified the patch to include tap tests in contrib/test_decoding.
Attached v3 patch has the changes for the same.
Thoughts?

Regards,
Vignesh

Attachments:

v3-0001-Added-total-txns-and-total-txn-bytes-to-replicati.patchapplication/x-patch; name=v3-0001-Added-total-txns-and-total-txn-bytes-to-replicati.patchDownload
From ac894e550f624f21f1ac08cc48d1dbad767e9a5b Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 19:01:03 +0530
Subject: [PATCH v3 1/3] Added total txns and total txn bytes to replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
This also includes changing char datatype to NameData datatype for slotname.
---
 doc/src/sgml/monitoring.sgml                  | 24 ++++++++++++
 src/backend/catalog/system_views.sql          |  2 +
 src/backend/postmaster/pgstat.c               | 38 ++++++++++---------
 src/backend/replication/logical/logical.c     | 30 ++++++++++-----
 .../replication/logical/reorderbuffer.c       |  5 +++
 src/backend/replication/slot.c                |  7 +++-
 src/backend/utils/adt/pgstatfuncs.c           | 10 +++--
 src/include/catalog/pg_proc.dat               |  6 +--
 src/include/pgstat.h                          | 15 ++++----
 src/include/replication/reorderbuffer.h       |  4 ++
 src/test/regress/expected/rules.out           |  4 +-
 11 files changed, 103 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 56018745c8..25975d2228 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2689,6 +2689,30 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This and total_txns
+        for this slot can be used to gauge the total amount of data during
+        logical decoding.
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..e3991eb6f6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -874,6 +874,8 @@ CREATE VIEW pg_stat_replication_slots AS
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
             s.stats_reset
     FROM pg_stat_get_replication_slots() AS s;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 4c4b072068..1d8626d17c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -62,6 +62,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1525,7 +1526,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1743,10 +1744,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-					   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-					   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-					   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1754,14 +1752,16 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1777,7 +1777,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -5049,7 +5049,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5347,7 +5347,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -5379,6 +5379,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		replSlotStats[idx].stream_txns += msg->m_stream_txns;
 		replSlotStats[idx].stream_count += msg->m_stream_count;
 		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		replSlotStats[idx].total_txns += msg->m_total_txns;
+		replSlotStats[idx].total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5572,7 +5574,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -5585,7 +5587,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
@@ -5606,6 +5608,8 @@ pgstat_reset_replslot(int i, TimestampTz ts)
 	replSlotStats[i].stream_txns = 0;
 	replSlotStats[i].stream_count = 0;
 	replSlotStats[i].stream_bytes = 0;
+	replSlotStats[i].total_txns = 0;
+	replSlotStats[i].total_bytes = 0;
 	replSlotStats[i].stat_reset_timestamp = ts;
 }
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 2f6803637b..d0ad694477 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1763,30 +1763,42 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
+	PgStat_ReplSlotStats repSlotStat;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
-
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
-						   rb->spillTxns, rb->spillCount, rb->spillBytes,
-						   rb->streamTxns, rb->streamCount, rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
+
+	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	repSlotStat.spill_txns = rb->spillTxns;
+	repSlotStat.spill_count = rb->spillCount;
+	repSlotStat.spill_bytes = rb->spillBytes;
+	repSlotStat.stream_txns = rb->streamTxns;
+	repSlotStat.stream_count = rb->streamCount;
+	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
+
+	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..5bc0b05a0e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb, TransactionId xid, bool create,
 			dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
 			AssertTXNLsnOrder(rb);
 		}
+
+		rb->totalTxns++;
 	}
 	else
 		txn = NULL;				/* not found and not asked to create */
@@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 	{
 		txn->size += sz;
 		rb->size += sz;
+		rb->totalBytes += sz;
 
 		/* Update the total size in the top transaction. */
 		if (toptxn)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 75a087c2f9..f61b163f78 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,7 +328,12 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0);
+	{
+		PgStat_ReplSlotStats repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+		pgstat_report_replslot(&repSlotStat);
+	}
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9ffbca685c..54eb3d1538 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2279,7 +2279,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -2323,18 +2323,20 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));
 
-		values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+		values[0] = CStringGetTextDatum(NameStr(s->slotname));
 		values[1] = Int64GetDatum(s->spill_txns);
 		values[2] = Int64GetDatum(s->spill_count);
 		values[3] = Int64GetDatum(s->spill_bytes);
 		values[4] = Int64GetDatum(s->stream_txns);
 		values[5] = Int64GetDatum(s->stream_count);
 		values[6] = Int64GetDatum(s->stream_bytes);
+		values[7] = Int64GetDatum(s->total_txns);
+		values[8] = Int64GetDatum(s->total_bytes);
 
 		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
+			nulls[9] = true;
 		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 69ffd0c3f4..8b2071c77b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5296,9 +5296,9 @@
   proname => 'pg_stat_get_replication_slots', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slots' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 7cd137506e..63b740ab40 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -378,7 +378,7 @@ typedef struct PgStat_MsgResetslrucounter
 typedef struct PgStat_MsgResetreplslotcounter
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData      m_slotname;
 	bool		clearall;
 } PgStat_MsgResetreplslotcounter;
 
@@ -506,7 +506,7 @@ typedef struct PgStat_MsgSLRU
 typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData	m_slotname;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -514,6 +514,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 
@@ -871,13 +873,15 @@ typedef struct PgStat_SLRUStats
  */
 typedef struct PgStat_ReplSlotStats
 {
-	char		slotname[NAMEDATALEN];
+	NameData	slotname;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;
 
@@ -980,10 +984,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-								   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-								   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-								   PgStat_Counter streambytes);
+extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b59a7b4a5..f0eea8b18f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2056,8 +2056,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
+    s.total_txns,
+    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v3-0002-Added-tests-for-verification-of-logical-replicati.patchapplication/x-patch; name=v3-0002-Added-tests-for-verification-of-logical-replicati.patchDownload
From d4be3220eaba71b8c6916a85314c43682bea9a33 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 18:17:21 +0530
Subject: [PATCH v3 2/3] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 contrib/test_decoding/Makefile            |   2 +
 contrib/test_decoding/t/001_repl_stats.pl | 112 ++++++++++++++++++++++
 2 files changed, 114 insertions(+)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce5cc..9a31e0b879 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000000..4f50804bf4
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,112 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+$node->safe_psql('postgres', q(CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_slot_name text) RETURNS void AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  txn_count bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 5 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT (total_txns > 0) INTO txn_count
+    FROM pg_stat_replication_slots WHERE slot_name=check_slot_name;
+
+    exit WHEN txn_count;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+    -- reset stats snapshot so we can test again
+    perform pg_stat_clear_snapshot();
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_decode_stats delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+));
+
+# Create replication slots.
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot3', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot4', 'test_decoding')");
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+$node->safe_psql('postgres',
+        "SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+# Wait for the statistics to be updated.
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats(true, 'regression_slot1')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats(true, 'regression_slot2')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats(true, 'regression_slot3')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats(true, 'regression_slot4')");
+
+# Test to verify replication statistics data is updated in
+# pg_stat_replication_slots statistics view.
+my $result = $node->safe_psql('postgres',
+		"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+			FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t
+regression_slot4|t|t), 'check replication statistics are updated');
+
+# Test to drop one of the subscribers and verify replication statistics data is
+# fine after publisher is restarted.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node->safe_psql('postgres',
+		"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+			FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
-- 
2.25.1

v3-0003-Handle-overwriting-of-replication-slot-statistic-.patchapplication/x-patch; name=v3-0003-Handle-overwriting-of-replication-slot-statistic-.patchDownload
From 9c39345b701bf8794bb5ef8fa4f1de2c33f8700a Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 19:09:25 +0530
Subject: [PATCH v3 3/3] Handle overwriting of replication slot statistic
 issue.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the publisher is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, fixed it by skipping the
replication slot statistic that are after max_replication_slot.
---
 contrib/test_decoding/t/001_repl_stats.pl | 24 +++++++++++++++++++++--
 src/backend/postmaster/pgstat.c           | 20 +++++++++++++++++++
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 4f50804bf4..53283f55bd 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -5,7 +5,7 @@ use warnings;
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Test::More tests => 3;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -101,12 +101,32 @@ is($result, qq(regression_slot1|t|t
 regression_slot2|t|t
 regression_slot3|t|t), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust max_replication_slots
+# accordingly to the number of slots and verify replication statistics data is
+# fine after publisher is restarted.
+$node->stop;
+my $publisher_data = $node->data_dir;
+my $slot3_replslotdir = "$publisher_data/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node->safe_psql('postgres',
+		"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+			FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t), 'check replication statistics are updated');
+
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 1d8626d17c..ede97362b7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4078,6 +4078,26 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
+
+				/*
+				 * There is a remote scenario where one of the replication slots
+				 * is dropped and the drop slot statistics message is not
+				 * received by the statistic collector process, now if the
+				 * max_replication_slots is reduced to the actual number of
+				 * replication slots that are in use and the publisher is
+				 * re-started then the statistics process will not be aware of
+				 * this. To avoid writing beyond the max_replication_slots
+				 * this replication slot statistic information will be skipped.
+				 */
+				if (max_replication_slots == nReplSlotStats)
+				{
+					ereport(pgStatRunningInCollector ? LOG : WARNING,
+							(errmsg("skipping \"%s\" replication slot statistics as pg_stat_replication_slots does not have enough slots",
+									NameStr(replSlotStats[nReplSlotStats].slotname))));
+					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
+					goto done;
+				}
+
 				nReplSlotStats++;
 				break;
 
-- 
2.25.1

#37Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#10)
Re: Replication slot stats misgivings

On Mon, Mar 22, 2021 at 9:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:

- PgStat_ReplSlotStats etc use slotname[NAMEDATALEN]. Why not just NameData?

That's because we followed other definitions in pgstat.h that use
char[NAMEDATALEN]. I'm okay with using NameData.

I see that at many places in code we use char[NAMEDATALEN] for names.
However, for slotname, we use NameData, see:
typedef struct ReplicationSlotPersistentData
{
/* The slot's identifier */
NameData name;

So, it will be better to use the same for pgstat purposes as well. In
other words, I also agree with this decision and I see that Vignesh
has already used NameData for slot_name in his recent patch.

--
With Regards,
Amit Kapila.

#38Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#36)
Re: Replication slot stats misgivings

On Mon, Apr 5, 2021 at 8:51 PM vignesh C <vignesh21@gmail.com> wrote:

Few comments on the latest patches:
Comments on 0001
--------------------------------
1.
@@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb,
TransactionId xid, bool create,
dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
AssertTXNLsnOrder(rb);
}
+
+ rb->totalTxns++;
}
else
txn = NULL; /* not found and not asked to create */
@@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
{
txn->size += sz;
rb->size += sz;
+ rb->totalBytes += sz;

I think this will include the txns that are aborted and for which we
don't send anything. It might be better to update these stats in
ReorderBufferProcessTXN or ReorderBufferReplay where we are sure we
have sent the data. We can probably use size/total_size in txn. We
need to be careful to not double include the totaltxn or totalBytes
for streaming xacts as we might process the same txn multiple times.

2.
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This and total_txns
+        for this slot can be used to gauge the total amount of data during
+        logical decoding.

I think we can slightly modify the second line here: "This can be used
to gauge the total amount of data sent during logical decoding.". Why
we need to include total_txns along with it.

0002
----------
3.
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 5 LOOP
+
...
...
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');

I think this loop needs to be executed 300 times instead of 5 times,
if the above comments and code needs to do what is expected here?

4.
+# Test to drop one of the subscribers and verify replication statistics data is
+# fine after publisher is restarted.
+$node->safe_psql('postgres', "SELECT
pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node->safe_psql('postgres',
+ "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+ FROM pg_stat_replication_slots ORDER BY slot_name"

Various comments in the 0002 refer to publisher/subscriber which is
not what we are using here.

5.
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");

I think we can save the above calls to pg_logical_slot_get_changes if
we create table before creating the slots in this test.

0003
---------
6. In the tests/code, publisher is used at multiple places. I think
that is not required because this can happen via plugin as well.
7.
+ if (max_replication_slots == nReplSlotStats)
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots",
+ NameStr(replSlotStats[nReplSlotStats].slotname))));
+ memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));

Do we need memset here? Isn't this location is past the max location?

--
With Regards,
Amit Kapila.

#39vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#38)
Re: Replication slot stats misgivings

On Tue, Apr 6, 2021 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 5, 2021 at 8:51 PM vignesh C <vignesh21@gmail.com> wrote:

Few comments on the latest patches:
Comments on 0001
--------------------------------
1.
@@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb,
TransactionId xid, bool create,
dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
AssertTXNLsnOrder(rb);
}
+
+ rb->totalTxns++;
}
else
txn = NULL; /* not found and not asked to create */
@@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
{
txn->size += sz;
rb->size += sz;
+ rb->totalBytes += sz;

I think this will include the txns that are aborted and for which we
don't send anything. It might be better to update these stats in
ReorderBufferProcessTXN or ReorderBufferReplay where we are sure we
have sent the data. We can probably use size/total_size in txn. We
need to be careful to not double include the totaltxn or totalBytes
for streaming xacts as we might process the same txn multiple times.

2.
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This and total_txns
+        for this slot can be used to gauge the total amount of data during
+        logical decoding.

I think we can slightly modify the second line here: "This can be used
to gauge the total amount of data sent during logical decoding.". Why
we need to include total_txns along with it.

0002
----------
3.
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 5 LOOP
+
...
...
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');

I think this loop needs to be executed 300 times instead of 5 times,
if the above comments and code needs to do what is expected here?

4.
+# Test to drop one of the subscribers and verify replication statistics data is
+# fine after publisher is restarted.
+$node->safe_psql('postgres', "SELECT
pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node->safe_psql('postgres',
+ "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+ FROM pg_stat_replication_slots ORDER BY slot_name"

Various comments in the 0002 refer to publisher/subscriber which is
not what we are using here.

5.
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");

I think we can save the above calls to pg_logical_slot_get_changes if
we create table before creating the slots in this test.

0003
---------
6. In the tests/code, publisher is used at multiple places. I think
that is not required because this can happen via plugin as well.
7.
+ if (max_replication_slots == nReplSlotStats)
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots",
+ NameStr(replSlotStats[nReplSlotStats].slotname))));
+ memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));

Do we need memset here? Isn't this location is past the max location?

Thanks for the comments, I will fix and post a patch for this soon.

Regards,
Vignesh

#40vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#38)
3 attachment(s)
Re: Replication slot stats misgivings

On Tue, Apr 6, 2021 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 5, 2021 at 8:51 PM vignesh C <vignesh21@gmail.com> wrote:

Few comments on the latest patches:
Comments on 0001
--------------------------------
1.
@@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb,
TransactionId xid, bool create,
dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
AssertTXNLsnOrder(rb);
}
+
+ rb->totalTxns++;
}
else
txn = NULL; /* not found and not asked to create */
@@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
{
txn->size += sz;
rb->size += sz;
+ rb->totalBytes += sz;

I think this will include the txns that are aborted and for which we
don't send anything. It might be better to update these stats in
ReorderBufferProcessTXN or ReorderBufferReplay where we are sure we
have sent the data. We can probably use size/total_size in txn. We
need to be careful to not double include the totaltxn or totalBytes
for streaming xacts as we might process the same txn multiple times.

Modified it to update total_byte for spilled transactions and streamed
transactions where spill_bytes and stream_bytes are updated. For
non-stream/spilled transactions, total_bytes is updated in
ReorderBufferProcessTXN.

2.
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This and total_txns
+        for this slot can be used to gauge the total amount of data during
+        logical decoding.

I think we can slightly modify the second line here: "This can be used
to gauge the total amount of data sent during logical decoding.". Why
we need to include total_txns along with it.

Modified it.

0002
----------
3.
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 5 LOOP
+
...
...
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');

I think this loop needs to be executed 300 times instead of 5 times,
if the above comments and code needs to do what is expected here?

Modified it.

4.
+# Test to drop one of the subscribers and verify replication statistics data is
+# fine after publisher is restarted.
+$node->safe_psql('postgres', "SELECT
pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# publisher is restarted
+$result = $node->safe_psql('postgres',
+ "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+ FROM pg_stat_replication_slots ORDER BY slot_name"

Various comments in the 0002 refer to publisher/subscriber which is
not what we are using here.

Removed references to publisher/subscriber.

5.
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");

I think we can save the above calls to pg_logical_slot_get_changes if
we create table before creating the slots in this test.

Modified it.

0003
---------
6. In the tests/code, publisher is used at multiple places. I think
that is not required because this can happen via plugin as well.

Removed references to publisher.

7.
+ if (max_replication_slots == nReplSlotStats)
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots",
+ NameStr(replSlotStats[nReplSlotStats].slotname))));
+ memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));

Do we need memset here? Isn't this location is past the max location?

That is not required, I have modified it.
Attached v4 patch has the fixes for the same.

Regards,
Vignesh

Attachments:

v4-0001-Added-total-txns-and-total-txn-bytes-to-replicati.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Added-total-txns-and-total-txn-bytes-to-replicati.patchDownload
From f7ffa65330038e1e41b4385f05a8b20c9ada02e2 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 19:01:03 +0530
Subject: [PATCH v4 1/3] Added total txns and total txn bytes to replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
This also includes changing char datatype to NameData datatype for slotname.
---
 contrib/test_decoding/expected/stats.out      | 79 +++++++++++++------
 contrib/test_decoding/sql/stats.sql           | 48 +++++++----
 doc/src/sgml/monitoring.sgml                  | 23 ++++++
 src/backend/catalog/system_views.sql          |  2 +
 src/backend/postmaster/pgstat.c               | 38 +++++----
 src/backend/replication/logical/logical.c     | 30 ++++---
 .../replication/logical/reorderbuffer.c       | 18 +++++
 src/backend/replication/slot.c                |  7 +-
 src/backend/utils/adt/pgstatfuncs.c           | 10 ++-
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/pgstat.h                          | 15 ++--
 src/include/replication/reorderbuffer.h       |  4 +
 src/test/regress/expected/rules.out           |  4 +-
 13 files changed, 205 insertions(+), 79 deletions(-)

diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bca36fa903..bc8e601eab 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 
 CREATE TABLE stats_test(data text);
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -16,12 +16,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -51,16 +64,16 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -70,16 +83,16 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
-SELECT wait_for_decode_stats(true);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot |          0 |           0
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
@@ -89,16 +102,36 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
   5002
 (1 row)
 
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
+(1 row)
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+ pg_stat_reset_replication_slot 
+--------------------------------
+ 
+(1 row)
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | f          | f           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
@@ -117,7 +150,7 @@ SELECT slot_name FROM pg_stat_replication_slots;
 (1 row)
 
 COMMIT;
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 51294e48e8..8c34aeced1 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -6,7 +6,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 CREATE TABLE stats_test(data text);
 
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -14,12 +14,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -46,18 +59,25 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
-SELECT wait_for_decode_stats(true);
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(true, true);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
@@ -66,6 +86,6 @@ SELECT slot_name FROM pg_stat_replication_slots;
 SELECT slot_name FROM pg_stat_replication_slots;
 COMMIT;
 
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 56018745c8..a727f32fb2 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2689,6 +2689,29 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding.
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..e3991eb6f6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -874,6 +874,8 @@ CREATE VIEW pg_stat_replication_slots AS
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
             s.stats_reset
     FROM pg_stat_get_replication_slots() AS s;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 5ba776e789..89b9315af6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -62,6 +62,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1525,7 +1526,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1743,10 +1744,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-					   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-					   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-					   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1754,14 +1752,16 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1777,7 +1777,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -5049,7 +5049,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5347,7 +5347,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -5379,6 +5379,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		replSlotStats[idx].stream_txns += msg->m_stream_txns;
 		replSlotStats[idx].stream_count += msg->m_stream_count;
 		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		replSlotStats[idx].total_txns += msg->m_total_txns;
+		replSlotStats[idx].total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5572,7 +5574,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -5585,7 +5587,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
@@ -5606,6 +5608,8 @@ pgstat_reset_replslot(int i, TimestampTz ts)
 	replSlotStats[i].stream_txns = 0;
 	replSlotStats[i].stream_count = 0;
 	replSlotStats[i].stream_bytes = 0;
+	replSlotStats[i].total_txns = 0;
+	replSlotStats[i].total_bytes = 0;
 	replSlotStats[i].stat_reset_timestamp = ts;
 }
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 2f6803637b..d0ad694477 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1763,30 +1763,42 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
+	PgStat_ReplSlotStats repSlotStat;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
-
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
-						   rb->spillTxns, rb->spillCount, rb->spillBytes,
-						   rb->streamTxns, rb->streamCount, rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
+
+	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	repSlotStat.spill_txns = rb->spillTxns;
+	repSlotStat.spill_count = rb->spillCount;
+	repSlotStat.spill_bytes = rb->spillBytes;
+	repSlotStat.stream_txns = rb->streamTxns;
+	repSlotStat.stream_count = rb->streamCount;
+	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
+
+	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..7899060f02 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -659,6 +661,7 @@ ReorderBufferTXNByXid(ReorderBuffer *rb, TransactionId xid, bool create,
 			dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
 			AssertTXNLsnOrder(rb);
 		}
+
 	}
 	else
 		txn = NULL;				/* not found and not asked to create */
@@ -2051,6 +2054,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				rb->begin(rb, txn);
 		}
 
+		/*
+		 * Update total transaction count and total transaction bytes, if
+		 * transaction is streamed or spilled it will be updated while the
+		 * transaction gets spilled or streamed.
+		 */
+		if (!rb->streamBytes && !rb->spillBytes)
+		{
+			rb->totalTxns++;
+			rb->totalBytes += rb->size;
+		}
+
 		ReorderBufferIterTXNInit(rb, txn, &iterstate);
 		while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL)
 		{
@@ -3524,9 +3538,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	{
 		rb->spillCount += 1;
 		rb->spillBytes += size;
+		rb->totalBytes += size;
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+		rb->totalTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
 	}
 
 	Assert(spilled == txn->nentries_mem);
@@ -3892,9 +3908,11 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 	rb->streamCount += 1;
 	rb->streamBytes += stream_bytes;
+	rb->totalBytes += stream_bytes;
 
 	/* Don't consider already streamed transaction. */
 	rb->streamTxns += (txn_is_streamed) ? 0 : 1;
+	rb->totalTxns += (txn_is_streamed) ? 0 : 1;
 
 	Assert(dlist_is_empty(&txn->changes));
 	Assert(txn->nentries == 0);
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 75a087c2f9..f61b163f78 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,7 +328,12 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0);
+	{
+		PgStat_ReplSlotStats repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+		pgstat_report_replslot(&repSlotStat);
+	}
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9ffbca685c..54eb3d1538 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2279,7 +2279,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -2323,18 +2323,20 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));
 
-		values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+		values[0] = CStringGetTextDatum(NameStr(s->slotname));
 		values[1] = Int64GetDatum(s->spill_txns);
 		values[2] = Int64GetDatum(s->spill_count);
 		values[3] = Int64GetDatum(s->spill_bytes);
 		values[4] = Int64GetDatum(s->stream_txns);
 		values[5] = Int64GetDatum(s->stream_count);
 		values[6] = Int64GetDatum(s->stream_bytes);
+		values[7] = Int64GetDatum(s->total_txns);
+		values[8] = Int64GetDatum(s->total_bytes);
 
 		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
+			nulls[9] = true;
 		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4309fa40dd..95be4db2b2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5311,9 +5311,9 @@
   proname => 'pg_stat_get_replication_slots', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slots' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 7cd137506e..63b740ab40 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -378,7 +378,7 @@ typedef struct PgStat_MsgResetslrucounter
 typedef struct PgStat_MsgResetreplslotcounter
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData      m_slotname;
 	bool		clearall;
 } PgStat_MsgResetreplslotcounter;
 
@@ -506,7 +506,7 @@ typedef struct PgStat_MsgSLRU
 typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData	m_slotname;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -514,6 +514,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 
@@ -871,13 +873,15 @@ typedef struct PgStat_SLRUStats
  */
 typedef struct PgStat_ReplSlotStats
 {
-	char		slotname[NAMEDATALEN];
+	NameData	slotname;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;
 
@@ -980,10 +984,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-								   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-								   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-								   PgStat_Counter streambytes);
+extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b59a7b4a5..f0eea8b18f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2056,8 +2056,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
+    s.total_txns,
+    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v4-0002-Added-tests-for-verification-of-logical-replicati.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Added-tests-for-verification-of-logical-replicati.patchDownload
From a616512f2d30d9211f21f5405485a6848d529381 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 18:17:21 +0530
Subject: [PATCH v4 2/3] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 contrib/test_decoding/Makefile            |   2 +
 contrib/test_decoding/t/001_repl_stats.pl | 113 ++++++++++++++++++++++
 2 files changed, 115 insertions(+)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce5cc..9a31e0b879 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000000..11ffe0ca3e
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,113 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+$node->safe_psql('postgres', q(CREATE FUNCTION wait_for_decode_stats(check_slot_name text) RETURNS void AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  txn_count bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT (total_txns > 0) INTO txn_count
+    FROM pg_stat_replication_slots WHERE slot_name=check_slot_name;
+
+    exit WHEN txn_count;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+    -- reset stats snapshot so we can test again
+    perform pg_stat_clear_snapshot();
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_decode_stats delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+));
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+
+# Create replication slots.
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot3', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot4', 'test_decoding')");
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+
+# Wait for the statistics to be updated.
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot1')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot2')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot3')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot4')");
+
+# Test to verify replication statistics data is updated in
+# pg_stat_replication_slots statistics view.
+my $result = $node->safe_psql('postgres',
+		"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+			FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t
+regression_slot4|t|t), 'check replication statistics are updated');
+
+# Test to drop one of the replication slot and verify replication statistics data is
+# fine after restart.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+		"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+			FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats(TEXT)");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
-- 
2.25.1

v4-0003-Handle-overwriting-of-replication-slot-statistic-.patchtext/x-patch; charset=US-ASCII; name=v4-0003-Handle-overwriting-of-replication-slot-statistic-.patchDownload
From d3d93d7a604102d77b4e220a6453d39686eb271d Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Wed, 7 Apr 2021 13:07:26 +0530
Subject: [PATCH v4 3/3] Handle overwriting of replication slot statistic
 issue.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, fixed it by skipping the
replication slot statistic that are after max_replication_slot.
---
 contrib/test_decoding/t/001_repl_stats.pl | 24 +++++++++++++++++++++--
 src/backend/postmaster/pgstat.c           | 18 +++++++++++++++++
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 11ffe0ca3e..4a1b113c41 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -5,7 +5,7 @@ use warnings;
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Test::More tests => 3;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -102,12 +102,32 @@ is($result, qq(regression_slot1|t|t
 regression_slot2|t|t
 regression_slot3|t|t), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust max_replication_slots
+# accordingly to the number of slots and verify replication statistics data is
+# fine after restart.
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+               "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
+                       FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t), 'check replication statistics are updated');
+
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats(TEXT)");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 89b9315af6..c292d4ab94 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4069,6 +4069,24 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				 * slot follows.
 				 */
 			case 'R':
+				/*
+				 * There is a remote scenario where one of the replication slots
+				 * is dropped and the drop slot statistics message is not
+				 * received by the statistic collector process, now if the
+				 * max_replication_slots is reduced to the actual number of
+				 * replication slots that are in use and the server is
+				 * re-started then the statistics process will not be aware of
+				 * this. To avoid writing beyond the max_replication_slots
+				 * this replication slot statistic information will be skipped.
+				 */
+				if (max_replication_slots == nReplSlotStats)
+				{
+					ereport(pgStatRunningInCollector ? LOG : WARNING,
+							(errmsg("skipping \"%s\" replication slot statistics as pg_stat_replication_slots does not have enough slots",
+									NameStr(replSlotStats[nReplSlotStats].slotname))));
+					goto done;
+				}
+
 				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
 					!= sizeof(PgStat_ReplSlotStats))
 				{
-- 
2.25.1

#41Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#40)
Re: Replication slot stats misgivings

On Wed, Apr 7, 2021 at 2:51 PM vignesh C <vignesh21@gmail.com> wrote:

@@ -4069,6 +4069,24 @@ pgstat_read_statsfiles(Oid onlydb, bool
permanent, bool deep)
  * slot follows.
  */
  case 'R':
+ /*
+ * There is a remote scenario where one of the replication slots
+ * is dropped and the drop slot statistics message is not
+ * received by the statistic collector process, now if the
+ * max_replication_slots is reduced to the actual number of
+ * replication slots that are in use and the server is
+ * re-started then the statistics process will not be aware of
+ * this. To avoid writing beyond the max_replication_slots
+ * this replication slot statistic information will be skipped.
+ */
+ if (max_replication_slots == nReplSlotStats)
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots",
+ NameStr(replSlotStats[nReplSlotStats].slotname))));
+ goto done;
+ }

I think we might truncate some valid slots here. I have another idea
to fix this case which is that while writing, we first write the
'nReplSlotStats' and then write each slot info. Then while reading we
can allocate memory based on the required number of slots. Later when
startup process sends the slots, we can remove the already dropped
slots from this array. What do you think?

--
With Regards,
Amit Kapila.

#42vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#41)
Re: Replication slot stats misgivings

On Thu, Apr 8, 2021 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 7, 2021 at 2:51 PM vignesh C <vignesh21@gmail.com> wrote:

@@ -4069,6 +4069,24 @@ pgstat_read_statsfiles(Oid onlydb, bool
permanent, bool deep)
* slot follows.
*/
case 'R':
+ /*
+ * There is a remote scenario where one of the replication slots
+ * is dropped and the drop slot statistics message is not
+ * received by the statistic collector process, now if the
+ * max_replication_slots is reduced to the actual number of
+ * replication slots that are in use and the server is
+ * re-started then the statistics process will not be aware of
+ * this. To avoid writing beyond the max_replication_slots
+ * this replication slot statistic information will be skipped.
+ */
+ if (max_replication_slots == nReplSlotStats)
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots",
+ NameStr(replSlotStats[nReplSlotStats].slotname))));
+ goto done;
+ }

I think we might truncate some valid slots here. I have another idea
to fix this case which is that while writing, we first write the
'nReplSlotStats' and then write each slot info. Then while reading we
can allocate memory based on the required number of slots. Later when
startup process sends the slots, we can remove the already dropped
slots from this array. What do you think?

I felt this idea is better, the reason being in the earlier idea we
might end up deleting some valid replication slot statistics and that
slot's statistics will never be available to the user.

Regards,
Vignesh

#43Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#40)
Re: Replication slot stats misgivings

On Wed, Apr 7, 2021 at 2:51 PM vignesh C <vignesh21@gmail.com> wrote:

That is not required, I have modified it.
Attached v4 patch has the fixes for the same.

Few comments:

0001
------
1. The first patch includes changing char datatype to NameData
datatype for slotname. I feel this can be a separate patch from adding
new stats in the view. I think we can also move the change related to
moving stats to a structure rather than sending them individually in
the same patch.

2.
@@ -2051,6 +2054,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
rb->begin(rb, txn);
}

+ /*
+ * Update total transaction count and total transaction bytes, if
+ * transaction is streamed or spilled it will be updated while the
+ * transaction gets spilled or streamed.
+ */
+ if (!rb->streamBytes && !rb->spillBytes)
+ {
+ rb->totalTxns++;
+ rb->totalBytes += rb->size;
+ }

I think this will skip a transaction if it is interleaved between a
streaming transaction. Assume, two transactions t1 and t2. t1 sends
changes in multiple streams and t2 sends all changes in one go at
commit time. So, now, if t2 is interleaved between multiple streams
then I think the above won't count t2.

3.
@@ -3524,9 +3538,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
{
rb->spillCount += 1;
rb->spillBytes += size;
+ rb->totalBytes += size;

/* don't consider already serialized transactions */
rb->spillTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+ rb->totalTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
}

We do serialize each subtransaction separately. So totalTxns will
include subtransaction count as well when serialized, otherwise not.
The description of totalTxns also says that it doesn't include
subtransactions. So, I think updating rb->totalTxns here is wrong.

0002
-----
1.
+$node->safe_psql('postgres',
+ "SELECT data FROM pg_logical_slot_get_changes('regression_slot2',
NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");

The indentation of the second SELECT seems to bit off.

--
With Regards,
Amit Kapila.

#44Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#41)
1 attachment(s)
Re: Replication slot stats misgivings

On Thu, Apr 8, 2021 at 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 7, 2021 at 2:51 PM vignesh C <vignesh21@gmail.com> wrote:

@@ -4069,6 +4069,24 @@ pgstat_read_statsfiles(Oid onlydb, bool
permanent, bool deep)
* slot follows.
*/
case 'R':
+ /*
+ * There is a remote scenario where one of the replication slots
+ * is dropped and the drop slot statistics message is not
+ * received by the statistic collector process, now if the
+ * max_replication_slots is reduced to the actual number of
+ * replication slots that are in use and the server is
+ * re-started then the statistics process will not be aware of
+ * this. To avoid writing beyond the max_replication_slots
+ * this replication slot statistic information will be skipped.
+ */
+ if (max_replication_slots == nReplSlotStats)
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("skipping \"%s\" replication slot statistics as
pg_stat_replication_slots does not have enough slots",
+ NameStr(replSlotStats[nReplSlotStats].slotname))));
+ goto done;
+ }

I think we might truncate some valid slots here. I have another idea
to fix this case which is that while writing, we first write the
'nReplSlotStats' and then write each slot info. Then while reading we
can allocate memory based on the required number of slots. Later when
startup process sends the slots, we can remove the already dropped
slots from this array. What do you think?

IIUC there are two problems in the case where the drop message is lost:

1. Writing beyond the end of replSlotStats.
This can happen if after restarting the number of slots whose stats
are stored in the stats file exceeds max_replication_slots. Vignesh's
patch addresses this problem.

2. The stats for the new slot are not recorded.
If the stats for already-dropped slots remain in replSlotStats, the
stats for the new slot cannot be registered due to the full of
replSlotStats. This can happen even when after restarting the number
of slots whose stats are stored in the stat file does NOT exceed
max_replication_slots as well as even during the server running. The
patch doesn’t address this problem. (If this happens, we will have to
reset all slot stats since pg_stat_reset_replication_slot() cannot
remove the slot stats with the non-existing name).

I think we can use HTAB to store slot stats and have
pg_stat_get_replication_slot() inquire about stats by the slot name,
resolving both problems. By using HTAB we're no longer concerned about
the problem of writing stats beyond the end of the replSlotStats
array. Instead, we have to consider how and when to clean up the stats
for already-dropped slots. We can have the startup process send slot
names at startup time, which borrows the idea proposed by Amit. But
maybe we need to consider the case again where the message from the
startup process is lost? Another idea would be to have
pgstat_vacuum_stat() check the existing slots and call
pgstat_report_replslot_drop() if the slot in the stats file doesn't
exist. That way, we can continuously check the stats for
already-dropped slots.

I've written a PoC patch for the above idea; using HTAB and cleaning
up slot stats at pgstat_vacuum_stat(). The patch can be applied on top
of 0001 patch Vignesh proposed before[1]/messages/by-id/CALDaNm195xL1bZq4VHKt=-wmXJ5kC4jxKh7LXK+pN7ESFjHO+w@mail.gmail.com.

Please note that this cannot resolve the problem of ending up
accumulating the stats to the old slot if the slot is re-created with
the same name and the drop message is lost. To deal with this problem
I think we would need to use something unique identifier for each slot
instead of slot name.

[1]: /messages/by-id/CALDaNm195xL1bZq4VHKt=-wmXJ5kC4jxKh7LXK+pN7ESFjHO+w@mail.gmail.com

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

0001-POC-Use-HTAB-for-replication-slot-stats.patchapplication/octet-stream; name=0001-POC-Use-HTAB-for-replication-slot-stats.patchDownload
From 8184f054562f747387beb62f11b7508c4d5c80b3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 9 Apr 2021 21:46:36 +0900
Subject: [PATCH] POC: Use HTAB for replication slot stats.

---
 src/backend/catalog/system_views.sql      |  30 +--
 src/backend/postmaster/pgstat.c           | 273 +++++++++++-----------
 src/backend/replication/logical/logical.c |  16 +-
 src/backend/replication/slot.c            |  17 +-
 src/backend/utils/adt/pgstatfuncs.c       | 134 ++++++-----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |  12 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   8 +-
 9 files changed, 267 insertions(+), 239 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6d78b33590..e5591794c2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,20 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.total_txns,
-            s.total_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -984,6 +970,22 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.total_txns,
+            s.total_bytes,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e5b1fb045e..46eb4507c1 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_ReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_ReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Check for all replication slots in stats hash table if they still exist.
+	 *
+	 * XXX: maybe we can skip this check until the number of entries in the
+	 * replication stats hash exceeds max_replication_slots.
+	 */
+	if (replSlotStats)
+	{
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1823,14 +1817,14 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
 	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	msg.m_spill_txns = repSlotStat->spill_txns;
 	msg.m_spill_count = repSlotStat->spill_count;;
 	msg.m_spill_bytes = repSlotStat->spill_bytes;
 	msg.m_stream_txns = repSlotStat->stream_txns;
 	msg.m_stream_count = repSlotStat->stream_count;
 	msg.m_stream_bytes = repSlotStat->stream_bytes;
-	msg.m_total_txns = repSlotStat->total_txns;
-	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -2872,17 +2866,19 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_ReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
+	PgStat_ReplSlotEntry *slotent = NULL;
+
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	slotent = pgstat_get_replslot_entry(slotname, false);
+
+	return slotent;
 }
 
 /*
@@ -3654,7 +3650,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3744,11 +3739,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_ReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3975,12 +3976,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4005,12 +4000,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4197,21 +4186,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+			{
+				PgStat_ReplSlotEntry slotstats;
+				PgStat_ReplSlotEntry *slotent;
+
+				if (fread(&slotstats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
-				nReplSlotStats++;
+
+				slotent = pgstat_get_replslot_entry(slotstats.slotname, true);
+				memcpy(slotent, &slotstats, sizeof(PgStat_ReplSlotEntry));
 				break;
+			}
 
 			case 'E':
 				goto done;
@@ -4424,7 +4419,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_ReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4553,12 +4548,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4765,7 +4760,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5189,20 +5183,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_ReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_ReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5210,11 +5210,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5532,46 +5532,29 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats, (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE, NULL);
 	}
 	else
 	{
+		PgStat_ReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
 		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
-		replSlotStats[idx].total_txns += msg->m_total_txns;
-		replSlotStats[idx].total_bytes += msg->m_total_bytes;
+		slotent->total_txns += msg->m_total_txns;
+		slotent->total_bytes += msg->m_total_bytes;
+		slotent->spill_txns += msg->m_spill_txns;
+		slotent->spill_count += msg->m_spill_count;
+		slotent->spill_bytes += msg->m_spill_bytes;
+		slotent->stream_txns += msg->m_stream_txns;
+		slotent->stream_count += msg->m_stream_count;
+		slotent->stream_bytes += msg->m_stream_bytes;
 	}
 }
 
@@ -5749,59 +5732,79 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_ReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_ReplSlotEntry *slotent;
+	bool	found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_ReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_ReplSlotEntry *) hash_search(replSlotStats,
+												   (void *) &name,
+												   create_it ? HASH_ENTER : HASH_FIND,
+												   &found);
+
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_ReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_ReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].total_txns = 0;
-	replSlotStats[i].total_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 20a99d3035..cc2a28a045 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_ReplSlotEntry repSlotStat;
 
 	/*
 	 * Nothing to do if we don't have any replication stats to be sent.
@@ -1783,32 +1783,32 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 
 	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes,
-		 (long long) rb->totalTxns,
-		 (long long) rb->totalBytes);
+		 (long long) rb->streamBytes);
 
 	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
 	repSlotStat.spill_txns = rb->spillTxns;
 	repSlotStat.spill_count = rb->spillCount;
 	repSlotStat.spill_bytes = rb->spillBytes;
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
-	repSlotStat.total_txns = rb->totalTxns;
-	repSlotStat.total_bytes = rb->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
-	rb->totalTxns = 0;
-	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..ea8e0305f8 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -329,8 +329,8 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	if (SlotIsLogical(slot))
 	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		PgStat_ReplSlotEntry repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotEntry));
 		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 		pgstat_report_replslot(&repSlotStat);
 	}
@@ -349,17 +349,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +370,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +417,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2680190a40..f92c50dd27 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,73 +2305,68 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text *slotname_text = PG_GETARG_TEXT_P(0);
+	NameData slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_ReplSlotEntry *slotent;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-	MemoryContextSwitchTo(oldcontext);
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "total_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
 
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
-	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
+	BlessTupleDesc(tupdesc);
 
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
 
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
-		values[7] = Int64GetDatum(s->total_txns);
-		values[8] = Int64GetDatum(s->total_bytes);
-
-		if (s->stat_reset_timestamp == 0)
-			nulls[9] = true;
-		else
-			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
+	if (!slotent)
+		PG_RETURN_NULL();
 
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
-	}
+	values[0] = CStringGetTextDatum(NameStr(slotent->slotname));
+	values[1] = Int64GetDatum(slotent->total_txns);
+	values[2] = Int64GetDatum(slotent->total_bytes);
+	values[3] = Int64GetDatum(slotent->spill_txns);
+	values[4] = Int64GetDatum(slotent->spill_count);
+	values[5] = Int64GetDatum(slotent->spill_bytes);
+	values[6] = Int64GetDatum(slotent->stream_txns);
+	values[7] = Int64GetDatum(slotent->stream_count);
+	values[8] = Int64GetDatum(slotent->stream_bytes);
 
-	tuplestore_donestoring(tupstore);
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[9] = true;
+	else
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
-	return (Datum) 0;
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 591753fe81..85720dcb0c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5311,14 +5311,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,total_txns,total_bytes,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2aeb3cded4..a6de486291 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -917,19 +917,19 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_ReplSlotEntry
 {
 	NameData	slotname;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
-	PgStat_Counter total_txns;
-	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_ReplSlotEntry;
 
 
 /*
@@ -1031,7 +1031,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1129,7 +1129,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_ReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6399f3feef..4380577b7d 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2062,16 +2062,18 @@ pg_stat_replication| SELECT s.pid,
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
+    s.total_txns,
+    s.total_bytes,
     s.spill_txns,
     s.spill_count,
     s.spill_bytes,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
-    s.total_txns,
-    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, total_txns, total_bytes, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.24.3 (Apple Git-128)

#45Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#43)
Re: Replication slot stats misgivings

On Fri, Apr 9, 2021 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

2.
@@ -2051,6 +2054,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
rb->begin(rb, txn);
}

+ /*
+ * Update total transaction count and total transaction bytes, if
+ * transaction is streamed or spilled it will be updated while the
+ * transaction gets spilled or streamed.
+ */
+ if (!rb->streamBytes && !rb->spillBytes)
+ {
+ rb->totalTxns++;
+ rb->totalBytes += rb->size;
+ }

I think this will skip a transaction if it is interleaved between a
streaming transaction. Assume, two transactions t1 and t2. t1 sends
changes in multiple streams and t2 sends all changes in one go at
commit time. So, now, if t2 is interleaved between multiple streams
then I think the above won't count t2.

3.
@@ -3524,9 +3538,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
{
rb->spillCount += 1;
rb->spillBytes += size;
+ rb->totalBytes += size;

/* don't consider already serialized transactions */
rb->spillTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+ rb->totalTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
}

We do serialize each subtransaction separately. So totalTxns will
include subtransaction count as well when serialized, otherwise not.
The description of totalTxns also says that it doesn't include
subtransactions. So, I think updating rb->totalTxns here is wrong.

The attached patch should fix the above two comments. I think it
should be sufficient if we just update the stats after processing the
TXN. We need to ensure that don't count streamed transactions multiple
times. I have not tested the attached, can you please review/test it
and include it in the next set of patches if you agree with this
change.

--
With Regards,
Amit Kapila.

#46Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#45)
1 attachment(s)
Re: Replication slot stats misgivings

On Sat, Apr 10, 2021 at 9:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 9, 2021 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

2.
@@ -2051,6 +2054,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
rb->begin(rb, txn);
}

+ /*
+ * Update total transaction count and total transaction bytes, if
+ * transaction is streamed or spilled it will be updated while the
+ * transaction gets spilled or streamed.
+ */
+ if (!rb->streamBytes && !rb->spillBytes)
+ {
+ rb->totalTxns++;
+ rb->totalBytes += rb->size;
+ }

I think this will skip a transaction if it is interleaved between a
streaming transaction. Assume, two transactions t1 and t2. t1 sends
changes in multiple streams and t2 sends all changes in one go at
commit time. So, now, if t2 is interleaved between multiple streams
then I think the above won't count t2.

3.
@@ -3524,9 +3538,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
{
rb->spillCount += 1;
rb->spillBytes += size;
+ rb->totalBytes += size;

/* don't consider already serialized transactions */
rb->spillTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+ rb->totalTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
}

We do serialize each subtransaction separately. So totalTxns will
include subtransaction count as well when serialized, otherwise not.
The description of totalTxns also says that it doesn't include
subtransactions. So, I think updating rb->totalTxns here is wrong.

The attached patch should fix the above two comments. I think it
should be sufficient if we just update the stats after processing the
TXN. We need to ensure that don't count streamed transactions multiple
times. I have not tested the attached, can you please review/test it
and include it in the next set of patches if you agree with this
change.

oops, forgot to attach. Attaching now.

--
With Regards,
Amit Kapila.

Attachments:

fix_stats_1.patchapplication/octet-stream; name=fix_stats_1.patchDownload
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7899060f02f..a692030d6ba 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2054,17 +2054,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				rb->begin(rb, txn);
 		}
 
-		/*
-		 * Update total transaction count and total transaction bytes, if
-		 * transaction is streamed or spilled it will be updated while the
-		 * transaction gets spilled or streamed.
-		 */
-		if (!rb->streamBytes && !rb->spillBytes)
-		{
-			rb->totalTxns++;
-			rb->totalBytes += rb->size;
-		}
-
 		ReorderBufferIterTXNInit(rb, txn, &iterstate);
 		while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL)
 		{
@@ -2373,6 +2362,15 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			specinsert = NULL;
 		}
 
+		/*
+		 * Update total transaction count and total transaction bytes
+		 * processed. Ensure to not count the streamed transaction multiple
+		 * times.
+		 */
+		rb->totalBytes += rb->size;
+		if (!rbtxn_is_streamed(txn))
+			rb->totalTxns++;
+
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
 		iterstate = NULL;
@@ -3538,11 +3536,9 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	{
 		rb->spillCount += 1;
 		rb->spillBytes += size;
-		rb->totalBytes += size;
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
-		rb->totalTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
 	}
 
 	Assert(spilled == txn->nentries_mem);
@@ -3908,11 +3904,9 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 	rb->streamCount += 1;
 	rb->streamBytes += stream_bytes;
-	rb->totalBytes += stream_bytes;
 
 	/* Don't consider already streamed transaction. */
 	rb->streamTxns += (txn_is_streamed) ? 0 : 1;
-	rb->totalTxns += (txn_is_streamed) ? 0 : 1;
 
 	Assert(dlist_is_empty(&txn->changes));
 	Assert(txn->nentries == 0);
#47vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#45)
5 attachment(s)
Re: Replication slot stats misgivings

On Sat, Apr 10, 2021 at 9:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 9, 2021 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

2.
@@ -2051,6 +2054,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
rb->begin(rb, txn);
}

+ /*
+ * Update total transaction count and total transaction bytes, if
+ * transaction is streamed or spilled it will be updated while the
+ * transaction gets spilled or streamed.
+ */
+ if (!rb->streamBytes && !rb->spillBytes)
+ {
+ rb->totalTxns++;
+ rb->totalBytes += rb->size;
+ }

I think this will skip a transaction if it is interleaved between a
streaming transaction. Assume, two transactions t1 and t2. t1 sends
changes in multiple streams and t2 sends all changes in one go at
commit time. So, now, if t2 is interleaved between multiple streams
then I think the above won't count t2.

3.
@@ -3524,9 +3538,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
{
rb->spillCount += 1;
rb->spillBytes += size;
+ rb->totalBytes += size;

/* don't consider already serialized transactions */
rb->spillTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+ rb->totalTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
}

We do serialize each subtransaction separately. So totalTxns will
include subtransaction count as well when serialized, otherwise not.
The description of totalTxns also says that it doesn't include
subtransactions. So, I think updating rb->totalTxns here is wrong.

The attached patch should fix the above two comments. I think it
should be sufficient if we just update the stats after processing the
TXN. We need to ensure that don't count streamed transactions multiple
times. I have not tested the attached, can you please review/test it
and include it in the next set of patches if you agree with this
change.

Thanks Amit for your Patch. I have merged your changes into my
patchset. I did not find any issues in my testing.
Thoughts?

Regards,
Vignesh

Attachments:

v5-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchDownload
From bb176d2399f4f550ceb0b733ee23d513cd835db9 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:14:52 +0530
Subject: [PATCH v5 1/5] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c           | 32 +++++++++++------------
 src/backend/replication/logical/logical.c | 13 ++++++---
 src/backend/replication/slot.c            |  7 ++++-
 src/backend/utils/adt/pgstatfuncs.c       |  2 +-
 src/include/pgstat.h                      | 15 ++++++-----
 5 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f4467625f7..1becff09d0 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1539,7 +1540,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1812,10 +1813,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-					   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-					   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-					   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1823,14 +1821,14 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1846,7 +1844,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -5202,7 +5200,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5538,7 +5536,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -5763,7 +5761,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -5776,7 +5774,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4f6e87f18d..68e210ce12 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,6 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
+	PgStat_ReplSlotStats repSlotStat;
 
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
@@ -1790,9 +1791,15 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 		 (long long) rb->streamCount,
 		 (long long) rb->streamBytes);
 
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
-						   rb->spillTxns, rb->spillCount, rb->spillBytes,
-						   rb->streamTxns, rb->streamCount, rb->streamBytes);
+	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	repSlotStat.spill_txns = rb->spillTxns;
+	repSlotStat.spill_count = rb->spillCount;
+	repSlotStat.spill_bytes = rb->spillBytes;
+	repSlotStat.stream_txns = rb->streamTxns;
+	repSlotStat.stream_count = rb->streamCount;
+	repSlotStat.stream_bytes = rb->streamBytes;
+
+	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 75a087c2f9..f61b163f78 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,7 +328,12 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0);
+	{
+		PgStat_ReplSlotStats repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+		pgstat_report_replslot(&repSlotStat);
+	}
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 182b15e3f2..521ba73614 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2328,7 +2328,7 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));
 
-		values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+		values[0] = CStringGetTextDatum(NameStr(s->slotname));
 		values[1] = Int64GetDatum(s->spill_txns);
 		values[2] = Int64GetDatum(s->spill_count);
 		values[3] = Int64GetDatum(s->spill_bytes);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9a87e7cd88..2aeb3cded4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -393,7 +393,7 @@ typedef struct PgStat_MsgResetslrucounter
 typedef struct PgStat_MsgResetreplslotcounter
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData      m_slotname;
 	bool		clearall;
 } PgStat_MsgResetreplslotcounter;
 
@@ -540,7 +540,7 @@ typedef struct PgStat_MsgSLRU
 typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData	m_slotname;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -548,6 +548,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 /* ----------
@@ -917,13 +919,15 @@ typedef struct PgStat_SLRUStats
  */
 typedef struct PgStat_ReplSlotStats
 {
-	char		slotname[NAMEDATALEN];
+	NameData	slotname;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;
 
@@ -1027,10 +1031,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-								   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-								   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-								   PgStat_Counter streambytes);
+extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
-- 
2.25.1

v5-0002-Added-total-txns-and-total-txn-bytes-to-replicati.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Added-total-txns-and-total-txn-bytes-to-replicati.patchDownload
From cf6fbb7a582f89390acc8c003269435ddefc4ce7 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:57:05 +0530
Subject: [PATCH v5 2/5] Added total txns and total txn bytes to replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
---
 contrib/test_decoding/expected/stats.out      | 79 +++++++++++++------
 contrib/test_decoding/sql/stats.sql           | 48 +++++++----
 doc/src/sgml/monitoring.sgml                  | 23 ++++++
 src/backend/catalog/system_views.sql          |  2 +
 src/backend/postmaster/pgstat.c               |  6 ++
 src/backend/replication/logical/logical.c     | 16 ++--
 .../replication/logical/reorderbuffer.c       | 12 +++
 src/backend/utils/adt/pgstatfuncs.c           |  8 +-
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/replication/reorderbuffer.h       |  4 +
 src/test/regress/expected/rules.out           |  4 +-
 11 files changed, 159 insertions(+), 49 deletions(-)

diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bca36fa903..bc8e601eab 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 
 CREATE TABLE stats_test(data text);
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -16,12 +16,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -51,16 +64,16 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -70,16 +83,16 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
-SELECT wait_for_decode_stats(true);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot |          0 |           0
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
@@ -89,16 +102,36 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
   5002
 (1 row)
 
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
+(1 row)
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+ pg_stat_reset_replication_slot 
+--------------------------------
+ 
+(1 row)
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | f          | f           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
@@ -117,7 +150,7 @@ SELECT slot_name FROM pg_stat_replication_slots;
 (1 row)
 
 COMMIT;
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 51294e48e8..8c34aeced1 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -6,7 +6,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 CREATE TABLE stats_test(data text);
 
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -14,12 +14,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -46,18 +59,25 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
-SELECT wait_for_decode_stats(true);
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(true, true);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
@@ -66,6 +86,6 @@ SELECT slot_name FROM pg_stat_replication_slots;
 SELECT slot_name FROM pg_stat_replication_slots;
 COMMIT;
 
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8287587f61..d5e2012b38 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2716,6 +2716,29 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding.
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 451db2ee0a..6d78b33590 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -875,6 +875,8 @@ CREATE VIEW pg_stat_replication_slots AS
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
             s.stats_reset
     FROM pg_stat_get_replication_slots() AS s;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 1becff09d0..e5b1fb045e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1829,6 +1829,8 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	msg.m_stream_txns = repSlotStat->stream_txns;
 	msg.m_stream_count = repSlotStat->stream_count;
 	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -5568,6 +5570,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		replSlotStats[idx].stream_txns += msg->m_stream_txns;
 		replSlotStats[idx].stream_count += msg->m_stream_count;
 		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		replSlotStats[idx].total_txns += msg->m_total_txns;
+		replSlotStats[idx].total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5795,6 +5799,8 @@ pgstat_reset_replslot(int i, TimestampTz ts)
 	replSlotStats[i].stream_txns = 0;
 	replSlotStats[i].stream_count = 0;
 	replSlotStats[i].stream_bytes = 0;
+	replSlotStats[i].total_txns = 0;
+	replSlotStats[i].total_bytes = 0;
 	replSlotStats[i].stat_reset_timestamp = ts;
 }
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 68e210ce12..ed9a7c8489 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1776,20 +1776,22 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	PgStat_ReplSlotStats repSlotStat;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 &&
+		rb->totalBytes <= 0 && rb->totalTxns <=0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
 
 	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
 	repSlotStat.spill_txns = rb->spillTxns;
@@ -1798,6 +1800,8 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
@@ -1806,4 +1810,6 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..bc251adfda 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -2359,6 +2361,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			specinsert = NULL;
 		}
 
+		/*
+		 * Update total transaction count and total transaction bytes
+		 * processed. Ensure to not count the streamed transaction multiple
+		 * times.
+		 */
+		if (!rbtxn_is_streamed(txn))
+			rb->totalTxns++;
+
+		rb->totalBytes += rb->size;
+
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
 		iterstate = NULL;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 521ba73614..2680190a40 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2284,7 +2284,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -2335,11 +2335,13 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		values[4] = Int64GetDatum(s->stream_txns);
 		values[5] = Int64GetDatum(s->stream_count);
 		values[6] = Int64GetDatum(s->stream_bytes);
+		values[7] = Int64GetDatum(s->total_txns);
+		values[8] = Int64GetDatum(s->total_bytes);
 
 		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
+			nulls[9] = true;
 		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f4957653ae..591753fe81 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5315,9 +5315,9 @@
   proname => 'pg_stat_get_replication_slots', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slots' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 186e6c966c..6399f3feef 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2068,8 +2068,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
+    s.total_txns,
+    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v5-0003-Added-tests-for-verification-of-logical-replicati.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Added-tests-for-verification-of-logical-replicati.patchDownload
From d971aa9ec544ce6614b8b369dbb342ed132a06a8 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 18:17:21 +0530
Subject: [PATCH v5 3/5] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 contrib/test_decoding/Makefile            |   2 +
 contrib/test_decoding/t/001_repl_stats.pl | 111 ++++++++++++++++++++++
 2 files changed, 113 insertions(+)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce5cc..9a31e0b879 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000000..194f764d18
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,111 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+$node->safe_psql('postgres', q(CREATE FUNCTION wait_for_decode_stats(check_slot_name text) RETURNS void AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  txn_count bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT (total_txns > 0) INTO txn_count
+    FROM pg_stat_replication_slots WHERE slot_name=check_slot_name;
+
+    exit WHEN txn_count;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+    -- reset stats snapshot so we can test again
+    perform pg_stat_clear_snapshot();
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_decode_stats delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+));
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+
+# Create replication slots.
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot3', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot4', 'test_decoding')");
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+
+# Wait for the statistics to be updated.
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot1')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot2')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot3')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot4')");
+
+# Test to verify replication statistics data is updated in
+# pg_stat_replication_slots statistics view.
+my $result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t
+regression_slot4|t|t), 'check replication statistics are updated');
+
+# Test to drop one of the replication slot and verify replication statistics data is
+# fine after restart.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats(TEXT)");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
-- 
2.25.1

v5-0004-Handle-overwriting-of-replication-slot-statistic-.patchtext/x-patch; charset=US-ASCII; name=v5-0004-Handle-overwriting-of-replication-slot-statistic-.patchDownload
From 51b4c60e7c47947bc08b72c822c8f37552aa7ad6 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 11:33:01 +0530
Subject: [PATCH v5 4/5] Handle overwriting of replication slot statistic
 issue.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, fixed it by writing the
replication slot count in statistics file and increasing
max_replication_slots if required at statup time base on the value
updated in the statistics file.
---
 src/backend/postmaster/pgstat.c | 36 +++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e5b1fb045e..d7923bfde4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3741,6 +3741,13 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 		(void) rc;				/* we'll check for error with ferror */
 	}
 
+	if (nReplSlotStats > 0)
+	{
+		fputc('N', fpout);
+		rc = fwrite(&nReplSlotStats, sizeof(nReplSlotStats), 1, fpout);
+		(void) rc;				/* we'll check for error with ferror */
+	}
+
 	/*
 	 * Write replication slot stats struct
 	 */
@@ -3960,6 +3967,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	bool		found;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			i;
+	int			replslotcount;
 
 	/*
 	 * The tables will live in pgStatLocalContext.
@@ -4196,6 +4204,34 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 
 				break;
 
+				/*
+				 * 'N'	Indicates the number of replication slot statistics
+				 * present.
+				 */
+			case 'N':
+				if (fread(&replslotcount, 1, sizeof(replslotcount), fpin)
+					!=sizeof(replslotcount))
+				{
+					ereport(pgStatRunningInCollector ? LOG : WARNING,
+							(errmsg("corrupted statistics file \"%s\"",
+									statfile)));
+					goto done;
+				}
+
+				if (replslotcount > max_replication_slots)
+				{
+					max_replication_slots = replslotcount;
+					replSlotStats = repalloc(replSlotStats,
+											 max_replication_slots
+											 * sizeof(PgStat_ReplSlotStats));
+					/*
+					 * Set the same reset timestamp for all replication slots too.
+					 */
+					for (i = 0; i < max_replication_slots; i++)
+						replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
+				}
+				break;
+
 				/*
 				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
 				 * slot follows.
-- 
2.25.1

v5-0005-Test-where-there-are-more-replication-slot-statis.patchtext/x-patch; charset=US-ASCII; name=v5-0005-Test-where-there-are-more-replication-slot-statis.patchDownload
From 03250a6bdeb93979b069ab0ab063cd6afbc55118 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Wed, 7 Apr 2021 13:07:26 +0530
Subject: [PATCH v5 5/5] Test where there are more replication slot statistics
 that max_replication_slot slots at startup.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, added a test for this.
---
 contrib/test_decoding/t/001_repl_stats.pl | 24 +++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 194f764d18..932acee380 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -5,7 +5,7 @@ use warnings;
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Test::More tests => 3;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -100,12 +100,32 @@ is($result, qq(regression_slot1|t|t
 regression_slot2|t|t
 regression_slot3|t|t), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust max_replication_slots
+# accordingly to the number of slots and verify replication statistics data is
+# fine after restart.
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats(TEXT)");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
-- 
2.25.1

#48Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#44)
Re: Replication slot stats misgivings

On Sat, Apr 10, 2021 at 7:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

IIUC there are two problems in the case where the drop message is lost:

1. Writing beyond the end of replSlotStats.
This can happen if after restarting the number of slots whose stats
are stored in the stats file exceeds max_replication_slots. Vignesh's
patch addresses this problem.

2. The stats for the new slot are not recorded.
If the stats for already-dropped slots remain in replSlotStats, the
stats for the new slot cannot be registered due to the full of
replSlotStats. This can happen even when after restarting the number
of slots whose stats are stored in the stat file does NOT exceed
max_replication_slots as well as even during the server running. The
patch doesn’t address this problem. (If this happens, we will have to
reset all slot stats since pg_stat_reset_replication_slot() cannot
remove the slot stats with the non-existing name).

I think we can use HTAB to store slot stats and have
pg_stat_get_replication_slot() inquire about stats by the slot name,
resolving both problems. By using HTAB we're no longer concerned about
the problem of writing stats beyond the end of the replSlotStats
array. Instead, we have to consider how and when to clean up the stats
for already-dropped slots. We can have the startup process send slot
names at startup time, which borrows the idea proposed by Amit. But
maybe we need to consider the case again where the message from the
startup process is lost? Another idea would be to have
pgstat_vacuum_stat() check the existing slots and call
pgstat_report_replslot_drop() if the slot in the stats file doesn't
exist. That way, we can continuously check the stats for
already-dropped slots.

Agreed, I think checking periodically via pgstat_vacuum_stat is a
better idea then sending once at start up time. I also think using
slot_name is better than using 'idx' (index in
ReplicationSlotCtl->replication_slots) in this scheme because even
after startup 'idx' changes we will be able to drop the dead slot.

I've written a PoC patch for the above idea; using HTAB and cleaning
up slot stats at pgstat_vacuum_stat(). The patch can be applied on top
of 0001 patch Vignesh proposed before[1].

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

Please note that this cannot resolve the problem of ending up
accumulating the stats to the old slot if the slot is re-created with
the same name and the drop message is lost. To deal with this problem
I think we would need to use something unique identifier for each slot
instead of slot name.

Right, we can probably write it in comments and or docs about this
caveat and the user can probably use pg_stat_reset_replication_slot
for such slots.

--
With Regards,
Amit Kapila.

#49Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#47)
Re: Replication slot stats misgivings

On Sat, Apr 10, 2021 at 1:06 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks Amit for your Patch. I have merged your changes into my
patchset. I did not find any issues in my testing.
Thoughts?

0001
------
  PgStat_Counter m_stream_bytes;
+ PgStat_Counter m_total_txns;
+ PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;

..
..

+ PgStat_Counter total_txns;
+ PgStat_Counter total_bytes;
  TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;

Doesn't this change belong to the second patch?

--
With Regards,
Amit Kapila.

#50vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#49)
5 attachment(s)
Re: Replication slot stats misgivings

On Sat, Apr 10, 2021 at 6:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 10, 2021 at 1:06 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks Amit for your Patch. I have merged your changes into my
patchset. I did not find any issues in my testing.
Thoughts?

0001
------
PgStat_Counter m_stream_bytes;
+ PgStat_Counter m_total_txns;
+ PgStat_Counter m_total_bytes;
} PgStat_MsgReplSlot;

..
..

+ PgStat_Counter total_txns;
+ PgStat_Counter total_bytes;
TimestampTz stat_reset_timestamp;
} PgStat_ReplSlotStats;

Doesn't this change belong to the second patch?

Missed it while splitting the patches, it is fixed in the attached patch,

Regards,
Vignesh

Attachments:

v6-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchtext/x-patch; charset=US-ASCII; name=v6-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchDownload
From 7b9cbe94ee1398b149ccd20a10d97c6fd7920442 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:14:52 +0530
Subject: [PATCH v6 1/5] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c           | 32 +++++++++++------------
 src/backend/replication/logical/logical.c | 13 ++++++---
 src/backend/replication/slot.c            |  7 ++++-
 src/backend/utils/adt/pgstatfuncs.c       |  2 +-
 src/include/pgstat.h                      | 11 +++-----
 5 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f4467625f7..666ce95d08 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1539,7 +1540,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1812,10 +1813,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-					   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-					   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-					   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1823,14 +1821,14 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1846,7 +1844,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -5202,7 +5200,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5538,7 +5536,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -5763,7 +5761,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -5776,7 +5774,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4f6e87f18d..68e210ce12 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,6 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
+	PgStat_ReplSlotStats repSlotStat;
 
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
@@ -1790,9 +1791,15 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 		 (long long) rb->streamCount,
 		 (long long) rb->streamBytes);
 
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
-						   rb->spillTxns, rb->spillCount, rb->spillBytes,
-						   rb->streamTxns, rb->streamCount, rb->streamBytes);
+	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	repSlotStat.spill_txns = rb->spillTxns;
+	repSlotStat.spill_count = rb->spillCount;
+	repSlotStat.spill_bytes = rb->spillBytes;
+	repSlotStat.stream_txns = rb->streamTxns;
+	repSlotStat.stream_count = rb->streamCount;
+	repSlotStat.stream_bytes = rb->streamBytes;
+
+	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 75a087c2f9..f61b163f78 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,7 +328,12 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0);
+	{
+		PgStat_ReplSlotStats repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+		pgstat_report_replslot(&repSlotStat);
+	}
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 182b15e3f2..521ba73614 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2328,7 +2328,7 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));
 
-		values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+		values[0] = CStringGetTextDatum(NameStr(s->slotname));
 		values[1] = Int64GetDatum(s->spill_txns);
 		values[2] = Int64GetDatum(s->spill_count);
 		values[3] = Int64GetDatum(s->spill_bytes);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9a87e7cd88..8e11215058 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -393,7 +393,7 @@ typedef struct PgStat_MsgResetslrucounter
 typedef struct PgStat_MsgResetreplslotcounter
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData      m_slotname;
 	bool		clearall;
 } PgStat_MsgResetreplslotcounter;
 
@@ -540,7 +540,7 @@ typedef struct PgStat_MsgSLRU
 typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData	m_slotname;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -917,7 +917,7 @@ typedef struct PgStat_SLRUStats
  */
 typedef struct PgStat_ReplSlotStats
 {
-	char		slotname[NAMEDATALEN];
+	NameData	slotname;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
@@ -1027,10 +1027,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-								   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-								   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-								   PgStat_Counter streambytes);
+extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
-- 
2.25.1

v6-0002-Added-total-txns-and-total-txn-bytes-to-replicati.patchtext/x-patch; charset=US-ASCII; name=v6-0002-Added-total-txns-and-total-txn-bytes-to-replicati.patchDownload
From 7e74a6645eee85035c96547bdc11527208388666 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:57:05 +0530
Subject: [PATCH v6 2/5] Added total txns and total txn bytes to replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
---
 contrib/test_decoding/expected/stats.out      | 79 +++++++++++++------
 contrib/test_decoding/sql/stats.sql           | 48 +++++++----
 doc/src/sgml/monitoring.sgml                  | 23 ++++++
 src/backend/catalog/system_views.sql          |  2 +
 src/backend/postmaster/pgstat.c               |  6 ++
 src/backend/replication/logical/logical.c     | 16 ++--
 .../replication/logical/reorderbuffer.c       | 12 +++
 src/backend/utils/adt/pgstatfuncs.c           |  8 +-
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/pgstat.h                          |  4 +
 src/include/replication/reorderbuffer.h       |  4 +
 src/test/regress/expected/rules.out           |  4 +-
 12 files changed, 163 insertions(+), 49 deletions(-)

diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bca36fa903..bc8e601eab 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 
 CREATE TABLE stats_test(data text);
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -16,12 +16,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -51,16 +64,16 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -70,16 +83,16 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
-SELECT wait_for_decode_stats(true);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot |          0 |           0
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
@@ -89,16 +102,36 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
   5002
 (1 row)
 
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
+(1 row)
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+ pg_stat_reset_replication_slot 
+--------------------------------
+ 
+(1 row)
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | f          | f           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
@@ -117,7 +150,7 @@ SELECT slot_name FROM pg_stat_replication_slots;
 (1 row)
 
 COMMIT;
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 51294e48e8..8c34aeced1 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -6,7 +6,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 CREATE TABLE stats_test(data text);
 
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -14,12 +14,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -46,18 +59,25 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
-SELECT wait_for_decode_stats(true);
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(true, true);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
@@ -66,6 +86,6 @@ SELECT slot_name FROM pg_stat_replication_slots;
 SELECT slot_name FROM pg_stat_replication_slots;
 COMMIT;
 
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8287587f61..d5e2012b38 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2716,6 +2716,29 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding.
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 451db2ee0a..6d78b33590 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -875,6 +875,8 @@ CREATE VIEW pg_stat_replication_slots AS
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
             s.stats_reset
     FROM pg_stat_get_replication_slots() AS s;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 666ce95d08..e1ec7d8b7d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1829,6 +1829,8 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	msg.m_stream_txns = repSlotStat->stream_txns;
 	msg.m_stream_count = repSlotStat->stream_count;
 	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -5568,6 +5570,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		replSlotStats[idx].stream_txns += msg->m_stream_txns;
 		replSlotStats[idx].stream_count += msg->m_stream_count;
 		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		replSlotStats[idx].total_txns += msg->m_total_txns;
+		replSlotStats[idx].total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5795,6 +5799,8 @@ pgstat_reset_replslot(int i, TimestampTz ts)
 	replSlotStats[i].stream_txns = 0;
 	replSlotStats[i].stream_count = 0;
 	replSlotStats[i].stream_bytes = 0;
+	replSlotStats[i].total_txns = 0;
+	replSlotStats[i].total_bytes = 0;
 	replSlotStats[i].stat_reset_timestamp = ts;
 }
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 68e210ce12..ed9a7c8489 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1776,20 +1776,22 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	PgStat_ReplSlotStats repSlotStat;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 &&
+		rb->totalBytes <= 0 && rb->totalTxns <=0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
 
 	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
 	repSlotStat.spill_txns = rb->spillTxns;
@@ -1798,6 +1800,8 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
@@ -1806,4 +1810,6 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..bc251adfda 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -2359,6 +2361,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			specinsert = NULL;
 		}
 
+		/*
+		 * Update total transaction count and total transaction bytes
+		 * processed. Ensure to not count the streamed transaction multiple
+		 * times.
+		 */
+		if (!rbtxn_is_streamed(txn))
+			rb->totalTxns++;
+
+		rb->totalBytes += rb->size;
+
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
 		iterstate = NULL;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 521ba73614..2680190a40 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2284,7 +2284,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -2335,11 +2335,13 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		values[4] = Int64GetDatum(s->stream_txns);
 		values[5] = Int64GetDatum(s->stream_count);
 		values[6] = Int64GetDatum(s->stream_bytes);
+		values[7] = Int64GetDatum(s->total_txns);
+		values[8] = Int64GetDatum(s->total_bytes);
 
 		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
+			nulls[9] = true;
 		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f4957653ae..591753fe81 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5315,9 +5315,9 @@
   proname => 'pg_stat_get_replication_slots', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slots' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e11215058..2aeb3cded4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -548,6 +548,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 /* ----------
@@ -924,6 +926,8 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 186e6c966c..6399f3feef 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2068,8 +2068,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
+    s.total_txns,
+    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v6-0003-Added-tests-for-verification-of-logical-replicati.patchtext/x-patch; charset=US-ASCII; name=v6-0003-Added-tests-for-verification-of-logical-replicati.patchDownload
From c3a547f79f99fde9ff7074912fe8ecd80e9b0da7 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 18:17:21 +0530
Subject: [PATCH v6 3/5] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 contrib/test_decoding/Makefile            |   2 +
 contrib/test_decoding/t/001_repl_stats.pl | 111 ++++++++++++++++++++++
 2 files changed, 113 insertions(+)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce5cc..9a31e0b879 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000000..194f764d18
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,111 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+$node->safe_psql('postgres', q(CREATE FUNCTION wait_for_decode_stats(check_slot_name text) RETURNS void AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  txn_count bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT (total_txns > 0) INTO txn_count
+    FROM pg_stat_replication_slots WHERE slot_name=check_slot_name;
+
+    exit WHEN txn_count;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+    -- reset stats snapshot so we can test again
+    perform pg_stat_clear_snapshot();
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_decode_stats delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+));
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+
+# Create replication slots.
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot3', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot4', 'test_decoding')");
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+
+# Wait for the statistics to be updated.
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot1')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot2')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot3')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot4')");
+
+# Test to verify replication statistics data is updated in
+# pg_stat_replication_slots statistics view.
+my $result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t
+regression_slot4|t|t), 'check replication statistics are updated');
+
+# Test to drop one of the replication slot and verify replication statistics data is
+# fine after restart.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats(TEXT)");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
-- 
2.25.1

v6-0004-Handle-overwriting-of-replication-slot-statistic-.patchtext/x-patch; charset=US-ASCII; name=v6-0004-Handle-overwriting-of-replication-slot-statistic-.patchDownload
From 740fc090763da20ba7988683d5481feed35b6545 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 11:33:01 +0530
Subject: [PATCH v6 4/5] Handle overwriting of replication slot statistic
 issue.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, fixed it by writing the
replication slot count in statistics file and increasing
max_replication_slots if required at statup time base on the value
updated in the statistics file.
---
 src/backend/postmaster/pgstat.c | 36 +++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d..066829234b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3741,6 +3741,13 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 		(void) rc;				/* we'll check for error with ferror */
 	}
 
+	if (nReplSlotStats > 0)
+	{
+		fputc('N', fpout);
+		rc = fwrite(&nReplSlotStats, sizeof(nReplSlotStats), 1, fpout);
+		(void) rc;				/* we'll check for error with ferror */
+	}
+
 	/*
 	 * Write replication slot stats struct
 	 */
@@ -3960,6 +3967,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	bool		found;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			i;
+	int			replslotcount;
 
 	/*
 	 * The tables will live in pgStatLocalContext.
@@ -4196,6 +4204,34 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 
 				break;
 
+				/*
+				 * 'N'	Indicates the number of replication slot statistics
+				 * present.
+				 */
+			case 'N':
+				if (fread(&replslotcount, 1, sizeof(replslotcount), fpin)
+					!=sizeof(replslotcount))
+				{
+					ereport(pgStatRunningInCollector ? LOG : WARNING,
+							(errmsg("corrupted statistics file \"%s\"",
+									statfile)));
+					goto done;
+				}
+
+				if (replslotcount > max_replication_slots)
+				{
+					max_replication_slots = replslotcount;
+					replSlotStats = repalloc(replSlotStats,
+											 max_replication_slots
+											 * sizeof(PgStat_ReplSlotStats));
+					/*
+					 * Set the same reset timestamp for all replication slots too.
+					 */
+					for (i = 0; i < max_replication_slots; i++)
+						replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
+				}
+				break;
+
 				/*
 				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
 				 * slot follows.
-- 
2.25.1

v6-0005-Test-where-there-are-more-replication-slot-statis.patchtext/x-patch; charset=US-ASCII; name=v6-0005-Test-where-there-are-more-replication-slot-statis.patchDownload
From f0453e6baefd0cf1ba0cd2537fdb46ac9371ce6b Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Wed, 7 Apr 2021 13:07:26 +0530
Subject: [PATCH v6 5/5] Test where there are more replication slot statistics
 that max_replication_slot slots at startup.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, added a test for this.
---
 contrib/test_decoding/t/001_repl_stats.pl | 24 +++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 194f764d18..932acee380 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -5,7 +5,7 @@ use warnings;
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Test::More tests => 3;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -100,12 +100,32 @@ is($result, qq(regression_slot1|t|t
 regression_slot2|t|t
 regression_slot3|t|t), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust max_replication_slots
+# accordingly to the number of slots and verify replication statistics data is
+# fine after restart.
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats(TEXT)");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
-- 
2.25.1

#51vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#43)
Re: Replication slot stats misgivings

Thanks for the comments.

On Fri, Apr 9, 2021 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 7, 2021 at 2:51 PM vignesh C <vignesh21@gmail.com> wrote:

That is not required, I have modified it.
Attached v4 patch has the fixes for the same.

Few comments:

0001
------
1. The first patch includes changing char datatype to NameData
datatype for slotname. I feel this can be a separate patch from adding
new stats in the view. I think we can also move the change related to
moving stats to a structure rather than sending them individually in
the same patch.

I have split the patch as suggested.

2.
@@ -2051,6 +2054,17 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
rb->begin(rb, txn);
}

+ /*
+ * Update total transaction count and total transaction bytes, if
+ * transaction is streamed or spilled it will be updated while the
+ * transaction gets spilled or streamed.
+ */
+ if (!rb->streamBytes && !rb->spillBytes)
+ {
+ rb->totalTxns++;
+ rb->totalBytes += rb->size;
+ }

I think this will skip a transaction if it is interleaved between a
streaming transaction. Assume, two transactions t1 and t2. t1 sends
changes in multiple streams and t2 sends all changes in one go at
commit time. So, now, if t2 is interleaved between multiple streams
then I think the above won't count t2.

Modified it.

3.
@@ -3524,9 +3538,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)
{
rb->spillCount += 1;
rb->spillBytes += size;
+ rb->totalBytes += size;

/* don't consider already serialized transactions */
rb->spillTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+ rb->totalTxns += (rbtxn_is_serialized(txn) ||
rbtxn_is_serialized_clear(txn)) ? 0 : 1;
}

We do serialize each subtransaction separately. So totalTxns will
include subtransaction count as well when serialized, otherwise not.
The description of totalTxns also says that it doesn't include
subtransactions. So, I think updating rb->totalTxns here is wrong.

Modified it.

0002
-----
1.
+$node->safe_psql('postgres',
+ "SELECT data FROM pg_logical_slot_get_changes('regression_slot2',
NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");

The indentation of the second SELECT seems to bit off.

Modified it.
These comments are fixed in the patch available at [1]/messages/by-id/CALDaNm1A=bjSrQjBNwNsOtTig+6pZpunmAj_P7Au0H0XjtvCyA@mail.gmail.com.

[1]: /messages/by-id/CALDaNm1A=bjSrQjBNwNsOtTig+6pZpunmAj_P7Au0H0XjtvCyA@mail.gmail.com
/messages/by-id/CALDaNm1A=bjSrQjBNwNsOtTig+6pZpunmAj_P7Au0H0XjtvCyA@mail.gmail.com

Regards,
Vignesh

#52Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#48)
5 attachment(s)
Re: Replication slot stats misgivings

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 10, 2021 at 7:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

IIUC there are two problems in the case where the drop message is lost:

1. Writing beyond the end of replSlotStats.
This can happen if after restarting the number of slots whose stats
are stored in the stats file exceeds max_replication_slots. Vignesh's
patch addresses this problem.

2. The stats for the new slot are not recorded.
If the stats for already-dropped slots remain in replSlotStats, the
stats for the new slot cannot be registered due to the full of
replSlotStats. This can happen even when after restarting the number
of slots whose stats are stored in the stat file does NOT exceed
max_replication_slots as well as even during the server running. The
patch doesn’t address this problem. (If this happens, we will have to
reset all slot stats since pg_stat_reset_replication_slot() cannot
remove the slot stats with the non-existing name).

I think we can use HTAB to store slot stats and have
pg_stat_get_replication_slot() inquire about stats by the slot name,
resolving both problems. By using HTAB we're no longer concerned about
the problem of writing stats beyond the end of the replSlotStats
array. Instead, we have to consider how and when to clean up the stats
for already-dropped slots. We can have the startup process send slot
names at startup time, which borrows the idea proposed by Amit. But
maybe we need to consider the case again where the message from the
startup process is lost? Another idea would be to have
pgstat_vacuum_stat() check the existing slots and call
pgstat_report_replslot_drop() if the slot in the stats file doesn't
exist. That way, we can continuously check the stats for
already-dropped slots.

Thanks for your comments.

Agreed, I think checking periodically via pgstat_vacuum_stat is a
better idea then sending once at start up time. I also think using
slot_name is better than using 'idx' (index in
ReplicationSlotCtl->replication_slots) in this scheme because even
after startup 'idx' changes we will be able to drop the dead slot.

I've written a PoC patch for the above idea; using HTAB and cleaning
up slot stats at pgstat_vacuum_stat(). The patch can be applied on top
of 0001 patch Vignesh proposed before[1].

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests. In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me. Since my patch resolved
the issue of writing stats beyond the end of the array, I've removed
the patch that writes the number of stats into the stats file
(v6-0004-Handle-overwriting-of-replication-slot-statistic-.patch).

Apart from the above updates, the
contrib/test_decoding/001_repl_stats.pl add wait_for_decode_stats()
function during testing but I think we can use poll_query_until()
instead. Also, I think we can merge 0004 and 0005 patches.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v7-0005-Test-where-there-are-more-replication-slot-statis.patchapplication/octet-stream; name=v7-0005-Test-where-there-are-more-replication-slot-statis.patchDownload
From 7cc4ff5bef751b5b3c0e6a5a320617382ac0ad2f Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Wed, 7 Apr 2021 13:07:26 +0530
Subject: [PATCH v7 5/5] Test where there are more replication slot statistics
 that max_replication_slot slots at startup.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, added a test for this.
---
 contrib/test_decoding/t/001_repl_stats.pl | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 194f764d18..3433461bf0 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -5,7 +5,7 @@ use warnings;
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Test::More tests => 3;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -100,12 +100,31 @@ is($result, qq(regression_slot1|t|t
 regression_slot2|t|t
 regression_slot3|t|t), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust max_replication_slots
+# accordingly to the number of slots and verify replication statistics data is
+# fine after restart.
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t), 'check replication statistics are updated');
+
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats(TEXT)");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
-- 
2.27.0

v7-0004-Added-tests-for-verification-of-logical-replicati.patchapplication/octet-stream; name=v7-0004-Added-tests-for-verification-of-logical-replicati.patchDownload
From 2e3d70dc3c679b2f795aeba7faee21c91a66d899 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 18:17:21 +0530
Subject: [PATCH v7 4/5] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 contrib/test_decoding/Makefile            |   2 +
 contrib/test_decoding/t/001_repl_stats.pl | 111 ++++++++++++++++++++++
 2 files changed, 113 insertions(+)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce5cc..9a31e0b879 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000000..194f764d18
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,111 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use File::Path qw(rmtree);
+use PostgresNode;
+use TestLib;
+use Test::More tests => 2;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+$node->safe_psql('postgres', q(CREATE FUNCTION wait_for_decode_stats(check_slot_name text) RETURNS void AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  txn_count bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+
+    -- check to see if all updates have been reset/updated
+    SELECT (total_txns > 0) INTO txn_count
+    FROM pg_stat_replication_slots WHERE slot_name=check_slot_name;
+
+    exit WHEN txn_count;
+
+    -- wait a little
+    perform pg_sleep_for('100 milliseconds');
+
+    -- reset stats snapshot so we can test again
+    perform pg_stat_clear_snapshot();
+
+  END LOOP;
+
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE LOG 'wait_for_decode_stats delayed % seconds',
+    extract(epoch from clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+));
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+
+# Create replication slots.
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot3', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot4', 'test_decoding')");
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+
+# Wait for the statistics to be updated.
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot1')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot2')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot3')");
+$node->safe_psql('postgres', "SELECT wait_for_decode_stats('regression_slot4')");
+
+# Test to verify replication statistics data is updated in
+# pg_stat_replication_slots statistics view.
+my $result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t
+regression_slot4|t|t), 'check replication statistics are updated');
+
+# Test to drop one of the replication slot and verify replication statistics data is
+# fine after restart.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "DROP FUNCTION wait_for_decode_stats(TEXT)");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
-- 
2.27.0

v7-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchapplication/octet-stream; name=v7-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchDownload
From 210f5b87f8a6b1daf910270a3fe308c04dfefeec Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:14:52 +0530
Subject: [PATCH v7 1/5] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c           | 32 +++++++++++------------
 src/backend/replication/logical/logical.c | 13 ++++++---
 src/backend/replication/slot.c            |  7 ++++-
 src/backend/utils/adt/pgstatfuncs.c       |  2 +-
 src/include/pgstat.h                      | 11 +++-----
 5 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f4467625f7..666ce95d08 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1539,7 +1540,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1812,10 +1813,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-					   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-					   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-					   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1823,14 +1821,14 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1846,7 +1844,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -5202,7 +5200,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5538,7 +5536,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -5763,7 +5761,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -5776,7 +5774,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4f6e87f18d..68e210ce12 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,6 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
+	PgStat_ReplSlotStats repSlotStat;
 
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
@@ -1790,9 +1791,15 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 		 (long long) rb->streamCount,
 		 (long long) rb->streamBytes);
 
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
-						   rb->spillTxns, rb->spillCount, rb->spillBytes,
-						   rb->streamTxns, rb->streamCount, rb->streamBytes);
+	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	repSlotStat.spill_txns = rb->spillTxns;
+	repSlotStat.spill_count = rb->spillCount;
+	repSlotStat.spill_bytes = rb->spillBytes;
+	repSlotStat.stream_txns = rb->streamTxns;
+	repSlotStat.stream_count = rb->streamCount;
+	repSlotStat.stream_bytes = rb->streamBytes;
+
+	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 75a087c2f9..f61b163f78 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,7 +328,12 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0);
+	{
+		PgStat_ReplSlotStats repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+		pgstat_report_replslot(&repSlotStat);
+	}
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 182b15e3f2..521ba73614 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2328,7 +2328,7 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));
 
-		values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+		values[0] = CStringGetTextDatum(NameStr(s->slotname));
 		values[1] = Int64GetDatum(s->spill_txns);
 		values[2] = Int64GetDatum(s->spill_count);
 		values[3] = Int64GetDatum(s->spill_bytes);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9a87e7cd88..8e11215058 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -393,7 +393,7 @@ typedef struct PgStat_MsgResetslrucounter
 typedef struct PgStat_MsgResetreplslotcounter
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData      m_slotname;
 	bool		clearall;
 } PgStat_MsgResetreplslotcounter;
 
@@ -540,7 +540,7 @@ typedef struct PgStat_MsgSLRU
 typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData	m_slotname;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -917,7 +917,7 @@ typedef struct PgStat_SLRUStats
  */
 typedef struct PgStat_ReplSlotStats
 {
-	char		slotname[NAMEDATALEN];
+	NameData	slotname;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
@@ -1027,10 +1027,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-								   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-								   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-								   PgStat_Counter streambytes);
+extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
-- 
2.27.0

v7-0003-Added-total-txns-and-total-txn-bytes-to-replicati.patchapplication/octet-stream; name=v7-0003-Added-total-txns-and-total-txn-bytes-to-replicati.patchDownload
From addfde9c14329cc4693631b67e2a1b6bd8e56dd4 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:57:05 +0530
Subject: [PATCH v7 3/5] Added total txns and total txn bytes to replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
---
 contrib/test_decoding/expected/stats.out      | 79 +++++++++++++------
 contrib/test_decoding/sql/stats.sql           | 48 +++++++----
 doc/src/sgml/monitoring.sgml                  | 23 ++++++
 src/backend/catalog/system_views.sql          |  2 +
 src/backend/postmaster/pgstat.c               |  6 ++
 src/backend/replication/logical/logical.c     | 16 ++--
 .../replication/logical/reorderbuffer.c       | 12 +++
 src/backend/utils/adt/pgstatfuncs.c           | 38 +++++----
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/pgstat.h                          |  4 +
 src/include/replication/reorderbuffer.h       |  4 +
 src/test/regress/expected/rules.out           |  4 +-
 12 files changed, 180 insertions(+), 62 deletions(-)

diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bca36fa903..bc8e601eab 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 
 CREATE TABLE stats_test(data text);
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -16,12 +16,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -51,16 +64,16 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -70,16 +83,16 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
-SELECT wait_for_decode_stats(true);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot |          0 |           0
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
@@ -89,16 +102,36 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
   5002
 (1 row)
 
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
+(1 row)
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+ pg_stat_reset_replication_slot 
+--------------------------------
+ 
+(1 row)
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | f          | f           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
@@ -117,7 +150,7 @@ SELECT slot_name FROM pg_stat_replication_slots;
 (1 row)
 
 COMMIT;
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 51294e48e8..8c34aeced1 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -6,7 +6,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 CREATE TABLE stats_test(data text);
 
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -14,12 +14,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -46,18 +59,25 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
-SELECT wait_for_decode_stats(true);
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(true, true);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
@@ -66,6 +86,6 @@ SELECT slot_name FROM pg_stat_replication_slots;
 SELECT slot_name FROM pg_stat_replication_slots;
 COMMIT;
 
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8287587f61..d5e2012b38 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2716,6 +2716,29 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding.
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1bceb25571..0636a5d5f1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -973,6 +973,8 @@ CREATE VIEW pg_replication_slots AS
 CREATE VIEW pg_stat_replication_slots AS
     SELECT
             s.slot_name,
+	    s.total_txns,
+            s.total_bytes,
             s.spill_txns,
             s.spill_count,
             s.spill_bytes,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 142567e654..0b3b8ae90b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1823,6 +1823,8 @@ pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat)
 	msg.m_stream_txns = repSlotStat->stream_txns;
 	msg.m_stream_count = repSlotStat->stream_count;
 	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -5545,6 +5547,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		Assert(slotent);
 
 		/* Update the replication slot statistics */
+		slotent->total_txns += msg->m_total_txns;
+		slotent->total_bytes += msg->m_total_bytes;
 		slotent->spill_txns += msg->m_spill_txns;
 		slotent->spill_count += msg->m_spill_count;
 		slotent->spill_bytes += msg->m_spill_bytes;
@@ -5792,6 +5796,8 @@ static void
 pgstat_reset_replslot(PgStat_ReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
 	slotent->spill_txns = 0;
 	slotent->spill_count = 0;
 	slotent->spill_bytes = 0;
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index d66846faab..73ef806530 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1776,20 +1776,22 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	PgStat_ReplSlotEntry repSlotStat;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 &&
+		rb->totalBytes <= 0 && rb->totalTxns <=0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
 
 	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
 	repSlotStat.spill_txns = rb->spillTxns;
@@ -1798,6 +1800,8 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
@@ -1806,4 +1810,6 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..bc251adfda 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -2359,6 +2361,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			specinsert = NULL;
 		}
 
+		/*
+		 * Update total transaction count and total transaction bytes
+		 * processed. Ensure to not count the streamed transaction multiple
+		 * times.
+		 */
+		if (!rbtxn_is_streamed(txn))
+			rb->totalTxns++;
+
+		rb->totalBytes += rb->size;
+
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
 		iterstate = NULL;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 569abce27e..be069cf54c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2309,7 +2309,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	text *slotname_text = PG_GETARG_TEXT_P(0);
 	NameData slotname;
 	TupleDesc	tupdesc;
@@ -2325,19 +2325,23 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
 					   TEXTOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "total_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "total_bytes",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "spill_count",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "spill_bytes",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 	BlessTupleDesc(tupdesc);
 
@@ -2348,17 +2352,19 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 
 	values[0] = CStringGetTextDatum(NameStr(slotent->slotname));
-	values[1] = Int64GetDatum(slotent->spill_txns);
-	values[2] = Int64GetDatum(slotent->spill_count);
-	values[3] = Int64GetDatum(slotent->spill_bytes);
-	values[4] = Int64GetDatum(slotent->stream_txns);
-	values[5] = Int64GetDatum(slotent->stream_count);
-	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[1] = Int64GetDatum(slotent->total_txns);
+	values[2] = Int64GetDatum(slotent->total_bytes);
+	values[3] = Int64GetDatum(slotent->spill_txns);
+	values[4] = Int64GetDatum(slotent->spill_count);
+	values[5] = Int64GetDatum(slotent->spill_bytes);
+	values[6] = Int64GetDatum(slotent->stream_txns);
+	values[7] = Int64GetDatum(slotent->stream_count);
+	values[8] = Int64GetDatum(slotent->stream_bytes);
 
 	if (slotent->stat_reset_timestamp == 0)
-		nulls[7] = true;
+		nulls[9] = true;
 	else
-		values[7] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6c9521603f..b9156b873e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5315,9 +5315,9 @@
   proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'text',
-  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,total_txns,total_bytes,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c2d192fc55..e9e16e5e2d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -548,6 +548,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 /* ----------
@@ -924,6 +926,8 @@ typedef struct PgStat_ReplSlotEntry
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotEntry;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index eb741b9ff1..4380577b7d 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2062,6 +2062,8 @@ pg_stat_replication| SELECT s.pid,
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
+    s.total_txns,
+    s.total_bytes,
     s.spill_txns,
     s.spill_count,
     s.spill_bytes,
@@ -2070,7 +2072,7 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_bytes,
     s.stats_reset
    FROM pg_replication_slots r,
-    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, total_txns, total_bytes, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
   WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
-- 
2.27.0

v7-0002-Use-HTAB-for-replication-slot-statistics.patchapplication/octet-stream; name=v7-0002-Use-HTAB-for-replication-slot-statistics.patchDownload
From c0f892fbde509b16cfd4dee03a770ee9899f0e63 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 12 Apr 2021 10:31:52 +0900
Subject: [PATCH v7 2/5] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for repilcation
slots. But this had two problems in case where drop-slot-stats message
is lost: 1) the stats for the new slot are not recorded and 2) writing
beyond the end of the array when after restring the number of slots
whose stats are stored in the stats file exceeds
max_replication_slots.

This commit changes to use HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() checks
if a slot for stats entry in the stats collector still exists or not.
Then send drop-slot-stats message.
---
 src/backend/catalog/system_views.sql      |  26 ++-
 src/backend/postmaster/pgstat.c           | 261 +++++++++++-----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  23 +-
 src/backend/utils/adt/pgstatfuncs.c       | 127 ++++++-----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |   8 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 9 files changed, 247 insertions(+), 220 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 451db2ee0a..1bceb25571 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,18 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -982,6 +970,20 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 666ce95d08..142567e654 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_ReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_ReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Check for all replication slots in stats hash table. We do this check
+	 * when replSlotStats has more than max_replication_slots entries, i.g,
+	 * when there are stats for the already-dropped slot, to avoid frequent
+	 * call SearchNamedReplicationSlot() which acquires LWLock.
+	 */
+	if (replSlotStats && hash_get_num_entries(replSlotStats) > max_replication_slots)
+	{
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -2870,17 +2864,19 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_ReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
+	PgStat_ReplSlotEntry *slotent = NULL;
+
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	slotent = pgstat_get_replslot_entry(slotname, false);
+
+	return slotent;
 }
 
 /*
@@ -3652,7 +3648,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3742,11 +3737,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_ReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3973,12 +3974,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4003,12 +3998,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4195,21 +4184,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+			{
+				PgStat_ReplSlotEntry slotstats;
+				PgStat_ReplSlotEntry *slotent;
+
+				if (fread(&slotstats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
-				nReplSlotStats++;
+
+				slotent = pgstat_get_replslot_entry(slotstats.slotname, true);
+				memcpy(slotent, &slotstats, sizeof(PgStat_ReplSlotEntry));
 				break;
+			}
 
 			case 'E':
 				goto done;
@@ -4422,7 +4417,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_ReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4551,12 +4546,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4763,7 +4758,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5187,20 +5181,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_ReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_ReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5208,11 +5208,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5530,44 +5530,27 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats, (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE, NULL);
 	}
 	else
 	{
+		PgStat_ReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
 		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		slotent->spill_txns += msg->m_spill_txns;
+		slotent->spill_count += msg->m_spill_count;
+		slotent->spill_bytes += msg->m_spill_bytes;
+		slotent->stream_txns += msg->m_stream_txns;
+		slotent->stream_count += msg->m_stream_count;
+		slotent->stream_bytes += msg->m_stream_bytes;
 	}
 }
 
@@ -5745,57 +5728,77 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_ReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_ReplSlotEntry *slotent;
+	bool	found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_ReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_ReplSlotEntry *) hash_search(replSlotStats,
+												   (void *) &name,
+												   create_it ? HASH_ENTER : HASH_FIND,
+												   &found);
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
+
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_ReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_ReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 68e210ce12..d66846faab 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_ReplSlotEntry repSlotStat;
 
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..f75e7e95f9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -329,8 +329,8 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	if (SlotIsLogical(slot))
 	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		PgStat_ReplSlotEntry repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotEntry));
 		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 		pgstat_report_replslot(&repSlotStat);
 	}
@@ -349,17 +349,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +370,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +417,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -712,7 +713,11 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * and create messages while holding ReplicationSlotAllocationLock to
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
-	 * create the statistics for the replication slot.
+	 * create the statistics for the replication slot. In case where the
+	 * message for dropping the old slot gets lost and a slot with the same is
+	 * created, the stats will be accumulated into the old slots since we
+	 * use the slot name as the key. In that case, user can reset the particular
+	 * stats by pg_stat_reset_replication_slot().
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 521ba73614..569abce27e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,71 +2305,61 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 8
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text *slotname_text = PG_GETARG_TEXT_P(0);
+	NameData slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_ReplSlotEntry *slotent;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
-
-	MemoryContextSwitchTo(oldcontext);
-
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
-	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
 
-		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
-		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+	if (!slotent)
+		PG_RETURN_NULL();
 
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
-	}
+	values[0] = CStringGetTextDatum(NameStr(slotent->slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
 
-	tuplestore_donestoring(tupstore);
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[7] = true;
+	else
+		values[7] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
-	return (Datum) 0;
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f4957653ae..6c9521603f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5311,14 +5311,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e11215058..c2d192fc55 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -915,7 +915,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_ReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -925,7 +925,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_ReplSlotEntry;
 
 
 /*
@@ -1027,7 +1027,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1125,7 +1125,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_ReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 186e6c966c..eb741b9ff1 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2069,7 +2069,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_count,
     s.stream_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.27.0

#53Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#52)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes). So probably
keeping these new counters at the end makes more sense to me.

Since my patch resolved
the issue of writing stats beyond the end of the array, I've removed
the patch that writes the number of stats into the stats file
(v6-0004-Handle-overwriting-of-replication-slot-statistic-.patch).

Okay, but I think it might be better to keep 0001, 0002, 0003 as
Vignesh had because those are agreed upon changes and are
straightforward. We can push those and then further review HTAB
implementation and also see if Andres has any suggestions on the same.

Apart from the above updates, the
contrib/test_decoding/001_repl_stats.pl add wait_for_decode_stats()
function during testing but I think we can use poll_query_until()
instead.

+1. Can you please change it in the next version?

--
With Regards,
Amit Kapila.

#54vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#2)
1 attachment(s)
Re: Replication slot stats misgivings

On Sat, Mar 20, 2021 at 9:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.

I felt we can update the replication slot statistics data each time we
spill/stream the transaction data instead of accumulating the
statistics and updating at the end. I have tried this in the attached
patch and the statistics data were getting updated.
Thoughts?

Regards,
Vignesh

Attachments:

Update_replication_slot_statistics_after_spill_stream.patchtext/x-patch; charset=US-ASCII; name=Update_replication_slot_statistics_after_spill_stream.patchDownload
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 99f2afc73c..9905e2b8ad 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -748,7 +748,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 * not clear that sending more or less frequently than this would be
 	 * better.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats(ctx->reorder);
 }
 
 /*
@@ -830,7 +830,7 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 * not clear that sending more or less frequently than this would be
 	 * better.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats(ctx->reorder);
 }
 
 
@@ -887,7 +887,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	}
 
 	/* update the decoding stats */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats(ctx->reorder);
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 954dbb5554..00c481e1d2 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -219,6 +219,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder->apply_truncate = truncate_cb_wrapper;
 	ctx->reorder->commit = commit_cb_wrapper;
 	ctx->reorder->message = message_cb_wrapper;
+	namestrcpy(&ctx->reorder->slotname, NameStr(ctx->slot->data.name));
 
 	/*
 	 * To support streaming, we require start/stop/abort/commit/change
@@ -1791,15 +1792,14 @@ ResetLogicalStreamingState(void)
  * Report stats for a slot.
  */
 void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats(ReorderBuffer *rb)
 {
-	ReorderBuffer *rb = ctx->reorder;
-
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
 	 * time the stats has been sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->spillCount <=0 &&
+		rb->streamBytes <= 0 && rb->streamCount <=0)
 		return;
 
 	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
@@ -1811,7 +1811,7 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 		 (long long) rb->streamCount,
 		 (long long) rb->streamBytes);
 
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
+	pgstat_report_replslot(NameStr(rb->slotname),
 						   rb->spillTxns, rb->spillCount, rb->spillBytes,
 						   rb->streamTxns, rb->streamCount, rb->streamBytes);
 	rb->spillTxns = 0;
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 92c7fa7993..b60a864004 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3528,6 +3528,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+		UpdateDecodingStats(rb);
 	}
 
 	Assert(spilled == txn->nentries_mem);
@@ -3897,6 +3898,8 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	/* Don't consider already streamed transaction. */
 	rb->streamTxns += (txn_is_streamed) ? 0 : 1;
 
+	UpdateDecodingStats(rb);
+
 	Assert(dlist_is_empty(&txn->changes));
 	Assert(txn->nentries == 0);
 	Assert(txn->nentries_mem == 0);
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 72f049b347..71d93058d2 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -139,6 +139,6 @@ extern bool filter_prepare_cb_wrapper(LogicalDecodingContext *ctx,
 									  TransactionId xid, const char *gid);
 extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id);
 extern void ResetLogicalStreamingState(void);
-extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
+extern void UpdateDecodingStats(ReorderBuffer *rb);
 
 #endif
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 6c9f2c6c77..bcc4788480 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,8 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	NameData	slotname;		/* Slot name for updating statistics */
 };
 
 
#55Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#53)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

Okay.

So probably
keeping these new counters at the end makes more sense to me.

But I think all of those counters are new for users since
pg_stat_replication_slots view will be introduced to PG14, no?

Since my patch resolved
the issue of writing stats beyond the end of the array, I've removed
the patch that writes the number of stats into the stats file
(v6-0004-Handle-overwriting-of-replication-slot-statistic-.patch).

Okay, but I think it might be better to keep 0001, 0002, 0003 as
Vignesh had because those are agreed upon changes and are
straightforward. We can push those and then further review HTAB
implementation and also see if Andres has any suggestions on the same.

Makes sense. Maybe it should have written my patch as 0004 (i.g.,
applied on top of the patch that adds total_txn and tota_bytes).

Apart from the above updates, the
contrib/test_decoding/001_repl_stats.pl add wait_for_decode_stats()
function during testing but I think we can use poll_query_until()
instead.

+1. Can you please change it in the next version?

Sure, I'll update the patches.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#56Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#55)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

Okay.

So probably
keeping these new counters at the end makes more sense to me.

But I think all of those counters are new for users since
pg_stat_replication_slots view will be introduced to PG14, no?

Right, I was referring to total_txns and total_bytes attributes. I
think keeping them at end after spill and stream counters should be
okay.

--
With Regards,
Amit Kapila.

#57vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#55)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

Okay.

So probably
keeping these new counters at the end makes more sense to me.

But I think all of those counters are new for users since
pg_stat_replication_slots view will be introduced to PG14, no?

Since my patch resolved
the issue of writing stats beyond the end of the array, I've removed
the patch that writes the number of stats into the stats file
(v6-0004-Handle-overwriting-of-replication-slot-statistic-.patch).

Okay, but I think it might be better to keep 0001, 0002, 0003 as
Vignesh had because those are agreed upon changes and are
straightforward. We can push those and then further review HTAB
implementation and also see if Andres has any suggestions on the same.

Makes sense. Maybe it should have written my patch as 0004 (i.g.,
applied on top of the patch that adds total_txn and tota_bytes).

Apart from the above updates, the
contrib/test_decoding/001_repl_stats.pl add wait_for_decode_stats()
function during testing but I think we can use poll_query_until()
instead.

+1. Can you please change it in the next version?

Sure, I'll update the patches.

I had started working on poll_query_until comment, I will test and
post a patch for that comment shortly.

Regards,
Vignesh

#58Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#50)
Re: Replication slot stats misgivings

On Sat, Apr 10, 2021 at 6:51 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks, 0001 and 0002 look good to me. I have a minor comment for 0002.

<entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding.

Can we slightly extend it to say something like: Note that this
includes the bytes streamed and or spilled. Similarly, we can extend
it for total_txns.

--
With Regards,
Amit Kapila.

#59Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#56)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

BTW doesn't it confuse users that stream/spill bytes are not in
addition to total_bytes? User will need to do "total_bytes +
spill/stream_bytes" to know the actual total amount of data sent to
the decoding output plugin, is that right? The doc says "Amount of
decoded transactions data sent to the decoding output plugin while
decoding the changes from WAL for this slot" but I think we also send
decoded data that had been spilled to the decoding output plugin.

Okay.

So probably
keeping these new counters at the end makes more sense to me.

But I think all of those counters are new for users since
pg_stat_replication_slots view will be introduced to PG14, no?

Right, I was referring to total_txns and total_bytes attributes. I
think keeping them at end after spill and stream counters should be
okay.

Okay, understood.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#60vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#58)
5 attachment(s)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 4:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 10, 2021 at 6:51 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks, 0001 and 0002 look good to me. I have a minor comment for 0002.

<entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding.

Can we slightly extend it to say something like: Note that this
includes the bytes streamed and or spilled. Similarly, we can extend
it for total_txns.

Thanks for the comments, the comments are fixed in the v8 patch attached.
Thoughts?

Regards,
Vignesh

Attachments:

v8-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchtext/x-patch; charset=US-ASCII; name=v8-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchDownload
From b3288d575d9b8c26381fa8da773960d16ae2a1f3 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:14:52 +0530
Subject: [PATCH v8 1/5] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c           | 32 +++++++++++------------
 src/backend/replication/logical/logical.c | 13 ++++++---
 src/backend/replication/slot.c            |  7 ++++-
 src/backend/utils/adt/pgstatfuncs.c       |  2 +-
 src/include/pgstat.h                      | 11 +++-----
 5 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f4467625f7..666ce95d08 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1539,7 +1540,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1812,10 +1813,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-					   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-					   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-					   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1823,14 +1821,14 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1846,7 +1844,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -5202,7 +5200,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5538,7 +5536,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -5763,7 +5761,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -5776,7 +5774,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4f6e87f18d..68e210ce12 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,6 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
+	PgStat_ReplSlotStats repSlotStat;
 
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
@@ -1790,9 +1791,15 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 		 (long long) rb->streamCount,
 		 (long long) rb->streamBytes);
 
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
-						   rb->spillTxns, rb->spillCount, rb->spillBytes,
-						   rb->streamTxns, rb->streamCount, rb->streamBytes);
+	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	repSlotStat.spill_txns = rb->spillTxns;
+	repSlotStat.spill_count = rb->spillCount;
+	repSlotStat.spill_bytes = rb->spillBytes;
+	repSlotStat.stream_txns = rb->streamTxns;
+	repSlotStat.stream_count = rb->streamCount;
+	repSlotStat.stream_bytes = rb->streamBytes;
+
+	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 75a087c2f9..f61b163f78 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,7 +328,12 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0);
+	{
+		PgStat_ReplSlotStats repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+		pgstat_report_replslot(&repSlotStat);
+	}
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 182b15e3f2..521ba73614 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2328,7 +2328,7 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));
 
-		values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+		values[0] = CStringGetTextDatum(NameStr(s->slotname));
 		values[1] = Int64GetDatum(s->spill_txns);
 		values[2] = Int64GetDatum(s->spill_count);
 		values[3] = Int64GetDatum(s->spill_bytes);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9a87e7cd88..8e11215058 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -393,7 +393,7 @@ typedef struct PgStat_MsgResetslrucounter
 typedef struct PgStat_MsgResetreplslotcounter
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData      m_slotname;
 	bool		clearall;
 } PgStat_MsgResetreplslotcounter;
 
@@ -540,7 +540,7 @@ typedef struct PgStat_MsgSLRU
 typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData	m_slotname;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -917,7 +917,7 @@ typedef struct PgStat_SLRUStats
  */
 typedef struct PgStat_ReplSlotStats
 {
-	char		slotname[NAMEDATALEN];
+	NameData	slotname;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
@@ -1027,10 +1027,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-								   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-								   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-								   PgStat_Counter streambytes);
+extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
-- 
2.25.1

v8-0002-Use-HTAB-for-replication-slot-statistics.patchtext/x-patch; charset=US-ASCII; name=v8-0002-Use-HTAB-for-replication-slot-statistics.patchDownload
From 1b7771c1c6d53fd76b59dd3ff7687c96d1086598 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 12 Apr 2021 10:31:52 +0900
Subject: [PATCH v8 2/5] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for repilcation
slots. But this had two problems in case where drop-slot-stats message
is lost: 1) the stats for the new slot are not recorded and 2) writing
beyond the end of the array when after restring the number of slots
whose stats are stored in the stats file exceeds
max_replication_slots.

This commit changes to use HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() checks
if a slot for stats entry in the stats collector still exists or not.
Then send drop-slot-stats message.
---
 src/backend/catalog/system_views.sql      |  26 ++-
 src/backend/postmaster/pgstat.c           | 261 +++++++++++-----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  23 +-
 src/backend/utils/adt/pgstatfuncs.c       | 127 ++++++-----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |   8 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 9 files changed, 247 insertions(+), 220 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 451db2ee0a..1bceb25571 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,18 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -982,6 +970,20 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 666ce95d08..142567e654 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_ReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_ReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Check for all replication slots in stats hash table. We do this check
+	 * when replSlotStats has more than max_replication_slots entries, i.g,
+	 * when there are stats for the already-dropped slot, to avoid frequent
+	 * call SearchNamedReplicationSlot() which acquires LWLock.
+	 */
+	if (replSlotStats && hash_get_num_entries(replSlotStats) > max_replication_slots)
+	{
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -2870,17 +2864,19 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_ReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
+	PgStat_ReplSlotEntry *slotent = NULL;
+
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	slotent = pgstat_get_replslot_entry(slotname, false);
+
+	return slotent;
 }
 
 /*
@@ -3652,7 +3648,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3742,11 +3737,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_ReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3973,12 +3974,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4003,12 +3998,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4195,21 +4184,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+			{
+				PgStat_ReplSlotEntry slotstats;
+				PgStat_ReplSlotEntry *slotent;
+
+				if (fread(&slotstats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
-				nReplSlotStats++;
+
+				slotent = pgstat_get_replslot_entry(slotstats.slotname, true);
+				memcpy(slotent, &slotstats, sizeof(PgStat_ReplSlotEntry));
 				break;
+			}
 
 			case 'E':
 				goto done;
@@ -4422,7 +4417,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_ReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4551,12 +4546,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4763,7 +4758,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5187,20 +5181,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_ReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_ReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5208,11 +5208,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5530,44 +5530,27 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats, (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE, NULL);
 	}
 	else
 	{
+		PgStat_ReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
 		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		slotent->spill_txns += msg->m_spill_txns;
+		slotent->spill_count += msg->m_spill_count;
+		slotent->spill_bytes += msg->m_spill_bytes;
+		slotent->stream_txns += msg->m_stream_txns;
+		slotent->stream_count += msg->m_stream_count;
+		slotent->stream_bytes += msg->m_stream_bytes;
 	}
 }
 
@@ -5745,57 +5728,77 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_ReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_ReplSlotEntry *slotent;
+	bool	found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_ReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_ReplSlotEntry *) hash_search(replSlotStats,
+												   (void *) &name,
+												   create_it ? HASH_ENTER : HASH_FIND,
+												   &found);
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
+
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_ReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_ReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 68e210ce12..d66846faab 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_ReplSlotEntry repSlotStat;
 
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..f75e7e95f9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -329,8 +329,8 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	if (SlotIsLogical(slot))
 	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		PgStat_ReplSlotEntry repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotEntry));
 		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 		pgstat_report_replslot(&repSlotStat);
 	}
@@ -349,17 +349,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +370,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +417,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -712,7 +713,11 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * and create messages while holding ReplicationSlotAllocationLock to
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
-	 * create the statistics for the replication slot.
+	 * create the statistics for the replication slot. In case where the
+	 * message for dropping the old slot gets lost and a slot with the same is
+	 * created, the stats will be accumulated into the old slots since we
+	 * use the slot name as the key. In that case, user can reset the particular
+	 * stats by pg_stat_reset_replication_slot().
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 521ba73614..569abce27e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,71 +2305,61 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 8
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text *slotname_text = PG_GETARG_TEXT_P(0);
+	NameData slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_ReplSlotEntry *slotent;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
-
-	MemoryContextSwitchTo(oldcontext);
-
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
-	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
 
-		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
-		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+	if (!slotent)
+		PG_RETURN_NULL();
 
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
-	}
+	values[0] = CStringGetTextDatum(NameStr(slotent->slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
 
-	tuplestore_donestoring(tupstore);
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[7] = true;
+	else
+		values[7] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
-	return (Datum) 0;
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f4957653ae..6c9521603f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5311,14 +5311,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e11215058..c2d192fc55 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -915,7 +915,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_ReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -925,7 +925,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_ReplSlotEntry;
 
 
 /*
@@ -1027,7 +1027,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1125,7 +1125,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_ReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 186e6c966c..eb741b9ff1 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2069,7 +2069,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_count,
     s.stream_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v8-0003-Added-total-txns-and-total-txn-bytes-to-replicati.patchtext/x-patch; charset=US-ASCII; name=v8-0003-Added-total-txns-and-total-txn-bytes-to-replicati.patchDownload
From 4526498234141f19fd54bed7e4aa07ff783a139d Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:57:05 +0530
Subject: [PATCH v8 3/5] Added total txns and total txn bytes to replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
---
 contrib/test_decoding/expected/stats.out      | 79 +++++++++++++------
 contrib/test_decoding/sql/stats.sql           | 48 +++++++----
 doc/src/sgml/monitoring.sgml                  | 25 ++++++
 src/backend/catalog/system_views.sql          |  2 +
 src/backend/postmaster/pgstat.c               |  6 ++
 src/backend/replication/logical/logical.c     | 16 ++--
 .../replication/logical/reorderbuffer.c       | 12 +++
 src/backend/utils/adt/pgstatfuncs.c           | 38 +++++----
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/pgstat.h                          |  4 +
 src/include/replication/reorderbuffer.h       |  4 +
 src/test/regress/expected/rules.out           |  4 +-
 12 files changed, 182 insertions(+), 62 deletions(-)

diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bca36fa903..bc8e601eab 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 
 CREATE TABLE stats_test(data text);
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -16,12 +16,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -51,16 +64,16 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -70,16 +83,16 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
-SELECT wait_for_decode_stats(true);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot |          0 |           0
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
@@ -89,16 +102,36 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
   5002
 (1 row)
 
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
+(1 row)
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+ pg_stat_reset_replication_slot 
+--------------------------------
+ 
+(1 row)
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | f          | f           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
@@ -117,7 +150,7 @@ SELECT slot_name FROM pg_stat_replication_slots;
 (1 row)
 
 COMMIT;
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 51294e48e8..8c34aeced1 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -6,7 +6,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 CREATE TABLE stats_test(data text);
 
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -14,12 +14,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -46,18 +59,25 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
-SELECT wait_for_decode_stats(true);
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(true, true);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
@@ -66,6 +86,6 @@ SELECT slot_name FROM pg_stat_replication_slots;
 SELECT slot_name FROM pg_stat_replication_slots;
 COMMIT;
 
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8287587f61..25024dfc7c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2716,6 +2716,31 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding. Note that
+        this includes the data streamed and or spilled. 
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1bceb25571..0636a5d5f1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -973,6 +973,8 @@ CREATE VIEW pg_replication_slots AS
 CREATE VIEW pg_stat_replication_slots AS
     SELECT
             s.slot_name,
+	    s.total_txns,
+            s.total_bytes,
             s.spill_txns,
             s.spill_count,
             s.spill_bytes,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 142567e654..0b3b8ae90b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1823,6 +1823,8 @@ pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat)
 	msg.m_stream_txns = repSlotStat->stream_txns;
 	msg.m_stream_count = repSlotStat->stream_count;
 	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -5545,6 +5547,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		Assert(slotent);
 
 		/* Update the replication slot statistics */
+		slotent->total_txns += msg->m_total_txns;
+		slotent->total_bytes += msg->m_total_bytes;
 		slotent->spill_txns += msg->m_spill_txns;
 		slotent->spill_count += msg->m_spill_count;
 		slotent->spill_bytes += msg->m_spill_bytes;
@@ -5792,6 +5796,8 @@ static void
 pgstat_reset_replslot(PgStat_ReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
 	slotent->spill_txns = 0;
 	slotent->spill_count = 0;
 	slotent->spill_bytes = 0;
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index d66846faab..73ef806530 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1776,20 +1776,22 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	PgStat_ReplSlotEntry repSlotStat;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 &&
+		rb->totalBytes <= 0 && rb->totalTxns <=0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
 
 	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
 	repSlotStat.spill_txns = rb->spillTxns;
@@ -1798,6 +1800,8 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
@@ -1806,4 +1810,6 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..bc251adfda 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -2359,6 +2361,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			specinsert = NULL;
 		}
 
+		/*
+		 * Update total transaction count and total transaction bytes
+		 * processed. Ensure to not count the streamed transaction multiple
+		 * times.
+		 */
+		if (!rbtxn_is_streamed(txn))
+			rb->totalTxns++;
+
+		rb->totalBytes += rb->size;
+
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
 		iterstate = NULL;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 569abce27e..be069cf54c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2309,7 +2309,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	text *slotname_text = PG_GETARG_TEXT_P(0);
 	NameData slotname;
 	TupleDesc	tupdesc;
@@ -2325,19 +2325,23 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
 					   TEXTOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "total_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "total_bytes",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "spill_count",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "spill_bytes",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 	BlessTupleDesc(tupdesc);
 
@@ -2348,17 +2352,19 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 
 	values[0] = CStringGetTextDatum(NameStr(slotent->slotname));
-	values[1] = Int64GetDatum(slotent->spill_txns);
-	values[2] = Int64GetDatum(slotent->spill_count);
-	values[3] = Int64GetDatum(slotent->spill_bytes);
-	values[4] = Int64GetDatum(slotent->stream_txns);
-	values[5] = Int64GetDatum(slotent->stream_count);
-	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[1] = Int64GetDatum(slotent->total_txns);
+	values[2] = Int64GetDatum(slotent->total_bytes);
+	values[3] = Int64GetDatum(slotent->spill_txns);
+	values[4] = Int64GetDatum(slotent->spill_count);
+	values[5] = Int64GetDatum(slotent->spill_bytes);
+	values[6] = Int64GetDatum(slotent->stream_txns);
+	values[7] = Int64GetDatum(slotent->stream_count);
+	values[8] = Int64GetDatum(slotent->stream_bytes);
 
 	if (slotent->stat_reset_timestamp == 0)
-		nulls[7] = true;
+		nulls[9] = true;
 	else
-		values[7] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6c9521603f..b9156b873e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5315,9 +5315,9 @@
   proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'text',
-  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,total_txns,total_bytes,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c2d192fc55..e9e16e5e2d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -548,6 +548,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 /* ----------
@@ -924,6 +926,8 @@ typedef struct PgStat_ReplSlotEntry
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotEntry;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index eb741b9ff1..4380577b7d 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2062,6 +2062,8 @@ pg_stat_replication| SELECT s.pid,
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
+    s.total_txns,
+    s.total_bytes,
     s.spill_txns,
     s.spill_count,
     s.spill_bytes,
@@ -2070,7 +2072,7 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_bytes,
     s.stats_reset
    FROM pg_replication_slots r,
-    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, total_txns, total_bytes, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
   WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
-- 
2.25.1

v8-0004-Added-tests-for-verification-of-logical-replicati.patchtext/x-patch; charset=US-ASCII; name=v8-0004-Added-tests-for-verification-of-logical-replicati.patchDownload
From 49f7f534e3cc474cbc1c420fdba28972ec785aca Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 18:17:21 +0530
Subject: [PATCH v8 4/5] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 contrib/test_decoding/Makefile            |  2 +
 contrib/test_decoding/t/001_repl_stats.pl | 84 +++++++++++++++++++++++
 2 files changed, 86 insertions(+)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce5cc..9a31e0b879 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000000..70cb4cf055
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,84 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+
+# Create replication slots.
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot3', 'test_decoding')");
+$node->safe_psql('postgres',
+	"SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot4', 'test_decoding')");
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+	"SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1')");
+
+# Wait for the statistics to be updated.
+my $slot1_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot1' AND total_txns > 0 AND total_bytes > 0;";
+my $slot2_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot2' AND total_txns > 0 AND total_bytes > 0;";
+my $slot3_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot3' AND total_txns > 0 AND total_bytes > 0;";
+my $slot4_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot4' AND total_txns > 0 AND total_bytes > 0;";
+
+# Verify that the statistics have been updated.
+$node->poll_query_until('postgres', $slot1_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+$node->poll_query_until('postgres', $slot2_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+$node->poll_query_until('postgres', $slot3_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+$node->poll_query_until('postgres', $slot4_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+
+# Test to drop one of the replication slot and verify replication statistics data is
+# fine after restart.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+my $result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
-- 
2.25.1

v8-0005-Test-where-there-are-more-replication-slot-statis.patchtext/x-patch; charset=US-ASCII; name=v8-0005-Test-where-there-are-more-replication-slot-statis.patchDownload
From e049c8551791ff25bb9ea4a0c41e8a37d5956275 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 12 Apr 2021 16:10:11 +0530
Subject: [PATCH v8 5/5] Test where there are more replication slot statistics
 than max_replication_slot slots at startup.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, added a test for this.
---
 contrib/test_decoding/t/001_repl_stats.pl | 24 +++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 70cb4cf055..9f82da3818 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -2,9 +2,10 @@
 # drop replication slot and restart.
 use strict;
 use warnings;
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 2;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -74,11 +75,30 @@ is($result, qq(regression_slot1|t|t
 regression_slot2|t|t
 regression_slot3|t|t), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust max_replication_slots
+# accordingly to the number of slots and verify replication statistics data is
+# fine after restart.
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t), 'check replication statistics are updated');
+
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
-- 
2.25.1

#61Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#59)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

BTW doesn't it confuse users that stream/spill bytes are not in
addition to total_bytes? User will need to do "total_bytes +
spill/stream_bytes" to know the actual total amount of data sent to
the decoding output plugin, is that right?

No, total_bytes includes the spill/stream bytes. So, the user doesn't
need to do any calculation to compute totel_bytes sent to output
plugin.

--
With Regards,
Amit Kapila.

#62Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#61)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

BTW doesn't it confuse users that stream/spill bytes are not in
addition to total_bytes? User will need to do "total_bytes +
spill/stream_bytes" to know the actual total amount of data sent to
the decoding output plugin, is that right?

No, total_bytes includes the spill/stream bytes. So, the user doesn't
need to do any calculation to compute totel_bytes sent to output
plugin.

The following test for the latest v8 patch seems to show different.
total_bytes is 1808 whereas spill_bytes is 13200000. Am I missing
something?

postgres(1:85969)=# select pg_create_logical_replication_slot('s',
'test_decoding');
pg_create_logical_replication_slot
------------------------------------
(s,0/1884468)
(1 row)

postgres(1:85969)=# create table a (i int);
CREATE TABLE
postgres(1:85969)=# insert into a select generate_series(1, 100000);
INSERT 0 100000
postgres(1:85969)=# set logical_decoding_work_mem to 64;
SET
postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
(1 row)

postgres(1:85969)=# select count(*) from
pg_logical_slot_peek_changes('s', NULL, NULL);
count
--------
100004
(1 row)

postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 2 | 1808 | 1 | 202 |
13200000 | 0 | 0 | 0 |
(1 row)

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#63vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#62)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

BTW doesn't it confuse users that stream/spill bytes are not in
addition to total_bytes? User will need to do "total_bytes +
spill/stream_bytes" to know the actual total amount of data sent to
the decoding output plugin, is that right?

No, total_bytes includes the spill/stream bytes. So, the user doesn't
need to do any calculation to compute totel_bytes sent to output
plugin.

The following test for the latest v8 patch seems to show different.
total_bytes is 1808 whereas spill_bytes is 13200000. Am I missing
something?

I will check this issue and post my analysis.

Regards,
Vignesh

#64Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#60)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 9:16 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Apr 12, 2021 at 4:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 10, 2021 at 6:51 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks, 0001 and 0002 look good to me. I have a minor comment for 0002.

<entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding.

Can we slightly extend it to say something like: Note that this
includes the bytes streamed and or spilled. Similarly, we can extend
it for total_txns.

Thanks for the comments, the comments are fixed in the v8 patch attached.
Thoughts?

Here are review comments on new TAP tests:

+# Create replication slots.
+$node->safe_psql('postgres',
+       "SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot1',
'test_decoding')");
+$node->safe_psql('postgres',
+       "SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot2',
'test_decoding')");
+$node->safe_psql('postgres',
+       "SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot3',
'test_decoding')");
+$node->safe_psql('postgres',
+       "SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot4',
'test_decoding')");

and

+
+$node->safe_psql('postgres',
+       "SELECT data FROM
pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+       "SELECT data FROM
pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+       "SELECT data FROM
pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");

I think we can do those similar queries in a single psql connection
like follows:

# Create replication slots.
$node->safe_psql('postgres',
qq[
SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');
SELECT pg_create_logical_replication_slot('regression_slot2', 'test_decoding');
SELECT pg_create_logical_replication_slot('regression_slot3', 'test_decoding');
SELECT pg_create_logical_replication_slot('regression_slot4', 'test_decoding');
]);

and

$node->safe_psql('postgres',
qq[
SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
]);

---
+# Wait for the statistics to be updated.
+my $slot1_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
= 'regression_slot1' AND total_txns > 0 AND total_bytes > 0;";
+my $slot2_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
= 'regression_slot2' AND total_txns > 0 AND total_bytes > 0;";
+my $slot3_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
= 'regression_slot3' AND total_txns > 0 AND total_bytes > 0;";
+my $slot4_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
= 'regression_slot4' AND total_txns > 0 AND total_bytes > 0;";
+
+# Verify that the statistics have been updated.
+$node->poll_query_until('postgres', $slot1_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+$node->poll_query_until('postgres', $slot2_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+$node->poll_query_until('postgres', $slot3_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+$node->poll_query_until('postgres', $slot4_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";

We can simplify the above code to something like:

$node->poll_query_until(
'postgres', qq[
SELECT count(slot_name) >= 4
FROM pg_stat_replication_slots
WHERE slot_name ~ 'regression_slot'
AND total_txns > 0
AND total_bytes > 0;
]) or die "Timed out while waiting for statistics to be updated";

---
+# Test to remove one of the replication slots and adjust max_replication_slots
+# accordingly to the number of slots and verify replication statistics data is
+# fine after restart.

I think it's better if we explain in detail what cases we're trying to
test. How about the following description?

Test to remove one of the replication slots and adjust
max_replication_slots accordingly to the number of slots. This leads
to a mismatch of the number of slots between in the stats file and on
shared memory, simulating the message for dropping a slot got lost. We
verify replication statistics data is fine after restart.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#65vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#62)
5 attachment(s)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

BTW doesn't it confuse users that stream/spill bytes are not in
addition to total_bytes? User will need to do "total_bytes +
spill/stream_bytes" to know the actual total amount of data sent to
the decoding output plugin, is that right?

No, total_bytes includes the spill/stream bytes. So, the user doesn't
need to do any calculation to compute totel_bytes sent to output
plugin.

The following test for the latest v8 patch seems to show different.
total_bytes is 1808 whereas spill_bytes is 13200000. Am I missing
something?

postgres(1:85969)=# select pg_create_logical_replication_slot('s',
'test_decoding');
pg_create_logical_replication_slot
------------------------------------
(s,0/1884468)
(1 row)

postgres(1:85969)=# create table a (i int);
CREATE TABLE
postgres(1:85969)=# insert into a select generate_series(1, 100000);
INSERT 0 100000
postgres(1:85969)=# set logical_decoding_work_mem to 64;
SET
postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
(1 row)

postgres(1:85969)=# select count(*) from
pg_logical_slot_peek_changes('s', NULL, NULL);
count
--------
100004
(1 row)

postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 2 | 1808 | 1 | 202 |
13200000 | 0 | 0 | 0 |
(1 row)

Thanks for identifying this issue, while spilling the transactions
reorder buffer changes gets released, we will not be able to get the
total size for spilled transactions from reorderbuffer size. I have
fixed it by including spilledbytes to totalbytes in case of spilled
transactions. Attached patch has the fix for this.
Thoughts?

Regards,
Vignesh

Attachments:

v9-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchtext/x-patch; charset=US-ASCII; name=v9-0001-Changed-char-datatype-to-NameData-datatype-for-sl.patchDownload
From f22ecb366a461a5203ec07dde1450f2c269ee675 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:14:52 +0530
Subject: [PATCH v9 1/5] Changed char datatype to NameData datatype for
 slotname.

Changed char datatype to NameData datatype for slotname.
---
 src/backend/postmaster/pgstat.c           | 32 +++++++++++------------
 src/backend/replication/logical/logical.c | 13 ++++++---
 src/backend/replication/slot.c            |  7 ++++-
 src/backend/utils/adt/pgstatfuncs.c       |  2 +-
 src/include/pgstat.h                      | 11 +++-----
 5 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f4467625f7..666ce95d08 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,6 +64,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
+#include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -1539,7 +1540,7 @@ pgstat_reset_replslot_counter(const char *name)
 		if (SlotIsPhysical(slot))
 			return;
 
-		strlcpy(msg.m_slotname, name, NAMEDATALEN);
+		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
 	else
@@ -1812,10 +1813,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-					   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-					   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-					   PgStat_Counter streambytes)
+pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1823,14 +1821,14 @@ pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
 	 * Prepare and send the message
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
 	msg.m_drop = false;
-	msg.m_spill_txns = spilltxns;
-	msg.m_spill_count = spillcount;
-	msg.m_spill_bytes = spillbytes;
-	msg.m_stream_txns = streamtxns;
-	msg.m_stream_count = streamcount;
-	msg.m_stream_bytes = streambytes;
+	msg.m_spill_txns = repSlotStat->spill_txns;
+	msg.m_spill_count = repSlotStat->spill_count;
+	msg.m_spill_bytes = repSlotStat->spill_bytes;
+	msg.m_stream_txns = repSlotStat->stream_txns;
+	msg.m_stream_count = repSlotStat->stream_count;
+	msg.m_stream_bytes = repSlotStat->stream_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -1846,7 +1844,7 @@ pgstat_report_replslot_drop(const char *slotname)
 	PgStat_MsgReplSlot msg;
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
-	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
+	namestrcpy(&msg.m_slotname, slotname);
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -5202,7 +5200,7 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(msg->m_slotname, false);
+		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5538,7 +5536,7 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 	 * Get the index of replication slot statistics.  On dropping, we don't
 	 * create the new statistics.
 	 */
-	idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);
+	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
 
 	/*
 	 * The slot entry is not found or there is no space to accommodate the new
@@ -5763,7 +5761,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 	Assert(nReplSlotStats <= max_replication_slots);
 	for (i = 0; i < nReplSlotStats; i++)
 	{
-		if (strcmp(replSlotStats[i].slotname, name) == 0)
+		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
 			return i;			/* found */
 	}
 
@@ -5776,7 +5774,7 @@ pgstat_replslot_index(const char *name, bool create_it)
 
 	/* Register new slot */
 	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	strlcpy(replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
 
 	return nReplSlotStats++;
 }
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4f6e87f18d..68e210ce12 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,6 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
+	PgStat_ReplSlotStats repSlotStat;
 
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
@@ -1790,9 +1791,15 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 		 (long long) rb->streamCount,
 		 (long long) rb->streamBytes);
 
-	pgstat_report_replslot(NameStr(ctx->slot->data.name),
-						   rb->spillTxns, rb->spillCount, rb->spillBytes,
-						   rb->streamTxns, rb->streamCount, rb->streamBytes);
+	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	repSlotStat.spill_txns = rb->spillTxns;
+	repSlotStat.spill_count = rb->spillCount;
+	repSlotStat.spill_bytes = rb->spillBytes;
+	repSlotStat.stream_txns = rb->streamTxns;
+	repSlotStat.stream_count = rb->streamCount;
+	repSlotStat.stream_bytes = rb->streamBytes;
+
+	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 75a087c2f9..f61b163f78 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,7 +328,12 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-		pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0, 0, 0, 0);
+	{
+		PgStat_ReplSlotStats repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+		pgstat_report_replslot(&repSlotStat);
+	}
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 182b15e3f2..521ba73614 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2328,7 +2328,7 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		MemSet(values, 0, sizeof(values));
 		MemSet(nulls, 0, sizeof(nulls));
 
-		values[0] = PointerGetDatum(cstring_to_text(s->slotname));
+		values[0] = CStringGetTextDatum(NameStr(s->slotname));
 		values[1] = Int64GetDatum(s->spill_txns);
 		values[2] = Int64GetDatum(s->spill_count);
 		values[3] = Int64GetDatum(s->spill_bytes);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9a87e7cd88..8e11215058 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -393,7 +393,7 @@ typedef struct PgStat_MsgResetslrucounter
 typedef struct PgStat_MsgResetreplslotcounter
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData      m_slotname;
 	bool		clearall;
 } PgStat_MsgResetreplslotcounter;
 
@@ -540,7 +540,7 @@ typedef struct PgStat_MsgSLRU
 typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
-	char		m_slotname[NAMEDATALEN];
+	NameData	m_slotname;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -917,7 +917,7 @@ typedef struct PgStat_SLRUStats
  */
 typedef struct PgStat_ReplSlotStats
 {
-	char		slotname[NAMEDATALEN];
+	NameData	slotname;
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
@@ -1027,10 +1027,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const char *slotname, PgStat_Counter spilltxns,
-								   PgStat_Counter spillcount, PgStat_Counter spillbytes,
-								   PgStat_Counter streamtxns, PgStat_Counter streamcount,
-								   PgStat_Counter streambytes);
+extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
-- 
2.25.1

v9-0002-Use-HTAB-for-replication-slot-statistics.patchtext/x-patch; charset=US-ASCII; name=v9-0002-Use-HTAB-for-replication-slot-statistics.patchDownload
From 44fd5024098cb0dd42655cb3cd9af46b41357791 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 12 Apr 2021 10:31:52 +0900
Subject: [PATCH v9 2/5] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for repilcation
slots. But this had two problems in case where drop-slot-stats message
is lost: 1) the stats for the new slot are not recorded and 2) writing
beyond the end of the array when after restring the number of slots
whose stats are stored in the stats file exceeds
max_replication_slots.

This commit changes to use HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() checks
if a slot for stats entry in the stats collector still exists or not.
Then send drop-slot-stats message.
---
 src/backend/catalog/system_views.sql      |  26 ++-
 src/backend/postmaster/pgstat.c           | 261 +++++++++++-----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  23 +-
 src/backend/utils/adt/pgstatfuncs.c       | 127 ++++++-----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |   8 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 9 files changed, 247 insertions(+), 220 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 451db2ee0a..1bceb25571 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,18 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -982,6 +970,20 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 666ce95d08..142567e654 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_ReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_ReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Check for all replication slots in stats hash table. We do this check
+	 * when replSlotStats has more than max_replication_slots entries, i.g,
+	 * when there are stats for the already-dropped slot, to avoid frequent
+	 * call SearchNamedReplicationSlot() which acquires LWLock.
+	 */
+	if (replSlotStats && hash_get_num_entries(replSlotStats) > max_replication_slots)
+	{
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -2870,17 +2864,19 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_ReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
+	PgStat_ReplSlotEntry *slotent = NULL;
+
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	slotent = pgstat_get_replslot_entry(slotname, false);
+
+	return slotent;
 }
 
 /*
@@ -3652,7 +3648,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3742,11 +3737,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_ReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3973,12 +3974,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4003,12 +3998,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4195,21 +4184,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+			{
+				PgStat_ReplSlotEntry slotstats;
+				PgStat_ReplSlotEntry *slotent;
+
+				if (fread(&slotstats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
-				nReplSlotStats++;
+
+				slotent = pgstat_get_replslot_entry(slotstats.slotname, true);
+				memcpy(slotent, &slotstats, sizeof(PgStat_ReplSlotEntry));
 				break;
+			}
 
 			case 'E':
 				goto done;
@@ -4422,7 +4417,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_ReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4551,12 +4546,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4763,7 +4758,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5187,20 +5181,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_ReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_ReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5208,11 +5208,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5530,44 +5530,27 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats, (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE, NULL);
 	}
 	else
 	{
+		PgStat_ReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
 		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		slotent->spill_txns += msg->m_spill_txns;
+		slotent->spill_count += msg->m_spill_count;
+		slotent->spill_bytes += msg->m_spill_bytes;
+		slotent->stream_txns += msg->m_stream_txns;
+		slotent->stream_count += msg->m_stream_count;
+		slotent->stream_bytes += msg->m_stream_bytes;
 	}
 }
 
@@ -5745,57 +5728,77 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_ReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_ReplSlotEntry *slotent;
+	bool	found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_ReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_ReplSlotEntry *) hash_search(replSlotStats,
+												   (void *) &name,
+												   create_it ? HASH_ENTER : HASH_FIND,
+												   &found);
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
+
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_ReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_ReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 68e210ce12..d66846faab 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_ReplSlotEntry repSlotStat;
 
 	/*
 	 * Nothing to do if we haven't spilled or streamed anything since the last
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..f75e7e95f9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -329,8 +329,8 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	if (SlotIsLogical(slot))
 	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		PgStat_ReplSlotEntry repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotEntry));
 		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 		pgstat_report_replslot(&repSlotStat);
 	}
@@ -349,17 +349,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +370,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +417,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -712,7 +713,11 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * and create messages while holding ReplicationSlotAllocationLock to
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
-	 * create the statistics for the replication slot.
+	 * create the statistics for the replication slot. In case where the
+	 * message for dropping the old slot gets lost and a slot with the same is
+	 * created, the stats will be accumulated into the old slots since we
+	 * use the slot name as the key. In that case, user can reset the particular
+	 * stats by pg_stat_reset_replication_slot().
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 521ba73614..569abce27e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,71 +2305,61 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 8
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text *slotname_text = PG_GETARG_TEXT_P(0);
+	NameData slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_ReplSlotEntry *slotent;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
-
-	MemoryContextSwitchTo(oldcontext);
-
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
-	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
 
-		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
-		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+	if (!slotent)
+		PG_RETURN_NULL();
 
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
-	}
+	values[0] = CStringGetTextDatum(NameStr(slotent->slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
 
-	tuplestore_donestoring(tupstore);
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[7] = true;
+	else
+		values[7] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
-	return (Datum) 0;
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f4957653ae..6c9521603f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5311,14 +5311,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e11215058..c2d192fc55 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -915,7 +915,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_ReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -925,7 +925,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_ReplSlotEntry;
 
 
 /*
@@ -1027,7 +1027,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1125,7 +1125,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_ReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 186e6c966c..eb741b9ff1 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2069,7 +2069,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_count,
     s.stream_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v9-0003-Added-total-txns-and-total-txn-bytes-to-replicati.patchtext/x-patch; charset=US-ASCII; name=v9-0003-Added-total-txns-and-total-txn-bytes-to-replicati.patchDownload
From a6edc3a34a5369c8e204a55237887a7c477adc5b Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Sat, 10 Apr 2021 08:57:05 +0530
Subject: [PATCH v9 3/5] Added total txns and total txn bytes to replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
---
 contrib/test_decoding/expected/stats.out      | 79 +++++++++++++------
 contrib/test_decoding/sql/stats.sql           | 48 +++++++----
 doc/src/sgml/monitoring.sgml                  | 25 ++++++
 src/backend/catalog/system_views.sql          |  2 +
 src/backend/postmaster/pgstat.c               |  6 ++
 src/backend/replication/logical/logical.c     | 16 ++--
 .../replication/logical/reorderbuffer.c       | 12 +++
 src/backend/utils/adt/pgstatfuncs.c           | 38 +++++----
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/pgstat.h                          |  4 +
 src/include/replication/reorderbuffer.h       |  4 +
 src/test/regress/expected/rules.out           |  4 +-
 12 files changed, 182 insertions(+), 62 deletions(-)

diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bca36fa903..bc8e601eab 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 
 CREATE TABLE stats_test(data text);
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -16,12 +16,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -51,16 +64,16 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -70,16 +83,16 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
-SELECT wait_for_decode_stats(true);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot |          0 |           0
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
@@ -89,16 +102,36 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
   5002
 (1 row)
 
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
+(1 row)
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+ pg_stat_reset_replication_slot 
+--------------------------------
+ 
+(1 row)
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | f          | f           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
@@ -117,7 +150,7 @@ SELECT slot_name FROM pg_stat_replication_slots;
 (1 row)
 
 COMMIT;
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 51294e48e8..8c34aeced1 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -6,7 +6,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 CREATE TABLE stats_test(data text);
 
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -14,12 +14,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -46,18 +59,25 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
-SELECT wait_for_decode_stats(true);
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(true, true);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
@@ -66,6 +86,6 @@ SELECT slot_name FROM pg_stat_replication_slots;
 SELECT slot_name FROM pg_stat_replication_slots;
 COMMIT;
 
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8287587f61..25024dfc7c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2716,6 +2716,31 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding. Note that
+        this includes the data streamed and or spilled. 
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1bceb25571..0636a5d5f1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -973,6 +973,8 @@ CREATE VIEW pg_replication_slots AS
 CREATE VIEW pg_stat_replication_slots AS
     SELECT
             s.slot_name,
+	    s.total_txns,
+            s.total_bytes,
             s.spill_txns,
             s.spill_count,
             s.spill_bytes,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 142567e654..0b3b8ae90b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1823,6 +1823,8 @@ pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat)
 	msg.m_stream_txns = repSlotStat->stream_txns;
 	msg.m_stream_count = repSlotStat->stream_count;
 	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -5545,6 +5547,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		Assert(slotent);
 
 		/* Update the replication slot statistics */
+		slotent->total_txns += msg->m_total_txns;
+		slotent->total_bytes += msg->m_total_bytes;
 		slotent->spill_txns += msg->m_spill_txns;
 		slotent->spill_count += msg->m_spill_count;
 		slotent->spill_bytes += msg->m_spill_bytes;
@@ -5792,6 +5796,8 @@ static void
 pgstat_reset_replslot(PgStat_ReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
 	slotent->spill_txns = 0;
 	slotent->spill_count = 0;
 	slotent->spill_bytes = 0;
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index d66846faab..73ef806530 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1776,20 +1776,22 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	PgStat_ReplSlotEntry repSlotStat;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 &&
+		rb->totalBytes <= 0 && rb->totalTxns <=0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
 
 	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
 	repSlotStat.spill_txns = rb->spillTxns;
@@ -1798,6 +1800,8 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
@@ -1806,4 +1810,6 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..daaf424252 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -2359,6 +2361,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			specinsert = NULL;
 		}
 
+		/*
+		 * Update total transaction count and total transaction bytes
+		 * processed. Ensure to not count the streamed transaction multiple
+		 * times.
+		 */
+		if (!rbtxn_is_streamed(txn))
+			rb->totalTxns++;
+
+		rb->totalBytes += (rbtxn_is_serialized(txn)) ? rb->spillBytes : rb->size;
+
 		/* clean up the iterator */
 		ReorderBufferIterTXNFinish(rb, iterstate);
 		iterstate = NULL;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 569abce27e..be069cf54c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2309,7 +2309,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	text *slotname_text = PG_GETARG_TEXT_P(0);
 	NameData slotname;
 	TupleDesc	tupdesc;
@@ -2325,19 +2325,23 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
 					   TEXTOID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "total_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "total_bytes",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "spill_count",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "spill_bytes",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 	BlessTupleDesc(tupdesc);
 
@@ -2348,17 +2352,19 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 		PG_RETURN_NULL();
 
 	values[0] = CStringGetTextDatum(NameStr(slotent->slotname));
-	values[1] = Int64GetDatum(slotent->spill_txns);
-	values[2] = Int64GetDatum(slotent->spill_count);
-	values[3] = Int64GetDatum(slotent->spill_bytes);
-	values[4] = Int64GetDatum(slotent->stream_txns);
-	values[5] = Int64GetDatum(slotent->stream_count);
-	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[1] = Int64GetDatum(slotent->total_txns);
+	values[2] = Int64GetDatum(slotent->total_bytes);
+	values[3] = Int64GetDatum(slotent->spill_txns);
+	values[4] = Int64GetDatum(slotent->spill_count);
+	values[5] = Int64GetDatum(slotent->spill_bytes);
+	values[6] = Int64GetDatum(slotent->stream_txns);
+	values[7] = Int64GetDatum(slotent->stream_count);
+	values[8] = Int64GetDatum(slotent->stream_bytes);
 
 	if (slotent->stat_reset_timestamp == 0)
-		nulls[7] = true;
+		nulls[9] = true;
 	else
-		values[7] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6c9521603f..b9156b873e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5315,9 +5315,9 @@
   proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'text',
-  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,total_txns,total_bytes,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c2d192fc55..e9e16e5e2d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -548,6 +548,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 /* ----------
@@ -924,6 +926,8 @@ typedef struct PgStat_ReplSlotEntry
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotEntry;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index eb741b9ff1..4380577b7d 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2062,6 +2062,8 @@ pg_stat_replication| SELECT s.pid,
      JOIN pg_stat_get_wal_senders() w(pid, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag, sync_priority, sync_state, reply_time) ON ((s.pid = w.pid)))
      LEFT JOIN pg_authid u ON ((s.usesysid = u.oid)));
 pg_stat_replication_slots| SELECT s.slot_name,
+    s.total_txns,
+    s.total_bytes,
     s.spill_txns,
     s.spill_count,
     s.spill_bytes,
@@ -2070,7 +2072,7 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_bytes,
     s.stats_reset
    FROM pg_replication_slots r,
-    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, total_txns, total_bytes, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset)
   WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
-- 
2.25.1

v9-0004-Added-tests-for-verification-of-logical-replicati.patchtext/x-patch; charset=US-ASCII; name=v9-0004-Added-tests-for-verification-of-logical-replicati.patchDownload
From 7dff1b29c0b2574add76222aad7eec23462c89d5 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 18:17:21 +0530
Subject: [PATCH v9 4/5] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 contrib/test_decoding/Makefile            |  2 +
 contrib/test_decoding/t/001_repl_stats.pl | 76 +++++++++++++++++++++++
 2 files changed, 78 insertions(+)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce5cc..9a31e0b879 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000000..11b6cd9b9c
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,76 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+
+# Create replication slots.
+$node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot2', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot3', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot4', 'test_decoding');
+]);
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+$node->safe_psql(
+	'postgres', qq[
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+]);
+
+# Wait for the statistics to be updated.
+$node->poll_query_until(
+	'postgres', qq[
+	SELECT count(slot_name) >= 4 FROM pg_stat_replication_slots
+	WHERE slot_name ~ 'regression_slot'
+	AND total_txns > 0 AND total_bytes > 0;
+]) or die "Timed out while waiting for statistics to be updated";
+
+# Test to drop one of the replication slot and verify replication statistics data is
+# fine after restart.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+my $result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn,
+	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
+	ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
-- 
2.25.1

v9-0005-Test-where-there-are-more-replication-slot-statis.patchtext/x-patch; charset=US-ASCII; name=v9-0005-Test-where-there-are-more-replication-slot-statis.patchDownload
From c48604aab51ac585081f1bc63bdc2fcc78105113 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 12 Apr 2021 16:10:11 +0530
Subject: [PATCH v9 5/5] Test where there are more replication slot statistics
 than max_replication_slot slots at startup.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, added a test for this.
---
 contrib/test_decoding/t/001_repl_stats.pl | 30 +++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 11b6cd9b9c..5ca4d3d7a7 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -2,9 +2,10 @@
 # drop replication slot and restart.
 use strict;
 use warnings;
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 2;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -66,11 +67,36 @@ is($result, qq(regression_slot1|t|t
 regression_slot2|t|t
 regression_slot3|t|t), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust
+# max_replication_slots accordingly to the number of slots. This leads
+# to a mismatch between the number of slots present in the stats file and the
+# number of stats present in the shared memory, simulating the scenario for
+# drop slot message lost by the statistics collector process. We verify
+# replication statistics data is fine after restart.
+
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn,
+	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
+	ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t), 'check replication statistics are updated');
+
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
-- 
2.25.1

#66vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#64)
Re: Replication slot stats misgivings

On Tue, Apr 13, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Mon, Apr 12, 2021 at 9:16 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Apr 12, 2021 at 4:46 PM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Sat, Apr 10, 2021 at 6:51 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks, 0001 and 0002 look good to me. I have a minor comment for

0002.

<entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding

output plugin

+ while decoding the changes from WAL for this slot. This can

be used to

+ gauge the total amount of data sent during logical decoding.

Can we slightly extend it to say something like: Note that this
includes the bytes streamed and or spilled. Similarly, we can extend
it for total_txns.

Thanks for the comments, the comments are fixed in the v8 patch

attached.

Thoughts?

Here are review comments on new TAP tests:

Thanks for the comments.

+# Create replication slots.
+$node->safe_psql('postgres',
+       "SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot1',
'test_decoding')");
+$node->safe_psql('postgres',
+       "SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot2',
'test_decoding')");
+$node->safe_psql('postgres',
+       "SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot3',
'test_decoding')");
+$node->safe_psql('postgres',
+       "SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot4',
'test_decoding')");

and

+
+$node->safe_psql('postgres',
+       "SELECT data FROM
pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+       "SELECT data FROM
pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+        "SELECT data FROM
pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");
+$node->safe_psql('postgres',
+       "SELECT data FROM
pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1')");

I think we can do those similar queries in a single psql connection
like follows:

# Create replication slots.
$node->safe_psql('postgres',
qq[
SELECT pg_create_logical_replication_slot('regression_slot1',

'test_decoding');

SELECT pg_create_logical_replication_slot('regression_slot2',

'test_decoding');

SELECT pg_create_logical_replication_slot('regression_slot3',

'test_decoding');

SELECT pg_create_logical_replication_slot('regression_slot4',

'test_decoding');

]);

and

$node->safe_psql('postgres',
qq[
SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
]);

Modified.

---
+# Wait for the statistics to be updated.
+my $slot1_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
= 'regression_slot1' AND total_txns > 0 AND total_bytes > 0;";
+my $slot2_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
= 'regression_slot2' AND total_txns > 0 AND total_bytes > 0;";
+my $slot3_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
= 'regression_slot3' AND total_txns > 0 AND total_bytes > 0;";
+my $slot4_stat_check_query =
+  "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
= 'regression_slot4' AND total_txns > 0 AND total_bytes > 0;";
+
+# Verify that the statistics have been updated.
+$node->poll_query_until('postgres', $slot1_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+$node->poll_query_until('postgres', $slot2_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+$node->poll_query_until('postgres', $slot3_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";
+$node->poll_query_until('postgres', $slot4_stat_check_query)
+  or die "Timed out while waiting for statistics to be updated";

We can simplify the above code to something like:

$node->poll_query_until(
'postgres', qq[
SELECT count(slot_name) >= 4
FROM pg_stat_replication_slots
WHERE slot_name ~ 'regression_slot'
AND total_txns > 0
AND total_bytes > 0;
]) or die "Timed out while waiting for statistics to be updated";

Modified.

---
+# Test to remove one of the replication slots and adjust

max_replication_slots

+# accordingly to the number of slots and verify replication statistics

data is

+# fine after restart.

I think it's better if we explain in detail what cases we're trying to
test. How about the following description?

Test to remove one of the replication slots and adjust
max_replication_slots accordingly to the number of slots. This leads
to a mismatch of the number of slots between in the stats file and on
shared memory, simulating the message for dropping a slot got lost. We
verify replication statistics data is fine after restart.

Slightly reworded and modified it.

These comments are fixed as part of the v9 patch posted at [1]/messages/by-id/CALDaNm3CtPUYkFjPhzX0AcuRiK2MzdCR+_w8ok1kCcykveuL2Q@mail.gmail.com.
[1]: /messages/by-id/CALDaNm3CtPUYkFjPhzX0AcuRiK2MzdCR+_w8ok1kCcykveuL2Q@mail.gmail.com
/messages/by-id/CALDaNm3CtPUYkFjPhzX0AcuRiK2MzdCR+_w8ok1kCcykveuL2Q@mail.gmail.com

Regards,
Vignesh

#67Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#65)
Re: Replication slot stats misgivings

On Tue, Apr 13, 2021 at 5:07 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

BTW doesn't it confuse users that stream/spill bytes are not in
addition to total_bytes? User will need to do "total_bytes +
spill/stream_bytes" to know the actual total amount of data sent to
the decoding output plugin, is that right?

No, total_bytes includes the spill/stream bytes. So, the user doesn't
need to do any calculation to compute totel_bytes sent to output
plugin.

The following test for the latest v8 patch seems to show different.
total_bytes is 1808 whereas spill_bytes is 13200000. Am I missing
something?

postgres(1:85969)=# select pg_create_logical_replication_slot('s',
'test_decoding');
pg_create_logical_replication_slot
------------------------------------
(s,0/1884468)
(1 row)

postgres(1:85969)=# create table a (i int);
CREATE TABLE
postgres(1:85969)=# insert into a select generate_series(1, 100000);
INSERT 0 100000
postgres(1:85969)=# set logical_decoding_work_mem to 64;
SET
postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
(1 row)

postgres(1:85969)=# select count(*) from
pg_logical_slot_peek_changes('s', NULL, NULL);
count
--------
100004
(1 row)

postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 2 | 1808 | 1 | 202 |
13200000 | 0 | 0 | 0 |
(1 row)

Thanks for identifying this issue, while spilling the transactions
reorder buffer changes gets released, we will not be able to get the
total size for spilled transactions from reorderbuffer size. I have
fixed it by including spilledbytes to totalbytes in case of spilled
transactions. Attached patch has the fix for this.
Thoughts?

I've not looked at the patches yet but as Amit mentioned before[1]/messages/by-id/CAA4eK1Kd4ag6Vc6jO+ntYmTMiR70x3t_+YQRMDP=9T5a2uzUHg@mail.gmail.com,
it's better to move 0002 patch to after 0004. That is, 0001 patch
changes data type to NameData, 0002 patch adds total_txn and
total_bytes, and 0003 patch adds regression tests. 0004 patch will be
the patch using HTAB (was 0002 patch) and get reviewed after pushing
0001, 0002, and 0003 patches. 0005 patch adds more regression tests
for the problem 0004 patch addresses.

Regards,

[1]: /messages/by-id/CAA4eK1Kd4ag6Vc6jO+ntYmTMiR70x3t_+YQRMDP=9T5a2uzUHg@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#68vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#67)
Re: Replication slot stats misgivings

On Wed, Apr 14, 2021 at 7:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 13, 2021 at 5:07 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It seems Vignesh has changed patches based on the latest set of
comments so you might want to rebase.

I've merged my patch into the v6 patch set Vignesh submitted.

I've attached the updated version of the patches. I didn't change
anything in the patch that changes char[NAMEDATALEN] to NameData (0001
patch) and patches that add tests.

I think we can push 0001. What do you think?

+1

In 0003 patch I reordered the
output parameters of pg_stat_replication_slots; showing total number
of transactions and total bytes followed by statistics for spilled and
streamed transactions seems appropriate to me.

I am not sure about this because I think we might want to add some
info of stream/spill bytes in total_bytes description (something like
stream/spill bytes are not in addition to total_bytes).

BTW doesn't it confuse users that stream/spill bytes are not in
addition to total_bytes? User will need to do "total_bytes +
spill/stream_bytes" to know the actual total amount of data sent to
the decoding output plugin, is that right?

No, total_bytes includes the spill/stream bytes. So, the user doesn't
need to do any calculation to compute totel_bytes sent to output
plugin.

The following test for the latest v8 patch seems to show different.
total_bytes is 1808 whereas spill_bytes is 13200000. Am I missing
something?

postgres(1:85969)=# select pg_create_logical_replication_slot('s',
'test_decoding');
pg_create_logical_replication_slot
------------------------------------
(s,0/1884468)
(1 row)

postgres(1:85969)=# create table a (i int);
CREATE TABLE
postgres(1:85969)=# insert into a select generate_series(1, 100000);
INSERT 0 100000
postgres(1:85969)=# set logical_decoding_work_mem to 64;
SET
postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
(1 row)

postgres(1:85969)=# select count(*) from
pg_logical_slot_peek_changes('s', NULL, NULL);
count
--------
100004
(1 row)

postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 2 | 1808 | 1 | 202 |
13200000 | 0 | 0 | 0 |
(1 row)

Thanks for identifying this issue, while spilling the transactions
reorder buffer changes gets released, we will not be able to get the
total size for spilled transactions from reorderbuffer size. I have
fixed it by including spilledbytes to totalbytes in case of spilled
transactions. Attached patch has the fix for this.
Thoughts?

I've not looked at the patches yet but as Amit mentioned before[1],
it's better to move 0002 patch to after 0004. That is, 0001 patch
changes data type to NameData, 0002 patch adds total_txn and
total_bytes, and 0003 patch adds regression tests. 0004 patch will be
the patch using HTAB (was 0002 patch) and get reviewed after pushing
0001, 0002, and 0003 patches. 0005 patch adds more regression tests
for the problem 0004 patch addresses.

I will make the change for this and post a patch for this.
Currently we have kept total_txns and total_bytes at the beginning of
pg_stat_replication_slots, I did not see any conclusion on this. I
preferred it to be at the beginning.
Thoughts?

Regards,
Vignesh

#69Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#68)
Re: Replication slot stats misgivings

On Wed, Apr 14, 2021 at 8:04 AM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Apr 14, 2021 at 7:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've not looked at the patches yet but as Amit mentioned before[1],
it's better to move 0002 patch to after 0004. That is, 0001 patch
changes data type to NameData, 0002 patch adds total_txn and
total_bytes, and 0003 patch adds regression tests. 0004 patch will be
the patch using HTAB (was 0002 patch) and get reviewed after pushing
0001, 0002, and 0003 patches. 0005 patch adds more regression tests
for the problem 0004 patch addresses.

I will make the change for this and post a patch for this.
Currently we have kept total_txns and total_bytes at the beginning of
pg_stat_replication_slots, I did not see any conclusion on this. I
preferred it to be at the beginning.
Thoughts?

I prefer those two fields after spill and stream fields. I have
mentioned the same in one of the emails above.

--
With Regards,
Amit Kapila.

#70Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#65)
1 attachment(s)
Re: Replication slot stats misgivings

On Tue, Apr 13, 2021 at 1:37 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The following test for the latest v8 patch seems to show different.
total_bytes is 1808 whereas spill_bytes is 13200000. Am I missing
something?

postgres(1:85969)=# select pg_create_logical_replication_slot('s',
'test_decoding');
pg_create_logical_replication_slot
------------------------------------
(s,0/1884468)
(1 row)

postgres(1:85969)=# create table a (i int);
CREATE TABLE
postgres(1:85969)=# insert into a select generate_series(1, 100000);
INSERT 0 100000
postgres(1:85969)=# set logical_decoding_work_mem to 64;
SET
postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
(1 row)

postgres(1:85969)=# select count(*) from
pg_logical_slot_peek_changes('s', NULL, NULL);
count
--------
100004
(1 row)

postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 2 | 1808 | 1 | 202 |
13200000 | 0 | 0 | 0 |
(1 row)

Thanks for identifying this issue, while spilling the transactions
reorder buffer changes gets released, we will not be able to get the
total size for spilled transactions from reorderbuffer size. I have
fixed it by including spilledbytes to totalbytes in case of spilled
transactions. Attached patch has the fix for this.
Thoughts?

I am not sure if that is the best way to fix it because sometimes we
clear the serialized flag in which case it might not give the correct
answer. Another way to fix it could be that before we try to restore a
new set of changes, we update totalBytes counter. See, the attached
patch atop your v6-0002-* patch.

--
With Regards,
Amit Kapila.

Attachments:

fix_spilled_stats_1.patchapplication/octet-stream; name=fix_spilled_stats_1.patchDownload
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index bc251adfda0..0f981898191 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1365,6 +1365,11 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		dlist_delete(&change->node);
 		dlist_push_tail(&state->old_change, &change->node);
 
+		/*
+		 * Update the total bytes processed before releasing the current set of
+		 * changes and restoring the new set of changes.
+		 */
+		rb->totalBytes += rb->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2361,6 +2366,10 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			specinsert = NULL;
 		}
 
+		/* clean up the iterator */
+		ReorderBufferIterTXNFinish(rb, iterstate);
+		iterstate = NULL;
+
 		/*
 		 * Update total transaction count and total transaction bytes
 		 * processed. Ensure to not count the streamed transaction multiple
@@ -2371,10 +2380,6 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 		rb->totalBytes += rb->size;
 
-		/* clean up the iterator */
-		ReorderBufferIterTXNFinish(rb, iterstate);
-		iterstate = NULL;
-
 		/*
 		 * Done with current changes, send the last message for this set of
 		 * changes depending upon streaming mode.
#71vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#70)
4 attachment(s)
Re: Replication slot stats misgivings

On Wed, Apr 14, 2021 at 12:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 13, 2021 at 1:37 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The following test for the latest v8 patch seems to show different.
total_bytes is 1808 whereas spill_bytes is 13200000. Am I missing
something?

postgres(1:85969)=# select pg_create_logical_replication_slot('s',
'test_decoding');
pg_create_logical_replication_slot
------------------------------------
(s,0/1884468)
(1 row)

postgres(1:85969)=# create table a (i int);
CREATE TABLE
postgres(1:85969)=# insert into a select generate_series(1, 100000);
INSERT 0 100000
postgres(1:85969)=# set logical_decoding_work_mem to 64;
SET
postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
(1 row)

postgres(1:85969)=# select count(*) from
pg_logical_slot_peek_changes('s', NULL, NULL);
count
--------
100004
(1 row)

postgres(1:85969)=# select * from pg_stat_replication_slots ;
slot_name | total_txns | total_bytes | spill_txns | spill_count |
spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset
-----------+------------+-------------+------------+-------------+-------------+-------------+--------------+--------------+-------------
s | 2 | 1808 | 1 | 202 |
13200000 | 0 | 0 | 0 |
(1 row)

Thanks for identifying this issue, while spilling the transactions
reorder buffer changes gets released, we will not be able to get the
total size for spilled transactions from reorderbuffer size. I have
fixed it by including spilledbytes to totalbytes in case of spilled
transactions. Attached patch has the fix for this.
Thoughts?

I am not sure if that is the best way to fix it because sometimes we
clear the serialized flag in which case it might not give the correct
answer. Another way to fix it could be that before we try to restore a
new set of changes, we update totalBytes counter. See, the attached
patch atop your v6-0002-* patch.

I felt calculating totalbytes this way is better than depending on
spill_bytes. I have taken your changes. Attached patch includes the
changes suggested.
Thoughts?

Regards,
Vignesh

Attachments:

v10-0001-Added-total-txns-and-total-txn-bytes-to-replicat.patchtext/x-patch; charset=US-ASCII; name=v10-0001-Added-total-txns-and-total-txn-bytes-to-replicat.patchDownload
From 9a4ca0fcef85d000856339cfcb2a58eb87ee5e72 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Wed, 14 Apr 2021 10:08:13 +0530
Subject: [PATCH v10 1/4] Added total txns and total txn bytes to replication
 statistics.

This adds the statistics about total transactions count and total transaction
data logically replicated to the decoding output plugin from ReorderBuffer.
Users can query the pg_stat_replication_slots view to check these stats.
---
 contrib/test_decoding/expected/stats.out      | 79 +++++++++++++------
 contrib/test_decoding/sql/stats.sql           | 48 +++++++----
 doc/src/sgml/monitoring.sgml                  | 25 ++++++
 src/backend/catalog/system_views.sql          |  2 +
 src/backend/postmaster/pgstat.c               |  6 ++
 src/backend/replication/logical/logical.c     | 16 ++--
 .../replication/logical/reorderbuffer.c       | 17 ++++
 src/backend/utils/adt/pgstatfuncs.c           |  8 +-
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/pgstat.h                          |  4 +
 src/include/replication/reorderbuffer.h       |  4 +
 src/test/regress/expected/rules.out           |  4 +-
 12 files changed, 170 insertions(+), 49 deletions(-)

diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bca36fa903..bc8e601eab 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 
 CREATE TABLE stats_test(data text);
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -16,12 +16,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -51,16 +64,16 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -70,16 +83,16 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
-SELECT wait_for_decode_stats(true);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot |          0 |           0
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
@@ -89,16 +102,36 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
   5002
 (1 row)
 
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
+(1 row)
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+ pg_stat_reset_replication_slot 
+--------------------------------
+ 
+(1 row)
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | f          | f           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
@@ -117,7 +150,7 @@ SELECT slot_name FROM pg_stat_replication_slots;
 (1 row)
 
 COMMIT;
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 51294e48e8..8c34aeced1 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -6,7 +6,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 CREATE TABLE stats_test(data text);
 
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -14,12 +14,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -46,18 +59,25 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
-SELECT wait_for_decode_stats(true);
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(true, true);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
@@ -66,6 +86,6 @@ SELECT slot_name FROM pg_stat_replication_slots;
 SELECT slot_name FROM pg_stat_replication_slots;
 COMMIT;
 
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8287587f61..25024dfc7c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2716,6 +2716,31 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding. Note that
+        this includes the data streamed and or spilled. 
+       </para>
+      </entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 451db2ee0a..6d78b33590 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -875,6 +875,8 @@ CREATE VIEW pg_stat_replication_slots AS
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
             s.stats_reset
     FROM pg_stat_get_replication_slots() AS s;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 666ce95d08..e1ec7d8b7d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1829,6 +1829,8 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	msg.m_stream_txns = repSlotStat->stream_txns;
 	msg.m_stream_count = repSlotStat->stream_count;
 	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -5568,6 +5570,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		replSlotStats[idx].stream_txns += msg->m_stream_txns;
 		replSlotStats[idx].stream_count += msg->m_stream_count;
 		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		replSlotStats[idx].total_txns += msg->m_total_txns;
+		replSlotStats[idx].total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5795,6 +5799,8 @@ pgstat_reset_replslot(int i, TimestampTz ts)
 	replSlotStats[i].stream_txns = 0;
 	replSlotStats[i].stream_count = 0;
 	replSlotStats[i].stream_bytes = 0;
+	replSlotStats[i].total_txns = 0;
+	replSlotStats[i].total_bytes = 0;
 	replSlotStats[i].stat_reset_timestamp = ts;
 }
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 68e210ce12..ed9a7c8489 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1776,20 +1776,22 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	PgStat_ReplSlotStats repSlotStat;
 
 	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
+	 * Nothing to do if we don't have any replication stats to be sent.
 	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 &&
+		rb->totalBytes <= 0 && rb->totalTxns <=0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
 
 	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
 	repSlotStat.spill_txns = rb->spillTxns;
@@ -1798,6 +1800,8 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
 	rb->spillTxns = 0;
@@ -1806,4 +1810,6 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d06285a2..0f98189819 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -1363,6 +1365,11 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		dlist_delete(&change->node);
 		dlist_push_tail(&state->old_change, &change->node);
 
+		/*
+		 * Update the total bytes processed before releasing the current set of
+		 * changes and restoring the new set of changes.
+		 */
+		rb->totalBytes += rb->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2363,6 +2370,16 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		ReorderBufferIterTXNFinish(rb, iterstate);
 		iterstate = NULL;
 
+		/*
+		 * Update total transaction count and total transaction bytes
+		 * processed. Ensure to not count the streamed transaction multiple
+		 * times.
+		 */
+		if (!rbtxn_is_streamed(txn))
+			rb->totalTxns++;
+
+		rb->totalBytes += rb->size;
+
 		/*
 		 * Done with current changes, send the last message for this set of
 		 * changes depending upon streaming mode.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 521ba73614..2680190a40 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2284,7 +2284,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -2335,11 +2335,13 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		values[4] = Int64GetDatum(s->stream_txns);
 		values[5] = Int64GetDatum(s->stream_count);
 		values[6] = Int64GetDatum(s->stream_bytes);
+		values[7] = Int64GetDatum(s->total_txns);
+		values[8] = Int64GetDatum(s->total_bytes);
 
 		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
+			nulls[9] = true;
 		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f4957653ae..591753fe81 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5315,9 +5315,9 @@
   proname => 'pg_stat_get_replication_slots', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slots' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e11215058..2aeb3cded4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -548,6 +548,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 /* ----------
@@ -924,6 +926,8 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961d6a..a372b70b7d 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,10 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/* Statistics about all the replicated transactions */
+	int64		totalTxns;		/* total number of transactions replicated */
+	int64		totalBytes;		/* total amount of data replicated */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 186e6c966c..6399f3feef 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2068,8 +2068,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
+    s.total_txns,
+    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v10-0002-Added-tests-for-verification-of-logical-replicat.patchtext/x-patch; charset=US-ASCII; name=v10-0002-Added-tests-for-verification-of-logical-replicat.patchDownload
From 8e78a79588a38137bc8ba15daf75a4b417b1c86b Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 5 Apr 2021 18:17:21 +0530
Subject: [PATCH v10 2/4] Added tests for verification of logical replication
 statistics.

Added tests for verification of logical replication statistics after
restart of server.
---
 contrib/test_decoding/Makefile            |  2 +
 contrib/test_decoding/t/001_repl_stats.pl | 76 +++++++++++++++++++++++
 src/backend/catalog/system_views.sql      |  5 +-
 3 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce5cc..9a31e0b879 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000000..11b6cd9b9c
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,76 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+
+# Create replication slots.
+$node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot2', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot3', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot4', 'test_decoding');
+]);
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+$node->safe_psql(
+	'postgres', qq[
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+]);
+
+# Wait for the statistics to be updated.
+$node->poll_query_until(
+	'postgres', qq[
+	SELECT count(slot_name) >= 4 FROM pg_stat_replication_slots
+	WHERE slot_name ~ 'regression_slot'
+	AND total_txns > 0 AND total_bytes > 0;
+]) or die "Timed out while waiting for statistics to be updated";
+
+# Test to drop one of the replication slot and verify replication statistics data is
+# fine after restart.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+my $result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn,
+	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
+	ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6d78b33590..9fdf445579 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -878,7 +878,10 @@ CREATE VIEW pg_stat_replication_slots AS
             s.total_txns,
             s.total_bytes,
             s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 
 CREATE VIEW pg_stat_slru AS
     SELECT
-- 
2.25.1

v10-0003-Use-HTAB-for-replication-slot-statistics.patchtext/x-patch; charset=US-ASCII; name=v10-0003-Use-HTAB-for-replication-slot-statistics.patchDownload
From e10381b3a7a5b3dd412b7df8fb6667f3076cf28a Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Wed, 14 Apr 2021 16:56:39 +0530
Subject: [PATCH v10 3/4] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for repilcation
slots. But this had two problems in case where drop-slot-stats message
is lost: 1) the stats for the new slot are not recorded and 2) writing
beyond the end of the array when after restring the number of slots
whose stats are stored in the stats file exceeds
max_replication_slots.

This commit changes to use HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() checks
if a slot for stats entry in the stats collector still exists or not.
Then send drop-slot-stats message.
---
 src/backend/catalog/system_views.sql      |   1 -
 src/backend/postmaster/pgstat.c           | 268 +++++++++++-----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  23 +-
 src/backend/utils/adt/pgstatfuncs.c       | 135 ++++++-----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |   8 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 9 files changed, 242 insertions(+), 215 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9fdf445579..c95c900a12 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -882,7 +882,6 @@ CREATE VIEW pg_stat_replication_slots AS
         LATERAL pg_stat_get_replication_slot(slot_name) as s
     WHERE r.datoid IS NOT NULL; -- excluding physical slots
 
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d..27937e07b9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_ReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_ReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Check for all replication slots in stats hash table. We do this check
+	 * when replSlotStats has more than max_replication_slots entries, i.g,
+	 * when there are stats for the already-dropped slot, to avoid frequent
+	 * call SearchNamedReplicationSlot() which acquires LWLock.
+	 */
+	if (replSlotStats && hash_get_num_entries(replSlotStats) > max_replication_slots)
+	{
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -2872,17 +2866,19 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_ReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
+	PgStat_ReplSlotEntry *slotent = NULL;
+
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	slotent = pgstat_get_replslot_entry(slotname, false);
+
+	return slotent;
 }
 
 /*
@@ -3654,7 +3650,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3744,11 +3739,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_ReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3975,12 +3976,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4005,12 +4000,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4197,21 +4186,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+			{
+				PgStat_ReplSlotEntry slotstats;
+				PgStat_ReplSlotEntry *slotent;
+
+				if (fread(&slotstats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
-				nReplSlotStats++;
+
+				slotent = pgstat_get_replslot_entry(slotstats.slotname, true);
+				memcpy(slotent, &slotstats, sizeof(PgStat_ReplSlotEntry));
 				break;
+			}
 
 			case 'E':
 				goto done;
@@ -4424,7 +4419,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_ReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4553,12 +4548,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4765,7 +4760,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5189,20 +5183,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_ReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_ReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5210,11 +5210,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5532,46 +5532,29 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats, (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE, NULL);
 	}
 	else
 	{
+		PgStat_ReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
 		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
-		replSlotStats[idx].total_txns += msg->m_total_txns;
-		replSlotStats[idx].total_bytes += msg->m_total_bytes;
+		slotent->total_txns += msg->m_total_txns;
+		slotent->total_bytes += msg->m_total_bytes;
+		slotent->spill_txns += msg->m_spill_txns;
+		slotent->spill_count += msg->m_spill_count;
+		slotent->spill_bytes += msg->m_spill_bytes;
+		slotent->stream_txns += msg->m_stream_txns;
+		slotent->stream_count += msg->m_stream_count;
+		slotent->stream_bytes += msg->m_stream_bytes;
 	}
 }
 
@@ -5749,59 +5732,78 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_get_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_ReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_ReplSlotEntry *slotent;
+	bool    found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL         hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_ReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_ReplSlotEntry *) hash_search(replSlotStats,
+												   (void *) &name,
+												   create_it ? HASH_ENTER : HASH_FIND,
+												   &found);
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_ReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_ReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].total_txns = 0;
-	replSlotStats[i].total_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index ed9a7c8489..73ef806530 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_ReplSlotEntry repSlotStat;
 
 	/*
 	 * Nothing to do if we don't have any replication stats to be sent.
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..f75e7e95f9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -329,8 +329,8 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	if (SlotIsLogical(slot))
 	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		PgStat_ReplSlotEntry repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotEntry));
 		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 		pgstat_report_replslot(&repSlotStat);
 	}
@@ -349,17 +349,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +370,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +417,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -712,7 +713,11 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * and create messages while holding ReplicationSlotAllocationLock to
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
-	 * create the statistics for the replication slot.
+	 * create the statistics for the replication slot. In case where the
+	 * message for dropping the old slot gets lost and a slot with the same is
+	 * created, the stats will be accumulated into the old slots since we
+	 * use the slot name as the key. In that case, user can reset the particular
+	 * stats by pg_stat_reset_replication_slot().
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2680190a40..b1899d8c68 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,73 +2305,67 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text *slotname_text = PG_GETARG_TEXT_P(0);
+	NameData slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
-
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_ReplSlotEntry *slotent;
 
-	MemoryContextSwitchTo(oldcontext);
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
-	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
 
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
-		values[7] = Int64GetDatum(s->total_txns);
-		values[8] = Int64GetDatum(s->total_bytes);
-
-		if (s->stat_reset_timestamp == 0)
-			nulls[9] = true;
-		else
-			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
+	if (!slotent)
+		PG_RETURN_NULL();
 
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
-	}
+	values[0] = CStringGetTextDatum(NameStr(slotent->slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[7] = Int64GetDatum(slotent->total_txns);
+	values[8] = Int64GetDatum(slotent->total_bytes);
 
-	tuplestore_donestoring(tupstore);
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[9] = true;
+	else
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
-	return (Datum) 0;
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 591753fe81..a37cb8a426 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5311,14 +5311,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2aeb3cded4..e9e16e5e2d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -917,7 +917,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_ReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -929,7 +929,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter total_txns;
 	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_ReplSlotEntry;
 
 
 /*
@@ -1031,7 +1031,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1129,7 +1129,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_ReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6399f3feef..ce2ecb914b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,7 +2071,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.total_txns,
     s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v10-0004-Test-where-there-are-more-replication-slot-stati.patchtext/x-patch; charset=US-ASCII; name=v10-0004-Test-where-there-are-more-replication-slot-stati.patchDownload
From 6bcc1b29c7adffaf2b613ab5cc7ba3d3d9fc85a4 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 12 Apr 2021 16:10:11 +0530
Subject: [PATCH v10 4/4] Test where there are more replication slot statistics
 than max_replication_slot slots at startup.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, added a test for this.
---
 contrib/test_decoding/t/001_repl_stats.pl | 30 +++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 11b6cd9b9c..5ca4d3d7a7 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -2,9 +2,10 @@
 # drop replication slot and restart.
 use strict;
 use warnings;
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 2;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -66,11 +67,36 @@ is($result, qq(regression_slot1|t|t
 regression_slot2|t|t
 regression_slot3|t|t), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust
+# max_replication_slots accordingly to the number of slots. This leads
+# to a mismatch between the number of slots present in the stats file and the
+# number of stats present in the shared memory, simulating the scenario for
+# drop slot message lost by the statistics collector process. We verify
+# replication statistics data is fine after restart.
+
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn,
+	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
+	ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t), 'check replication statistics are updated');
+
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
-- 
2.25.1

#72Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#71)
1 attachment(s)
Re: Replication slot stats misgivings

On Wed, Apr 14, 2021 at 5:52 PM vignesh C <vignesh21@gmail.com> wrote:

I have made minor changes to the 0001 and 0002 patches. Attached is
the combined patch for them, I think we can push them as one patch.
Changes made are (a) minor editing in comments, (b) changed the
condition when to report stats such that unless we have processed any
bytes, we shouldn't send those, (c) removed some unrelated changes
from 0002, (d) ran pgindent.

Let me know what you think of the attached?

--
With Regards,
Amit Kapila.

Attachments:

v11-0001-Add-information-of-total-data-processed-to-repli.patchapplication/octet-stream; name=v11-0001-Add-information-of-total-data-processed-to-repli.patchDownload
From 012f2cc6088d6ccb0efaefef048e5d81b8c477c6 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 15 Apr 2021 10:46:11 +0530
Subject: [PATCH v11] Add information of total data processed to replication
 slot stats.

This adds the statistics about total transactions count and total
transaction data logically sent to the decoding output plugin from
ReorderBuffer. Users can query the pg_stat_replication_slots view to check
these stats.

Suggested-by: Andres Freund
Author: Vignesh C and Amit Kapila
Reviewed-by: Sawada Masahiko, Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
---
 contrib/test_decoding/Makefile                  |  2 +
 contrib/test_decoding/expected/stats.out        | 79 ++++++++++++++++++-------
 contrib/test_decoding/sql/stats.sql             | 48 ++++++++++-----
 contrib/test_decoding/t/001_repl_stats.pl       | 76 ++++++++++++++++++++++++
 doc/src/sgml/monitoring.sgml                    | 25 ++++++++
 src/backend/catalog/system_views.sql            |  2 +
 src/backend/postmaster/pgstat.c                 |  6 ++
 src/backend/replication/logical/logical.c       | 18 +++---
 src/backend/replication/logical/reorderbuffer.c | 21 +++++++
 src/backend/utils/adt/pgstatfuncs.c             |  8 ++-
 src/include/catalog/pg_proc.dat                 |  6 +-
 src/include/pgstat.h                            |  4 ++
 src/include/replication/reorderbuffer.h         |  7 +++
 src/test/regress/expected/rules.out             |  4 +-
 14 files changed, 255 insertions(+), 51 deletions(-)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce..9a31e0b 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bca36fa..bc8e601 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 
 CREATE TABLE stats_test(data text);
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -16,12 +16,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -51,16 +64,16 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -70,16 +83,16 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
-SELECT wait_for_decode_stats(true);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot |          0 |           0
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
@@ -89,16 +102,36 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
   5002
 (1 row)
 
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
+(1 row)
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+ pg_stat_reset_replication_slot 
+--------------------------------
+ 
+(1 row)
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | f          | f           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
@@ -117,7 +150,7 @@ SELECT slot_name FROM pg_stat_replication_slots;
 (1 row)
 
 COMMIT;
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 51294e4..8c34aec 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -6,7 +6,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 CREATE TABLE stats_test(data text);
 
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -14,12 +14,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -46,18 +59,25 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
-SELECT wait_for_decode_stats(true);
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(true, true);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
@@ -66,6 +86,6 @@ SELECT slot_name FROM pg_stat_replication_slots;
 SELECT slot_name FROM pg_stat_replication_slots;
 COMMIT;
 
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000..11b6cd9
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,76 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+
+# Create replication slots.
+$node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot2', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot3', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot4', 'test_decoding');
+]);
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+$node->safe_psql(
+	'postgres', qq[
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+]);
+
+# Wait for the statistics to be updated.
+$node->poll_query_until(
+	'postgres', qq[
+	SELECT count(slot_name) >= 4 FROM pg_stat_replication_slots
+	WHERE slot_name ~ 'regression_slot'
+	AND total_txns > 0 AND total_bytes > 0;
+]) or die "Timed out while waiting for statistics to be updated";
+
+# Test to drop one of the replication slot and verify replication statistics data is
+# fine after restart.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+my $result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn,
+	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
+	ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8287587..25024df 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2718,6 +2718,31 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding. Note that
+        this includes the data streamed and or spilled. 
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
        </para>
        <para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 451db2e..6d78b33 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -875,6 +875,8 @@ CREATE VIEW pg_stat_replication_slots AS
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
             s.stats_reset
     FROM pg_stat_get_replication_slots() AS s;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 666ce95..e1ec7d8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1829,6 +1829,8 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	msg.m_stream_txns = repSlotStat->stream_txns;
 	msg.m_stream_count = repSlotStat->stream_count;
 	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -5568,6 +5570,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		replSlotStats[idx].stream_txns += msg->m_stream_txns;
 		replSlotStats[idx].stream_count += msg->m_stream_count;
 		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		replSlotStats[idx].total_txns += msg->m_total_txns;
+		replSlotStats[idx].total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5795,6 +5799,8 @@ pgstat_reset_replslot(int i, TimestampTz ts)
 	replSlotStats[i].stream_txns = 0;
 	replSlotStats[i].stream_count = 0;
 	replSlotStats[i].stream_bytes = 0;
+	replSlotStats[i].total_txns = 0;
+	replSlotStats[i].total_bytes = 0;
 	replSlotStats[i].stat_reset_timestamp = ts;
 }
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 68e210c..35b0c67 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1775,21 +1775,20 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	ReorderBuffer *rb = ctx->reorder;
 	PgStat_ReplSlotStats repSlotStat;
 
-	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
-	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	/* Nothing to do if we don't have any replication stats to be sent. */
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
 
 	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
 	repSlotStat.spill_txns = rb->spillTxns;
@@ -1798,12 +1797,17 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
+
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d0628..5cb484f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -1363,6 +1365,11 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		dlist_delete(&change->node);
 		dlist_push_tail(&state->old_change, &change->node);
 
+		/*
+		 * Update the total bytes processed before releasing the current set
+		 * of changes and restoring the new set of changes.
+		 */
+		rb->totalBytes += rb->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2364,6 +2371,20 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		iterstate = NULL;
 
 		/*
+		 * Update total transaction count and total transaction bytes
+		 * processed. Ensure to not count the streamed transaction multiple
+		 * times.
+		 *
+		 * Note that the statistics computation has to be done after
+		 * ReorderBufferIterTXNFinish as it releases the serialized change
+		 * which we have already accounted in ReorderBufferIterTXNNext.
+		 */
+		if (!rbtxn_is_streamed(txn))
+			rb->totalTxns++;
+
+		rb->totalBytes += rb->size;
+
+		/*
 		 * Done with current changes, send the last message for this set of
 		 * changes depending upon streaming mode.
 		 */
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 521ba73..2680190 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2284,7 +2284,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -2335,11 +2335,13 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		values[4] = Int64GetDatum(s->stream_txns);
 		values[5] = Int64GetDatum(s->stream_count);
 		values[6] = Int64GetDatum(s->stream_bytes);
+		values[7] = Int64GetDatum(s->total_txns);
+		values[8] = Int64GetDatum(s->total_bytes);
 
 		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
+			nulls[9] = true;
 		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f495765..591753f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5315,9 +5315,9 @@
   proname => 'pg_stat_get_replication_slots', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slots' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e11215..2aeb3cd 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -548,6 +548,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 /* ----------
@@ -924,6 +926,8 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961..bfab830 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,13 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/*
+	 * Statistics about all the transactions sent to the decoding output
+	 * plugin
+	 */
+	int64		totalTxns;		/* total number of transactions sent */
+	int64		totalBytes;		/* total amount of data sent */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 186e6c9..6399f3f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2068,8 +2068,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
+    s.total_txns,
+    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
1.8.3.1

#73vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#72)
Re: Replication slot stats misgivings

On Thu, Apr 15, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 5:52 PM vignesh C <vignesh21@gmail.com> wrote:

I have made minor changes to the 0001 and 0002 patches. Attached is
the combined patch for them, I think we can push them as one patch.
Changes made are (a) minor editing in comments, (b) changed the
condition when to report stats such that unless we have processed any
bytes, we shouldn't send those, (c) removed some unrelated changes
from 0002, (d) ran pgindent.

Let me know what you think of the attached?

Changes look fine to me, the patch applies neatly and make check-world passes.

Regards,
Vignesh

#74Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#73)
Re: Replication slot stats misgivings

On Thu, Apr 15, 2021 at 12:45 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, Apr 15, 2021 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 5:52 PM vignesh C <vignesh21@gmail.com> wrote:

I have made minor changes to the 0001 and 0002 patches. Attached is
the combined patch for them, I think we can push them as one patch.
Changes made are (a) minor editing in comments, (b) changed the
condition when to report stats such that unless we have processed any
bytes, we shouldn't send those, (c) removed some unrelated changes
from 0002, (d) ran pgindent.

Let me know what you think of the attached?

Changes look fine to me, the patch applies neatly and make check-world passes.

Thanks! Sawada-San, others, unless you have any suggestions, I am
planning to push
v11-0001-Add-information-of-total-data-processed-to-repli.patch
tomorrow.

--
With Regards,
Amit Kapila.

#75Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#72)
Re: Replication slot stats misgivings

On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 14, 2021 at 5:52 PM vignesh C <vignesh21@gmail.com> wrote:

I have made minor changes to the 0001 and 0002 patches. Attached is
the combined patch for them, I think we can push them as one patch.
Changes made are (a) minor editing in comments, (b) changed the
condition when to report stats such that unless we have processed any
bytes, we shouldn't send those, (c) removed some unrelated changes
from 0002, (d) ran pgindent.

Let me know what you think of the attached?

Thank you for updating the patch.

I have one question on the doc change:

+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>

The patch uses the sentence "streamed and or spilled" in two places.
You meant “streamed and spilled”? Even if it actually means “and or”,
using "and or” (i.g., connecting “and” to “or” by a space) is general?
I could not find we use it other places in the doc but found we're
using "and/or" instead.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#76Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#75)
1 attachment(s)
Re: Replication slot stats misgivings

On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thank you for updating the patch.

I have one question on the doc change:

+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>

The patch uses the sentence "streamed and or spilled" in two places.
You meant “streamed and spilled”? Even if it actually means “and or”,
using "and or” (i.g., connecting “and” to “or” by a space) is general?
I could not find we use it other places in the doc but found we're
using "and/or" instead.

I changed it to 'and/or' and made another minor change.

--
With Regards,
Amit Kapila.

Attachments:

v12-0001-Add-information-of-total-data-processed-to-repli.patchapplication/octet-stream; name=v12-0001-Add-information-of-total-data-processed-to-repli.patchDownload
From eb5f9586f566de513e60791e507b6cf968eee8aa Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 15 Apr 2021 10:46:11 +0530
Subject: [PATCH v12] Add information of total data processed to replication
 slot stats.

This adds the statistics about total transactions count and total
transaction data logically sent to the decoding output plugin from
ReorderBuffer. Users can query the pg_stat_replication_slots view to check
these stats.

Suggested-by: Andres Freund
Author: Vignesh C and Amit Kapila
Reviewed-by: Sawada Masahiko, Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
---
 contrib/test_decoding/Makefile                  |  2 +
 contrib/test_decoding/expected/stats.out        | 79 ++++++++++++++++++-------
 contrib/test_decoding/sql/stats.sql             | 48 ++++++++++-----
 contrib/test_decoding/t/001_repl_stats.pl       | 76 ++++++++++++++++++++++++
 doc/src/sgml/monitoring.sgml                    | 25 ++++++++
 src/backend/catalog/system_views.sql            |  2 +
 src/backend/postmaster/pgstat.c                 |  6 ++
 src/backend/replication/logical/logical.c       | 18 +++---
 src/backend/replication/logical/reorderbuffer.c | 21 +++++++
 src/backend/utils/adt/pgstatfuncs.c             |  8 ++-
 src/include/catalog/pg_proc.dat                 |  6 +-
 src/include/pgstat.h                            |  4 ++
 src/include/replication/reorderbuffer.h         |  7 +++
 src/test/regress/expected/rules.out             |  4 +-
 14 files changed, 255 insertions(+), 51 deletions(-)
 create mode 100644 contrib/test_decoding/t/001_repl_stats.pl

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c5e28ce..9a31e0b 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -17,6 +17,8 @@ ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 # typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
 
+TAP_TESTS = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bca36fa..bc8e601 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -8,7 +8,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 
 CREATE TABLE stats_test(data text);
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -16,12 +16,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -51,16 +64,16 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -70,16 +83,16 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
-SELECT wait_for_decode_stats(true);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot |          0 |           0
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
@@ -89,16 +102,36 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
   5002
 (1 row)
 
-SELECT wait_for_decode_stats(false);
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
+(1 row)
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+ pg_stat_reset_replication_slot 
+--------------------------------
+ 
+(1 row)
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count 
------------------+------------+-------------
- regression_slot | t          | t
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | f          | f           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
@@ -117,7 +150,7 @@ SELECT slot_name FROM pg_stat_replication_slots;
 (1 row)
 
 COMMIT;
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
  pg_drop_replication_slot 
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 51294e4..8c34aec 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -6,7 +6,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
 CREATE TABLE stats_test(data text);
 
 -- function to wait for counters to advance
-CREATE FUNCTION wait_for_decode_stats(check_reset bool) RETURNS void AS $$
+CREATE FUNCTION wait_for_decode_stats(check_reset bool, check_spill_txns bool) RETURNS void AS $$
 DECLARE
   start_time timestamptz := clock_timestamp();
   updated bool;
@@ -14,12 +14,25 @@ BEGIN
   -- we don't want to wait forever; loop will exit after 30 seconds
   FOR i IN 1 .. 300 LOOP
 
-    -- check to see if all updates have been reset/updated
-    SELECT CASE WHEN check_reset THEN (spill_txns = 0)
-                ELSE (spill_txns > 0)
-           END
-    INTO updated
-    FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+    IF check_spill_txns THEN
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (spill_txns = 0)
+                  ELSE (spill_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    ELSE
+
+      -- check to see if all updates have been reset/updated
+      SELECT CASE WHEN check_reset THEN (total_txns = 0)
+                  ELSE (total_txns > 0)
+             END
+      INTO updated
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+
+    END IF;
 
     exit WHEN updated;
 
@@ -46,18 +59,25 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
-SELECT wait_for_decode_stats(true);
-SELECT slot_name, spill_txns, spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(true, true);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+SELECT wait_for_decode_stats(false, true);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+SELECT pg_stat_reset_replication_slot('regression_slot');
+
+-- non-spilled xact
+INSERT INTO stats_test values(generate_series(1, 10));
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
@@ -66,6 +86,6 @@ SELECT slot_name FROM pg_stat_replication_slots;
 SELECT slot_name FROM pg_stat_replication_slots;
 COMMIT;
 
-DROP FUNCTION wait_for_decode_stats(bool);
+DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
new file mode 100644
index 0000000..11b6cd9
--- /dev/null
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -0,0 +1,76 @@
+# Test replication statistics data in pg_stat_replication_slots is sane after
+# drop replication slot and restart.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Test set-up
+my $node = get_new_node('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', 'synchronous_commit = on');
+$node->start;
+
+# Create table.
+$node->safe_psql('postgres',
+        "CREATE TABLE test_repl_stat(col1 int)");
+
+# Create replication slots.
+$node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot2', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot3', 'test_decoding');
+	SELECT pg_create_logical_replication_slot('regression_slot4', 'test_decoding');
+]);
+
+# Insert some data.
+$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+
+$node->safe_psql(
+	'postgres', qq[
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+	SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
+	NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+]);
+
+# Wait for the statistics to be updated.
+$node->poll_query_until(
+	'postgres', qq[
+	SELECT count(slot_name) >= 4 FROM pg_stat_replication_slots
+	WHERE slot_name ~ 'regression_slot'
+	AND total_txns > 0 AND total_bytes > 0;
+]) or die "Timed out while waiting for statistics to be updated";
+
+# Test to drop one of the replication slot and verify replication statistics data is
+# fine after restart.
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+
+$node->stop;
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+my $result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn,
+	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
+	ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t), 'check replication statistics are updated');
+
+# cleanup
+$node->safe_psql('postgres', "DROP TABLE test_repl_stat");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
+$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+
+# shutdown
+$node->stop;
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8287587..c44d087 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2718,6 +2718,31 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_txns</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Number of decoded transactions sent to the decoding output plugin for
+        this slot. This counter is used to maintain the top level transactions,
+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions that are streamed and/or spilled.
+       </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>total_bytes</structfield><type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transactions data sent to the decoding output plugin
+        while decoding the changes from WAL for this slot. This can be used to
+        gauge the total amount of data sent during logical decoding. Note that
+        this includes the data that is streamed and/or spilled.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
        </para>
        <para>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 451db2e..6d78b33 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -875,6 +875,8 @@ CREATE VIEW pg_stat_replication_slots AS
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
             s.stats_reset
     FROM pg_stat_get_replication_slots() AS s;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 666ce95..e1ec7d8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1829,6 +1829,8 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	msg.m_stream_txns = repSlotStat->stream_txns;
 	msg.m_stream_count = repSlotStat->stream_count;
 	msg.m_stream_bytes = repSlotStat->stream_bytes;
+	msg.m_total_txns = repSlotStat->total_txns;
+	msg.m_total_bytes = repSlotStat->total_bytes;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
@@ -5568,6 +5570,8 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 		replSlotStats[idx].stream_txns += msg->m_stream_txns;
 		replSlotStats[idx].stream_count += msg->m_stream_count;
 		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
+		replSlotStats[idx].total_txns += msg->m_total_txns;
+		replSlotStats[idx].total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5795,6 +5799,8 @@ pgstat_reset_replslot(int i, TimestampTz ts)
 	replSlotStats[i].stream_txns = 0;
 	replSlotStats[i].stream_count = 0;
 	replSlotStats[i].stream_bytes = 0;
+	replSlotStats[i].total_txns = 0;
+	replSlotStats[i].total_bytes = 0;
 	replSlotStats[i].stat_reset_timestamp = ts;
 }
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 68e210c..35b0c67 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1775,21 +1775,20 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	ReorderBuffer *rb = ctx->reorder;
 	PgStat_ReplSlotStats repSlotStat;
 
-	/*
-	 * Nothing to do if we haven't spilled or streamed anything since the last
-	 * time the stats has been sent.
-	 */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0)
+	/* Nothing to do if we don't have any replication stats to be sent. */
+	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes);
+		 (long long) rb->streamBytes,
+		 (long long) rb->totalTxns,
+		 (long long) rb->totalBytes);
 
 	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
 	repSlotStat.spill_txns = rb->spillTxns;
@@ -1798,12 +1797,17 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
+	repSlotStat.total_txns = rb->totalTxns;
+	repSlotStat.total_bytes = rb->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
+
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
+	rb->totalTxns = 0;
+	rb->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 52d0628..5cb484f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -350,6 +350,8 @@ ReorderBufferAllocate(void)
 	buffer->streamTxns = 0;
 	buffer->streamCount = 0;
 	buffer->streamBytes = 0;
+	buffer->totalTxns = 0;
+	buffer->totalBytes = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
@@ -1363,6 +1365,11 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		dlist_delete(&change->node);
 		dlist_push_tail(&state->old_change, &change->node);
 
+		/*
+		 * Update the total bytes processed before releasing the current set
+		 * of changes and restoring the new set of changes.
+		 */
+		rb->totalBytes += rb->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2364,6 +2371,20 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		iterstate = NULL;
 
 		/*
+		 * Update total transaction count and total transaction bytes
+		 * processed. Ensure to not count the streamed transaction multiple
+		 * times.
+		 *
+		 * Note that the statistics computation has to be done after
+		 * ReorderBufferIterTXNFinish as it releases the serialized change
+		 * which we have already accounted in ReorderBufferIterTXNNext.
+		 */
+		if (!rbtxn_is_streamed(txn))
+			rb->totalTxns++;
+
+		rb->totalBytes += rb->size;
+
+		/*
 		 * Done with current changes, send the last message for this set of
 		 * changes depending upon streaming mode.
 		 */
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 521ba73..2680190 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2284,7 +2284,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 8
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -2335,11 +2335,13 @@ pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
 		values[4] = Int64GetDatum(s->stream_txns);
 		values[5] = Int64GetDatum(s->stream_count);
 		values[6] = Int64GetDatum(s->stream_bytes);
+		values[7] = Int64GetDatum(s->total_txns);
+		values[8] = Int64GetDatum(s->total_bytes);
 
 		if (s->stat_reset_timestamp == 0)
-			nulls[7] = true;
+			nulls[9] = true;
 		else
-			values[7] = TimestampTzGetDatum(s->stat_reset_timestamp);
+			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f495765..591753f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5315,9 +5315,9 @@
   proname => 'pg_stat_get_replication_slots', prorows => '10',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,stats_reset}',
+  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slots' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e11215..2aeb3cd 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -548,6 +548,8 @@ typedef struct PgStat_MsgReplSlot
 	PgStat_Counter m_stream_txns;
 	PgStat_Counter m_stream_count;
 	PgStat_Counter m_stream_bytes;
+	PgStat_Counter m_total_txns;
+	PgStat_Counter m_total_bytes;
 } PgStat_MsgReplSlot;
 
 /* ----------
@@ -924,6 +926,8 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
+	PgStat_Counter total_txns;
+	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
 } PgStat_ReplSlotStats;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 565a961..bfab830 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -618,6 +618,13 @@ struct ReorderBuffer
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
 	int64		streamBytes;	/* amount of data streamed */
+
+	/*
+	 * Statistics about all the transactions sent to the decoding output
+	 * plugin
+	 */
+	int64		totalTxns;		/* total number of transactions sent */
+	int64		totalBytes;		/* total amount of data sent */
 };
 
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 186e6c9..6399f3f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2068,8 +2068,10 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
+    s.total_txns,
+    s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, stats_reset);
+   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
1.8.3.1

#77Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#76)
Re: Replication slot stats misgivings

On Thu, Apr 15, 2021 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thank you for updating the patch.

I have one question on the doc change:

+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>

The patch uses the sentence "streamed and or spilled" in two places.
You meant “streamed and spilled”? Even if it actually means “and or”,
using "and or” (i.g., connecting “and” to “or” by a space) is general?
I could not find we use it other places in the doc but found we're
using "and/or" instead.

I changed it to 'and/or' and made another minor change.

Thank you for the update! The patch looks good to me.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#78Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#76)
Re: Replication slot stats misgivings

On Thu, Apr 15, 2021 at 02:46:35PM +0530, Amit Kapila wrote:

On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thank you for updating the patch.

I have one question on the doc change:

+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>

The patch uses the sentence "streamed and or spilled" in two places.
You meant “streamed and spilled”? Even if it actually means “and or”,
using "and or” (i.g., connecting “and” to “or” by a space) is general?
I could not find we use it other places in the doc but found we're
using "and/or" instead.

I changed it to 'and/or' and made another minor change.

I'm suggesting some doc changes. If these are fine, I'll include in my next
round of doc review, in case you don't want to make another commit just for
that.

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d90eb0f21..18c5bba254 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2720,9 +2720,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
        </para>
        <para>
         Number of decoded transactions sent to the decoding output plugin for
-        this slot. This counter is used to maintain the top level transactions,
-        so the counter is not incremented for subtransactions. Note that this
-        includes the transactions that are streamed and/or spilled.
+        this slot. This counts top-level transactions only,
+        and is not incremented for subtransactions. Note that this
+        includes transactions that are streamed and/or spilled.
        </para></entry>
      </row>
@@ -2731,10 +2731,10 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         <structfield>total_bytes</structfield><type>bigint</type>
        </para>
        <para>
-        Amount of decoded transactions data sent to the decoding output plugin
+        Amount of decoded transaction data sent to the decoding output plugin
         while decoding the changes from WAL for this slot. This can be used to
         gauge the total amount of data sent during logical decoding. Note that
-        this includes the data that is streamed and/or spilled.
+        this includes data that is streamed and/or spilled.
        </para>
       </entry>
      </row>
-- 
2.17.0
#79Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#78)
Re: Replication slot stats misgivings

On Fri, Apr 16, 2021 at 8:22 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Thu, Apr 15, 2021 at 02:46:35PM +0530, Amit Kapila wrote:

On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thank you for updating the patch.

I have one question on the doc change:

+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>

The patch uses the sentence "streamed and or spilled" in two places.
You meant “streamed and spilled”? Even if it actually means “and or”,
using "and or” (i.g., connecting “and” to “or” by a space) is general?
I could not find we use it other places in the doc but found we're
using "and/or" instead.

I changed it to 'and/or' and made another minor change.

I'm suggesting some doc changes. If these are fine, I'll include in my next
round of doc review, in case you don't want to make another commit just for
that.

I am fine with your proposed changes. There are one or two more
patches in this area. I can include your suggestions along with those
if you don't mind?

--
With Regards,
Amit Kapila.

#80Justin Pryzby
pryzby@telsasoft.com
In reply to: Amit Kapila (#79)
Re: Replication slot stats misgivings

On Fri, Apr 16, 2021 at 08:48:29AM +0530, Amit Kapila wrote:

I am fine with your proposed changes. There are one or two more
patches in this area. I can include your suggestions along with those
if you don't mind?

However's convenient is fine

--
Justin

#81vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#76)
2 attachment(s)
Re: Replication slot stats misgivings

On Thu, Apr 15, 2021 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 15, 2021 at 3:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thank you for updating the patch.

I have one question on the doc change:

+        so the counter is not incremented for subtransactions. Note that this
+        includes the transactions streamed and or spilled.
+       </para></entry>

The patch uses the sentence "streamed and or spilled" in two places.
You meant “streamed and spilled”? Even if it actually means “and or”,
using "and or” (i.g., connecting “and” to “or” by a space) is general?
I could not find we use it other places in the doc but found we're
using "and/or" instead.

I changed it to 'and/or' and made another minor change.

I have rebased the remaining patches on top of head. Attached the
patches for the same.
Thoughts?

Regards,
Vignesh

Attachments:

v13-0001-Use-HTAB-for-replication-slot-statistics.patchtext/x-patch; charset=US-ASCII; name=v13-0001-Use-HTAB-for-replication-slot-statistics.patchDownload
From e66015cc7ce9c67280f6779bca56a257652a4342 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Fri, 16 Apr 2021 08:33:40 +0530
Subject: [PATCH v13 1/2] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for repilcation
slots. But this had two problems in case where drop-slot-stats message
is lost: 1) the stats for the new slot are not recorded and 2) writing
beyond the end of the array when after restring the number of slots
whose stats are stored in the stats file exceeds
max_replication_slots.

This commit changes to use HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() checks
if a slot for stats entry in the stats collector still exists or not.
Then send drop-slot-stats message.
---
 src/backend/catalog/system_views.sql      |  30 +--
 src/backend/postmaster/pgstat.c           | 268 +++++++++++-----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  23 +-
 src/backend/utils/adt/pgstatfuncs.c       | 135 ++++++-----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |   8 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 9 files changed, 258 insertions(+), 228 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6d78b33590..fba45473c1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,20 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.total_txns,
-            s.total_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -984,6 +970,22 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d..9ded0a75e9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_ReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_ReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Check for all replication slots in stats hash table. We do this check
+	 * when replSlotStats has more than max_replication_slots entries, i.e,
+	 * when there are stats for the already-dropped slot, to avoid frequent
+	 * call SearchNamedReplicationSlot() which acquires LWLock.
+	 */
+	if (replSlotStats && hash_get_num_entries(replSlotStats) > max_replication_slots)
+	{
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -2872,17 +2866,19 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_ReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
+	PgStat_ReplSlotEntry *slotent = NULL;
+
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	slotent = pgstat_get_replslot_entry(slotname, false);
+
+	return slotent;
 }
 
 /*
@@ -3654,7 +3650,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3744,11 +3739,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_ReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_ReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3975,12 +3976,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4005,12 +4000,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4197,21 +4186,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+			{
+				PgStat_ReplSlotEntry slotstats;
+				PgStat_ReplSlotEntry *slotent;
+
+				if (fread(&slotstats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
-				nReplSlotStats++;
+
+				slotent = pgstat_get_replslot_entry(slotstats.slotname, true);
+				memcpy(slotent, &slotstats, sizeof(PgStat_ReplSlotEntry));
 				break;
+			}
 
 			case 'E':
 				goto done;
@@ -4424,7 +4419,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_ReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4553,12 +4548,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_ReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotEntry), fpin)
+					!= sizeof(PgStat_ReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4765,7 +4760,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5189,20 +5183,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_ReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_ReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5210,11 +5210,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5532,46 +5532,29 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats, (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE, NULL);
 	}
 	else
 	{
+		PgStat_ReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
 		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
-		replSlotStats[idx].total_txns += msg->m_total_txns;
-		replSlotStats[idx].total_bytes += msg->m_total_bytes;
+		slotent->spill_txns += msg->m_spill_txns;
+		slotent->spill_count += msg->m_spill_count;
+		slotent->spill_bytes += msg->m_spill_bytes;
+		slotent->stream_txns += msg->m_stream_txns;
+		slotent->stream_count += msg->m_stream_count;
+		slotent->stream_bytes += msg->m_stream_bytes;
+		slotent->total_txns += msg->m_total_txns;
+		slotent->total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5749,59 +5732,78 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_get_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_ReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_ReplSlotEntry *slotent;
+	bool    found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL         hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_ReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_ReplSlotEntry *) hash_search(replSlotStats,
+												   (void *) &name,
+												   create_it ? HASH_ENTER : HASH_FIND,
+												   &found);
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_ReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_ReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].total_txns = 0;
-	replSlotStats[i].total_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35b0c67641..93b31a46f6 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_ReplSlotEntry repSlotStat;
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
 	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..f75e7e95f9 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -329,8 +329,8 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	if (SlotIsLogical(slot))
 	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		PgStat_ReplSlotEntry repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotEntry));
 		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 		pgstat_report_replslot(&repSlotStat);
 	}
@@ -349,17 +349,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +370,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +417,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -712,7 +713,11 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * and create messages while holding ReplicationSlotAllocationLock to
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
-	 * create the statistics for the replication slot.
+	 * create the statistics for the replication slot. In case where the
+	 * message for dropping the old slot gets lost and a slot with the same is
+	 * created, the stats will be accumulated into the old slots since we
+	 * use the slot name as the key. In that case, user can reset the particular
+	 * stats by pg_stat_reset_replication_slot().
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2680190a40..b1899d8c68 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,73 +2305,67 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text *slotname_text = PG_GETARG_TEXT_P(0);
+	NameData slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
-
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_ReplSlotEntry *slotent;
 
-	MemoryContextSwitchTo(oldcontext);
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
-	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
 
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
-		values[7] = Int64GetDatum(s->total_txns);
-		values[8] = Int64GetDatum(s->total_bytes);
-
-		if (s->stat_reset_timestamp == 0)
-			nulls[9] = true;
-		else
-			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
+	if (!slotent)
+		PG_RETURN_NULL();
 
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
-	}
+	values[0] = CStringGetTextDatum(NameStr(slotent->slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[7] = Int64GetDatum(slotent->total_txns);
+	values[8] = Int64GetDatum(slotent->total_bytes);
 
-	tuplestore_donestoring(tupstore);
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[9] = true;
+	else
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
-	return (Datum) 0;
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 591753fe81..a37cb8a426 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5311,14 +5311,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2aeb3cded4..e9e16e5e2d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -917,7 +917,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_ReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -929,7 +929,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter total_txns;
 	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_ReplSlotEntry;
 
 
 /*
@@ -1031,7 +1031,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_ReplSlotEntry *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1129,7 +1129,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_ReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6399f3feef..ce2ecb914b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,7 +2071,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.total_txns,
     s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.25.1

v13-0002-Test-where-there-are-more-replication-slot-stati.patchtext/x-patch; charset=US-ASCII; name=v13-0002-Test-where-there-are-more-replication-slot-stati.patchDownload
From 410d14bd6aee29e29e75cdc8a49ea0d9fc0c0214 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 12 Apr 2021 16:10:11 +0530
Subject: [PATCH v13 2/2] Test where there are more replication slot statistics
 than max_replication_slot slots at startup.

There is a remote scenario where one of the replication slots is dropped and
the drop slot statistics message is not received by the statistic collector
process, now if the max_replication_slots is reduced to the actual number of
replication slots that are in use and the server is re-started then the
statistics process will not be aware of this and the statistic collector
process will write beyond the slots available, added a test for this.
---
 contrib/test_decoding/t/001_repl_stats.pl | 30 +++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 11b6cd9b9c..5ca4d3d7a7 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -2,9 +2,10 @@
 # drop replication slot and restart.
 use strict;
 use warnings;
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 2;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -66,11 +67,36 @@ is($result, qq(regression_slot1|t|t
 regression_slot2|t|t
 regression_slot3|t|t), 'check replication statistics are updated');
 
+# Test to remove one of the replication slots and adjust
+# max_replication_slots accordingly to the number of slots. This leads
+# to a mismatch between the number of slots present in the stats file and the
+# number of stats present in the shared memory, simulating the scenario for
+# drop slot message lost by the statistics collector process. We verify
+# replication statistics data is fine after restart.
+
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+$result = $node->safe_psql('postgres',
+	"SELECT slot_name, total_txns > 0 AS total_txn,
+	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
+	ORDER BY slot_name"
+);
+is($result, qq(regression_slot1|t|t
+regression_slot2|t|t), 'check replication statistics are updated');
+
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
-- 
2.25.1

#82Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#77)
Re: Replication slot stats misgivings

On Thu, Apr 15, 2021 at 4:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for the update! The patch looks good to me.

I have pushed the first patch. Comments on the next patch
v13-0001-Use-HTAB-for-replication-slot-statistics:
1.
+ /*
+ * Check for all replication slots in stats hash table. We do this check
+ * when replSlotStats has more than max_replication_slots entries, i.e,
+ * when there are stats for the already-dropped slot, to avoid frequent
+ * call SearchNamedReplicationSlot() which acquires LWLock.
+ */
+ if (replSlotStats && hash_get_num_entries(replSlotStats) >
max_replication_slots)
+ {
+ PgStat_ReplSlotEntry *slotentry;
+
+ hash_seq_init(&hstat, replSlotStats);
+ while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+ {
+ if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+ pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+ }
+ }

Is SearchNamedReplicationSlot() so frequently used that we need to do
this only when the hash table has entries more than
max_replication_slots? I think it would be better if we can do it
without such a condition to reduce the chances of missing the slot
stats. We don't have any such restrictions for any other cases in this
function.

I think it is better to add CHECK_FOR_INTERRUPTS in the above while loop?

2.
/*
* Replication slot statistics kept in the stats collector
*/
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_ReplSlotEntry

I think the comment above this structure can be changed to "The
collector's data per slot" or something like that. Also, if we have to
follow table/function/db style, then probably this structure should be
named as PgStat_StatReplSlotEntry.

3.
- * create the statistics for the replication slot.
+ * create the statistics for the replication slot. In case where the
+ * message for dropping the old slot gets lost and a slot with the same is

/the same is/the same name is/.

Can we mention something similar to what you have added here in docs as well?

4.
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
..
..
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
- ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ text *slotname_text = PG_GETARG_TEXT_P(0);
+ NameData slotname;

I think with the above changes getting all the slot stats has become
much costlier. Is there any reason why can't we get all the stats from
the new hash_table in one shot and return them to the user?

--
With Regards,
Amit Kapila.

#83Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#54)
Re: Replication slot stats misgivings

On Mon, Apr 12, 2021 at 2:57 PM vignesh C <vignesh21@gmail.com> wrote:

On Sat, Mar 20, 2021 at 9:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.

I felt we can update the replication slot statistics data each time we
spill/stream the transaction data instead of accumulating the
statistics and updating at the end. I have tried this in the attached
patch and the statistics data were getting updated.
Thoughts?

Did you check if we can update the stats when we release the slot as
discussed above? I am not sure if it is easy to do at the time of slot
release because this information might not be accessible there and in
some cases, we might have already released the decoding
context/reorderbuffer where this information is stored. It might be
okay to update this when we stream or spill but let's see if we can do
it easily at the time of slot release.

--
With Regards,
Amit Kapila.

#84vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#83)
Re: Replication slot stats misgivings

On Fri, Apr 16, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 2:57 PM vignesh C <vignesh21@gmail.com> wrote:

On Sat, Mar 20, 2021 at 9:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.

I felt we can update the replication slot statistics data each time we
spill/stream the transaction data instead of accumulating the
statistics and updating at the end. I have tried this in the attached
patch and the statistics data were getting updated.
Thoughts?

Did you check if we can update the stats when we release the slot as
discussed above? I am not sure if it is easy to do at the time of slot
release because this information might not be accessible there and in
some cases, we might have already released the decoding
context/reorderbuffer where this information is stored. It might be
okay to update this when we stream or spill but let's see if we can do
it easily at the time of slot release.

I'm not sure if we will be able to update stats from here, as we will
not have access to decoding context/reorderbuffer at this place, and
also like you pointed out I noticed that the decoding context gets
released earlier itself.

Regards,
Vignesh

#85vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#82)
Re: Replication slot stats misgivings

On Fri, Apr 16, 2021 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for the update! The patch looks good to me.

I have pushed the first patch. Comments on the next patch
v13-0001-Use-HTAB-for-replication-slot-statistics:

Also should we change PGSTAT_FILE_FORMAT_ID as we have modified the
replication slot statistics?

Regards,
Vignesh

#86Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#82)
Re: Replication slot stats misgivings

Amit Kapila <amit.kapila16@gmail.com> writes:

I have pushed the first patch.

The buildfarm suggests that this isn't entirely stable:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&amp;dt=2021-04-17%2011%3A14%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&amp;dt=2021-04-17%2016%3A30%3A15

Each of those animals has also passed at least once since this went in,
so I'm betting on a timing-dependent issue.

regards, tom lane

#87Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#86)
Re: Replication slot stats misgivings

I wrote:

The buildfarm suggests that this isn't entirely stable:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&amp;dt=2021-04-17%2011%3A14%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&amp;dt=2021-04-17%2016%3A30%3A15

Oh, I missed that hyrax is showing the identical symptom:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&amp;dt=2021-04-16%2007%3A05%3A44

So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it
that way.

regards, tom lane

#88vignesh C
vignesh21@gmail.com
In reply to: Tom Lane (#87)
Re: Replication slot stats misgivings

On Sun, Apr 18, 2021 at 3:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

The buildfarm suggests that this isn't entirely stable:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&amp;dt=2021-04-17%2011%3A14%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&amp;dt=2021-04-17%2016%3A30%3A15

Oh, I missed that hyrax is showing the identical symptom:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&amp;dt=2021-04-16%2007%3A05%3A44

So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it
that way.

I will try to check and identify why it is failing.

Regards,
Vignesh

#89Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#88)
Re: Replication slot stats misgivings

On Sun, Apr 18, 2021 at 7:36 AM vignesh C <vignesh21@gmail.com> wrote:

On Sun, Apr 18, 2021 at 3:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

The buildfarm suggests that this isn't entirely stable:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&amp;dt=2021-04-17%2011%3A14%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&amp;dt=2021-04-17%2016%3A30%3A15

Oh, I missed that hyrax is showing the identical symptom:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&amp;dt=2021-04-16%2007%3A05%3A44

So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it
that way.

I will try to check and identify why it is failing.

I think the failure is due to the reason that in the new tests after
reset, we are not waiting for the stats message to be delivered as we
were doing in other cases. Also, for the new test (non-spilled case),
we need to decode changes as we are doing for other tests, otherwise,
it will show the old stats.

--
With Regards,
Amit Kapila.

#90vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#89)
Re: Replication slot stats misgivings

On Sun, Apr 18, 2021 at 8:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Apr 18, 2021 at 7:36 AM vignesh C <vignesh21@gmail.com> wrote:

On Sun, Apr 18, 2021 at 3:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

The buildfarm suggests that this isn't entirely stable:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&amp;dt=2021-04-17%2011%3A14%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&amp;dt=2021-04-17%2016%3A30%3A15

Oh, I missed that hyrax is showing the identical symptom:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&amp;dt=2021-04-16%2007%3A05%3A44

So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it
that way.

I will try to check and identify why it is failing.

I think the failure is due to the reason that in the new tests after
reset, we are not waiting for the stats message to be delivered as we
were doing in other cases. Also, for the new test (non-spilled case),
we need to decode changes as we are doing for other tests, otherwise,
it will show the old stats.

I also felt that is the reason for the failure, I will fix and post a
patch for this.

Regards,
Vignesh

#91Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#89)
1 attachment(s)
Re: Replication slot stats misgivings

On Sun, Apr 18, 2021 at 12:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Apr 18, 2021 at 7:36 AM vignesh C <vignesh21@gmail.com> wrote:

On Sun, Apr 18, 2021 at 3:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

The buildfarm suggests that this isn't entirely stable:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&amp;dt=2021-04-17%2011%3A14%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&amp;dt=2021-04-17%2016%3A30%3A15

Oh, I missed that hyrax is showing the identical symptom:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&amp;dt=2021-04-16%2007%3A05%3A44

So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it
that way.

I will try to check and identify why it is failing.

I think the failure is due to the reason that in the new tests after
reset, we are not waiting for the stats message to be delivered as we
were doing in other cases. Also, for the new test (non-spilled case),
we need to decode changes as we are doing for other tests, otherwise,
it will show the old stats.

Yes, also the following expectation in expected/stats.out is wrong:

SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
total_bytes FROM pg_stat_replication_slots;
slot_name | spill_txns | spill_count | total_txns | total_bytes
-----------------+------------+-------------+------------+-------------
regression_slot | f | f | t | t
(1 row)

We should expect all values are 0. Please find attached the patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

fix_test_decoding_test.patchapplication/octet-stream; name=fix_test_decoding_test.patchDownload
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bc8e601eab..8be5360807 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -95,8 +95,8 @@ SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_
  regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
--- decode and check stats again.
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+-- decode and check stats again. consume all changes.
+SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
  count 
 -------
   5002
@@ -120,8 +120,24 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
  
 (1 row)
 
--- non-spilled xact
+SELECT wait_for_decode_stats(true, false);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+-- Test for non-spilled xact. Since the logical decoding skips sending
+-- uninteresting transactions to the decoding output plugin, which could
+-- be counted as spill_txns/count but not as total_txns/bytes, we increase
+-- logical_decoding_work_mem to ensure not spill transactions.
+SET logical_decoding_work_mem TO '64MB';
 INSERT INTO stats_test values(generate_series(1, 10));
+SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+ count 
+-------
+    12
+(1 row)
+
 SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
@@ -131,9 +147,10 @@ SELECT wait_for_decode_stats(false, false);
 SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
     slot_name    | spill_txns | spill_count | total_txns | total_bytes 
 -----------------+------------+-------------+------------+-------------
- regression_slot | f          | f           | t          | t
+ regression_slot | t          | t           | t          | t
 (1 row)
 
+RESET logical_decoding_work_mem;
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
 BEGIN;
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 8c34aeced1..41bfd364e7 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -67,17 +67,24 @@ SELECT pg_stat_reset_replication_slot('regression_slot');
 SELECT wait_for_decode_stats(true, true);
 SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
--- decode and check stats again.
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+-- decode and check stats again. consume all changes.
+SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
 SELECT wait_for_decode_stats(false, true);
 SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 SELECT pg_stat_reset_replication_slot('regression_slot');
+SELECT wait_for_decode_stats(true, false);
 
--- non-spilled xact
+-- Test for non-spilled xact. Since the logical decoding skips sending
+-- uninteresting transactions to the decoding output plugin, which could
+-- be counted as spill_txns/count but not as total_txns/bytes, we increase
+-- logical_decoding_work_mem to ensure not spill transactions.
+SET logical_decoding_work_mem TO '64MB';
 INSERT INTO stats_test values(generate_series(1, 10));
+SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
 SELECT wait_for_decode_stats(false, false);
 SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+RESET logical_decoding_work_mem;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
#92vignesh C
vignesh21@gmail.com
In reply to: vignesh C (#90)
1 attachment(s)
Re: Replication slot stats misgivings

On Sun, Apr 18, 2021 at 9:02 AM vignesh C <vignesh21@gmail.com> wrote:

On Sun, Apr 18, 2021 at 8:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Apr 18, 2021 at 7:36 AM vignesh C <vignesh21@gmail.com> wrote:

On Sun, Apr 18, 2021 at 3:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

The buildfarm suggests that this isn't entirely stable:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&amp;dt=2021-04-17%2011%3A14%3A49
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&amp;dt=2021-04-17%2016%3A30%3A15

Oh, I missed that hyrax is showing the identical symptom:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&amp;dt=2021-04-16%2007%3A05%3A44

So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it
that way.

I will try to check and identify why it is failing.

I think the failure is due to the reason that in the new tests after
reset, we are not waiting for the stats message to be delivered as we
were doing in other cases. Also, for the new test (non-spilled case),
we need to decode changes as we are doing for other tests, otherwise,
it will show the old stats.

I also felt that is the reason for the failure, I will fix and post a
patch for this.

Attached a patch which includes the changes for the fix. I have moved
the non-spilled transaction test to reduce the steps which reduces
calling pg_logical_slot_get_changes before this test.

Regards,
Vignesh

Attachments:

buildfarm_failure.patchtext/x-patch; charset=US-ASCII; name=buildfarm_failure.patchDownload
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bc8e601eab..c537e38bfe 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -51,20 +51,15 @@ BEGIN
     extract(epoch from clock_timestamp() - start_time);
 END
 $$ LANGUAGE plpgsql;
--- spilling the xact
-BEGIN;
-INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);
-COMMIT;
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+-- non-spilled xact
+INSERT INTO stats_test values(1);
+SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
  count 
 -------
-  5002
+     3
 (1 row)
 
--- Check stats, wait for the stats collector to update. We can't test the
--- exact stats count as that can vary if any background transaction (say by
--- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false, true);
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 -----------------------
  
@@ -73,17 +68,17 @@ SELECT wait_for_decode_stats(false, true);
 SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
     slot_name    | spill_txns | spill_count | total_txns | total_bytes 
 -----------------+------------+-------------+------------+-------------
- regression_slot | t          | t           | t          | t
+ regression_slot | f          | f           | t          | t
 (1 row)
 
--- reset the slot stats, and wait for stats collector to reset
+-- reset the slot stats, and wait for stats collector's total txn to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
  pg_stat_reset_replication_slot 
 --------------------------------
  
 (1 row)
 
-SELECT wait_for_decode_stats(true, true);
+SELECT wait_for_decode_stats(true, false);
  wait_for_decode_stats 
 -----------------------
  
@@ -95,13 +90,19 @@ SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_
  regression_slot |          0 |           0 |          0 |           0
 (1 row)
 
--- decode and check stats again.
+-- spilling the xact
+BEGIN;
+INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);
+COMMIT;
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
  count 
 -------
   5002
 (1 row)
 
+-- Check stats, wait for the stats collector to update. We can't test the
+-- exact stats count as that can vary if any background transaction (say by
+-- autovacuum) happens in parallel to the main transaction.
 SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 -----------------------
@@ -114,24 +115,42 @@ SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count,
  regression_slot | t          | t           | t          | t
 (1 row)
 
+-- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
  pg_stat_reset_replication_slot 
 --------------------------------
  
 (1 row)
 
--- non-spilled xact
-INSERT INTO stats_test values(generate_series(1, 10));
-SELECT wait_for_decode_stats(false, false);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 -----------------------
  
 (1 row)
 
-SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
     slot_name    | spill_txns | spill_count | total_txns | total_bytes 
 -----------------+------------+-------------+------------+-------------
- regression_slot | f          | f           | t          | t
+ regression_slot |          0 |           0 |          0 |           0
+(1 row)
+
+-- decode and check stats again.
+SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+ count 
+-------
+  5002
+(1 row)
+
+SELECT wait_for_decode_stats(false, true);
+ wait_for_decode_stats 
+-----------------------
+ 
+(1 row)
+
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
+-----------------+------------+-------------+------------+-------------
+ regression_slot | t          | t           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 8c34aeced1..8a0451dffe 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -50,6 +50,17 @@ BEGIN
 END
 $$ LANGUAGE plpgsql;
 
+-- non-spilled xact
+INSERT INTO stats_test values(1);
+SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+SELECT wait_for_decode_stats(false, false);
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+
+-- reset the slot stats, and wait for stats collector's total txn to reset
+SELECT pg_stat_reset_replication_slot('regression_slot');
+SELECT wait_for_decode_stats(true, false);
+SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
+
 -- spilling the xact
 BEGIN;
 INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);
@@ -72,13 +83,6 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
 SELECT wait_for_decode_stats(false, true);
 SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
-SELECT pg_stat_reset_replication_slot('regression_slot');
-
--- non-spilled xact
-INSERT INTO stats_test values(generate_series(1, 10));
-SELECT wait_for_decode_stats(false, false);
-SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
-
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
 BEGIN;
#93Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#82)
Re: Replication slot stats misgivings

On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for the update! The patch looks good to me.

I have pushed the first patch. Comments on the next patch
v13-0001-Use-HTAB-for-replication-slot-statistics:
1.
+ /*
+ * Check for all replication slots in stats hash table. We do this check
+ * when replSlotStats has more than max_replication_slots entries, i.e,
+ * when there are stats for the already-dropped slot, to avoid frequent
+ * call SearchNamedReplicationSlot() which acquires LWLock.
+ */
+ if (replSlotStats && hash_get_num_entries(replSlotStats) >
max_replication_slots)
+ {
+ PgStat_ReplSlotEntry *slotentry;
+
+ hash_seq_init(&hstat, replSlotStats);
+ while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+ {
+ if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+ pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+ }
+ }

Is SearchNamedReplicationSlot() so frequently used that we need to do
this only when the hash table has entries more than
max_replication_slots? I think it would be better if we can do it
without such a condition to reduce the chances of missing the slot
stats. We don't have any such restrictions for any other cases in this
function.

Please see below comment on #4.

I think it is better to add CHECK_FOR_INTERRUPTS in the above while loop?

Agreed.

2.
/*
* Replication slot statistics kept in the stats collector
*/
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_ReplSlotEntry

I think the comment above this structure can be changed to "The
collector's data per slot" or something like that. Also, if we have to
follow table/function/db style, then probably this structure should be
named as PgStat_StatReplSlotEntry.

Agreed.

3.
- * create the statistics for the replication slot.
+ * create the statistics for the replication slot. In case where the
+ * message for dropping the old slot gets lost and a slot with the same is

/the same is/the same name is/.

Can we mention something similar to what you have added here in docs as well?

Agreed.

4.
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
..
..
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
- ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ text *slotname_text = PG_GETARG_TEXT_P(0);
+ NameData slotname;

I think with the above changes getting all the slot stats has become
much costlier. Is there any reason why can't we get all the stats from
the new hash_table in one shot and return them to the user?

I think the advantage of this approach would be that it can avoid
showing the stats for already-dropped slots. Like other statistics
views such as pg_stat_all_tables and pg_stat_all_functions, searching
the stats by the name got from pg_replication_slots can show only
available slot stats even if the hash table has garbage slot stats.
Given that pg_stat_replication_slots doesn’t show garbage slot stats
even if it has, I thought we can avoid checking those garbage stats
frequently. It should not essentially be a problem for the hash table
to have entries up to max_replication_slots regardless of live or
already-dropped.

As another design, we can get all stats from the hash table in one
shot as you suggested. If we do that, it's better to check garbage
slot stats every time pgstat_vacuum_stat() is called so the view
doesn't show those stats but cannot avoid that completely.

I'll change the code pointed out by #1 and #4 according to this design
discussion.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#94Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#91)
Re: Replication slot stats misgivings

On Sun, Apr 18, 2021 at 6:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yes, also the following expectation in expected/stats.out is wrong:

SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
total_bytes FROM pg_stat_replication_slots;
slot_name | spill_txns | spill_count | total_txns | total_bytes
-----------------+------------+-------------+------------+-------------
regression_slot | f | f | t | t
(1 row)

We should expect all values are 0. Please find attached the patch.

Right. Both your and Vignesh's patch will fix the problem but I mildly
prefer Vignesh's one as that seems a bit simpler. So, I went ahead and
pushed his patch with minor other changes. Thanks to both of you.

--
With Regards,
Amit Kapila.

#95Amit Kapila
amit.kapila16@gmail.com
In reply to: Justin Pryzby (#80)
Re: Replication slot stats misgivings

On Fri, Apr 16, 2021 at 8:50 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, Apr 16, 2021 at 08:48:29AM +0530, Amit Kapila wrote:

I am fine with your proposed changes. There are one or two more
patches in this area. I can include your suggestions along with those
if you don't mind?

However's convenient is fine

Thanks for your suggestions. I have pushed your changes as part of the
commit c64dcc7fee.

--
With Regards,
Amit Kapila.

#96Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#93)
Re: Replication slot stats misgivings

On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

4.
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
..
..
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
- ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ text *slotname_text = PG_GETARG_TEXT_P(0);
+ NameData slotname;

I think with the above changes getting all the slot stats has become
much costlier. Is there any reason why can't we get all the stats from
the new hash_table in one shot and return them to the user?

I think the advantage of this approach would be that it can avoid
showing the stats for already-dropped slots. Like other statistics
views such as pg_stat_all_tables and pg_stat_all_functions, searching
the stats by the name got from pg_replication_slots can show only
available slot stats even if the hash table has garbage slot stats.

Sounds reasonable. However, if the create_slot message is missed, it
will show an empty row for it. See below:

postgres=# select slot_name, total_txns from pg_stat_replication_slots;
slot_name | total_txns
-----------+------------
s1 | 0
s2 | 0
|
(3 rows)

Here, I have manually via debugger skipped sending the create_slot
message for the third slot and we are showing an empty for it. This
won't happen for pg_stat_all_tables, as it will set 0 or other initial
values in such a case. I think we need to address this case.

Given that pg_stat_replication_slots doesn’t show garbage slot stats
even if it has, I thought we can avoid checking those garbage stats
frequently. It should not essentially be a problem for the hash table
to have entries up to max_replication_slots regardless of live or
already-dropped.

Yeah, but I guess we still might not save much by not doing it,
especially because for the other cases like tables/functions, we are
doing it without any threshold limit.

--
With Regards,
Amit Kapila.

#97Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#96)
1 attachment(s)
Re: Replication slot stats misgivings

On Mon, Apr 19, 2021 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

4.
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
..
..
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
- ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ text *slotname_text = PG_GETARG_TEXT_P(0);
+ NameData slotname;

I think with the above changes getting all the slot stats has become
much costlier. Is there any reason why can't we get all the stats from
the new hash_table in one shot and return them to the user?

I think the advantage of this approach would be that it can avoid
showing the stats for already-dropped slots. Like other statistics
views such as pg_stat_all_tables and pg_stat_all_functions, searching
the stats by the name got from pg_replication_slots can show only
available slot stats even if the hash table has garbage slot stats.

Sounds reasonable. However, if the create_slot message is missed, it
will show an empty row for it. See below:

postgres=# select slot_name, total_txns from pg_stat_replication_slots;
slot_name | total_txns
-----------+------------
s1 | 0
s2 | 0
|
(3 rows)

Here, I have manually via debugger skipped sending the create_slot
message for the third slot and we are showing an empty for it. This
won't happen for pg_stat_all_tables, as it will set 0 or other initial
values in such a case. I think we need to address this case.

Good catch. I think it's better to set 0 to all counters and NULL to
reset_stats.

Given that pg_stat_replication_slots doesn’t show garbage slot stats
even if it has, I thought we can avoid checking those garbage stats
frequently. It should not essentially be a problem for the hash table
to have entries up to max_replication_slots regardless of live or
already-dropped.

Yeah, but I guess we still might not save much by not doing it,
especially because for the other cases like tables/functions, we are
doing it without any threshold limit.

Agreed.

I've attached the updated patch, please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v8-0001-Use-HTAB-for-replication-slot-statistics.patchapplication/octet-stream; name=v8-0001-Use-HTAB-for-replication-slot-statistics.patchDownload
From 699b629b15af2e5eee0fe5a9128436c1d1339343 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 19 Apr 2021 16:40:27 +0900
Subject: [PATCH v8] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for repilcation
slots. But this had two problems in case where a meesage for dropping
a slot gets lost: 1) the stats for the new slot are not recorded and
2) writing beyond the end of the array when after restring the number
of slots whose stats are stored in the stats file exceeds
max_replication_slots.

This commit changes it to HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() checks
if a slot for stats entry in the stats collector still exists or not.
Then send drop-slot-stats message.
---
 doc/src/sgml/monitoring.sgml              |   9 +
 src/backend/catalog/system_views.sql      |  30 +--
 src/backend/postmaster/pgstat.c           | 268 +++++++++++-----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  23 +-
 src/backend/utils/adt/pgstatfuncs.c       | 141 +++++++-----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |   8 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 10 files changed, 274 insertions(+), 227 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c44d087508..f8b2b8dc1a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2753,6 +2753,15 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    </tgroup>
   </table>
 
+  <note>
+   <para>
+    In the case where the reporting message for dropping a replication slot gets lost
+    and a slot with the same name is created, the replication slot statistics will be
+    accumulated into the old slot.  In that case, you can reset the particular
+    replication slot statistics by <function>pg_stat_reset_replication_slot()</function>.
+   </para>
+  </note>
+
  </sect2>
 
  <sect2 id="monitoring-pg-stat-wal-receiver-view">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e96dd73280..fd95465b04 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,20 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.total_txns,
-            s.total_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -984,6 +970,22 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d..0fa8d36c37 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,23 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Check for all replication slots in stats hashtable if they still exist.
+	 */
+	if (replSlotStats)
+	{
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1533,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1806,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -2872,17 +2865,19 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_StatReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
+	PgStat_StatReplSlotEntry *slotent = NULL;
+
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	slotent = pgstat_get_replslot_entry(slotname, false);
+
+	return slotent;
 }
 
 /*
@@ -3654,7 +3649,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3744,11 +3738,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_StatReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3975,12 +3975,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4005,12 +3999,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4197,21 +4185,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+			{
+				PgStat_StatReplSlotEntry slotbuf;
+				PgStat_StatReplSlotEntry *slotent;
+
+				if (fread(&slotbuf, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+					!= sizeof(PgStat_StatReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
-				nReplSlotStats++;
+
+				slotent = pgstat_get_replslot_entry(slotbuf.slotname, true);
+				memcpy(slotent, &slotbuf, sizeof(PgStat_StatReplSlotEntry));
 				break;
+			}
 
 			case 'E':
 				goto done;
@@ -4424,7 +4418,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_StatReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4553,12 +4547,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+					!= sizeof(PgStat_StatReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4765,7 +4759,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5189,20 +5182,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_StatReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_StatReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5210,11 +5209,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5532,46 +5531,29 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats, (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE, NULL);
 	}
 	else
 	{
+		PgStat_StatReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
 		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
-		replSlotStats[idx].total_txns += msg->m_total_txns;
-		replSlotStats[idx].total_bytes += msg->m_total_bytes;
+		slotent->spill_txns += msg->m_spill_txns;
+		slotent->spill_count += msg->m_spill_count;
+		slotent->spill_bytes += msg->m_spill_bytes;
+		slotent->stream_txns += msg->m_stream_txns;
+		slotent->stream_count += msg->m_stream_count;
+		slotent->stream_bytes += msg->m_stream_bytes;
+		slotent->total_txns += msg->m_total_txns;
+		slotent->total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5749,59 +5731,79 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_StatReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_StatReplSlotEntry *slotent;
+	bool	found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStats,
+												   (void *) &name,
+												   create_it ? HASH_ENTER : HASH_FIND,
+												   &found);
+
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_StatReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].total_txns = 0;
-	replSlotStats[i].total_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35b0c67641..93b31a46f6 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_ReplSlotEntry repSlotStat;
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
 	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..3cc8879b51 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -329,8 +329,8 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	if (SlotIsLogical(slot))
 	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		PgStat_StatReplSlotEntry repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_StatReplSlotEntry));
 		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 		pgstat_report_replslot(&repSlotStat);
 	}
@@ -349,17 +349,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +370,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +417,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -712,7 +713,11 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * and create messages while holding ReplicationSlotAllocationLock to
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
-	 * create the statistics for the replication slot.
+	 * create the statistics for the replication slot. In case where the
+	 * message for dropping the old slot gets lost and a slot with the same
+	 * name is created, the stats will be accumulated into the old slots since
+	 * we use the slot name as the key. In that case, user can reset the
+	 * particular stats by pg_stat_reset_replication_slot().
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2680190a40..3a7a7efcd6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,73 +2305,75 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/*
+ * Get the statistics for the replication slot. If the slot statistics is not
+ * available, return all-zeroes stats.
+ */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text		*slotname_text = PG_GETARG_TEXT_P(0);
+	NameData	slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_ReplSlotEntry *slotent;
+	PgStat_ReplSlotEntry allzero;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-	MemoryContextSwitchTo(oldcontext);
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
 
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
+	if (!slotent)
 	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
-
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
-
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
-		values[7] = Int64GetDatum(s->total_txns);
-		values[8] = Int64GetDatum(s->total_bytes);
-
-		if (s->stat_reset_timestamp == 0)
-			nulls[9] = true;
-		else
-			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
-
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+		/* If the slot is not found, set zero to all counters */
+		memset(&allzero, 0, sizeof(PgStat_ReplSlotEntry));
+		slotent = &allzero;
 	}
 
-	tuplestore_donestoring(tupstore);
+	values[0] = CStringGetTextDatum(NameStr(slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[7] = Int64GetDatum(slotent->total_txns);
+	values[8] = Int64GetDatum(slotent->total_bytes);
 
-	return (Datum) 0;
+	if (slotent->stat_reset_timestamp == 0 || !slotent)
+		nulls[9] = true;
+	else
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b62abcd22c..e4cd514992 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5308,14 +5308,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2aeb3cded4..b22a3a264f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -917,7 +917,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_StatReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -929,7 +929,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter total_txns;
 	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_StatReplSlotEntry;
 
 
 /*
@@ -1031,7 +1031,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1129,7 +1129,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_StatReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6399f3feef..ce2ecb914b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,7 +2071,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.total_txns,
     s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.24.3 (Apple Git-128)

#98vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#83)
1 attachment(s)
Re: Replication slot stats misgivings

On Fri, Apr 16, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 12, 2021 at 2:57 PM vignesh C <vignesh21@gmail.com> wrote:

On Sat, Mar 20, 2021 at 9:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 20, 2021 at 12:22 AM Andres Freund <andres@anarazel.de> wrote:

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

We can probably update the stats each time we spilled or streamed the
transaction data but it was not clear at that stage whether or how
much it will be useful.

I felt we can update the replication slot statistics data each time we
spill/stream the transaction data instead of accumulating the
statistics and updating at the end. I have tried this in the attached
patch and the statistics data were getting updated.
Thoughts?

Did you check if we can update the stats when we release the slot as
discussed above? I am not sure if it is easy to do at the time of slot
release because this information might not be accessible there and in
some cases, we might have already released the decoding
context/reorderbuffer where this information is stored. It might be
okay to update this when we stream or spill but let's see if we can do
it easily at the time of slot release.

I have made the changes to update the replication statistics at
replication slot release. Please find the patch attached for the same.
Thoughts?

Regards,
Vignesh

Attachments:

0001-Update-decoding-stats-while-releasing-replication-sl.patchtext/x-patch; charset=US-ASCII; name=0001-Update-decoding-stats-while-releasing-replication-sl.patchDownload
From 2b92c8f2824e1ef731b3c79fa50b6afb75314f76 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 19 Apr 2021 14:54:27 +0530
Subject: [PATCH] Update decoding stats while releasing replication slot.

Currently replication slot statistics are updated at commit/rollback, if
the transaction is interrupted the stats might not get updated. Fixed
this by updating statistics during replication slot release. As part of
the fix, moved the statistics variable from ReorderBuffer to
ReplicationSlot structure to make the stats variable accessible at ReplicationSlotRelease.
---
 src/backend/replication/logical/decode.c      |  6 +-
 src/backend/replication/logical/logical.c     | 71 +++++++++++--------
 .../replication/logical/reorderbuffer.c       | 39 +++++-----
 src/backend/replication/slot.c                |  6 ++
 src/include/replication/logical.h             |  1 -
 src/include/replication/reorderbuffer.h       | 23 ------
 src/include/replication/slot.h                | 24 +++++++
 7 files changed, 95 insertions(+), 75 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7924581cdc..6fbf22f345 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -750,7 +750,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 * not clear that sending more or less frequently than this would be
 	 * better.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats();
 }
 
 /*
@@ -832,7 +832,7 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 * not clear that sending more or less frequently than this would be
 	 * better.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats();
 }
 
 
@@ -889,7 +889,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	}
 
 	/* update the decoding stats */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats();
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35b0c67641..22b30ffdef 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -197,6 +197,15 @@ StartupDecodingContext(List *output_plugin_options,
 		LWLockRelease(ProcArrayLock);
 	}
 
+	slot->spillTxns = 0;
+	slot->spillCount = 0;
+	slot->spillBytes = 0;
+	slot->streamTxns = 0;
+	slot->streamCount = 0;
+	slot->streamBytes = 0;
+	slot->totalTxns = 0;
+	slot->totalBytes = 0;
+
 	ctx->slot = slot;
 
 	ctx->reader = XLogReaderAllocate(wal_segment_size, NULL, cleanup_cb);
@@ -1770,44 +1779,46 @@ ResetLogicalStreamingState(void)
  * Report stats for a slot.
  */
 void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats()
 {
-	ReorderBuffer *rb = ctx->reorder;
 	PgStat_ReplSlotStats repSlotStat;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
+	if (slot->spillBytes <= 0 && slot->streamBytes <= 0 && slot->totalBytes <= 0)
 		return;
 
 	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
-		 rb,
-		 (long long) rb->spillTxns,
-		 (long long) rb->spillCount,
-		 (long long) rb->spillBytes,
-		 (long long) rb->streamTxns,
-		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes,
-		 (long long) rb->totalTxns,
-		 (long long) rb->totalBytes);
-
-	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
-	repSlotStat.spill_txns = rb->spillTxns;
-	repSlotStat.spill_count = rb->spillCount;
-	repSlotStat.spill_bytes = rb->spillBytes;
-	repSlotStat.stream_txns = rb->streamTxns;
-	repSlotStat.stream_count = rb->streamCount;
-	repSlotStat.stream_bytes = rb->streamBytes;
-	repSlotStat.total_txns = rb->totalTxns;
-	repSlotStat.total_bytes = rb->totalBytes;
+		 slot,
+		 (long long) slot->spillTxns,
+		 (long long) slot->spillCount,
+		 (long long) slot->spillBytes,
+		 (long long) slot->streamTxns,
+		 (long long) slot->streamCount,
+		 (long long) slot->streamBytes,
+		 (long long) slot->totalTxns,
+		 (long long) slot->totalBytes);
+
+	namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+	repSlotStat.spill_txns = slot->spillTxns;
+	repSlotStat.spill_count = slot->spillCount;
+	repSlotStat.spill_bytes = slot->spillBytes;
+	repSlotStat.stream_txns = slot->streamTxns;
+	repSlotStat.stream_count = slot->streamCount;
+	repSlotStat.stream_bytes = slot->streamBytes;
+	repSlotStat.total_txns = slot->totalTxns;
+	repSlotStat.total_bytes = slot->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
 
-	rb->spillTxns = 0;
-	rb->spillCount = 0;
-	rb->spillBytes = 0;
-	rb->streamTxns = 0;
-	rb->streamCount = 0;
-	rb->streamBytes = 0;
-	rb->totalTxns = 0;
-	rb->totalBytes = 0;
+	slot->spillTxns = 0;
+	slot->spillCount = 0;
+	slot->spillBytes = 0;
+	slot->streamTxns = 0;
+	slot->streamCount = 0;
+	slot->streamBytes = 0;
+	slot->totalTxns = 0;
+	slot->totalBytes = 0;
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5cb484f032..700ccf542c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -344,15 +344,6 @@ ReorderBufferAllocate(void)
 	buffer->outbufsize = 0;
 	buffer->size = 0;
 
-	buffer->spillTxns = 0;
-	buffer->spillCount = 0;
-	buffer->spillBytes = 0;
-	buffer->streamTxns = 0;
-	buffer->streamCount = 0;
-	buffer->streamBytes = 0;
-	buffer->totalTxns = 0;
-	buffer->totalBytes = 0;
-
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
 	dlist_init(&buffer->toplevel_by_lsn);
@@ -1316,6 +1307,9 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 	ReorderBufferChange *change;
 	ReorderBufferIterTXNEntry *entry;
 	int32		off;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	/* nothing there anymore */
 	if (state->heap->bh_size == 0)
@@ -1369,7 +1363,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		 * Update the total bytes processed before releasing the current set
 		 * of changes and restoring the new set of changes.
 		 */
-		rb->totalBytes += rb->size;
+		slot->totalBytes += rb->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2018,6 +2012,9 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	ReorderBufferChange *volatile specinsert = NULL;
 	volatile bool stream_started = false;
 	ReorderBufferTXN *volatile curtxn = NULL;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	/* build data to be able to lookup the CommandIds of catalog tuples */
 	ReorderBufferBuildTupleCidHash(rb, txn);
@@ -2380,9 +2377,9 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		 * which we have already accounted in ReorderBufferIterTXNNext.
 		 */
 		if (!rbtxn_is_streamed(txn))
-			rb->totalTxns++;
+			slot->totalTxns++;
 
-		rb->totalBytes += rb->size;
+		slot->totalBytes += rb->size;
 
 		/*
 		 * Done with current changes, send the last message for this set of
@@ -3482,6 +3479,9 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	XLogSegNo	curOpenSegNo = 0;
 	Size		spilled = 0;
 	Size		size = txn->size;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	elog(DEBUG2, "spill %u changes in XID %u to disk",
 		 (uint32) txn->nentries_mem, txn->xid);
@@ -3543,11 +3543,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	/* update the statistics iff we have spilled anything */
 	if (spilled)
 	{
-		rb->spillCount += 1;
-		rb->spillBytes += size;
+		slot->spillCount += 1;
+		slot->spillBytes += size;
 
 		/* don't consider already serialized transactions */
-		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+		slot->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
 	}
 
 	Assert(spilled == txn->nentries_mem);
@@ -3818,6 +3818,9 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	CommandId	command_id;
 	Size		stream_bytes;
 	bool		txn_is_streamed;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	/* We can never reach here for a subtransaction. */
 	Assert(txn->toptxn == NULL);
@@ -3911,11 +3914,11 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	ReorderBufferProcessTXN(rb, txn, InvalidXLogRecPtr, snapshot_now,
 							command_id, true);
 
-	rb->streamCount += 1;
-	rb->streamBytes += stream_bytes;
+	slot->streamCount += 1;
+	slot->streamBytes += stream_bytes;
 
 	/* Don't consider already streamed transaction. */
-	rb->streamTxns += (txn_is_streamed) ? 0 : 1;
+	slot->streamTxns += (txn_is_streamed) ? 0 : 1;
 
 	Assert(dlist_is_empty(&txn->changes));
 	Assert(txn->nentries == 0);
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..07e7901fa6 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -500,6 +500,12 @@ ReplicationSlotRelease(void)
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (SlotIsLogical(slot))
+	{
+		/* update the decoding stats that are present in the slot */
+		UpdateDecodingStats();
+	}
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 7dfcb7be18..f109e90692 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -134,6 +134,5 @@ extern bool filter_prepare_cb_wrapper(LogicalDecodingContext *ctx,
 									  TransactionId xid, const char *gid);
 extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id);
 extern void ResetLogicalStreamingState(void);
-extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
 
 #endif
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bfab8303ee..c8295fc9ed 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -602,29 +602,6 @@ struct ReorderBuffer
 
 	/* memory accounting */
 	Size		size;
-
-	/*
-	 * Statistics about transactions spilled to disk.
-	 *
-	 * A single transaction may be spilled repeatedly, which is why we keep
-	 * two different counters. For spilling, the transaction counter includes
-	 * both toplevel transactions and subtransactions.
-	 */
-	int64		spillTxns;		/* number of transactions spilled to disk */
-	int64		spillCount;		/* spill-to-disk invocation counter */
-	int64		spillBytes;		/* amount of data spilled to disk */
-
-	/* Statistics about transactions streamed to the decoding output plugin */
-	int64		streamTxns;		/* number of transactions streamed */
-	int64		streamCount;	/* streaming invocation counter */
-	int64		streamBytes;	/* amount of data streamed */
-
-	/*
-	 * Statistics about all the transactions sent to the decoding output
-	 * plugin
-	 */
-	int64		totalTxns;		/* total number of transactions sent */
-	int64		totalBytes;		/* total amount of data sent */
 };
 
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..d1f30e095c 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -172,6 +172,29 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/*
+	 * Statistics about transactions spilled to disk.
+	 *
+	 * A single transaction may be spilled repeatedly, which is why we keep
+	 * two different counters. For spilling, the transaction counter includes
+	 * both toplevel transactions and subtransactions.
+	 */
+	int64		spillTxns;		/* number of transactions spilled to disk */
+	int64		spillCount;		/* spill-to-disk invocation counter */
+	int64		spillBytes;		/* amount of data spilled to disk */
+
+	/* Statistics about transactions streamed to the decoding output plugin */
+	int64		streamTxns;		/* number of transactions streamed */
+	int64		streamCount;	/* streaming invocation counter */
+	int64		streamBytes;	/* amount of data streamed */
+
+	/*
+	 * Statistics about all the transactions sent to the decoding output
+	 * plugin
+	 */
+	int64		totalTxns;		/* total number of transactions sent */
+	int64		totalBytes;		/* total amount of data sent */
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -231,5 +254,6 @@ extern void StartupReplicationSlots(void);
 extern void CheckPointReplicationSlots(void);
 
 extern void CheckSlotRequirements(void);
+extern void UpdateDecodingStats(void);
 
 #endif							/* SLOT_H */
-- 
2.25.1

#99Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#97)
1 attachment(s)
Re: Replication slot stats misgivings

On Mon, Apr 19, 2021 at 4:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 19, 2021 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

4.
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
..
..
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
- ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ text *slotname_text = PG_GETARG_TEXT_P(0);
+ NameData slotname;

I think with the above changes getting all the slot stats has become
much costlier. Is there any reason why can't we get all the stats from
the new hash_table in one shot and return them to the user?

I think the advantage of this approach would be that it can avoid
showing the stats for already-dropped slots. Like other statistics
views such as pg_stat_all_tables and pg_stat_all_functions, searching
the stats by the name got from pg_replication_slots can show only
available slot stats even if the hash table has garbage slot stats.

Sounds reasonable. However, if the create_slot message is missed, it
will show an empty row for it. See below:

postgres=# select slot_name, total_txns from pg_stat_replication_slots;
slot_name | total_txns
-----------+------------
s1 | 0
s2 | 0
|
(3 rows)

Here, I have manually via debugger skipped sending the create_slot
message for the third slot and we are showing an empty for it. This
won't happen for pg_stat_all_tables, as it will set 0 or other initial
values in such a case. I think we need to address this case.

Good catch. I think it's better to set 0 to all counters and NULL to
reset_stats.

Given that pg_stat_replication_slots doesn’t show garbage slot stats
even if it has, I thought we can avoid checking those garbage stats
frequently. It should not essentially be a problem for the hash table
to have entries up to max_replication_slots regardless of live or
already-dropped.

Yeah, but I guess we still might not save much by not doing it,
especially because for the other cases like tables/functions, we are
doing it without any threshold limit.

Agreed.

I've attached the updated patch, please review it.

I've attached the new version patch that fixed the compilation error
reported off-line by Amit.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v9-0001-Use-HTAB-for-replication-slot-statistics.patchapplication/octet-stream; name=v9-0001-Use-HTAB-for-replication-slot-statistics.patchDownload
From 5ae463675b9993fb41ed28d16bde35be949ae5c4 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 19 Apr 2021 16:40:27 +0900
Subject: [PATCH v9] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for repilcation
slots. But this had two problems in case where a meesage for dropping
a slot gets lost: 1) the stats for the new slot are not recorded and
2) writing beyond the end of the array when after restring the number
of slots whose stats are stored in the stats file exceeds
max_replication_slots.

This commit changes it to HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() checks
if a slot for stats entry in the stats collector still exists or not.
Then send drop-slot-stats message.
---
 doc/src/sgml/monitoring.sgml              |   9 +
 src/backend/catalog/system_views.sql      |  30 +--
 src/backend/postmaster/pgstat.c           | 268 +++++++++++-----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  23 +-
 src/backend/utils/adt/pgstatfuncs.c       | 141 +++++++-----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |   8 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 10 files changed, 274 insertions(+), 227 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5cf083bb77..1157045531 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2753,6 +2753,15 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    </tgroup>
   </table>
 
+  <note>
+   <para>
+    In the case where the reporting message for dropping a replication slot gets lost
+    and a slot with the same name is created, the replication slot statistics will be
+    accumulated into the old slot.  In that case, you can reset the particular
+    replication slot statistics by <function>pg_stat_reset_replication_slot()</function>.
+   </para>
+  </note>
+
  </sect2>
 
  <sect2 id="monitoring-pg-stat-wal-receiver-view">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e96dd73280..fd95465b04 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,20 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.total_txns,
-            s.total_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -984,6 +970,22 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d..0fa8d36c37 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,23 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Check for all replication slots in stats hashtable if they still exist.
+	 */
+	if (replSlotStats)
+	{
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1533,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1806,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -2872,17 +2865,19 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_StatReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
+	PgStat_StatReplSlotEntry *slotent = NULL;
+
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	slotent = pgstat_get_replslot_entry(slotname, false);
+
+	return slotent;
 }
 
 /*
@@ -3654,7 +3649,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3744,11 +3738,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_StatReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3975,12 +3975,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4005,12 +3999,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4197,21 +4185,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+			{
+				PgStat_StatReplSlotEntry slotbuf;
+				PgStat_StatReplSlotEntry *slotent;
+
+				if (fread(&slotbuf, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+					!= sizeof(PgStat_StatReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
-				nReplSlotStats++;
+
+				slotent = pgstat_get_replslot_entry(slotbuf.slotname, true);
+				memcpy(slotent, &slotbuf, sizeof(PgStat_StatReplSlotEntry));
 				break;
+			}
 
 			case 'E':
 				goto done;
@@ -4424,7 +4418,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_StatReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4553,12 +4547,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+					!= sizeof(PgStat_StatReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4765,7 +4759,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5189,20 +5182,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_StatReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_StatReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5210,11 +5209,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5532,46 +5531,29 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats, (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE, NULL);
 	}
 	else
 	{
+		PgStat_StatReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
 		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
-		replSlotStats[idx].total_txns += msg->m_total_txns;
-		replSlotStats[idx].total_bytes += msg->m_total_bytes;
+		slotent->spill_txns += msg->m_spill_txns;
+		slotent->spill_count += msg->m_spill_count;
+		slotent->spill_bytes += msg->m_spill_bytes;
+		slotent->stream_txns += msg->m_stream_txns;
+		slotent->stream_count += msg->m_stream_count;
+		slotent->stream_bytes += msg->m_stream_bytes;
+		slotent->total_txns += msg->m_total_txns;
+		slotent->total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5749,59 +5731,79 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_StatReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_StatReplSlotEntry *slotent;
+	bool	found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStats,
+												   (void *) &name,
+												   create_it ? HASH_ENTER : HASH_FIND,
+												   &found);
+
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_StatReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].total_txns = 0;
-	replSlotStats[i].total_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35b0c67641..00543ede45 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_StatReplSlotEntry repSlotStat;
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
 	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..3cc8879b51 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -329,8 +329,8 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	if (SlotIsLogical(slot))
 	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		PgStat_StatReplSlotEntry repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_StatReplSlotEntry));
 		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 		pgstat_report_replslot(&repSlotStat);
 	}
@@ -349,17 +349,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +370,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +417,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -712,7 +713,11 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * and create messages while holding ReplicationSlotAllocationLock to
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
-	 * create the statistics for the replication slot.
+	 * create the statistics for the replication slot. In case where the
+	 * message for dropping the old slot gets lost and a slot with the same
+	 * name is created, the stats will be accumulated into the old slots since
+	 * we use the slot name as the key. In that case, user can reset the
+	 * particular stats by pg_stat_reset_replication_slot().
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2680190a40..55d2ec0862 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,73 +2305,75 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/*
+ * Get the statistics for the replication slot. If the slot statistics is not
+ * available, return all-zeroes stats.
+ */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text		*slotname_text = PG_GETARG_TEXT_P(0);
+	NameData	slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_StatReplSlotEntry *slotent;
+	PgStat_StatReplSlotEntry allzero;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-	MemoryContextSwitchTo(oldcontext);
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
 
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
+	if (!slotent)
 	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
-
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
-
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
-		values[7] = Int64GetDatum(s->total_txns);
-		values[8] = Int64GetDatum(s->total_bytes);
-
-		if (s->stat_reset_timestamp == 0)
-			nulls[9] = true;
-		else
-			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
-
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+		/* If the slot is not found, set zero to all counters */
+		memset(&allzero, 0, sizeof(PgStat_StatReplSlotEntry));
+		slotent = &allzero;
 	}
 
-	tuplestore_donestoring(tupstore);
+	values[0] = CStringGetTextDatum(NameStr(slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[7] = Int64GetDatum(slotent->total_txns);
+	values[8] = Int64GetDatum(slotent->total_bytes);
 
-	return (Datum) 0;
+	if (slotent->stat_reset_timestamp == 0 || !slotent)
+		nulls[9] = true;
+	else
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b62abcd22c..e4cd514992 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5308,14 +5308,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5c5920b0b5..5f466d9a80 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -917,7 +917,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_StatReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -929,7 +929,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter total_txns;
 	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_StatReplSlotEntry;
 
 
 /*
@@ -1031,7 +1031,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1129,7 +1129,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_StatReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6399f3feef..ce2ecb914b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,7 +2071,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.total_txns,
     s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.24.3 (Apple Git-128)

#100Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#99)
1 attachment(s)
Re: Replication slot stats misgivings

On Tue, Apr 20, 2021 at 9:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the new version patch that fixed the compilation error
reported off-line by Amit.

I was thinking about whether we can someway avoid the below risk:
In case where the
+ * message for dropping the old slot gets lost and a slot with the same
+ * name is created, the stats will be accumulated into the old slots since
+ * we use the slot name as the key. In that case, user can reset the
+ * particular stats by pg_stat_reset_replication_slot().

What if we send a separate message for create slot such that the stats
collector will initialize the entries even if the previous drop
message is lost or came later? If we do that then if the drop message
is lost, the create with same name won't accumulate the stats and if
the drop came later, it will remove the newly created stats but
anyway, later stats from the same slot will again create the slot
entry in the hash table.

Also, I think we can include the test case prepared by Vignesh in the email [1]/messages/by-id/CALDaNm3yBctNFE6X2FV_haRF4uue9okm1_DVE6ZANWvOV_CvYw@mail.gmail.com.

Apart from the above, I have made few minor modifications in the attached patch.
(a) + if (slotent->stat_reset_timestamp == 0 || !slotent)
I don't understand why second part of check is required? By this time
slotent will anyway have some valid value.

(b) + slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStats,
+    (void *) &name,
+    create_it ? HASH_ENTER : HASH_FIND,
+    &found);

It is better to use NameStr here.

(c) made various changes in comments and some other cosmetic changes.

[1]: /messages/by-id/CALDaNm3yBctNFE6X2FV_haRF4uue9okm1_DVE6ZANWvOV_CvYw@mail.gmail.com

--
With Regards,
Amit Kapila.

Attachments:

v10-0001-Use-HTAB-for-replication-slot-statistics.patchapplication/octet-stream; name=v10-0001-Use-HTAB-for-replication-slot-statistics.patchDownload
From 62abfc3ec8e30d8b8963765e6fcf8116e59d88f5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 19 Apr 2021 16:40:27 +0900
Subject: [PATCH v10] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for repilcation
slots. But this had two problems in case where a meesage for dropping
a slot gets lost: 1) the stats for the new slot are not recorded and
2) writing beyond the end of the array when after restring the number
of slots whose stats are stored in the stats file exceeds
max_replication_slots.

This commit changes it to HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() checks
if a slot for stats entry in the stats collector still exists or not.
Then send drop-slot-stats message.
---
 doc/src/sgml/monitoring.sgml              |   9 +
 src/backend/catalog/system_views.sql      |  30 +--
 src/backend/postmaster/pgstat.c           | 271 +++++++++++-----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  23 +-
 src/backend/utils/adt/pgstatfuncs.c       | 145 +++++++-----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |   8 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 10 files changed, 280 insertions(+), 228 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5cf083bb776..11570455316 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2753,6 +2753,15 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    </tgroup>
   </table>
 
+  <note>
+   <para>
+    In the case where the reporting message for dropping a replication slot gets lost
+    and a slot with the same name is created, the replication slot statistics will be
+    accumulated into the old slot.  In that case, you can reset the particular
+    replication slot statistics by <function>pg_stat_reset_replication_slot()</function>.
+   </para>
+  </note>
+
  </sect2>
 
  <sect2 id="monitoring-pg-stat-wal-receiver-view">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e96dd732805..fd95465b04b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,20 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.total_txns,
-            s.total_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -984,6 +970,22 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d6..a497cac0763 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Search for all the dead replication slots in stats hashtable and tell
+	 * the stats collector to drop them.
+	 */
+	if (replSlotStats)
+	{
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -2872,17 +2866,19 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_StatReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
+	PgStat_StatReplSlotEntry *slotent = NULL;
+
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	slotent = pgstat_get_replslot_entry(slotname, false);
+
+	return slotent;
 }
 
 /*
@@ -3654,7 +3650,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3744,11 +3739,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_StatReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3975,12 +3976,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4005,12 +4000,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4197,21 +4186,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+			{
+				PgStat_StatReplSlotEntry slotbuf;
+				PgStat_StatReplSlotEntry *slotent;
+
+				if (fread(&slotbuf, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+					!= sizeof(PgStat_StatReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
 									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
 					goto done;
 				}
-				nReplSlotStats++;
+
+				slotent = pgstat_get_replslot_entry(slotbuf.slotname, true);
+				memcpy(slotent, &slotbuf, sizeof(PgStat_StatReplSlotEntry));
 				break;
+			}
 
 			case 'E':
 				goto done;
@@ -4424,7 +4419,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_StatReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4553,12 +4548,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+					!= sizeof(PgStat_StatReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4765,7 +4760,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5189,20 +5183,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_StatReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_StatReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5210,11 +5210,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5532,46 +5532,31 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats,
+							   (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE,
+							   NULL);
 	}
 	else
 	{
+		PgStat_StatReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
 		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
-		replSlotStats[idx].total_txns += msg->m_total_txns;
-		replSlotStats[idx].total_bytes += msg->m_total_bytes;
+		slotent->spill_txns += msg->m_spill_txns;
+		slotent->spill_count += msg->m_spill_count;
+		slotent->spill_bytes += msg->m_spill_bytes;
+		slotent->stream_txns += msg->m_stream_txns;
+		slotent->stream_count += msg->m_stream_count;
+		slotent->stream_bytes += msg->m_stream_bytes;
+		slotent->total_txns += msg->m_total_txns;
+		slotent->total_bytes += msg->m_total_bytes;
 	}
 }
 
@@ -5749,59 +5734,79 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_StatReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_StatReplSlotEntry *slotent;
+	bool	found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStats,
+													   (void *) NameStr(name),
+													   create_it ? HASH_ENTER : HASH_FIND,
+													   &found);
+
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_StatReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].total_txns = 0;
-	replSlotStats[i].total_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35b0c676412..00543ede45a 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_StatReplSlotEntry repSlotStat;
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
 	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78d..3cc8879b512 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -329,8 +329,8 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 */
 	if (SlotIsLogical(slot))
 	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
+		PgStat_StatReplSlotEntry repSlotStat;
+		MemSet(&repSlotStat, 0, sizeof(PgStat_StatReplSlotEntry));
 		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 		pgstat_report_replslot(&repSlotStat);
 	}
@@ -349,17 +349,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +370,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +417,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -712,7 +713,11 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * and create messages while holding ReplicationSlotAllocationLock to
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
-	 * create the statistics for the replication slot.
+	 * create the statistics for the replication slot. In case where the
+	 * message for dropping the old slot gets lost and a slot with the same
+	 * name is created, the stats will be accumulated into the old slots since
+	 * we use the slot name as the key. In that case, user can reset the
+	 * particular stats by pg_stat_reset_replication_slot().
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2680190a402..f2fd919e571 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,73 +2305,77 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/*
+ * Get the statistics for the replication slot. If the slot statistics is not
+ * available, return all-zeroes stats.
+ */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text		*slotname_text = PG_GETARG_TEXT_P(0);
+	NameData	slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_StatReplSlotEntry *slotent;
+	PgStat_StatReplSlotEntry allzero;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-	MemoryContextSwitchTo(oldcontext);
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
+	if (!slotent)
 	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
-
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
-
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
-		values[7] = Int64GetDatum(s->total_txns);
-		values[8] = Int64GetDatum(s->total_bytes);
-
-		if (s->stat_reset_timestamp == 0)
-			nulls[9] = true;
-		else
-			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
-
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+		/*
+		 * If the slot is not found, initialise its stats. This is possible if
+		 * the create slot message is lost.
+		 */
+		memset(&allzero, 0, sizeof(PgStat_StatReplSlotEntry));
+		slotent = &allzero;
 	}
 
-	tuplestore_donestoring(tupstore);
+	values[0] = CStringGetTextDatum(NameStr(slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[7] = Int64GetDatum(slotent->total_txns);
+	values[8] = Int64GetDatum(slotent->total_bytes);
 
-	return (Datum) 0;
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[9] = true;
+	else
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b62abcd22c0..e4cd5149925 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5308,14 +5308,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5c5920b0b5f..5f466d9a804 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -917,7 +917,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_StatReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -929,7 +929,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter total_txns;
 	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_StatReplSlotEntry;
 
 
 /*
@@ -1031,7 +1031,7 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1129,7 +1129,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_StatReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50df..357068403a1 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6399f3feef8..ce2ecb914be 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,7 +2071,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.total_txns,
     s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
-- 
2.28.0.windows.1

#101vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#99)
Re: Replication slot stats misgivings

On Tue, Apr 20, 2021 at 9:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 19, 2021 at 4:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 19, 2021 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

4.
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
..
..
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
- ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ text *slotname_text = PG_GETARG_TEXT_P(0);
+ NameData slotname;

I think with the above changes getting all the slot stats has become
much costlier. Is there any reason why can't we get all the stats from
the new hash_table in one shot and return them to the user?

I think the advantage of this approach would be that it can avoid
showing the stats for already-dropped slots. Like other statistics
views such as pg_stat_all_tables and pg_stat_all_functions, searching
the stats by the name got from pg_replication_slots can show only
available slot stats even if the hash table has garbage slot stats.

Sounds reasonable. However, if the create_slot message is missed, it
will show an empty row for it. See below:

postgres=# select slot_name, total_txns from pg_stat_replication_slots;
slot_name | total_txns
-----------+------------
s1 | 0
s2 | 0
|
(3 rows)

Here, I have manually via debugger skipped sending the create_slot
message for the third slot and we are showing an empty for it. This
won't happen for pg_stat_all_tables, as it will set 0 or other initial
values in such a case. I think we need to address this case.

Good catch. I think it's better to set 0 to all counters and NULL to
reset_stats.

Given that pg_stat_replication_slots doesn’t show garbage slot stats
even if it has, I thought we can avoid checking those garbage stats
frequently. It should not essentially be a problem for the hash table
to have entries up to max_replication_slots regardless of live or
already-dropped.

Yeah, but I guess we still might not save much by not doing it,
especially because for the other cases like tables/functions, we are
doing it without any threshold limit.

Agreed.

I've attached the updated patch, please review it.

I've attached the new version patch that fixed the compilation error
reported off-line by Amit.

Thanks for the updated patch, few comments:
1) We can change "slotent = pgstat_get_replslot_entry(slotname,
false);" to "return pgstat_get_replslot_entry(slotname, false);" and
remove the slotent variable.

+       PgStat_StatReplSlotEntry *slotent = NULL;
+
        backend_read_statsfile();
-       *nslots_p = nReplSlotStats;
-       return replSlotStats;
+       slotent = pgstat_get_replslot_entry(slotname, false);
+
+       return slotent;

2) Should we change PGSTAT_FILE_FORMAT_ID as the statistic file format
has changed for replication statistics?

3) We can include PgStat_StatReplSlotEntry in typedefs.lst and remove
PgStat_ReplSlotStats from typedefs.lst

4) Few indentation issues are there, we can run pgindent on pgstat.c changes:
                        case 'R':
-                               if
(fread(&replSlotStats[nReplSlotStats], 1,
sizeof(PgStat_ReplSlotStats), fpin)
-                                       != sizeof(PgStat_ReplSlotStats))
+                       {
+                               PgStat_StatReplSlotEntry slotbuf;
+                               PgStat_StatReplSlotEntry *slotent;
+
+                               if (fread(&slotbuf, 1,
sizeof(PgStat_StatReplSlotEntry), fpin)
+                                       != sizeof(PgStat_StatReplSlotEntry))

Regards,
Vignesh

#102Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#100)
Re: Replication slot stats misgivings

On Tue, Apr 20, 2021 at 6:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 9:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the new version patch that fixed the compilation error
reported off-line by Amit.

I was thinking about whether we can someway avoid the below risk:
In case where the
+ * message for dropping the old slot gets lost and a slot with the same
+ * name is created, the stats will be accumulated into the old slots since
+ * we use the slot name as the key. In that case, user can reset the
+ * particular stats by pg_stat_reset_replication_slot().

What if we send a separate message for create slot such that the stats
collector will initialize the entries even if the previous drop
message is lost or came later? If we do that then if the drop message
is lost, the create with same name won't accumulate the stats and if
the drop came later, it will remove the newly created stats but
anyway, later stats from the same slot will again create the slot
entry in the hash table.

Sounds good to me. There is still little chance to happen if messages
for both creating and dropping slots with the same name got lost, but
it's unlikely to happen in practice.

Also, I think we can include the test case prepared by Vignesh in the email [1].

Apart from the above, I have made few minor modifications in the attached patch.
(a) + if (slotent->stat_reset_timestamp == 0 || !slotent)
I don't understand why second part of check is required? By this time
slotent will anyway have some valid value.

(b) + slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStats,
+    (void *) &name,
+    create_it ? HASH_ENTER : HASH_FIND,
+    &found);

It is better to use NameStr here.

(c) made various changes in comments and some other cosmetic changes.

All the above changes make sense to me.

I'll submit the updated patch soon.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#103Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#101)
1 attachment(s)
Re: Replication slot stats misgivings

On Tue, Apr 20, 2021 at 7:22 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 20, 2021 at 9:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 19, 2021 at 4:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 19, 2021 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

4.
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
..
..
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
- ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ text *slotname_text = PG_GETARG_TEXT_P(0);
+ NameData slotname;

I think with the above changes getting all the slot stats has become
much costlier. Is there any reason why can't we get all the stats from
the new hash_table in one shot and return them to the user?

I think the advantage of this approach would be that it can avoid
showing the stats for already-dropped slots. Like other statistics
views such as pg_stat_all_tables and pg_stat_all_functions, searching
the stats by the name got from pg_replication_slots can show only
available slot stats even if the hash table has garbage slot stats.

Sounds reasonable. However, if the create_slot message is missed, it
will show an empty row for it. See below:

postgres=# select slot_name, total_txns from pg_stat_replication_slots;
slot_name | total_txns
-----------+------------
s1 | 0
s2 | 0
|
(3 rows)

Here, I have manually via debugger skipped sending the create_slot
message for the third slot and we are showing an empty for it. This
won't happen for pg_stat_all_tables, as it will set 0 or other initial
values in such a case. I think we need to address this case.

Good catch. I think it's better to set 0 to all counters and NULL to
reset_stats.

Given that pg_stat_replication_slots doesn’t show garbage slot stats
even if it has, I thought we can avoid checking those garbage stats
frequently. It should not essentially be a problem for the hash table
to have entries up to max_replication_slots regardless of live or
already-dropped.

Yeah, but I guess we still might not save much by not doing it,
especially because for the other cases like tables/functions, we are
doing it without any threshold limit.

Agreed.

I've attached the updated patch, please review it.

I've attached the new version patch that fixed the compilation error
reported off-line by Amit.

Thanks for the updated patch, few comments:

Thank you for the review comments.

1) We can change "slotent = pgstat_get_replslot_entry(slotname,
false);" to "return pgstat_get_replslot_entry(slotname, false);" and
remove the slotent variable.

+       PgStat_StatReplSlotEntry *slotent = NULL;
+
backend_read_statsfile();
-       *nslots_p = nReplSlotStats;
-       return replSlotStats;
+       slotent = pgstat_get_replslot_entry(slotname, false);
+
+       return slotent;

Fixed.

2) Should we change PGSTAT_FILE_FORMAT_ID as the statistic file format
has changed for replication statistics?

The struct name is changed but I think the statistics file format has
not changed by this patch. No?

3) We can include PgStat_StatReplSlotEntry in typedefs.lst and remove
PgStat_ReplSlotStats from typedefs.lst

Fixed.

4) Few indentation issues are there, we can run pgindent on pgstat.c changes:
case 'R':
-                               if
(fread(&replSlotStats[nReplSlotStats], 1,
sizeof(PgStat_ReplSlotStats), fpin)
-                                       != sizeof(PgStat_ReplSlotStats))
+                       {
+                               PgStat_StatReplSlotEntry slotbuf;
+                               PgStat_StatReplSlotEntry *slotent;
+
+                               if (fread(&slotbuf, 1,
sizeof(PgStat_StatReplSlotEntry), fpin)
+                                       != sizeof(PgStat_StatReplSlotEntry))

Fixed.

I've attached the patch. In addition to the test Vignesh prepared, I
added one test for the message for creating a slot that checks if the
statistics are initialized after re-creating the same name slot.
Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v11-0001-Use-HTAB-for-replication-slot-statistics.patchapplication/octet-stream; name=v11-0001-Use-HTAB-for-replication-slot-statistics.patchDownload
From d3c27f50ab16f3ebfd90e762c908f8f15e9a6631 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 19 Apr 2021 16:40:27 +0900
Subject: [PATCH v11] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for replication
slots. But this had two problems in the cases where a message for
dropping a slot gets lost, which leaves the stats for already-dropped
slot in the array: 1) the stats for the new slot are not recorded
if the array is full and 2) writing beyond the end of the array when
after restarting the number of slots whose stats are stored in the
stats file exceeds max_replication_slots.

This commit changes it to HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() search
for all the dead replication slots in stats hashtable and tell the
collector to remove them. To not show the stats for the already-dropped
slots, pg_replication_slots view searches slot stats by the slot name
taken from pg_replication_slots.

Also, we send a message for creating a slot at slot creation,
initializing the stats. This reduces the possibility that the stats
are accumulated into the old slot stats when a message for dropping a
slot gets lost.
---
 contrib/test_decoding/t/001_repl_stats.pl |  76 +++++-
 doc/src/sgml/monitoring.sgml              |  10 +
 src/backend/catalog/system_views.sql      |  30 ++-
 src/backend/postmaster/pgstat.c           | 313 ++++++++++++----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  28 +-
 src/backend/utils/adt/pgstatfuncs.c       | 145 ++++++----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |  10 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 src/tools/pgindent/typedefs.list          |   2 +-
 12 files changed, 389 insertions(+), 247 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 11b6cd9b9c..1a4fad301c 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -2,9 +2,10 @@
 # drop replication slot and restart.
 use strict;
 use warnings;
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 3;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -12,6 +13,20 @@ $node->init(allows_streaming => 'logical');
 $node->append_conf('postgresql.conf', 'synchronous_commit = on');
 $node->start;
 
+# Check that replicatoin slot stats are expected.
+sub test_slot_stats
+{
+	my ($node, $expected, $msg) = @_;
+
+	my $result = $node->safe_psql(
+		'postgres', qq[
+		SELECT slot_name, total_txns > 0 AS total_txn,
+			   total_bytes > 0 AS total_bytes
+			   FROM pg_stat_replication_slots
+			   ORDER BY slot_name]);
+	is($result, $expected, $msg);
+}
+
 # Create table.
 $node->safe_psql('postgres',
         "CREATE TABLE test_repl_stat(col1 int)");
@@ -57,14 +72,59 @@ $node->start;
 
 # Verify statistics data present in pg_stat_replication_slots are sane after
 # restart.
-my $result = $node->safe_psql('postgres',
-	"SELECT slot_name, total_txns > 0 AS total_txn,
-	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
-	ORDER BY slot_name"
-);
-is($result, qq(regression_slot1|t|t
+test_slot_stats(
+	$node,
+	qq(regression_slot1|t|t
+regression_slot2|t|t
+regression_slot3|t|t),
+	'check replication statistics are updated');
+
+# Test to remove one of the replication slots and adjust
+# max_replication_slots accordingly to the number of slots. This leads
+# to a mismatch between the number of slots present in the stats file and the
+# number of stats present in the shared memory, simulating the scenario for
+# drop slot message lost by the statistics collector process. We verify
+# replication statistics data is fine after restart.
+
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+test_slot_stats(
+	$node,
+	qq(regression_slot1|t|t
+regression_slot2|t|t),
+	'check replication statistics after removing the slot directory');
+
+# Create the replication slot with the same name as the one that we just
+# removed, and verify if the statistics for the new slot are initialized.
+# We don't need to wait for the message for creating the slot to reach the
+# collector since pg_stat_replication_slots view shows all-zeroed statistics
+# even if the slot statistics don't exist in the hash table. However, if
+# the slot uses the old statistics used until the slot being dropped, the
+# view shows non-zeros statistics, which must not happen.
+$node->append_conf('postgresql.conf', 'max_replication_slots = 3');
+$node->restart;
+
+$node->safe_psql(
+	'postgres', qq[
+	SELECT pg_create_logical_replication_slot('regression_slot3', 'test_decoding');
+]);
+
+# Verify only statistics for regression_slot3 are initialized.
+test_slot_stats(
+	$node,
+	qq(regression_slot1|t|t
 regression_slot2|t|t
-regression_slot3|t|t), 'check replication statistics are updated');
+regression_slot3|f|f),
+	'check replication statistics after re-creating the same name slot');
 
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5cf083bb77..d952c93e9c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2753,6 +2753,16 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    </tgroup>
   </table>
 
+  <note>
+   <para>
+    In the case where the reporting messages for both dropping and creating a
+    replication slot with the same name got lost, the replication slot statistics
+    will be accumulated into the old slot.  In that case, you can reset the
+    particular replication slot statistics by
+    <function>pg_stat_reset_replication_slot()</function>.
+   </para>
+  </note>
+
  </sect2>
 
  <sect2 id="monitoring-pg-stat-wal-receiver-view">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e96dd73280..fd95465b04 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,20 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.total_txns,
-            s.total_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -984,6 +970,22 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d..1e6f132e7c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStats = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Search for all the dead replication slots in stats hashtable and tell
+	 * the stats collector to drop them.
+	 */
+	if (replSlotStats)
+	{
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1822,6 +1816,7 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
 	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
+	msg.m_create = false;
 	msg.m_drop = false;
 	msg.m_spill_txns = repSlotStat->spill_txns;
 	msg.m_spill_count = repSlotStat->spill_count;
@@ -1834,6 +1829,24 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
+/* ----------
+ * pgstat_report_replslot_create() -
+ *
+ *	Tell the collector about creating the replication slot.
+ * ----------
+ */
+void
+pgstat_report_replslot_create(const char *slotname)
+{
+	PgStat_MsgReplSlot msg;
+
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
+	namestrcpy(&msg.m_slotname, slotname);
+	msg.m_create = true;
+	msg.m_drop = false;
+	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
+}
+
 /* ----------
  * pgstat_report_replslot_drop() -
  *
@@ -1847,6 +1860,7 @@ pgstat_report_replslot_drop(const char *slotname)
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
 	namestrcpy(&msg.m_slotname, slotname);
+	msg.m_create = false;
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -2872,17 +2886,15 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_StatReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	return pgstat_get_replslot_entry(slotname, false);
 }
 
 /*
@@ -3654,7 +3666,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3744,11 +3755,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStats)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStats);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotentry, sizeof(PgStat_StatReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3975,12 +3992,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4005,12 +4016,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4197,21 +4202,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
 				{
-					ereport(pgStatRunningInCollector ? LOG : WARNING,
-							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-					goto done;
+					PgStat_StatReplSlotEntry slotbuf;
+					PgStat_StatReplSlotEntry *slotent;
+
+					if (fread(&slotbuf, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+						!= sizeof(PgStat_StatReplSlotEntry))
+					{
+						ereport(pgStatRunningInCollector ? LOG : WARNING,
+								(errmsg("corrupted statistics file \"%s\"",
+										statfile)));
+						goto done;
+					}
+
+					slotent = pgstat_get_replslot_entry(slotbuf.slotname, true);
+					memcpy(slotent, &slotbuf, sizeof(PgStat_StatReplSlotEntry));
+					break;
 				}
-				nReplSlotStats++;
-				break;
 
 			case 'E':
 				goto done;
@@ -4424,7 +4435,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_StatReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4553,12 +4564,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+					!= sizeof(PgStat_StatReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4765,7 +4776,6 @@ pgstat_clear_snapshot(void)
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
 	replSlotStats = NULL;
-	nReplSlotStats = 0;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5189,20 +5199,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_StatReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStats == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStats);
+		while ((slotent = (PgStat_StatReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
 		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5210,11 +5226,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5532,46 +5548,45 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
+		Assert(!msg->m_create);
+
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStats != NULL)
+			(void) hash_search(replSlotStats,
+							   (void *) NameStr(msg->m_slotname),
+							   HASH_REMOVE,
+							   NULL);
 	}
 	else
 	{
-		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
-		replSlotStats[idx].total_txns += msg->m_total_txns;
-		replSlotStats[idx].total_bytes += msg->m_total_bytes;
+		PgStat_StatReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
+		if (msg->m_create)
+		{
+			/*
+			 * If the message for dropping the slot with the same name gets
+			 * lost, slotent has stats for the old slot. So we initialize
+			 * all counters at slot creation.
+			 */
+			pgstat_reset_replslot(slotent, 0);
+		}
+		else
+		{
+			/* Update the replication slot statistics */
+			slotent->spill_txns += msg->m_spill_txns;
+			slotent->spill_count += msg->m_spill_count;
+			slotent->spill_bytes += msg->m_spill_bytes;
+			slotent->stream_txns += msg->m_stream_txns;
+			slotent->stream_count += msg->m_stream_count;
+			slotent->stream_bytes += msg->m_stream_bytes;
+			slotent->total_txns += msg->m_total_txns;
+			slotent->total_bytes += msg->m_total_bytes;
+		}
 	}
 }
 
@@ -5749,59 +5764,79 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_StatReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_StatReplSlotEntry *slotent;
+	bool	found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	/*
+	 * Create the replication slot stats hash table if we don't have
+	 * it already.
+	 */
+	if (replSlotStats == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+
+		replSlotStats = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStats,
+													   (void *) NameStr(name),
+													   create_it ? HASH_ENTER : HASH_FIND,
+													   &found);
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
+
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		memset(slotent, 0, sizeof(PgStat_StatReplSlotEntry));
+		namestrcpy(&(slotent->slotname), NameStr(name));
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].total_txns = 0;
-	replSlotStats[i].total_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35b0c67641..00543ede45 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_StatReplSlotEntry repSlotStat;
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
 	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..e0c96151ca 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,12 +328,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
-		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
-		pgstat_report_replslot(&repSlotStat);
-	}
+		pgstat_report_replslot_create(NameStr(slot->data.name));
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
@@ -349,17 +344,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +365,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +412,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -712,7 +708,13 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * and create messages while holding ReplicationSlotAllocationLock to
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
-	 * create the statistics for the replication slot.
+	 * create the statistics for the replication slot.  In the cases where the
+	 * message for dropping the old slot gets lost and a slot with the same
+	 * name is created, since the stats will be initialized by the message
+	 * for creating the slot the statistics are not accumulated into the
+	 * old slot unless the messages for both creating and dropping slots with
+	 * the same name got lost.  Just in case it happens, the user can reset
+	 * the particular stats by pg_stat_reset_replication_slot().
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2680190a40..f2fd919e57 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,73 +2305,77 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/*
+ * Get the statistics for the replication slot. If the slot statistics is not
+ * available, return all-zeroes stats.
+ */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text		*slotname_text = PG_GETARG_TEXT_P(0);
+	NameData	slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_StatReplSlotEntry *slotent;
+	PgStat_StatReplSlotEntry allzero;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-	MemoryContextSwitchTo(oldcontext);
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
+	if (!slotent)
 	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
-
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
-
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
-		values[7] = Int64GetDatum(s->total_txns);
-		values[8] = Int64GetDatum(s->total_bytes);
-
-		if (s->stat_reset_timestamp == 0)
-			nulls[9] = true;
-		else
-			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
-
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+		/*
+		 * If the slot is not found, initialise its stats. This is possible if
+		 * the create slot message is lost.
+		 */
+		memset(&allzero, 0, sizeof(PgStat_StatReplSlotEntry));
+		slotent = &allzero;
 	}
 
-	tuplestore_donestoring(tupstore);
+	values[0] = CStringGetTextDatum(NameStr(slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[7] = Int64GetDatum(slotent->total_txns);
+	values[8] = Int64GetDatum(slotent->total_bytes);
 
-	return (Datum) 0;
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[9] = true;
+	else
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b62abcd22c..e4cd514992 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5308,14 +5308,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5c5920b0b5..1ce363e7d1 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -541,6 +541,7 @@ typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
 	NameData	m_slotname;
+	bool		m_create;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -917,7 +918,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_StatReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -929,7 +930,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter total_txns;
 	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_StatReplSlotEntry;
 
 
 /*
@@ -1031,7 +1032,8 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_create(const char *slotname);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1129,7 +1131,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_StatReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6399f3feef..ce2ecb914b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,7 +2071,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.total_txns,
     s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c7aff677d4..878b67a276 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1870,12 +1870,12 @@ PgStat_MsgTabstat
 PgStat_MsgTempFile
 PgStat_MsgVacuum
 PgStat_MsgWal
-PgStat_ReplSlotStats
 PgStat_SLRUStats
 PgStat_Shared_Reset_Target
 PgStat_Single_Reset_Type
 PgStat_StatDBEntry
 PgStat_StatFuncEntry
+PgStat_StatReplSlotEntry
 PgStat_StatTabEntry
 PgStat_SubXactStatus
 PgStat_TableCounts
-- 
2.24.3 (Apple Git-128)

#104Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#103)
Re: Replication slot stats misgivings

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I have one question:

+ /*
+ * Create the replication slot stats hash table if we don't have
+ * it already.
+ */
+ if (replSlotStats == NULL)
  {
- if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
- return i; /* found */
+ HASHCTL hash_ctl;
+
+ hash_ctl.keysize = sizeof(NameData);
+ hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+ hash_ctl.hcxt = pgStatLocalContext;
+
+ replSlotStats = hash_create("Replication slots hash",
+ PGSTAT_REPLSLOT_HASH_SIZE,
+ &hash_ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
  }

It seems to me that the patch is always creating a hash table in
pgStatLocalContext? AFAIU, we need to create it in pgStatLocalContext
when we read stats via backend_read_statsfile so that we can clear it
at the end of the transaction. The db/function stats seems to be doing
the same. Is there a reason why here we need to always create it in
pgStatLocalContext?

--
With Regards,
Amit Kapila.

#105vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#103)
Re: Replication slot stats misgivings

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 20, 2021 at 7:22 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 20, 2021 at 9:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 19, 2021 at 4:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 19, 2021 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

4.
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
..
..
-/* Get the statistics for the replication slots */
+/* Get the statistics for the replication slot */
Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
{
#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
- ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ text *slotname_text = PG_GETARG_TEXT_P(0);
+ NameData slotname;

I think with the above changes getting all the slot stats has become
much costlier. Is there any reason why can't we get all the stats from
the new hash_table in one shot and return them to the user?

I think the advantage of this approach would be that it can avoid
showing the stats for already-dropped slots. Like other statistics
views such as pg_stat_all_tables and pg_stat_all_functions, searching
the stats by the name got from pg_replication_slots can show only
available slot stats even if the hash table has garbage slot stats.

Sounds reasonable. However, if the create_slot message is missed, it
will show an empty row for it. See below:

postgres=# select slot_name, total_txns from pg_stat_replication_slots;
slot_name | total_txns
-----------+------------
s1 | 0
s2 | 0
|
(3 rows)

Here, I have manually via debugger skipped sending the create_slot
message for the third slot and we are showing an empty for it. This
won't happen for pg_stat_all_tables, as it will set 0 or other initial
values in such a case. I think we need to address this case.

Good catch. I think it's better to set 0 to all counters and NULL to
reset_stats.

Given that pg_stat_replication_slots doesn’t show garbage slot stats
even if it has, I thought we can avoid checking those garbage stats
frequently. It should not essentially be a problem for the hash table
to have entries up to max_replication_slots regardless of live or
already-dropped.

Yeah, but I guess we still might not save much by not doing it,
especially because for the other cases like tables/functions, we are
doing it without any threshold limit.

Agreed.

I've attached the updated patch, please review it.

I've attached the new version patch that fixed the compilation error
reported off-line by Amit.

Thanks for the updated patch, few comments:

Thank you for the review comments.

1) We can change "slotent = pgstat_get_replslot_entry(slotname,
false);" to "return pgstat_get_replslot_entry(slotname, false);" and
remove the slotent variable.

+       PgStat_StatReplSlotEntry *slotent = NULL;
+
backend_read_statsfile();
-       *nslots_p = nReplSlotStats;
-       return replSlotStats;
+       slotent = pgstat_get_replslot_entry(slotname, false);
+
+       return slotent;

Fixed.

2) Should we change PGSTAT_FILE_FORMAT_ID as the statistic file format
has changed for replication statistics?

The struct name is changed but I think the statistics file format has
not changed by this patch. No?

I tried to create stats on head and then applied this patch and tried
reading the stats, it could not get the values, the backtrace for the
same is:
(gdb) bt
#0 0x000055fe12f8a93d in pg_detoast_datum (datum=0x7f7f7f7f7f7f7f7f)
at fmgr.c:1727
#1 0x000055fe12ec2a03 in pg_stat_get_replication_slot
(fcinfo=0x55fe1357e150) at pgstatfuncs.c:2316
#2 0x000055fe12b6af23 in ExecMakeTableFunctionResult
(setexpr=0x55fe13563c28, econtext=0x55fe13563b90,
argContext=0x55fe1357e030, expectedDesc=0x55fe13564968,
randomAccess=false) at execSRF.c:234
#3 0x000055fe12b87ba3 in FunctionNext (node=0x55fe13563a78) at
nodeFunctionscan.c:95
#4 0x000055fe12b6c929 in ExecScanFetch (node=0x55fe13563a78,
accessMtd=0x55fe12b87aee <FunctionNext>, recheckMtd=0x55fe12b87eea
<FunctionRecheck>) at execScan.c:133
#5 0x000055fe12b6c9a2 in ExecScan (node=0x55fe13563a78,
accessMtd=0x55fe12b87aee <FunctionNext>, recheckMtd=0x55fe12b87eea
<FunctionRecheck>) at execScan.c:182
#6 0x000055fe12b87f40 in ExecFunctionScan (pstate=0x55fe13563a78) at
nodeFunctionscan.c:270
#7 0x000055fe12b687eb in ExecProcNodeFirst (node=0x55fe13563a78) at
execProcnode.c:462
#8 0x000055fe12b5c713 in ExecProcNode (node=0x55fe13563a78) at
../../../src/include/executor/executor.h:257
#9 0x000055fe12b5f147 in ExecutePlan (estate=0x55fe135635f0,
planstate=0x55fe13563a78, use_parallel_mode=false,
operation=CMD_SELECT, sendTuples=true, numberTuples=0,
direction=ForwardScanDirection, dest=0x55fe13579558,
execute_once=true) at execMain.c:1551
#10 0x000055fe12b5cded in standard_ExecutorRun
(queryDesc=0x55fe1349acd0, direction=ForwardScanDirection, count=0,
execute_once=true) at execMain.c:361
#11 0x000055fe12b5cbfc in ExecutorRun (queryDesc=0x55fe1349acd0,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:305
#12 0x000055fe12dca9ce in PortalRunSelect (portal=0x55fe134ed2f0,
forward=true, count=0, dest=0x55fe13579558) at pquery.c:912
#13 0x000055fe12dca607 in PortalRun (portal=0x55fe134ed2f0,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x55fe13579558, altdest=0x55fe13579558,
qc=0x7ffefa53cd30) at pquery.c:756
#14 0x000055fe12dc3915 in exec_simple_query
(query_string=0x55fe134796e0 "select * from pg_stat_replication_slots
;") at postgres.c:1196

I feel we can change CATALOG_VERSION_NO so that we will get this error
"The database cluster was initialized with CATALOG_VERSION_NO
2021XXXXX, but the server was compiled with CATALOG_VERSION_NO
2021XXXXX." which will prevent the above issue.

Regards,
Vignesh

#106Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#105)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 9:39 AM vignesh C <vignesh21@gmail.com> wrote:

I feel we can change CATALOG_VERSION_NO so that we will get this error
"The database cluster was initialized with CATALOG_VERSION_NO
2021XXXXX, but the server was compiled with CATALOG_VERSION_NO
2021XXXXX." which will prevent the above issue.

Right, but we normally do that just before commit. We might want to
mention it in the commit message just as a Note so that we don't
forget to bump it before commit but otherwise, we don't need to change
it in the patch.

--
With Regards,
Amit Kapila.

#107vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#106)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 9:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 21, 2021 at 9:39 AM vignesh C <vignesh21@gmail.com> wrote:

I feel we can change CATALOG_VERSION_NO so that we will get this error
"The database cluster was initialized with CATALOG_VERSION_NO
2021XXXXX, but the server was compiled with CATALOG_VERSION_NO
2021XXXXX." which will prevent the above issue.

Right, but we normally do that just before commit. We might want to
mention it in the commit message just as a Note so that we don't
forget to bump it before commit but otherwise, we don't need to change
it in the patch.

Yes, that is fine with me.

Regards,
Vignesh

#108Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#103)
Re: Replication slot stats misgivings

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the patch. In addition to the test Vignesh prepared, I
added one test for the message for creating a slot that checks if the
statistics are initialized after re-creating the same name slot.

I am not sure how much useful your new test is because you are testing
it for slot name for which we have removed the slot file. It is not
related to stat messages this patch is sending. I think we can leave
that for now. One other minor comment:

- * create the statistics for the replication slot.
+ * create the statistics for the replication slot.  In the cases where the
+ * message for dropping the old slot gets lost and a slot with the same
+ * name is created, since the stats will be initialized by the message
+ * for creating the slot the statistics are not accumulated into the
+ * old slot unless the messages for both creating and dropping slots with
+ * the same name got lost.  Just in case it happens, the user can reset
+ * the particular stats by pg_stat_reset_replication_slot().

I think we can change it to something like: " XXX In case, the
messages for creation and drop slot of the same name get lost and
create happens before (auto)vacuum cleans up the dead slot, the stats
will be accumulated into the old slot. One can imagine having OIDs for
each slot to avoid the accumulation of stats but that doesn't seem
worth doing as in practice this won't happen frequently.". Also, I am
not sure after your recent change whether it is a good idea to mention
something in docs. What do you think?

--
With Regards,
Amit Kapila.

#109Dilip Kumar
dilipbalaut@gmail.com
In reply to: Masahiko Sawada (#103)
Re: Replication slot stats misgivings

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the patch. In addition to the test Vignesh prepared, I
added one test for the message for creating a slot that checks if the
statistics are initialized after re-creating the same name slot.
Please review it.

Overall the patch looks good to me. However, I have one question, I
did not understand the reason behind moving the below code from
"pgstat_reset_replslot_counter" to "pg_stat_reset_replication_slot"?

+        /*
+         * Check if the slot exists with the given name. It is possible that by
+         * the time this message is executed the slot is dropped but at least
+         * this check will ensure that the given name is for a valid slot.
+         */
+        slot = SearchNamedReplicationSlot(target, true);
+
+        if (!slot)
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("replication slot \"%s\" does not exist",
+                            target)));
+
+        /*
+         * Nothing to do for physical slots as we collect stats only for
+         * logical slots.
+         */
+        if (SlotIsPhysical(slot))
+            PG_RETURN_VOID();
+    }
+
     pgstat_reset_replslot_counter(target);

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#110Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#104)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 12:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I have one question:

+ /*
+ * Create the replication slot stats hash table if we don't have
+ * it already.
+ */
+ if (replSlotStats == NULL)
{
- if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
- return i; /* found */
+ HASHCTL hash_ctl;
+
+ hash_ctl.keysize = sizeof(NameData);
+ hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+ hash_ctl.hcxt = pgStatLocalContext;
+
+ replSlotStats = hash_create("Replication slots hash",
+ PGSTAT_REPLSLOT_HASH_SIZE,
+ &hash_ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}

It seems to me that the patch is always creating a hash table in
pgStatLocalContext? AFAIU, we need to create it in pgStatLocalContext
when we read stats via backend_read_statsfile so that we can clear it
at the end of the transaction. The db/function stats seems to be doing
the same. Is there a reason why here we need to always create it in
pgStatLocalContext?

I wanted to avoid creating the hash table if there is no replication
slot. But as you pointed out, we create the hash table even when
lookup (i.g., create_it is false), which is bad. So I think we can
have pgstat_get_replslot_entry() return NULL without creating the hash
table if the hash table is NULL and create_it is false so that backend
processes don’t create the hash table, not via
backend_read_statsfile(). Or another idea would be to always create
the hash table in pgstat_read_statsfiles(). That way, it would
simplify the code but could waste the memory if there is no
replication slot. I slightly prefer the former but what do you think?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#111Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#108)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the patch. In addition to the test Vignesh prepared, I
added one test for the message for creating a slot that checks if the
statistics are initialized after re-creating the same name slot.

I am not sure how much useful your new test is because you are testing
it for slot name for which we have removed the slot file. It is not
related to stat messages this patch is sending. I think we can leave
that for now.

I might be missing something but I think the test is related to the
message for creating a slot that initializes all counters. No? If
there is no that message, we will end up getting old stats if a
message for dropping slot gets lost (simulated by dropping slot file)
and the same name slot is created.

One other minor comment:

- * create the statistics for the replication slot.
+ * create the statistics for the replication slot.  In the cases where the
+ * message for dropping the old slot gets lost and a slot with the same
+ * name is created, since the stats will be initialized by the message
+ * for creating the slot the statistics are not accumulated into the
+ * old slot unless the messages for both creating and dropping slots with
+ * the same name got lost.  Just in case it happens, the user can reset
+ * the particular stats by pg_stat_reset_replication_slot().

I think we can change it to something like: " XXX In case, the
messages for creation and drop slot of the same name get lost and
create happens before (auto)vacuum cleans up the dead slot, the stats
will be accumulated into the old slot. One can imagine having OIDs for
each slot to avoid the accumulation of stats but that doesn't seem
worth doing as in practice this won't happen frequently.". Also, I am
not sure after your recent change whether it is a good idea to mention
something in docs. What do you think?

Both points make sense to me. I'll update the comment and remove the
mention in the doc in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#112Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#110)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 2:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 21, 2021 at 12:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I have one question:

+ /*
+ * Create the replication slot stats hash table if we don't have
+ * it already.
+ */
+ if (replSlotStats == NULL)
{
- if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
- return i; /* found */
+ HASHCTL hash_ctl;
+
+ hash_ctl.keysize = sizeof(NameData);
+ hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+ hash_ctl.hcxt = pgStatLocalContext;
+
+ replSlotStats = hash_create("Replication slots hash",
+ PGSTAT_REPLSLOT_HASH_SIZE,
+ &hash_ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}

It seems to me that the patch is always creating a hash table in
pgStatLocalContext? AFAIU, we need to create it in pgStatLocalContext
when we read stats via backend_read_statsfile so that we can clear it
at the end of the transaction. The db/function stats seems to be doing
the same. Is there a reason why here we need to always create it in
pgStatLocalContext?

I wanted to avoid creating the hash table if there is no replication
slot. But as you pointed out, we create the hash table even when
lookup (i.g., create_it is false), which is bad. So I think we can
have pgstat_get_replslot_entry() return NULL without creating the hash
table if the hash table is NULL and create_it is false so that backend
processes don’t create the hash table, not via
backend_read_statsfile(). Or another idea would be to always create
the hash table in pgstat_read_statsfiles(). That way, it would
simplify the code but could waste the memory if there is no
replication slot.

If you create it after reading 'R' message as we do in the case of 'D'
message then it won't waste any memory. So probably creating in
pgstat_read_statsfiles() would be better unless you see some other
problem with that.

--
With Regards,
Amit Kapila.

#113Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#111)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 2:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 21, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the patch. In addition to the test Vignesh prepared, I
added one test for the message for creating a slot that checks if the
statistics are initialized after re-creating the same name slot.

I am not sure how much useful your new test is because you are testing
it for slot name for which we have removed the slot file. It is not
related to stat messages this patch is sending. I think we can leave
that for now.

I might be missing something but I think the test is related to the
message for creating a slot that initializes all counters. No? If
there is no that message, we will end up getting old stats if a
message for dropping slot gets lost (simulated by dropping slot file)
and the same name slot is created.

The test is not waiting for a new slot creation message to reach the
stats collector. So, if the old slot data still exists in the file and
now when we read stats via backend, then won't there exists a chance
that old slot stats data still exists?

--
With Regards,
Amit Kapila.

#114Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#112)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 6:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 21, 2021 at 2:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 21, 2021 at 12:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I have one question:

+ /*
+ * Create the replication slot stats hash table if we don't have
+ * it already.
+ */
+ if (replSlotStats == NULL)
{
- if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
- return i; /* found */
+ HASHCTL hash_ctl;
+
+ hash_ctl.keysize = sizeof(NameData);
+ hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+ hash_ctl.hcxt = pgStatLocalContext;
+
+ replSlotStats = hash_create("Replication slots hash",
+ PGSTAT_REPLSLOT_HASH_SIZE,
+ &hash_ctl,
+ HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}

It seems to me that the patch is always creating a hash table in
pgStatLocalContext? AFAIU, we need to create it in pgStatLocalContext
when we read stats via backend_read_statsfile so that we can clear it
at the end of the transaction. The db/function stats seems to be doing
the same. Is there a reason why here we need to always create it in
pgStatLocalContext?

I wanted to avoid creating the hash table if there is no replication
slot. But as you pointed out, we create the hash table even when
lookup (i.g., create_it is false), which is bad. So I think we can
have pgstat_get_replslot_entry() return NULL without creating the hash
table if the hash table is NULL and create_it is false so that backend
processes don’t create the hash table, not via
backend_read_statsfile(). Or another idea would be to always create
the hash table in pgstat_read_statsfiles(). That way, it would
simplify the code but could waste the memory if there is no
replication slot.

If you create it after reading 'R' message as we do in the case of 'D'
message then it won't waste any memory. So probably creating in
pgstat_read_statsfiles() would be better unless you see some other
problem with that.

Yeah, I think that's the approach I mentioned as the former. I’ll
change in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#115Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#113)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 6:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 21, 2021 at 2:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 21, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the patch. In addition to the test Vignesh prepared, I
added one test for the message for creating a slot that checks if the
statistics are initialized after re-creating the same name slot.

I am not sure how much useful your new test is because you are testing
it for slot name for which we have removed the slot file. It is not
related to stat messages this patch is sending. I think we can leave
that for now.

I might be missing something but I think the test is related to the
message for creating a slot that initializes all counters. No? If
there is no that message, we will end up getting old stats if a
message for dropping slot gets lost (simulated by dropping slot file)
and the same name slot is created.

The test is not waiting for a new slot creation message to reach the
stats collector. So, if the old slot data still exists in the file and
now when we read stats via backend, then won't there exists a chance
that old slot stats data still exists?

You're right. We should wait for the message to reach the collector.
Or should we remove that test case?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#116Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#115)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 3:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The test is not waiting for a new slot creation message to reach the
stats collector. So, if the old slot data still exists in the file and
now when we read stats via backend, then won't there exists a chance
that old slot stats data still exists?

You're right. We should wait for the message to reach the collector.
Or should we remove that test case?

I feel we can remove it. I am not sure how much value this additional
test case is adding.

--
With Regards,
Amit Kapila.

#117Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Dilip Kumar (#109)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 4:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the patch. In addition to the test Vignesh prepared, I
added one test for the message for creating a slot that checks if the
statistics are initialized after re-creating the same name slot.
Please review it.

Overall the patch looks good to me. However, I have one question, I
did not understand the reason behind moving the below code from
"pgstat_reset_replslot_counter" to "pg_stat_reset_replication_slot"?

Andres pointed out that pgstat_reset_replslot_counter() acquires lwlock[1]/messages/by-id/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de:

---
- pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
think pgstat.c has absolutely no business doing things on that level.
---

I changed the code so that pgstat_reset_replslot_counter() doesn't
acquire directly lwlock but I think that it's appropriate to do the
existence check for slots in pgstatfunc.c rather than pgstat.c.

Regards,

[1]: /messages/by-id/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#118Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#116)
1 attachment(s)
Re: Replication slot stats misgivings

On Wed, Apr 21, 2021 at 7:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 21, 2021 at 3:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The test is not waiting for a new slot creation message to reach the
stats collector. So, if the old slot data still exists in the file and
now when we read stats via backend, then won't there exists a chance
that old slot stats data still exists?

You're right. We should wait for the message to reach the collector.
Or should we remove that test case?

I feel we can remove it. I am not sure how much value this additional
test case is adding.

Okay, removed.

I’ve attached the updated patch. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v12-0001-Use-HTAB-for-replication-slot-statistics.patchapplication/x-patch; name=v12-0001-Use-HTAB-for-replication-slot-statistics.patchDownload
From d3d6e7e2c059423c2277d0cc152077a063938391 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 19 Apr 2021 16:40:27 +0900
Subject: [PATCH v12] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for replication
slots. But this had two problems in the cases where a message for
dropping a slot gets lost, which leaves the stats for already-dropped
slot in the array: 1) the stats for the new slot are not recorded
if the array is full and 2) writing beyond the end of the array when
after restarting the number of slots whose stats are stored in the
stats file exceeds max_replication_slots.

This commit changes it to HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() search
for all the dead replication slots in stats hashtable and tell the
collector to remove them. To not show the stats for the already-dropped
slots, pg_replication_slots view searches slot stats by the slot name
taken from pg_replication_slots.

Also, we send a message for creating a slot at slot creation,
initializing the stats. This reduces the possibility that the stats
are accumulated into the old slot stats when a message for dropping a
slot gets lost.
---
 contrib/test_decoding/t/001_repl_stats.pl |  54 +++-
 src/backend/catalog/system_views.sql      |  30 +-
 src/backend/postmaster/pgstat.c           | 319 ++++++++++++----------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  26 +-
 src/backend/utils/adt/pgstatfuncs.c       | 145 ++++++----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |  10 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 src/tools/pgindent/typedefs.list          |   2 +-
 11 files changed, 359 insertions(+), 249 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 11b6cd9b9c..ad19ad83c6 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -2,9 +2,10 @@
 # drop replication slot and restart.
 use strict;
 use warnings;
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 2;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -12,6 +13,20 @@ $node->init(allows_streaming => 'logical');
 $node->append_conf('postgresql.conf', 'synchronous_commit = on');
 $node->start;
 
+# Check that replicatoin slot stats are expected.
+sub test_slot_stats
+{
+	my ($node, $expected, $msg) = @_;
+
+	my $result = $node->safe_psql(
+		'postgres', qq[
+		SELECT slot_name, total_txns > 0 AS total_txn,
+			   total_bytes > 0 AS total_bytes
+			   FROM pg_stat_replication_slots
+			   ORDER BY slot_name]);
+	is($result, $expected, $msg);
+}
+
 # Create table.
 $node->safe_psql('postgres',
         "CREATE TABLE test_repl_stat(col1 int)");
@@ -57,20 +72,41 @@ $node->start;
 
 # Verify statistics data present in pg_stat_replication_slots are sane after
 # restart.
-my $result = $node->safe_psql('postgres',
-	"SELECT slot_name, total_txns > 0 AS total_txn,
-	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
-	ORDER BY slot_name"
-);
-is($result, qq(regression_slot1|t|t
+test_slot_stats(
+	$node,
+	qq(regression_slot1|t|t
 regression_slot2|t|t
-regression_slot3|t|t), 'check replication statistics are updated');
+regression_slot3|t|t),
+	'check replication statistics are updated');
+
+# Test to remove one of the replication slots and adjust
+# max_replication_slots accordingly to the number of slots. This leads
+# to a mismatch between the number of slots present in the stats file and the
+# number of stats present in the shared memory, simulating the scenario for
+# drop slot message lost by the statistics collector process. We verify
+# replication statistics data is fine after restart.
+
+$node->stop;
+my $datadir = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+test_slot_stats(
+	$node,
+	qq(regression_slot1|t|t
+regression_slot2|t|t),
+	'check replication statistics after removing the slot file');
 
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
 $node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
 
 # shutdown
 $node->stop;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e96dd73280..fd95465b04 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,20 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.total_txns,
-            s.total_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -984,6 +970,22 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d..a5c0fc59dc 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStatHash = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Search for all the dead replication slots in stats hashtable and tell
+	 * the stats collector to drop them.
+	 */
+	if (replSlotStatHash)
+	{
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStatHash);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1822,6 +1816,7 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
 	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
+	msg.m_create = false;
 	msg.m_drop = false;
 	msg.m_spill_txns = repSlotStat->spill_txns;
 	msg.m_spill_count = repSlotStat->spill_count;
@@ -1834,6 +1829,24 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
+/* ----------
+ * pgstat_report_replslot_create() -
+ *
+ *	Tell the collector about creating the replication slot.
+ * ----------
+ */
+void
+pgstat_report_replslot_create(const char *slotname)
+{
+	PgStat_MsgReplSlot msg;
+
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
+	namestrcpy(&msg.m_slotname, slotname);
+	msg.m_create = true;
+	msg.m_drop = false;
+	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
+}
+
 /* ----------
  * pgstat_report_replslot_drop() -
  *
@@ -1847,6 +1860,7 @@ pgstat_report_replslot_drop(const char *slotname)
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
 	namestrcpy(&msg.m_slotname, slotname);
+	msg.m_create = false;
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -2872,17 +2886,15 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_StatReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	return pgstat_get_replslot_entry(slotname, false);
 }
 
 /*
@@ -3654,7 +3666,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3744,11 +3755,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStatHash)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_StatReplSlotEntry *slotent;
+
+		hash_seq_init(&hstat, replSlotStatHash);
+		while ((slotent = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotent, sizeof(PgStat_StatReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3975,12 +3992,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4005,12 +4016,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4197,21 +4202,27 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
 				{
-					ereport(pgStatRunningInCollector ? LOG : WARNING,
-							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-					goto done;
+					PgStat_StatReplSlotEntry slotbuf;
+					PgStat_StatReplSlotEntry *slotent;
+
+					if (fread(&slotbuf, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+						!= sizeof(PgStat_StatReplSlotEntry))
+					{
+						ereport(pgStatRunningInCollector ? LOG : WARNING,
+								(errmsg("corrupted statistics file \"%s\"",
+										statfile)));
+						goto done;
+					}
+
+					slotent = pgstat_get_replslot_entry(slotbuf.slotname, true);
+					memcpy(slotent, &slotbuf, sizeof(PgStat_StatReplSlotEntry));
+					break;
 				}
-				nReplSlotStats++;
-				break;
 
 			case 'E':
 				goto done;
@@ -4424,7 +4435,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_StatReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4553,12 +4564,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a replication
 				 * slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+					!= sizeof(PgStat_StatReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4764,8 +4775,7 @@ pgstat_clear_snapshot(void)
 	/* Reset variables */
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
-	replSlotStats = NULL;
-	nReplSlotStats = 0;
+	replSlotStatHash = NULL;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5189,20 +5199,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_StatReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStatHash == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStatHash);
+		while ((slotent = (PgStat_StatReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
-		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		/* Get the slot statistics to reset */
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5210,11 +5226,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5532,46 +5548,45 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
+		Assert(!msg->m_create);
+
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStatHash != NULL)
+			(void) hash_search(replSlotStatHash,
+							   (void *) &(msg->m_slotname),
+							   HASH_REMOVE,
+							   NULL);
 	}
 	else
 	{
-		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
-		replSlotStats[idx].total_txns += msg->m_total_txns;
-		replSlotStats[idx].total_bytes += msg->m_total_bytes;
+		PgStat_StatReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
+		if (msg->m_create)
+		{
+			/*
+			 * If the message for dropping the slot with the same name gets
+			 * lost, slotent has stats for the old slot. So we initialize
+			 * all counters at slot creation.
+			 */
+			pgstat_reset_replslot(slotent, 0);
+		}
+		else
+		{
+			/* Update the replication slot statistics */
+			slotent->spill_txns += msg->m_spill_txns;
+			slotent->spill_count += msg->m_spill_count;
+			slotent->spill_bytes += msg->m_spill_bytes;
+			slotent->stream_txns += msg->m_stream_txns;
+			slotent->stream_count += msg->m_stream_count;
+			slotent->stream_bytes += msg->m_stream_bytes;
+			slotent->total_txns += msg->m_total_txns;
+			slotent->total_bytes += msg->m_total_bytes;
+		}
 	}
 }
 
@@ -5749,59 +5764,81 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
  * create_it tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_StatReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create_it)
 {
-	int			i;
+	PgStat_StatReplSlotEntry *slotent;
+	bool	found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStatHash == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		/*
+		 * Quick return NULL if the hash table is empty and the caller
+		 * didn't request to create the entry.
+		 */
+		if (!create_it)
+			return NULL;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+		hash_ctl.hcxt = pgStatLocalContext;
+		replSlotStatHash = hash_create("Replication slots hash",
+									PGSTAT_REPLSLOT_HASH_SIZE,
+									&hash_ctl,
+									HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStatHash,
+													   (void *) &name,
+													   create_it ? HASH_ENTER : HASH_FIND,
+													   &found);
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create_it && !found);
+		return NULL;
+	}
+
+	/* initialize the entry */
+	if (create_it && !found)
+	{
+		namestrcpy(&(slotent->slotname), NameStr(name));
+		pgstat_reset_replslot(slotent, 0);
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].total_txns = 0;
-	replSlotStats[i].total_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35b0c67641..00543ede45 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_StatReplSlotEntry repSlotStat;
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
 	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..c8d8df2889 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,12 +328,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
-		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
-		pgstat_report_replslot(&repSlotStat);
-	}
+		pgstat_report_replslot_create(NameStr(slot->data.name));
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
@@ -349,17 +344,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
 	ReplicationSlot	*slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +365,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +412,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -713,6 +709,12 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
 	 * create the statistics for the replication slot.
+	 *
+	 * XXX In case, the messages for creation and drop slot of the same name
+	 * get lost and create happens before (auto)vacuum cleans up the dead slot,
+	 * the stats will be accumulated into the old slot. One can imagine having
+	 * OIDs for each slot to avoid the accumulation of stats but that doesn't
+	 * seem worth doing as in practice this won't happen frequently.
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2680190a40..f2fd919e57 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,32 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that by
+		 * the time this message is executed the slot is dropped but at least
+		 * this check will ensure that the given name is for a valid slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,73 +2305,77 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/*
+ * Get the statistics for the replication slot. If the slot statistics is not
+ * available, return all-zeroes stats.
+ */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text		*slotname_text = PG_GETARG_TEXT_P(0);
+	NameData	slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_StatReplSlotEntry *slotent;
+	PgStat_StatReplSlotEntry allzero;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-	MemoryContextSwitchTo(oldcontext);
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
+	if (!slotent)
 	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
-
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
-
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
-		values[7] = Int64GetDatum(s->total_txns);
-		values[8] = Int64GetDatum(s->total_bytes);
-
-		if (s->stat_reset_timestamp == 0)
-			nulls[9] = true;
-		else
-			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
-
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+		/*
+		 * If the slot is not found, initialise its stats. This is possible if
+		 * the create slot message is lost.
+		 */
+		memset(&allzero, 0, sizeof(PgStat_StatReplSlotEntry));
+		slotent = &allzero;
 	}
 
-	tuplestore_donestoring(tupstore);
+	values[0] = CStringGetTextDatum(NameStr(slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[7] = Int64GetDatum(slotent->total_txns);
+	values[8] = Int64GetDatum(slotent->total_bytes);
 
-	return (Datum) 0;
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[9] = true;
+	else
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b62abcd22c..e4cd514992 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5308,14 +5308,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5c5920b0b5..1ce363e7d1 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -541,6 +541,7 @@ typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
 	NameData	m_slotname;
+	bool		m_create;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -917,7 +918,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_StatReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -929,7 +930,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter total_txns;
 	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_StatReplSlotEntry;
 
 
 /*
@@ -1031,7 +1032,8 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_create(const char *slotname);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1129,7 +1131,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_StatReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6399f3feef..ce2ecb914b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,7 +2071,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.total_txns,
     s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c7aff677d4..878b67a276 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1870,12 +1870,12 @@ PgStat_MsgTabstat
 PgStat_MsgTempFile
 PgStat_MsgVacuum
 PgStat_MsgWal
-PgStat_ReplSlotStats
 PgStat_SLRUStats
 PgStat_Shared_Reset_Target
 PgStat_Single_Reset_Type
 PgStat_StatDBEntry
 PgStat_StatFuncEntry
+PgStat_StatReplSlotEntry
 PgStat_StatTabEntry
 PgStat_SubXactStatus
 PgStat_TableCounts
-- 
2.24.3 (Apple Git-128)

#119Dilip Kumar
dilipbalaut@gmail.com
In reply to: Masahiko Sawada (#117)
Re: Replication slot stats misgivings

On Thu, Apr 22, 2021 at 7:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 21, 2021 at 4:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Apr 20, 2021 at 7:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the patch. In addition to the test Vignesh prepared, I
added one test for the message for creating a slot that checks if the
statistics are initialized after re-creating the same name slot.
Please review it.

Overall the patch looks good to me. However, I have one question, I
did not understand the reason behind moving the below code from
"pgstat_reset_replslot_counter" to "pg_stat_reset_replication_slot"?

Andres pointed out that pgstat_reset_replslot_counter() acquires lwlock[1]:

---
- pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
think pgstat.c has absolutely no business doing things on that level.
---

I changed the code so that pgstat_reset_replslot_counter() doesn't
acquire directly lwlock but I think that it's appropriate to do the
existence check for slots in pgstatfunc.c rather than pgstat.c.

Thanks for pointing that out. It makes sense to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#120Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#118)
Re: Replication slot stats misgivings

On Thu, Apr 22, 2021 at 8:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Few comments:
1.
I think we want stats collector to not use pgStatLocalContext unless
it has read the stats file similar to other cases. So probably, we
should allocate it in pgStatLocalContext when we read 'R' message in
pgstat_read_statsfiles. Also, the function pgstat_get_replslot_entry
should not use pgStatLocalContext to allocate the hash table.
2.
+ if (replSlotStatHash != NULL)
+ (void) hash_search(replSlotStatHash,
+ (void *) &(msg->m_slotname),
+ HASH_REMOVE,
+ NULL);

Why have you changed this part from using NameStr?
3.
+# Check that replicatoin slot stats are expected.

Typo. replicatoin/replication

--
With Regards,
Amit Kapila.

#121Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#120)
Re: Replication slot stats misgivings

On Thu, Apr 22, 2021 at 1:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 22, 2021 at 8:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Few comments:
1.
I think we want stats collector to not use pgStatLocalContext unless
it has read the stats file similar to other cases. So probably, we
should allocate it in pgStatLocalContext when we read 'R' message in
pgstat_read_statsfiles. Also, the function pgstat_get_replslot_entry
should not use pgStatLocalContext to allocate the hash table.

Agreed.

2.
+ if (replSlotStatHash != NULL)
+ (void) hash_search(replSlotStatHash,
+    (void *) &(msg->m_slotname),
+    HASH_REMOVE,
+    NULL);

Why have you changed this part from using NameStr?

I thought that since the hash table is created with the key size
sizeof(NameData) it's better to use NameData for searching as well.

3.
+# Check that replicatoin slot stats are expected.

Typo. replicatoin/replication

Will fix in the next version.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#122Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#121)
Re: Replication slot stats misgivings

On Thu, Apr 22, 2021 at 10:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 22, 2021 at 1:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 22, 2021 at 8:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2.
+ if (replSlotStatHash != NULL)
+ (void) hash_search(replSlotStatHash,
+    (void *) &(msg->m_slotname),
+    HASH_REMOVE,
+    NULL);

Why have you changed this part from using NameStr?

I thought that since the hash table is created with the key size
sizeof(NameData) it's better to use NameData for searching as well.

Fair enough. I think this will give the same result either way.

--
With Regards,
Amit Kapila.

#123Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#122)
1 attachment(s)
Re: Replication slot stats misgivings

On Thu, Apr 22, 2021 at 3:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 22, 2021 at 10:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 22, 2021 at 1:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 22, 2021 at 8:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2.
+ if (replSlotStatHash != NULL)
+ (void) hash_search(replSlotStatHash,
+    (void *) &(msg->m_slotname),
+    HASH_REMOVE,
+    NULL);

Why have you changed this part from using NameStr?

I thought that since the hash table is created with the key size
sizeof(NameData) it's better to use NameData for searching as well.

Fair enough. I think this will give the same result either way.

I've attached the updated version patch.

Besides the review comment from Amit, I changed
pgstat_read_statsfiles() so that it doesn't use
spgstat_get_replslot_entry(). That’s because it was slightly unclear
why we create the hash table beforehand even though we call
pgstat_read_statsfiles() with ‘create’ = true. By this change, the
caller of pgstat_get_replslot_entry() with ‘create’ = true is only
pgstat_recv_replslot(), which makes the code clear and safe since
pgstat_recv_replslot() is used only by the collector. Also, I ran
pgindent on the modified files.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v13-0001-Use-HTAB-for-replication-slot-statistics.patchapplication/octet-stream; name=v13-0001-Use-HTAB-for-replication-slot-statistics.patchDownload
From 90ac490a59fa62a8c93f51574c30d2b5e9437178 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 19 Apr 2021 16:40:27 +0900
Subject: [PATCH v13] Use HTAB for replication slot statistics.

Previously, we used to use the array to store stats for replication
slots. But this had two problems in the cases where a message for
dropping a slot gets lost, which leaves the stats for already-dropped
slot in the array: 1) the stats for the new slot are not recorded
if the array is full and 2) writing beyond the end of the array when
after restarting the number of slots whose stats are stored in the
stats file exceeds max_replication_slots.

This commit changes it to HTAB for replication slot statistics,
resolving both problems. Instead, we have pgstat_vacuum_stat() search
for all the dead replication slots in stats hashtable and tell the
collector to remove them. To not show the stats for the already-dropped
slots, pg_replication_slots view searches slot stats by the slot name
taken from pg_replication_slots.

Also, we send a message for creating a slot at slot creation,
initializing the stats. This reduces the possibility that the stats
are accumulated into the old slot stats when a message for dropping a
slot gets lost.
---
 contrib/test_decoding/t/001_repl_stats.pl |  69 ++++-
 src/backend/catalog/system_views.sql      |  30 +-
 src/backend/postmaster/pgstat.c           | 340 +++++++++++++---------
 src/backend/replication/logical/logical.c |   2 +-
 src/backend/replication/slot.c            |  28 +-
 src/backend/utils/adt/pgstatfuncs.c       | 146 ++++++----
 src/include/catalog/pg_proc.dat           |  14 +-
 src/include/pgstat.h                      |  10 +-
 src/include/replication/slot.h            |   2 +-
 src/test/regress/expected/rules.out       |   4 +-
 src/tools/pgindent/typedefs.list          |   2 +-
 11 files changed, 388 insertions(+), 259 deletions(-)

diff --git a/contrib/test_decoding/t/001_repl_stats.pl b/contrib/test_decoding/t/001_repl_stats.pl
index 11b6cd9b9c..3ab0e80722 100644
--- a/contrib/test_decoding/t/001_repl_stats.pl
+++ b/contrib/test_decoding/t/001_repl_stats.pl
@@ -2,9 +2,10 @@
 # drop replication slot and restart.
 use strict;
 use warnings;
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 2;
 
 # Test set-up
 my $node = get_new_node('test');
@@ -12,9 +13,22 @@ $node->init(allows_streaming => 'logical');
 $node->append_conf('postgresql.conf', 'synchronous_commit = on');
 $node->start;
 
+# Check that replication slot stats are expected.
+sub test_slot_stats
+{
+	my ($node, $expected, $msg) = @_;
+
+	my $result = $node->safe_psql(
+		'postgres', qq[
+		SELECT slot_name, total_txns > 0 AS total_txn,
+			   total_bytes > 0 AS total_bytes
+			   FROM pg_stat_replication_slots
+			   ORDER BY slot_name]);
+	is($result, $expected, $msg);
+}
+
 # Create table.
-$node->safe_psql('postgres',
-        "CREATE TABLE test_repl_stat(col1 int)");
+$node->safe_psql('postgres', "CREATE TABLE test_repl_stat(col1 int)");
 
 # Create replication slots.
 $node->safe_psql(
@@ -26,7 +40,8 @@ $node->safe_psql(
 ]);
 
 # Insert some data.
-$node->safe_psql('postgres', "INSERT INTO test_repl_stat values(generate_series(1, 5));");
+$node->safe_psql('postgres',
+	"INSERT INTO test_repl_stat values(generate_series(1, 5));");
 
 $node->safe_psql(
 	'postgres', qq[
@@ -50,27 +65,51 @@ $node->poll_query_until(
 
 # Test to drop one of the replication slot and verify replication statistics data is
 # fine after restart.
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot4')");
+$node->safe_psql('postgres',
+	"SELECT pg_drop_replication_slot('regression_slot4')");
 
 $node->stop;
 $node->start;
 
 # Verify statistics data present in pg_stat_replication_slots are sane after
 # restart.
-my $result = $node->safe_psql('postgres',
-	"SELECT slot_name, total_txns > 0 AS total_txn,
-	total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots
-	ORDER BY slot_name"
-);
-is($result, qq(regression_slot1|t|t
+test_slot_stats(
+	$node,
+	qq(regression_slot1|t|t
 regression_slot2|t|t
-regression_slot3|t|t), 'check replication statistics are updated');
+regression_slot3|t|t),
+	'check replication statistics are updated');
+
+# Test to remove one of the replication slots and adjust
+# max_replication_slots accordingly to the number of slots. This leads
+# to a mismatch between the number of slots present in the stats file and the
+# number of stats present in the shared memory, simulating the scenario for
+# drop slot message lost by the statistics collector process. We verify
+# replication statistics data is fine after restart.
+
+$node->stop;
+my $datadir           = $node->data_dir;
+my $slot3_replslotdir = "$datadir/pg_replslot/regression_slot3";
+
+rmtree($slot3_replslotdir);
+
+$node->append_conf('postgresql.conf', 'max_replication_slots = 2');
+$node->start;
+
+# Verify statistics data present in pg_stat_replication_slots are sane after
+# restart.
+test_slot_stats(
+	$node,
+	qq(regression_slot1|t|t
+regression_slot2|t|t),
+	'check replication statistics after removing the slot file');
 
 # cleanup
 $node->safe_psql('postgres', "DROP TABLE test_repl_stat");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot1')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot2')");
-$node->safe_psql('postgres', "SELECT pg_drop_replication_slot('regression_slot3')");
+$node->safe_psql('postgres',
+	"SELECT pg_drop_replication_slot('regression_slot1')");
+$node->safe_psql('postgres',
+	"SELECT pg_drop_replication_slot('regression_slot2')");
 
 # shutdown
 $node->stop;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e96dd73280..fd95465b04 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -866,20 +866,6 @@ CREATE VIEW pg_stat_replication AS
         JOIN pg_stat_get_wal_senders() AS W ON (S.pid = W.pid)
         LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
 
-CREATE VIEW pg_stat_replication_slots AS
-    SELECT
-            s.slot_name,
-            s.spill_txns,
-            s.spill_count,
-            s.spill_bytes,
-            s.stream_txns,
-            s.stream_count,
-            s.stream_bytes,
-            s.total_txns,
-            s.total_bytes,
-            s.stats_reset
-    FROM pg_stat_get_replication_slots() AS s;
-
 CREATE VIEW pg_stat_slru AS
     SELECT
             s.name,
@@ -984,6 +970,22 @@ CREATE VIEW pg_replication_slots AS
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
+CREATE VIEW pg_stat_replication_slots AS
+    SELECT
+            s.slot_name,
+            s.spill_txns,
+            s.spill_count,
+            s.spill_bytes,
+            s.stream_txns,
+            s.stream_count,
+            s.stream_bytes,
+            s.total_txns,
+            s.total_bytes,
+            s.stats_reset
+    FROM pg_replication_slots as r,
+        LATERAL pg_stat_get_replication_slot(slot_name) as s
+    WHERE r.datoid IS NOT NULL; -- excluding physical slots
+
 CREATE VIEW pg_stat_database AS
     SELECT
             D.oid AS datid,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e1ec7d8b7d..af5efdc094 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -106,6 +106,7 @@
 #define PGSTAT_DB_HASH_SIZE		16
 #define PGSTAT_TAB_HASH_SIZE	512
 #define PGSTAT_FUNCTION_HASH_SIZE	512
+#define PGSTAT_REPLSLOT_HASH_SIZE	32
 
 
 /* ----------
@@ -278,8 +279,7 @@ static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
 static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
-static PgStat_ReplSlotStats *replSlotStats;
-static int	nReplSlotStats;
+static HTAB *replSlotStatHash = NULL;
 static PgStat_RecoveryPrefetchStats recoveryPrefetchStats;
 
 /*
@@ -319,8 +319,8 @@ static void backend_read_statsfile(void);
 static bool pgstat_write_statsfile_needed(void);
 static bool pgstat_db_requested(Oid databaseid);
 
-static int	pgstat_replslot_index(const char *name, bool create_it);
-static void pgstat_reset_replslot(int i, TimestampTz ts);
+static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
+static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);
 
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
@@ -1109,6 +1109,24 @@ pgstat_vacuum_stat(void)
 	/* Clean up */
 	hash_destroy(htab);
 
+	/*
+	 * Search for all the dead replication slots in stats hashtable and tell
+	 * the stats collector to drop them.
+	 */
+	if (replSlotStatHash)
+	{
+		PgStat_StatReplSlotEntry *slotentry;
+
+		hash_seq_init(&hstat, replSlotStatHash);
+		while ((slotentry = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			CHECK_FOR_INTERRUPTS();
+
+			if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
+				pgstat_report_replslot_drop(NameStr(slotentry->slotname));
+		}
+	}
+
 	/*
 	 * Lookup our own database entry; if not found, nothing more to do.
 	 */
@@ -1516,30 +1534,6 @@ pgstat_reset_replslot_counter(const char *name)
 
 	if (name)
 	{
-		ReplicationSlot *slot;
-
-		/*
-		 * Check if the slot exists with the given name. It is possible that by
-		 * the time this message is executed the slot is dropped but at least
-		 * this check will ensure that the given name is for a valid slot.
-		 */
-		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
-		slot = SearchNamedReplicationSlot(name);
-		LWLockRelease(ReplicationSlotControlLock);
-
-		if (!slot)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("replication slot \"%s\" does not exist",
-							name)));
-
-		/*
-		 * Nothing to do for physical slots as we collect stats only for
-		 * logical slots.
-		 */
-		if (SlotIsPhysical(slot))
-			return;
-
 		namestrcpy(&msg.m_slotname, name);
 		msg.clearall = false;
 	}
@@ -1813,7 +1807,7 @@ pgstat_report_tempfile(size_t filesize)
  * ----------
  */
 void
-pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
+pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat)
 {
 	PgStat_MsgReplSlot msg;
 
@@ -1822,6 +1816,7 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	 */
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
 	namestrcpy(&msg.m_slotname, NameStr(repSlotStat->slotname));
+	msg.m_create = false;
 	msg.m_drop = false;
 	msg.m_spill_txns = repSlotStat->spill_txns;
 	msg.m_spill_count = repSlotStat->spill_count;
@@ -1834,6 +1829,24 @@ pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat)
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
 
+/* ----------
+ * pgstat_report_replslot_create() -
+ *
+ *	Tell the collector about creating the replication slot.
+ * ----------
+ */
+void
+pgstat_report_replslot_create(const char *slotname)
+{
+	PgStat_MsgReplSlot msg;
+
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
+	namestrcpy(&msg.m_slotname, slotname);
+	msg.m_create = true;
+	msg.m_drop = false;
+	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
+}
+
 /* ----------
  * pgstat_report_replslot_drop() -
  *
@@ -1847,6 +1860,7 @@ pgstat_report_replslot_drop(const char *slotname)
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
 	namestrcpy(&msg.m_slotname, slotname);
+	msg.m_create = false;
 	msg.m_drop = true;
 	pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
 }
@@ -2872,17 +2886,15 @@ pgstat_fetch_slru(void)
  * pgstat_fetch_replslot() -
  *
  *	Support function for the SQL-callable pgstat* functions. Returns
- *	a pointer to the replication slot statistics struct and sets the
- *	number of entries in nslots_p.
+ *	a pointer to the replication slot statistics struct.
  * ---------
  */
-PgStat_ReplSlotStats *
-pgstat_fetch_replslot(int *nslots_p)
+PgStat_StatReplSlotEntry *
+pgstat_fetch_replslot(NameData slotname)
 {
 	backend_read_statsfile();
 
-	*nslots_p = nReplSlotStats;
-	return replSlotStats;
+	return pgstat_get_replslot_entry(slotname, false);
 }
 
 /*
@@ -3654,7 +3666,6 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
 	const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename;
 	int			rc;
-	int			i;
 
 	elog(DEBUG2, "writing stats file \"%s\"", statfile);
 
@@ -3744,11 +3755,17 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	/*
 	 * Write replication slot stats struct
 	 */
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStatHash)
 	{
-		fputc('R', fpout);
-		rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
-		(void) rc;				/* we'll check for error with ferror */
+		PgStat_StatReplSlotEntry *slotent;
+
+		hash_seq_init(&hstat, replSlotStatHash);
+		while ((slotent = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL)
+		{
+			fputc('R', fpout);
+			rc = fwrite(slotent, sizeof(PgStat_StatReplSlotEntry), 1, fpout);
+			(void) rc;				/* we'll check for error with ferror */
+		}
 	}
 
 	/*
@@ -3975,12 +3992,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl,
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
-	/* Allocate the space for replication slot statistics */
-	replSlotStats = MemoryContextAllocZero(pgStatLocalContext,
-										   max_replication_slots
-										   * sizeof(PgStat_ReplSlotStats));
-	nReplSlotStats = 0;
-
 	/*
 	 * Clear out global, archiver, WAL and SLRU statistics so they start from
 	 * zero in case we can't load an existing statsfile.
@@ -4005,12 +4016,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	for (i = 0; i < SLRU_NUM_ELEMENTS; i++)
 		slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
-	/*
-	 * Set the same reset timestamp for all replication slots too.
-	 */
-	for (i = 0; i < max_replication_slots; i++)
-		replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;
-
 	/*
 	 * Try to open the stats file. If it doesn't exist, the backends simply
 	 * return zero for anything and the collector simply starts from scratch
@@ -4197,21 +4202,43 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
-				 * slot follows.
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a
+				 * replication slot follows.
 				 */
 			case 'R':
-				if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
 				{
-					ereport(pgStatRunningInCollector ? LOG : WARNING,
-							(errmsg("corrupted statistics file \"%s\"",
-									statfile)));
-					memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-					goto done;
+					PgStat_StatReplSlotEntry slotbuf;
+					PgStat_StatReplSlotEntry *slotent;
+
+					if (fread(&slotbuf, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+						!= sizeof(PgStat_StatReplSlotEntry))
+					{
+						ereport(pgStatRunningInCollector ? LOG : WARNING,
+								(errmsg("corrupted statistics file \"%s\"",
+										statfile)));
+						goto done;
+					}
+
+					/* Create hash table if we don't have it already. */
+					if (replSlotStatHash == NULL)
+					{
+						HASHCTL		hash_ctl;
+
+						hash_ctl.keysize = sizeof(NameData);
+						hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+						hash_ctl.hcxt = pgStatLocalContext;
+						replSlotStatHash = hash_create("Replication slots hash",
+													   PGSTAT_REPLSLOT_HASH_SIZE,
+													   &hash_ctl,
+													   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+					}
+
+					slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStatHash,
+																	   (void *) &slotbuf.slotname,
+																	   HASH_ENTER, NULL);
+					memcpy(slotent, &slotbuf, sizeof(PgStat_StatReplSlotEntry));
+					break;
 				}
-				nReplSlotStats++;
-				break;
 
 			case 'E':
 				goto done;
@@ -4424,7 +4451,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_ArchiverStats myArchiverStats;
 	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
-	PgStat_ReplSlotStats myReplSlotStats;
+	PgStat_StatReplSlotEntry myReplSlotStats;
 	PgStat_RecoveryPrefetchStats myRecoveryPrefetchStats;
 	FILE	   *fpin;
 	int32		format_id;
@@ -4553,12 +4580,12 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 				break;
 
 				/*
-				 * 'R'	A PgStat_ReplSlotStats struct describing a replication
-				 * slot follows.
+				 * 'R'	A PgStat_StatReplSlotEntry struct describing a
+				 * replication slot follows.
 				 */
 			case 'R':
-				if (fread(&myReplSlotStats, 1, sizeof(PgStat_ReplSlotStats), fpin)
-					!= sizeof(PgStat_ReplSlotStats))
+				if (fread(&myReplSlotStats, 1, sizeof(PgStat_StatReplSlotEntry), fpin)
+					!= sizeof(PgStat_StatReplSlotEntry))
 				{
 					ereport(pgStatRunningInCollector ? LOG : WARNING,
 							(errmsg("corrupted statistics file \"%s\"",
@@ -4764,8 +4791,7 @@ pgstat_clear_snapshot(void)
 	/* Reset variables */
 	pgStatLocalContext = NULL;
 	pgStatDBHash = NULL;
-	replSlotStats = NULL;
-	nReplSlotStats = 0;
+	replSlotStatHash = NULL;
 
 	/*
 	 * Historically the backend_status.c facilities lived in this file, and
@@ -5189,20 +5215,26 @@ static void
 pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 								 int len)
 {
-	int			i;
-	int			idx = -1;
+	PgStat_StatReplSlotEntry *slotent;
 	TimestampTz ts;
 
+	/* Return if we don't have replication slot statistics */
+	if (replSlotStatHash == NULL)
+		return;
+
 	ts = GetCurrentTimestamp();
 	if (msg->clearall)
 	{
-		for (i = 0; i < nReplSlotStats; i++)
-			pgstat_reset_replslot(i, ts);
+		HASH_SEQ_STATUS sstat;
+
+		hash_seq_init(&sstat, replSlotStatHash);
+		while ((slotent = (PgStat_StatReplSlotEntry *) hash_seq_search(&sstat)) != NULL)
+			pgstat_reset_replslot(slotent, ts);
 	}
 	else
 	{
-		/* Get the index of replication slot statistics to reset */
-		idx = pgstat_replslot_index(NameStr(msg->m_slotname), false);
+		/* Get the slot statistics to reset */
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, false);
 
 		/*
 		 * Nothing to do if the given slot entry is not found.  This could
@@ -5210,11 +5242,11 @@ pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg,
 		 * corresponding statistics entry is also removed before receiving the
 		 * reset message.
 		 */
-		if (idx < 0)
+		if (!slotent)
 			return;
 
 		/* Reset the stats for the requested replication slot */
-		pgstat_reset_replslot(idx, ts);
+		pgstat_reset_replslot(slotent, ts);
 	}
 }
 
@@ -5532,46 +5564,45 @@ pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len)
 static void
 pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
 {
-	int			idx;
-
-	/*
-	 * Get the index of replication slot statistics.  On dropping, we don't
-	 * create the new statistics.
-	 */
-	idx = pgstat_replslot_index(NameStr(msg->m_slotname), !msg->m_drop);
-
-	/*
-	 * The slot entry is not found or there is no space to accommodate the new
-	 * entry.  This could happen when the message for the creation of a slot
-	 * reached before the drop message even though the actual operations
-	 * happen in reverse order.  In such a case, the next update of the
-	 * statistics for the same slot will create the required entry.
-	 */
-	if (idx < 0)
-		return;
-
-	/* it must be a valid replication slot index */
-	Assert(idx < nReplSlotStats);
-
 	if (msg->m_drop)
 	{
+		Assert(!msg->m_create);
+
 		/* Remove the replication slot statistics with the given name */
-		if (idx < nReplSlotStats - 1)
-			memcpy(&replSlotStats[idx], &replSlotStats[nReplSlotStats - 1],
-				   sizeof(PgStat_ReplSlotStats));
-		nReplSlotStats--;
+		if (replSlotStatHash != NULL)
+			(void) hash_search(replSlotStatHash,
+							   (void *) &(msg->m_slotname),
+							   HASH_REMOVE,
+							   NULL);
 	}
 	else
 	{
-		/* Update the replication slot statistics */
-		replSlotStats[idx].spill_txns += msg->m_spill_txns;
-		replSlotStats[idx].spill_count += msg->m_spill_count;
-		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
-		replSlotStats[idx].stream_txns += msg->m_stream_txns;
-		replSlotStats[idx].stream_count += msg->m_stream_count;
-		replSlotStats[idx].stream_bytes += msg->m_stream_bytes;
-		replSlotStats[idx].total_txns += msg->m_total_txns;
-		replSlotStats[idx].total_bytes += msg->m_total_bytes;
+		PgStat_StatReplSlotEntry *slotent;
+
+		slotent = pgstat_get_replslot_entry(msg->m_slotname, true);
+		Assert(slotent);
+
+		if (msg->m_create)
+		{
+			/*
+			 * If the message for dropping the slot with the same name gets
+			 * lost, slotent has stats for the old slot. So we initialize all
+			 * counters at slot creation.
+			 */
+			pgstat_reset_replslot(slotent, 0);
+		}
+		else
+		{
+			/* Update the replication slot statistics */
+			slotent->spill_txns += msg->m_spill_txns;
+			slotent->spill_count += msg->m_spill_count;
+			slotent->spill_bytes += msg->m_spill_bytes;
+			slotent->stream_txns += msg->m_stream_txns;
+			slotent->stream_count += msg->m_stream_count;
+			slotent->stream_bytes += msg->m_stream_bytes;
+			slotent->total_txns += msg->m_total_txns;
+			slotent->total_bytes += msg->m_total_bytes;
+		}
 	}
 }
 
@@ -5749,59 +5780,80 @@ pgstat_db_requested(Oid databaseid)
 }
 
 /* ----------
- * pgstat_replslot_index
+ * pgstat_replslot_entry
  *
- * Return the index of entry of a replication slot with the given name, or
- * -1 if the slot is not found.
+ * Return the entry of replication slot stats with the given name. Return
+ * NULL if not found and the caller didn't request to create it.
  *
- * create_it tells whether to create the new slot entry if it is not found.
+ * create tells whether to create the new slot entry if it is not found.
  * ----------
  */
-static int
-pgstat_replslot_index(const char *name, bool create_it)
+static PgStat_StatReplSlotEntry *
+pgstat_get_replslot_entry(NameData name, bool create)
 {
-	int			i;
+	PgStat_StatReplSlotEntry *slotent;
+	bool		found;
 
-	Assert(nReplSlotStats <= max_replication_slots);
-	for (i = 0; i < nReplSlotStats; i++)
+	if (replSlotStatHash == NULL)
 	{
-		if (namestrcmp(&replSlotStats[i].slotname, name) == 0)
-			return i;			/* found */
+		HASHCTL		hash_ctl;
+
+		/*
+		 * Quick return NULL if the hash table is empty and the caller didn't
+		 * request to create the entry.
+		 */
+		if (!create)
+			return NULL;
+
+		hash_ctl.keysize = sizeof(NameData);
+		hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
+		replSlotStatHash = hash_create("Replication slots hash",
+									   PGSTAT_REPLSLOT_HASH_SIZE,
+									   &hash_ctl,
+									   HASH_ELEM | HASH_BLOBS);
 	}
 
-	/*
-	 * The slot is not found.  We don't want to register the new statistics if
-	 * the list is already full or the caller didn't request.
-	 */
-	if (i == max_replication_slots || !create_it)
-		return -1;
+	slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStatHash,
+													   (void *) &name,
+													   create ? HASH_ENTER : HASH_FIND,
+													   &found);
 
-	/* Register new slot */
-	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
-	namestrcpy(&replSlotStats[nReplSlotStats].slotname, name);
+	if (!slotent)
+	{
+		/* not found */
+		Assert(!create && !found);
+		return NULL;
+	}
+
+	/* initialize the entry */
+	if (create && !found)
+	{
+		namestrcpy(&(slotent->slotname), NameStr(name));
+		pgstat_reset_replslot(slotent, 0);
+	}
 
-	return nReplSlotStats++;
+	return slotent;
 }
 
 /* ----------
  * pgstat_reset_replslot
  *
- * Reset the replication slot stats at index 'i'.
+ * Reset the given replication slot stats.
  * ----------
  */
 static void
-pgstat_reset_replslot(int i, TimestampTz ts)
+pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotent, TimestampTz ts)
 {
 	/* reset only counters. Don't clear slot name */
-	replSlotStats[i].spill_txns = 0;
-	replSlotStats[i].spill_count = 0;
-	replSlotStats[i].spill_bytes = 0;
-	replSlotStats[i].stream_txns = 0;
-	replSlotStats[i].stream_count = 0;
-	replSlotStats[i].stream_bytes = 0;
-	replSlotStats[i].total_txns = 0;
-	replSlotStats[i].total_bytes = 0;
-	replSlotStats[i].stat_reset_timestamp = ts;
+	slotent->spill_txns = 0;
+	slotent->spill_count = 0;
+	slotent->spill_bytes = 0;
+	slotent->stream_txns = 0;
+	slotent->stream_count = 0;
+	slotent->stream_bytes = 0;
+	slotent->total_txns = 0;
+	slotent->total_bytes = 0;
+	slotent->stat_reset_timestamp = ts;
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35b0c67641..00543ede45 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1773,7 +1773,7 @@ void
 UpdateDecodingStats(LogicalDecodingContext *ctx)
 {
 	ReorderBuffer *rb = ctx->reorder;
-	PgStat_ReplSlotStats repSlotStat;
+	PgStat_StatReplSlotEntry repSlotStat;
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
 	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78..cf261e200e 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -328,12 +328,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	 * ReplicationSlotAllocationLock.
 	 */
 	if (SlotIsLogical(slot))
-	{
-		PgStat_ReplSlotStats repSlotStat;
-		MemSet(&repSlotStat, 0, sizeof(PgStat_ReplSlotStats));
-		namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
-		pgstat_report_replslot(&repSlotStat);
-	}
+		pgstat_report_replslot_create(NameStr(slot->data.name));
 
 	/*
 	 * Now that the slot has been marked as in_use and active, it's safe to
@@ -349,17 +344,15 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  * Search for the named replication slot.
  *
  * Return the replication slot if found, otherwise NULL.
- *
- * The caller must hold ReplicationSlotControlLock in shared mode.
  */
 ReplicationSlot *
-SearchNamedReplicationSlot(const char *name)
+SearchNamedReplicationSlot(const char *name, bool need_lock)
 {
 	int			i;
-	ReplicationSlot	*slot = NULL;
+	ReplicationSlot *slot = NULL;
 
-	Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock,
-								LW_SHARED));
+	if (need_lock)
+		LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 
 	for (i = 0; i < max_replication_slots; i++)
 	{
@@ -372,6 +365,9 @@ SearchNamedReplicationSlot(const char *name)
 		}
 	}
 
+	if (need_lock)
+		LWLockRelease(ReplicationSlotControlLock);
+
 	return slot;
 }
 
@@ -416,7 +412,7 @@ retry:
 	 * Search for the slot with the specified name if the slot to acquire is
 	 * not given. If the slot is not found, we either return -1 or error out.
 	 */
-	s = slot ? slot : SearchNamedReplicationSlot(name);
+	s = slot ? slot : SearchNamedReplicationSlot(name, false);
 	if (s == NULL || !s->in_use)
 	{
 		LWLockRelease(ReplicationSlotControlLock);
@@ -713,6 +709,12 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * reduce that possibility. If the messages reached in reverse, we would
 	 * lose one statistics update message. But the next update message will
 	 * create the statistics for the replication slot.
+	 *
+	 * XXX In case, the messages for creation and drop slot of the same name
+	 * get lost and create happens before (auto)vacuum cleans up the dead
+	 * slot, the stats will be accumulated into the old slot. One can imagine
+	 * having OIDs for each slot to avoid the accumulation of stats but that
+	 * doesn't seem worth doing as in practice this won't happen frequently.
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_report_replslot_drop(NameStr(slot->data.name));
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2680190a40..94e8a49183 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/slot.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2207,8 +2208,33 @@ pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
 	char	   *target = NULL;
 
 	if (!PG_ARGISNULL(0))
+	{
+		ReplicationSlot *slot;
+
 		target = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
+		/*
+		 * Check if the slot exists with the given name. It is possible that
+		 * by the time this message is executed the slot is dropped but at
+		 * least this check will ensure that the given name is for a valid
+		 * slot.
+		 */
+		slot = SearchNamedReplicationSlot(target, true);
+
+		if (!slot)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("replication slot \"%s\" does not exist",
+							target)));
+
+		/*
+		 * Nothing to do for physical slots as we collect stats only for
+		 * logical slots.
+		 */
+		if (SlotIsPhysical(slot))
+			PG_RETURN_VOID();
+	}
+
 	pgstat_reset_replslot_counter(target);
 
 	PG_RETURN_VOID();
@@ -2280,73 +2306,77 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
-/* Get the statistics for the replication slots */
+/*
+ * Get the statistics for the replication slot. If the slot statistics is not
+ * available, return all-zeroes stats.
+ */
 Datum
-pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
+pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
 #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	text	   *slotname_text = PG_GETARG_TEXT_P(0);
+	NameData	slotname;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
-	PgStat_ReplSlotStats *slotstats;
-	int			nstats;
-	int			i;
+	Datum		values[10];
+	bool		nulls[10];
+	PgStat_StatReplSlotEntry *slotent;
+	PgStat_StatReplSlotEntry allzero;
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
 
-	MemoryContextSwitchTo(oldcontext);
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_REPLICATION_SLOT_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "slot_name",
+					   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "spill_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spill_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
-	slotstats = pgstat_fetch_replslot(&nstats);
-	for (i = 0; i < nstats; i++)
+	namestrcpy(&slotname, text_to_cstring(slotname_text));
+	slotent = pgstat_fetch_replslot(slotname);
+	if (!slotent)
 	{
-		Datum		values[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		bool		nulls[PG_STAT_GET_REPLICATION_SLOT_COLS];
-		PgStat_ReplSlotStats *s = &(slotstats[i]);
-
-		MemSet(values, 0, sizeof(values));
-		MemSet(nulls, 0, sizeof(nulls));
-
-		values[0] = CStringGetTextDatum(NameStr(s->slotname));
-		values[1] = Int64GetDatum(s->spill_txns);
-		values[2] = Int64GetDatum(s->spill_count);
-		values[3] = Int64GetDatum(s->spill_bytes);
-		values[4] = Int64GetDatum(s->stream_txns);
-		values[5] = Int64GetDatum(s->stream_count);
-		values[6] = Int64GetDatum(s->stream_bytes);
-		values[7] = Int64GetDatum(s->total_txns);
-		values[8] = Int64GetDatum(s->total_bytes);
-
-		if (s->stat_reset_timestamp == 0)
-			nulls[9] = true;
-		else
-			values[9] = TimestampTzGetDatum(s->stat_reset_timestamp);
-
-		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+		/*
+		 * If the slot is not found, initialise its stats. This is possible if
+		 * the create slot message is lost.
+		 */
+		memset(&allzero, 0, sizeof(PgStat_StatReplSlotEntry));
+		slotent = &allzero;
 	}
 
-	tuplestore_donestoring(tupstore);
+	values[0] = CStringGetTextDatum(NameStr(slotname));
+	values[1] = Int64GetDatum(slotent->spill_txns);
+	values[2] = Int64GetDatum(slotent->spill_count);
+	values[3] = Int64GetDatum(slotent->spill_bytes);
+	values[4] = Int64GetDatum(slotent->stream_txns);
+	values[5] = Int64GetDatum(slotent->stream_count);
+	values[6] = Int64GetDatum(slotent->stream_bytes);
+	values[7] = Int64GetDatum(slotent->total_txns);
+	values[8] = Int64GetDatum(slotent->total_bytes);
 
-	return (Datum) 0;
+	if (slotent->stat_reset_timestamp == 0)
+		nulls[9] = true;
+	else
+		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b62abcd22c..e4cd514992 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5308,14 +5308,14 @@
   proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,status,receive_start_lsn,receive_start_tli,written_lsn,flushed_lsn,received_tli,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time,slot_name,sender_host,sender_port,conninfo}',
   prosrc => 'pg_stat_get_wal_receiver' },
-{ oid => '8595', descr => 'statistics: information about replication slots',
-  proname => 'pg_stat_get_replication_slots', prorows => '10',
+{ oid => '8595', descr => 'statistics: information about replication slot',
+  proname => 'pg_stat_get_replication_slot', prorows => '1',
   proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r',
-  prorettype => 'record', proargtypes => '',
-  proallargtypes => '{text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
-  prosrc => 'pg_stat_get_replication_slots' },
+  prorettype => 'record', proargtypes => 'text',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+  prosrc => 'pg_stat_get_replication_slot' },
 { oid => '6118', descr => 'statistics: information about subscription',
   proname => 'pg_stat_get_subscription', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5c5920b0b5..1ce363e7d1 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -541,6 +541,7 @@ typedef struct PgStat_MsgReplSlot
 {
 	PgStat_MsgHdr m_hdr;
 	NameData	m_slotname;
+	bool		m_create;
 	bool		m_drop;
 	PgStat_Counter m_spill_txns;
 	PgStat_Counter m_spill_count;
@@ -917,7 +918,7 @@ typedef struct PgStat_SLRUStats
 /*
  * Replication slot statistics kept in the stats collector
  */
-typedef struct PgStat_ReplSlotStats
+typedef struct PgStat_StatReplSlotEntry
 {
 	NameData	slotname;
 	PgStat_Counter spill_txns;
@@ -929,7 +930,7 @@ typedef struct PgStat_ReplSlotStats
 	PgStat_Counter total_txns;
 	PgStat_Counter total_bytes;
 	TimestampTz stat_reset_timestamp;
-} PgStat_ReplSlotStats;
+} PgStat_StatReplSlotEntry;
 
 
 /*
@@ -1031,7 +1032,8 @@ extern void pgstat_report_recovery_conflict(int reason);
 extern void pgstat_report_deadlock(void);
 extern void pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount);
 extern void pgstat_report_checksum_failure(void);
-extern void pgstat_report_replslot(const PgStat_ReplSlotStats *repSlotStat);
+extern void pgstat_report_replslot(const PgStat_StatReplSlotEntry *repSlotStat);
+extern void pgstat_report_replslot_create(const char *slotname);
 extern void pgstat_report_replslot_drop(const char *slotname);
 
 extern void pgstat_initialize(void);
@@ -1129,7 +1131,7 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
-extern PgStat_ReplSlotStats *pgstat_fetch_replslot(int *nslots_p);
+extern PgStat_StatReplSlotEntry *pgstat_fetch_replslot(NameData slotname);
 extern PgStat_RecoveryPrefetchStats *pgstat_fetch_recoveryprefetch(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50d..357068403a 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -223,7 +223,7 @@ extern XLogRecPtr ReplicationSlotsComputeLogicalRestartLSN(void);
 extern bool ReplicationSlotsCountDBSlots(Oid dboid, int *nslots, int *nactive);
 extern void ReplicationSlotsDropDBSlots(Oid dboid);
 extern void InvalidateObsoleteReplicationSlots(XLogSegNo oldestSegno);
-extern ReplicationSlot *SearchNamedReplicationSlot(const char *name);
+extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
 extern void ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char *syncslotname, int szslot);
 extern void ReplicationSlotDropAtPubNode(WalReceiverConn *wrconn, char *slotname, bool missing_ok);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6399f3feef..ce2ecb914b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,7 +2071,9 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.total_txns,
     s.total_bytes,
     s.stats_reset
-   FROM pg_stat_get_replication_slots() s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset);
+   FROM pg_replication_slots r,
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+  WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT s.name,
     s.blks_zeroed,
     s.blks_hit,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c7aff677d4..878b67a276 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1870,12 +1870,12 @@ PgStat_MsgTabstat
 PgStat_MsgTempFile
 PgStat_MsgVacuum
 PgStat_MsgWal
-PgStat_ReplSlotStats
 PgStat_SLRUStats
 PgStat_Shared_Reset_Target
 PgStat_Single_Reset_Type
 PgStat_StatDBEntry
 PgStat_StatFuncEntry
+PgStat_StatReplSlotEntry
 PgStat_StatTabEntry
 PgStat_SubXactStatus
 PgStat_TableCounts
-- 
2.24.3 (Apple Git-128)

#124Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#123)
Re: Replication slot stats misgivings

On Thu, Apr 22, 2021 at 1:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thanks, it looks good to me now. I'll review/test some more before
committing but at this stage, I would like to know from Andres or
others whether they see any problem with this approach to fixing a few
of the problems reported in this thread. Basically, it will fix the
cases where the drop message is lost and we were not able to record
stats for new slots and writing beyond the end of the array when after
restarting the number of slots whose stats are stored in the stats
file exceeds max_replication_slots.

It uses HTAB instead of an array to record slot stats and also taught
pgstat_vacuum_stat() to search for all the dead replication slots in
stats hashtable and tell the collector to remove them. This still uses
slot_name as the key because we were not able to find a better way to
use slot's idx.

Andres, unless you see any problems with this approach, I would like
to move forward with this early next week?

--
With Regards,
Amit Kapila.

#125Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#98)
1 attachment(s)
Re: Replication slot stats misgivings

On Mon, Apr 19, 2021 at 4:28 PM vignesh C <vignesh21@gmail.com> wrote:

I have made the changes to update the replication statistics at
replication slot release. Please find the patch attached for the same.
Thoughts?

Thanks, the changes look mostly good to me. The slot stats need to be
initialized in RestoreSlotFromDisk and ReplicationSlotCreate, not in
StartupDecodingContext. Apart from that, I have moved the declaration
of UpdateDecodingStats from slot.h back to logical.h. I have also
added/edited a few comments. Please check and let me know what do you
think of the attached?

--
With Regards,
Amit Kapila.

Attachments:

v2-0001-Update-decoding-stats-during-replication-slot-rel.patchapplication/octet-stream; name=v2-0001-Update-decoding-stats-during-replication-slot-rel.patchDownload
From ee9e1dc7472c8ff30c31357290114e80c6aa4e8d Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 23 Apr 2021 14:23:28 +0530
Subject: [PATCH v2] Update decoding stats during replication slot release.

Currently, replication slot statistics are updated at prepare, commit, and
rollback. Now, if the transaction is interrupted the stats might not get
updated. Fixed this by updating statistics during replication slot
release.

Moved the statistics related variables from ReorderBuffer to
ReplicationSlot structure to make them accessible at
ReplicationSlotRelease.

Reported-by: Andres Freund
Author: Vignesh C
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
---
 src/backend/replication/logical/decode.c      | 12 ++--
 src/backend/replication/logical/logical.c     | 55 +++++++++----------
 .../replication/logical/reorderbuffer.c       | 39 +++++++------
 src/backend/replication/slot.c                | 26 +++++++++
 src/include/replication/logical.h             |  2 +-
 src/include/replication/reorderbuffer.h       | 23 --------
 src/include/replication/slot.h                | 24 ++++++++
 7 files changed, 104 insertions(+), 77 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7924581cdcd..312fab07985 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -748,9 +748,10 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	/*
 	 * Update the decoding stats at transaction prepare/commit/abort. It is
 	 * not clear that sending more or less frequently than this would be
-	 * better.
+	 * better. We do send the stats in ReplicationSlotRelease to avoid losing
+	 * in case the decoding is interrupted.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats();
 }
 
 /*
@@ -830,9 +831,10 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	/*
 	 * Update the decoding stats at transaction prepare/commit/abort. It is
 	 * not clear that sending more or less frequently than this would be
-	 * better.
+	 * better. We do send the stats in ReplicationSlotRelease to avoid losing
+	 * in case the decoding is interrupted.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats();
 }
 
 
@@ -889,7 +891,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	}
 
 	/* update the decoding stats */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats();
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 35b0c676412..156a9780f20 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1770,44 +1770,39 @@ ResetLogicalStreamingState(void)
  * Report stats for a slot.
  */
 void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats()
 {
-	ReorderBuffer *rb = ctx->reorder;
 	PgStat_ReplSlotStats repSlotStat;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
-	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
+	if (slot->spillBytes <= 0 && slot->streamBytes <= 0 && slot->totalBytes <= 0)
 		return;
 
 	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
-		 rb,
-		 (long long) rb->spillTxns,
-		 (long long) rb->spillCount,
-		 (long long) rb->spillBytes,
-		 (long long) rb->streamTxns,
-		 (long long) rb->streamCount,
-		 (long long) rb->streamBytes,
-		 (long long) rb->totalTxns,
-		 (long long) rb->totalBytes);
-
-	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
-	repSlotStat.spill_txns = rb->spillTxns;
-	repSlotStat.spill_count = rb->spillCount;
-	repSlotStat.spill_bytes = rb->spillBytes;
-	repSlotStat.stream_txns = rb->streamTxns;
-	repSlotStat.stream_count = rb->streamCount;
-	repSlotStat.stream_bytes = rb->streamBytes;
-	repSlotStat.total_txns = rb->totalTxns;
-	repSlotStat.total_bytes = rb->totalBytes;
+		 slot,
+		 (long long) slot->spillTxns,
+		 (long long) slot->spillCount,
+		 (long long) slot->spillBytes,
+		 (long long) slot->streamTxns,
+		 (long long) slot->streamCount,
+		 (long long) slot->streamBytes,
+		 (long long) slot->totalTxns,
+		 (long long) slot->totalBytes);
+
+	namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
+	repSlotStat.spill_txns = slot->spillTxns;
+	repSlotStat.spill_count = slot->spillCount;
+	repSlotStat.spill_bytes = slot->spillBytes;
+	repSlotStat.stream_txns = slot->streamTxns;
+	repSlotStat.stream_count = slot->streamCount;
+	repSlotStat.stream_bytes = slot->streamBytes;
+	repSlotStat.total_txns = slot->totalTxns;
+	repSlotStat.total_bytes = slot->totalBytes;
 
 	pgstat_report_replslot(&repSlotStat);
 
-	rb->spillTxns = 0;
-	rb->spillCount = 0;
-	rb->spillBytes = 0;
-	rb->streamTxns = 0;
-	rb->streamCount = 0;
-	rb->streamBytes = 0;
-	rb->totalTxns = 0;
-	rb->totalBytes = 0;
+	InitializeSlotStats(slot);
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5cb484f0323..700ccf542c9 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -344,15 +344,6 @@ ReorderBufferAllocate(void)
 	buffer->outbufsize = 0;
 	buffer->size = 0;
 
-	buffer->spillTxns = 0;
-	buffer->spillCount = 0;
-	buffer->spillBytes = 0;
-	buffer->streamTxns = 0;
-	buffer->streamCount = 0;
-	buffer->streamBytes = 0;
-	buffer->totalTxns = 0;
-	buffer->totalBytes = 0;
-
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
 	dlist_init(&buffer->toplevel_by_lsn);
@@ -1316,6 +1307,9 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 	ReorderBufferChange *change;
 	ReorderBufferIterTXNEntry *entry;
 	int32		off;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	/* nothing there anymore */
 	if (state->heap->bh_size == 0)
@@ -1369,7 +1363,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		 * Update the total bytes processed before releasing the current set
 		 * of changes and restoring the new set of changes.
 		 */
-		rb->totalBytes += rb->size;
+		slot->totalBytes += rb->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2018,6 +2012,9 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	ReorderBufferChange *volatile specinsert = NULL;
 	volatile bool stream_started = false;
 	ReorderBufferTXN *volatile curtxn = NULL;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	/* build data to be able to lookup the CommandIds of catalog tuples */
 	ReorderBufferBuildTupleCidHash(rb, txn);
@@ -2380,9 +2377,9 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		 * which we have already accounted in ReorderBufferIterTXNNext.
 		 */
 		if (!rbtxn_is_streamed(txn))
-			rb->totalTxns++;
+			slot->totalTxns++;
 
-		rb->totalBytes += rb->size;
+		slot->totalBytes += rb->size;
 
 		/*
 		 * Done with current changes, send the last message for this set of
@@ -3482,6 +3479,9 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	XLogSegNo	curOpenSegNo = 0;
 	Size		spilled = 0;
 	Size		size = txn->size;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	elog(DEBUG2, "spill %u changes in XID %u to disk",
 		 (uint32) txn->nentries_mem, txn->xid);
@@ -3543,11 +3543,11 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	/* update the statistics iff we have spilled anything */
 	if (spilled)
 	{
-		rb->spillCount += 1;
-		rb->spillBytes += size;
+		slot->spillCount += 1;
+		slot->spillBytes += size;
 
 		/* don't consider already serialized transactions */
-		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+		slot->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
 	}
 
 	Assert(spilled == txn->nentries_mem);
@@ -3818,6 +3818,9 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	CommandId	command_id;
 	Size		stream_bytes;
 	bool		txn_is_streamed;
+	ReplicationSlot *slot = MyReplicationSlot;
+
+	Assert(MyReplicationSlot != NULL);
 
 	/* We can never reach here for a subtransaction. */
 	Assert(txn->toptxn == NULL);
@@ -3911,11 +3914,11 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	ReorderBufferProcessTXN(rb, txn, InvalidXLogRecPtr, snapshot_now,
 							command_id, true);
 
-	rb->streamCount += 1;
-	rb->streamBytes += stream_bytes;
+	slot->streamCount += 1;
+	slot->streamBytes += stream_bytes;
 
 	/* Don't consider already streamed transaction. */
-	rb->streamTxns += (txn_is_streamed) ? 0 : 1;
+	slot->streamTxns += (txn_is_streamed) ? 0 : 1;
 
 	Assert(dlist_is_empty(&txn->changes));
 	Assert(txn->nentries == 0);
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f61b163f78d..6fde2bc8676 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -44,6 +44,7 @@
 #include "common/string.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "replication/logical.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
 #include "storage/proc.h"
@@ -295,6 +296,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	InitializeSlotStats(slot);
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -500,6 +502,15 @@ ReplicationSlotRelease(void)
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (SlotIsLogical(slot))
+	{
+		/*
+		 * Update the decoding stats. This avoids losing them if the decoding
+		 * is interrupted.
+		 */
+		UpdateDecodingStats();
+	}
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
@@ -1782,6 +1793,7 @@ RestoreSlotFromDisk(const char *name)
 		slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 		slot->candidate_restart_lsn = InvalidXLogRecPtr;
 		slot->candidate_restart_valid = InvalidXLogRecPtr;
+		InitializeSlotStats(slot);
 
 		slot->in_use = true;
 		slot->active_pid = 0;
@@ -1795,3 +1807,17 @@ RestoreSlotFromDisk(const char *name)
 				(errmsg("too many replication slots active before shutdown"),
 				 errhint("Increase max_replication_slots and try again.")));
 }
+
+/* Initialize Replication Slot stats */
+void
+InitializeSlotStats(ReplicationSlot *slot)
+{
+	slot->spillTxns = 0;
+	slot->spillCount = 0;
+	slot->spillBytes = 0;
+	slot->streamTxns = 0;
+	slot->streamCount = 0;
+	slot->streamBytes = 0;
+	slot->totalTxns = 0;
+	slot->totalBytes = 0;
+}
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 7dfcb7be187..44baec63114 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -134,6 +134,6 @@ extern bool filter_prepare_cb_wrapper(LogicalDecodingContext *ctx,
 									  TransactionId xid, const char *gid);
 extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id);
 extern void ResetLogicalStreamingState(void);
-extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
+extern void UpdateDecodingStats(void);
 
 #endif
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bfab8303ee7..c8295fc9edf 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -602,29 +602,6 @@ struct ReorderBuffer
 
 	/* memory accounting */
 	Size		size;
-
-	/*
-	 * Statistics about transactions spilled to disk.
-	 *
-	 * A single transaction may be spilled repeatedly, which is why we keep
-	 * two different counters. For spilling, the transaction counter includes
-	 * both toplevel transactions and subtransactions.
-	 */
-	int64		spillTxns;		/* number of transactions spilled to disk */
-	int64		spillCount;		/* spill-to-disk invocation counter */
-	int64		spillBytes;		/* amount of data spilled to disk */
-
-	/* Statistics about transactions streamed to the decoding output plugin */
-	int64		streamTxns;		/* number of transactions streamed */
-	int64		streamCount;	/* streaming invocation counter */
-	int64		streamBytes;	/* amount of data streamed */
-
-	/*
-	 * Statistics about all the transactions sent to the decoding output
-	 * plugin
-	 */
-	int64		totalTxns;		/* total number of transactions sent */
-	int64		totalBytes;		/* total amount of data sent */
 };
 
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 1ad5e6c50df..267eed60aef 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -172,6 +172,29 @@ typedef struct ReplicationSlot
 	XLogRecPtr	candidate_xmin_lsn;
 	XLogRecPtr	candidate_restart_valid;
 	XLogRecPtr	candidate_restart_lsn;
+
+	/*
+	 * Statistics about transactions spilled to disk.
+	 *
+	 * A single transaction may be spilled repeatedly, which is why we keep
+	 * two different counters. For spilling, the transaction counter includes
+	 * both toplevel transactions and subtransactions.
+	 */
+	int64		spillTxns;		/* number of transactions spilled to disk */
+	int64		spillCount;		/* spill-to-disk invocation counter */
+	int64		spillBytes;		/* amount of data spilled to disk */
+
+	/* Statistics about transactions streamed to the decoding output plugin */
+	int64		streamTxns;		/* number of transactions streamed */
+	int64		streamCount;	/* streaming invocation counter */
+	int64		streamBytes;	/* amount of data streamed */
+
+	/*
+	 * Statistics about all the transactions sent to the decoding output
+	 * plugin
+	 */
+	int64		totalTxns;		/* total number of transactions sent */
+	int64		totalBytes;		/* total amount of data sent */
 } ReplicationSlot;
 
 #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid)
@@ -231,5 +254,6 @@ extern void StartupReplicationSlots(void);
 extern void CheckPointReplicationSlots(void);
 
 extern void CheckSlotRequirements(void);
+extern void InitializeSlotStats(ReplicationSlot *slot);
 
 #endif							/* SLOT_H */
-- 
2.28.0.windows.1

#126Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#125)
Re: Replication slot stats misgivings

On Fri, Apr 23, 2021 at 6:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 4:28 PM vignesh C <vignesh21@gmail.com> wrote:

I have made the changes to update the replication statistics at
replication slot release. Please find the patch attached for the same.
Thoughts?

Thanks, the changes look mostly good to me. The slot stats need to be
initialized in RestoreSlotFromDisk and ReplicationSlotCreate, not in
StartupDecodingContext. Apart from that, I have moved the declaration
of UpdateDecodingStats from slot.h back to logical.h. I have also
added/edited a few comments. Please check and let me know what do you
think of the attached?

The patch moves slot stats to the ReplicationSlot data that is on the
shared memory. If we have a space to store the statistics in the
shared memory can we simply accumulate the stats there and make them
persistent without using the stats collector? And I think there is
also a risk to increase shared memory when we want to add other
statistics in the future.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#127Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#126)
Re: Replication slot stats misgivings

On Mon, Apr 26, 2021 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 23, 2021 at 6:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 4:28 PM vignesh C <vignesh21@gmail.com> wrote:

I have made the changes to update the replication statistics at
replication slot release. Please find the patch attached for the same.
Thoughts?

Thanks, the changes look mostly good to me. The slot stats need to be
initialized in RestoreSlotFromDisk and ReplicationSlotCreate, not in
StartupDecodingContext. Apart from that, I have moved the declaration
of UpdateDecodingStats from slot.h back to logical.h. I have also
added/edited a few comments. Please check and let me know what do you
think of the attached?

The patch moves slot stats to the ReplicationSlot data that is on the
shared memory. If we have a space to store the statistics in the
shared memory can we simply accumulate the stats there and make them
persistent without using the stats collector?

But for that, we need to write to file at every commit/abort/prepare
(decode of commit) which I think will incur significant overhead.
Also, we try to write after few commits then there is a danger of
losing them and still there could be a visible overhead for small
transactions.

And I think there is
also a risk to increase shared memory when we want to add other
statistics in the future.

Yeah, so do you think it is not a good idea to store stats in
ReplicationSlot? Actually storing them in a slot makes it easier to
send them during ReplicationSlotRelease which is quite helpful if the
replication is interrupted due to some reason. Or the other idea was
that we send stats every time we stream or spill changes.

--
With Regards,
Amit Kapila.

#128vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#127)
Re: Replication slot stats misgivings

On Mon, Apr 26, 2021 at 8:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 26, 2021 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 23, 2021 at 6:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 4:28 PM vignesh C <vignesh21@gmail.com> wrote:

I have made the changes to update the replication statistics at
replication slot release. Please find the patch attached for the same.
Thoughts?

Thanks, the changes look mostly good to me. The slot stats need to be
initialized in RestoreSlotFromDisk and ReplicationSlotCreate, not in
StartupDecodingContext. Apart from that, I have moved the declaration
of UpdateDecodingStats from slot.h back to logical.h. I have also
added/edited a few comments. Please check and let me know what do you
think of the attached?

The patch moves slot stats to the ReplicationSlot data that is on the
shared memory. If we have a space to store the statistics in the
shared memory can we simply accumulate the stats there and make them
persistent without using the stats collector?

But for that, we need to write to file at every commit/abort/prepare
(decode of commit) which I think will incur significant overhead.
Also, we try to write after few commits then there is a danger of
losing them and still there could be a visible overhead for small
transactions.

I preferred not to persist this information to file, let's have stats
collector handle the stats persisting.

And I think there is
also a risk to increase shared memory when we want to add other
statistics in the future.

Yeah, so do you think it is not a good idea to store stats in
ReplicationSlot? Actually storing them in a slot makes it easier to
send them during ReplicationSlotRelease which is quite helpful if the
replication is interrupted due to some reason. Or the other idea was
that we send stats every time we stream or spill changes.

We use around 64 bytes of shared memory to store the statistics
information per slot, I'm not sure if this is a lot of memory. If this
memory is fine, then I felt the approach to store stats seems fine. If
that memory is too much then we could use the other approach to update
stats when we stream or spill the changes as suggested by Amit.

Regards,
Vignesh

#129Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#128)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 8:01 AM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Apr 26, 2021 at 8:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 26, 2021 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 23, 2021 at 6:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 4:28 PM vignesh C <vignesh21@gmail.com> wrote:

I have made the changes to update the replication statistics at
replication slot release. Please find the patch attached for the same.
Thoughts?

Thanks, the changes look mostly good to me. The slot stats need to be
initialized in RestoreSlotFromDisk and ReplicationSlotCreate, not in
StartupDecodingContext. Apart from that, I have moved the declaration
of UpdateDecodingStats from slot.h back to logical.h. I have also
added/edited a few comments. Please check and let me know what do you
think of the attached?

The patch moves slot stats to the ReplicationSlot data that is on the
shared memory. If we have a space to store the statistics in the
shared memory can we simply accumulate the stats there and make them
persistent without using the stats collector?

But for that, we need to write to file at every commit/abort/prepare
(decode of commit) which I think will incur significant overhead.
Also, we try to write after few commits then there is a danger of
losing them and still there could be a visible overhead for small
transactions.

I preferred not to persist this information to file, let's have stats
collector handle the stats persisting.

Sawada-San, I would like to go ahead with your
"Use-HTAB-for-replication-slot-statistics" unless you think otherwise?

--
With Regards,
Amit Kapila.

#130Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#129)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 11:45 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 8:01 AM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Apr 26, 2021 at 8:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 26, 2021 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 23, 2021 at 6:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 4:28 PM vignesh C <vignesh21@gmail.com> wrote:

I have made the changes to update the replication statistics at
replication slot release. Please find the patch attached for the same.
Thoughts?

Thanks, the changes look mostly good to me. The slot stats need to be
initialized in RestoreSlotFromDisk and ReplicationSlotCreate, not in
StartupDecodingContext. Apart from that, I have moved the declaration
of UpdateDecodingStats from slot.h back to logical.h. I have also
added/edited a few comments. Please check and let me know what do you
think of the attached?

The patch moves slot stats to the ReplicationSlot data that is on the
shared memory. If we have a space to store the statistics in the
shared memory can we simply accumulate the stats there and make them
persistent without using the stats collector?

But for that, we need to write to file at every commit/abort/prepare
(decode of commit) which I think will incur significant overhead.
Also, we try to write after few commits then there is a danger of
losing them and still there could be a visible overhead for small
transactions.

I preferred not to persist this information to file, let's have stats
collector handle the stats persisting.

Sawada-San, I would like to go ahead with your
"Use-HTAB-for-replication-slot-statistics" unless you think otherwise?

I agree that it's better to use the stats collector. So please go ahead.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#131Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#128)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 11:31 AM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Apr 26, 2021 at 8:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 26, 2021 at 8:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 23, 2021 at 6:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 19, 2021 at 4:28 PM vignesh C <vignesh21@gmail.com> wrote:

I have made the changes to update the replication statistics at
replication slot release. Please find the patch attached for the same.
Thoughts?

Thanks, the changes look mostly good to me. The slot stats need to be
initialized in RestoreSlotFromDisk and ReplicationSlotCreate, not in
StartupDecodingContext. Apart from that, I have moved the declaration
of UpdateDecodingStats from slot.h back to logical.h. I have also
added/edited a few comments. Please check and let me know what do you
think of the attached?

The patch moves slot stats to the ReplicationSlot data that is on the
shared memory. If we have a space to store the statistics in the
shared memory can we simply accumulate the stats there and make them
persistent without using the stats collector?

But for that, we need to write to file at every commit/abort/prepare
(decode of commit) which I think will incur significant overhead.
Also, we try to write after few commits then there is a danger of
losing them and still there could be a visible overhead for small
transactions.

I preferred not to persist this information to file, let's have stats
collector handle the stats persisting.

And I think there is
also a risk to increase shared memory when we want to add other
statistics in the future.

Yeah, so do you think it is not a good idea to store stats in
ReplicationSlot? Actually storing them in a slot makes it easier to
send them during ReplicationSlotRelease which is quite helpful if the
replication is interrupted due to some reason. Or the other idea was
that we send stats every time we stream or spill changes.

We use around 64 bytes of shared memory to store the statistics
information per slot, I'm not sure if this is a lot of memory. If this
memory is fine, then I felt the approach to store stats seems fine. If
that memory is too much then we could use the other approach to update
stats when we stream or spill the changes as suggested by Amit.

I agree that makes it easier to send slot stats during
ReplicationSlotRelease() but I'd prefer to avoid storing data that
doesn't need to be shared in the shared buffer if possible. And those
counters are not used by physical slots at all. If sending slot stats
every time we stream or spill changes doesn't affect the system much,
I think it's better than having slot stats in the shared memory.

Also, not sure it’s better but another idea would be to make the slot
stats a global variable like pgBufferUsage and use it during decoding.
Or we can set a proc-exit callback? But to be honest, I'm not sure
which approach we should go with. Those approaches have proc and cons.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#132Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#131)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 9:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:31 AM vignesh C <vignesh21@gmail.com> wrote:

And I think there is
also a risk to increase shared memory when we want to add other
statistics in the future.

Yeah, so do you think it is not a good idea to store stats in
ReplicationSlot? Actually storing them in a slot makes it easier to
send them during ReplicationSlotRelease which is quite helpful if the
replication is interrupted due to some reason. Or the other idea was
that we send stats every time we stream or spill changes.

We use around 64 bytes of shared memory to store the statistics
information per slot, I'm not sure if this is a lot of memory. If this
memory is fine, then I felt the approach to store stats seems fine. If
that memory is too much then we could use the other approach to update
stats when we stream or spill the changes as suggested by Amit.

I agree that makes it easier to send slot stats during
ReplicationSlotRelease() but I'd prefer to avoid storing data that
doesn't need to be shared in the shared buffer if possible.

Sounds reasonable and we might add some stats in the future so that
will further increase the usage of shared memory.

And those
counters are not used by physical slots at all. If sending slot stats
every time we stream or spill changes doesn't affect the system much,
I think it's better than having slot stats in the shared memory.

As the minimum size of logical_decoding_work_mem is 64KB, so in the
worst case, we will send stats after decoding that many changes. I
don't think it would impact too much considering that we need to spill
or stream those many changes. If it concerns any users they can
always increase logical_decoding_work_mem. The default value is 64MB
at which point, I don't think it will matter sending the stats.

Also, not sure it’s better but another idea would be to make the slot
stats a global variable like pgBufferUsage and use it during decoding.

Hmm, I think it is better to avoid global variables if possible.

Or we can set a proc-exit callback? But to be honest, I'm not sure
which approach we should go with. Those approaches have proc and cons.

I think we can try the first approach listed here which is to send
stats each time we spill or stream.

--
With Regards,
Amit Kapila.

#133vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#132)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:31 AM vignesh C <vignesh21@gmail.com> wrote:

And I think there is
also a risk to increase shared memory when we want to add other
statistics in the future.

Yeah, so do you think it is not a good idea to store stats in
ReplicationSlot? Actually storing them in a slot makes it easier to
send them during ReplicationSlotRelease which is quite helpful if the
replication is interrupted due to some reason. Or the other idea was
that we send stats every time we stream or spill changes.

We use around 64 bytes of shared memory to store the statistics
information per slot, I'm not sure if this is a lot of memory. If this
memory is fine, then I felt the approach to store stats seems fine. If
that memory is too much then we could use the other approach to update
stats when we stream or spill the changes as suggested by Amit.

I agree that makes it easier to send slot stats during
ReplicationSlotRelease() but I'd prefer to avoid storing data that
doesn't need to be shared in the shared buffer if possible.

Sounds reasonable and we might add some stats in the future so that
will further increase the usage of shared memory.

And those
counters are not used by physical slots at all. If sending slot stats
every time we stream or spill changes doesn't affect the system much,
I think it's better than having slot stats in the shared memory.

As the minimum size of logical_decoding_work_mem is 64KB, so in the
worst case, we will send stats after decoding that many changes. I
don't think it would impact too much considering that we need to spill
or stream those many changes. If it concerns any users they can
always increase logical_decoding_work_mem. The default value is 64MB
at which point, I don't think it will matter sending the stats.

Sounds good to me, I will rebase my previous patch and send a patch for this.

Regards,
Vignesh

#134Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#133)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 1:18 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:31 AM vignesh C <vignesh21@gmail.com> wrote:

And I think there is
also a risk to increase shared memory when we want to add other
statistics in the future.

Yeah, so do you think it is not a good idea to store stats in
ReplicationSlot? Actually storing them in a slot makes it easier to
send them during ReplicationSlotRelease which is quite helpful if the
replication is interrupted due to some reason. Or the other idea was
that we send stats every time we stream or spill changes.

We use around 64 bytes of shared memory to store the statistics
information per slot, I'm not sure if this is a lot of memory. If this
memory is fine, then I felt the approach to store stats seems fine. If
that memory is too much then we could use the other approach to update
stats when we stream or spill the changes as suggested by Amit.

I agree that makes it easier to send slot stats during
ReplicationSlotRelease() but I'd prefer to avoid storing data that
doesn't need to be shared in the shared buffer if possible.

Sounds reasonable and we might add some stats in the future so that
will further increase the usage of shared memory.

And those
counters are not used by physical slots at all. If sending slot stats
every time we stream or spill changes doesn't affect the system much,
I think it's better than having slot stats in the shared memory.

As the minimum size of logical_decoding_work_mem is 64KB, so in the
worst case, we will send stats after decoding that many changes. I
don't think it would impact too much considering that we need to spill
or stream those many changes. If it concerns any users they can
always increase logical_decoding_work_mem. The default value is 64MB
at which point, I don't think it will matter sending the stats.

Sounds good to me, I will rebase my previous patch and send a patch for this.

+1. Thanks!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#135vignesh C
vignesh21@gmail.com
In reply to: vignesh C (#133)
1 attachment(s)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 9:48 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:31 AM vignesh C <vignesh21@gmail.com> wrote:

And I think there is
also a risk to increase shared memory when we want to add other
statistics in the future.

Yeah, so do you think it is not a good idea to store stats in
ReplicationSlot? Actually storing them in a slot makes it easier to
send them during ReplicationSlotRelease which is quite helpful if the
replication is interrupted due to some reason. Or the other idea was
that we send stats every time we stream or spill changes.

We use around 64 bytes of shared memory to store the statistics
information per slot, I'm not sure if this is a lot of memory. If this
memory is fine, then I felt the approach to store stats seems fine. If
that memory is too much then we could use the other approach to update
stats when we stream or spill the changes as suggested by Amit.

I agree that makes it easier to send slot stats during
ReplicationSlotRelease() but I'd prefer to avoid storing data that
doesn't need to be shared in the shared buffer if possible.

Sounds reasonable and we might add some stats in the future so that
will further increase the usage of shared memory.

And those
counters are not used by physical slots at all. If sending slot stats
every time we stream or spill changes doesn't affect the system much,
I think it's better than having slot stats in the shared memory.

As the minimum size of logical_decoding_work_mem is 64KB, so in the
worst case, we will send stats after decoding that many changes. I
don't think it would impact too much considering that we need to spill
or stream those many changes. If it concerns any users they can
always increase logical_decoding_work_mem. The default value is 64MB
at which point, I don't think it will matter sending the stats.

Sounds good to me, I will rebase my previous patch and send a patch for this.

Attached patch has the changes to update statistics during
spill/stream which prevents the statistics from being lost during
interrupt.
Thoughts?

Regards,
Vignesh

Attachments:

v3-0001-Update-replication-statistics-after-every-stream-.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Update-replication-statistics-after-every-stream-.patchDownload
From a1fd6f76c955482efa7e63ea6c25ae6350160b4d Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Tue, 27 Apr 2021 10:56:02 +0530
Subject: [PATCH v3] Update replication statistics after every stream/spill.

Currently, replication slot statistics are updated at prepare, commit, and
rollback. Now, if the transaction is interrupted the stats might not get
updated. Fixed this by updating replication statistics after every
stream/spill.
---
 src/backend/replication/logical/decode.c        | 6 +++---
 src/backend/replication/logical/logical.c       | 6 +++---
 src/backend/replication/logical/reorderbuffer.c | 3 +++
 src/include/replication/logical.h               | 2 +-
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7924581cdc..2c009d51ec 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -750,7 +750,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 * not clear that sending more or less frequently than this would be
 	 * better.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats(ctx->reorder);
 }
 
 /*
@@ -832,7 +832,7 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 * not clear that sending more or less frequently than this would be
 	 * better.
 	 */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats(ctx->reorder);
 }
 
 
@@ -889,7 +889,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	}
 
 	/* update the decoding stats */
-	UpdateDecodingStats(ctx);
+	UpdateDecodingStats(ctx->reorder);
 }
 
 /*
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 00543ede45..0c3ef0f93f 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1770,10 +1770,10 @@ ResetLogicalStreamingState(void)
  * Report stats for a slot.
  */
 void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats(ReorderBuffer *rb)
 {
-	ReorderBuffer *rb = ctx->reorder;
 	PgStat_StatReplSlotEntry repSlotStat;
+	ReplicationSlot *slot = MyReplicationSlot;
 
 	/* Nothing to do if we don't have any replication stats to be sent. */
 	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
@@ -1790,7 +1790,7 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 		 (long long) rb->totalTxns,
 		 (long long) rb->totalBytes);
 
-	namestrcpy(&repSlotStat.slotname, NameStr(ctx->slot->data.name));
+	namestrcpy(&repSlotStat.slotname, NameStr(slot->data.name));
 	repSlotStat.spill_txns = rb->spillTxns;
 	repSlotStat.spill_count = rb->spillCount;
 	repSlotStat.spill_bytes = rb->spillBytes;
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c27f710053..f4d97c1de3 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3551,6 +3551,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+		UpdateDecodingStats(rb);
 	}
 
 	Assert(spilled == txn->nentries_mem);
@@ -3920,6 +3921,8 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	/* Don't consider already streamed transaction. */
 	rb->streamTxns += (txn_is_streamed) ? 0 : 1;
 
+	UpdateDecodingStats(rb);
+
 	Assert(dlist_is_empty(&txn->changes));
 	Assert(txn->nentries == 0);
 	Assert(txn->nentries_mem == 0);
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 7dfcb7be18..8e03e055f6 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -134,6 +134,6 @@ extern bool filter_prepare_cb_wrapper(LogicalDecodingContext *ctx,
 									  TransactionId xid, const char *gid);
 extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id);
 extern void ResetLogicalStreamingState(void);
-extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
+extern void UpdateDecodingStats(ReorderBuffer *rb);
 
 #endif
-- 
2.25.1

#136Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#130)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 8:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:45 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sawada-San, I would like to go ahead with your
"Use-HTAB-for-replication-slot-statistics" unless you think otherwise?

I agree that it's better to use the stats collector. So please go ahead.

I have pushed this patch and seeing one buildfarm failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&amp;dt=2021-04-27%2009%3A23%3A14

starting permutation: s1_init s1_begin s1_insert_tbl1 s1_insert_tbl2
s2_alter_tbl1_char s1_commit s2_get_changes
+ isolationtester: canceling step s1_init after 314 seconds
step s1_init: SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
?column?

I am analyzing this. Do let me know if you have any thoughts on the same?

--
With Regards,
Amit Kapila.

#137Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#136)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 8:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I have pushed this patch and seeing one buildfarm failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&amp;dt=2021-04-27%2009%3A23%3A14

starting permutation: s1_init s1_begin s1_insert_tbl1 s1_insert_tbl2
s2_alter_tbl1_char s1_commit s2_get_changes
+ isolationtester: canceling step s1_init after 314 seconds
step s1_init: SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
?column?

I am analyzing this.

After checking below logs corresponding to this test, it seems test
has been executed and create_slot was successful:
2021-04-27 11:06:43.770 UTC [17694956:52] isolation/concurrent_ddl_dml
STATEMENT: SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
2021-04-27 11:07:11.748 UTC [5243096:9] LOG: checkpoint starting: time
2021-04-27 11:09:24.332 UTC [5243096:10] LOG: checkpoint complete:
wrote 14 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.716 s, sync=0.001 s, total=132.584 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=198 kB, estimate=406 kB
2021-04-27 11:09:40.116 UTC [6226046:1] [unknown] LOG: connection
received: host=[local]
2021-04-27 11:09:40.117 UTC [17694956:53] isolation/concurrent_ddl_dml
LOG: statement: BEGIN;
2021-04-27 11:09:40.117 UTC [17694956:54] isolation/concurrent_ddl_dml
LOG: statement: INSERT INTO tbl1 (val1, val2) VALUES (1, 1);
2021-04-27 11:09:40.118 UTC [17694956:55] isolation/concurrent_ddl_dml
LOG: statement: INSERT INTO tbl2 (val1, val2) VALUES (1, 1);
2021-04-27 11:09:40.119 UTC [10944636:49] isolation/concurrent_ddl_dml
LOG: statement: ALTER TABLE tbl1 ALTER COLUMN val2 TYPE character
varying;

I am not sure but there is some possibility that even though create
slot is successful, the isolation tester got successful in canceling
it, maybe because create_slot is just finished at the same time. As we
can see from logs, during this test checkpoint also happened which
could also lead to the slowness of this particular command.

Also, I see a lot of messages like below which indicate stats
collector is also quite slow:
2021-04-27 10:57:59.385 UTC [18743536:1] LOG: using stale statistics
instead of current ones because stats collector is not responding

I am not sure if the timeout happened because the machine is slow or
is it in any way related to code. I am seeing some previous failures
due to timeout on this machine [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&amp;dt=2021-02-23%2004%3A23%3A56[2]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&amp;dt=2020-12-24%2005%3A31%3A43. In those failures, I see the
"using stale stats...." message. Also, I am not able to see why it can
fail due to this patch?

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&amp;dt=2021-02-23%2004%3A23%3A56
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&amp;dt=2020-12-24%2005%3A31%3A43

--
With Regards,
Amit Kapila.

#138Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#137)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 11:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 8:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I have pushed this patch and seeing one buildfarm failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&amp;dt=2021-04-27%2009%3A23%3A14

starting permutation: s1_init s1_begin s1_insert_tbl1 s1_insert_tbl2
s2_alter_tbl1_char s1_commit s2_get_changes
+ isolationtester: canceling step s1_init after 314 seconds
step s1_init: SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
?column?

I am analyzing this.

After checking below logs corresponding to this test, it seems test
has been executed and create_slot was successful:

The pg_create_logical_replication_slot() was executed at 11:04:25:

2021-04-27 11:04:25.494 UTC [17694956:49] isolation/concurrent_ddl_dml
LOG: statement: SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot', 'test_decoding');

Therefore this command took 314 sec that matches the number the
isolation test reported. And the folling logs follow:

2021-04-27 11:06:43.770 UTC [17694956:50] isolation/concurrent_ddl_dml
LOG: logical decoding found consistent point at 0/17F9078
2021-04-27 11:06:43.770 UTC [17694956:51] isolation/concurrent_ddl_dml
DETAIL: There are no running transactions.

2021-04-27 11:06:43.770 UTC [17694956:52] isolation/concurrent_ddl_dml
STATEMENT: SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
2021-04-27 11:07:11.748 UTC [5243096:9] LOG: checkpoint starting: time
2021-04-27 11:09:24.332 UTC [5243096:10] LOG: checkpoint complete:
wrote 14 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.716 s, sync=0.001 s, total=132.584 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=198 kB, estimate=406 kB
2021-04-27 11:09:40.116 UTC [6226046:1] [unknown] LOG: connection
received: host=[local]
2021-04-27 11:09:40.117 UTC [17694956:53] isolation/concurrent_ddl_dml
LOG: statement: BEGIN;
2021-04-27 11:09:40.117 UTC [17694956:54] isolation/concurrent_ddl_dml
LOG: statement: INSERT INTO tbl1 (val1, val2) VALUES (1, 1);
2021-04-27 11:09:40.118 UTC [17694956:55] isolation/concurrent_ddl_dml
LOG: statement: INSERT INTO tbl2 (val1, val2) VALUES (1, 1);
2021-04-27 11:09:40.119 UTC [10944636:49] isolation/concurrent_ddl_dml
LOG: statement: ALTER TABLE tbl1 ALTER COLUMN val2 TYPE character
varying;

I am not sure but there is some possibility that even though create
slot is successful, the isolation tester got successful in canceling
it, maybe because create_slot is just finished at the same time.

Yeah, we see the test log "canceling step s1_init after 314 seconds"
but don't see any log indicating canceling query.

As we
can see from logs, during this test checkpoint also happened which
could also lead to the slowness of this particular command.

Yes. I also think the checkpoint could somewhat lead to the slowness.
And since create_slot() took 2min to find a consistent snapshot the
system might have already been busy.

Also, I see a lot of messages like below which indicate stats
collector is also quite slow:
2021-04-27 10:57:59.385 UTC [18743536:1] LOG: using stale statistics
instead of current ones because stats collector is not responding

I am not sure if the timeout happened because the machine is slow or
is it in any way related to code. I am seeing some previous failures
due to timeout on this machine [1][2]. In those failures, I see the
"using stale stats...." message.

It seems like a time-dependent issue but I'm wondering why the logical
decoding test failed at this time.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#139Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#138)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 8:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not sure if the timeout happened because the machine is slow or
is it in any way related to code. I am seeing some previous failures
due to timeout on this machine [1][2]. In those failures, I see the
"using stale stats...." message.

It seems like a time-dependent issue but I'm wondering why the logical
decoding test failed at this time.

As per the analysis done till now, it appears to be due to the reason
that the machine is slow which leads to timeout and there appear to be
some prior failures related to timeout as well. I think it is better
to wait for another run (or few runs) to see if this occurs again.

--
With Regards,
Amit Kapila.

#140vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#139)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 8:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 8:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not sure if the timeout happened because the machine is slow or
is it in any way related to code. I am seeing some previous failures
due to timeout on this machine [1][2]. In those failures, I see the
"using stale stats...." message.

It seems like a time-dependent issue but I'm wondering why the logical
decoding test failed at this time.

As per the analysis done till now, it appears to be due to the reason
that the machine is slow which leads to timeout and there appear to be
some prior failures related to timeout as well. I think it is better
to wait for another run (or few runs) to see if this occurs again.

Yes, checkpoint seems to take a lot of time, could be because the
machine is slow. Let's wait for the next run and see.

Regards,
Vignesh

#141Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#135)
Re: Replication slot stats misgivings

On Tue, Apr 27, 2021 at 11:02 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:48 AM vignesh C <vignesh21@gmail.com> wrote:

Attached patch has the changes to update statistics during
spill/stream which prevents the statistics from being lost during
interrupt.

 void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats(ReorderBuffer *rb)

I don't think you need to change this interface because
reorderbuffer->private_data points to LogicalDecodingContext. See
StartupDecodingContext. Other than that there is a comment in the code
"Update the decoding stats at transaction prepare/commit/abort...".
This patch should extend that comment by saying something like
"Additionally we send the stats when we spill or stream the changes to
avoid losing them in case the decoding is interrupted."

--
With Regards,
Amit Kapila.

#142vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#141)
1 attachment(s)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 8:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:02 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:48 AM vignesh C <vignesh21@gmail.com> wrote:

Attached patch has the changes to update statistics during
spill/stream which prevents the statistics from being lost during
interrupt.

void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats(ReorderBuffer *rb)

I don't think you need to change this interface because
reorderbuffer->private_data points to LogicalDecodingContext. See
StartupDecodingContext. Other than that there is a comment in the code
"Update the decoding stats at transaction prepare/commit/abort...".
This patch should extend that comment by saying something like
"Additionally we send the stats when we spill or stream the changes to
avoid losing them in case the decoding is interrupted."

Thanks for the comments, Please find the attached v4 patch having the
fixes for the same.

Regards,
Vignesh

Attachments:

v4-0001-Update-replication-statistics-after-every-stream-.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Update-replication-statistics-after-every-stream-.patchDownload
From 533c45c4cbd4545350d464bbf7a2df91ee668a75 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Tue, 27 Apr 2021 10:56:02 +0530
Subject: [PATCH v4] Update replication statistics after every stream/spill.

Currently, replication slot statistics are updated at prepare, commit, and
rollback. Now, if the transaction is interrupted the stats might not get
updated. Fixed this by updating replication statistics after every
stream/spill.
---
 src/backend/replication/logical/decode.c        | 14 ++++++++------
 src/backend/replication/logical/reorderbuffer.c |  6 ++++++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7924581cdc..888e064ec0 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -746,9 +746,10 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	}
 
 	/*
-	 * Update the decoding stats at transaction prepare/commit/abort. It is
-	 * not clear that sending more or less frequently than this would be
-	 * better.
+	 * Update the decoding stats at transaction prepare/commit/abort.
+	 * Additionally we send the stats when we spill or stream the changes to
+	 * avoid losing them in case the decoding is interrupted. It is not clear
+	 * that sending more or less frequently than this would be better.
 	 */
 	UpdateDecodingStats(ctx);
 }
@@ -828,9 +829,10 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	ReorderBufferPrepare(ctx->reorder, xid, parsed->twophase_gid);
 
 	/*
-	 * Update the decoding stats at transaction prepare/commit/abort. It is
-	 * not clear that sending more or less frequently than this would be
-	 * better.
+	 * Update the decoding stats at transaction prepare/commit/abort.
+	 * Additionally we send the stats when we spill or stream the changes to
+	 * avoid losing them in case the decoding is interrupted. It is not clear
+	 * that sending more or less frequently than this would be better.
 	 */
 	UpdateDecodingStats(ctx);
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c27f710053..ceb83bcbf9 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3551,6 +3551,9 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+
+		/* update the decoding stats */
+		UpdateDecodingStats(rb->private_data);
 	}
 
 	Assert(spilled == txn->nentries_mem);
@@ -3920,6 +3923,9 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	/* Don't consider already streamed transaction. */
 	rb->streamTxns += (txn_is_streamed) ? 0 : 1;
 
+	/* update the decoding stats */
+	UpdateDecodingStats(rb->private_data);
+
 	Assert(dlist_is_empty(&txn->changes));
 	Assert(txn->nentries == 0);
 	Assert(txn->nentries_mem == 0);
-- 
2.25.1

#143Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#141)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 12:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:02 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:48 AM vignesh C <vignesh21@gmail.com> wrote:

Attached patch has the changes to update statistics during
spill/stream which prevents the statistics from being lost during
interrupt.

void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats(ReorderBuffer *rb)

I don't think you need to change this interface because
reorderbuffer->private_data points to LogicalDecodingContext. See
StartupDecodingContext.

+1

With this approach, we could still miss the totalTxns and totalBytes
updates if the decoding a large but less than
logical_decoding_work_mem is interrupted, right?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#144vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#143)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 9:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 12:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:02 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:48 AM vignesh C <vignesh21@gmail.com> wrote:

Attached patch has the changes to update statistics during
spill/stream which prevents the statistics from being lost during
interrupt.

void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats(ReorderBuffer *rb)

I don't think you need to change this interface because
reorderbuffer->private_data points to LogicalDecodingContext. See
StartupDecodingContext.

+1

With this approach, we could still miss the totalTxns and totalBytes
updates if the decoding a large but less than
logical_decoding_work_mem is interrupted, right?

Yes you are right, I felt that is reasonable and that way it reduces
frequent calls to the stats collector to update the stats.

Regards,
Vignesh

#145Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#143)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 9:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 12:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:02 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:48 AM vignesh C <vignesh21@gmail.com> wrote:

Attached patch has the changes to update statistics during
spill/stream which prevents the statistics from being lost during
interrupt.

void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats(ReorderBuffer *rb)

I don't think you need to change this interface because
reorderbuffer->private_data points to LogicalDecodingContext. See
StartupDecodingContext.

+1

With this approach, we could still miss the totalTxns and totalBytes
updates if the decoding a large but less than
logical_decoding_work_mem is interrupted, right?

Right, but is there some simple way to avoid that? I see two
possibilities (a) store stats in ReplicationSlot and then send them at
ReplicationSlotRelease but that will lead to an increase in shared
memory usage and as per the discussion above, we don't want that, (b)
send intermediate stats after decoding say N changes but for that, we
need to additionally compute the size of each change which might add
some overhead.

I am not sure if any of these alternatives are a good idea. What do
you think? Do you have any other ideas for this?

--
With Regards,
Amit Kapila.

#146Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#82)
1 attachment(s)
Re: Replication slot stats misgivings

On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 15, 2021 at 4:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for the update! The patch looks good to me.

BTW regarding the commit f5fc2f5b23 that added total_txns and
total_bytes, we add the reorder buffer size (i.g., rb->size) to
rb->totalBytes but I think we should use the transaction size (i.g.,
txn->size) instead:

@@ -1363,6 +1365,11 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb,
ReorderBufferIterTXNState *state)
dlist_delete(&change->node);
dlist_push_tail(&state->old_change, &change->node);

+       /*
+        * Update the total bytes processed before releasing the current set
+        * of changes and restoring the new set of changes.
+        */
+       rb->totalBytes += rb->size;
        if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
                                        &state->entries[off].segno))
        {
@@ -2363,6 +2370,20 @@ ReorderBufferProcessTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn,
        ReorderBufferIterTXNFinish(rb, iterstate);
        iterstate = NULL;
+       /*
+        * Update total transaction count and total transaction bytes
+        * processed. Ensure to not count the streamed transaction multiple
+        * times.
+        *
+        * Note that the statistics computation has to be done after
+        * ReorderBufferIterTXNFinish as it releases the serialized change
+        * which we have already accounted in ReorderBufferIterTXNNext.
+        */
+       if (!rbtxn_is_streamed(txn))
+           rb->totalTxns++;
+
+       rb->totalBytes += rb->size;
+

IIUC rb->size could include multiple decoded transactions. So it's not
appropriate to add that value to the counter as the transaction size
passed to the logical decoding plugin. If the reorder buffer process a
transaction while having a large transaction that is being decoded, we
could end up more increasing txn_bytes than necessary.

Please review the attached patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

use_txn_size.patchapplication/x-patch; name=use_txn_size.patchDownload
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c27f710053..30d9ffa0da 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1369,7 +1369,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		 * Update the total bytes processed before releasing the current set
 		 * of changes and restoring the new set of changes.
 		 */
-		rb->totalBytes += rb->size;
+		rb->totalBytes += entry->txn->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2382,7 +2382,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		if (!rbtxn_is_streamed(txn))
 			rb->totalTxns++;
 
-		rb->totalBytes += rb->size;
+		rb->totalBytes += txn->size;
 
 		/*
 		 * Done with current changes, send the last message for this set of
#147Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#146)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 12:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

BTW regarding the commit f5fc2f5b23 that added total_txns and
total_bytes, we add the reorder buffer size (i.g., rb->size) to
rb->totalBytes but I think we should use the transaction size (i.g.,
txn->size) instead:

You are right about the problem but I think your proposed fix also
won't work because txn->size always has current transaction size which
will be top-transaction in the case when a transaction has multiple
subtransactions. It won't include the subtxn->size. For example, you
can try to decode with below kind of transaction:
Begin;
insert into t1 values(1);
savepoint s1;
insert into t1 values(2);
savepoint s2;
insert into t1 values(3);
commit;

I think we can fix it by keeping track of total_size in toptxn as we
are doing for the streaming case in ReorderBufferChangeMemoryUpdate.
We can probably do it for non-streaming cases as well.

--
With Regards,
Amit Kapila.

#148Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#147)
1 attachment(s)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 12:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

BTW regarding the commit f5fc2f5b23 that added total_txns and
total_bytes, we add the reorder buffer size (i.g., rb->size) to
rb->totalBytes but I think we should use the transaction size (i.g.,
txn->size) instead:

You are right about the problem but I think your proposed fix also
won't work because txn->size always has current transaction size which
will be top-transaction in the case when a transaction has multiple
subtransactions. It won't include the subtxn->size.

Right. I missed the point that ReorderBufferProcessTXN() processes
also subtransactions.

I think we can fix it by keeping track of total_size in toptxn as we
are doing for the streaming case in ReorderBufferChangeMemoryUpdate.
We can probably do it for non-streaming cases as well.

Agreed.

I've updated the patch. What do you think?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

use_total_size_v2.patchapplication/octet-stream; name=use_total_size_v2.patchDownload
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c27f710053..c4cae7c3c2 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1369,7 +1369,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		 * Update the total bytes processed before releasing the current set
 		 * of changes and restoring the new set of changes.
 		 */
-		rb->totalBytes += rb->size;
+		rb->totalBytes += entry->txn->total_size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2382,7 +2382,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		if (!rbtxn_is_streamed(txn))
 			rb->totalTxns++;
 
-		rb->totalBytes += rb->size;
+		rb->totalBytes += txn->total_size;
 
 		/*
 		 * Done with current changes, send the last message for this set of
@@ -3073,7 +3073,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 {
 	Size		sz;
 	ReorderBufferTXN *txn;
-	ReorderBufferTXN *toptxn = NULL;
+	ReorderBufferTXN *toptxn;
 
 	Assert(change->txn);
 
@@ -3087,14 +3087,11 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 
 	txn = change->txn;
 
-	/* If streaming supported, update the total size in top level as well. */
-	if (ReorderBufferCanStream(rb))
-	{
-		if (txn->toptxn != NULL)
-			toptxn = txn->toptxn;
-		else
-			toptxn = txn;
-	}
+	/* Update the total size in top level as well. */
+	if (txn->toptxn != NULL)
+		toptxn = txn->toptxn;
+	else
+		toptxn = txn;
 
 	sz = ReorderBufferChangeSize(change);
 
@@ -3104,8 +3101,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size += sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size += sz;
+		toptxn->total_size += sz;
 	}
 	else
 	{
@@ -3114,8 +3110,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size -= sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size -= sz;
+		toptxn->total_size -= sz;
 	}
 
 	Assert(txn->size <= rb->size);
#149Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#148)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 4:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think we can fix it by keeping track of total_size in toptxn as we
are doing for the streaming case in ReorderBufferChangeMemoryUpdate.
We can probably do it for non-streaming cases as well.

Agreed.

I've updated the patch. What do you think?

@@ -1369,7 +1369,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb,
ReorderBufferIterTXNState *state)
  * Update the total bytes processed before releasing the current set
  * of changes and restoring the new set of changes.
  */
- rb->totalBytes += rb->size;
+ rb->totalBytes += entry->txn->total_size;
  if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
  &state->entries[off].segno))

I have not tested this but won't in the above change you need to check
txn->toptxn for subtxns?

--
With Regards,
Amit Kapila.

#150Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#145)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 9:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 12:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Apr 27, 2021 at 11:02 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Apr 27, 2021 at 9:48 AM vignesh C <vignesh21@gmail.com> wrote:

Attached patch has the changes to update statistics during
spill/stream which prevents the statistics from being lost during
interrupt.

void
-UpdateDecodingStats(LogicalDecodingContext *ctx)
+UpdateDecodingStats(ReorderBuffer *rb)

I don't think you need to change this interface because
reorderbuffer->private_data points to LogicalDecodingContext. See
StartupDecodingContext.

+1

With this approach, we could still miss the totalTxns and totalBytes
updates if the decoding a large but less than
logical_decoding_work_mem is interrupted, right?

Right, but is there some simple way to avoid that? I see two
possibilities (a) store stats in ReplicationSlot and then send them at
ReplicationSlotRelease but that will lead to an increase in shared
memory usage and as per the discussion above, we don't want that, (b)
send intermediate stats after decoding say N changes but for that, we
need to additionally compute the size of each change which might add
some overhead.

Right.

I am not sure if any of these alternatives are a good idea. What do
you think? Do you have any other ideas for this?

I've been considering some ideas but don't come up with a good one
yet. It’s just an idea and not tested but how about having
CreateDecodingContext() register before_shmem_exit() callback with the
decoding context to ensure that we send slot stats even on
interruption. And FreeDecodingContext() cancels the callback.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#151Tom Lane
tgl@sss.pgh.pa.us
In reply to: Masahiko Sawada (#150)
Re: Replication slot stats misgivings

It seems that the test case added by f5fc2f5b2 is still a bit
unstable, even after c64dcc7fe:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&amp;dt=2021-04-23%2006%3A20%3A12

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&amp;dt=2021-04-24%2018%3A20%3A10

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&amp;dt=2021-04-28%2017%3A53%3A14

(The snapper run fails to show regression.diffs, so it's not certain
that it's the same failure as peripatus, but ...)

regards, tom lane

#152Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Tom Lane (#151)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 5:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems that the test case added by f5fc2f5b2 is still a bit
unstable, even after c64dcc7fe:

Hmm, I don't see the exact cause yet but there are two possibilities:
some transactions were really spilled, and it showed the old stats due
to losing the drop (and create) slot messages. For the former case, it
seems to better to create the slot just before the insertion and
setting logical_decoding_work_mem to the default (64MB). For the
latter case, maybe we can use a different name slot than the name used
in other tests?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#153Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#152)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 4:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 29, 2021 at 5:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems that the test case added by f5fc2f5b2 is still a bit
unstable, even after c64dcc7fe:

Hmm, I don't see the exact cause yet but there are two possibilities:
some transactions were really spilled,

This is the first test and inserts just one small record, so how it
can lead to spill of data. Do you mean to say that may be some
background process has written some transaction which leads to a spill
of data?

and it showed the old stats due
to losing the drop (and create) slot messages.

Yeah, something like this could happen. Another possibility here could
be that before the stats collector has processed drop and create
messages, we have enquired about the stats which lead to it giving us
the old stats. Note, that we don't wait for 'drop' or 'create' message
to be delivered. So, there is a possibility of the same. What do you
think?

For the former case, it
seems to better to create the slot just before the insertion and
setting logical_decoding_work_mem to the default (64MB). For the
latter case, maybe we can use a different name slot than the name used
in other tests?

How about doing both of the above suggestions? Alternatively, we can
wait for both 'drop' and 'create' message to be delivered but that
might be overkill.

--
With Regards,
Amit Kapila.

#154Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#153)
Re: Replication slot stats misgivings

Amit Kapila <amit.kapila16@gmail.com> writes:

This is the first test and inserts just one small record, so how it
can lead to spill of data. Do you mean to say that may be some
background process has written some transaction which leads to a spill
of data?

autovacuum, say?

Yeah, something like this could happen. Another possibility here could
be that before the stats collector has processed drop and create
messages, we have enquired about the stats which lead to it giving us
the old stats. Note, that we don't wait for 'drop' or 'create' message
to be delivered. So, there is a possibility of the same. What do you
think?

You should take a close look at the stats test in the main regression
tests. We had to jump through *high* hoops to get that to be stable,
and yet it still fails semi-regularly. This looks like pretty much the
same thing, and so I'm pessimistically inclined to guess that it will
never be entirely stable.

(At least not before the fabled stats collector rewrite, which may well
introduce some entirely new set of failure modes.)

Do we really need this test in this form? Perhaps it could be converted
to a TAP test that's a bit more forgiving.

regards, tom lane

#155Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#154)
Re: Replication slot stats misgivings

On 2021-04-28 23:20:00 -0400, Tom Lane wrote:

(At least not before the fabled stats collector rewrite, which may well
introduce some entirely new set of failure modes.)

FWIW, I added a function that forces a flush there. That can be done
synchronously and the underlying functionality needs to exist anyway to
deal with backend exit. Makes it a *lot* easier to write tests for stats
related things...

#156Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#154)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 8:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

This is the first test and inserts just one small record, so how it
can lead to spill of data. Do you mean to say that may be some
background process has written some transaction which leads to a spill
of data?

autovacuum, say?

Yeah, something like this could happen. Another possibility here could
be that before the stats collector has processed drop and create
messages, we have enquired about the stats which lead to it giving us
the old stats. Note, that we don't wait for 'drop' or 'create' message
to be delivered. So, there is a possibility of the same. What do you
think?

You should take a close look at the stats test in the main regression
tests. We had to jump through *high* hoops to get that to be stable,
and yet it still fails semi-regularly. This looks like pretty much the
same thing, and so I'm pessimistically inclined to guess that it will
never be entirely stable.

True, it is possible that we can't make it entirely stable but I would
like to try some more before giving up on this. Otherwise, I guess the
other possibility is to remove some of the latest tests added or
probably change them to be more forgiving. For example, we can change
the currently failing test to not check 'spill*' count and rely on
just 'total*' count which will work even in scenarios we discussed for
this failure but it will reduce the efficiency/completeness of the
test case.

(At least not before the fabled stats collector rewrite, which may well
introduce some entirely new set of failure modes.)

Do we really need this test in this form? Perhaps it could be converted
to a TAP test that's a bit more forgiving.

We have a TAP test for slot stats but there we are checking some
scenarios across the restart. We can surely move these tests also
there but it is not apparent to me how it can create a difference?

--
With Regards,
Amit Kapila.

#157Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#150)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 7:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not sure if any of these alternatives are a good idea. What do
you think? Do you have any other ideas for this?

I've been considering some ideas but don't come up with a good one
yet. It’s just an idea and not tested but how about having
CreateDecodingContext() register before_shmem_exit() callback with the
decoding context to ensure that we send slot stats even on
interruption. And FreeDecodingContext() cancels the callback.

Is it a good idea to send stats while exiting and rely on the same? I
think before_shmem_exit is mostly used for the cleanup purpose so not
sure if we can rely on it for this purpose. I think we can't be sure
that in all cases we will send all stats, so maybe Vignesh's patch is
sufficient to cover the cases where we avoid losing it in cases where
we would have sent a large amount of data.

--
With Regards,
Amit Kapila.

#158Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#153)
1 attachment(s)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 11:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 29, 2021 at 4:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 29, 2021 at 5:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems that the test case added by f5fc2f5b2 is still a bit
unstable, even after c64dcc7fe:

Hmm, I don't see the exact cause yet but there are two possibilities:
some transactions were really spilled,

This is the first test and inserts just one small record, so how it
can lead to spill of data. Do you mean to say that may be some
background process has written some transaction which leads to a spill
of data?

Not sure but I thought that the logical decoding started to decodes
from a relatively old point for some reason and decoded incomplete
transactions that weren’t shown in the result.

and it showed the old stats due
to losing the drop (and create) slot messages.

Yeah, something like this could happen. Another possibility here could
be that before the stats collector has processed drop and create
messages, we have enquired about the stats which lead to it giving us
the old stats. Note, that we don't wait for 'drop' or 'create' message
to be delivered. So, there is a possibility of the same. What do you
think?

Yeah, that could happen even if any message didn't get dropped.

For the former case, it
seems to better to create the slot just before the insertion and
setting logical_decoding_work_mem to the default (64MB). For the
latter case, maybe we can use a different name slot than the name used
in other tests?

How about doing both of the above suggestions? Alternatively, we can
wait for both 'drop' and 'create' message to be delivered but that
might be overkill.

Agreed. Attached the patch doing both things.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

fix_stats_test.patchapplication/octet-stream; name=fix_stats_test.patchDownload
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 86d594ca15..7d174ee8d2 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -1,6 +1,6 @@
 -- predictability
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_stats', 'test_decoding');
  ?column? 
 ----------
  init
@@ -23,7 +23,7 @@ BEGIN
                   ELSE (spill_txns > 0)
              END
       INTO updated
-      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot_stats';
 
     ELSE
 
@@ -32,7 +32,7 @@ BEGIN
                   ELSE (total_txns > 0)
              END
       INTO updated
-      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot_stats';
 
     END IF;
 
@@ -52,8 +52,9 @@ BEGIN
 END
 $$ LANGUAGE plpgsql;
 -- non-spilled xact
+SET logical_decoding_work_mem to '64MB';
 INSERT INTO stats_test values(1);
-SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot_stats', NULL, NULL, 'skip-empty-xacts', '1');
  count 
 -------
      3
@@ -66,13 +67,14 @@ SELECT wait_for_decode_stats(false, false);
 (1 row)
 
 SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
------------------+------------+-------------+------------+-------------
- regression_slot | t          | t           | t          | t
+       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
+-----------------------+------------+-------------+------------+-------------
+ regression_slot_stats | t          | t           | t          | t
 (1 row)
 
+RESET logical_decoding_work_mem;
 -- reset the slot stats, and wait for stats collector's total txn to reset
-SELECT pg_stat_reset_replication_slot('regression_slot');
+SELECT pg_stat_reset_replication_slot('regression_slot_stats');
  pg_stat_reset_replication_slot 
 --------------------------------
  
@@ -85,16 +87,16 @@ SELECT wait_for_decode_stats(true, false);
 (1 row)
 
 SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
------------------+------------+-------------+------------+-------------
- regression_slot |          0 |           0 |          0 |           0
+       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
+-----------------------+------------+-------------+------------+-------------
+ regression_slot_stats |          0 |           0 |          0 |           0
 (1 row)
 
 -- spilling the xact
 BEGIN;
 INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);
 COMMIT;
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats', NULL, NULL, 'skip-empty-xacts', '1');
  count 
 -------
   5002
@@ -110,13 +112,13 @@ SELECT wait_for_decode_stats(false, true);
 (1 row)
 
 SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
------------------+------------+-------------+------------+-------------
- regression_slot | t          | t           | t          | t
+       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
+-----------------------+------------+-------------+------------+-------------
+ regression_slot_stats | t          | t           | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
-SELECT pg_stat_reset_replication_slot('regression_slot');
+SELECT pg_stat_reset_replication_slot('regression_slot_stats');
  pg_stat_reset_replication_slot 
 --------------------------------
  
@@ -129,13 +131,13 @@ SELECT wait_for_decode_stats(true, true);
 (1 row)
 
 SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
------------------+------------+-------------+------------+-------------
- regression_slot |          0 |           0 |          0 |           0
+       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
+-----------------------+------------+-------------+------------+-------------
+ regression_slot_stats |          0 |           0 |          0 |           0
 (1 row)
 
 -- decode and check stats again.
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats', NULL, NULL, 'skip-empty-xacts', '1');
  count 
 -------
   5002
@@ -148,30 +150,30 @@ SELECT wait_for_decode_stats(false, true);
 (1 row)
 
 SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
-    slot_name    | spill_txns | spill_count | total_txns | total_bytes 
------------------+------------+-------------+------------+-------------
- regression_slot | t          | t           | t          | t
+       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
+-----------------------+------------+-------------+------------+-------------
+ regression_slot_stats | t          | t           | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
 BEGIN;
 SELECT slot_name FROM pg_stat_replication_slots;
-    slot_name    
------------------
- regression_slot
+       slot_name       
+-----------------------
+ regression_slot_stats
 (1 row)
 
 SELECT slot_name FROM pg_stat_replication_slots;
-    slot_name    
------------------
- regression_slot
+       slot_name       
+-----------------------
+ regression_slot_stats
 (1 row)
 
 COMMIT;
 DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
-SELECT pg_drop_replication_slot('regression_slot');
+SELECT pg_drop_replication_slot('regression_slot_stats');
  pg_drop_replication_slot 
 --------------------------
  
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 03fc27e537..263568b00c 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -1,7 +1,7 @@
 -- predictability
 SET synchronous_commit = on;
 
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_stats', 'test_decoding');
 
 CREATE TABLE stats_test(data text);
 
@@ -21,7 +21,7 @@ BEGIN
                   ELSE (spill_txns > 0)
              END
       INTO updated
-      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot_stats';
 
     ELSE
 
@@ -30,7 +30,7 @@ BEGIN
                   ELSE (total_txns > 0)
              END
       INTO updated
-      FROM pg_stat_replication_slots WHERE slot_name='regression_slot';
+      FROM pg_stat_replication_slots WHERE slot_name='regression_slot_stats';
 
     END IF;
 
@@ -51,13 +51,15 @@ END
 $$ LANGUAGE plpgsql;
 
 -- non-spilled xact
+SET logical_decoding_work_mem to '64MB';
 INSERT INTO stats_test values(1);
-SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot_stats', NULL, NULL, 'skip-empty-xacts', '1');
 SELECT wait_for_decode_stats(false, false);
 SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+RESET logical_decoding_work_mem;
 
 -- reset the slot stats, and wait for stats collector's total txn to reset
-SELECT pg_stat_reset_replication_slot('regression_slot');
+SELECT pg_stat_reset_replication_slot('regression_slot_stats');
 SELECT wait_for_decode_stats(true, false);
 SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
@@ -65,7 +67,7 @@ SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_
 BEGIN;
 INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);
 COMMIT;
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats', NULL, NULL, 'skip-empty-xacts', '1');
 
 -- Check stats, wait for the stats collector to update. We can't test the
 -- exact stats count as that can vary if any background transaction (say by
@@ -74,12 +76,12 @@ SELECT wait_for_decode_stats(false, true);
 SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
-SELECT pg_stat_reset_replication_slot('regression_slot');
+SELECT pg_stat_reset_replication_slot('regression_slot_stats');
 SELECT wait_for_decode_stats(true, true);
 SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
 
 -- decode and check stats again.
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats', NULL, NULL, 'skip-empty-xacts', '1');
 SELECT wait_for_decode_stats(false, true);
 SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 
@@ -92,4 +94,4 @@ COMMIT;
 
 DROP FUNCTION wait_for_decode_stats(bool, bool);
 DROP TABLE stats_test;
-SELECT pg_drop_replication_slot('regression_slot');
+SELECT pg_drop_replication_slot('regression_slot_stats');
#159Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#158)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 11:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

How about doing both of the above suggestions? Alternatively, we can
wait for both 'drop' and 'create' message to be delivered but that
might be overkill.

Agreed. Attached the patch doing both things.

Thanks, the patch LGTM. I'll wait for a day before committing to see
if anyone has better ideas.

--
With Regards,
Amit Kapila.

#160vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#158)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 11:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 29, 2021 at 11:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 29, 2021 at 4:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 29, 2021 at 5:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It seems that the test case added by f5fc2f5b2 is still a bit
unstable, even after c64dcc7fe:

Hmm, I don't see the exact cause yet but there are two possibilities:
some transactions were really spilled,

This is the first test and inserts just one small record, so how it
can lead to spill of data. Do you mean to say that may be some
background process has written some transaction which leads to a spill
of data?

Not sure but I thought that the logical decoding started to decodes
from a relatively old point for some reason and decoded incomplete
transactions that weren’t shown in the result.

and it showed the old stats due
to losing the drop (and create) slot messages.

Yeah, something like this could happen. Another possibility here could
be that before the stats collector has processed drop and create
messages, we have enquired about the stats which lead to it giving us
the old stats. Note, that we don't wait for 'drop' or 'create' message
to be delivered. So, there is a possibility of the same. What do you
think?

Yeah, that could happen even if any message didn't get dropped.

For the former case, it
seems to better to create the slot just before the insertion and
setting logical_decoding_work_mem to the default (64MB). For the
latter case, maybe we can use a different name slot than the name used
in other tests?

How about doing both of the above suggestions? Alternatively, we can
wait for both 'drop' and 'create' message to be delivered but that
might be overkill.

Agreed. Attached the patch doing both things.

Having a different slot name should solve the problem. The patch looks
good to me.

Regards,
Vignesh

#161Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#149)
1 attachment(s)
Re: Replication slot stats misgivings

On Wed, Apr 28, 2021 at 5:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 4:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

@@ -1369,7 +1369,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb,
ReorderBufferIterTXNState *state)
* Update the total bytes processed before releasing the current set
* of changes and restoring the new set of changes.
*/
- rb->totalBytes += rb->size;
+ rb->totalBytes += entry->txn->total_size;
if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
&state->entries[off].segno))

I have not tested this but won't in the above change you need to check
txn->toptxn for subtxns?

Now, I am able to reproduce this issue:
Create table t1(c1 int);
select pg_create_logical_replication_slot('s', 'test_decoding');
Begin;
insert into t1 values(1);
savepoint s1;
insert into t1 select generate_series(1, 100000);
commit;

postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL, NULL);
count
--------
100005
(1 row)

postgres=# select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
s1 | 0 | 0 | 0 | 0 |
0 | 0 | 2 | 13200672 | 2021-04-29
14:33:55.156566+05:30
(1 row)

select * from pg_stat_reset_replication_slot('s1');

Now reduce the logical decoding work mem to allow spilling.
postgres=# set logical_decoding_work_mem='64kB';
SET
postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL, NULL);
count
--------
100005
(1 row)

postgres=# select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
s1 | 1 | 202 | 13200000 | 0 |
0 | 0 | 2 | 672 | 2021-04-29
14:35:21.836613+05:30
(1 row)

You can notice that after we have allowed spilling the 'total_bytes'
stats is showing a different value. The attached patch fixes the issue
for me. Let me know what do you think about this?

--
With Regards,
Amit Kapila.

Attachments:

use_total_size_v3.patchapplication/octet-stream; name=use_total_size_v3.patchDownload
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c27f7100534..ad0b6b69e76 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1369,7 +1369,10 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		 * Update the total bytes processed before releasing the current set
 		 * of changes and restoring the new set of changes.
 		 */
-		rb->totalBytes += rb->size;
+		if (entry->txn->toptxn)
+			rb->totalBytes += entry->txn->toptxn->total_size;
+		else
+			rb->totalBytes += entry->txn->total_size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2382,7 +2385,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		if (!rbtxn_is_streamed(txn))
 			rb->totalTxns++;
 
-		rb->totalBytes += rb->size;
+		rb->totalBytes += txn->total_size;
 
 		/*
 		 * Done with current changes, send the last message for this set of
@@ -3073,7 +3076,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 {
 	Size		sz;
 	ReorderBufferTXN *txn;
-	ReorderBufferTXN *toptxn = NULL;
+	ReorderBufferTXN *toptxn;
 
 	Assert(change->txn);
 
@@ -3087,14 +3090,14 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 
 	txn = change->txn;
 
-	/* If streaming supported, update the total size in top level as well. */
-	if (ReorderBufferCanStream(rb))
-	{
-		if (txn->toptxn != NULL)
-			toptxn = txn->toptxn;
-		else
-			toptxn = txn;
-	}
+	/*
+	 * Update the total size in top level as well. This is later used to
+	 * compute the decoding stats.
+	 */
+	if (txn->toptxn != NULL)
+		toptxn = txn->toptxn;
+	else
+		toptxn = txn;
 
 	sz = ReorderBufferChangeSize(change);
 
@@ -3104,8 +3107,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size += sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size += sz;
+		toptxn->total_size += sz;
 	}
 	else
 	{
@@ -3114,8 +3116,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size -= sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size -= sz;
+		toptxn->total_size -= sz;
 	}
 
 	Assert(txn->size <= rb->size);
#162vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#161)
1 attachment(s)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 3:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 5:01 PM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Wed, Apr 28, 2021 at 4:51 PM Masahiko Sawada <sawada.mshk@gmail.com>

wrote:

On Wed, Apr 28, 2021 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com>

wrote:

@@ -1369,7 +1369,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb,
ReorderBufferIterTXNState *state)
* Update the total bytes processed before releasing the current set
* of changes and restoring the new set of changes.
*/
- rb->totalBytes += rb->size;
+ rb->totalBytes += entry->txn->total_size;
if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
&state->entries[off].segno))

I have not tested this but won't in the above change you need to check
txn->toptxn for subtxns?

Now, I am able to reproduce this issue:
Create table t1(c1 int);
select pg_create_logical_replication_slot('s', 'test_decoding');
Begin;
insert into t1 values(1);
savepoint s1;
insert into t1 select generate_series(1, 100000);
commit;

postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL,

NULL);

count
--------
100005
(1 row)

postgres=# select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset

-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------

s1 | 0 | 0 | 0 | 0 |
0 | 0 | 2 | 13200672 | 2021-04-29
14:33:55.156566+05:30
(1 row)

select * from pg_stat_reset_replication_slot('s1');

Now reduce the logical decoding work mem to allow spilling.
postgres=# set logical_decoding_work_mem='64kB';
SET
postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL,

NULL);

count
--------
100005
(1 row)

postgres=# select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset

-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------

s1 | 1 | 202 | 13200000 | 0 |
0 | 0 | 2 | 672 | 2021-04-29
14:35:21.836613+05:30
(1 row)

You can notice that after we have allowed spilling the 'total_bytes'
stats is showing a different value. The attached patch fixes the issue
for me. Let me know what do you think about this?

I found one issue with the following scenario when testing with
logical_decoding_work_mem as 64kB:

BEGIN;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s1;
INSERT INTO t1 values(generate_series(1,10000));
COMMIT;
SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
------------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
regression_slot1 | 6 | 154 | 9130176 | 0 |
0 | 0 | 1 | *4262016* | 2021-04-29
17:50:00.080663+05:30
(1 row)

Same thing works fine with logical_decoding_work_mem as 64MB:
select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
------------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
regression_slot1 | 6 | 154 | 9130176 | 0 |
0 | 0 | 1 | *2640000* | 2021-04-29
17:50:00.080663+05:30
(1 row)

The patch required one change:
- rb->totalBytes += rb->size;
+ if (entry->txn->toptxn)
+ rb->totalBytes += entry->txn->toptxn->total_size;
+ else
+ rb->totalBytes += entry->txn->*total_size*;
The above should be changed to:
- rb->totalBytes += rb->size;
+ if (entry->txn->toptxn)
+ rb->totalBytes += entry->txn->toptxn->total_size;
+ else
+ rb->totalBytes += entry->txn->*size*;

Attached patch fixes the issue.
Thoughts?

Regards,
Vignesh

Attachments:

use_total_size_v4.patchtext/x-patch; charset=US-ASCII; name=use_total_size_v4.patchDownload
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c27f710053..cdf46a36af 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1369,7 +1369,10 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		 * Update the total bytes processed before releasing the current set
 		 * of changes and restoring the new set of changes.
 		 */
-		rb->totalBytes += rb->size;
+		if (entry->txn->toptxn)
+			rb->totalBytes += entry->txn->toptxn->total_size;
+		else
+			rb->totalBytes += entry->txn->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2382,7 +2385,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		if (!rbtxn_is_streamed(txn))
 			rb->totalTxns++;
 
-		rb->totalBytes += rb->size;
+		rb->totalBytes += txn->total_size;
 
 		/*
 		 * Done with current changes, send the last message for this set of
@@ -3073,7 +3076,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 {
 	Size		sz;
 	ReorderBufferTXN *txn;
-	ReorderBufferTXN *toptxn = NULL;
+	ReorderBufferTXN *toptxn;
 
 	Assert(change->txn);
 
@@ -3087,14 +3090,14 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 
 	txn = change->txn;
 
-	/* If streaming supported, update the total size in top level as well. */
-	if (ReorderBufferCanStream(rb))
-	{
-		if (txn->toptxn != NULL)
-			toptxn = txn->toptxn;
-		else
-			toptxn = txn;
-	}
+	/*
+	 * Update the total size in top level as well. This is later used to
+	 * compute the decoding stats.
+	 */
+	if (txn->toptxn != NULL)
+		toptxn = txn->toptxn;
+	else
+		toptxn = txn;
 
 	sz = ReorderBufferChangeSize(change);
 
@@ -3104,8 +3107,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size += sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size += sz;
+		toptxn->total_size += sz;
 	}
 	else
 	{
@@ -3114,8 +3116,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size -= sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size -= sz;
+		toptxn->total_size -= sz;
 	}
 
 	Assert(txn->size <= rb->size);
#163Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#162)
2 attachment(s)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 9:44 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, Apr 29, 2021 at 3:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 5:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 4:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

@@ -1369,7 +1369,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb,
ReorderBufferIterTXNState *state)
* Update the total bytes processed before releasing the current set
* of changes and restoring the new set of changes.
*/
- rb->totalBytes += rb->size;
+ rb->totalBytes += entry->txn->total_size;
if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
&state->entries[off].segno))

I have not tested this but won't in the above change you need to check
txn->toptxn for subtxns?

Now, I am able to reproduce this issue:
Create table t1(c1 int);
select pg_create_logical_replication_slot('s', 'test_decoding');
Begin;
insert into t1 values(1);
savepoint s1;
insert into t1 select generate_series(1, 100000);
commit;

postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL, NULL);
count
--------
100005
(1 row)

postgres=# select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
s1 | 0 | 0 | 0 | 0 |
0 | 0 | 2 | 13200672 | 2021-04-29
14:33:55.156566+05:30
(1 row)

select * from pg_stat_reset_replication_slot('s1');

Now reduce the logical decoding work mem to allow spilling.
postgres=# set logical_decoding_work_mem='64kB';
SET
postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL, NULL);
count
--------
100005
(1 row)

postgres=# select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
s1 | 1 | 202 | 13200000 | 0 |
0 | 0 | 2 | 672 | 2021-04-29
14:35:21.836613+05:30
(1 row)

You can notice that after we have allowed spilling the 'total_bytes'
stats is showing a different value. The attached patch fixes the issue
for me. Let me know what do you think about this?

I found one issue with the following scenario when testing with logical_decoding_work_mem as 64kB:

BEGIN;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s1;
INSERT INTO t1 values(generate_series(1,10000));
COMMIT;
SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
------------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
regression_slot1 | 6 | 154 | 9130176 | 0 | 0 | 0 | 1 | 4262016 | 2021-04-29 17:50:00.080663+05:30
(1 row)

Same thing works fine with logical_decoding_work_mem as 64MB:
select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
------------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
regression_slot1 | 6 | 154 | 9130176 | 0 | 0 | 0 | 1 | 2640000 | 2021-04-29 17:50:00.080663+05:30
(1 row)

The patch required one change:
- rb->totalBytes += rb->size;
+ if (entry->txn->toptxn)
+ rb->totalBytes += entry->txn->toptxn->total_size;
+ else
+ rb->totalBytes += entry->txn->total_size;
The above should be changed to:
- rb->totalBytes += rb->size;
+ if (entry->txn->toptxn)
+ rb->totalBytes += entry->txn->toptxn->total_size;
+ else
+ rb->totalBytes += entry->txn->size;

Attached patch fixes the issue.
Thoughts?

After more thought, it seems to me that we should use txn->size here
regardless of the top transaction or subtransaction since we're
iterating changes associated with a transaction that is either the top
transaction or a subtransaction. Otherwise, I think if some
subtransactions are not serialized, we will end up adding bytes
including those subtransactions during iterating other serialized
subtransactions. Whereas in ReorderBufferProcessTXN() we should use
txn->total_txn since txn is always the top transaction. I've attached
another patch to do this.

BTW, to check how many bytes of changes are passed to the decoder
plugin I wrote and attached a simple decoder plugin that calculates
the total amount of bytes for each change on the decoding plugin side.
I think what we expect is that the amounts of change bytes shown on
both sides are matched. You can build it in the same way as other
third-party modules and need to create decoder_stats extension.

The basic usage is to execute pg_logical_slot_get/peek_changes() and
mystats('slot_name') in the same process. During decoding the changes,
decoder_stats plugin accumulates the change bytes in the local memory
and mystats() SQL function, defined in decoder_stats extension, shows
those stats.

I've done some test with v4 patch. For instance, with the following
workload the output is expected:

BEGIN;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s1;
INSERT INTO t1 values(generate_series(1,10000));
COMMIT;

mystats() functions shows:

=# select pg_logical_slot_get_changes('test_slot', null, null);
=# select change_type, change_bytes, total_bytes from mystats('test_slot');
change_type | change_bytes | total_bytes
-------------+--------------+-------------
INSERT | 2578 kB | 2578 kB
(1 row)

'change_bytes' and 'total_bytes' are the total amount of changes
calculated on the plugin side and core side, respectively. Those are
matched, which is expected. On the other hand, with the following
workload those are not matched:

BEGIN;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s1;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s2;
INSERT INTO t1 values(generate_series(1,10000));
COMMIT;

=# select pg_logical_slot_get_changes('test_slot', null, null);
=# select change_type, change_bytes, total_bytes from mystats('test_slot');
change_type | change_bytes | total_bytes
-------------+--------------+-------------
INSERT | 3867 kB | 5451 kB
(1 row)

This is fixed by the attached v5 patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

use_total_size_v5.patchapplication/octet-stream; name=use_total_size_v5.patchDownload
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c27f710053..c04fe1cf4b 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1369,7 +1369,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		 * Update the total bytes processed before releasing the current set
 		 * of changes and restoring the new set of changes.
 		 */
-		rb->totalBytes += rb->size;
+		rb->totalBytes += entry->txn->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2382,7 +2382,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		if (!rbtxn_is_streamed(txn))
 			rb->totalTxns++;
 
-		rb->totalBytes += rb->size;
+		rb->totalBytes += txn->total_size;
 
 		/*
 		 * Done with current changes, send the last message for this set of
@@ -3073,7 +3073,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 {
 	Size		sz;
 	ReorderBufferTXN *txn;
-	ReorderBufferTXN *toptxn = NULL;
+	ReorderBufferTXN *toptxn;
 
 	Assert(change->txn);
 
@@ -3087,14 +3087,14 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 
 	txn = change->txn;
 
-	/* If streaming supported, update the total size in top level as well. */
-	if (ReorderBufferCanStream(rb))
-	{
-		if (txn->toptxn != NULL)
-			toptxn = txn->toptxn;
-		else
-			toptxn = txn;
-	}
+	/*
+	 * Update the total size in top level as well. This is later used to
+	 * compute the decoding stats.
+	 */
+	if (txn->toptxn != NULL)
+		toptxn = txn->toptxn;
+	else
+		toptxn = txn;
 
 	sz = ReorderBufferChangeSize(change);
 
@@ -3104,8 +3104,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size += sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size += sz;
+		toptxn->total_size += sz;
 	}
 	else
 	{
@@ -3114,8 +3113,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size -= sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size -= sz;
+		toptxn->total_size -= sz;
 	}
 
 	Assert(txn->size <= rb->size);
decoder_stats.tar.bz2application/x-bzip2; name=decoder_stats.tar.bz2Download
BZh91AY&SY��R�`��������?�������`>�:���v�qu�O9�m�`�
��:����
D
�@�2 AM��G����#FC(h1�bh4h@�Bd�PdCLM12i��F��j444@�4�i�Md4i�a&�$�H)����f����G�OS�4i����h�M�$�Q�4=���
h
F"A& L�0����FS��#���<�OI����F�y��@�8�i]�\K:���@�&3�@�%1�L�������	U$�U���A��X�w"�������P�T��0���B,�<����R,����UTEb,;5������>G�Hy�is�S�Z=��	C3h�]$��[����P�L��*S@�|����(�f�6`��`�y�y���D���	��E&\o���g�F'�b��"�=���8H��X�`������j��Q(����K;G���Kiy��+!��Z5����]@��o�Ap�^r���f6�:)2U��a�B�d}6\:�;��w��H>�
`t��y�T�%K!$'>F@��t������s��Ar�z��1	�0��Q��O�U�#kY������0K�~]l�^J�&l���{����g6`���A��A|}�V����TH��0��$��a^�3r=����;���
�����Fv�D|
����
6�k.��.h���cY�r��<��"��f�BZ��������.�o��
���pj���$��N>�]g��/��Y��4/�]��]�VI�{!9�`�O�0��%w�)3��3PmK�CL]�L�Q��t�6g��.n�zH�e;�'���NN��m�D7l�4���v\}��;�'�c�'��+������6��i���h�W6�Q���m�	e�^@�![��BU�wD�����9.�e���[���������������F8�Q�6NuM��i�{={�����7�I��a�-L������������OQn%|m-������mV�i��2�`��Fr����s��h��8�0�=��",���f}��[��J$�v|�>��} fg�����������'b�P��/�\=�IzJ�&Mu����������N�4s��'����U]���v�`p��
0_��f���54�S�'I��T&�$��p���F9V(�n�r��q$��r7u�|*E�B�����@"�w��n�0��*�P��y�����!?�j`�>������}�|�u�@@C7�a�Ly_*��}V���[���bo5���q�����Aa[-hn@��(*�9f�3�5;�"T
���}D`��N���
�M�����8����/�s-,�$j�p��T�<��hz{;8R6Z�B1�\����x���0�[�O�}	��y	�@�K39"A����6|��h� 2������z���X���P��a
�(��e�38�=B�����#���|��*���i�4b�m�]-d`F���|q�s+�������	e�!H��1f%���0�4VNo�w�	���_YE|S��"�2�uPxf+b�"�<<���DTK����m����o=x�R���A*u�y"I*�-�ZD���d���8�Kl�
(����������,�f��w=,�E'�]����>m�������y�����>��9}��(����)Da*�@LM�G$�f�&p�%��K��<K�a�����d�������?`��gDN��/oq�/������]x�<y)(��yh��/���I�+��:f��D�c�;y�d8�\�LaI��
w���N�/
������\���^pMcvJI��3�@Q�-	%���`Y�b��0VA��/�e����i0��\{v��?��nZ8����.���-�Kj��>�J"""""""�JT�c�Q��)�����7g��F�I��a�������:Az�*\�/�u��z�n�T0��T�m�y+������T�(��kf�$�y�b�U�W&�����`���R�du}KpG^�������y�\������N<������B�/�&#=��J0���z������Em0��L� UA�6�d���9��fI4��Jf���/A���4�&��K~pv��z{���@0��;��������&�6,l�����'?r>�8�0���5�I�������S��v��*��00�n.,���WY$��U�[�'?�i?%���9W�����zA���p�$�0�c���70��d
+n +!	i�O���H�y�*
�h���{�P8��XX�����1��l4����m�1Wo.0����N�G�%Qs���������05��O:v�5�+t]�����W���[]��������`*<E��I���aA����46=��1Fi��\�	P
��/������	"�����Rb"�+Aq �����"�	eq�>�
FI�����<� ���]t4��DeH����Z��.��4��I^2:X�
aR�'4, V��%OM��C�m��sf�p�n�g0�:@I�5���F"
��2&����!������h�M�������\\�O�����o���!��nLn"!�VA6��p��E:�Z=m&������������:���0f���ukV����0v��������
]|�����6d
��)����|��}����?o��J����f�8�;8�����HA�G����/iP�
�K�(��������40���T��4�<�tJ���R��q�`�T��g��s�a
M�������R�5��k�@��YeUc�b�r�����
G�'�~�(�D���y���0_+4��+K�n[YU1l���j�����`+�=<���i,��Y!����F���I!�j!���L,,�8K!����:�
B9�+���K���#@B�Y�"3�D��������M�la�6 K������%ID{�vG�"F�-M�"�{�-f]�F6�0�Rw�������n�<���6��y�M/��
J>`�6$���i����4l��3j��AF$�����HY !�-�l�
E���+y�
=�|�A����\M�(1i�{\������>6�]��h~�"3��b�5�B;|,�3���p�VM���	&nQf���	�H7�E����@q��'�����q��n*�U�1r���f�v
����I�CIZ���F~�V�����Z�9nZ�,��H�@t�q��0�1�E�0-�z_������#X�4�,W�A��<|��d1���������L
<�CU����
p�dD����i���2�;����J���0*-������P�i�X�����D��e���K�m/�A�]�1���bbV�GG��	e�
"���q�D�a�{u��z�8vy��-��?!�������J��Hh�!v\=������9�{�����M�}��a8Q�~����L���[�Hq��
��3�v�4�zI��y��_M�x/,�'{C���	0o@5
�����G�%���?I����1o?��ItW���{_�B�H����]�#������M��S�!�(���8�k�HW�i����<;�������{��Zq�:�������O�	ww�Ziu�c�!��ZetQ<r�0	����7Kp�����9.Q�f0���)&����-��tPK���8�OTcS������B��!�0(Z`��X,�E�����X�l�A����;�5�le����
��SMc����6�N����X��;�=05��5#�������
H4`��B7�N~��G$���mc�	r�gJ������yy��IH`K���%UE2{���@����^H.�����p�5�
X���;f�����d�T������FD]@�E���U���*��{�i�n��pId�!�`
����7�@04ysE<��-�z����#��P��!eGD!fA�m?%
0�i]��_��W8�/t���9I��k��N����HZYH���� �@�{e���C��8�wNkgk�7�,AqQ����	w
a�[i���U6����������ZIH�����0DgA��9<�����G��`1f:��
nH�)�J��3,v���u�s9��A��������#��z�N��jG{f ��eJ�gs@����a|��8��\$��0*��xnKo��Q z�������X���nd0�������8A.@�rE8P���R�
#164Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#159)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 12:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 29, 2021 at 11:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

How about doing both of the above suggestions? Alternatively, we can
wait for both 'drop' and 'create' message to be delivered but that
might be overkill.

Agreed. Attached the patch doing both things.

Thanks, the patch LGTM. I'll wait for a day before committing to see
if anyone has better ideas.

Pushed.

--
With Regards,
Amit Kapila.

#165vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#163)
Re: Replication slot stats misgivings

On Fri, Apr 30, 2021 at 5:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Apr 29, 2021 at 9:44 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, Apr 29, 2021 at 3:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 5:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 4:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

@@ -1369,7 +1369,7 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb,
ReorderBufferIterTXNState *state)
* Update the total bytes processed before releasing the current set
* of changes and restoring the new set of changes.
*/
- rb->totalBytes += rb->size;
+ rb->totalBytes += entry->txn->total_size;
if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
&state->entries[off].segno))

I have not tested this but won't in the above change you need to check
txn->toptxn for subtxns?

Now, I am able to reproduce this issue:
Create table t1(c1 int);
select pg_create_logical_replication_slot('s', 'test_decoding');
Begin;
insert into t1 values(1);
savepoint s1;
insert into t1 select generate_series(1, 100000);
commit;

postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL, NULL);
count
--------
100005
(1 row)

postgres=# select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
s1 | 0 | 0 | 0 | 0 |
0 | 0 | 2 | 13200672 | 2021-04-29
14:33:55.156566+05:30
(1 row)

select * from pg_stat_reset_replication_slot('s1');

Now reduce the logical decoding work mem to allow spilling.
postgres=# set logical_decoding_work_mem='64kB';
SET
postgres=# select count(*) from pg_logical_slot_peek_changes('s1', NULL, NULL);
count
--------
100005
(1 row)

postgres=# select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns |
stream_count | stream_bytes | total_txns | total_bytes |
stats_reset
-----------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
s1 | 1 | 202 | 13200000 | 0 |
0 | 0 | 2 | 672 | 2021-04-29
14:35:21.836613+05:30
(1 row)

You can notice that after we have allowed spilling the 'total_bytes'
stats is showing a different value. The attached patch fixes the issue
for me. Let me know what do you think about this?

I found one issue with the following scenario when testing with logical_decoding_work_mem as 64kB:

BEGIN;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s1;
INSERT INTO t1 values(generate_series(1,10000));
COMMIT;
SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot1', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
------------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
regression_slot1 | 6 | 154 | 9130176 | 0 | 0 | 0 | 1 | 4262016 | 2021-04-29 17:50:00.080663+05:30
(1 row)

Same thing works fine with logical_decoding_work_mem as 64MB:
select * from pg_stat_replication_slots;
slot_name | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset
------------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+----------------------------------
regression_slot1 | 6 | 154 | 9130176 | 0 | 0 | 0 | 1 | 2640000 | 2021-04-29 17:50:00.080663+05:30
(1 row)

The patch required one change:
- rb->totalBytes += rb->size;
+ if (entry->txn->toptxn)
+ rb->totalBytes += entry->txn->toptxn->total_size;
+ else
+ rb->totalBytes += entry->txn->total_size;
The above should be changed to:
- rb->totalBytes += rb->size;
+ if (entry->txn->toptxn)
+ rb->totalBytes += entry->txn->toptxn->total_size;
+ else
+ rb->totalBytes += entry->txn->size;

Attached patch fixes the issue.
Thoughts?

After more thought, it seems to me that we should use txn->size here
regardless of the top transaction or subtransaction since we're
iterating changes associated with a transaction that is either the top
transaction or a subtransaction. Otherwise, I think if some
subtransactions are not serialized, we will end up adding bytes
including those subtransactions during iterating other serialized
subtransactions. Whereas in ReorderBufferProcessTXN() we should use
txn->total_txn since txn is always the top transaction. I've attached
another patch to do this.

BTW, to check how many bytes of changes are passed to the decoder
plugin I wrote and attached a simple decoder plugin that calculates
the total amount of bytes for each change on the decoding plugin side.
I think what we expect is that the amounts of change bytes shown on
both sides are matched. You can build it in the same way as other
third-party modules and need to create decoder_stats extension.

The basic usage is to execute pg_logical_slot_get/peek_changes() and
mystats('slot_name') in the same process. During decoding the changes,
decoder_stats plugin accumulates the change bytes in the local memory
and mystats() SQL function, defined in decoder_stats extension, shows
those stats.

I've done some test with v4 patch. For instance, with the following
workload the output is expected:

BEGIN;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s1;
INSERT INTO t1 values(generate_series(1,10000));
COMMIT;

mystats() functions shows:

=# select pg_logical_slot_get_changes('test_slot', null, null);
=# select change_type, change_bytes, total_bytes from mystats('test_slot');
change_type | change_bytes | total_bytes
-------------+--------------+-------------
INSERT | 2578 kB | 2578 kB
(1 row)

'change_bytes' and 'total_bytes' are the total amount of changes
calculated on the plugin side and core side, respectively. Those are
matched, which is expected. On the other hand, with the following
workload those are not matched:

BEGIN;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s1;
INSERT INTO t1 values(generate_series(1,10000));
SAVEPOINT s2;
INSERT INTO t1 values(generate_series(1,10000));
COMMIT;

=# select pg_logical_slot_get_changes('test_slot', null, null);
=# select change_type, change_bytes, total_bytes from mystats('test_slot');
change_type | change_bytes | total_bytes
-------------+--------------+-------------
INSERT | 3867 kB | 5451 kB
(1 row)

This is fixed by the attached v5 patch.

The changes look good to me, I don't have any comments.

Regards,
Vignesh

#166Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#163)
1 attachment(s)
Re: Replication slot stats misgivings

On Fri, Apr 30, 2021 at 5:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

After more thought, it seems to me that we should use txn->size here
regardless of the top transaction or subtransaction since we're
iterating changes associated with a transaction that is either the top
transaction or a subtransaction. Otherwise, I think if some
subtransactions are not serialized, we will end up adding bytes
including those subtransactions during iterating other serialized
subtransactions. Whereas in ReorderBufferProcessTXN() we should use
txn->total_txn since txn is always the top transaction. I've attached
another patch to do this.

LGTM. I have slightly edited the comments in the attached. I'll push
this early next week unless there are more comments.

--
With Regards,
Amit Kapila.

Attachments:

use_total_size_v6.patchapplication/octet-stream; name=use_total_size_v6.patchDownload
From 409d1f318a5c65785e81d9649955cdbab751766e Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Fri, 30 Apr 2021 10:12:39 +0530
Subject: [PATCH v6] Fix the computation of slot stats for 'total_bytes'.

Previously, we were using the size of all the changes present in
ReorderBuffer to compute total_bytes after decoding a transaction and that
can lead to counting some of the transactions' changes more than once. Fix
it by using the size of the changes decoded for a transaction to compute
'total_bytes'.

Author: Sawada Masahiko
Reviewed-by: Vignesh C, Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
---
 .../replication/logical/reorderbuffer.c       | 39 +++++++++----------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c27f7100534..7caf08f39d4 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1366,10 +1366,11 @@ ReorderBufferIterTXNNext(ReorderBuffer *rb, ReorderBufferIterTXNState *state)
 		dlist_push_tail(&state->old_change, &change->node);
 
 		/*
-		 * Update the total bytes processed before releasing the current set
-		 * of changes and restoring the new set of changes.
+		 * Update the total bytes processed by the txn for which we are
+		 * releasing the current set of changes and restoring the new set of
+		 * changes.
 		 */
-		rb->totalBytes += rb->size;
+		rb->totalBytes += entry->txn->size;
 		if (ReorderBufferRestoreChanges(rb, entry->txn, &entry->file,
 										&state->entries[off].segno))
 		{
@@ -2371,9 +2372,9 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		iterstate = NULL;
 
 		/*
-		 * Update total transaction count and total transaction bytes
-		 * processed. Ensure to not count the streamed transaction multiple
-		 * times.
+		 * Update total transaction count and total bytes processed by the
+		 * transaction and its subtransactions. Ensure to not count the
+		 * streamed transaction multiple times.
 		 *
 		 * Note that the statistics computation has to be done after
 		 * ReorderBufferIterTXNFinish as it releases the serialized change
@@ -2382,7 +2383,7 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		if (!rbtxn_is_streamed(txn))
 			rb->totalTxns++;
 
-		rb->totalBytes += rb->size;
+		rb->totalBytes += txn->total_size;
 
 		/*
 		 * Done with current changes, send the last message for this set of
@@ -3073,7 +3074,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 {
 	Size		sz;
 	ReorderBufferTXN *txn;
-	ReorderBufferTXN *toptxn = NULL;
+	ReorderBufferTXN *toptxn;
 
 	Assert(change->txn);
 
@@ -3087,14 +3088,14 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 
 	txn = change->txn;
 
-	/* If streaming supported, update the total size in top level as well. */
-	if (ReorderBufferCanStream(rb))
-	{
-		if (txn->toptxn != NULL)
-			toptxn = txn->toptxn;
-		else
-			toptxn = txn;
-	}
+	/*
+	 * Update the total size in top level as well. This is later used to
+	 * compute the decoding stats.
+	 */
+	if (txn->toptxn != NULL)
+		toptxn = txn->toptxn;
+	else
+		toptxn = txn;
 
 	sz = ReorderBufferChangeSize(change);
 
@@ -3104,8 +3105,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size += sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size += sz;
+		toptxn->total_size += sz;
 	}
 	else
 	{
@@ -3114,8 +3114,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 		rb->size -= sz;
 
 		/* Update the total size in the top transaction. */
-		if (toptxn)
-			toptxn->total_size -= sz;
+		toptxn->total_size -= sz;
 	}
 
 	Assert(txn->size <= rb->size);
-- 
2.28.0.windows.1

#167Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#157)
Re: Replication slot stats misgivings

On Thu, Apr 29, 2021 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 7:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not sure if any of these alternatives are a good idea. What do
you think? Do you have any other ideas for this?

I've been considering some ideas but don't come up with a good one
yet. It’s just an idea and not tested but how about having
CreateDecodingContext() register before_shmem_exit() callback with the
decoding context to ensure that we send slot stats even on
interruption. And FreeDecodingContext() cancels the callback.

Is it a good idea to send stats while exiting and rely on the same? I
think before_shmem_exit is mostly used for the cleanup purpose so not
sure if we can rely on it for this purpose. I think we can't be sure
that in all cases we will send all stats, so maybe Vignesh's patch is
sufficient to cover the cases where we avoid losing it in cases where
we would have sent a large amount of data.

Sawada-San, any thoughts on this point? Apart from this, I think you
have suggested somewhere in this thread to slightly update the
description of stream_bytes. I would like to update the description of
stream_bytes and total_bytes as below:

stream_bytes
Amount of transaction data decoded for streaming in-progress
transactions to the decoding output plugin while decoding changes from
WAL for this slot. This and other streaming counters for this slot can
be used to tune logical_decoding_work_mem.

total_bytes
Amount of transaction data decoded for sending transactions to the
decoding output plugin while decoding changes from WAL for this slot.
Note that this includes data that is streamed and/or spilled.

This update considers two points:
a. we don't send this data across the network because plugin might
decide to filter this data, ex. based on publications.
b. not all of the decoded changes are sent to plugin, consider
REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT, etc.

--
With Regards,
Amit Kapila.

#168Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#166)
Re: Replication slot stats misgivings

On Fri, Apr 30, 2021 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

LGTM. I have slightly edited the comments in the attached. I'll push
this early next week unless there are more comments.

Pushed.

--
With Regards,
Amit Kapila.

#169Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#167)
Re: Replication slot stats misgivings

On Mon, May 3, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 29, 2021 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 7:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not sure if any of these alternatives are a good idea. What do
you think? Do you have any other ideas for this?

I've been considering some ideas but don't come up with a good one
yet. It’s just an idea and not tested but how about having
CreateDecodingContext() register before_shmem_exit() callback with the
decoding context to ensure that we send slot stats even on
interruption. And FreeDecodingContext() cancels the callback.

Is it a good idea to send stats while exiting and rely on the same? I
think before_shmem_exit is mostly used for the cleanup purpose so not
sure if we can rely on it for this purpose. I think we can't be sure
that in all cases we will send all stats, so maybe Vignesh's patch is
sufficient to cover the cases where we avoid losing it in cases where
we would have sent a large amount of data.

Sawada-San, any thoughts on this point?

before_shmem_exit is mostly used to the cleanup purpose but how about
on_shmem_exit()? pgstats relies on that to send stats at the
interruption. See pgstat_shutdown_hook().

That being said, I agree Vignesh' patch would cover most cases. If we
don't find any better solution, I think we can go with Vignesh's
patch.

Apart from this, I think you
have suggested somewhere in this thread to slightly update the
description of stream_bytes. I would like to update the description of
stream_bytes and total_bytes as below:

stream_bytes
Amount of transaction data decoded for streaming in-progress
transactions to the decoding output plugin while decoding changes from
WAL for this slot. This and other streaming counters for this slot can
be used to tune logical_decoding_work_mem.

total_bytes
Amount of transaction data decoded for sending transactions to the
decoding output plugin while decoding changes from WAL for this slot.
Note that this includes data that is streamed and/or spilled.

This update considers two points:
a. we don't send this data across the network because plugin might
decide to filter this data, ex. based on publications.
b. not all of the decoded changes are sent to plugin, consider
REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT, etc.

Looks good to me.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#170Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#168)
Re: Replication slot stats misgivings

On Mon, May 3, 2021 at 2:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Apr 30, 2021 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

LGTM. I have slightly edited the comments in the attached. I'll push
this early next week unless there are more comments.

Pushed.

Thank you!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#171Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#169)
Re: Replication slot stats misgivings

On Mon, May 3, 2021 at 5:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 3, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 29, 2021 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 7:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not sure if any of these alternatives are a good idea. What do
you think? Do you have any other ideas for this?

I've been considering some ideas but don't come up with a good one
yet. It’s just an idea and not tested but how about having
CreateDecodingContext() register before_shmem_exit() callback with the
decoding context to ensure that we send slot stats even on
interruption. And FreeDecodingContext() cancels the callback.

Is it a good idea to send stats while exiting and rely on the same? I
think before_shmem_exit is mostly used for the cleanup purpose so not
sure if we can rely on it for this purpose. I think we can't be sure
that in all cases we will send all stats, so maybe Vignesh's patch is
sufficient to cover the cases where we avoid losing it in cases where
we would have sent a large amount of data.

Sawada-San, any thoughts on this point?

before_shmem_exit is mostly used to the cleanup purpose but how about
on_shmem_exit()? pgstats relies on that to send stats at the
interruption. See pgstat_shutdown_hook().

Yeah, that is worth trying. Would you like to give it a try? I think
it still might not cover the cases where we error out in the backend
while decoding via APIs because at that time we won't exit, maybe for
that we can consider Vignesh's patch.

--
With Regards,
Amit Kapila.

#172Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#171)
1 attachment(s)
Re: Replication slot stats misgivings

On Mon, May 3, 2021 at 10:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, May 3, 2021 at 5:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 3, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 29, 2021 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 7:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not sure if any of these alternatives are a good idea. What do
you think? Do you have any other ideas for this?

I've been considering some ideas but don't come up with a good one
yet. It’s just an idea and not tested but how about having
CreateDecodingContext() register before_shmem_exit() callback with the
decoding context to ensure that we send slot stats even on
interruption. And FreeDecodingContext() cancels the callback.

Is it a good idea to send stats while exiting and rely on the same? I
think before_shmem_exit is mostly used for the cleanup purpose so not
sure if we can rely on it for this purpose. I think we can't be sure
that in all cases we will send all stats, so maybe Vignesh's patch is
sufficient to cover the cases where we avoid losing it in cases where
we would have sent a large amount of data.

Sawada-San, any thoughts on this point?

before_shmem_exit is mostly used to the cleanup purpose but how about
on_shmem_exit()? pgstats relies on that to send stats at the
interruption. See pgstat_shutdown_hook().

Yeah, that is worth trying. Would you like to give it a try?

Yes.

In this approach, I think we will need to have a static pointer in
logical.c pointing to LogicalDecodingContext that we’re using. At
StartupDecodingContext(), we set the pointer to the just created
LogicalDecodingContext and register the callback so that we can refer
to the LogicalDecodingContext on that callback. And at
FreeDecodingContext(), we reset the pointer to NULL (however, since
FreeDecodingContext() is not called when an error happens we would
need to ensure resetting it somehow). But, after more thought, if we
have the static pointer in logical.c it would rather be better to have
a global function that sends slot stats based on the
LogicalDecodingContext pointed by the static pointer and can be called
by ReplicationSlotRelease(). That way, we don’t need to worry about
erroring out cases as well as interruption cases, although we need to
have a new static pointer.

I've attached a quick-hacked patch. I also incorporated the change
that calls UpdateDecodingStats() at FreeDecodingContext() so that we
can send slot stats also in the case where we spilled/streamed changes
but finished without commit/abort/prepare record.

I think
it still might not cover the cases where we error out in the backend
while decoding via APIs because at that time we won't exit, maybe for
that we can consider Vignesh's patch.

Agreed. It seems to me that the approach of the attached patch is
better than the approach using on_shmem_exit(). So if we want to avoid
having the new static pointer and function for this purpose we can
consider Vignesh’s patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

send_stats_at_release.patchapplication/x-patch; name=send_stats_at_release.patchDownload
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 00543ede45..f32a2da565 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -51,6 +51,12 @@ typedef struct LogicalErrorCallbackState
 	XLogRecPtr	report_location;
 } LogicalErrorCallbackState;
 
+/*
+ * Pointing to the currently-used logical decoding context andu sed to sent
+ * slot statistics on releasing slots.
+ */
+static LogicalDecodingContext *MyLogicalDecodingContext = NULL;
+
 /* wrappers around output plugin callbacks */
 static void output_plugin_error_callback(void *arg);
 static void startup_cb_wrapper(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
@@ -290,6 +296,13 @@ StartupDecodingContext(List *output_plugin_options,
 
 	MemoryContextSwitchTo(old_context);
 
+	/*
+	 * Keep holding the decoding context until freeing the decoding context
+	 * or releasing the logical slot.
+	 */
+	Assert(MyLogicalDecodingContext == NULL);
+	MyLogicalDecodingContext = ctx;
+
 	return ctx;
 }
 
@@ -626,10 +639,12 @@ FreeDecodingContext(LogicalDecodingContext *ctx)
 	if (ctx->callbacks.shutdown_cb != NULL)
 		shutdown_cb_wrapper(ctx);
 
+	UpdateDecodingStats(ctx);
 	ReorderBufferFree(ctx->reorder);
 	FreeSnapshotBuilder(ctx->snapshot_builder);
 	XLogReaderFree(ctx->reader);
 	MemoryContextDelete(ctx->context);
+	MyLogicalDecodingContext = NULL;
 }
 
 /*
@@ -1811,3 +1826,17 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->totalTxns = 0;
 	rb->totalBytes = 0;
 }
+
+/*
+ * The function called at releasing a logical replication slot, sending the
+ * remaining slot statistics.
+ */
+void
+DecodingContextAtSlotRelease(void)
+{
+	if (MyLogicalDecodingContext)
+	{
+		UpdateDecodingStats(MyLogicalDecodingContext);
+		MyLogicalDecodingContext = NULL;
+	}
+}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index cf261e200e..4e9b45c84b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -45,6 +45,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "replication/slot.h"
+#include "replication/logical.h"
 #include "storage/fd.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
@@ -496,6 +497,9 @@ ReplicationSlotRelease(void)
 
 	Assert(slot != NULL && slot->active_pid != 0);
 
+	if (SlotIsLogical(slot))
+		DecodingContextAtSlotRelease();
+
 	if (slot->data.persistency == RS_EPHEMERAL)
 	{
 		/*
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index 7dfcb7be18..c156045020 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -135,5 +135,6 @@ extern bool filter_prepare_cb_wrapper(LogicalDecodingContext *ctx,
 extern bool filter_by_origin_cb_wrapper(LogicalDecodingContext *ctx, RepOriginId origin_id);
 extern void ResetLogicalStreamingState(void);
 extern void UpdateDecodingStats(LogicalDecodingContext *ctx);
+extern void DecodingContextAtSlotRelease(void);
 
 #endif
#173vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#172)
Re: Replication slot stats misgivings

On Tue, May 4, 2021 at 9:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 3, 2021 at 10:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, May 3, 2021 at 5:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 3, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 29, 2021 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 7:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not sure if any of these alternatives are a good idea. What do
you think? Do you have any other ideas for this?

I've been considering some ideas but don't come up with a good one
yet. It’s just an idea and not tested but how about having
CreateDecodingContext() register before_shmem_exit() callback with the
decoding context to ensure that we send slot stats even on
interruption. And FreeDecodingContext() cancels the callback.

Is it a good idea to send stats while exiting and rely on the same? I
think before_shmem_exit is mostly used for the cleanup purpose so not
sure if we can rely on it for this purpose. I think we can't be sure
that in all cases we will send all stats, so maybe Vignesh's patch is
sufficient to cover the cases where we avoid losing it in cases where
we would have sent a large amount of data.

Sawada-San, any thoughts on this point?

before_shmem_exit is mostly used to the cleanup purpose but how about
on_shmem_exit()? pgstats relies on that to send stats at the
interruption. See pgstat_shutdown_hook().

Yeah, that is worth trying. Would you like to give it a try?

Yes.

In this approach, I think we will need to have a static pointer in
logical.c pointing to LogicalDecodingContext that we’re using. At
StartupDecodingContext(), we set the pointer to the just created
LogicalDecodingContext and register the callback so that we can refer
to the LogicalDecodingContext on that callback. And at
FreeDecodingContext(), we reset the pointer to NULL (however, since
FreeDecodingContext() is not called when an error happens we would
need to ensure resetting it somehow). But, after more thought, if we
have the static pointer in logical.c it would rather be better to have
a global function that sends slot stats based on the
LogicalDecodingContext pointed by the static pointer and can be called
by ReplicationSlotRelease(). That way, we don’t need to worry about
erroring out cases as well as interruption cases, although we need to
have a new static pointer.

I've attached a quick-hacked patch. I also incorporated the change
that calls UpdateDecodingStats() at FreeDecodingContext() so that we
can send slot stats also in the case where we spilled/streamed changes
but finished without commit/abort/prepare record.

I think
it still might not cover the cases where we error out in the backend
while decoding via APIs because at that time we won't exit, maybe for
that we can consider Vignesh's patch.

Agreed. It seems to me that the approach of the attached patch is
better than the approach using on_shmem_exit(). So if we want to avoid
having the new static pointer and function for this purpose we can
consider Vignesh’s patch.

I'm ok with using either my patch or Sawada san's patch, Even I had
the same thought of whether we should have a static variable thought
pointed out by Sawada san. Apart from that I had one minor comment:
This comment needs to be corrected "andu sed to sent"
+/*
+ * Pointing to the currently-used logical decoding context andu sed to sent
+ * slot statistics on releasing slots.
+ */
+static LogicalDecodingContext *MyLogicalDecodingContext = NULL;
+

Regards,
Vignesh

#174Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#173)
Re: Replication slot stats misgivings

On Tue, May 4, 2021 at 2:34 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, May 4, 2021 at 9:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 3, 2021 at 10:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, May 3, 2021 at 5:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 3, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 29, 2021 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 28, 2021 at 7:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 28, 2021 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am not sure if any of these alternatives are a good idea. What do
you think? Do you have any other ideas for this?

I've been considering some ideas but don't come up with a good one
yet. It’s just an idea and not tested but how about having
CreateDecodingContext() register before_shmem_exit() callback with the
decoding context to ensure that we send slot stats even on
interruption. And FreeDecodingContext() cancels the callback.

Is it a good idea to send stats while exiting and rely on the same? I
think before_shmem_exit is mostly used for the cleanup purpose so not
sure if we can rely on it for this purpose. I think we can't be sure
that in all cases we will send all stats, so maybe Vignesh's patch is
sufficient to cover the cases where we avoid losing it in cases where
we would have sent a large amount of data.

Sawada-San, any thoughts on this point?

before_shmem_exit is mostly used to the cleanup purpose but how about
on_shmem_exit()? pgstats relies on that to send stats at the
interruption. See pgstat_shutdown_hook().

Yeah, that is worth trying. Would you like to give it a try?

Yes.

In this approach, I think we will need to have a static pointer in
logical.c pointing to LogicalDecodingContext that we’re using. At
StartupDecodingContext(), we set the pointer to the just created
LogicalDecodingContext and register the callback so that we can refer
to the LogicalDecodingContext on that callback. And at
FreeDecodingContext(), we reset the pointer to NULL (however, since
FreeDecodingContext() is not called when an error happens we would
need to ensure resetting it somehow). But, after more thought, if we
have the static pointer in logical.c it would rather be better to have
a global function that sends slot stats based on the
LogicalDecodingContext pointed by the static pointer and can be called
by ReplicationSlotRelease(). That way, we don’t need to worry about
erroring out cases as well as interruption cases, although we need to
have a new static pointer.

I've attached a quick-hacked patch. I also incorporated the change
that calls UpdateDecodingStats() at FreeDecodingContext() so that we
can send slot stats also in the case where we spilled/streamed changes
but finished without commit/abort/prepare record.

I think
it still might not cover the cases where we error out in the backend
while decoding via APIs because at that time we won't exit, maybe for
that we can consider Vignesh's patch.

Agreed. It seems to me that the approach of the attached patch is
better than the approach using on_shmem_exit(). So if we want to avoid
having the new static pointer and function for this purpose we can
consider Vignesh’s patch.

I'm ok with using either my patch or Sawada san's patch, Even I had
the same thought of whether we should have a static variable thought
pointed out by Sawada san. Apart from that I had one minor comment:
This comment needs to be corrected "andu sed to sent"
+/*
+ * Pointing to the currently-used logical decoding context andu sed to sent
+ * slot statistics on releasing slots.
+ */
+static LogicalDecodingContext *MyLogicalDecodingContext = NULL;
+

Right, that needs to be fixed.

After more thought, I'm concerned that my patch's approach might be
invasive for PG14. Given that Vignesh’s patch would cover most cases,
I think we can live with a small downside that could miss some slot
stats. If we want to ensure sending slot stats at releasing slot, we
can develop it as an improvement. My patch would be better to get
reviewed by more peoples including the design during PG15 development.
Thoughts?

Regards,

---
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#175Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#169)
1 attachment(s)
Re: Replication slot stats misgivings

On Mon, May 3, 2021 at 9:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 3, 2021 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Apart from this, I think you
have suggested somewhere in this thread to slightly update the
description of stream_bytes. I would like to update the description of
stream_bytes and total_bytes as below:

stream_bytes
Amount of transaction data decoded for streaming in-progress
transactions to the decoding output plugin while decoding changes from
WAL for this slot. This and other streaming counters for this slot can
be used to tune logical_decoding_work_mem.

total_bytes
Amount of transaction data decoded for sending transactions to the
decoding output plugin while decoding changes from WAL for this slot.
Note that this includes data that is streamed and/or spilled.

This update considers two points:
a. we don't send this data across the network because plugin might
decide to filter this data, ex. based on publications.
b. not all of the decoded changes are sent to plugin, consider
REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID,
REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT, etc.

Looks good to me.

Attached the doc update patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

fix_doc.patchapplication/octet-stream; name=fix_doc.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 886e626be8..370cdc2e1a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2708,10 +2708,10 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         <structfield>stream_bytes</structfield><type>bigint</type>
        </para>
        <para>
-        Amount of decoded in-progress transaction data streamed to the decoding
-        output plugin while decoding changes from WAL for this slot. This and other
-        streaming counters for this slot can be used to gauge the network I/O which
-        occurred during logical decoding and allow tuning <literal>logical_decoding_work_mem</literal>.
+        Amount of transaction data decoded for streaming in-progress
+        transactions to the decoding output plugin while decoding changes from
+        WAL for this slot. This and other streaming counters for this slot can
+        be used to tune <literal>logical_decoding_work_mem</literal>.
        </para>
       </entry>
      </row>
@@ -2733,10 +2733,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         <structfield>total_bytes</structfield><type>bigint</type>
        </para>
        <para>
-        Amount of decoded transaction data sent to the decoding output plugin
-        while decoding the changes from WAL for this slot. This can be used to
-        gauge the total amount of data sent during logical decoding. Note that
-        this includes data that is streamed and/or spilled.
+        Amount of transaction data decoded for sending transactions to the
+        decoding output plugin while decoding changes from WAL for this slot.
+        Note that this includes data that is streamed and/or spilled.
        </para>
       </entry>
      </row>
#176Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#174)
1 attachment(s)
Re: Replication slot stats misgivings

On Thu, May 6, 2021 at 6:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

After more thought, I'm concerned that my patch's approach might be
invasive for PG14. Given that Vignesh’s patch would cover most cases,

I am not sure if your patch is too invasive but OTOH I am also
convinced that Vignesh's patch covers most cases and is much simpler
so we can go ahead with that. In the attached, I have combined
Vignesh's patch and your doc fix patch. Additionally, I have changed
some comments and some other cosmetic stuff. Let me know what you
think of the attached?

--
With Regards,
Amit Kapila.

Attachments:

v5-0001-Update-replication-statistics-after-every-stream-.patchapplication/octet-stream; name=v5-0001-Update-replication-statistics-after-every-stream-.patchDownload
From 26921119425fde02b29079e5ce749e8d58aa5802 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 6 May 2021 09:23:24 +0530
Subject: [PATCH v5] Update replication statistics after every stream/spill.

Currently, replication slot statistics are updated at prepare, commit, and
rollback. Now, if the transaction is interrupted the stats might not get
updated. Fixed this by updating replication statistics after every
stream/spill.

In passing update the docs to change the description of some of the slot
stats.

Author: Vignesh C, Sawada Masahiko
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml                    | 15 +++++++--------
 src/backend/replication/logical/decode.c        | 14 ++++++++------
 src/backend/replication/logical/reorderbuffer.c |  6 ++++++
 src/include/replication/reorderbuffer.h         |  4 ++--
 4 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 886e626be80..370cdc2e1a1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2708,10 +2708,10 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         <structfield>stream_bytes</structfield><type>bigint</type>
        </para>
        <para>
-        Amount of decoded in-progress transaction data streamed to the decoding
-        output plugin while decoding changes from WAL for this slot. This and other
-        streaming counters for this slot can be used to gauge the network I/O which
-        occurred during logical decoding and allow tuning <literal>logical_decoding_work_mem</literal>.
+        Amount of transaction data decoded for streaming in-progress
+        transactions to the decoding output plugin while decoding changes from
+        WAL for this slot. This and other streaming counters for this slot can
+        be used to tune <literal>logical_decoding_work_mem</literal>.
        </para>
       </entry>
      </row>
@@ -2733,10 +2733,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         <structfield>total_bytes</structfield><type>bigint</type>
        </para>
        <para>
-        Amount of decoded transaction data sent to the decoding output plugin
-        while decoding the changes from WAL for this slot. This can be used to
-        gauge the total amount of data sent during logical decoding. Note that
-        this includes data that is streamed and/or spilled.
+        Amount of transaction data decoded for sending transactions to the
+        decoding output plugin while decoding changes from WAL for this slot.
+        Note that this includes data that is streamed and/or spilled.
        </para>
       </entry>
      </row>
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 7924581cdcd..888e064ec0f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -746,9 +746,10 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	}
 
 	/*
-	 * Update the decoding stats at transaction prepare/commit/abort. It is
-	 * not clear that sending more or less frequently than this would be
-	 * better.
+	 * Update the decoding stats at transaction prepare/commit/abort.
+	 * Additionally we send the stats when we spill or stream the changes to
+	 * avoid losing them in case the decoding is interrupted. It is not clear
+	 * that sending more or less frequently than this would be better.
 	 */
 	UpdateDecodingStats(ctx);
 }
@@ -828,9 +829,10 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	ReorderBufferPrepare(ctx->reorder, xid, parsed->twophase_gid);
 
 	/*
-	 * Update the decoding stats at transaction prepare/commit/abort. It is
-	 * not clear that sending more or less frequently than this would be
-	 * better.
+	 * Update the decoding stats at transaction prepare/commit/abort.
+	 * Additionally we send the stats when we spill or stream the changes to
+	 * avoid losing them in case the decoding is interrupted. It is not clear
+	 * that sending more or less frequently than this would be better.
 	 */
 	UpdateDecodingStats(ctx);
 }
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c79425fbb73..e80a195472e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3559,6 +3559,9 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
+
+		/* update the decoding stats */
+		UpdateDecodingStats((LogicalDecodingContext *) rb->private_data);
 	}
 
 	Assert(spilled == txn->nentries_mem);
@@ -3928,6 +3931,9 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	/* Don't consider already streamed transaction. */
 	rb->streamTxns += (txn_is_streamed) ? 0 : 1;
 
+	/* update the decoding stats */
+	UpdateDecodingStats((LogicalDecodingContext *) rb->private_data);
+
 	Assert(dlist_is_empty(&txn->changes));
 	Assert(txn->nentries == 0);
 	Assert(txn->nentries_mem == 0);
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bfab8303ee7..53cdfa5d88f 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -617,14 +617,14 @@ struct ReorderBuffer
 	/* Statistics about transactions streamed to the decoding output plugin */
 	int64		streamTxns;		/* number of transactions streamed */
 	int64		streamCount;	/* streaming invocation counter */
-	int64		streamBytes;	/* amount of data streamed */
+	int64		streamBytes;	/* amount of data decoded */
 
 	/*
 	 * Statistics about all the transactions sent to the decoding output
 	 * plugin
 	 */
 	int64		totalTxns;		/* total number of transactions sent */
-	int64		totalBytes;		/* total amount of data sent */
+	int64		totalBytes;		/* total amount of data decoded */
 };
 
 
-- 
2.28.0.windows.1

#177Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#176)
Re: Replication slot stats misgivings

On Thu, May 6, 2021 at 1:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 6, 2021 at 6:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

After more thought, I'm concerned that my patch's approach might be
invasive for PG14. Given that Vignesh’s patch would cover most cases,

I am not sure if your patch is too invasive but OTOH I am also
convinced that Vignesh's patch covers most cases and is much simpler
so we can go ahead with that.

I think that my patch affects also other codes including logical
decoding and decoding context. We will need to write code while
worrying about MyLogicalDecodingContext.

In the attached, I have combined
Vignesh's patch and your doc fix patch. Additionally, I have changed
some comments and some other cosmetic stuff. Let me know what you
think of the attached?

Thank you for updating the patch. The patch looks good to me.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#178vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#176)
Re: Replication slot stats misgivings

On Thu, May 6, 2021 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 6, 2021 at 6:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

After more thought, I'm concerned that my patch's approach might be
invasive for PG14. Given that Vignesh’s patch would cover most cases,

I am not sure if your patch is too invasive but OTOH I am also
convinced that Vignesh's patch covers most cases and is much simpler
so we can go ahead with that. In the attached, I have combined
Vignesh's patch and your doc fix patch. Additionally, I have changed
some comments and some other cosmetic stuff. Let me know what you
think of the attached?

The updated patch looks good to me.

Regards,
Vignesh

#179Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#177)
Re: Replication slot stats misgivings

On Thu, May 6, 2021 at 10:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 6, 2021 at 1:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

In the attached, I have combined
Vignesh's patch and your doc fix patch. Additionally, I have changed
some comments and some other cosmetic stuff. Let me know what you
think of the attached?

Thank you for updating the patch. The patch looks good to me.

Pushed!

--
With Regards,
Amit Kapila.

#180Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#179)
Re: Replication slot stats misgivings

On Thu, May 6, 2021 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 6, 2021 at 10:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 6, 2021 at 1:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

In the attached, I have combined
Vignesh's patch and your doc fix patch. Additionally, I have changed
some comments and some other cosmetic stuff. Let me know what you
think of the attached?

Thank you for updating the patch. The patch looks good to me.

Pushed!

Thanks!

All issues pointed out in this thread are resolved and we can remove
this item from the open items?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#181Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#180)
Re: Replication slot stats misgivings

On Thu, May 6, 2021 at 1:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

All issues pointed out in this thread are resolved and we can remove
this item from the open items?

I think so. Do you think we should reply to Andres's original email
stating the commits that fixed the individual review comments to avoid
any confusion later?

--
With Regards,
Amit Kapila.

#182vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#180)
Re: Replication slot stats misgivings

On Thu, May 6, 2021 at 1:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 6, 2021 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 6, 2021 at 10:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 6, 2021 at 1:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

In the attached, I have combined
Vignesh's patch and your doc fix patch. Additionally, I have changed
some comments and some other cosmetic stuff. Let me know what you
think of the attached?

Thank you for updating the patch. The patch looks good to me.

Pushed!

Thanks for committing.

All issues pointed out in this thread are resolved and we can remove
this item from the open items?

I felt all the comments listed have been addressed.

Regards,
Vignesh

#183Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#181)
Re: Replication slot stats misgivings

On Thu, May 6, 2021 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 6, 2021 at 1:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

All issues pointed out in this thread are resolved and we can remove
this item from the open items?

I think so. Do you think we should reply to Andres's original email
stating the commits that fixed the individual review comments to avoid
any confusion later?

Good idea. That's also helpful for confirming that all comments are
addressed. Would you like to gather those commits? or shall I?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#184Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#183)
Re: Replication slot stats misgivings

On Thu, May 6, 2021 at 2:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 6, 2021 at 5:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 6, 2021 at 1:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

All issues pointed out in this thread are resolved and we can remove
this item from the open items?

I think so. Do you think we should reply to Andres's original email
stating the commits that fixed the individual review comments to avoid
any confusion later?

Good idea. That's also helpful for confirming that all comments are
addressed. Would you like to gather those commits? or shall I?

I am fine either way. I will do it tomorrow unless you have responded
before that.

--
With Regards,
Amit Kapila.

#185Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Andres Freund (#1)
Re: Replication slot stats misgivings

On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

I started to write this as a reply to
/messages/by-id/20210318015105.dcfa4ceybdjubf2i@alap3.anarazel.de
but I think it doesn't really fit under that header anymore.

On 2021-03-17 18:51:05 -0700, Andres Freund wrote:

It does make it easier for the shared memory stats patch, because if
there's a fixed number + location, the relevant stats reporting doesn't
need to go through a hashtable with the associated locking. I guess
that may have colored my perception that it's better to just have a
statically sized memory allocation for this. Noteworthy that SLRU stats
are done in a fixed size allocation as well...

Through a long discussion, all review comments pointed out here have
been addressed. I summarized that individual review comments are fixed
by which commit to avoid any confusion later.

As part of reviewing the replication slot stats patch I looked at
replication slot stats a fair bit, and I've a few misgivings. First,
about the pgstat.c side of things:

- If somehow slot stat drop messages got lost (remember pgstat
communication is lossy!), we'll just stop maintaining stats for slots
created later, because there'll eventually be no space for keeping
stats for another slot.

- If max_replication_slots was lowered between a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.

- pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
think pgstat.c has absolutely no business doing things on that level.

- We do a linear search through all replication slots whenever receiving
stats for a slot. Even though there'd be a perfectly good index to
just use all throughout - the slots index itself. It looks to me like
slots stat reports can be fairly frequent in some workloads, so that
doesn't seem great.

Fixed by 3fa17d37716.

- PgStat_ReplSlotStats etc use slotname[NAMEDATALEN]. Why not just NameData?

- pgstat_report_replslot() already has a lot of stats parameters, it
seems likely that we'll get more. Seems like we should just use a
struct of stats updates.

Fixed by cca57c1d9bf.

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

Fixed by 592f00f8d.

- I seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.

Fixed by f5fc2f5b23d.

I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.

That removes the risk of running out of "replication slot stat slots":
If we loose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.

It also makes it easy to handle the issue of max_replication_slots being
lowered and there still being stats for a slot - we simply can skip
restoring that slots data, because we know the relevant slot can't exist
anymore. And we can make the initial pgstat_report_replslot() during
slot creation use a

I'm wondering if we should just remove the slot name entirely from the
pgstat.c side of things, and have pg_stat_get_replication_slots()
inquire about slots by index as well and get the list of slots to report
stats for from slot.c infrastructure.

We fixed the problem of "running out of replication slot stat slots"
by using HTAB to store slot stats, see 3fa17d37716. The slot stats
could be orphaned if a slot drop message gets lost. But we constantly
check and remove them in pgstat_vacuum_stat().

For the record, there are two known issues that are unlikely to happen
in practice or don't affect users much: (1) if the messages for
creation and drop slot of the same name get lost and create happens
before (auto)vacuum cleans up the dead slot, the stats will be
accumulated into the old slot, and (2) we could miss the total_txn and
total_bytes updates if logical decoding is interrupted.

For (1), there is an idea of having OIDs for each slot to avoid the
accumulation of stats but that doesn't seem worth doing as in practice
this won't happen frequently. For (2), there are some ideas of
reporting slot stats at releasing slot (by keeping stats in
ReplicationSlot or by using callback) but we decided to go with
reporting slot stats after every stream/spill. Because it covers most
cases in practice and is much simpler than other approaches.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#186Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#185)
Re: Replication slot stats misgivings

On Fri, May 7, 2021 at 6:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

I started to write this as a reply to
/messages/by-id/20210318015105.dcfa4ceybdjubf2i@alap3.anarazel.de
but I think it doesn't really fit under that header anymore.

On 2021-03-17 18:51:05 -0700, Andres Freund wrote:

It does make it easier for the shared memory stats patch, because if
there's a fixed number + location, the relevant stats reporting doesn't
need to go through a hashtable with the associated locking. I guess
that may have colored my perception that it's better to just have a
statically sized memory allocation for this. Noteworthy that SLRU stats
are done in a fixed size allocation as well...

Through a long discussion, all review comments pointed out here have
been addressed. I summarized that individual review comments are fixed
by which commit to avoid any confusion later.

As part of reviewing the replication slot stats patch I looked at
replication slot stats a fair bit, and I've a few misgivings. First,
about the pgstat.c side of things:

- If somehow slot stat drop messages got lost (remember pgstat
communication is lossy!), we'll just stop maintaining stats for slots
created later, because there'll eventually be no space for keeping
stats for another slot.

- If max_replication_slots was lowered between a restart,
pgstat_read_statfile() will happily write beyond the end of
replSlotStats.

- pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
think pgstat.c has absolutely no business doing things on that level.

- We do a linear search through all replication slots whenever receiving
stats for a slot. Even though there'd be a perfectly good index to
just use all throughout - the slots index itself. It looks to me like
slots stat reports can be fairly frequent in some workloads, so that
doesn't seem great.

Fixed by 3fa17d37716.

- PgStat_ReplSlotStats etc use slotname[NAMEDATALEN]. Why not just NameData?

- pgstat_report_replslot() already has a lot of stats parameters, it
seems likely that we'll get more. Seems like we should just use a
struct of stats updates.

Fixed by cca57c1d9bf.

And then more generally about the feature:
- If a slot was used to stream out a large amount of changes (say an
initial data load), but then replication is interrupted before the
transaction is committed/aborted, stream_bytes will not reflect the
many gigabytes of data we may have sent.

Fixed by 592f00f8d.

- I seems weird that we went to the trouble of inventing replication
slot stats, but then limit them to logical slots, and even there don't
record the obvious things like the total amount of data sent.

Fixed by f5fc2f5b23d.

I think the best way to address the more fundamental "pgstat related"
complaints is to change how replication slot stats are
"addressed". Instead of using the slots name, report stats using the
index in ReplicationSlotCtl->replication_slots.

That removes the risk of running out of "replication slot stat slots":
If we loose a drop message, the index eventually will be reused and we
likely can detect that the stats were for a different slot by comparing
the slot name.

It also makes it easy to handle the issue of max_replication_slots being
lowered and there still being stats for a slot - we simply can skip
restoring that slots data, because we know the relevant slot can't exist
anymore. And we can make the initial pgstat_report_replslot() during
slot creation use a

I'm wondering if we should just remove the slot name entirely from the
pgstat.c side of things, and have pg_stat_get_replication_slots()
inquire about slots by index as well and get the list of slots to report
stats for from slot.c infrastructure.

We fixed the problem of "running out of replication slot stat slots"
by using HTAB to store slot stats, see 3fa17d37716. The slot stats
could be orphaned if a slot drop message gets lost. But we constantly
check and remove them in pgstat_vacuum_stat().

Thanks for the summarization. I don't find anything that is left
unaddressed. I think we can wait for a day or two to see if Andres or
anyone else sees anything that is left unaddressed and then we can
close the open item.

--
With Regards,
Amit Kapila.

#187Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#186)
Re: Replication slot stats misgivings

On Fri, May 7, 2021 at 8:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thanks for the summarization. I don't find anything that is left
unaddressed. I think we can wait for a day or two to see if Andres or
anyone else sees anything that is left unaddressed and then we can
close the open item.

I have closed this open item.

--
With Regards,
Amit Kapila.

#188Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#187)
Re: Replication slot stats misgivings

Amit Kapila <amit.kapila16@gmail.com> writes:

I have closed this open item.

That seems a little premature, considering that the
contrib/test_decoding/sql/stats.sql test case is still failing regularly.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&amp;dt=2021-05-11%2019%3A14%3A53

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&amp;dt=2021-05-07%2010%3A20%3A21

regards, tom lane

#189Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Tom Lane (#188)
1 attachment(s)
Re: Replication slot stats misgivings

On Wed, May 12, 2021 at 6:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

I have closed this open item.

That seems a little premature, considering that the
contrib/test_decoding/sql/stats.sql test case is still failing regularly.

Thank you for reporting.

Ugh, since by commit 592f00f8de we send slot stats every after
spil/stream it’s possible that we report slot stats that have non-zero
counters for spill_bytes/txns and zeroes for total_bytes/txns. It
seems to me it’s legitimate that the slot stats view shows non-zero
values for spill_bytes/txns and zero values for total_bytes/txns
during decoding a large transaction. So I think we can fix the test
script so that it checks only spill_bytes/txns when checking spilled
transactions.

For the record, during streaming transactions, IIUC this kind of thing
doesn’t happen since we update both total_bytes/txns and
stream_bytes/txns before reporting slot stats.

I've attached a patch to fix it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

fix_stats.patchapplication/x-patch; name=fix_stats.patchDownload
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 7d174ee8d2..19bbd76c10 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -111,10 +111,10 @@ SELECT wait_for_decode_stats(false, true);
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
-       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
------------------------+------------+-------------+------------+-------------
- regression_slot_stats | t          | t           | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+       slot_name       | spill_txns | spill_count 
+-----------------------+------------+-------------
+ regression_slot_stats | t          | t
 (1 row)
 
 -- reset the slot stats, and wait for stats collector to reset
@@ -149,10 +149,10 @@ SELECT wait_for_decode_stats(false, true);
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
-       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
------------------------+------------+-------------+------------+-------------
- regression_slot_stats | t          | t           | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+       slot_name       | spill_txns | spill_count 
+-----------------------+------------+-------------
+ regression_slot_stats | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 263568b00c..93b6103034 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -73,7 +73,7 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats', NULL,
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
 SELECT wait_for_decode_stats(false, true);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
 
 -- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot_stats');
@@ -83,7 +83,7 @@ SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_
 -- decode and check stats again.
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats', NULL, NULL, 'skip-empty-xacts', '1');
 SELECT wait_for_decode_stats(false, true);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
#190Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#189)
Re: Replication slot stats misgivings

On Wed, May 12, 2021 at 4:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, May 12, 2021 at 6:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

I have closed this open item.

That seems a little premature, considering that the
contrib/test_decoding/sql/stats.sql test case is still failing regularly.

Thank you for reporting.

Ugh, since by commit 592f00f8de we send slot stats every after
spil/stream it’s possible that we report slot stats that have non-zero
counters for spill_bytes/txns and zeroes for total_bytes/txns. It
seems to me it’s legitimate that the slot stats view shows non-zero
values for spill_bytes/txns and zero values for total_bytes/txns
during decoding a large transaction. So I think we can fix the test
script so that it checks only spill_bytes/txns when checking spilled
transactions.

Your analysis and fix look correct to me. I'll test and push your
patch if I don't see any problem with it.

For the record, during streaming transactions, IIUC this kind of thing
doesn’t happen since we update both total_bytes/txns and
stream_bytes/txns before reporting slot stats.

Right, because during streaming, we send the data to the decoding plugin.

--
With Regards,
Amit Kapila.

#191Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#190)
Re: Replication slot stats misgivings

On Wed, May 12, 2021 at 7:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 12, 2021 at 4:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Ugh, since by commit 592f00f8de we send slot stats every after
spil/stream it’s possible that we report slot stats that have non-zero
counters for spill_bytes/txns and zeroes for total_bytes/txns. It
seems to me it’s legitimate that the slot stats view shows non-zero
values for spill_bytes/txns and zero values for total_bytes/txns
during decoding a large transaction. So I think we can fix the test
script so that it checks only spill_bytes/txns when checking spilled
transactions.

Your analysis and fix look correct to me.

I think the part of the test that tests the stats after resetting it
might give different results. This can happen because in the previous
test we spill multiple times (spill_count is 12 in my testing) and it
is possible that some of the spill stats messages is received by stats
collector after the reset message. If this theory is correct then it
better that we remove the test for reset stats and the test after it
"decode and check stats again.". This is not directly related to your
patch or buildfarm failure but I guess this can happen and we might
see such a failure in future.

--
With Regards,
Amit Kapila.

#192vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#191)
1 attachment(s)
Re: Replication slot stats misgivings

On Wed, May 12, 2021 at 9:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 12, 2021 at 7:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 12, 2021 at 4:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Ugh, since by commit 592f00f8de we send slot stats every after
spil/stream it’s possible that we report slot stats that have non-zero
counters for spill_bytes/txns and zeroes for total_bytes/txns. It
seems to me it’s legitimate that the slot stats view shows non-zero
values for spill_bytes/txns and zero values for total_bytes/txns
during decoding a large transaction. So I think we can fix the test
script so that it checks only spill_bytes/txns when checking spilled
transactions.

Your analysis and fix look correct to me.

I think the part of the test that tests the stats after resetting it
might give different results. This can happen because in the previous
test we spill multiple times (spill_count is 12 in my testing) and it
is possible that some of the spill stats messages is received by stats
collector after the reset message. If this theory is correct then it
better that we remove the test for reset stats and the test after it
"decode and check stats again.". This is not directly related to your
patch or buildfarm failure but I guess this can happen and we might
see such a failure in future.

I agree with your analysis to remove that test. Attached patch has the
changes for the same.

Regards,
Vignesh

Attachments:

Replication_slot_stats_test_fix.patchtext/x-patch; charset=US-ASCII; name=Replication_slot_stats_test_fix.patchDownload
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 7d174ee8d2..206c0a126e 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -111,48 +111,10 @@ SELECT wait_for_decode_stats(false, true);
  
 (1 row)
 
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
-       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
------------------------+------------+-------------+------------+-------------
- regression_slot_stats | t          | t           | t          | t
-(1 row)
-
--- reset the slot stats, and wait for stats collector to reset
-SELECT pg_stat_reset_replication_slot('regression_slot_stats');
- pg_stat_reset_replication_slot 
---------------------------------
- 
-(1 row)
-
-SELECT wait_for_decode_stats(true, true);
- wait_for_decode_stats 
------------------------
- 
-(1 row)
-
-SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
-       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
------------------------+------------+-------------+------------+-------------
- regression_slot_stats |          0 |           0 |          0 |           0
-(1 row)
-
--- decode and check stats again.
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats', NULL, NULL, 'skip-empty-xacts', '1');
- count 
--------
-  5002
-(1 row)
-
-SELECT wait_for_decode_stats(false, true);
- wait_for_decode_stats 
------------------------
- 
-(1 row)
-
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
-       slot_name       | spill_txns | spill_count | total_txns | total_bytes 
------------------------+------------+-------------+------------+-------------
- regression_slot_stats | t          | t           | t          | t
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
+       slot_name       | spill_txns | spill_count 
+-----------------------+------------+-------------
+ regression_slot_stats | t          | t
 (1 row)
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
diff --git a/contrib/test_decoding/sql/stats.sql b/contrib/test_decoding/sql/stats.sql
index 263568b00c..67462ca27f 100644
--- a/contrib/test_decoding/sql/stats.sql
+++ b/contrib/test_decoding/sql/stats.sql
@@ -73,17 +73,7 @@ SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats', NULL,
 -- exact stats count as that can vary if any background transaction (say by
 -- autovacuum) happens in parallel to the main transaction.
 SELECT wait_for_decode_stats(false, true);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
-
--- reset the slot stats, and wait for stats collector to reset
-SELECT pg_stat_reset_replication_slot('regression_slot_stats');
-SELECT wait_for_decode_stats(true, true);
-SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_replication_slots;
-
--- decode and check stats again.
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot_stats', NULL, NULL, 'skip-empty-xacts', '1');
-SELECT wait_for_decode_stats(false, true);
-SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
+SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count FROM pg_stat_replication_slots;
 
 -- Ensure stats can be repeatedly accessed using the same stats snapshot. See
 -- https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i%40alap3.anarazel.de
#193Tom Lane
tgl@sss.pgh.pa.us
In reply to: vignesh C (#192)
Re: Replication slot stats misgivings

vignesh C <vignesh21@gmail.com> writes:

I agree with your analysis to remove that test. Attached patch has the
changes for the same.

Is there any value in converting the test case into a TAP test that
could be more flexible about the expected output? I'm mainly wondering
whether there are any code paths that this test forces the server through,
which would otherwise lack coverage.

regards, tom lane

#194vignesh C
vignesh21@gmail.com
In reply to: Tom Lane (#193)
Re: Replication slot stats misgivings

On Wed, May 12, 2021 at 9:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

vignesh C <vignesh21@gmail.com> writes:

I agree with your analysis to remove that test. Attached patch has the
changes for the same.

Is there any value in converting the test case into a TAP test that
could be more flexible about the expected output? I'm mainly wondering
whether there are any code paths that this test forces the server through,
which would otherwise lack coverage.

Removing this test does not reduce code coverage. This test is
basically to decode and check the stats again, it is kind of a
repetitive test. The problem with this test here is that when a
transaction is spilled, the statistics for the spill transaction will
be sent to the statistics collector as and when the transaction is
spilled. This test sends spill stats around 12 times. The test expects
to reset the stats and check the stats gets updated when we get the
changes. We cannot validate reset slot stats results here, as it could
be possible that in some machines the stats collector receives the
spilled transaction stats after getting reset slots. This same problem
will exist with tap tests too. So I feel it is better to remove this
test.

Regards,
Vignesh

#195Tom Lane
tgl@sss.pgh.pa.us
In reply to: vignesh C (#194)
Re: Replication slot stats misgivings

vignesh C <vignesh21@gmail.com> writes:

On Wed, May 12, 2021 at 9:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Is there any value in converting the test case into a TAP test that
could be more flexible about the expected output? I'm mainly wondering
whether there are any code paths that this test forces the server through,
which would otherwise lack coverage.

Removing this test does not reduce code coverage. This test is
basically to decode and check the stats again, it is kind of a
repetitive test. The problem with this test here is that when a
transaction is spilled, the statistics for the spill transaction will
be sent to the statistics collector as and when the transaction is
spilled. This test sends spill stats around 12 times. The test expects
to reset the stats and check the stats gets updated when we get the
changes. We cannot validate reset slot stats results here, as it could
be possible that in some machines the stats collector receives the
spilled transaction stats after getting reset slots. This same problem
will exist with tap tests too. So I feel it is better to remove this
test.

OK, I'm satisfied as long as we've considered the code-coverage angle.

regards, tom lane

#196Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#192)
Re: Replication slot stats misgivings

On Wed, May 12, 2021 at 1:19 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, May 12, 2021 at 9:08 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 12, 2021 at 7:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 12, 2021 at 4:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Ugh, since by commit 592f00f8de we send slot stats every after
spil/stream it’s possible that we report slot stats that have non-zero
counters for spill_bytes/txns and zeroes for total_bytes/txns. It
seems to me it’s legitimate that the slot stats view shows non-zero
values for spill_bytes/txns and zero values for total_bytes/txns
during decoding a large transaction. So I think we can fix the test
script so that it checks only spill_bytes/txns when checking spilled
transactions.

Your analysis and fix look correct to me.

I think the part of the test that tests the stats after resetting it
might give different results. This can happen because in the previous
test we spill multiple times (spill_count is 12 in my testing) and it
is possible that some of the spill stats messages is received by stats
collector after the reset message. If this theory is correct then it
better that we remove the test for reset stats and the test after it
"decode and check stats again.". This is not directly related to your
patch or buildfarm failure but I guess this can happen and we might
see such a failure in future.

Good point. I agree to remove this test.

I agree with your analysis to remove that test. Attached patch has the
changes for the same.

Thank you for the patch. The patch looks good to me. I also agree that
removing the test doesn't reduce the test coverage.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#197Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#196)
Re: Replication slot stats misgivings

On Wed, May 12, 2021 at 4:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, May 12, 2021 at 1:19 PM vignesh C <vignesh21@gmail.com> wrote:

I think the part of the test that tests the stats after resetting it
might give different results. This can happen because in the previous
test we spill multiple times (spill_count is 12 in my testing) and it
is possible that some of the spill stats messages is received by stats
collector after the reset message. If this theory is correct then it
better that we remove the test for reset stats and the test after it
"decode and check stats again.". This is not directly related to your
patch or buildfarm failure but I guess this can happen and we might
see such a failure in future.

Good point. I agree to remove this test.

I agree with your analysis to remove that test. Attached patch has the
changes for the same.

Thank you for the patch. The patch looks good to me. I also agree that
removing the test doesn't reduce the test coverage.

Thanks, I have pushed the patch.

--
With Regards,
Amit Kapila.

#198vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#197)
Re: Replication slot stats misgivings

On Thu, May 13, 2021 at 11:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, May 12, 2021 at 4:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, May 12, 2021 at 1:19 PM vignesh C <vignesh21@gmail.com> wrote:

I think the part of the test that tests the stats after resetting it
might give different results. This can happen because in the previous
test we spill multiple times (spill_count is 12 in my testing) and it
is possible that some of the spill stats messages is received by stats
collector after the reset message. If this theory is correct then it
better that we remove the test for reset stats and the test after it
"decode and check stats again.". This is not directly related to your
patch or buildfarm failure but I guess this can happen and we might
see such a failure in future.

Good point. I agree to remove this test.

I agree with your analysis to remove that test. Attached patch has the
changes for the same.

Thank you for the patch. The patch looks good to me. I also agree that
removing the test doesn't reduce the test coverage.

Thanks, I have pushed the patch.

Thanks for pushing the patch.

Regards,
Vignesh

#199Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#198)
Re: Replication slot stats misgivings

On Thu, May 13, 2021 at 11:30 AM vignesh C <vignesh21@gmail.com> wrote:

Do we want to update the information about pg_stat_replication_slots
at the following place in docs
https://www.postgresql.org/docs/devel/logicaldecoding-catalogs.html?

If so, feel free to submit the patch for it?

--
With Regards,
Amit Kapila.

#200vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#199)
1 attachment(s)
Re: Replication slot stats misgivings

On Mon, May 24, 2021 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 13, 2021 at 11:30 AM vignesh C <vignesh21@gmail.com> wrote:

Do we want to update the information about pg_stat_replication_slots
at the following place in docs
https://www.postgresql.org/docs/devel/logicaldecoding-catalogs.html?

If so, feel free to submit the patch for it?

Adding it will be useful, the attached patch has the changes for the same.

Regards,
Vignesh

Attachments:

v1-0001-Added-pg_stat_replication_slots-view-information-.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Added-pg_stat_replication_slots-view-information-.patchDownload
From 133856438c7e027c392e640e380a644063fd53f8 Mon Sep 17 00:00:00 2001
From: vignesh <vignesh21@gmail.com>
Date: Mon, 24 May 2021 10:03:12 +0530
Subject: [PATCH v1] Added pg_stat_replication_slots view information in
 "System Catalogs Related to Logical Decoding" documentation.

pg_stat_replication_slots information was missing in "System Catalogs Related
to Logical Decoding" documentation. Fixed it by adding it.
---
 doc/src/sgml/logicaldecoding.sgml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index a7ec5c3577..87a37a5587 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -401,6 +401,8 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
     view provide information about the current state of replication slots and
     streaming replication connections respectively. These views apply to both physical and
     logical replication.
+    The <link linkend="monitoring-pg-stat-replication-slots-view"><structname>pg_stat_replication_slots</structname></link>
+    view provides statistics information about the logical replication slots.
    </para>
   </sect1>
 
-- 
2.25.1

#201Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#200)
Re: Replication slot stats misgivings

On Mon, May 24, 2021 at 10:09 AM vignesh C <vignesh21@gmail.com> wrote:

On Mon, May 24, 2021 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 13, 2021 at 11:30 AM vignesh C <vignesh21@gmail.com> wrote:

Do we want to update the information about pg_stat_replication_slots
at the following place in docs
https://www.postgresql.org/docs/devel/logicaldecoding-catalogs.html?

If so, feel free to submit the patch for it?

Adding it will be useful, the attached patch has the changes for the same.

Thanks for the patch, pushed.

--
With Regards,
Amit Kapila.