Report bytes and transactions actually sent downtream

Started by Ashutosh Bapat8 months ago77 messages
Jump to latest
#1Ashutosh Bapat
ashutosh.bapat@enterprisedb.com

Hi All,
In a recent logical replication issue, there were multiple replication
slots involved, each using a different publication. Thus the amount of
data that was replicated through each slot was expected to be
different. However, total_bytes and total_txns were reported the same
for all the replication slots as expected. One of the slots started
lagging and we were trying to figure out whether its the WAL sender
slowing down or the consumer (in this case Debezium). The lagging
slot then showed total_txns and total_bytes lesser than other slots
giving an impression that the WAL sender is processing the data
slowly. Had pg_stat_replication_slot reported the amount of data
actually sent downstream, it would have been easier to compare it with
the amount of data received by the consumer and thus pinpoint the
bottleneck.

Here's a patch to do the same. It adds two columns
- sent_txns: The total number of transactions sent downstream.
- sent_bytes: The total number of bytes sent downstream in data messages
to pg_stat_replication_slots. sent_bytes includes only the bytes sent
as part of 'd' messages and does not include keep alive messages or
CopyDone messages for example. But those are very few and can be
ignored. If others feel that those are important to be included, we
can make that change.

Plugins may choose not to send an empty transaction downstream. It's
better to increment sent_txns counter in the plugin code when it
actually sends a BEGIN message, for example in pgoutput_send_begin()
and pg_output_begin(). This means that every plugin will need to be
modified to increment the counter for it to reported correctly.

I first thought of incrementing sent_bytes in OutputPluginWrite()
which is a central function for all logical replication message
writes. But that calls LogicalDecodingContext::write() which may
further add bytes to the message e.g. WalSndWriteData() and
LogicalOutputWrite(). So it's better to increment the counter in
implementations of LogicalDecodingContext::write(), so that we count
the exact number of bytes. These implementations are within core code
so they won't miss updating sent_bytes.

I think we should rename total_txns and total_bytes to reordered_txns
and reordered_bytes respectively, and also update the documentation
accordingly to make better sense of those numbers. But these patches
do not contain that change. If others feel the same way, I will
provide a patch with that change.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-Report-data-sent-statistics-in-pg_stat_repl-20250630.patchtext/x-patch; charset=US-ASCII; name=0001-Report-data-sent-statistics-in-pg_stat_repl-20250630.patchDownload+126-47
#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#1)
Re: Report bytes and transactions actually sent downtream

On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
In a recent logical replication issue, there were multiple replication
slots involved, each using a different publication. Thus the amount of
data that was replicated through each slot was expected to be
different. However, total_bytes and total_txns were reported the same
for all the replication slots as expected. One of the slots started
lagging and we were trying to figure out whether its the WAL sender
slowing down or the consumer (in this case Debezium). The lagging
slot then showed total_txns and total_bytes lesser than other slots
giving an impression that the WAL sender is processing the data
slowly. Had pg_stat_replication_slot reported the amount of data
actually sent downstream, it would have been easier to compare it with
the amount of data received by the consumer and thus pinpoint the
bottleneck.

Here's a patch to do the same. It adds two columns
- sent_txns: The total number of transactions sent downstream.
- sent_bytes: The total number of bytes sent downstream in data messages
to pg_stat_replication_slots. sent_bytes includes only the bytes sent
as part of 'd' messages and does not include keep alive messages or
CopyDone messages for example. But those are very few and can be
ignored. If others feel that those are important to be included, we
can make that change.

Plugins may choose not to send an empty transaction downstream. It's
better to increment sent_txns counter in the plugin code when it
actually sends a BEGIN message, for example in pgoutput_send_begin()
and pg_output_begin(). This means that every plugin will need to be
modified to increment the counter for it to reported correctly.

What if some plugin didn't implemented it or does it incorrectly?
Users will then complain that PG view is showing incorrect value.
Shouldn't the plugin specific stats be shown differently, for example,
one may be interested in how much plugin has filtered the data because
it was not published or because something like row_filter caused it
skip sending such data?

--
With Regards,
Amit Kapila.

#3Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#2)
Re: Report bytes and transactions actually sent downtream

On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
In a recent logical replication issue, there were multiple replication
slots involved, each using a different publication. Thus the amount of
data that was replicated through each slot was expected to be
different. However, total_bytes and total_txns were reported the same
for all the replication slots as expected. One of the slots started
lagging and we were trying to figure out whether its the WAL sender
slowing down or the consumer (in this case Debezium). The lagging
slot then showed total_txns and total_bytes lesser than other slots
giving an impression that the WAL sender is processing the data
slowly. Had pg_stat_replication_slot reported the amount of data
actually sent downstream, it would have been easier to compare it with
the amount of data received by the consumer and thus pinpoint the
bottleneck.

Here's a patch to do the same. It adds two columns
- sent_txns: The total number of transactions sent downstream.
- sent_bytes: The total number of bytes sent downstream in data messages
to pg_stat_replication_slots. sent_bytes includes only the bytes sent
as part of 'd' messages and does not include keep alive messages or
CopyDone messages for example. But those are very few and can be
ignored. If others feel that those are important to be included, we
can make that change.

Plugins may choose not to send an empty transaction downstream. It's
better to increment sent_txns counter in the plugin code when it
actually sends a BEGIN message, for example in pgoutput_send_begin()
and pg_output_begin(). This means that every plugin will need to be
modified to increment the counter for it to reported correctly.

What if some plugin didn't implemented it or does it incorrectly?
Users will then complain that PG view is showing incorrect value.

That is right.

To fix the problem of plugins not implementing the counter increment
logic we could use logic similar to how we track whether
OutputPluginPrepareWrite() has been called or not. In
ReorderBufferTxn, we add a new member sent_status which would be an
enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
= UNKNOWN. We provide a function called
plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set
sent_status = SENT when sent = true and sent_status = NOT_SENT when
sent = false. In all the end transaction callback wrappers like
commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
tsent_status = UNKNOWN, we throw an error. If sent_status = SENT, we
increment sent_txns. That will catch any plugin which does not call
plugin_set_txn(). The plugin may still call plugin_sent_txn() with
sent = true when it should have called it with sent = false or vice
versa, but that's hard to monitor and control.

Additionally, we should highlight in the document that sent_txns is as
per report from the output plugin so that users know where to look
for in case they see a wrong/dubious value. I see this similar to what
we do with pg_stat_replication::reply_time which may be wrong if a
non-PG standby reports the wrong value. Documentation says "Send time
of last reply message received from standby server", so the users know
where to look for incase they spot the error.

Does that look good?

I am open to other suggestions.

Shouldn't the plugin specific stats be shown differently, for example,
one may be interested in how much plugin has filtered the data because
it was not published or because something like row_filter caused it
skip sending such data?

That looks useful, we could track the ReorderBufferChange's that were
not sent downstream and add their sizes to another counter
ReorderBuffer::filtered_bytes and report it in
pg_stat_replication_slots. I think we will need to devise a mechanism
similar to above by which the plugin tells core whether a change has
been filtered or not. However, that will not be a replacement for
sent_bytes, since filtered_bytes or total_bytes - filtered_bytes won't
tell us how much data was sent downstream, which is crucial to the
purpose stated in my earlier email.

--
Best Wishes,
Ashutosh Bapat

#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#3)
Re: Report bytes and transactions actually sent downtream

On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
In a recent logical replication issue, there were multiple replication
slots involved, each using a different publication. Thus the amount of
data that was replicated through each slot was expected to be
different. However, total_bytes and total_txns were reported the same
for all the replication slots as expected. One of the slots started
lagging and we were trying to figure out whether its the WAL sender
slowing down or the consumer (in this case Debezium). The lagging
slot then showed total_txns and total_bytes lesser than other slots
giving an impression that the WAL sender is processing the data
slowly. Had pg_stat_replication_slot reported the amount of data
actually sent downstream, it would have been easier to compare it with
the amount of data received by the consumer and thus pinpoint the
bottleneck.

Here's a patch to do the same. It adds two columns
- sent_txns: The total number of transactions sent downstream.
- sent_bytes: The total number of bytes sent downstream in data messages
to pg_stat_replication_slots. sent_bytes includes only the bytes sent
as part of 'd' messages and does not include keep alive messages or
CopyDone messages for example. But those are very few and can be
ignored. If others feel that those are important to be included, we
can make that change.

Plugins may choose not to send an empty transaction downstream. It's
better to increment sent_txns counter in the plugin code when it
actually sends a BEGIN message, for example in pgoutput_send_begin()
and pg_output_begin(). This means that every plugin will need to be
modified to increment the counter for it to reported correctly.

What if some plugin didn't implemented it or does it incorrectly?
Users will then complain that PG view is showing incorrect value.

That is right.

To fix the problem of plugins not implementing the counter increment
logic we could use logic similar to how we track whether
OutputPluginPrepareWrite() has been called or not. In
ReorderBufferTxn, we add a new member sent_status which would be an
enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
= UNKNOWN. We provide a function called
plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set
sent_status = SENT when sent = true and sent_status = NOT_SENT when
sent = false. In all the end transaction callback wrappers like
commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
tsent_status = UNKNOWN, we throw an error.

I think we don't want to make it mandatory for plugins to implement
these stats, so instead of throwing ERROR, the view should show that
the plugin doesn't provide stats. How about having OutputPluginStats
similar to OutputPluginCallbacks and OutputPluginOptions members in
LogicalDecodingContext? It will have members like stats_available,
txns_sent or txns_skipped, txns_filtered, etc. I am thinking it will
be better to provide this information in a separate view like
pg_stat_plugin_stats or something like that, here we can report
slot_name, plugin_name, then the other stats we want to implement part
of OutputPluginStats.

--
With Regards,
Amit Kapila.

#5Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#4)
Re: Report bytes and transactions actually sent downtream

On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Hi All,
In a recent logical replication issue, there were multiple replication
slots involved, each using a different publication. Thus the amount of
data that was replicated through each slot was expected to be
different. However, total_bytes and total_txns were reported the same
for all the replication slots as expected. One of the slots started
lagging and we were trying to figure out whether its the WAL sender
slowing down or the consumer (in this case Debezium). The lagging
slot then showed total_txns and total_bytes lesser than other slots
giving an impression that the WAL sender is processing the data
slowly. Had pg_stat_replication_slot reported the amount of data
actually sent downstream, it would have been easier to compare it with
the amount of data received by the consumer and thus pinpoint the
bottleneck.

Here's a patch to do the same. It adds two columns
- sent_txns: The total number of transactions sent downstream.
- sent_bytes: The total number of bytes sent downstream in data messages
to pg_stat_replication_slots. sent_bytes includes only the bytes sent
as part of 'd' messages and does not include keep alive messages or
CopyDone messages for example. But those are very few and can be
ignored. If others feel that those are important to be included, we
can make that change.

Plugins may choose not to send an empty transaction downstream. It's
better to increment sent_txns counter in the plugin code when it
actually sends a BEGIN message, for example in pgoutput_send_begin()
and pg_output_begin(). This means that every plugin will need to be
modified to increment the counter for it to reported correctly.

What if some plugin didn't implemented it or does it incorrectly?
Users will then complain that PG view is showing incorrect value.

That is right.

To fix the problem of plugins not implementing the counter increment
logic we could use logic similar to how we track whether
OutputPluginPrepareWrite() has been called or not. In
ReorderBufferTxn, we add a new member sent_status which would be an
enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
= UNKNOWN. We provide a function called
plugin_sent_txn(ReorderBufferTxn txn, sent bool) which will set
sent_status = SENT when sent = true and sent_status = NOT_SENT when
sent = false. In all the end transaction callback wrappers like
commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
tsent_status = UNKNOWN, we throw an error.

I think we don't want to make it mandatory for plugins to implement
these stats, so instead of throwing ERROR, the view should show that
the plugin doesn't provide stats. How about having OutputPluginStats
similar to OutputPluginCallbacks and OutputPluginOptions members in
LogicalDecodingContext? It will have members like stats_available,
txns_sent or txns_skipped, txns_filtered, etc.

Not making mandatory looks useful. I can try your suggestion. Rather
than having stats_available as a member of OutputPluginStats, it's
better to have a NULL value for the corresponding member in
LogicalDecodingContext. We don't want an output plugin to reset
stats_available once set. Will that work?

I am thinking it will
be better to provide this information in a separate view like
pg_stat_plugin_stats or something like that, here we can report
slot_name, plugin_name, then the other stats we want to implement part
of OutputPluginStats.

As you have previously pointed out, the view should make it explicit
that the new stats are maintained by the plugin and not core. I agree
with that intention. However, already have three views
pg_replication_slots (which has slot name and plugin name), then
pg_replication_stats which is about stats maintained by a WAL sender
or running replication and then pg_stat_replication_slots, which is
about accumulated statistics for a replication through a given
replication slot. It's already a bit hard to keep track of who's who
when debugging an issue. Adding one more view will add to confusion.

Instead of adding a new view how about
a. name the columns as plugin_sent_txns, plugin_sent_bytes,
plugin_filtered_change_bytes to make it clear that these columns are
maintained by plugin
b. report these NULL if stats_available = false OR OutputPluginStats
is not set in LogicalDecodingContext
c. Document that NULL value for these columns indicates that the
plugin is not maintaining/reporting these stats
d. adding plugin name to pg_stat_replication_slots, that will make it
easy for users to know which plugin they should look at in case of
dubious or unavailable stats

--
Best Wishes,
Ashutosh Bapat

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#5)
Re: Report bytes and transactions actually sent downtream

On Mon, Jul 14, 2025 at 10:55 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think we don't want to make it mandatory for plugins to implement
these stats, so instead of throwing ERROR, the view should show that
the plugin doesn't provide stats. How about having OutputPluginStats
similar to OutputPluginCallbacks and OutputPluginOptions members in
LogicalDecodingContext? It will have members like stats_available,
txns_sent or txns_skipped, txns_filtered, etc.

Not making mandatory looks useful. I can try your suggestion. Rather
than having stats_available as a member of OutputPluginStats, it's
better to have a NULL value for the corresponding member in
LogicalDecodingContext. We don't want an output plugin to reset
stats_available once set. Will that work?

We can try that.

I am thinking it will
be better to provide this information in a separate view like
pg_stat_plugin_stats or something like that, here we can report
slot_name, plugin_name, then the other stats we want to implement part
of OutputPluginStats.

As you have previously pointed out, the view should make it explicit
that the new stats are maintained by the plugin and not core. I agree
with that intention. However, already have three views
pg_replication_slots (which has slot name and plugin name), then
pg_replication_stats which is about stats maintained by a WAL sender
or running replication and then pg_stat_replication_slots, which is
about accumulated statistics for a replication through a given
replication slot. It's already a bit hard to keep track of who's who
when debugging an issue. Adding one more view will add to confusion.

Instead of adding a new view how about
a. name the columns as plugin_sent_txns, plugin_sent_bytes,
plugin_filtered_change_bytes to make it clear that these columns are
maintained by plugin
b. report these NULL if stats_available = false OR OutputPluginStats
is not set in LogicalDecodingContext
c. Document that NULL value for these columns indicates that the
plugin is not maintaining/reporting these stats
d. adding plugin name to pg_stat_replication_slots, that will make it
easy for users to know which plugin they should look at in case of
dubious or unavailable stats

Sounds reasonable.

--
With Regards,
Amit Kapila.

#7Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#6)
Re: Report bytes and transactions actually sent downtream

Hi Amit,

On Mon, Jul 14, 2025 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 14, 2025 at 10:55 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sun, Jul 13, 2025 at 4:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think we don't want to make it mandatory for plugins to implement
these stats, so instead of throwing ERROR, the view should show that
the plugin doesn't provide stats. How about having OutputPluginStats
similar to OutputPluginCallbacks and OutputPluginOptions members in
LogicalDecodingContext? It will have members like stats_available,
txns_sent or txns_skipped, txns_filtered, etc.

Not making mandatory looks useful. I can try your suggestion. Rather
than having stats_available as a member of OutputPluginStats, it's
better to have a NULL value for the corresponding member in
LogicalDecodingContext. We don't want an output plugin to reset
stats_available once set. Will that work?

We can try that.

I am thinking it will
be better to provide this information in a separate view like
pg_stat_plugin_stats or something like that, here we can report
slot_name, plugin_name, then the other stats we want to implement part
of OutputPluginStats.

As you have previously pointed out, the view should make it explicit
that the new stats are maintained by the plugin and not core. I agree
with that intention. However, already have three views
pg_replication_slots (which has slot name and plugin name), then
pg_replication_stats which is about stats maintained by a WAL sender
or running replication and then pg_stat_replication_slots, which is
about accumulated statistics for a replication through a given
replication slot. It's already a bit hard to keep track of who's who
when debugging an issue. Adding one more view will add to confusion.

Instead of adding a new view how about
a. name the columns as plugin_sent_txns, plugin_sent_bytes,
plugin_filtered_change_bytes to make it clear that these columns are
maintained by plugin
b. report these NULL if stats_available = false OR OutputPluginStats
is not set in LogicalDecodingContext
c. Document that NULL value for these columns indicates that the
plugin is not maintaining/reporting these stats
d. adding plugin name to pg_stat_replication_slots, that will make it
easy for users to know which plugin they should look at in case of
dubious or unavailable stats

Sounds reasonable.

Here's the next patch which considers all the discussion so far. It
adds four fields to pg_stat_replication_slots.
- plugin - name of the output plugin
- plugin_filtered_bytes - reports the amount of changes filtered
out by the output plugin
- plugin_sent_txns - the amount of transactions sent downstream by
the output plugin
- plugin_sent_bytes - the amount of data sent downstream by the
outputplugin.

There are some points up for a discussion:
1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
calling pgstat_reset() or pgstat_reset_of_kind() which don't know
about the contents of the entry. So
PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
stats are reported as NULL, instead of zero, immediately after reset.
This is the same case when the stats is queried immediately after the
statistics is initialized and before any stats are reported. We could
instead make it report
zero, if we save the plugin_has_stats and restore it after reset. But
doing that in pgstat_reset_of_kind() seems like an extra overhead + we
will need to write a function to find all replication slot entries.
Resetting the stats would be a rare event in practice. Trying to
report 0 instead of NULL in that rare case doesn't seem to be worth
the efforts and code. Given that the core code doesn't know whether a
given plugin reports stats or not, I think this behaviour is
appropriate as long as we document it. Please let me know if the
documentation in the patch is clear enough.

2. There's also a bit of asymmetry in the way sent_bytes is handled.
The code which actually sends the logical changes to the downstream is
part of the core code
but the format of the change and hence the number of bytes sent is
decided by the plugin. It's a stat related to plugin but maintained by
the core code. The patch implements it as a plugin stat (so the
corresponding column has "plugin" prefix and is also reported as NULL
upon reset etc.), but we may want to reconsider how to report and
maintain it.

3. The names of new columns have the prefix "plugin_" but the internal
variables tracking those don't for the sake of brevity. If you prefer
to have the same prefix for the internal variables, I can change that.

I think I have covered all the cases where filteredbytes should be
incremented, but please let me know if I have missed any.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-Report-output-plugin-statistics-in-pg_stat_-20250724.patchtext/x-patch; charset=US-ASCII; name=0001-Report-output-plugin-statistics-in-pg_stat_-20250724.patchDownload+275-62
#8Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#7)
Re: Report bytes and transactions actually sent downtream

Hi,

On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:

Here's the next patch which considers all the discussion so far. It
adds four fields to pg_stat_replication_slots.
- plugin - name of the output plugin

Is this one needed? (we could get it with a join on pg_replication_slots)

- plugin_filtered_bytes - reports the amount of changes filtered
out by the output plugin
- plugin_sent_txns - the amount of transactions sent downstream by
the output plugin
- plugin_sent_bytes - the amount of data sent downstream by the
outputplugin.

There are some points up for a discussion:
1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
calling pgstat_reset() or pgstat_reset_of_kind() which don't know
about the contents of the entry. So
PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
stats are reported as NULL, instead of zero, immediately after reset.
This is the same case when the stats is queried immediately after the
statistics is initialized and before any stats are reported. We could
instead make it report
zero, if we save the plugin_has_stats and restore it after reset. But
doing that in pgstat_reset_of_kind() seems like an extra overhead + we
will need to write a function to find all replication slot entries.

Could we store plugin_has_stats in ReplicationSlotPersistentData instead? That
way it would not be reset. We would need to access ReplicationSlotPersistentData
in pg_stat_get_replication_slot though.

Also would that make sense to expose plugin_has_stats in pg_replication_slots?

2. There's also a bit of asymmetry in the way sent_bytes is handled.
The code which actually sends the logical changes to the downstream is
part of the core code
but the format of the change and hence the number of bytes sent is
decided by the plugin. It's a stat related to plugin but maintained by
the core code. The patch implements it as a plugin stat (so the
corresponding column has "plugin" prefix

The way it is done makes sense to me.

3. The names of new columns have the prefix "plugin_" but the internal
variables tracking those don't for the sake of brevity. If you prefer
to have the same prefix for the internal variables, I can change that.

Just my taste: I do prefer when they match.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#9shveta malik
shveta.malik@gmail.com
In reply to: Bertrand Drouvot (#8)
Re: Report bytes and transactions actually sent downtream

On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:

Here's the next patch which considers all the discussion so far. It
adds four fields to pg_stat_replication_slots.
- plugin - name of the output plugin

Is this one needed? (we could get it with a join on pg_replication_slots)

In my opinion, when there are other plugin_* fields present, including
the plugin name directly here seems like a better approach. So, +1 for
the plugin field.

- plugin_filtered_bytes - reports the amount of changes filtered
out by the output plugin
- plugin_sent_txns - the amount of transactions sent downstream by
the output plugin
- plugin_sent_bytes - the amount of data sent downstream by the
outputplugin.

There are some points up for a discussion:
1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
calling pgstat_reset() or pgstat_reset_of_kind() which don't know
about the contents of the entry. So
PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
stats are reported as NULL, instead of zero, immediately after reset.
This is the same case when the stats is queried immediately after the
statistics is initialized and before any stats are reported. We could
instead make it report
zero, if we save the plugin_has_stats and restore it after reset. But
doing that in pgstat_reset_of_kind() seems like an extra overhead + we
will need to write a function to find all replication slot entries.

I tried to think of an approach where we can differentiate between the
cases 'not initialized' and 'reset' ones with the values. Say instead
of plugin_has_stats, if we have plugin_stats_status, then we can
maintain status like -1(not initialized), 0(reset). But this too will
complicate the code further. Personally, I’m okay with NULL values
appearing even after a reset, especially since the documentation
explains this clearly.

2. There's also a bit of asymmetry in the way sent_bytes is handled.
The code which actually sends the logical changes to the downstream is
part of the core code
but the format of the change and hence the number of bytes sent is
decided by the plugin. It's a stat related to plugin but maintained by
the core code. The patch implements it as a plugin stat (so the
corresponding column has "plugin" prefix

The way it is done makes sense to me.

3. The names of new columns have the prefix "plugin_" but the internal
variables tracking those don't for the sake of brevity. If you prefer
to have the same prefix for the internal variables, I can change that.

I am okay either way.

Few comments:

1)
postgres=# select slot_name,
total_bytes,plugin_filtered_bytes,plugin_sent_bytes from
pg_stat_replication_slots order by slot_name;
slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
-----------+-------------+-----------------------+-------------------
slot1 | 800636 | 793188 | 211
sub1 | 401496 | 132712 | 84041
sub2 | 401496 | 396184 | 674
sub3 | 401496 | 145912 | 79959
(4 rows)

Currently it looks quite confusing. 'total_bytes' gives a sense that
it has to be a sum of filtered and sent. But they are no way like
that. In the thread earlier there was a proposal to change the name to
reordered_txns, reordered_bytes. That looks better to me. It will give
clarity without even someone digging into docs.

2)
Tried to verify all filtered data tests, seems to work well. Also I
tried tracking the usage of OutputPluginWrite() to see if there is any
other place where data needs to be considered as filtered-data.
Encountered this:

send_relation_and_attrs has:
if (!logicalrep_should_publish_column(att, columns,

include_gencols_type))
continue;
if (att->atttypid < FirstGenbkiObjectId)
continue;

But I don't think it needs to be considered as filtered data. This is
mostly schema related info. But I wanted to confirm once. Thoughts?

3)
+-- total_txns may vary based on the background activity but sent_txns should
+-- always be 1 since the background transactions are always skipped. Filtered
+-- bytes would be set only when there's a change that was passed to the plugin
+-- but was filtered out. Depending upon the background transactions, filtered
+-- bytes may or may not be zero.
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes,
plugin_filtered_bytes >= 0 AS filtered_bytes FROM
pg_stat_replication_slots ORDER BY slot_name;

In comment either we can say plugin_sent_txns instead of sent_txns or
in the query we can fetch plugin_sent_txns AS sent_txns, so that we
can relate comment and query.

4)
+      <literal>sentTxns</literal> is the number of transactions sent downstream
+      by the output plugin. <literal>sentBytes</literal> is the amount of data
+      sent downstream by the output plugin.
+      <function>OutputPluginWrite</function> is expected to update this counter
+      if <literal>ctx-&gt;stats</literal> is initialized by the output plugin.
+      <literal>filteredBytes</literal> is the size of changes in bytes that are
+      filtered out by the output plugin. Function
+      <literal>ReorderBufferChangeSize</literal> may be used to find
the size of
+      filtered <literal>ReorderBufferChange</literal>.
+     </para>

Either we can mention units as 'bytes' for both filteredBytes and
sentBytes or for none. Currently filteredBytes says 'in bytes' while
sentBytes does not.

thanks
Shveta

#10Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: shveta malik (#9)
Re: Report bytes and transactions actually sent downtream

Hi Shveta, Bertrand,

Replying to both of your review comments together.

On Thu, Sep 18, 2025 at 10:52 AM shveta malik <shveta.malik@gmail.com> wrote:

On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:

Here's the next patch which considers all the discussion so far. It
adds four fields to pg_stat_replication_slots.
- plugin - name of the output plugin

Is this one needed? (we could get it with a join on pg_replication_slots)

In my opinion, when there are other plugin_* fields present, including
the plugin name directly here seems like a better approach. So, +1 for
the plugin field.

Yeah. I think so too.

- plugin_filtered_bytes - reports the amount of changes filtered
out by the output plugin
- plugin_sent_txns - the amount of transactions sent downstream by
the output plugin
- plugin_sent_bytes - the amount of data sent downstream by the
outputplugin.

There are some points up for a discussion:
1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
calling pgstat_reset() or pgstat_reset_of_kind() which don't know
about the contents of the entry. So
PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
stats are reported as NULL, instead of zero, immediately after reset.
This is the same case when the stats is queried immediately after the
statistics is initialized and before any stats are reported. We could
instead make it report
zero, if we save the plugin_has_stats and restore it after reset. But
doing that in pgstat_reset_of_kind() seems like an extra overhead + we
will need to write a function to find all replication slot entries.

I tried to think of an approach where we can differentiate between the
cases 'not initialized' and 'reset' ones with the values. Say instead
of plugin_has_stats, if we have plugin_stats_status, then we can
maintain status like -1(not initialized), 0(reset). But this too will
complicate the code further. Personally, I’m okay with NULL values
appearing even after a reset, especially since the documentation
explains this clearly.

Ok. Great.

Could we store plugin_has_stats in ReplicationSlotPersistentData instead? That
way it would not be reset. We would need to access ReplicationSlotPersistentData
in pg_stat_get_replication_slot though.

Also would that make sense to expose plugin_has_stats in pg_replication_slots?

A plugin may change its decision to support the stats across versions,
we won't be able to tell when it changes that decision and thus
reflect it accurately in ReplicationSlotPersistentData. Doing it in
startup gives the opportunity to the plugin to change it as often as
it wants OR even based on some plugin specific configurations. Further
ReplicationSlotPersistentData is maintained by the core. It will not
be a good place to store something plugin specific.

2. There's also a bit of asymmetry in the way sent_bytes is handled.
The code which actually sends the logical changes to the downstream is
part of the core code
but the format of the change and hence the number of bytes sent is
decided by the plugin. It's a stat related to plugin but maintained by
the core code. The patch implements it as a plugin stat (so the
corresponding column has "plugin" prefix

The way it is done makes sense to me.

Great.

3. The names of new columns have the prefix "plugin_" but the internal
variables tracking those don't for the sake of brevity. If you prefer
to have the same prefix for the internal variables, I can change that.

I am okay either way.

Just my taste: I do prefer when they match.

I don't see a strong preference to change what's there in the patch.
Let's wait for more reviews.

Few comments:

1)
postgres=# select slot_name,
total_bytes,plugin_filtered_bytes,plugin_sent_bytes from
pg_stat_replication_slots order by slot_name;
slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
-----------+-------------+-----------------------+-------------------
slot1 | 800636 | 793188 | 211
sub1 | 401496 | 132712 | 84041
sub2 | 401496 | 396184 | 674
sub3 | 401496 | 145912 | 79959
(4 rows)

Currently it looks quite confusing. 'total_bytes' gives a sense that
it has to be a sum of filtered and sent. But they are no way like
that. In the thread earlier there was a proposal to change the name to
reordered_txns, reordered_bytes. That looks better to me. It will give
clarity without even someone digging into docs.

I also agree with that. But that will break backward compatibility. Do
you think other columns like spill_* and stream_* should also be
renamed with the prefix "reordered"?

2)
Tried to verify all filtered data tests, seems to work well. Also I
tried tracking the usage of OutputPluginWrite() to see if there is any
other place where data needs to be considered as filtered-data.
Encountered this:

send_relation_and_attrs has:
if (!logicalrep_should_publish_column(att, columns,

include_gencols_type))
continue;
if (att->atttypid < FirstGenbkiObjectId)
continue;

But I don't think it needs to be considered as filtered data. This is
mostly schema related info. But I wanted to confirm once. Thoughts?

Yeah. It's part of metadata which in turn is sent only when needed.
It's not part of, say, transaction changes. So it can't be considered
as filtering.

3)
+-- total_txns may vary based on the background activity but sent_txns should
+-- always be 1 since the background transactions are always skipped. Filtered
+-- bytes would be set only when there's a change that was passed to the plugin
+-- but was filtered out. Depending upon the background transactions, filtered
+-- bytes may or may not be zero.
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes,
plugin_filtered_bytes >= 0 AS filtered_bytes FROM
pg_stat_replication_slots ORDER BY slot_name;

In comment either we can say plugin_sent_txns instead of sent_txns or
in the query we can fetch plugin_sent_txns AS sent_txns, so that we
can relate comment and query.

Used plugin_sent_txns in the comment as well as query.

4)
+      <literal>sentTxns</literal> is the number of transactions sent downstream
+      by the output plugin. <literal>sentBytes</literal> is the amount of data
+      sent downstream by the output plugin.
+      <function>OutputPluginWrite</function> is expected to update this counter
+      if <literal>ctx-&gt;stats</literal> is initialized by the output plugin.
+      <literal>filteredBytes</literal> is the size of changes in bytes that are
+      filtered out by the output plugin. Function
+      <literal>ReorderBufferChangeSize</literal> may be used to find
the size of
+      filtered <literal>ReorderBufferChange</literal>.
+     </para>

Either we can mention units as 'bytes' for both filteredBytes and
sentBytes or for none. Currently filteredBytes says 'in bytes' while
sentBytes does not.

Used 'in bytes' in both the places.

Thanks for your review. I will include these changes in the next set of patches.

--
Best Wishes,
Ashutosh Bapat

#11shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#10)
Re: Report bytes and transactions actually sent downtream

On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Few comments:

1)
postgres=# select slot_name,
total_bytes,plugin_filtered_bytes,plugin_sent_bytes from
pg_stat_replication_slots order by slot_name;
slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
-----------+-------------+-----------------------+-------------------
slot1 | 800636 | 793188 | 211
sub1 | 401496 | 132712 | 84041
sub2 | 401496 | 396184 | 674
sub3 | 401496 | 145912 | 79959
(4 rows)

Currently it looks quite confusing. 'total_bytes' gives a sense that
it has to be a sum of filtered and sent. But they are no way like
that. In the thread earlier there was a proposal to change the name to
reordered_txns, reordered_bytes. That looks better to me. It will give
clarity without even someone digging into docs.

I also agree with that. But that will break backward compatibility.

Yes, that it will do.

Do
you think other columns like spill_* and stream_* should also be
renamed with the prefix "reordered"?

Okay, I see that all fields in pg_stat_replication_slots are related
to the ReorderBuffer. On reconsideration, I’m unsure whether it's
appropriate to prefix all of them with reorderd_. For example,
renaming spill_bytes and stream_bytes to reordered_spill_bytes and
reordered_stream_bytes. These names start to feel overly long, and I
also noticed that ReorderBuffer isn’t clearly defined anywhere in the
documentation (or at least I couldn’t find it), even though the term
'reorder buffer' does appear in a few places.

As an example, see ReorderBufferRead, ReorderBufferWrite wait-types
at [1]https://www.postgresql.org/docs/17/monitoring-stats.html#WAIT-EVENT-IO-TABLE. Also in plugin-doc [2]https://www.postgresql.org/docs/17/logicaldecoding-output-plugin.html, we use 'ReorderBufferTXN'. And now, we
are adding: ReorderBufferChangeSize, ReorderBufferChange

This gives me a feeling, will it be better to let
pg_stat_replication_slots as is and add a brief ReorderBuffer section
under Logical Decoding concepts [3]https://www.postgresql.org/docs/17/logicaldecoding-explanation.html#LOGICALDECODING-EXPLANATION just before Output Plugins. And
then, pg_stat_replication_slots can refer to that section, clarifying
that the bytes, counts, and txn fields pertain to ReorderBuffer
(without changing any of the fields).

And then to define plugin related data, we can have a new view, say
pg_stat_plugin_stats (as Amit suggested earlier) or
pg_stat_replication_plugins. I understand that adding a new view might
not be desirable, but it provides better clarity without requiring
changes to the existing fields in pg_stat_replication_slots. I also
strongly feel that to properly tie all this information together, a
brief definition of the ReorderBuffer is needed. Other pages that
reference this term can then point to that section. Thoughts?

[1]: https://www.postgresql.org/docs/17/monitoring-stats.html#WAIT-EVENT-IO-TABLE
[2]: https://www.postgresql.org/docs/17/logicaldecoding-output-plugin.html
[3]: https://www.postgresql.org/docs/17/logicaldecoding-explanation.html#LOGICALDECODING-EXPLANATION

thanks
Shveta

#12Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: shveta malik (#11)
Re: Report bytes and transactions actually sent downtream

On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:

On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Few comments:

1)
postgres=# select slot_name,
total_bytes,plugin_filtered_bytes,plugin_sent_bytes from
pg_stat_replication_slots order by slot_name;
slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
-----------+-------------+-----------------------+-------------------
slot1 | 800636 | 793188 | 211
sub1 | 401496 | 132712 | 84041
sub2 | 401496 | 396184 | 674
sub3 | 401496 | 145912 | 79959
(4 rows)

Currently it looks quite confusing. 'total_bytes' gives a sense that
it has to be a sum of filtered and sent. But they are no way like
that. In the thread earlier there was a proposal to change the name to
reordered_txns, reordered_bytes. That looks better to me. It will give
clarity without even someone digging into docs.

I also agree with that. But that will break backward compatibility.

Yes, that it will do.

Do
you think other columns like spill_* and stream_* should also be
renamed with the prefix "reordered"?

Okay, I see that all fields in pg_stat_replication_slots are related
to the ReorderBuffer. On reconsideration, I’m unsure whether it's
appropriate to prefix all of them with reorderd_. For example,
renaming spill_bytes and stream_bytes to reordered_spill_bytes and
reordered_stream_bytes. These names start to feel overly long, and I
also noticed that ReorderBuffer isn’t clearly defined anywhere in the
documentation (or at least I couldn’t find it), even though the term
'reorder buffer' does appear in a few places.

As an example, see ReorderBufferRead, ReorderBufferWrite wait-types
at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we
are adding: ReorderBufferChangeSize, ReorderBufferChange

This gives me a feeling, will it be better to let
pg_stat_replication_slots as is and add a brief ReorderBuffer section
under Logical Decoding concepts [3] just before Output Plugins. And
then, pg_stat_replication_slots can refer to that section, clarifying
that the bytes, counts, and txn fields pertain to ReorderBuffer
(without changing any of the fields).

And then to define plugin related data, we can have a new view, say
pg_stat_plugin_stats (as Amit suggested earlier) or
pg_stat_replication_plugins. I understand that adding a new view might
not be desirable, but it provides better clarity without requiring
changes to the existing fields in pg_stat_replication_slots. I also
strongly feel that to properly tie all this information together, a
brief definition of the ReorderBuffer is needed. Other pages that
reference this term can then point to that section. Thoughts?

Even if we keep two views, when they are joined, users will still get
confused by total_* names. So it's not solving the underlying problem.
Andres had raised the point about renaming total_* fields with me
off-list earlier. He suggested names total_wal_bytes, and
total_wal_txns in an off list discussion today. I think those convey
the true meaning - that these are txns and bytes that come from WAL.
Used those in the attached patches. Prefix reordered would give away
lower level details, so I didn't use it.

I agree that it would be good to mention ReorderBuffer in the logical
decoding concepts section since it mentions structures ReorderBuffer*.
But that would be a separate patch since we aren't using "reordered"
in the names of the fields.

0001 is the previous patch
0002 changes addressing your and Bertrand's comments.

--
Best Wishes,
Ashutosh Bapat

Attachments:

0001-Report-output-plugin-statistics-in-pg_stat_-20250919.patchtext/x-patch; charset=US-ASCII; name=0001-Report-output-plugin-statistics-in-pg_stat_-20250919.patchDownload+275-62
0002-Address-review-comments-20250919.patchtext/x-patch; charset=US-ASCII; name=0002-Address-review-comments-20250919.patchDownload+61-46
#13shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#12)
Re: Report bytes and transactions actually sent downtream

On Fri, Sep 19, 2025 at 8:11 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:

On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Few comments:

1)
postgres=# select slot_name,
total_bytes,plugin_filtered_bytes,plugin_sent_bytes from
pg_stat_replication_slots order by slot_name;
slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
-----------+-------------+-----------------------+-------------------
slot1 | 800636 | 793188 | 211
sub1 | 401496 | 132712 | 84041
sub2 | 401496 | 396184 | 674
sub3 | 401496 | 145912 | 79959
(4 rows)

Currently it looks quite confusing. 'total_bytes' gives a sense that
it has to be a sum of filtered and sent. But they are no way like
that. In the thread earlier there was a proposal to change the name to
reordered_txns, reordered_bytes. That looks better to me. It will give
clarity without even someone digging into docs.

I also agree with that. But that will break backward compatibility.

Yes, that it will do.

Do
you think other columns like spill_* and stream_* should also be
renamed with the prefix "reordered"?

Okay, I see that all fields in pg_stat_replication_slots are related
to the ReorderBuffer. On reconsideration, I’m unsure whether it's
appropriate to prefix all of them with reorderd_. For example,
renaming spill_bytes and stream_bytes to reordered_spill_bytes and
reordered_stream_bytes. These names start to feel overly long, and I
also noticed that ReorderBuffer isn’t clearly defined anywhere in the
documentation (or at least I couldn’t find it), even though the term
'reorder buffer' does appear in a few places.

As an example, see ReorderBufferRead, ReorderBufferWrite wait-types
at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we
are adding: ReorderBufferChangeSize, ReorderBufferChange

This gives me a feeling, will it be better to let
pg_stat_replication_slots as is and add a brief ReorderBuffer section
under Logical Decoding concepts [3] just before Output Plugins. And
then, pg_stat_replication_slots can refer to that section, clarifying
that the bytes, counts, and txn fields pertain to ReorderBuffer
(without changing any of the fields).

And then to define plugin related data, we can have a new view, say
pg_stat_plugin_stats (as Amit suggested earlier) or
pg_stat_replication_plugins. I understand that adding a new view might
not be desirable, but it provides better clarity without requiring
changes to the existing fields in pg_stat_replication_slots. I also
strongly feel that to properly tie all this information together, a
brief definition of the ReorderBuffer is needed. Other pages that
reference this term can then point to that section. Thoughts?

Even if we keep two views, when they are joined, users will still get
confused by total_* names. So it's not solving the underlying problem.

Okay, I see your point.

Andres had raised the point about renaming total_* fields with me
off-list earlier. He suggested names total_wal_bytes, and
total_wal_txns in an off list discussion today. I think those convey
the true meaning - that these are txns and bytes that come from WAL.

I agree.

Used those in the attached patches. Prefix reordered would give away
lower level details, so I didn't use it.

I agree that it would be good to mention ReorderBuffer in the logical
decoding concepts section since it mentions structures ReorderBuffer*.
But that would be a separate patch since we aren't using "reordered"
in the names of the fields.

Okay.

0001 is the previous patch
0002 changes addressing your and Bertrand's comments.

Few trivial comments:

1)
Currently the doc says:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. OutputPluginWrite will update this counter if
ctx->stats is initialized by the output plugin. filteredBytes is the
size of changes, in bytes, that are filtered out by the output plugin.
Function ReorderBufferChangeSize may be used to find the size of
filtered ReorderBufferChange.

Shall we rearrange it to:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. filteredBytes is the size of changes, in bytes,
that are filtered out by the output plugin. OutputPluginWrite will
update these counters if ctx->stats is initialized by the output
plugin.
The function ReorderBufferChangeSize can be used to compute the size
of a filtered ReorderBufferChange, i.e., the filteredBytes.

2)
My preference will be to rename the fields 'total_txns' and
'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
'total_wal_bytes' for better clarity. Additionally, upon rethinking,
it seems better to me that plugin-related fields are also named as
plugin_* to clearly indicate their association. OTOH, in
OutputPluginStats, the field names are fine as is, since the structure
name itself clearly indicates these are plugin-related fields.
PgStat_StatReplSlotEntry lacks such context and thus using full
descriptive names there would improve clarity.

3)
LogicalOutputWrite:
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
sizeof(TransactionId);
  p->returned_rows++;

A blank line after the new change will increase readability.

~~

In my testing, the patch works as expected. Thanks!

thanks
Shveta

#14Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#12)
Re: Report bytes and transactions actually sent downtream

Hi,

On Fri, Sep 19, 2025 at 08:11:23PM +0530, Ashutosh Bapat wrote:

On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:

0001 is the previous patch
0002 changes addressing your and Bertrand's comments.

Thanks for the new patch version!

I did not look closely to the code yet but did some testing and I've one remark
regarding plugin_filtered_bytes: It looks ok when a publication is doing rows
filtering but when I:

- create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '0')
then I see plugin_sent_bytes increasing (which makes sense).

- create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
see plugin_filtered_bytes increasing. I think that would make sense to also increase
plugin_filtered_bytes in this case (and for the other options that would skip
sending data). Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#15Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: shveta malik (#13)
Re: Report bytes and transactions actually sent downtream

On Mon, Sep 22, 2025 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote:

Few trivial comments:

1)
Currently the doc says:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. OutputPluginWrite will update this counter if
ctx->stats is initialized by the output plugin. filteredBytes is the
size of changes, in bytes, that are filtered out by the output plugin.
Function ReorderBufferChangeSize may be used to find the size of
filtered ReorderBufferChange.

Shall we rearrange it to:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. filteredBytes is the size of changes, in bytes,
that are filtered out by the output plugin. OutputPluginWrite will
update these counters if ctx->stats is initialized by the output
plugin.
The function ReorderBufferChangeSize can be used to compute the size
of a filtered ReorderBufferChange, i.e., the filteredBytes.

Only sentBytes is incremented by OutputPluginWrite(), so saying that
it will update counters is not correct. But I think you intend to keep
description of all the fields together followed by any additional
information. How about the following
<literal>sentTxns</literal> is the number of transactions sent downstream
by the output plugin. <literal>sentBytes</literal> is the amount of data,
in bytes, sent downstream by the output plugin.
<literal>filteredBytes</literal> is the size of changes, in bytes, that
are filtered out by the output plugin.
<function>OutputPluginWrite</function> will update
<literal>sentBytes</literal> if <literal>ctx-&gt;stats</literal> is
initialized by the output plugin. Function
<literal>ReorderBufferChangeSize</literal> may be used to find the size of
filtered <literal>ReorderBufferChange</literal>.

2)
My preference will be to rename the fields 'total_txns' and
'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
'total_wal_bytes' for better clarity. Additionally, upon rethinking,
it seems better to me that plugin-related fields are also named as
plugin_* to clearly indicate their association. OTOH, in
OutputPluginStats, the field names are fine as is, since the structure
name itself clearly indicates these are plugin-related fields.
PgStat_StatReplSlotEntry lacks such context and thus using full
descriptive names there would improve clarity.

Ok. Done.

3)
LogicalOutputWrite:
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
sizeof(TransactionId);
p->returned_rows++;

A blank line after the new change will increase readability.

Ok.

~~

In my testing, the patch works as expected. Thanks!

Thanks for testing. Can we include any of your tests in the patch? Are
the tests in patch enough?

Applied those suggestions in my repository. Do you have any further
review comments?

--
Best Wishes,
Ashutosh Bapat

#16Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#14)
Re: Report bytes and transactions actually sent downtream

On Tue, Sep 23, 2025 at 12:14 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Fri, Sep 19, 2025 at 08:11:23PM +0530, Ashutosh Bapat wrote:

On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta.malik@gmail.com> wrote:

0001 is the previous patch
0002 changes addressing your and Bertrand's comments.

Thanks for the new patch version!

I did not look closely to the code yet but did some testing and I've one remark
regarding plugin_filtered_bytes: It looks ok when a publication is doing rows
filtering but when I:

- create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '0')
then I see plugin_sent_bytes increasing (which makes sense).

- create a table and use pg_logical_slot_get_changes with ('skip-empty-xacts', '1')
then I don't see plugin_sent_bytes increasing (which makes sense) but I also don't
see plugin_filtered_bytes increasing. I think that would make sense to also increase
plugin_filtered_bytes in this case (and for the other options that would skip
sending data). Thoughts?

Thanks for bringing this up. I don't think we discussed this
explicitly in the thread. The changes which are filtered out by the
core itself e.g. changes to the catalogs or changes to other databases
or changes from undesired origins are not added to the reorder buffer.
They are not counted in total_bytes. The transactions containing only
such changes are not added to reorder buffer, so even total_txns does
not count such empty transactions. If we count these changes and
transactions in plugin_filtered_bytes, and plugin_filtered_txns, that
would create an anomaly - filtered counts being higher than total
counts. Further since core does not add these changes and transactions
to the reorder buffer, there is no way for a plugin to know about
their existence and hence count them. Does that make sense?

--
Best Wishes,
Ashutosh Bapat

#17Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ashutosh Bapat (#12)
Re: Report bytes and transactions actually sent downtream

0001 is the previous patch
0002 changes addressing your and Bertrand's comments.

@@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx,
XLogRecPtr lsn, TransactionId xid,
/* output previously gathered data in a CopyData packet */
pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len);

+ /*
+ * If output plugin maintains statistics, update the amount of data sent
+ * downstream.
+ */
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */
+

Just a small observation: I think it’s actually pq_flush_if_writable()
that writes the buffered data to the socket, not pq_putmessage_noblock
(which is actually gathering data in the buffer and not sending). So
it might make more sense to increment the sent pointer after the call
to pq_flush_if_writable().

Should we also consider - pg_hton32((uint32) (len + 4)); -- the
additional 4 bytes of data added to the send buffer.

--
With Regards,
Ashutosh Sharma.

#18shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#15)
Re: Report bytes and transactions actually sent downtream

On Tue, Sep 23, 2025 at 4:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Mon, Sep 22, 2025 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote:

Few trivial comments:

1)
Currently the doc says:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. OutputPluginWrite will update this counter if
ctx->stats is initialized by the output plugin. filteredBytes is the
size of changes, in bytes, that are filtered out by the output plugin.
Function ReorderBufferChangeSize may be used to find the size of
filtered ReorderBufferChange.

Shall we rearrange it to:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. filteredBytes is the size of changes, in bytes,
that are filtered out by the output plugin. OutputPluginWrite will
update these counters if ctx->stats is initialized by the output
plugin.
The function ReorderBufferChangeSize can be used to compute the size
of a filtered ReorderBufferChange, i.e., the filteredBytes.

Only sentBytes is incremented by OutputPluginWrite(), so saying that
it will update counters is not correct. But I think you intend to keep
description of all the fields together followed by any additional
information. How about the following
<literal>sentTxns</literal> is the number of transactions sent downstream
by the output plugin. <literal>sentBytes</literal> is the amount of data,
in bytes, sent downstream by the output plugin.
<literal>filteredBytes</literal> is the size of changes, in bytes, that
are filtered out by the output plugin.
<function>OutputPluginWrite</function> will update
<literal>sentBytes</literal> if <literal>ctx-&gt;stats</literal> is
initialized by the output plugin. Function
<literal>ReorderBufferChangeSize</literal> may be used to find the size of
filtered <literal>ReorderBufferChange</literal>.

Yes, this looks good.

2)
My preference will be to rename the fields 'total_txns' and
'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
'total_wal_bytes' for better clarity. Additionally, upon rethinking,
it seems better to me that plugin-related fields are also named as
plugin_* to clearly indicate their association. OTOH, in
OutputPluginStats, the field names are fine as is, since the structure
name itself clearly indicates these are plugin-related fields.
PgStat_StatReplSlotEntry lacks such context and thus using full
descriptive names there would improve clarity.

Ok. Done.

3)
LogicalOutputWrite:
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
sizeof(TransactionId);
p->returned_rows++;

A blank line after the new change will increase readability.

Ok.

~~

In my testing, the patch works as expected. Thanks!

Thanks for testing. Can we include any of your tests in the patch? Are
the tests in patch enough?

I tested the flows with
a) logical replication slot and get-changes.
b) filtered data flows: pub-sub creation with row_filters, 'publish'
options. I tried to verify plugin fields as compared to total_wal*
fields.
c) reset flow.

While tests for a and c are present already. I don't see tests for b
anywhere when it comes to stats. Do you think we shall add a test for
filtered data using row-filter somewhere?

Applied those suggestions in my repository. Do you have any further
review comments?

No, I think that is all.

thanks
Shveta

#19Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Sharma (#17)
Re: Report bytes and transactions actually sent downtream

On Tue, Sep 23, 2025 at 6:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

0001 is the previous patch
0002 changes addressing your and Bertrand's comments.

@@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx,
XLogRecPtr lsn, TransactionId xid,
/* output previously gathered data in a CopyData packet */
pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len);

+ /*
+ * If output plugin maintains statistics, update the amount of data sent
+ * downstream.
+ */
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */
+

Just a small observation: I think it’s actually pq_flush_if_writable()
that writes the buffered data to the socket, not pq_putmessage_noblock
(which is actually gathering data in the buffer and not sending). So
it might make more sense to increment the sent pointer after the call
to pq_flush_if_writable().

That's a good point. I placed it after pq_putmessage_noblock() so that
it's easy to link the increment to sentBytes and the actual bytes
being sent. You are right that the bytes won't be sent unless
pq_flush_if_writable() is called but it will be called for sure before
the next UpdateDecodingStats(). So the reported bytes are never wrong.
I would prefer readability over seeming accuracy.

Should we also consider - pg_hton32((uint32) (len + 4)); -- the
additional 4 bytes of data added to the send buffer.

In WalSndWriteData() we can't rely on what happens in a low level API
like socket_putmessage(). And we are counting the number of bytes in
the logically decoded message. So, I actually wonder whether we should
count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think?

--
Best Wishes,
Ashutosh Bapat

#20shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#19)
Re: Report bytes and transactions actually sent downtream

On Wed, Sep 24, 2025 at 11:08 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Tue, Sep 23, 2025 at 6:28 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

0001 is the previous patch
0002 changes addressing your and Bertrand's comments.

@@ -1573,6 +1573,13 @@ WalSndWriteData(LogicalDecodingContext *ctx,
XLogRecPtr lsn, TransactionId xid,
/* output previously gathered data in a CopyData packet */
pq_putmessage_noblock(PqMsg_CopyData, ctx->out->data, ctx->out->len);

+ /*
+ * If output plugin maintains statistics, update the amount of data sent
+ * downstream.
+ */
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + 1; /* +1 for the 'd' */
+

Just a small observation: I think it’s actually pq_flush_if_writable()
that writes the buffered data to the socket, not pq_putmessage_noblock
(which is actually gathering data in the buffer and not sending). So
it might make more sense to increment the sent pointer after the call
to pq_flush_if_writable().

That's a good point. I placed it after pq_putmessage_noblock() so that
it's easy to link the increment to sentBytes and the actual bytes
being sent. You are right that the bytes won't be sent unless
pq_flush_if_writable() is called but it will be called for sure before
the next UpdateDecodingStats(). So the reported bytes are never wrong.
I would prefer readability over seeming accuracy.

Should we also consider - pg_hton32((uint32) (len + 4)); -- the
additional 4 bytes of data added to the send buffer.

In WalSndWriteData() we can't rely on what happens in a low level API
like socket_putmessage(). And we are counting the number of bytes in
the logically decoded message. So, I actually wonder whether we should
count 1 byte of 'd' in sentBytes. Shveta, Bertand, what do you think?

If we are not counting all such metadata bytes ((or can't reliably do
so), then IMO, we shall skip counting msgtype as well.

thanks
Shveta

#21Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: shveta malik (#20)
#22Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#16)
#23Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#21)
#24Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#22)
#25Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#24)
#26shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#23)
#27Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#25)
#28Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: shveta malik (#26)
#29Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#27)
#30Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#28)
#31shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#28)
#32Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#29)
#33Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#30)
#34Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#33)
#35Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#34)
#36Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#35)
#37Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#36)
#38Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#37)
#39Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#38)
#40Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#39)
#41Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#40)
#42Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#41)
#43Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#42)
#44Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#43)
#45shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#40)
#46shveta malik
shveta.malik@gmail.com
In reply to: shveta malik (#45)
#47Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: shveta malik (#46)
#48shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#47)
#49Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: shveta malik (#48)
#50shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#49)
#51Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: shveta malik (#50)
#52shveta malik
shveta.malik@gmail.com
In reply to: Ashutosh Bapat (#51)
#53Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: shveta malik (#52)
#54Andres Freund
andres@anarazel.de
In reply to: Ashutosh Bapat (#53)
#55Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Andres Freund (#54)
#56Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#55)
#57Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#56)
#58Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#57)
#59Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#58)
#60Chao Li
li.evan.chao@gmail.com
In reply to: Ashutosh Bapat (#59)
#61Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Chao Li (#60)
#62Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#59)
#63Chao Li
li.evan.chao@gmail.com
In reply to: Ashutosh Bapat (#61)
#64Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#62)
#65Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Chao Li (#63)
#66Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#64)
#67Chao Li
li.evan.chao@gmail.com
In reply to: Ashutosh Bapat (#65)
#68Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#66)
#69Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#68)
#70Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Chao Li (#67)
#71Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ashutosh Bapat (#59)
#72Andres Freund
andres@anarazel.de
In reply to: Ashutosh Bapat (#59)
#73Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Andres Freund (#72)
#74Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Ashutosh Bapat (#73)
#75Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Bertrand Drouvot (#74)
#76Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Bapat (#75)
#77Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Michael Paquier (#76)