logical replication empty transactions

Started by Jeff Janesover 6 years ago108 messageshackers
Jump to latest
#1Jeff Janes
jeff.janes@gmail.com

After setting up logical replication of a slowly changing table using the
built in pub/sub facility, I noticed way more network traffic than made
sense. Looking into I see that every transaction in that database on the
master gets sent to the replica. 99.999+% of them are empty transactions
('B' message and 'C' message with nothing in between) because the
transactions don't touch any tables in the publication, only non-replicated
tables. Is doing it this way necessary for some reason? Couldn't we hold
the transmission of 'B' until something else comes along, and then if that
next thing is 'C' drop both of them?

There is a comment for WalSndPrepareWrite which seems to foreshadow such a
thing, but I don't really see how to use it in this case. I want to drop
two messages, not one.

* Don't do anything lasting in here, it's quite possible that nothing will
be done
* with the data.

This applies to all version which have support for pub/sub, including the
recent commits to 13dev.

I've searched through the voluminous mailing list threads for when this
feature was being presented to see if it was already discussed, but since
every word I can think to search on occurs in virtually every message in
the threads in some context or another, I didn't have much luck.

Cheers,

Jeff

In reply to: Jeff Janes (#1)
Re: logical replication empty transactions

Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:

After setting up logical replication of a slowly changing table using the built in pub/sub facility, I noticed way more network traffic than made sense. Looking into I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables. Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C' drop both of them?

That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.

--
Euler Taveira Timbira -
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Attachments:

0001-Skip-empty-transactions-for-logical-replication.patchtext/x-patch; charset=US-ASCII; name=0001-Skip-empty-transactions-for-logical-replication.patchDownload+37-1
#3Jeff Janes
jeff.janes@gmail.com
In reply to: Euler Taveira de Oliveira (#2)
Re: logical replication empty transactions

On Fri, Nov 8, 2019 at 8:59 PM Euler Taveira <euler@timbira.com.br> wrote:

Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:

After setting up logical replication of a slowly changing table using

the built in pub/sub facility, I noticed way more network traffic than made
sense. Looking into I see that every transaction in that database on the
master gets sent to the replica. 99.999+% of them are empty transactions
('B' message and 'C' message with nothing in between) because the
transactions don't touch any tables in the publication, only non-replicated
tables. Is doing it this way necessary for some reason? Couldn't we hold
the transmission of 'B' until something else comes along, and then if that
next thing is 'C' drop both of them?

That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.

Thanks. I didn't think it would be that simple, because I thought we would
need some way to fake an acknowledgement for any dropped empty
transactions, to keep the LSN advancing and allow WAL to get recycled on
the master. But it turns out the opposite. While your patch drops the
network traffic by a lot, there is still a lot of traffic. Now it is
keep-alives, rather than 'B' and 'C'. I don't know why I am getting a few
hundred keep alives every second when the timeouts are at their defaults,
but it is better than several thousand 'B' and 'C'.

My setup here was just to create, publish, and subscribe to a inactive
dummy table, while having pgbench running on the master (with unpublished
tables). I have not created an intentionally slow network, but I am
testing it over wifi, which is inherently kind of slow.

Cheers,

Jeff

#4Dilip Kumar
dilipbalaut@gmail.com
In reply to: Euler Taveira de Oliveira (#2)
Re: logical replication empty transactions

On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira <euler@timbira.com.br> wrote:

Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:

After setting up logical replication of a slowly changing table using the built in pub/sub facility, I noticed way more network traffic than made sense. Looking into I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables. Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C' drop both of them?

That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.

I think this significantly reduces the network bandwidth for empty
transactions. I have briefly reviewed the patch and it looks good to
me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#4)
Re: logical replication empty transactions

On Mon, Mar 2, 2020 at 9:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira <euler@timbira.com.br> wrote:

Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:

After setting up logical replication of a slowly changing table using the built in pub/sub facility, I noticed way more network traffic than made sense. Looking into I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables. Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C' drop both of them?

That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.

I think this significantly reduces the network bandwidth for empty
transactions. I have briefly reviewed the patch and it looks good to
me.

One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#6Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#5)
Re: logical replication empty transactions

On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Mar 2, 2020 at 9:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira <euler@timbira.com.br> wrote:

Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:

After setting up logical replication of a slowly changing table using the built in pub/sub facility, I noticed way more network traffic than made sense. Looking into I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables. Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C' drop both of them?

That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.

I think this significantly reduces the network bandwidth for empty
transactions. I have briefly reviewed the patch and it looks good to
me.

One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?

It seems like a valid point. One idea could be that we can track the
last commit LSN which we streamed and if the confirmed flush location
is already greater than that then even if we skip the sending the
commit message we can increase the confirm flush location locally.
Logically, it should not cause any problem because once we have got
the confirmation for whatever we have streamed so far. So for other
commits(which we are skipping), we can we advance it locally because
we are sure that we don't have any streamed commit which is not yet
confirmed by the subscriber. This is just my thought, but if we
think from the code and design perspective then it might complicate
the things and sounds hackish.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#6)
Re: logical replication empty transactions

On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?

It seems like a valid point. One idea could be that we can track the
last commit LSN which we streamed and if the confirmed flush location
is already greater than that then even if we skip the sending the
commit message we can increase the confirm flush location locally.
Logically, it should not cause any problem because once we have got
the confirmation for whatever we have streamed so far. So for other
commits(which we are skipping), we can we advance it locally because
we are sure that we don't have any streamed commit which is not yet
confirmed by the subscriber.

Will this work after restart? Do you want to persist the information
of last streamed commit LSN?

This is just my thought, but if we
think from the code and design perspective then it might complicate
the things and sounds hackish.

Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#8Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#7)
Re: logical replication empty transactions

On Tue, Mar 3, 2020 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?

It seems like a valid point. One idea could be that we can track the
last commit LSN which we streamed and if the confirmed flush location
is already greater than that then even if we skip the sending the
commit message we can increase the confirm flush location locally.
Logically, it should not cause any problem because once we have got
the confirmation for whatever we have streamed so far. So for other
commits(which we are skipping), we can we advance it locally because
we are sure that we don't have any streamed commit which is not yet
confirmed by the subscriber.

Will this work after restart? Do you want to persist the information
of last streamed commit LSN?

We will not persist the last streamed commit LSN, this variable is in
memory just to track whether we have got confirmation up to that
location or not, once we have confirmation up to that location and if
we are not streaming any transaction (because those are empty
transactions) then we can just advance the confirmed flush location
and based on that we can update the restart point as well and those
will be persisted. Basically, "last streamed commit LSN" is just a
marker that their still something pending to be confirmed from the
subscriber so until that we can not simply advance the confirm flush
location or restart point based on the empty transactions. But, if
there is nothing pending to be confirmed we can advance. So if we are
streaming then we will get confirmation from subscriber otherwise we
can advance it locally. So, in either case, the confirmed flush
location and restart point will keep moving.

This is just my thought, but if we
think from the code and design perspective then it might complicate
the things and sounds hackish.

Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.

Yeah, this could be also an option.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#8)
Re: logical replication empty transactions

On Tue, Mar 3, 2020 at 2:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Mar 3, 2020 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?

It seems like a valid point. One idea could be that we can track the
last commit LSN which we streamed and if the confirmed flush location
is already greater than that then even if we skip the sending the
commit message we can increase the confirm flush location locally.
Logically, it should not cause any problem because once we have got
the confirmation for whatever we have streamed so far. So for other
commits(which we are skipping), we can we advance it locally because
we are sure that we don't have any streamed commit which is not yet
confirmed by the subscriber.

Will this work after restart? Do you want to persist the information
of last streamed commit LSN?

We will not persist the last streamed commit LSN, this variable is in
memory just to track whether we have got confirmation up to that
location or not, once we have confirmation up to that location and if
we are not streaming any transaction (because those are empty
transactions) then we can just advance the confirmed flush location
and based on that we can update the restart point as well and those
will be persisted. Basically, "last streamed commit LSN" is just a
marker that their still something pending to be confirmed from the
subscriber so until that we can not simply advance the confirm flush
location or restart point based on the empty transactions. But, if
there is nothing pending to be confirmed we can advance. So if we are
streaming then we will get confirmation from subscriber otherwise we
can advance it locally. So, in either case, the confirmed flush
location and restart point will keep moving.

Okay, so this might work out, but it might look a bit ad-hoc.

This is just my thought, but if we
think from the code and design perspective then it might complicate
the things and sounds hackish.

Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.

Yeah, this could be also an option.

Okay.

Peter E, Petr J, others, do you have any opinion on what is the best
way forward for this thread? I think it would be really good if we
can reduce the network traffic due to these empty transactions.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#10Euler Taveira
euler.taveira@2ndquadrant.com
In reply to: Amit Kapila (#7)
Re: logical replication empty transactions

On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.

Amit, I suggest an interval to control this setting. Time is something we

have control; transactions aren't (depending on workload).
pg_stat_replication query interval usually is not milliseconds, however,
you can execute thousands of transactions in a second. If we agree on that
idea I can add it to the patch.

Regards,

--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#10)
Re: logical replication empty transactions

On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:

On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.

Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch.

Do you mean to say that if for some threshold interval we didn't
stream any transaction, then we can send the next empty transaction to
the subscriber? If so, then isn't it possible that the empty xacts
happen irregularly after the specified interval and then we still end
up sending them all. I might be missing something here, so can you
please explain your idea in detail? Basically, how will it work and
how will it solve the problem.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#12Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#11)
Re: logical replication empty transactions

On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:

On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.

Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch.

Do you mean to say that if for some threshold interval we didn't
stream any transaction, then we can send the next empty transaction to
the subscriber? If so, then isn't it possible that the empty xacts
happen irregularly after the specified interval and then we still end
up sending them all. I might be missing something here, so can you
please explain your idea in detail? Basically, how will it work and
how will it solve the problem.

IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart. OTOH, if we set based on an interval
then even if there is not much work going on, still we end up sending
the empty transaction as pointed by Amit.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#12)
Re: logical replication empty transactions

On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:

On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.

Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch.

Do you mean to say that if for some threshold interval we didn't
stream any transaction, then we can send the next empty transaction to
the subscriber? If so, then isn't it possible that the empty xacts
happen irregularly after the specified interval and then we still end
up sending them all. I might be missing something here, so can you
please explain your idea in detail? Basically, how will it work and
how will it solve the problem.

IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.

Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#14Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#13)
Re: logical replication empty transactions

On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:

On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:

Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.

Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch.

Do you mean to say that if for some threshold interval we didn't
stream any transaction, then we can send the next empty transaction to
the subscriber? If so, then isn't it possible that the empty xacts
happen irregularly after the specified interval and then we still end
up sending them all. I might be missing something here, so can you
please explain your idea in detail? Basically, how will it work and
how will it solve the problem.

IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.

Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?

I meant large empty transactions (basically we can not send anything
to the subscriber). So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published. Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking transaction count can we track LSN, and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#15Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#14)
Re: logical replication empty transactions

On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.

Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?

I meant large empty transactions (basically we can not send anything
to the subscriber). So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published. Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking transaction count can we track LSN, and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.

You have a point and it may be better to keep threshold based on LSN
if we want to keep any threshold, but keeping on transaction count
seems to be a bit straightforward. Let us see if anyone else has any
opinion on this matter?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#16Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#15)
Re: logical replication empty transactions

On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.

Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?

I meant large empty transactions (basically we can not send anything
to the subscriber). So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published. Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking transaction count can we track LSN, and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.

You have a point and it may be better to keep threshold based on LSN
if we want to keep any threshold, but keeping on transaction count
seems to be a bit straightforward. Let us see if anyone else has any
opinion on this matter?

Ok, that make sense.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#16)
Re: logical replication empty transactions

On Wed, Mar 4, 2020 at 4:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.

Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?

I meant large empty transactions (basically we can not send anything
to the subscriber). So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published. Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking transaction count can we track LSN, and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.

You have a point and it may be better to keep threshold based on LSN
if we want to keep any threshold, but keeping on transaction count
seems to be a bit straightforward. Let us see if anyone else has any
opinion on this matter?

Ok, that make sense.

Euler, can we try to update the patch based on the number of
transactions threshold and see how it works?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#18Euler Taveira
euler.taveira@2ndquadrant.com
In reply to: Amit Kapila (#17)
Re: logical replication empty transactions

On Thu, 5 Mar 2020 at 05:45, Amit Kapila <amit.kapila16@gmail.com> wrote:

Euler, can we try to update the patch based on the number of
transactions threshold and see how it works?

I will do.

--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#19Craig Ringer
craig@2ndquadrant.com
In reply to: Amit Kapila (#5)
Re: logical replication empty transactions

On Mon, 2 Mar 2020 at 19:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts?

Same way we already do it for writes that are not replicated over
logical replication, like vacuum work etc. The upstream sends feedback
with reply-requested. The downstream replies. The upstream advances
confirmed_flush_lsn, and that lazily updates restart_lsn.

The bigger issue here is that if you don't send empty txns on logical
replication you don't get an eager, timely response from the
replica(s), which delays synchronous replication. You need to send
empty txns when synchronous replication is enabled, or instead poke
the walsender to force immediate feedback with reply requested.

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise

#20Andres Freund
andres@anarazel.de
In reply to: Craig Ringer (#19)
Re: logical replication empty transactions

Hi,

On 2020-03-06 13:53:02 +0800, Craig Ringer wrote:

On Mon, 2 Mar 2020 at 19:26, Amit Kapila <amit.kapila16@gmail.com> wrote:

One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts?

Same way we already do it for writes that are not replicated over
logical replication, like vacuum work etc. The upstream sends feedback
with reply-requested. The downstream replies. The upstream advances
confirmed_flush_lsn, and that lazily updates restart_lsn.

It'll still delay it a bit.

The bigger issue here is that if you don't send empty txns on logical
replication you don't get an eager, timely response from the
replica(s), which delays synchronous replication. You need to send
empty txns when synchronous replication is enabled, or instead poke
the walsender to force immediate feedback with reply requested.

Somewhat independent from the issue at hand: It'd be really good if we
could evolve the syncrep framework to support per-database waiting... It
shouldn't be that hard, and the current situation sucks quite a bit (and
yes, I'm to blame).

I'm not quite sure what you mean by "poke the walsender"? Kinda sounds
like sending a signal, but decoding happens inside after the walsender,
so there's no need for that. Do you just mean somehow requesting that
walsender sends a feedback message?

To address the volume we could:

1a) Introduce a pgoutput message type to indicate that the LSN has
advanced, without needing separate BEGIN/COMMIT. Right now BEGIN is
21 bytes, COMMIT is 26. But we really don't need that much here. A
single message should do the trick.

1b) Add a LogicalOutputPluginWriterUpdateProgress parameter (and
possibly rename) that indicates that we are intentionally "ignoring"
WAL. For walsender that callback then could check if it could just
forward the position of the client (if it was entirely caught up
before), or if it should send a feedback request (if syncrep is
enabled, or distance is big).

2) Reduce the rate of 'empty transaction'/feedback request messages. If
we know that we're not going to be blocked waiting for more WAL, or
blocked sending messages out to the network, we don't immediately need
to send out the messages. Instead we could continue decoding until
there's actual data, or until we're going to get blocked.

We could e.g. have a new LogicalDecodingContext callback that is
called whenever WalSndWaitForWal() would wait. That'd check if there's
a pending "need" to send out a 'empty transaction'/feedback request
message. The "need" flag would get cleared whenever we send out data
bearing an LSN for other reasons.

Greetings,

Andres Freund

#21Craig Ringer
craig@2ndquadrant.com
In reply to: Andres Freund (#20)
#22Ajin Cherian
itsajin@gmail.com
In reply to: Craig Ringer (#21)
#23Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#22)
#24Rahila Syed
rahila.syed@2ndquadrant.com
In reply to: Craig Ringer (#21)
#25Michael Paquier
michael@paquier.xyz
In reply to: Rahila Syed (#24)
#26Ajin Cherian
itsajin@gmail.com
In reply to: Michael Paquier (#25)
#27Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#26)
#28Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#27)
#29Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#28)
#30Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#29)
#31Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#29)
#32Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#31)
#33Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#32)
#34vignesh C
vignesh21@gmail.com
In reply to: Ajin Cherian (#33)
#35Ajin Cherian
itsajin@gmail.com
In reply to: vignesh C (#34)
#36osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Ajin Cherian (#35)
#37Peter Smith
smithpb2250@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#36)
#38Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#37)
#39Ajin Cherian
itsajin@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#36)
#40Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#39)
#41Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#40)
#42Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#41)
#43Greg Nancarrow
gregn4422@gmail.com
In reply to: Ajin Cherian (#41)
#44Ajin Cherian
itsajin@gmail.com
In reply to: Greg Nancarrow (#43)
#45Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#42)
#46Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#45)
#47Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#46)
#48Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#47)
#49Greg Nancarrow
gregn4422@gmail.com
In reply to: Ajin Cherian (#47)
#50osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Ajin Cherian (#47)
#51Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#47)
#52Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#47)
#53Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#52)
#54Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#53)
#55Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#52)
#56Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#55)
#57Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#56)
#58Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#57)
#59Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#58)
#60Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#59)
#61osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Ajin Cherian (#60)
#62osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Ajin Cherian (#60)
#63Ajin Cherian
itsajin@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#61)
#64Ajin Cherian
itsajin@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#62)
#65osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Ajin Cherian (#63)
#66Ajin Cherian
itsajin@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#65)
#67osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Ajin Cherian (#66)
#68osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Ajin Cherian (#66)
#69Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#66)
#70Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#68)
#71Ajin Cherian
itsajin@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#68)
#72Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#69)
#73Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#67)
#74osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#73)
#75Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#74)
#76Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#66)
#77Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#69)
#78wangw.fnst@fujitsu.com
wangw.fnst@fujitsu.com
In reply to: Ajin Cherian (#77)
#79Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#77)
#80Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#75)
#81Ajin Cherian
itsajin@gmail.com
In reply to: Peter Smith (#79)
#82shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Ajin Cherian (#81)
#83Ajin Cherian
itsajin@gmail.com
In reply to: shiy.fnst@fujitsu.com (#82)
#84Ajin Cherian
itsajin@gmail.com
In reply to: shiy.fnst@fujitsu.com (#82)
#85Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#84)
#86Peter Smith
smithpb2250@gmail.com
In reply to: Ajin Cherian (#85)
#87shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Ajin Cherian (#85)
#88Ajin Cherian
itsajin@gmail.com
In reply to: shiy.fnst@fujitsu.com (#87)
#89Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#88)
#90Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#89)
#91Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#90)
#92Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#91)
#93Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#92)
#94Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#93)
#95Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#94)
#96Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#95)
#97shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#96)
#98Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#96)
#99Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#98)
#100Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#99)
#101Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#100)
#102Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#101)
#103Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#102)
#104Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#103)
#105Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#104)
#106shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Zhijie Hou (Fujitsu) (#105)
#107Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#105)
#108Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#107)