logical replication empty transactions
After setting up logical replication of a slowly changing table using the
built in pub/sub facility, I noticed way more network traffic than made
sense. Looking into I see that every transaction in that database on the
master gets sent to the replica. 99.999+% of them are empty transactions
('B' message and 'C' message with nothing in between) because the
transactions don't touch any tables in the publication, only non-replicated
tables. Is doing it this way necessary for some reason? Couldn't we hold
the transmission of 'B' until something else comes along, and then if that
next thing is 'C' drop both of them?
There is a comment for WalSndPrepareWrite which seems to foreshadow such a
thing, but I don't really see how to use it in this case. I want to drop
two messages, not one.
* Don't do anything lasting in here, it's quite possible that nothing will
be done
* with the data.
This applies to all version which have support for pub/sub, including the
recent commits to 13dev.
I've searched through the voluminous mailing list threads for when this
feature was being presented to see if it was already discussed, but since
every word I can think to search on occurs in virtually every message in
the threads in some context or another, I didn't have much luck.
Cheers,
Jeff
Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:
After setting up logical replication of a slowly changing table using the built in pub/sub facility, I noticed way more network traffic than made sense. Looking into I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables. Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C' drop both of them?
That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.
--
Euler Taveira Timbira -
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Attachments:
0001-Skip-empty-transactions-for-logical-replication.patchtext/x-patch; charset=US-ASCII; name=0001-Skip-empty-transactions-for-logical-replication.patchDownload+37-1
On Fri, Nov 8, 2019 at 8:59 PM Euler Taveira <euler@timbira.com.br> wrote:
Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:After setting up logical replication of a slowly changing table using
the built in pub/sub facility, I noticed way more network traffic than made
sense. Looking into I see that every transaction in that database on the
master gets sent to the replica. 99.999+% of them are empty transactions
('B' message and 'C' message with nothing in between) because the
transactions don't touch any tables in the publication, only non-replicated
tables. Is doing it this way necessary for some reason? Couldn't we hold
the transmission of 'B' until something else comes along, and then if that
next thing is 'C' drop both of them?That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.
Thanks. I didn't think it would be that simple, because I thought we would
need some way to fake an acknowledgement for any dropped empty
transactions, to keep the LSN advancing and allow WAL to get recycled on
the master. But it turns out the opposite. While your patch drops the
network traffic by a lot, there is still a lot of traffic. Now it is
keep-alives, rather than 'B' and 'C'. I don't know why I am getting a few
hundred keep alives every second when the timeouts are at their defaults,
but it is better than several thousand 'B' and 'C'.
My setup here was just to create, publish, and subscribe to a inactive
dummy table, while having pgbench running on the master (with unpublished
tables). I have not created an intentionally slow network, but I am
testing it over wifi, which is inherently kind of slow.
Cheers,
Jeff
On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira <euler@timbira.com.br> wrote:
Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:After setting up logical replication of a slowly changing table using the built in pub/sub facility, I noticed way more network traffic than made sense. Looking into I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables. Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C' drop both of them?
That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.
I think this significantly reduces the network bandwidth for empty
transactions. I have briefly reviewed the patch and it looks good to
me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 2, 2020 at 9:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira <euler@timbira.com.br> wrote:
Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:After setting up logical replication of a slowly changing table using the built in pub/sub facility, I noticed way more network traffic than made sense. Looking into I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables. Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C' drop both of them?
That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.I think this significantly reduces the network bandwidth for empty
transactions. I have briefly reviewed the patch and it looks good to
me.
One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Mar 2, 2020 at 9:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira <euler@timbira.com.br> wrote:
Em seg., 21 de out. de 2019 às 21:20, Jeff Janes
<jeff.janes@gmail.com> escreveu:After setting up logical replication of a slowly changing table using the built in pub/sub facility, I noticed way more network traffic than made sense. Looking into I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables. Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C' drop both of them?
That is not optimal. Those empty transactions is a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transaction" as you described above and I did
something similar to it. Patch is attached.I think this significantly reduces the network bandwidth for empty
transactions. I have briefly reviewed the patch and it looks good to
me.One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?
It seems like a valid point. One idea could be that we can track the
last commit LSN which we streamed and if the confirmed flush location
is already greater than that then even if we skip the sending the
commit message we can increase the confirm flush location locally.
Logically, it should not cause any problem because once we have got
the confirmation for whatever we have streamed so far. So for other
commits(which we are skipping), we can we advance it locally because
we are sure that we don't have any streamed commit which is not yet
confirmed by the subscriber. This is just my thought, but if we
think from the code and design perspective then it might complicate
the things and sounds hackish.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?It seems like a valid point. One idea could be that we can track the
last commit LSN which we streamed and if the confirmed flush location
is already greater than that then even if we skip the sending the
commit message we can increase the confirm flush location locally.
Logically, it should not cause any problem because once we have got
the confirmation for whatever we have streamed so far. So for other
commits(which we are skipping), we can we advance it locally because
we are sure that we don't have any streamed commit which is not yet
confirmed by the subscriber.
Will this work after restart? Do you want to persist the information
of last streamed commit LSN?
This is just my thought, but if we
think from the code and design perspective then it might complicate
the things and sounds hackish.
Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 3, 2020 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?It seems like a valid point. One idea could be that we can track the
last commit LSN which we streamed and if the confirmed flush location
is already greater than that then even if we skip the sending the
commit message we can increase the confirm flush location locally.
Logically, it should not cause any problem because once we have got
the confirmation for whatever we have streamed so far. So for other
commits(which we are skipping), we can we advance it locally because
we are sure that we don't have any streamed commit which is not yet
confirmed by the subscriber.Will this work after restart? Do you want to persist the information
of last streamed commit LSN?
We will not persist the last streamed commit LSN, this variable is in
memory just to track whether we have got confirmation up to that
location or not, once we have confirmation up to that location and if
we are not streaming any transaction (because those are empty
transactions) then we can just advance the confirmed flush location
and based on that we can update the restart point as well and those
will be persisted. Basically, "last streamed commit LSN" is just a
marker that their still something pending to be confirmed from the
subscriber so until that we can not simply advance the confirm flush
location or restart point based on the empty transactions. But, if
there is nothing pending to be confirmed we can advance. So if we are
streaming then we will get confirmation from subscriber otherwise we
can advance it locally. So, in either case, the confirmed flush
location and restart point will keep moving.
This is just my thought, but if we
think from the code and design perspective then it might complicate
the things and sounds hackish.Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.
Yeah, this could be also an option.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 3, 2020 at 2:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Mar 3, 2020 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn
sent by subscriber. After this change, the subscriber won't be able
to send the confirmed_flush and for a long time, we won't be able to
advance restart_lsn. Is that correct, if so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but not sure if that is good
enough. What do you think?It seems like a valid point. One idea could be that we can track the
last commit LSN which we streamed and if the confirmed flush location
is already greater than that then even if we skip the sending the
commit message we can increase the confirm flush location locally.
Logically, it should not cause any problem because once we have got
the confirmation for whatever we have streamed so far. So for other
commits(which we are skipping), we can we advance it locally because
we are sure that we don't have any streamed commit which is not yet
confirmed by the subscriber.Will this work after restart? Do you want to persist the information
of last streamed commit LSN?We will not persist the last streamed commit LSN, this variable is in
memory just to track whether we have got confirmation up to that
location or not, once we have confirmation up to that location and if
we are not streaming any transaction (because those are empty
transactions) then we can just advance the confirmed flush location
and based on that we can update the restart point as well and those
will be persisted. Basically, "last streamed commit LSN" is just a
marker that their still something pending to be confirmed from the
subscriber so until that we can not simply advance the confirm flush
location or restart point based on the empty transactions. But, if
there is nothing pending to be confirmed we can advance. So if we are
streaming then we will get confirmation from subscriber otherwise we
can advance it locally. So, in either case, the confirmed flush
location and restart point will keep moving.
Okay, so this might work out, but it might look a bit ad-hoc.
This is just my thought, but if we
think from the code and design perspective then it might complicate
the things and sounds hackish.Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.Yeah, this could be also an option.
Okay.
Peter E, Petr J, others, do you have any opinion on what is the best
way forward for this thread? I think it would be really good if we
can reduce the network traffic due to these empty transactions.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.Amit, I suggest an interval to control this setting. Time is something we
have control; transactions aren't (depending on workload).
pg_stat_replication query interval usually is not milliseconds, however,
you can execute thousands of transactions in a second. If we agree on that
idea I can add it to the patch.
Regards,
--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:
On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch.
Do you mean to say that if for some threshold interval we didn't
stream any transaction, then we can send the next empty transaction to
the subscriber? If so, then isn't it possible that the empty xacts
happen irregularly after the specified interval and then we still end
up sending them all. I might be missing something here, so can you
please explain your idea in detail? Basically, how will it work and
how will it solve the problem.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch.
Do you mean to say that if for some threshold interval we didn't
stream any transaction, then we can send the next empty transaction to
the subscriber? If so, then isn't it possible that the empty xacts
happen irregularly after the specified interval and then we still end
up sending them all. I might be missing something here, so can you
please explain your idea in detail? Basically, how will it work and
how will it solve the problem.
IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart. OTOH, if we set based on an interval
then even if there is not much work going on, still we end up sending
the empty transaction as pointed by Amit.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch.
Do you mean to say that if for some threshold interval we didn't
stream any transaction, then we can send the next empty transaction to
the subscriber? If so, then isn't it possible that the empty xacts
happen irregularly after the specified interval and then we still end
up sending them all. I might be missing something here, so can you
please explain your idea in detail? Basically, how will it work and
how will it solve the problem.IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.
Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch.
Do you mean to say that if for some threshold interval we didn't
stream any transaction, then we can send the next empty transaction to
the subscriber? If so, then isn't it possible that the empty xacts
happen irregularly after the specified interval and then we still end
up sending them all. I might be missing something here, so can you
please explain your idea in detail? Basically, how will it work and
how will it solve the problem.IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?
I meant large empty transactions (basically we can not send anything
to the subscriber). So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published. Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking transaction count can we track LSN, and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?I meant large empty transactions (basically we can not send anything
to the subscriber). So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published. Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking transaction count can we track LSN, and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.
You have a point and it may be better to keep threshold based on LSN
if we want to keep any threshold, but keeping on transaction count
seems to be a bit straightforward. Let us see if anyone else has any
opinion on this matter?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?I meant large empty transactions (basically we can not send anything
to the subscriber). So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published. Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking transaction count can we track LSN, and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.You have a point and it may be better to keep threshold based on LSN
if we want to keep any threshold, but keeping on transaction count
seems to be a bit straightforward. Let us see if anyone else has any
opinion on this matter?
Ok, that make sense.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 4:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
IMHO, the threshold should be based on the commit LSN. Our main
reason we want to send empty transactions after a certain
transaction/duration is that we want the restart_lsn to be moving
forward so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. So assume we set the threshold
based on transaction count then there is still a possibility that we
might process a few very big transactions then we will have to process
them again after the restart.Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?I meant large empty transactions (basically we can not send anything
to the subscriber). So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published. Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking transaction count can we track LSN, and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.You have a point and it may be better to keep threshold based on LSN
if we want to keep any threshold, but keeping on transaction count
seems to be a bit straightforward. Let us see if anyone else has any
opinion on this matter?Ok, that make sense.
Euler, can we try to update the patch based on the number of
transactions threshold and see how it works?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, 5 Mar 2020 at 05:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
Euler, can we try to update the patch based on the number of
transactions threshold and see how it works?I will do.
--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 2 Mar 2020 at 19:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts?
Same way we already do it for writes that are not replicated over
logical replication, like vacuum work etc. The upstream sends feedback
with reply-requested. The downstream replies. The upstream advances
confirmed_flush_lsn, and that lazily updates restart_lsn.
The bigger issue here is that if you don't send empty txns on logical
replication you don't get an eager, timely response from the
replica(s), which delays synchronous replication. You need to send
empty txns when synchronous replication is enabled, or instead poke
the walsender to force immediate feedback with reply requested.
--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
Hi,
On 2020-03-06 13:53:02 +0800, Craig Ringer wrote:
On Mon, 2 Mar 2020 at 19:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts?Same way we already do it for writes that are not replicated over
logical replication, like vacuum work etc. The upstream sends feedback
with reply-requested. The downstream replies. The upstream advances
confirmed_flush_lsn, and that lazily updates restart_lsn.
It'll still delay it a bit.
The bigger issue here is that if you don't send empty txns on logical
replication you don't get an eager, timely response from the
replica(s), which delays synchronous replication. You need to send
empty txns when synchronous replication is enabled, or instead poke
the walsender to force immediate feedback with reply requested.
Somewhat independent from the issue at hand: It'd be really good if we
could evolve the syncrep framework to support per-database waiting... It
shouldn't be that hard, and the current situation sucks quite a bit (and
yes, I'm to blame).
I'm not quite sure what you mean by "poke the walsender"? Kinda sounds
like sending a signal, but decoding happens inside after the walsender,
so there's no need for that. Do you just mean somehow requesting that
walsender sends a feedback message?
To address the volume we could:
1a) Introduce a pgoutput message type to indicate that the LSN has
advanced, without needing separate BEGIN/COMMIT. Right now BEGIN is
21 bytes, COMMIT is 26. But we really don't need that much here. A
single message should do the trick.
1b) Add a LogicalOutputPluginWriterUpdateProgress parameter (and
possibly rename) that indicates that we are intentionally "ignoring"
WAL. For walsender that callback then could check if it could just
forward the position of the client (if it was entirely caught up
before), or if it should send a feedback request (if syncrep is
enabled, or distance is big).
2) Reduce the rate of 'empty transaction'/feedback request messages. If
we know that we're not going to be blocked waiting for more WAL, or
blocked sending messages out to the network, we don't immediately need
to send out the messages. Instead we could continue decoding until
there's actual data, or until we're going to get blocked.
We could e.g. have a new LogicalDecodingContext callback that is
called whenever WalSndWaitForWal() would wait. That'd check if there's
a pending "need" to send out a 'empty transaction'/feedback request
message. The "need" flag would get cleared whenever we send out data
bearing an LSN for other reasons.
Greetings,
Andres Freund