logical replication empty transactions
After setting up logical replication of a slowly changing table using the
built-in pub/sub facility, I noticed way more network traffic than made
sense. Looking into it, I see that every transaction in that database on the
master gets sent to the replica. 99.999+% of them are empty transactions
('B' message and 'C' message with nothing in between) because the
transactions don't touch any tables in the publication, only non-replicated
tables. Is doing it this way necessary for some reason? Couldn't we hold
the transmission of 'B' until something else comes along, and then if that
next thing is 'C', drop both of them?
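The hold-back idea above can be illustrated on a bare message stream. This is a minimal, stand-alone sketch, assuming each protocol message is reduced to its one-byte type code ('B' for BEGIN, 'C' for COMMIT, anything else for a change). `filter_empty_xacts` is a hypothetical name, and this is not how walsender or pgoutput are actually structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stream filter: copy message types from `in` to `out`,
 * holding back each 'B' (BEGIN) until the next message arrives. If that
 * next message is 'C' (COMMIT), the transaction was empty and both are
 * dropped. Returns the number of message types written to `out`.
 */
size_t filter_empty_xacts(const char *in, size_t n, char *out)
{
    size_t written = 0;
    bool begin_pending = false;   /* a 'B' is currently held back */

    for (size_t i = 0; i < n; i++)
    {
        char msg = in[i];

        if (msg == 'B')
        {
            assert(!begin_pending);   /* transactions do not nest */
            begin_pending = true;     /* hold it; don't emit yet */
            continue;
        }

        if (begin_pending)
        {
            begin_pending = false;
            if (msg == 'C')
                continue;             /* empty transaction: drop 'B' and 'C' */
            out[written++] = 'B';     /* a real change follows: release 'B' */
        }
        out[written++] = msg;
    }
    return written;
}
```

For the stream BCBICBC the filter keeps only BIC. A BEGIN that is still held when the input ends (a transaction in progress) is simply not emitted yet.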
There is a comment for WalSndPrepareWrite which seems to foreshadow such a
thing, but I don't really see how to use it in this case. I want to drop
two messages, not one.
 * Don't do anything lasting in here, it's quite possible that nothing will be done
 * with the data.
This applies to all versions that support pub/sub, including the
recent commits to 13dev.
I've searched through the voluminous mailing list threads from when this
feature was being presented to see if it was already discussed, but since
every word I can think to search on occurs in virtually every message in
the threads in some context or another, I didn't have much luck.
Cheers,
Jeff
On Mon, Oct 21, 2019 at 21:20, Jeff Janes
<jeff.janes@gmail.com> wrote:
> After setting up logical replication of a slowly changing table using the built-in pub/sub facility, I noticed way more network traffic than made sense. Looking into it, I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables. Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C', drop both of them?
That is not optimal. Those empty transactions are a waste of bandwidth.
We can suppress them if no changes will be sent. test_decoding
implements "skip empty transactions" as you described above, and I did
something similar. Patch is attached.
--
Euler Taveira Timbira -
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Attachments:
0001-Skip-empty-transactions-for-logical-replication.patch (text/x-patch)
From 433ea40a02ab823f3aa70c18928b9862f0eb004b Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler@timbira.com.br>
Date: Fri, 8 Nov 2019 12:48:03 -0300
Subject: [PATCH] Skip empty transactions for logical replication
The current logical replication behavior is to send every transaction to
the subscriber even though the transaction is empty (because it does not
contain changes from the selected publications). It is a waste of CPU
cycles and network bandwidth to build/transmit those empty transactions.
Postpone the BEGIN message until the first change. While processing a
COMMIT message, if no change was previously written for that
transaction, do not send the COMMIT message. This means that pgoutput will
skip BEGIN / COMMIT messages for transactions that did not write changes.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
src/backend/replication/pgoutput/pgoutput.c | 34 +++++++++++++++++++++++++++++
src/include/replication/pgoutput.h | 3 +++
2 files changed, 37 insertions(+)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 9c08757..eed1093 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -212,6 +212,22 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were to
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->xact_wrote_changes = false;
+}
+
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
OutputPluginPrepareWrite(ctx, !send_replication_origin);
@@ -249,8 +265,14 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (!data->xact_wrote_changes)
+ return;
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
@@ -335,6 +357,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes)
+ pgoutput_begin(ctx, txn);
+
+ data->xact_wrote_changes = true;
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -415,6 +443,12 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes)
+ pgoutput_begin(ctx, txn);
+
+ data->xact_wrote_changes = true;
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
nrelids,
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 8870721..cb57e76 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -20,6 +20,9 @@ typedef struct PGOutputData
MemoryContext context; /* private memory context for transient
* allocations */
+ /* control whether messages can already be sent */
+ bool xact_wrote_changes;
+
/* client info */
uint32 protocol_version;
--
2.7.4
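The control flow the patch introduces can be modelled outside the server. This is a minimal sketch, assuming a hypothetical `SketchOutput` struct in place of `PGOutputData` and single characters in place of the real `logicalrep_write_*` output. `OutputPluginPrepareWrite`, `OutputPluginWrite` and `OutputPluginUpdateProgress` are omitted, and `sketch_change` stands in for both the change and the truncate callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for PGOutputData: the new flag plus a buffer that
 * records which protocol messages were "sent". */
typedef struct
{
    bool   xact_wrote_changes;
    char   sent[64];
    size_t nsent;
} SketchOutput;

void emit(SketchOutput *o, char msg)
{
    assert(o->nsent < sizeof(o->sent));
    o->sent[o->nsent++] = msg;
}

/* begin callback: do not send BEGIN yet, just reset the flag */
void sketch_begin_txn(SketchOutput *o)
{
    o->xact_wrote_changes = false;
}

/* change callback (published tables only): send the postponed
 * BEGIN before the first change of the transaction */
void sketch_change(SketchOutput *o, char change)
{
    if (!o->xact_wrote_changes)
        emit(o, 'B');
    o->xact_wrote_changes = true;
    emit(o, change);
}

/* commit callback: skip COMMIT if nothing was sent */
void sketch_commit_txn(SketchOutput *o)
{
    if (!o->xact_wrote_changes)
        return;
    emit(o, 'C');
}
```

An empty transaction followed by one containing an insert and an update produces only B, I, U, C on the wire.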
On Fri, Nov 8, 2019 at 8:59 PM Euler Taveira <euler@timbira.com.br> wrote:
> That is not optimal. Those empty transactions are a waste of bandwidth.
> We can suppress them if no changes will be sent. test_decoding
> implements "skip empty transactions" as you described above, and I did
> something similar. Patch is attached.
Thanks. I didn't think it would be that simple, because I thought we would
need some way to fake an acknowledgement for any dropped empty
transactions, to keep the LSN advancing and allow WAL to get recycled on
the master. But it turns out the opposite. While your patch cuts the
network traffic by a lot, there is still a lot of traffic. Now it is
keepalives, rather than 'B' and 'C'. I don't know why I am getting a few
hundred keepalives every second when the timeouts are at their defaults,
but it is better than several thousand 'B' and 'C' messages.
My setup here was just to create, publish, and subscribe to an inactive
dummy table, while having pgbench running on the master (with unpublished
tables). I have not created an intentionally slow network, but I am
testing it over Wi-Fi, which is inherently kind of slow.
Cheers,
Jeff
On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira <euler@timbira.com.br> wrote:
> That is not optimal. Those empty transactions are a waste of bandwidth.
> We can suppress them if no changes will be sent. test_decoding
> implements "skip empty transactions" as you described above, and I did
> something similar. Patch is attached.
I think this significantly reduces the network bandwidth for empty
transactions. I have briefly reviewed the patch and it looks good to
me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 2, 2020 at 9:01 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I think this significantly reduces the network bandwidth for empty
> transactions. I have briefly reviewed the patch and it looks good to
> me.
One thing that is not clear to me is how we will advance restart_lsn
if we don't send any empty xacts in a system where there are many such
xacts. IIRC, restart_lsn is advanced based on the confirmed_flush LSN
sent by the subscriber. After this change, the subscriber won't be able
to send confirmed_flush, and for a long time we won't be able to
advance restart_lsn. Is that correct? If so, why do we think that is
acceptable? One might argue that restart_lsn will be advanced as soon
as we send the first non-empty xact, but I am not sure that is good
enough. What do you think?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> One thing that is not clear to me is how we will advance restart_lsn
> if we don't send any empty xacts in a system where there are many such
> xacts. IIRC, restart_lsn is advanced based on the confirmed_flush LSN
> sent by the subscriber. After this change, the subscriber won't be able
> to send confirmed_flush, and for a long time we won't be able to
> advance restart_lsn. Is that correct? If so, why do we think that is
> acceptable? One might argue that restart_lsn will be advanced as soon
> as we send the first non-empty xact, but I am not sure that is good
> enough. What do you think?
It seems like a valid point. One idea could be to track the
last commit LSN that we streamed, and if the confirmed flush location
is already greater than that, then even if we skip sending the
commit message we can advance the confirmed flush location locally.
Logically, this should not cause any problem, because we have already got
the confirmation for whatever we have streamed so far. So for the other
commits (which we are skipping), we can advance it locally, because
we are sure that there is no streamed commit that is not yet
confirmed by the subscriber. This is just my thought, but from
the code and design perspective it might complicate
things and sound hackish.
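The local-advance idea can be sketched as a small in-memory state machine. This assumes hypothetical names (`SkipTracker`, `tracker_on_*`) and a local `uint64_t` stand-in for PostgreSQL's 64-bit `XLogRecPtr`. It uses `>=` where the text says "greater than", and in a real patch the advanced position would presumably be fed to something like `LogicalConfirmReceivedLocation()` rather than kept in a struct field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* local stand-in: a 64-bit WAL position */

/* Hypothetical in-memory (not persisted) tracking state. */
typedef struct
{
    XLogRecPtr last_sent_commit_lsn; /* commit LSN of last streamed xact */
    XLogRecPtr confirmed_flush;      /* position we consider confirmed */
} SkipTracker;

/* Subscriber reported a flush position. */
void tracker_on_feedback(SkipTracker *t, XLogRecPtr flush)
{
    if (flush > t->confirmed_flush)
        t->confirmed_flush = flush;
}

/* A non-empty transaction was streamed. */
void tracker_on_sent_commit(SkipTracker *t, XLogRecPtr commit_lsn)
{
    assert(commit_lsn >= t->last_sent_commit_lsn);
    t->last_sent_commit_lsn = commit_lsn;
}

/*
 * An empty transaction's COMMIT was skipped. If everything we streamed
 * is already confirmed, nothing is pending on the subscriber, so the
 * confirmed position can be advanced locally to this commit.
 * Returns true if it advanced.
 */
bool tracker_on_skipped_commit(SkipTracker *t, XLogRecPtr commit_lsn)
{
    if (t->confirmed_flush >= t->last_sent_commit_lsn &&
        commit_lsn > t->confirmed_flush)
    {
        t->confirmed_flush = commit_lsn;
        return true;
    }
    return false;
}
```

While a streamed commit is still unconfirmed, skipped commits do not move the position; once the subscriber confirms it, each skipped commit advances it locally. Nothing here is persisted.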
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> It seems like a valid point. One idea could be to track the
> last commit LSN that we streamed, and if the confirmed flush location
> is already greater than that, then even if we skip sending the
> commit message we can advance the confirmed flush location locally.
> Logically, this should not cause any problem, because we have already got
> the confirmation for whatever we have streamed so far. So for the other
> commits (which we are skipping), we can advance it locally, because
> we are sure that there is no streamed commit that is not yet
> confirmed by the subscriber.
Will this work after restart? Do you want to persist the information
of the last streamed commit LSN?
> This is just my thought, but from
> the code and design perspective it might complicate
> things and sound hackish.
Another idea could be that we stream the transaction after some
threshold number (say 100 or anything we think is reasonable) of empty
xacts. This will reduce the traffic without tinkering with the core
design too much.
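A count-based threshold could look roughly like this. A minimal sketch, assuming a hypothetical `count_policy_should_send` consulted at COMMIT time and the illustrative value of 100 from the text. "Sending" an empty transaction here means emitting its BEGIN/COMMIT pair so that the subscriber's feedback can move confirmed_flush and restart_lsn forward.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold; the text suggests "say 100". */
#define EMPTY_XACT_SEND_THRESHOLD 100

typedef struct
{
    int skipped_empty_xacts;  /* empty xacts skipped since the last send */
} CountPolicy;

/* Decide, at COMMIT time, whether this transaction is sent. */
bool count_policy_should_send(CountPolicy *p, bool xact_wrote_changes)
{
    assert(p->skipped_empty_xacts < EMPTY_XACT_SEND_THRESHOLD);

    if (xact_wrote_changes)
    {
        p->skipped_empty_xacts = 0;   /* a real commit resets the count */
        return true;
    }
    if (++p->skipped_empty_xacts >= EMPTY_XACT_SEND_THRESHOLD)
    {
        p->skipped_empty_xacts = 0;   /* send this empty xact anyway */
        return true;
    }
    return false;
}
```

With 250 consecutive empty transactions, only the 100th and the 200th are sent.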
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 3, 2020 at 1:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Will this work after restart? Do you want to persist the information
> of the last streamed commit LSN?
We will not persist the last streamed commit LSN. This variable is kept in
memory just to track whether we have got confirmation up to that
location or not. Once we have confirmation up to that location, and if
we are not streaming any transaction (because they are empty
transactions), then we can just advance the confirmed flush location,
and based on that we can update the restart point as well; those
will be persisted. Basically, the "last streamed commit LSN" is just a
marker that there is still something pending confirmation from the
subscriber, so until then we cannot simply advance the confirmed flush
location or restart point based on the empty transactions. But if
there is nothing pending confirmation, we can advance. So if we are
streaming, we will get confirmation from the subscriber; otherwise we
can advance it locally. Either way, the confirmed flush
location and restart point will keep moving.
> Another idea could be that we stream the transaction after some
> threshold number (say 100 or anything we think is reasonable) of empty
> xacts. This will reduce the traffic without tinkering with the core
> design too much.
Yeah, this could also be an option.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 3, 2020 at 2:17 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> We will not persist the last streamed commit LSN. This variable is kept in
> memory just to track whether we have got confirmation up to that
> location or not. Once we have confirmation up to that location, and if
> we are not streaming any transaction (because they are empty
> transactions), then we can just advance the confirmed flush location,
> and based on that we can update the restart point as well; those
> will be persisted. Basically, the "last streamed commit LSN" is just a
> marker that there is still something pending confirmation from the
> subscriber, so until then we cannot simply advance the confirmed flush
> location or restart point based on the empty transactions. But if
> there is nothing pending confirmation, we can advance. So if we are
> streaming, we will get confirmation from the subscriber; otherwise we
> can advance it locally. Either way, the confirmed flush
> location and restart point will keep moving.
Okay, so this might work out, but it might look a bit ad-hoc.
> Yeah, this could also be an option.
Okay.
Peter E, Petr J, others, do you have any opinion on what is the best
way forward for this thread? I think it would be really good if we
can reduce the network traffic due to these empty transactions.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Another idea could be that we stream the transaction after some
> threshold number (say 100 or anything we think is reasonable) of empty
> xacts. This will reduce the traffic without tinkering with the core
> design too much.
Amit, I suggest an interval to control this setting. Time is something we
have control over; transactions aren't (they depend on the workload).
pg_stat_replication is usually not queried at millisecond intervals; however,
you can execute thousands of transactions in a second. If we agree on that
idea, I can add it to the patch.
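The interval-based alternative can be sketched the same way. This assumes a hypothetical `IntervalPolicy` with a configurable interval and microsecond timestamps (the same unit as PostgreSQL's `TimestampTz`); the caller supplies `now` so the decision logic stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampUs;   /* microseconds, like TimestampTz */

typedef struct
{
    TimestampUs last_send;     /* when we last sent any transaction */
    int64_t     interval_us;   /* hypothetical setting */
} IntervalPolicy;

/* Send every non-empty xact; send an empty one only if the interval
 * has elapsed since the last transaction that was sent. */
bool interval_policy_should_send(IntervalPolicy *p, bool xact_wrote_changes,
                                 TimestampUs now)
{
    assert(now >= p->last_send);

    if (xact_wrote_changes || now - p->last_send >= p->interval_us)
    {
        p->last_send = now;
        return true;
    }
    return false;
}
```

With a one-second interval, an empty transaction is sent only if at least a second has passed since the last transaction that was sent. If empty transactions arrive less often than the interval, every one of them is still sent.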
Regards,
--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:
> Amit, I suggest an interval to control this setting. Time is something we have control over; transactions aren't (they depend on the workload). pg_stat_replication is usually not queried at millisecond intervals; however, you can execute thousands of transactions in a second. If we agree on that idea, I can add it to the patch.
Do you mean that if we didn't stream any transaction for some threshold
interval, then we can send the next empty transaction to
the subscriber? If so, isn't it possible that the empty xacts
happen irregularly, after the specified interval, and then we still end
up sending them all? I might be missing something here, so can you
please explain your idea in detail? Basically, how will it work and
how will it solve the problem?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Do you mean that if we didn't stream any transaction for some threshold
> interval, then we can send the next empty transaction to
> the subscriber? If so, isn't it possible that the empty xacts
> happen irregularly, after the specified interval, and then we still end
> up sending them all? I might be missing something here, so can you
> please explain your idea in detail? Basically, how will it work and
> how will it solve the problem?
IMHO, the threshold should be based on the commit LSN. The main
reason we want to send empty transactions after a certain
number of transactions or duration is that we want restart_lsn to keep moving
forward, so that if we need to restart the replication slot we don't
need to process a lot of extra WAL. If we set the threshold
based on transaction count, there is still a possibility that we
process a few very big transactions and then have to process
them again after the restart. OTOH, if we set it based on an interval,
then even if there is not much work going on, we still end up sending
the empty transaction, as Amit pointed out.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> IMHO, the threshold should be based on the commit LSN. The main
> reason we want to send empty transactions after a certain
> number of transactions or duration is that we want restart_lsn to keep moving
> forward, so that if we need to restart the replication slot we don't
> need to process a lot of extra WAL. If we set the threshold
> based on transaction count, there is still a possibility that we
> process a few very big transactions and then have to process
> them again after the restart.
Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Won't the subscriber eventually send the flush location for the large
> transactions which will move the restart_lsn?
I meant large empty transactions (i.e., ones for which we cannot send anything
to the subscriber). My point was: if there are only large
transactions in the system that we cannot stream because their
tables are not published, then a threshold based on transaction
count will not help much, because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking the transaction count, can we track the LSN? If the LSN
difference since we last streamed some change crosses the threshold, then we
will stream the next empty transaction.
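An LSN-distance threshold, as suggested here, can be sketched as follows. This is a minimal sketch under assumed names (`LsnPolicy`, `lsn_policy_should_send`), with a local `uint64_t` stand-in for `XLogRecPtr` and a hypothetical threshold measured in bytes of WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* local stand-in: a 64-bit WAL position */

typedef struct
{
    XLogRecPtr last_sent_lsn;  /* commit LSN of the last transaction sent */
    uint64_t   lsn_threshold;  /* hypothetical setting, bytes of WAL */
} LsnPolicy;

/* Send every non-empty xact; send an empty one only once the WAL
 * distance since the last sent commit reaches the threshold. */
bool lsn_policy_should_send(LsnPolicy *p, bool xact_wrote_changes,
                            XLogRecPtr commit_lsn)
{
    assert(commit_lsn >= p->last_sent_lsn);

    if (xact_wrote_changes ||
        commit_lsn - p->last_sent_lsn >= p->lsn_threshold)
    {
        p->last_sent_lsn = commit_lsn;
        return true;
    }
    return false;
}
```

For example, with a 16 MB threshold (an arbitrary illustrative value), a single large empty transaction whose commit lands more than 16 MB past the last sent commit triggers a send, no matter how few transactions were skipped before it.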
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I meant large empty transactions (i.e., ones for which we cannot send anything
> to the subscriber). My point was: if there are only large
> transactions in the system that we cannot stream because their
> tables are not published, then a threshold based on transaction
> count will not help much, because even if we don't reach the
> transaction count threshold, we still might need to process a lot of
> data if we don't stream the commit for the empty transactions. So
> instead of tracking the transaction count, can we track the LSN? If the LSN
> difference since we last streamed some change crosses the threshold, then we
> will stream the next empty transaction.
You have a point, and it may be better to base the threshold on LSN
if we want to keep any threshold, but basing it on transaction count
seems a bit more straightforward. Let us see if anyone else has an
opinion on this matter.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
You have a point and it may be better to keep the threshold based on LSN
if we want to keep any threshold, but keeping it on transaction count
seems a bit more straightforward. Let us see if anyone else has any
opinion on this matter?
Ok, that makes sense.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Mar 4, 2020 at 4:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
You have a point and it may be better to keep the threshold based on LSN
if we want to keep any threshold, but keeping it on transaction count
seems a bit more straightforward. Let us see if anyone else has any
opinion on this matter?
Ok, that makes sense.
Euler, can we try to update the patch based on the number of
transactions threshold and see how it works?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, 5 Mar 2020 at 05:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
Euler, can we try to update the patch based on the number of
transactions threshold and see how it works?
I will do.
--
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 2 Mar 2020 at 19:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts?
Same way we already do it for writes that are not replicated over
logical replication, like vacuum work etc. The upstream sends feedback
with reply-requested. The downstream replies. The upstream advances
confirmed_flush_lsn, and that lazily updates restart_lsn.
The bigger issue here is that if you don't send empty txns on logical
replication you don't get an eager, timely response from the
replica(s), which delays synchronous replication. You need to send
empty txns when synchronous replication is enabled, or instead poke
the walsender to force immediate feedback with reply requested.
--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
Hi,
On 2020-03-06 13:53:02 +0800, Craig Ringer wrote:
On Mon, 2 Mar 2020 at 19:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts?
Same way we already do it for writes that are not replicated over
logical replication, like vacuum work etc. The upstream sends feedback
with reply-requested. The downstream replies. The upstream advances
confirmed_flush_lsn, and that lazily updates restart_lsn.
It'll still delay it a bit.
The bigger issue here is that if you don't send empty txns on logical
replication you don't get an eager, timely response from the
replica(s), which delays synchronous replication. You need to send
empty txns when synchronous replication is enabled, or instead poke
the walsender to force immediate feedback with reply requested.
Somewhat independent from the issue at hand: It'd be really good if we
could evolve the syncrep framework to support per-database waiting... It
shouldn't be that hard, and the current situation sucks quite a bit (and
yes, I'm to blame).
I'm not quite sure what you mean by "poke the walsender"? Kinda sounds
like sending a signal, but decoding happens inside the walsender,
so there's no need for that. Do you just mean somehow requesting that
walsender sends a feedback message?
To address the volume we could:
1a) Introduce a pgoutput message type to indicate that the LSN has
advanced, without needing separate BEGIN/COMMIT. Right now BEGIN is
21 bytes, COMMIT is 26. But we really don't need that much here. A
single message should do the trick.
1b) Add a LogicalOutputPluginWriterUpdateProgress parameter (and
possibly rename) that indicates that we are intentionally "ignoring"
WAL. For walsender that callback then could check if it could just
forward the position of the client (if it was entirely caught up
before), or if it should send a feedback request (if syncrep is
enabled, or distance is big).
2) Reduce the rate of 'empty transaction'/feedback request messages. If
we know that we're not going to be blocked waiting for more WAL, or
blocked sending messages out to the network, we don't immediately need
to send out the messages. Instead we could continue decoding until
there's actual data, or until we're going to get blocked.
We could e.g. have a new LogicalDecodingContext callback that is
called whenever WalSndWaitForWal() would wait. That'd check if there's
a pending "need" to send out an 'empty transaction'/feedback request
message. The "need" flag would get cleared whenever we send out data
bearing an LSN for other reasons.
Greetings,
Andres Freund
On Tue, 10 Mar 2020 at 02:30, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-03-06 13:53:02 +0800, Craig Ringer wrote:
On Mon, 2 Mar 2020 at 19:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
One thing that is not clear to me is how will we advance restart_lsn
if we don't send any empty xact in a system where there are many such
xacts?
Same way we already do it for writes that are not replicated over
logical replication, like vacuum work etc. The upstream sends feedback
with reply-requested. The downstream replies. The upstream advances
confirmed_flush_lsn, and that lazily updates restart_lsn.
It'll still delay it a bit.
Right, but we don't generally care because there's no sync rep txn waiting
for confirmation. If we lose progress due to a crash it doesn't matter. It
does delay removal of old WAL a little, but it hardly matters.
Somewhat independent from the issue at hand: It'd be really good if we
could evolve the syncrep framework to support per-database waiting... It
shouldn't be that hard, and the current situation sucks quite a bit (and
yes, I'm to blame).
Hardly, you just didn't get the chance to fix that on top of the umpteen
other things you had to change to make all the logical stuff work. You
didn't break it, just didn't implement every single possible enhancement
all at once. Shocking, I tell you.
I'm not quite sure what you mean by "poke the walsender"? Kinda sounds
like sending a signal, but decoding happens inside the walsender,
so there's no need for that. Do you just mean somehow requesting that
walsender sends a feedback message?
Right. I had in mind something like sending a ProcSignal via our funky
multiplexed signal mechanism to ask the walsender to immediately generate a
keepalive message with a reply-requested flag, then set the walsender's
latch so we wake it promptly.
To address the volume we could:
1a) Introduce a pgoutput message type to indicate that the LSN has
advanced, without needing separate BEGIN/COMMIT. Right now BEGIN is
21 bytes, COMMIT is 26. But we really don't need that much here. A
single message should do the trick.
It would. Is it worth caring though? Especially since it seems rather
unlikely that the actual network data volume of begin/commit msgs will be
much of a concern. It's not like we're PITRing logical streams, and if we
did, we could just filter out empty commits on the receiver side.
That message pretty much already exists in the form of a walsender
keepalive anyway so we might as well re-use that and not upset the protocol.
1b) Add a LogicalOutputPluginWriterUpdateProgress parameter (and
possibly rename) that indicates that we are intentionally "ignoring"
WAL. For walsender that callback then could check if it could just
forward the position of the client (if it was entirely caught up
before), or if it should send a feedback request (if syncrep is
enabled, or distance is big).
I can see something like that being very useful, because at present only
the output plugin knows if a txn is "empty" as far as that particular slot
and output plugin is concerned. The reorder buffering mechanism cannot do
relation-level filtering before it sends the changes to the output plugin
during ReorderBufferCommit, since it only knows about relfilenodes not
relation oids. And the output plugin might be doing finer grained filtering
using row-filter expressions or who knows what else.
But as described above that will only help for txns done in DBs other than
the one the logical slot is for or txns known to have an empty
ReorderBuffer when the commit is seen.
If there's a txn in the slot's db with a non-empty reorderbuffer, the
output plugin won't know if the txn is empty or not until it finishes
processing all callbacks and sees the commit for the txn. So it will
generally have emitted the Begin message on the wire by the time it knows
it has nothing useful to say. And Pg won't know that this txn is empty as
far as this output plugin with this particular slot, set of output plugin
params, and current user-catalog state is concerned, so it won't have any
way to call the output plugin's "update progress" callback instead of the
usual begin/change/commit callbacks.
But I think we can already skip empty txns unless sync-rep is enabled with
no core changes, and send empty txns as walsender keepalives instead, by
altering only output plugins, like this:
* Stash BEGIN data in plugin's LogicalDecodingContext.output_plugin_private
when plugin's begin callback called, don't write anything to the outstream
* Write out BEGIN message lazily when any other callback generates a
message that does need to be written out
* If no BEGIN written by the time COMMIT callback called, discard the
COMMIT too. Check if sync rep is enabled. If it is,
call LogicalDecodingContext.update_progress from within the output plugin
commit handler, otherwise just ignore the commit totally. Probably by
calling OutputPluginUpdateProgress().
We could e.g. have a new LogicalDecodingContext callback that is
called whenever WalSndWaitForWal() would wait. That'd check if there's
a pending "need" to send out an 'empty transaction'/feedback request
message. The "need" flag would get cleared whenever we send out data
bearing an LSN for other reasons.
I can see that being handy, yes. But it won't necessarily help with the
sync rep issue, since other sync rep txns may continue to generate WAL
while others wait for commit-confirmations that won't come from the logical
replica.
While we're speaking of adding output plugin hooks, I keep on trying to
think of a sensible way to do a plugin-defined reply handler, so the
downstream end can send COPY BOTH messages of some new msgkind back to the
walsender, which will pass them to the output plugin if it implements the
appropriate handle_reply_message (or whatever) callback. That much is
trivial to implement, where I keep getting a bit stuck is with whether
there's a sensible snapshot that can be set to call the output plugin reply
handler with. We wouldn't want to switch to a current non-historic snapshot
because of all the cache flushes that'd cause, but there isn't necessarily
a valid and safe historic snapshot to set when we're not within
ReorderBufferCommit, is there?
I'd love to get rid of the need to "connect back" to a provider over plain
libpq connections to communicate with it. The ability to run SQL on the
walsender conn helps. But really, so much more would be possible if we
could just have the downstream end *reply* on the same connection using
COPY BOTH, much like it sends replay progress updates right now. It'd let
us manage relation/attribute/type metadata caches better for example.
Thoughts?
--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
The patch no longer applies, because of additions in the test source. Otherwise, I have tested the patch and confirmed that updates and deletes on tables with deferred primary keys work with logical replication.
The new status of this patch is: Waiting on Author
Sorry, I replied in the wrong thread. Please ignore above mail.
Hi,
Please see below review of the
0001-Skip-empty-transactions-for-logical-replication.patch
The make check passes.
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes)
+ pgoutput_begin(ctx, txn);
+
+ data->xact_wrote_changes = true;
+
IMO, the xact_wrote_changes flag is better set inside the if condition, as it
does not need to be set repeatedly in subsequent calls to the same function.
* Stash BEGIN data in plugin's LogicalDecodingContext.output_plugin_private
when plugin's begin callback called, don't write anything to the outstream
* Write out BEGIN message lazily when any other callback generates a
message that does need to be written out
* If no BEGIN written by the time COMMIT callback called, discard the
COMMIT too. Check if sync rep is enabled. If it is,
call LogicalDecodingContext.update_progress from within the output plugin
commit handler, otherwise just ignore the commit totally. Probably by
calling OutputPluginUpdateProgress().
I think the code in the patch is similar to what Craig describes in the
above snippet, except that instead of stashing the BEGIN message and
sending it lazily, it simply maintains a flag in
LogicalDecodingContext.output_plugin_private which defers calling the
output plugin's begin callback until another callback actually
generates a remote write.
Also, the patch does not contain the last part, where he describes
calling OutputPluginUpdateProgress() for transactions with synchronous
replication enabled.
However, some basic testing suggests that the patch does not have any
notable adverse effect on either the replication lag or the sync_rep
performance.
I performed tests by setting up publisher and subscriber on the same
machine with synchronous_commit = on and ran pgbench -c 12 -j 6 -T 300
on unpublished pgbench tables.
I see that confirmed_flush_lsn is catching up just fine, without any
notable delay compared to the test results without the patch.
Also, the TPS for synchronous replication of empty txns with and
without the patch remains similar.
Having said that, these are initial findings, and I understand that
better performance tests are required to measure the reduction in
network bandwidth consumption and the impact on synchronous
replication and replication lag.
Thank you,
Rahila Syed
On Wed, Jul 29, 2020 at 08:08:06PM +0530, Rahila Syed wrote:
The make check passes.
Since then, the patch is failing to apply, waiting on author and the
thread has died 6 weeks or so ago, so I am marking it as RwF in the
CF.
--
Michael
On Thu, Sep 17, 2020 at 3:29 PM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 29, 2020 at 08:08:06PM +0530, Rahila Syed wrote:
The make check passes.
Since then, the patch is failing to apply, waiting on author and the
thread has died 6 weeks or so ago, so I am marking it as RwF in the
CF.
I've rebased the patch and made changes so that the patch supports
"streaming in-progress transactions" and handling of logical decoding
messages (transactional and non-transactional).
I see that this patch not only makes sure that empty transactions are not
sent but also calls OutputPluginUpdateProgress when an empty
transaction is not sent; as a result, confirmed_flush_lsn keeps
moving. I also see no hangs when synchronous_standby is configured.
Do let me know your thoughts on this patch.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v2-0001-Skip-empty-transactions-for-logical-replication.patch
From 3763a3b454f319f561c8c8bac4eedd81488d8160 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 14 Apr 2021 22:54:52 -0400
Subject: [PATCH v2] Skip empty transactions for logical replication.
The current logical replication behavior is to send every transaction to
subscriber even though the transaction is empty (because it does not
contain changes from the selected publications). It is a waste of CPU
cycles and network bandwidth to build/transmit these empty transactions.
Postpone the BEGIN message until the first change. While processing a
COMMIT message, if there is no other change for that
transaction, do not send COMMIT message. It means that pgoutput will
skip BEGIN / COMMIT messages for transactions that are empty.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
src/backend/replication/pgoutput/pgoutput.c | 45 +++++++++++++++++++++++++++++
src/include/replication/pgoutput.h | 3 ++
src/test/subscription/t/020_messages.pl | 5 ++--
3 files changed, 50 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f68348d..64c76d1 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -345,10 +345,28 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, common scenarios is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were to
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->xact_wrote_changes = false;
+ elog(LOG,"Holding of begin");
+}
+
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin(ctx->out, txn);
+ elog(LOG,"Sending begin");
if (send_replication_origin)
{
@@ -384,8 +402,14 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (!data->xact_wrote_changes)
+ return;
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
@@ -551,6 +575,13 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -693,6 +724,13 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -725,6 +763,13 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */
+ if (!data->xact_wrote_changes && !in_streaming && transactional)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 51e7c03..e820790 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -20,6 +20,9 @@ typedef struct PGOutputData
MemoryContext context; /* private memory context for transient
* allocations */
+ /* control wether messages can already be sent */
+ bool xact_wrote_changes;
+
/* client-supplied info: */
uint32 protocol_version;
List *publication_names;
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index c8be26b..2ea790f 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -78,9 +78,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is($result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot');
$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE");
--
1.8.3.1
On Thu, Apr 15, 2021 at 1:29 PM Ajin Cherian <itsajin@gmail.com> wrote:
I've rebased the patch and made changes so that the patch supports
"streaming in-progress transactions" and handling of logical decoding
messages (transactional and non-transactional).
I see that this patch not only makes sure that empty transactions are not
sent but also does call OutputPluginUpdateProgress when an empty
transaction is not sent, as a result the confirmed_flush_lsn is kept
moving. I also see no hangs when synchronous_standby is configured.
Do let me know your thoughts on this patch.
Removed some debug logs and typos.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v3-0001-Skip-empty-transactions-for-logical-replication.patch
From 07f17491ca2263d152c1651a9da93adbada0aeaf Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 14 Apr 2021 22:54:52 -0400
Subject: [PATCH v3] Skip empty transactions for logical replication.
The current logical replication behavior is to send every transaction to
subscriber even though the transaction is empty (because it does not
contain changes from the selected publications). It is a waste of CPU
cycles and network bandwidth to build/transmit these empty transactions.
Postpone the BEGIN message until the first change. While processing a
COMMIT message, if there is no other change for that
transaction, do not send COMMIT message. It means that pgoutput will
skip BEGIN / COMMIT messages for transactions that are empty.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
src/backend/replication/pgoutput/pgoutput.c | 44 +++++++++++++++++++++++++++++
src/include/replication/pgoutput.h | 3 ++
src/test/subscription/t/020_messages.pl | 5 ++--
3 files changed, 49 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f68348d..0aa5729 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->xact_wrote_changes = false;
+ elog(LOG,"Holding of begin");
+}
+
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
OutputPluginPrepareWrite(ctx, !send_replication_origin);
@@ -384,8 +401,14 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (!data->xact_wrote_changes)
+ return;
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
@@ -551,6 +574,13 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -693,6 +723,13 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -725,6 +762,13 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */
+ if (!data->xact_wrote_changes && !in_streaming && transactional)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 51e7c03..acd43b3 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -20,6 +20,9 @@ typedef struct PGOutputData
MemoryContext context; /* private memory context for transient
* allocations */
+ /* flag indicating whether messages have previously been sent */
+ bool xact_wrote_changes;
+
/* client-supplied info: */
uint32 protocol_version;
List *publication_names;
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index c8be26b..2ea790f 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -78,9 +78,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is($result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot');
$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE");
--
1.8.3.1
On Thu, Apr 15, 2021 at 4:39 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Thu, Apr 15, 2021 at 1:29 PM Ajin Cherian <itsajin@gmail.com> wrote:
I've rebased the patch and made changes so that the patch supports "streaming in-progress transactions" and handling of logical decoding
messages (transactional and non-transactional).
I see that this patch not only makes sure that empty transactions are not sent but also does call OutputPluginUpdateProgress when an empty
transaction is not sent, as a result the confirmed_flush_lsn is kept moving. I also see no hangs when synchronous_standby is configured.
Do let me know your thoughts on this patch.
REVIEW COMMENTS
I applied this patch to today's HEAD and successfully ran "make check"
and also the subscription TAP tests.
Here are some review comments:
------
1. The patch v3 applied OK but with whitespace warnings
[postgres@CentOS7-x64 oss_postgres_2PC]$ git apply
../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch
../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:98:
indent with spaces.
/* output BEGIN if we haven't yet, avoid for streaming and
non-transactional messages */
../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:99:
indent with spaces.
if (!data->xact_wrote_changes && !in_streaming && transactional)
warning: 2 lines add whitespace errors.
------
2. Please create a CF entry in [1] for this patch.
[1] https://commitfest.postgresql.org/33/
------
3. Patch comment
The comment describes the problem and then suddenly just says
"Postpone the BEGIN message until the first change."
I suggest changing it to say more like... "(blank line) This patch
addresses the above problem by postponing the BEGIN message until the
first change."
------
4. pgoutput.h
Maybe for consistency with the context member, the comment for the new
member should be to the right instead of above it?
@@ -20,6 +20,9 @@ typedef struct PGOutputData
MemoryContext context; /* private memory context for transient
* allocations */
+ /* flag indicating whether messages have previously been sent */
+ bool xact_wrote_changes;
+
------
5. pgoutput.h
+ /* flag indicating whether messages have previously been sent */
"previously been sent" --> "already been sent" ??
------
6. pgoutput.h - misleading member name
Actually, now that I have read all the rest of the code and how this
member is used, I feel that this name is very misleading. E.g. in the
"streaming" case you are still writing changes but are not setting
this member at all, so it does not always mean what it
says.
I feel a better name for this would be something like
"sent_begin_txn". Then if you have sent BEGIN it is true. If you
haven't sent BEGIN it is false. It eliminates all ambiguity naming it
this way instead.
(This makes my feedback #5 redundant because the comment will be a bit
different if you do this).
------
7. pgoutput.c - function pgoutput_begin_txn
@@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
I guess that you still needed to pass the txn because that is how the
API is documented, right?
But I am wondering if you ought to flag it as unused so you won't get
some BF machine giving warnings about it.
e.g. Syntax like this?
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN * txn) {
(void)txn;
...
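For illustration, here is a minimal sketch of the idiom being suggested. It uses hypothetical stand-in types (SketchCtx, SketchTxn, SketchBeginCB) rather than the real LogicalDecodingContext/ReorderBufferTXN. The callback typedef fixes the signature, so the implementation must accept txn even when it ignores it. The (void) cast marks the parameter as intentionally unused for compilers that warn about unused parameters (e.g. GCC/Clang with -Wunused-parameter).

```c
#include <stdbool.h>
#include <assert.h>

/* hypothetical stand-ins for LogicalDecodingContext and ReorderBufferTXN */
typedef struct SketchCtx
{
    bool        sent_begin_txn;
} SketchCtx;

typedef struct SketchTxn
{
    unsigned    xid;
} SketchTxn;

/* every begin callback must match this signature, whether txn is used or not */
typedef void (*SketchBeginCB) (SketchCtx *ctx, SketchTxn *txn);

static void
sketch_begin_txn(SketchCtx *ctx, SketchTxn *txn)
{
    (void) txn;                 /* intentionally unused; silences -Wunused-parameter */

    ctx->sent_begin_txn = false;    /* defer BEGIN until the first change */
}
```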
------
8. pgoutput.c - function pgoutput_begin_txn
@@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->xact_wrote_changes = false;
+ elog(LOG,"Holding of begin");
+}
Why is this loglevel LOG? Looks like leftover debugging.
------
9. pgoutput.c - function pgoutput_commit_txn
@@ -384,8 +401,14 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (!data->xact_wrote_changes)
+ return;
+
In the case where you decided to do nothing does it make sense that
you still called the function OutputPluginUpdateProgress(ctx); ?
I thought perhaps that your new check should come first so this call
would never happen.
------
10. pgoutput.c - variable declarations without casts
+ PGOutputData *data = ctx->output_plugin_private;
I noticed the new stack variables you declare have no casts.
This differs from the existing code which always looks like:
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
There are a couple of examples of this so please search new code to
find them all.
------
11. pgoutput.c - function pgoutput_change
@@ -551,6 +574,13 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
If the variable is renamed as previously suggested then the assignment
data->sent_BEGIN_txn = true; can be assigned in just 1 common place
INSIDE the pgoutput_begin function.
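A minimal sketch of the flow being proposed follows. It assumes a simplified stand-in for PGOutputData; SketchOutput and the sketch_* functions are hypothetical, and streaming, replication origins and progress updates are left out. The begin callback only resets the flag. The single helper that writes BEGIN is the one place that sets it. The commit callback skips COMMIT when BEGIN was never written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* hypothetical stand-in for PGOutputData: the flag plus a record of sent message types */
typedef struct SketchOutput
{
    bool        sent_begin_txn; /* has BEGIN already been sent for this txn? */
    char        sent[64];       /* message types "sent", e.g. "BIC" */
    size_t      nsent;
} SketchOutput;

/* record one protocol message type as written to the client */
static void
emit(SketchOutput *out, char type)
{
    assert(out->nsent + 1 < sizeof(out->sent));
    out->sent[out->nsent++] = type;
    out->sent[out->nsent] = '\0';
}

/* begin callback: only reset the flag, nothing is sent yet */
static void
sketch_begin_txn(SketchOutput *out)
{
    out->sent_begin_txn = false;
}

/* the one place BEGIN is written, so the flag is set only here */
static void
sketch_begin(SketchOutput *out)
{
    emit(out, 'B');
    out->sent_begin_txn = true;
}

/* change callback for a published table: send the deferred BEGIN first */
static void
sketch_change(SketchOutput *out, char type)
{
    if (!out->sent_begin_txn)
        sketch_begin(out);
    emit(out, type);
}

/* commit callback: skip COMMIT if BEGIN was never sent */
static void
sketch_commit_txn(SketchOutput *out)
{
    if (!out->sent_begin_txn)
        return;
    emit(out, 'C');
}
```

With this shape, an empty transaction produces no messages at all. A transaction with two inserts produces B, I, I, C.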
------
12. pgoutput.c - pgoutput_truncate function
@@ -693,6 +723,13 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
(same comment as above)
If the variable is renamed as previously suggested then the assignment
data->sent_BEGIN_txn = true; can be assigned in just 1 common place
INSIDE the pgoutput_begin function.
13. pgoutput.c - pgoutput_message
@@ -725,6 +762,13 @@ pgoutput_message(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /* output BEGIN if we haven't yet, avoid for streaming and
non-transactional messages */
+ if (!data->xact_wrote_changes && !in_streaming && transactional)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
(same comment as above)
If the variable is renamed as previously suggested then the assignment
data->sent_BEGIN_txn = true; can be assigned in just 1 common place
INSIDE the pgoutput_begin function.
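The condition used at the three call sites can be isolated as a predicate. need_deferred_begin is a hypothetical name, not one from the patch, and this is only a sketch of the reasoning. It assumes the streaming protocol frames streamed changes with Stream Start/Stream Stop messages rather than BEGIN. Non-transactional messages are sent outside any transaction. Neither case should therefore open one. For pgoutput_change and pgoutput_truncate, transactional is effectively always true.

```c
#include <stdbool.h>
#include <assert.h>

/*
 * Hypothetical helper mirroring the call-site condition:
 *  - BEGIN already sent: nothing to do;
 *  - streaming: changes are framed by stream start/stop instead;
 *  - non-transactional message: must not open a transaction.
 */
static bool
need_deferred_begin(bool sent_begin_txn, bool in_streaming, bool transactional)
{
    return !sent_begin_txn && !in_streaming && transactional;
}
```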
------
14. Test Code.
I noticed that there is no test code specifically for seeing if empty
transactions get sent or not. Is it possible to write such a test or
is this traffic improvement only observable using the debugger?
------
[1]: https://commitfest.postgresql.org/33/
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Apr 19, 2021 at 6:22 PM Peter Smith <smithpb2250@gmail.com> wrote:
Here are some review comments:
------
1. The patch v3 applied OK but with whitespace warnings
[postgres@CentOS7-x64 oss_postgres_2PC]$ git apply
../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch
../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:98:
indent with spaces.
/* output BEGIN if we haven't yet, avoid for streaming and
non-transactional messages */
../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:99:
indent with spaces.
if (!data->xact_wrote_changes && !in_streaming && transactional)
warning: 2 lines add whitespace errors.
------
Fixed.
2. Please create a CF entry in [1] for this patch.
------
3. Patch comment
The comment describes the problem and then suddenly just says
"Postpone the BEGIN message until the first change."
I suggest changing it to say more like... "(blank line) This patch
addresses the above problem by postponing the BEGIN message until the
first change."
------
Updated.
4. pgoutput.h
Maybe for consistency with the context member, the comment for the new
member should be to the right instead of above it?
@@ -20,6 +20,9 @@ typedef struct PGOutputData
MemoryContext context; /* private memory context for transient
* allocations */
+ /* flag indicating whether messages have previously been sent */
+ bool xact_wrote_changes;
+
------
5. pgoutput.h
+ /* flag indicating whether messages have previously been sent */
"previously been sent" --> "already been sent" ??
------
6. pgoutput.h - misleading member name
Actually, now that I have read all the rest of the code and how this
member is used I feel that this name is very misleading. e.g. In the
"streaming" case you are still writing changes but are not
setting this member at all - therefore it does not always mean what it
says.
I feel a better name for this would be something like
"sent_begin_txn". Then if you have sent BEGIN it is true. If you
haven't sent BEGIN it is false. It eliminates all ambiguity naming it
this way instead.
(This makes my feedback #5 redundant because the comment will be a bit
different if you do this).
------
Fixed above comments.
7. pgoutput.c - function pgoutput_begin_txn
@@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
I guess that you still needed to pass the txn because that is how the
API is documented, right?
But I am wondering if you ought to flag it as unused so you won't get
some BF machine giving warnings about it.
e.g. Syntax like this?
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN * txn) {
(void)txn;
...
Updated.
------
8. pgoutput.c - function pgoutput_begin_txn
@@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->xact_wrote_changes = false;
+ elog(LOG,"Holding of begin");
+}
Why is this loglevel LOG? Looks like leftover debugging.
Removed.
------
9. pgoutput.c - function pgoutput_commit_txn
@@ -384,8 +401,14 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputData *data = ctx->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (!data->xact_wrote_changes)
+ return;
+
In the case where you decided to do nothing does it make sense that
you still called the function OutputPluginUpdateProgress(ctx); ?
I thought perhaps that your new check should come first so this call
would never happen.
Even though the empty transaction is not sent, the LSN is tracked as
decoded, hence the progress needs to be updated.
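Here is a simplified model of why the progress update comes before the early return. The types and functions are hypothetical, and the real OutputPluginUpdateProgress() takes no LSN argument. The sketch only shows one point: an empty transaction has still been decoded, so its commit position is still reported even though no COMMIT message is written.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

typedef uint64_t SketchLSN;     /* hypothetical stand-in for XLogRecPtr */

typedef struct SketchSender
{
    bool        sent_begin_txn; /* was BEGIN sent for the current txn? */
    SketchLSN   progress_lsn;   /* last position reported as processed */
    int         commits_sent;   /* COMMIT messages actually written */
} SketchSender;

/* hypothetical stand-in for OutputPluginUpdateProgress() */
static void
sketch_update_progress(SketchSender *s, SketchLSN lsn)
{
    if (lsn > s->progress_lsn)
        s->progress_lsn = lsn;
}

static void
sketch_commit_with_progress(SketchSender *s, SketchLSN commit_lsn)
{
    /* report progress first: the transaction was decoded either way */
    sketch_update_progress(s, commit_lsn);

    if (!s->sent_begin_txn)
        return;                 /* empty transaction: skip COMMIT */

    s->commits_sent++;
}
```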
------
10. pgoutput.c - variable declarations without casts
+ PGOutputData *data = ctx->output_plugin_private;
I noticed the new stack variables you declare have no casts.
This differs from the existing code which always looks like:
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
There are a couple of examples of this so please search new code to
find them all.
------
Fixed.
11. pgoutput.c - function pgoutput_change
@@ -551,6 +574,13 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
If the variable is renamed as previously suggested then the assignment
data->sent_BEGIN_txn = true; can be assigned in just 1 common place
INSIDE the pgoutput_begin function.
------
Updated.
12. pgoutput.c - pgoutput_truncate function
@@ -693,6 +723,13 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->xact_wrote_changes && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
(same comment as above)
If the variable is renamed as previously suggested then the assignment
data->sent_BEGIN_txn = true; can be assigned in just 1 common place
INSIDE the pgoutput_begin function.
13. pgoutput.c - pgoutput_message
@@ -725,6 +762,13 @@ pgoutput_message(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */
+ if (!data->xact_wrote_changes && !in_streaming && transactional)
+ {
+ pgoutput_begin(ctx, txn);
+ data->xact_wrote_changes = true;
+ }
(same comment as above)
If the variable is renamed as previously suggested then the assignment
data->sent_BEGIN_txn = true; can be assigned in just 1 common place
INSIDE the pgoutput_begin function.
------
Fixed.
14. Test Code.
I noticed that there is no test code specifically for seeing if empty
transactions get sent or not. Is it possible to write such a test or
is this traffic improvement only observable using the debugger?
The 020_messages.pl test actually has a case that covers empty transactions
(BEGIN and COMMIT with nothing in between), even though it is part of the messages test.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v4-0001-Skip-empty-transactions-for-logical-replication.patch (application/octet-stream)
From e2ebbc83c09c11b2751e2dd3b03b57e7bb8aeae0 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 23 Apr 2021 00:39:07 -0400
Subject: [PATCH v4] Skip empty transactions for logical replication.
The current logical replication behavior is to send every transaction to
the subscriber even though the transaction is empty (because it does not
contain changes from the selected publications). It is a waste of CPU
cycles and network bandwidth to build/transmit these empty transactions.
This patch addresses the above problem by postponing the BEGIN message
until the first change. While processing a COMMIT, if no change was sent
for that transaction, the COMMIT message is not sent either. As a result,
pgoutput skips the BEGIN / COMMIT messages for transactions that are empty.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
src/backend/replication/pgoutput/pgoutput.c | 43 +++++++++++++++++++++++++++++
src/include/replication/pgoutput.h | 3 ++
src/test/subscription/t/020_messages.pl | 5 ++--
3 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f68348d..f4a3576 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -345,10 +345,29 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+
+ (void)txn; /* keep compiler quiet */
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->sent_begin_txn = false;
+}
+
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin(ctx->out, txn);
+ data->sent_begin_txn = true;
if (send_replication_origin)
{
@@ -384,8 +403,14 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (!data->sent_begin_txn)
+ return;
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
@@ -551,6 +576,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ }
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -693,6 +724,12 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -725,6 +762,12 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */
+ if (!data->sent_begin_txn && !in_streaming && transactional)
+ {
+ pgoutput_begin(ctx, txn);
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 51e7c03..abd92bd 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -20,6 +20,9 @@ typedef struct PGOutputData
MemoryContext context; /* private memory context for transient
* allocations */
+ bool sent_begin_txn; /* flag indicating whether begin
+ * has already been sent */
+
/* client-supplied info: */
uint32 protocol_version;
List *publication_names;
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index c8be26b..2ea790f 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -78,9 +78,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is($result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot');
$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE");
--
1.8.3.1
An earlier comment from Andres:
We could e.g. have a new LogicalDecodingContext callback that is
called whenever WalSndWaitForWal() would wait. That'd check if there's
a pending "need" to send out an 'empty transaction'/feedback request
message. The "need" flag would get cleared whenever we send out data
bearing an LSN for other reasons.
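Here is one possible shape for the proposal quoted above, as a hypothetical sketch; none of these names exist in the walsender. Skipping an empty transaction sets a "need" flag. Any later data bearing an LSN clears it. A callback run where WalSndWaitForWal() would wait sends a standalone LSN-bearing message only if the flag is still set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

typedef uint64_t SketchLSN;     /* hypothetical stand-in for XLogRecPtr */

typedef struct SketchWalSnd
{
    bool        need_lsn_update;    /* an empty txn was skipped since last send */
    SketchLSN   pending_lsn;        /* end LSN of the most recently skipped txn */
    SketchLSN   last_sent_lsn;      /* last LSN the client has been told about */
    int         feedback_msgs;      /* standalone LSN-bearing messages sent */
} SketchWalSnd;

/* an empty transaction was skipped: remember that the client is behind */
static void
sketch_skip_empty_txn(SketchWalSnd *w, SketchLSN end_lsn)
{
    w->need_lsn_update = true;
    w->pending_lsn = end_lsn;
}

/* data bearing an LSN went out for other reasons: the need is satisfied */
static void
sketch_send_data(SketchWalSnd *w, SketchLSN lsn)
{
    w->last_sent_lsn = lsn;
    w->need_lsn_update = false;
}

/* hypothetical callback invoked where WalSndWaitForWal() would wait */
static void
sketch_before_wait(SketchWalSnd *w)
{
    if (!w->need_lsn_update)
        return;
    w->feedback_msgs++;
    sketch_send_data(w, w->pending_lsn);
}
```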
I think the current keepalive messages already achieve this, since each
keepalive carries the last sent WAL LSN (sentPtr):
/* construct the message... */
resetStringInfo(&output_message);
pq_sendbyte(&output_message, 'k');                      /* keepalive message type */
pq_sendint64(&output_message, sentPtr);                 /* last sent WAL LSN */
pq_sendint64(&output_message, GetCurrentTimestamp());   /* send time */
pq_sendbyte(&output_message, requestReply ? 1 : 0);     /* reply requested? */
I'm not sure whether anything more is required to keep synchronous
replicas updated when empty transactions are skipped. If my understanding
is not correct, let me know.
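To make the keepalive layout from the excerpt above concrete, here is a hypothetical encoder/decoder for the same 18-byte payload. It consists of a 'k' type byte, the 64-bit WAL position, the 64-bit send timestamp, and a one-byte reply-requested flag. The integers are in network (big-endian) byte order, as pq_sendint64() writes them. The helper names are assumptions, and the CopyData wrapping the walsender adds around the payload is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

#define KEEPALIVE_LEN (1 + 8 + 8 + 1)   /* 'k' + LSN + timestamp + flag */

typedef struct KeepaliveMsg
{
    uint64_t    wal_end;        /* sentPtr: last WAL position sent */
    int64_t     send_time;      /* microseconds since 2000-01-01 */
    bool        reply_requested;
} KeepaliveMsg;

static void
put_be64(unsigned char *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--)
    {
        p[i] = (unsigned char) (v & 0xFF);
        v >>= 8;
    }
}

static uint64_t
get_be64(const unsigned char *p)
{
    uint64_t    v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* encode into buf (at least KEEPALIVE_LEN bytes); returns bytes written */
static size_t
encode_keepalive(const KeepaliveMsg *m, unsigned char *buf)
{
    buf[0] = 'k';
    put_be64(buf + 1, m->wal_end);
    put_be64(buf + 9, (uint64_t) m->send_time);
    buf[17] = m->reply_requested ? 1 : 0;
    return KEEPALIVE_LEN;
}

/* decode; returns false if buf is not a keepalive payload */
static bool
decode_keepalive(const unsigned char *buf, size_t len, KeepaliveMsg *m)
{
    if (len != KEEPALIVE_LEN || buf[0] != 'k')
        return false;
    m->wal_end = get_be64(buf + 1);
    m->send_time = (int64_t) get_be64(buf + 9);
    m->reply_requested = buf[17] != 0;
    return true;
}
```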
regards,
Ajin Cherian
Fujitsu Australia
On Fri, Apr 23, 2021 at 3:46 PM Ajin Cherian <itsajin@gmail.com> wrote:
Thanks for addressing my v3 review comments above.
I tested the latest v4.
The v4 patch applied cleanly.
make check-world completed successfully.
So patch v4 LGTM, apart from the following 2 nitpick comments:
======
1. Suggest adding a blank line after the (void)txn; ?
@@ -345,10 +345,29 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+
+ (void)txn; /* keep compiler quiet */
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
======
2. Unnecessary statement blocks?
AFAIK those { } are not the usual PG code-style when there is only one
statement, so suggest removing them.
Applies to 3 places:
@@ -551,6 +576,12 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ }
@@ -693,6 +724,12 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ }
@@ -725,6 +762,12 @@ pgoutput_message(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /* output BEGIN if we haven't yet, avoid for streaming and
non-transactional messages */
+ if (!data->sent_begin_txn && !in_streaming && transactional)
+ {
+ pgoutput_begin(ctx, txn);
+ }
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Apr 26, 2021 at 4:29 PM Peter Smith <smithpb2250@gmail.com> wrote:
The v4 patch applied cleanly.
make check-world completed successfully.
So patch v4 LGTM, apart from the following 2 nitpick comments:
======
1. Suggest adding a blank line after the (void)txn; ?
@@ -345,10 +345,29 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+
+ (void)txn; /* keep compiler quiet */
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
Fixed.
======
2. Unnecessary statement blocks?
AFAIK those { } are not the usual PG code-style when there is only one
statement, so suggest removing them.
Applies to 3 places:
@@ -551,6 +576,12 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ }
@@ -693,6 +724,12 @@ pgoutput_truncate(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ {
+ pgoutput_begin(ctx, txn);
+ }
@@ -725,6 +762,12 @@ pgoutput_message(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */
+ if (!data->sent_begin_txn && !in_streaming && transactional)
+ {
+ pgoutput_begin(ctx, txn);
+ }
Fixed.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v5-0001-Skip-empty-transactions-for-logical-replication.patch (application/octet-stream)
From 11bc909ec45dac329c963ad722271788afbf331f Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Mon, 26 Apr 2021 23:39:38 -0400
Subject: [PATCH v5] Skip empty transactions for logical replication.
The current logical replication behavior is to send every transaction to
the subscriber even though the transaction is empty (because it does not
contain changes from the selected publications). It is a waste of CPU
cycles and network bandwidth to build/transmit these empty transactions.
This patch addresses the above problem by postponing the BEGIN message
until the first change. While processing a COMMIT, if no change was sent
for that transaction, the COMMIT message is not sent either. As a result,
pgoutput skips the BEGIN / COMMIT messages for transactions that are empty.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
src/backend/replication/pgoutput/pgoutput.c | 38 +++++++++++++++++++++++++++++
src/include/replication/pgoutput.h | 3 +++
src/test/subscription/t/020_messages.pl | 5 ++--
3 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f68348d..666bd7f 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -345,10 +345,30 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+
+ (void)txn; /* keep compiler quiet */
+
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->sent_begin_txn = false;
+}
+
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin(ctx->out, txn);
+ data->sent_begin_txn = true;
if (send_replication_origin)
{
@@ -384,8 +404,14 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (!data->sent_begin_txn)
+ return;
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
@@ -551,6 +577,10 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ pgoutput_begin(ctx, txn);
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -693,6 +723,10 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ pgoutput_begin(ctx, txn);
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -725,6 +759,10 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */
+ if (!data->sent_begin_txn && !in_streaming && transactional)
+ pgoutput_begin(ctx, txn);
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 51e7c03..abd92bd 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -20,6 +20,9 @@ typedef struct PGOutputData
MemoryContext context; /* private memory context for transient
* allocations */
+ bool sent_begin_txn; /* flag indicating whether begin
+ * has already been sent */
+
/* client-supplied info: */
uint32 protocol_version;
List *publication_names;
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index c8be26b..2ea790f 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -78,9 +78,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is($result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot');
$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE");
--
1.8.3.1
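The diff above replaces the eager BEGIN with a flag, `sent_begin_txn`, which is set only when BEGIN is actually written. `pgoutput_change`, `pgoutput_truncate` and `pgoutput_message` (for transactional messages) write BEGIN on demand, and `pgoutput_commit_txn` returns early while the flag is still false. The following is a minimal, self-contained sketch of that pattern, not pgoutput code: `OutputState`, `on_begin`, `on_change`, `on_commit` and the single-byte 'I' change message are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the output plugin's state (not pgoutput's struct). */
typedef struct OutputState
{
    bool   sent_begin_txn;  /* has 'B' been written for the current transaction? */
    char   buf[64];         /* message-type bytes "sent" to the subscriber */
    size_t len;
} OutputState;

static void
output_state_init(OutputState *st)
{
    memset(st, 0, sizeof(*st));
}

/* Append one message-type byte, keeping buf NUL-terminated. */
static void
emit(OutputState *st, char msgtype)
{
    assert(st->len < sizeof(st->buf) - 1);
    st->buf[st->len++] = msgtype;
    st->buf[st->len] = '\0';
}

/* BEGIN callback: only reset the flag; do not write 'B' yet. */
static void
on_begin(OutputState *st)
{
    st->sent_begin_txn = false;
}

/* Change callback; 'published' says whether the table is in the publication. */
static void
on_change(OutputState *st, bool published)
{
    if (!published)
        return;                 /* filtered out: nothing to send */
    if (!st->sent_begin_txn)
    {
        emit(st, 'B');          /* lazily send BEGIN before the first real change */
        st->sent_begin_txn = true;
    }
    emit(st, 'I');              /* e.g. an INSERT message */
}

/* COMMIT callback: if 'B' was never sent, the transaction was empty; send nothing. */
static void
on_commit(OutputState *st)
{
    if (!st->sent_begin_txn)
        return;
    emit(st, 'C');
}
```

A transaction that touches only non-published tables produces no bytes at all, while one with published changes produces 'B', its changes, then 'C'. Because the decision is made in the change path, nothing has to be buffered and later discarded.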
On Tue, Apr 27, 2021 at 1:49 PM Ajin Cherian <itsajin@gmail.com> wrote:
Rebased the patch as it was no longer applying.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v6-0001-Skip-empty-transactions-for-logical-replication.patch
From bc4d6e0d6566051a87c5fb194609bf6ccfabd9df Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 25 May 2021 08:57:44 -0400
Subject: [PATCH v6] Skip empty transactions for logical replication.
The current logical replication behavior is to send every transaction to
the subscriber, even when the transaction is empty (because it does not
contain changes from the selected publications). Building and transmitting
these empty transactions wastes CPU cycles and network bandwidth.
This patch addresses the problem by postponing the BEGIN message
until the first change. When processing COMMIT, if no change was sent
for that transaction, the COMMIT message is not sent either. As a result,
pgoutput skips the BEGIN / COMMIT messages for empty transactions.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
src/backend/replication/pgoutput/pgoutput.c | 38 +++++++++++++++++++++++++++++
src/include/replication/pgoutput.h | 3 +++
src/test/subscription/t/020_messages.pl | 5 ++--
3 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index f68348d..666bd7f 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -345,10 +345,30 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+
+ (void)txn; /* keep compiler quiet */
+
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->sent_begin_txn = false;
+}
+
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin(ctx->out, txn);
+ data->sent_begin_txn = true;
if (send_replication_origin)
{
@@ -384,8 +404,14 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (!data->sent_begin_txn)
+ return;
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
@@ -551,6 +577,10 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ pgoutput_begin(ctx, txn);
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -693,6 +723,10 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!data->sent_begin_txn && !in_streaming)
+ pgoutput_begin(ctx, txn);
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -725,6 +759,10 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */
+ if (!data->sent_begin_txn && !in_streaming && transactional)
+ pgoutput_begin(ctx, txn);
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 51e7c03..abd92bd 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -20,6 +20,9 @@ typedef struct PGOutputData
MemoryContext context; /* private memory context for transient
* allocations */
+ bool sent_begin_txn; /* flag indicating whether begin
+ * has already been sent */
+
/* client-supplied info: */
uint32 protocol_version;
List *publication_names;
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index 52bd92d..2b43ae0 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -86,9 +86,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is( $result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot'
);
--
1.8.3.1
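The 020_messages.pl change above is easier to read knowing that 66 and 67 are the ASCII codes of 'B' and 'C': the old expected output was exactly one empty BEGIN/COMMIT pair, and with the optimization the slot returns nothing. For comparison, Jeff's original suggestion (hold 'B' until the next message arrives, and drop both if that message is 'C') can be sketched as a filter over message-type bytes. This is a hypothetical illustration: `drop_empty_txns` is not a PostgreSQL function, and it assumes each transaction is a flat 'B' ... 'C' sequence, ignoring streamed transactions and non-transactional messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy message-type bytes from 'in' to 'out', withholding each 'B' until the
 * next byte is known.  If that byte is 'C', the pair describes an empty
 * transaction and both are dropped.  'out' must hold strlen(in) + 1 bytes.
 * Returns the number of bytes written (excluding the terminating NUL).
 */
static size_t
drop_empty_txns(const char *in, char *out)
{
    size_t n = 0;
    bool   pending_begin = false;   /* a 'B' is being held back */

    assert(in != NULL && out != NULL);

    for (const char *p = in; *p != '\0'; p++)
    {
        if (pending_begin)
        {
            pending_begin = false;
            if (*p == 'C')
                continue;           /* 'B' immediately followed by 'C': drop both */
            out[n++] = 'B';         /* something real follows: release the 'B' */
        }
        if (*p == 'B')
            pending_begin = true;   /* hold it until the next message is seen */
        else
            out[n++] = *p;
    }
    if (pending_begin)
        out[n++] = 'B';             /* input ended mid-transaction: keep the 'B' */
    out[n] = '\0';
    return n;
}
```

The patch avoids this buffering altogether by deciding inside pgoutput whether to write BEGIN at all, using the `sent_begin_txn` flag.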
On Tue, May 25, 2021 at 6:36 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Tue, Apr 27, 2021 at 1:49 PM Ajin Cherian <itsajin@gmail.com> wrote:
Rebased the patch as it was no longer applying.
Thanks for the updated patch. A few comments:
1) I'm not sure whether we could add tests for skipping empty
transactions; if possible, add a few.
2) We could add some debug level log messages for the transaction that
will be skipped.
3) You could keep this variable below the other bool variables in the structure:
+ bool sent_begin_txn; /* flag indicating whether begin
+ * has already been sent */
+
4) You can split the comment across multiple lines as it exceeds 80 chars:
+ /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */
+ if (!data->sent_begin_txn && !in_streaming && transactional)
+ pgoutput_begin(ctx, txn);
Regards,
Vignesh
On Thu, May 27, 2021 at 8:58 PM vignesh C <vignesh21@gmail.com> wrote:
Thanks for the updated patch. A few comments:
1) I'm not sure whether we could add tests for skipping empty
transactions; if possible, add a few.
Added a few tests for prepared transactions; the existing
test in 020_messages.pl already covers regular transactions.
2) We could add some debug level log messages for the transaction that
will be skipped.
Added.
3) You could keep this variable below the other bool variables in the structure:
+ bool sent_begin_txn; /* flag indicating whether begin
+ * has already been sent */
+
I've moved this variable into a new per-transaction struct,
PGOutputTxnData, so this comment no longer applies.
4) You can split the comment across multiple lines as it exceeds 80 chars:
+ /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */
+ if (!data->sent_begin_txn && !in_streaming && transactional)
+ pgoutput_begin(ctx, txn);
Done.
I had to rebase the patch after a recent commit by Amit Kapila that
added support for two-phase commits in pub-sub [1].
I've also modified the patch to skip replicating empty prepared
transactions. Do let me know if you have any comments.
regards,
Ajin Cherian
Fujitsu Australia
[1]: /messages/by-id/CAHut+PueG6u3vwG8DU=JhJiWa2TwmZ=bDqPchZkBky7ykzA7MA@mail.gmail.com
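Skipping empty prepared transactions has a consequence on the subscriber, visible in the v7 worker.c change: the publisher cannot always tell that it skipped the prepare (v7 only suppresses COMMIT PREPARED while the per-transaction data is still present), so the apply worker may receive COMMIT PREPARED for a GID it never prepared. It therefore looks the transaction up by GID together with the new `prepare_end_lsn` and `prepare_time` fields, silently skips the commit when nothing matches, and still advances its flush position. A minimal sketch of that logic follows; `PreparedXact`, `lookup_gxact` and `apply_commit_prepared` are hypothetical stand-ins for `LookupGXact`, `FinishPreparedTransaction` and `store_flush_position`, and the `XLogRecPtr`/`TimestampTz` typedefs only mimic PostgreSQL's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;     /* stand-in for PostgreSQL's LSN type */
typedef int64_t  TimestampTz;    /* stand-in: microseconds since 2000-01-01 */

/* Hypothetical record of a transaction prepared on the subscriber. */
typedef struct PreparedXact
{
    char        gid[32];
    XLogRecPtr  prepare_end_lsn;
    TimestampTz prepare_time;
    bool        in_use;
} PreparedXact;

/*
 * Match on the GID *and* on the prepare's end LSN and timestamp: the GID
 * alone may collide with an unrelated prepared transaction on this node.
 */
static PreparedXact *
lookup_gxact(PreparedXact *xacts, size_t n, const char *gid,
             XLogRecPtr prepare_end_lsn, TimestampTz prepare_time)
{
    for (size_t i = 0; i < n; i++)
    {
        if (xacts[i].in_use &&
            strcmp(xacts[i].gid, gid) == 0 &&
            xacts[i].prepare_end_lsn == prepare_end_lsn &&
            xacts[i].prepare_time == prepare_time)
            return &xacts[i];
    }
    return NULL;
}

/*
 * Apply COMMIT PREPARED.  If the prepare was never received (an empty
 * prepared transaction skipped by the publisher), commit nothing, but still
 * advance the flush position to commit_end_lsn.
 * Returns true if a prepared transaction was committed.
 */
static bool
apply_commit_prepared(PreparedXact *xacts, size_t n, const char *gid,
                      XLogRecPtr prepare_end_lsn, TimestampTz prepare_time,
                      XLogRecPtr commit_end_lsn, XLogRecPtr *flush_lsn)
{
    PreparedXact *x;

    assert(flush_lsn != NULL);
    x = lookup_gxact(xacts, n, gid, prepare_end_lsn, prepare_time);

    if (x != NULL)
        x->in_use = false;      /* stands in for FinishPreparedTransaction() */

    *flush_lsn = commit_end_lsn; /* like store_flush_position(): done either way */
    return x != NULL;
}
```

Advancing the flush position even when nothing is committed matters: otherwise the publisher would keep the slot's confirmed position behind a transaction the subscriber will never see.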
Attachments:
v7-0001-Skip-empty-transactions-for-logical-replication.patch
From be6e8c62c7484656e7824fd3bd19b9552e023c19 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 14 Jul 2021 08:19:07 -0400
Subject: [PATCH v7] Skip empty transactions for logical replication.
The current logical replication behaviour is to send every transaction to
the subscriber, even when the transaction is empty (because it does not
contain changes from the selected publications). Building and transmitting
these empty transactions wastes CPU cycles and network bandwidth.
This patch addresses the problem by postponing the BEGIN / BEGIN PREPARE
message until the first change. When processing COMMIT or PREPARE, if no
change was sent for that transaction, the COMMIT or PREPARE message is not
sent either. As a result, pgoutput skips the BEGIN / COMMIT or
BEGIN PREPARE / PREPARE messages for empty transactions.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
contrib/test_decoding/test_decoding.c | 7 +-
doc/src/sgml/logicaldecoding.sgml | 12 +-
doc/src/sgml/protocol.sgml | 15 +++
src/backend/replication/logical/logical.c | 9 +-
src/backend/replication/logical/proto.c | 16 ++-
src/backend/replication/logical/reorderbuffer.c | 2 +-
src/backend/replication/logical/worker.c | 38 ++++--
src/backend/replication/pgoutput/pgoutput.c | 161 +++++++++++++++++++++++-
src/include/replication/logicalproto.h | 8 +-
src/include/replication/output_plugin.h | 4 +-
src/include/replication/reorderbuffer.h | 4 +-
src/test/subscription/t/020_messages.pl | 5 +-
src/test/subscription/t/021_twophase.pl | 46 ++++++-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 289 insertions(+), 39 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e5cd84e..408dbfc 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -86,7 +86,9 @@ static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_lsn);
static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -390,7 +392,8 @@ pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
/* COMMIT PREPARED callback */
static void
pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
TestDecodingData *data = ctx->output_plugin_private;
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 002efc8..123d2f1 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -884,11 +884,19 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
The required <function>commit_prepared_cb</function> callback is called
whenever a transaction <command>COMMIT PREPARED</command> has been decoded.
The <parameter>gid</parameter> field, which is part of the
- <parameter>txn</parameter> parameter, can be used in this callback.
+ <parameter>txn</parameter> parameter, can be used in this callback. The
+ parameters <parameter>prepare_end_lsn</parameter> and
+ <parameter>prepare_time</parameter> can be used to check whether the plugin
+ has received this <command>PREPARE TRANSACTION</command>, in which case
+ it can commit the transaction; otherwise, it can skip the commit. The
+ <parameter>gid</parameter> alone is not sufficient because the downstream
+ node can have a prepared transaction with the same identifier.
<programlisting>
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
</programlisting>
</para>
</sect3>
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index e8cb78f..5e68dfb 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -7550,6 +7550,13 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ The end LSN of the prepare.
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
The LSN of the commit prepared.
</para></listitem>
</varlistentry>
@@ -7564,6 +7571,14 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ Prepare timestamp of the transaction. The value is in number
+ of microseconds since PostgreSQL epoch (2000-01-01).
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
Commit timestamp of the transaction. The value is in number
of microseconds since PostgreSQL epoch (2000-01-01).
</para></listitem>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index d61ef4c..67c762a 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -63,7 +63,8 @@ static void begin_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn
static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn);
static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn, TimestampTz prepare_time);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -936,7 +937,8 @@ prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
static void
commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
LogicalDecodingContext *ctx = cache->private_data;
LogicalErrorCallbackState state;
@@ -972,7 +974,8 @@ commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
"commit_prepared_cb")));
/* do the actual work: call callback */
- ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 13c8c3b..8f17007 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -206,7 +206,9 @@ logicalrep_read_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data)
*/
void
logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
uint8 flags = 0;
@@ -222,8 +224,10 @@ logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, flags);
/* send fields */
+ pq_sendint64(out, prepare_end_lsn);
pq_sendint64(out, commit_lsn);
pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, prepare_time);
pq_sendint64(out, txn->xact_time.commit_time);
pq_sendint32(out, txn->xid);
@@ -244,12 +248,16 @@ logicalrep_read_commit_prepared(StringInfo in, LogicalRepCommitPreparedTxnData *
elog(ERROR, "unrecognized flags %u in commit prepared message", flags);
/* read fields */
+ prepare_data->prepare_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->prepare_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR,"prepare_end_lsn is not set in commit prepared message");
prepare_data->commit_lsn = pq_getmsgint64(in);
if (prepare_data->commit_lsn == InvalidXLogRecPtr)
elog(ERROR, "commit_lsn is not set in commit prepared message");
- prepare_data->end_lsn = pq_getmsgint64(in);
- if (prepare_data->end_lsn == InvalidXLogRecPtr)
- elog(ERROR, "end_lsn is not set in commit prepared message");
+ prepare_data->commit_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->commit_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR, "commit_end_lsn is not set in commit prepared message");
+ prepare_data->prepare_time = pq_getmsgint64(in);
prepare_data->commit_time = pq_getmsgint64(in);
prepare_data->xid = pq_getmsgint(in, 4);
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7378beb..5a707e2 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2794,7 +2794,7 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
txn->origin_lsn = origin_lsn;
if (is_commit)
- rb->commit_prepared(rb, txn, commit_lsn);
+ rb->commit_prepared(rb, txn, commit_lsn, prepare_end_lsn, prepare_time);
else
rb->rollback_prepared(rb, txn, prepare_end_lsn, prepare_time);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b9a7a7f..069dc31 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -966,27 +966,39 @@ apply_handle_commit_prepared(StringInfo s)
/* Compute GID for two_phase transactions. */
TwoPhaseTransactionGid(MySubscription->oid, prepare_data.xid,
gid, sizeof(gid));
-
- /* There is no transaction when COMMIT PREPARED is called */
- begin_replication_step();
-
/*
- * Update origin state so we can restart streaming from correct position
- * in case of crash.
+ * It is possible that we haven't received the prepare because
+ * the transaction did not have any changes relevant to this
+ * subscription and was essentially an empty prepare. In that case,
+ * the walsender is optimized to drop the empty transaction and the
+ * accompanying prepare. Silently ignore it if we don't find the prepared
+ * transaction.
*/
- replorigin_session_origin_lsn = prepare_data.end_lsn;
- replorigin_session_origin_timestamp = prepare_data.commit_time;
+ if (LookupGXact(gid, prepare_data.prepare_end_lsn,
+ prepare_data.prepare_time))
+ {
- FinishPreparedTransaction(gid, true);
- end_replication_step();
- CommitTransactionCommand();
+ /* There is no transaction when COMMIT PREPARED is called */
+ begin_replication_step();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data.commit_end_lsn;
+ replorigin_session_origin_timestamp = prepare_data.commit_time;
+
+ FinishPreparedTransaction(gid, true);
+ end_replication_step();
+ CommitTransactionCommand();
+ }
pgstat_report_stat(false);
- store_flush_position(prepare_data.end_lsn);
+ store_flush_position(prepare_data.commit_end_lsn);
in_remote_transaction = false;
/* Process any tables that are being synchronized in parallel. */
- process_syncing_tables(prepare_data.end_lsn);
+ process_syncing_tables(prepare_data.commit_end_lsn);
pgstat_report_activity(STATE_IDLE, NULL);
}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index e4314af..f7d808f 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -56,7 +56,9 @@ static void pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx,
static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
- ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -130,6 +132,11 @@ typedef struct RelationSyncEntry
TupleConversionMap *map;
} RelationSyncEntry;
+typedef struct PGOutputTxnData
+{
+ bool sent_begin_txn; /* flag indicating whether begin has been sent */
+} PGOutputTxnData;
+
/* Map used to remember which relation schemas we sent. */
static HTAB *RelationSyncCache = NULL;
@@ -410,10 +417,32 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputTxnData *data = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));
+
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->sent_begin_txn = false;
+ txn->output_plugin_private = data;
+}
+
+
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(data);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin(ctx->out, txn);
+ data->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -428,8 +457,22 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+ bool skip;
+
+ Assert(data);
+ skip = !data->sent_begin_txn;
+ pfree(data);
+ txn->output_plugin_private = NULL;
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (skip)
+ {
+ elog(DEBUG1, "Skipping replication of an empty transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
@@ -441,10 +484,28 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
static void
pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ /*
+ * Don't send BEGIN PREPARE message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN PREPARE and COMMIT PREPARED messages
+ * to subscribers, using bandwidth on something with little/no use
+ * for logical replication.
+ */
+ pgoutput_begin_txn(ctx, txn);
+}
+
+static void
+pgoutput_begin_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(data);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin_prepare(ctx->out, txn);
+ data->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -459,8 +520,18 @@ static void
pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(data);
OutputPluginUpdateProgress(ctx);
+ /* skip PREPARE message if nothing was sent */
+ if (!data->sent_begin_txn)
+ {
+ elog(DEBUG1, "Skipping replication of an empty prepared transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
@@ -471,12 +542,33 @@ pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
*/
static void
pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * skip sending COMMIT PREPARED message if prepared transaction
+ * has not been sent.
+ */
+ if (data)
+ {
+ bool skip = !data->sent_begin_txn;
+ pfree(data);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "Skipping replication of COMMIT PREPARED of an empty transaction");
+ return;
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
OutputPluginWrite(ctx, true);
}
@@ -489,8 +581,26 @@ pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_end_lsn,
TimestampTz prepare_time)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * skip sending ROLLBACK PREPARED message if prepared transaction
+ * has not been sent.
+ */
+ if (data)
+ {
+ bool skip = !data->sent_begin_txn;
+ pfree(data);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "Skipping replication of ROLLBACK of an empty transaction");
+ return;
+ }
+ }
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_rollback_prepared(ctx->out, txn, prepare_end_lsn,
prepare_time);
@@ -639,11 +749,16 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
TransactionId xid = InvalidTransactionId;
Relation ancestor = NULL;
+ /* If not streaming, should have set up txndata as part of BEGIN/BEGIN PREPARE */
+ if (!in_streaming)
+ Assert(txndata);
+
if (!is_publishable_relation(relation))
return;
@@ -677,6 +792,15 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /* output BEGIN if we haven't yet */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -779,6 +903,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
int nrelations, Relation relations[], ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
int i;
@@ -786,6 +911,10 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Oid *relids;
TransactionId xid = InvalidTransactionId;
+ /* If not streaming, should have set up txndata as part of BEGIN/BEGIN PREPARE */
+ if (!in_streaming)
+ Assert(txndata);
+
/* Remember the xid for the change in streaming mode. See pgoutput_change. */
if (in_streaming)
xid = change->txn->xid;
@@ -822,6 +951,15 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /* output BEGIN if we haven't yet */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -842,6 +980,7 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
const char *message)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata;
TransactionId xid = InvalidTransactionId;
if (!data->messages)
@@ -854,6 +993,22 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /*
+ * Output BEGIN if we haven't yet.
+ * Avoid for streaming and non-transactional messages
+ */
+ if (!in_streaming && transactional)
+ {
+ txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ if (!txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 63de90d..0be0a07 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -148,8 +148,10 @@ typedef struct LogicalRepPreparedTxnData
*/
typedef struct LogicalRepCommitPreparedTxnData
{
+ XLogRecPtr prepare_end_lsn;
XLogRecPtr commit_lsn;
- XLogRecPtr end_lsn;
+ XLogRecPtr commit_end_lsn;
+ TimestampTz prepare_time;
TimestampTz commit_time;
TransactionId xid;
char gid[GIDSIZE];
@@ -188,7 +190,9 @@ extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
extern void logicalrep_read_prepare(StringInfo in,
LogicalRepPreparedTxnData *prepare_data);
extern void logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
extern void logicalrep_read_commit_prepared(StringInfo in,
LogicalRepCommitPreparedTxnData *prepare_data);
extern void logicalrep_write_rollback_prepared(StringInfo out, ReorderBufferTXN *txn,
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 810495e..0d28306 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -128,7 +128,9 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
*/
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/*
* Called for ROLLBACK PREPARED.
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff7..11e2e1e 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -442,7 +442,9 @@ typedef void (*ReorderBufferPrepareCB) (ReorderBuffer *rb,
/* commit prepared callback signature */
typedef void (*ReorderBufferCommitPreparedCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/* rollback prepared callback signature */
typedef void (*ReorderBufferRollbackPreparedCB) (ReorderBuffer *rb,
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index 0e218e0..3d246be 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -87,9 +87,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is( $result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot'
);
diff --git a/src/test/subscription/t/021_twophase.pl b/src/test/subscription/t/021_twophase.pl
index c6ada92..677ca50 100644
--- a/src/test/subscription/t/021_twophase.pl
+++ b/src/test/subscription/t/021_twophase.pl
@@ -6,7 +6,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 24;
+use Test::More tests => 25;
###############################
# Setup
@@ -318,10 +318,9 @@ $node_publisher->safe_psql('postgres', "
$node_publisher->wait_for_catchup($appname_copy);
-# Check that the transaction has been prepared on the subscriber, there will be 2
-# prepared transactions for the 2 subscriptions.
+# Check that the transaction has been prepared on the subscriber
$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts;");
-is($result, qq(2), 'transaction is prepared on subscriber');
+is($result, qq(1), 'transaction is prepared on subscriber');
# Now commit the insert and verify that it IS replicated
$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'mygid';");
@@ -337,6 +336,45 @@ is($result, qq(2), 'replicated data in subscriber table');
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_copy;");
$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_copy;");
+##############################
+# Test empty prepares
+##############################
+
+# create a table that is not part of the publication
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab_nopub (a int PRIMARY KEY)");
+
+# disable the subscription so that we can peek at the slot
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE");
+
+# wait for the replication slot to become inactive in the publisher
+$node_publisher->poll_query_until('postgres',
+ "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND active='f'", 1);
+
+# create a transaction with no changes relevant to the slot
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_nopub SELECT generate_series(1,10);
+ PREPARE TRANSACTION 'empty_transaction';
+ COMMIT PREPARED 'empty_transaction';");
+
+# peek at the contents of the slot
+$result = $node_publisher->safe_psql(
+ 'postgres', qq(
+ SELECT get_byte(data, 0)
+ FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL,
+ 'proto_version', '1',
+ 'publication_names', 'tap_pub')
+));
+
+# the empty transaction should be skipped
+is($result, qq(),
+ 'empty transaction dropped on slot'
+);
+
+# enable the subscription to test cleanup
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE");
+
###############################
# check all the cleanup
###############################
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37cf4b2..75639ab 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1606,6 +1606,7 @@ PGMessageField
PGModuleMagicFunction
PGNoticeHooks
PGOutputData
+PGOutputTxnData
PGPROC
PGP_CFB
PGP_Context
--
1.8.3.1
On Wednesday, July 14, 2021 9:30 PM Ajin Cherian <itsajin@gmail.com> wrote:
I've had to rebase the patch after a recent commit by Amit Kapila that added support for
two-phase commits in pub-sub [1].
I've also modified the patch to skip replicating empty prepared
transactions. Do let me know if you have any comments.
Hi
I started testing this patch; here is some quick, minor feedback.
(1) pg_logical_slot_get_binary_changes() params.
Technically, it looks better to use proto_version 3 and the two_phase option
with this function to test empty prepares, since I think proto_version 1
doesn't support 2PC. The protocol documentation [1] says: "...
are available since protocol version 3." So, if the test is meant to check that
empty *prepares* are skipped, I suggest updating the proto_version and setting
two_phase to 'on'.
+##############################
+# Test empty prepares
+##############################
...
+# peek at the contents of the slot
+$result = $node_publisher->safe_psql(
+ 'postgres', qq(
+ SELECT get_byte(data, 0)
+ FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL,
+ 'proto_version', '1',
+ 'publication_names', 'tap_pub')
+));
(2) The following log message should probably start with a lowercase letter,
as other similar messages in the code do.
+ elog(DEBUG1, "Skipping replication of an empty transaction");
[1]: https://www.postgresql.org/docs/devel/protocol-logicalrep-message-formats.html
Best Regards,
Takamichi Osumi
Hi Ajin,
I have reviewed the v7 patch and given my feedback comments below.
Apply OK
Build OK
make check OK
TAP (subscriptions) make check OK
Build PG Docs (html) OK
Although I made lots of review comments below, the important point is
that none of them are functional - they are only minor re-wordings
and some code refactoring that I thought would make the code simpler
and/or easier to read. YMMV, so please feel free to disagree with any
of them.
//////////
1a. Commit Comment - wording
BEFORE
This patch addresses the above problem by postponing the BEGIN / BEGIN
PREPARE message until the first change.
AFTER
This patch addresses the above problem by postponing the BEGIN / BEGIN
PREPARE messages until the first change is encountered.
------
1b. Commit Comment - wording
BEFORE
While processing a COMMIT message or a PREPARE message, if there is no
other change for that transaction, do not send COMMIT message or
PREPARE message.
AFTER
If (when processing a COMMIT / PREPARE message) we find there had been
no other change for that transaction, then do not send the COMMIT /
PREPARE message.
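The mechanism described in 1a/1b can be illustrated with a small self-contained simulation. This is a sketch under stated assumptions, not the pgoutput code: TxnState loosely mirrors the sent_begin_txn flag in PGOutputTxnData, while Wire, on_begin(), on_change() and on_commit() are hypothetical names, and each protocol message is represented only by its one-letter type ('B', 'I', 'C') rather than real bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the walsender output: one letter per message. */
typedef struct Wire
{
    char    buf[64];
    size_t  len;
} Wire;

/* Hypothetical per-transaction state, loosely modelled on PGOutputTxnData. */
typedef struct TxnState
{
    bool    sent_begin;     /* has the postponed BEGIN been sent yet? */
} TxnState;

static void
wire_emit(Wire *w, char msg)
{
    assert(w->len + 1 < sizeof(w->buf));    /* room for msg and the NUL */
    w->buf[w->len++] = msg;
    w->buf[w->len] = '\0';
}

/* BEGIN callback: send nothing yet, only record that BEGIN is pending. */
static void
on_begin(TxnState *st)
{
    st->sent_begin = false;
}

/* Change callback: flush the postponed BEGIN before the first published change. */
static void
on_change(Wire *w, TxnState *st, bool published)
{
    if (!published)
        return;                 /* table is not part of the publication */
    if (!st->sent_begin)
    {
        wire_emit(w, 'B');
        st->sent_begin = true;
    }
    wire_emit(w, 'I');          /* e.g. an INSERT message */
}

/* COMMIT callback: no BEGIN sent means no relevant changes, so skip COMMIT too. */
static void
on_commit(Wire *w, const TxnState *st)
{
    if (!st->sent_begin)
        return;
    wire_emit(w, 'C');
}
```

With these definitions, a transaction that only touches unpublished tables leaves the buffer empty, while a transaction with one published INSERT followed by an unpublished change produces "BIC".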
------
2. doc/src/sgml/logicaldecoding.sgml - wording
@@ -884,11 +884,19 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
The required <function>commit_prepared_cb</function> callback is called
whenever a transaction <command>COMMIT PREPARED</command> has
been decoded.
The <parameter>gid</parameter> field, which is part of the
- <parameter>txn</parameter> parameter, can be used in this callback.
+ <parameter>txn</parameter> parameter, can be used in this callback. The
+ parameters <parameter>prepare_end_lsn</parameter> and
+ <parameter>prepare_time</parameter> can be used to check if the plugin
+ has received this <command>PREPARE TRANSACTION</command> in which case
+ it can commit the transaction, otherwise, it can skip the commit. The
+ <parameter>gid</parameter> alone is not sufficient because the downstream
+ node can have a prepared transaction with the same identifier.
=>
(some minor rewording of the last part)
AFTER:
The parameters <parameter>prepare_end_lsn</parameter> and
<parameter>prepare_time</parameter> can be used to check if the plugin
has received this <command>PREPARE TRANSACTION</command> or not. If
yes, it can commit the transaction, otherwise, it can skip the commit.
The <parameter>gid</parameter> alone is not sufficient to determine
this because the downstream node may already have a prepared
transaction with the same identifier.
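The downstream check described in this paragraph can be sketched as follows. This is a hypothetical illustration, not the worker.c code: PreparedEntry and should_commit_prepared() are invented names, and a linear scan over an array stands in for however the subscriber actually looks up its prepared transactions. It shows why the gid alone is not enough: the prepare end LSN and prepare time must also match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;    /* 64-bit WAL position, as in PostgreSQL */
typedef int64_t TimestampTz;    /* microseconds since 2000-01-01 */

/* Hypothetical record of a transaction prepared on the downstream node. */
typedef struct PreparedEntry
{
    char        gid[200];
    XLogRecPtr  prepare_end_lsn;
    TimestampTz prepare_time;
} PreparedEntry;

/*
 * Decide whether a decoded COMMIT PREPARED should be applied.  A matching gid
 * is not enough on its own, because an unrelated transaction with the same
 * gid may already be prepared downstream; prepare_end_lsn and prepare_time
 * must match too.
 */
static bool
should_commit_prepared(const PreparedEntry *entries, size_t n,
                       const char *gid, XLogRecPtr prepare_end_lsn,
                       TimestampTz prepare_time)
{
    assert(gid != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(entries[i].gid, gid) == 0 &&
            entries[i].prepare_end_lsn == prepare_end_lsn &&
            entries[i].prepare_time == prepare_time)
            return true;
    }
    /* Not found: e.g. the empty PREPARE was skipped upstream, so skip the commit. */
    return false;
}
```

A gid match with a different prepare_time returns false, as does an empty list, which corresponds to silently skipping the COMMIT PREPARED of a prepare that was never sent.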
------
3. src/backend/replication/logical/proto.c - whitespace
@@ -244,12 +248,16 @@ logicalrep_read_commit_prepared(StringInfo in, LogicalRepCommitPreparedTxnData *
elog(ERROR, "unrecognized flags %u in commit prepared message", flags);
/* read fields */
+ prepare_data->prepare_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->prepare_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR,"prepare_end_lsn is not set in commit prepared message");
=>
There is a missing space before the 2nd elog param.
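For reference, the read-and-validate pattern in the quoted proto.c hunk can be sketched standalone: each Int64 field is read in network (big-endian) byte order, and an LSN of 0 (InvalidXLogRecPtr) is rejected. MsgCursor, read_int64() and read_commit_prepared_lsns() are hypothetical stand-ins for StringInfo and pq_getmsgint64(); the sketch covers only the three LSN fields, in the order the patch reads them (prepare_end_lsn, commit_lsn, then the commit end LSN), and reports failure by returning false instead of calling elog(ERROR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal cursor over a received message (not StringInfo). */
typedef struct MsgCursor
{
    const uint8_t *data;
    size_t      len;
    size_t      pos;
} MsgCursor;

/* Read a network-order (big-endian) 64-bit integer; false if truncated. */
static bool
read_int64(MsgCursor *c, uint64_t *out)
{
    uint64_t    v = 0;

    assert(c->pos <= c->len);
    if (c->len - c->pos < 8)
        return false;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | c->data[c->pos++];
    *out = v;
    return true;
}

typedef struct CommitPreparedLsns
{
    uint64_t    prepare_end_lsn;
    uint64_t    commit_lsn;
    uint64_t    commit_end_lsn;
} CommitPreparedLsns;

/* Read the three LSNs in order, rejecting 0 (InvalidXLogRecPtr) for each. */
static bool
read_commit_prepared_lsns(MsgCursor *c, CommitPreparedLsns *f)
{
    if (!read_int64(c, &f->prepare_end_lsn) || f->prepare_end_lsn == 0)
        return false;
    if (!read_int64(c, &f->commit_lsn) || f->commit_lsn == 0)
        return false;
    if (!read_int64(c, &f->commit_end_lsn) || f->commit_end_lsn == 0)
        return false;
    return true;
}
```

Because the writer and reader must agree on field order, adding prepare_end_lsn at the front of the message changes the position of every later field, which is why both sides of proto.c are updated together in the patch.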
------
4. src/backend/replication/logical/worker.c - comment typos
/*
- * Update origin state so we can restart streaming from correct position
- * in case of crash.
+ * It is possible that we haven't received the prepare because
+ * the transaction did not have any changes relevant to this
+ * subscription and was essentially an empty prepare. In which case,
+ * the walsender is optimized to drop the empty transaction and the
+ * accompanying prepare. Silently ignore if we don't find the prepared
+ * transaction.
*/
4a. =>
"and was essentially an empty prepare" --> "so was essentially an empty prepare"
4b. =>
"In which case" --> "In this case"
------
5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn
@@ -410,10 +417,32 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputTxnData *data = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));
+
+ /*
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
+ */
+ data->sent_begin_txn = false;
+ txn->output_plugin_private = data;
+}
=>
I felt that, since this message postponement is now the new behaviour
of this function, this should probably all be a function-level
comment instead of a comment in the body of the function.
------
6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin
+
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
=>
Even though it is kind of obvious, it is probably better to provide a
function comment here too.
------
7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_commit_txn
@@ -428,8 +457,22 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+ bool skip;
+
+ Assert(data);
+ skip = !data->sent_begin_txn;
+ pfree(data);
+ txn->output_plugin_private = NULL;
OutputPluginUpdateProgress(ctx);
+ /* skip COMMIT message if nothing was sent */
+ if (skip)
+ {
+ elog(DEBUG1, "Skipping replication of an empty transaction");
+ return;
+ }
+
7a. =>
I felt that the comment "skip COMMIT message if nothing was sent"
should be done at the point where you *decide* to skip or not. So you
could either move that comment to where the skip variable is assigned.
Or (my preference) leave the comment where it is but change the
variable name to be sent_begin = data->sent_begin_txn;
------
Regardless, I think the comment should be elaborated a bit to explain
the reason.
7b. =>
BEFORE
/* skip COMMIT message if nothing was sent */
AFTER
/* If a BEGIN message was not yet sent, then it means there were no
relevant changes encountered, so we can skip the COMMIT message too.
*/
------
8. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_prepare_txn
@@ -441,10 +484,28 @@ pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
static void
pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ /*
+ * Don't send BEGIN PREPARE message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN PREPARE and COMMIT PREPARED messages
+ * to subscribers, using bandwidth on something with little/no use
+ * for logical replication.
+ */
+ pgoutput_begin_txn(ctx, txn);
+}
8a. =>
Like previously, I felt that this big comment should be at the
function level of pgoutput_begin_prepare_txn instead of in the body of
the function.
------
8b. =>
And then the body comment would be something simple like:
/* Delegate to assign the begin sent flag as false same as for the
BEGIN message. */
pgoutput_begin_txn(ctx, txn);
------
9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_prepare
+
+static void
+pgoutput_begin_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
=>
Probably this needs a function comment.
------
10. src/backend/replication/pgoutput/pgoutput.c - pgoutput_prepare_txn
@@ -459,8 +520,18 @@ static void
pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(data);
OutputPluginUpdateProgress(ctx);
+ /* skip PREPARE message if nothing was sent */
+ if (!data->sent_begin_txn)
=>
Maybe elaborate on that "skip PREPARE message if nothing was sent"
comment in a way similar to my review comment 7b. For example,
AFTER
/* If the BEGIN was not yet sent, then it means there were no relevant
changes encountered, so we can skip the PREPARE message too. */
------
11. src/backend/replication/pgoutput/pgoutput.c - pgoutput_commit_prepared_txn
@@ -471,12 +542,33 @@ pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
*/
static void
pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * skip sending COMMIT PREPARED message if prepared transaction
+ * has not been sent.
+ */
+ if (data)
=>
Similar to previous review comment 10, I think the reason for the skip
should be elaborated a little bit. For example,
AFTER
/* If the BEGIN PREPARE was not yet sent, then it means there were no
relevant changes encountered, so we can skip the COMMIT PREPARED
message too. */
------
12. src/backend/replication/pgoutput/pgoutput.c - pgoutput_rollback_prepared_txn
=> Same as for pgoutput_commit_prepared_txn (see review comment 11)
------
13. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change
@@ -639,11 +749,16 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
TransactionId xid = InvalidTransactionId;
Relation ancestor = NULL;
+ /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
+ if (!in_streaming)
+ Assert(txndata);
+
if (!is_publishable_relation(relation))
return;
13a. =>
I felt the streaming logic with the txndata is a bit confusing. I
think it would be easier to have another local variable (sent_begin)
and use it like this...
bool sent_begin;
if (in_streaming)
{
    sent_begin = true;
}
else
{
    PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
    Assert(txndata);
    sent_begin = txndata->sent_begin_txn;
}
...
------
+ /* output BEGIN if we haven't yet */
13b. =>
I thought the comment is not quite right
AFTER
/* Output BEGIN / BEGIN PREPARE if we haven't yet */
------
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
13c. =>
If you introduce the variable (as suggested in 13a) this code becomes
much simpler:
AFTER
if (!sent_begin)
{
    if (rbtxn_prepared(txn))
        pgoutput_begin_prepare(ctx, txn);
    else
        pgoutput_begin(ctx, txn);
}
------
14. src/backend/replication/pgoutput/pgoutput.c - pgoutput_truncate
=>
All the similar review comments made for pgoutput_change (13a, 13b, 13c)
apply to pgoutput_truncate here also.
------
15. src/backend/replication/pgoutput/pgoutput.c - pgoutput_message
@@ -842,6 +980,7 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
const char *message)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata;
TransactionId xid = InvalidTransactionId;
=>
This variable should be declared in the block where it is used,
similar to the suggestion 13a.
Also, is it just an accidental omission that you did Assert(txndata)
in all the other places but not here?
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Jul 19, 2021 at 3:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
1a. Commit Comment - wording
updated.
1b. Commit Comment - wording
updated.
2. doc/src/sgml/logicaldecoding.sgml - wording
@@ -884,11 +884,19 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
The required <function>commit_prepared_cb</function> callback is called
whenever a transaction <command>COMMIT PREPARED</command> has
been decoded.
The <parameter>gid</parameter> field, which is part of the
- <parameter>txn</parameter> parameter, can be used in this callback.
+ <parameter>txn</parameter> parameter, can be used in this callback. The
+ parameters <parameter>prepare_end_lsn</parameter> and
+ <parameter>prepare_time</parameter> can be used to check if the plugin
+ has received this <command>PREPARE TRANSACTION</command> in which case
+ it can commit the transaction, otherwise, it can skip the commit. The
+ <parameter>gid</parameter> alone is not sufficient because the downstream
+ node can have a prepared transaction with the same identifier.
=>
(some minor rewording of the last part)
updated.
3. src/backend/replication/logical/proto.c - whitespace
@@ -244,12 +248,16 @@ logicalrep_read_commit_prepared(StringInfo in,
LogicalRepCommitPreparedTxnData *
elog(ERROR, "unrecognized flags %u in commit prepared message", flags);
/* read fields */
+ prepare_data->prepare_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->prepare_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR,"prepare_end_lsn is not set in commit prepared message");
=>
There is a missing space before the 2nd elog param.
fixed.
4a. =>
"and was essentially an empty prepare" --> "so was essentially an empty prepare"
4b. =>
"In which case" --> "In this case"
------
fixed.
I felt that since this message postponement is now the new behaviour
of this function then probably this should all be a function level
comment instead of the comment being in the body of the function.
------
6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin
+
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
=>
Even though it is kind of obvious, it is probably better to provide a
function comment here too.
------
Changed accordingly.
I felt that the comment "skip COMMIT message if nothing was sent"
should be done at the point where you *decide* to skip or not. So you
could either move that comment to where the skip variable is assigned.
Or (my preference) leave the comment where it is but change the
variable name to be sent_begin = data->sent_begin_txn;
Moved the comment to where the skip variable is assigned.
------
Regardless, I think the comment should be elaborated a bit to explain
the reason.
7b. =>
BEFORE
/* skip COMMIT message if nothing was sent */
AFTER
/* If a BEGIN message was not yet sent, then it means there were no
relevant changes encountered, so we can skip the COMMIT message too.
*/
Updated accordingly.
------
Like previously, I felt that this big comment should be at the
function level of pgoutput_begin_prepare_txn instead of in the body of
the function.
------
8b. =>
And then the body comment would be something simple like:
/* Delegate to assign the begin sent flag as false same as for the
BEGIN message. */
pgoutput_begin_txn(ctx, txn);
Updated accordingly.
------
9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_prepare
+
+static void
+pgoutput_begin_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
=>
Probably this needs a function comment.
Updated.
------
10. src/backend/replication/pgoutput/pgoutput.c - pgoutput_prepare_txn
@@ -459,8 +520,18 @@ static void
pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(data);
OutputPluginUpdateProgress(ctx);
+ /* skip PREPARE message if nothing was sent */
+ if (!data->sent_begin_txn)
=>
Maybe elaborate on that "skip PREPARE message if nothing was sent"
comment in a way similar to my review comment 7b. For example,
AFTER
/* If the BEGIN was not yet sent, then it means there were no relevant
changes encountered, so we can skip the PREPARE message too. */
Updated.
------
11. src/backend/replication/pgoutput/pgoutput.c - pgoutput_commit_prepared_txn
@@ -471,12 +542,33 @@ pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
*/
static void
pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * skip sending COMMIT PREPARED message if prepared transaction
+ * has not been sent.
+ */
+ if (data)
=>
Similar to previous review comment 10, I think the reason for the skip
should be elaborated a little bit. For example,
AFTER
/* If the BEGIN PREPARE was not yet sent, then it means there were no
relevant changes encountered, so we can skip the COMMIT PREPARED
message too. */
------
Updated accordingly.
12. src/backend/replication/pgoutput/pgoutput.c - pgoutput_rollback_prepared_txn
=> Same as for pgoutput_commit_prepared_txn (see review comment 11)
------
Updated.
13. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change
@@ -639,11 +749,16 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
TransactionId xid = InvalidTransactionId;
Relation ancestor = NULL;
+ /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
+ if (!in_streaming)
+ Assert(txndata);
+
if (!is_publishable_relation(relation))
return;
13a. =>
I felt the streaming logic with the txndata is a bit confusing. I
think it would be easier to have another local variable (sent_begin)
and use it like this...
bool sent_begin;
if (in_streaming)
{
    sent_begin = true;
}
else
{
    PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
    Assert(txndata);
    sent_begin = txndata->sent_begin_txn;
}
I did not make this change, because in the streaming case "sent_begin"
is not actually true, so it seemed incorrect to code it
that way. Instead, I have modified the comment to mention that
streaming transactions do not send BEGIN / BEGIN PREPARE.
...
------
+ /* output BEGIN if we haven't yet */
13b. =>
I thought the comment is not quite right
AFTER
/* Output BEGIN / BEGIN PREPARE if we haven't yet */
------
Updated.
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
13c. =>
If you introduce the variable (as suggested in 13a) this code becomes
much simpler:
Skipped this (reason mentioned above).
------
14. src/backend/replication/pgoutput/pgoutput.c - pgoutput_truncate
=>
All the similar review comments made for pgoutput_change (13a, 13b, 13c)
apply to pgoutput_truncate here also.
------
Updated.
15. src/backend/replication/pgoutput/pgoutput.c - pgoutput_message
@@ -842,6 +980,7 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
const char *message)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata;
TransactionId xid = InvalidTransactionId;
=>
This variable should be declared in the block where it is used,
similar to the suggestion 13a.
Also, is it just an accidental omission that you did Assert(txndata)
in all the other places but not here?
Moved location of the variable and added an assert.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v8-0001-Skip-empty-transactions-for-logical-replication.patch
From 7c0c403625bef87ef67b3930be7fd3171628cc3e Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 21 Jul 2021 06:29:57 -0400
Subject: [PATCH v8] Skip empty transactions for logical replication.
The current logical replication behaviour is to send every transaction to
the subscriber even though the transaction is empty (because it does not
contain changes from the selected publications). It is a waste of CPU
cycles and network bandwidth to build/transmit these empty transactions.
This patch addresses the above problem by postponing the BEGIN / BEGIN
PREPARE messages until the first change is encountered.
If (when processing a COMMIT / PREPARE message) we find there had been
no other change for that transaction, then do not send the COMMIT /
PREPARE message. This means that pgoutput will skip BEGIN / COMMIT
or BEGIN PREPARE / PREPARE messages for transactions that are empty.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
contrib/test_decoding/test_decoding.c | 7 +-
doc/src/sgml/logicaldecoding.sgml | 13 +-
doc/src/sgml/protocol.sgml | 15 ++
src/backend/replication/logical/logical.c | 9 +-
src/backend/replication/logical/proto.c | 16 +-
src/backend/replication/logical/reorderbuffer.c | 2 +-
src/backend/replication/logical/worker.c | 38 +++--
src/backend/replication/pgoutput/pgoutput.c | 188 +++++++++++++++++++++++-
src/include/replication/logicalproto.h | 8 +-
src/include/replication/output_plugin.h | 4 +-
src/include/replication/reorderbuffer.h | 4 +-
src/test/subscription/t/020_messages.pl | 5 +-
src/test/subscription/t/021_twophase.pl | 46 +++++-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 316 insertions(+), 40 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e5cd84e..408dbfc 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -86,7 +86,9 @@ static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_lsn);
static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -390,7 +392,8 @@ pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
/* COMMIT PREPARED callback */
static void
pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
TestDecodingData *data = ctx->output_plugin_private;
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 89b8090..27811e5 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
The required <function>commit_prepared_cb</function> callback is called
whenever a transaction <command>COMMIT PREPARED</command> has been decoded.
The <parameter>gid</parameter> field, which is part of the
- <parameter>txn</parameter> parameter, can be used in this callback.
+ <parameter>txn</parameter> parameter, can be used in this callback. The
+ parameters <parameter>prepare_end_lsn</parameter> and
+ <parameter>prepare_time</parameter> can be used to check if the plugin
+ has received this <command>PREPARE TRANSACTION</command> command or not.
+ If yes, it can commit the transaction, otherwise, it can skip the commit.
+ The <parameter>gid</parameter> alone is not sufficient to determine this
+ because the downstream may already have a prepared transaction with the
+ same identifier.
<programlisting>
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
</programlisting>
</para>
</sect3>
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index e8cb78f..5e68dfb 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -7550,6 +7550,13 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ The end LSN of the prepare.
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
The LSN of the commit prepared.
</para></listitem>
</varlistentry>
@@ -7564,6 +7571,14 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ Prepare timestamp of the transaction. The value is in number
+ of microseconds since PostgreSQL epoch (2000-01-01).
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
Commit timestamp of the transaction. The value is in number
of microseconds since PostgreSQL epoch (2000-01-01).
</para></listitem>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index d61ef4c..67c762a 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -63,7 +63,8 @@ static void begin_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn
static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn);
static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn, TimestampTz prepare_time);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -936,7 +937,8 @@ prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
static void
commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
LogicalDecodingContext *ctx = cache->private_data;
LogicalErrorCallbackState state;
@@ -972,7 +974,8 @@ commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
"commit_prepared_cb")));
/* do the actual work: call callback */
- ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index a245252..47a7489 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -206,7 +206,9 @@ logicalrep_read_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data)
*/
void
logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
uint8 flags = 0;
@@ -222,8 +224,10 @@ logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, flags);
/* send fields */
+ pq_sendint64(out, prepare_end_lsn);
pq_sendint64(out, commit_lsn);
pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, prepare_time);
pq_sendint64(out, txn->xact_time.commit_time);
pq_sendint32(out, txn->xid);
@@ -244,12 +248,16 @@ logicalrep_read_commit_prepared(StringInfo in, LogicalRepCommitPreparedTxnData *
elog(ERROR, "unrecognized flags %u in commit prepared message", flags);
/* read fields */
+ prepare_data->prepare_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->prepare_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR, "prepare_end_lsn is not set in commit prepared message");
prepare_data->commit_lsn = pq_getmsgint64(in);
if (prepare_data->commit_lsn == InvalidXLogRecPtr)
elog(ERROR, "commit_lsn is not set in commit prepared message");
- prepare_data->end_lsn = pq_getmsgint64(in);
- if (prepare_data->end_lsn == InvalidXLogRecPtr)
- elog(ERROR, "end_lsn is not set in commit prepared message");
+ prepare_data->commit_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->commit_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR, "commit_end_lsn is not set in commit prepared message");
+ prepare_data->prepare_time = pq_getmsgint64(in);
prepare_data->commit_time = pq_getmsgint64(in);
prepare_data->xid = pq_getmsgint(in, 4);
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7378beb..5a707e2 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2794,7 +2794,7 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
txn->origin_lsn = origin_lsn;
if (is_commit)
- rb->commit_prepared(rb, txn, commit_lsn);
+ rb->commit_prepared(rb, txn, commit_lsn, prepare_end_lsn, prepare_time);
else
rb->rollback_prepared(rb, txn, prepare_end_lsn, prepare_time);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b9a7a7f..63e19bc 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -966,27 +966,39 @@ apply_handle_commit_prepared(StringInfo s)
/* Compute GID for two_phase transactions. */
TwoPhaseTransactionGid(MySubscription->oid, prepare_data.xid,
gid, sizeof(gid));
-
- /* There is no transaction when COMMIT PREPARED is called */
- begin_replication_step();
-
/*
- * Update origin state so we can restart streaming from correct position
- * in case of crash.
+ * It is possible that we haven't received the prepare because
+ * the transaction did not have any changes relevant to this
+ * subscription and so was essentially an empty prepare. In this case,
+ * the walsender is optimized to drop the empty transaction and the
+ * accompanying prepare. Silently ignore if we don't find the prepared
+ * transaction.
*/
- replorigin_session_origin_lsn = prepare_data.end_lsn;
- replorigin_session_origin_timestamp = prepare_data.commit_time;
+ if (LookupGXact(gid, prepare_data.prepare_end_lsn,
+ prepare_data.prepare_time))
+ {
- FinishPreparedTransaction(gid, true);
- end_replication_step();
- CommitTransactionCommand();
+ /* There is no transaction when COMMIT PREPARED is called */
+ begin_replication_step();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data.commit_end_lsn;
+ replorigin_session_origin_timestamp = prepare_data.commit_time;
+
+ FinishPreparedTransaction(gid, true);
+ end_replication_step();
+ CommitTransactionCommand();
+ }
pgstat_report_stat(false);
- store_flush_position(prepare_data.end_lsn);
+ store_flush_position(prepare_data.commit_end_lsn);
in_remote_transaction = false;
/* Process any tables that are being synchronized in parallel. */
- process_syncing_tables(prepare_data.end_lsn);
+ process_syncing_tables(prepare_data.commit_end_lsn);
pgstat_report_activity(STATE_IDLE, NULL);
}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index e4314af..d82db45 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -56,7 +56,9 @@ static void pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx,
static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
- ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -130,6 +132,11 @@ typedef struct RelationSyncEntry
TupleConversionMap *map;
} RelationSyncEntry;
+typedef struct PGOutputTxnData
+{
+ bool sent_begin_txn; /* flag indicating whether begin has been sent */
+} PGOutputTxnData;
+
/* Map used to remember which relation schemas we sent. */
static HTAB *RelationSyncCache = NULL;
@@ -406,14 +413,38 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
/*
* BEGIN callback
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
*/
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputTxnData *data = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));
+
+ data->sent_begin_txn = false;
+ txn->output_plugin_private = data;
+}
+
+/*
+ * Send BEGIN.
+ * This is where the BEGIN is actually sent. This is called
+ * while processing the first change of the transaction.
+ */
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(data);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin(ctx->out, txn);
+ data->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -428,23 +459,66 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+ bool skip;
+
+ Assert(data);
+
+ /*
+ * If a BEGIN message was not yet sent, then it means there were no relevant
+ * changes encountered, so we can skip the COMMIT message too.
+ */
+ skip = !data->sent_begin_txn;
+ pfree(data);
+ txn->output_plugin_private = NULL;
OutputPluginUpdateProgress(ctx);
+ if (skip)
+ {
+ elog(DEBUG1, "skipping replication of an empty transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
}
/*
- * BEGIN PREPARE callback
+ * BEGIN PREPARE callback.
+ * Don't send BEGIN PREPARE message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN PREPARE and COMMIT PREPARED messages
+ * to subscribers, using bandwidth on something with little/no use
+ * for logical replication.
*/
static void
pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ /*
+ * Delegate to assign the begin sent flag as false same as for the
+ * BEGIN message.
+ */
+ pgoutput_begin_txn(ctx, txn);
+}
+
+/*
+ * Send BEGIN PREPARE.
+ * This is where the BEGIN PREPARE is actually sent. This is called while
+ * processing the first change of the prepared transaction.
+ */
+static void
+pgoutput_begin_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(data);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin_prepare(ctx->out, txn);
+ data->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -459,8 +533,21 @@ static void
pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(data);
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN was not yet sent, then it means there were no relevant
+ * changes encountered, so we can skip the PREPARE message too.
+ */
+ if (!data->sent_begin_txn)
+ {
+ elog(DEBUG1, "skipping replication of an empty prepared transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
@@ -471,12 +558,34 @@ pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
*/
static void
pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the COMMIT PREPARED
+ message too.
+ */
+ if (data)
+ {
+ bool skip = !data->sent_begin_txn;
+ pfree(data);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "skipping replication of COMMIT PREPARED of an empty transaction");
+ return;
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
OutputPluginWrite(ctx, true);
}
@@ -489,8 +598,27 @@ pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_end_lsn,
TimestampTz prepare_time)
{
+ PGOutputTxnData *data = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the ROLLBACK PREPARED
+ message too.
+ */
+ if (data)
+ {
+ bool skip = !data->sent_begin_txn;
+ pfree(data);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "skipping replication of ROLLBACK of an empty transaction");
+ return;
+ }
+ }
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_rollback_prepared(ctx->out, txn, prepare_end_lsn,
prepare_time);
@@ -639,11 +767,16 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
TransactionId xid = InvalidTransactionId;
Relation ancestor = NULL;
+ /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
+ if (!in_streaming)
+ Assert(txndata);
+
if (!is_publishable_relation(relation))
return;
@@ -677,6 +810,18 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /*
+ * output BEGIN / BEGIN PREPARE if we haven't yet,
+ * while streaming no need to send BEGIN / BEGIN PREPARE.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -779,6 +924,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
int nrelations, Relation relations[], ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
int i;
@@ -786,6 +932,10 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Oid *relids;
TransactionId xid = InvalidTransactionId;
+ /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
+ if (!in_streaming)
+ Assert(txndata);
+
/* Remember the xid for the change in streaming mode. See pgoutput_change. */
if (in_streaming)
xid = change->txn->xid;
@@ -822,6 +972,18 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /*
+ * output BEGIN / BEGIN PREPARE if we haven't yet,
+ * while streaming no need to send BEGIN / BEGIN PREPARE.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -854,6 +1016,24 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /*
+ * Output BEGIN if we haven't yet.
+ * Avoid for streaming and non-transactional messages
+ */
+ if (!in_streaming && transactional)
+ {
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(txndata);
+ if (!txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 63de90d..0be0a07 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -148,8 +148,10 @@ typedef struct LogicalRepPreparedTxnData
*/
typedef struct LogicalRepCommitPreparedTxnData
{
+ XLogRecPtr prepare_end_lsn;
XLogRecPtr commit_lsn;
- XLogRecPtr end_lsn;
+ XLogRecPtr commit_end_lsn;
+ TimestampTz prepare_time;
TimestampTz commit_time;
TransactionId xid;
char gid[GIDSIZE];
@@ -188,7 +190,9 @@ extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
extern void logicalrep_read_prepare(StringInfo in,
LogicalRepPreparedTxnData *prepare_data);
extern void logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
extern void logicalrep_read_commit_prepared(StringInfo in,
LogicalRepCommitPreparedTxnData *prepare_data);
extern void logicalrep_write_rollback_prepared(StringInfo out, ReorderBufferTXN *txn,
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 810495e..0d28306 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -128,7 +128,9 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
*/
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/*
* Called for ROLLBACK PREPARED.
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff7..11e2e1e 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -442,7 +442,9 @@ typedef void (*ReorderBufferPrepareCB) (ReorderBuffer *rb,
/* commit prepared callback signature */
typedef void (*ReorderBufferCommitPreparedCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/* rollback prepared callback signature */
typedef void (*ReorderBufferRollbackPreparedCB) (ReorderBuffer *rb,
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index 0e218e0..3d246be 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -87,9 +87,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is( $result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot'
);
diff --git a/src/test/subscription/t/021_twophase.pl b/src/test/subscription/t/021_twophase.pl
index c6ada92..b954630 100644
--- a/src/test/subscription/t/021_twophase.pl
+++ b/src/test/subscription/t/021_twophase.pl
@@ -6,7 +6,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 24;
+use Test::More tests => 25;
###############################
# Setup
@@ -318,10 +318,9 @@ $node_publisher->safe_psql('postgres', "
$node_publisher->wait_for_catchup($appname_copy);
-# Check that the transaction has been prepared on the subscriber, there will be 2
-# prepared transactions for the 2 subscriptions.
+# Check that the transaction has been prepared on the subscriber
$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts;");
-is($result, qq(2), 'transaction is prepared on subscriber');
+is($result, qq(1), 'transaction is prepared on subscriber');
# Now commit the insert and verify that it IS replicated
$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'mygid';");
@@ -337,6 +336,45 @@ is($result, qq(2), 'replicated data in subscriber table');
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_copy;");
$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_copy;");
+##############################
+# Test empty prepares
+##############################
+
+# create a table that is not part of the publication
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab_nopub (a int PRIMARY KEY)");
+
+# disable the subscription so that we can peek at the slot
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE");
+
+# wait for the replication slot to become inactive in the publisher
+$node_publisher->poll_query_until('postgres',
+ "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND active='f'", 1);
+
+# create a transaction with no changes relevant to the slot
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_nopub SELECT generate_series(1,10);
+ PREPARE TRANSACTION 'empty_transaction';
+ COMMIT PREPARED 'empty_transaction';");
+
+# peek at the contents of the slot
+$result = $node_publisher->safe_psql(
+ 'postgres', qq(
+ SELECT get_byte(data, 0)
+ FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL,
+ 'proto_version', '3',
+ 'publication_names', 'tap_pub')
+));
+
+# the empty transaction should be skipped
+is($result, qq(),
+ 'empty transaction dropped on slot'
+);
+
+# enable the subscription to test cleanup
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE");
+
###############################
# check all the cleanup
###############################
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37cf4b2..75639ab 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1606,6 +1606,7 @@ PGMessageField
PGModuleMagicFunction
PGNoticeHooks
PGOutputData
+PGOutputTxnData
PGPROC
PGP_CFB
PGP_Context
--
1.8.3.1
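The v8 patch changes the body of the COMMIT PREPARED protocol message. prepare_end_lsn and prepare_time are now sent as well, and the receiver rejects a zero (InvalidXLogRecPtr) value for any of the three LSNs. Below is a minimal, stand-alone sketch of that field order. It assumes network byte order for the Int64/Int32 fields, which is what pq_sendint64 and pq_sendint32 use, and a NUL-terminated gid after the xid; the gid part is not visible in the hunk above. The struct, the SKETCH_* constants and both functions are hypothetical illustrations, not PostgreSQL's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GIDSIZE 200                  /* assumption: mirrors PostgreSQL's GIDSIZE */
#define SKETCH_FIXED_LEN (1 + 5 * 8 + 4)    /* flags + five Int64 fields + Int32 xid */

/* Hypothetical stand-alone copy of the patched LogicalRepCommitPreparedTxnData. */
typedef struct
{
    uint64_t prepare_end_lsn;
    uint64_t commit_lsn;
    uint64_t commit_end_lsn;
    int64_t  prepare_time;      /* microseconds since 2000-01-01 */
    int64_t  commit_time;       /* microseconds since 2000-01-01 */
    uint32_t xid;
    char     gid[SKETCH_GIDSIZE];
} CommitPreparedMsg;

/* Write the low nbytes of v, most significant byte first (network order). */
static unsigned char *put_be(unsigned char *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--)
        *p++ = (unsigned char) (v >> (i * 8));
    return p;
}

/* Read nbytes in network order into *v. */
static const unsigned char *get_be(const unsigned char *p, uint64_t *v, int nbytes)
{
    *v = 0;
    for (int i = 0; i < nbytes; i++)
        *v = (*v << 8) | p[i];
    return p + nbytes;
}

/* Encode in the order used by logicalrep_write_commit_prepared in the patch.
 * buf must hold at least SKETCH_FIXED_LEN + SKETCH_GIDSIZE bytes. */
static size_t encode_commit_prepared(const CommitPreparedMsg *m, unsigned char *buf)
{
    unsigned char *p = buf;

    assert(strlen(m->gid) < SKETCH_GIDSIZE);
    *p++ = 0;                                       /* flags: currently unused */
    p = put_be(p, m->prepare_end_lsn, 8);           /* new in this patch */
    p = put_be(p, m->commit_lsn, 8);
    p = put_be(p, m->commit_end_lsn, 8);            /* txn->end_lsn on the sender */
    p = put_be(p, (uint64_t) m->prepare_time, 8);   /* new in this patch */
    p = put_be(p, (uint64_t) m->commit_time, 8);
    p = put_be(p, m->xid, 4);
    size_t glen = strlen(m->gid) + 1;               /* include the terminating NUL */
    memcpy(p, m->gid, glen);
    return (size_t) (p - buf) + glen;
}

/* Decode and validate; returns 0 on success, -1 on a malformed message. */
static int decode_commit_prepared(const unsigned char *buf, size_t len, CommitPreparedMsg *m)
{
    const unsigned char *p = buf;
    uint64_t v;

    if (len < SKETCH_FIXED_LEN || *p++ != 0)
        return -1;
    p = get_be(p, &m->prepare_end_lsn, 8);
    p = get_be(p, &m->commit_lsn, 8);
    p = get_be(p, &m->commit_end_lsn, 8);
    p = get_be(p, &v, 8);
    m->prepare_time = (int64_t) v;
    p = get_be(p, &v, 8);
    m->commit_time = (int64_t) v;
    p = get_be(p, &v, 4);
    m->xid = (uint32_t) v;

    /* InvalidXLogRecPtr is 0; the patch rejects it for all three LSNs. */
    if (m->prepare_end_lsn == 0 || m->commit_lsn == 0 || m->commit_end_lsn == 0)
        return -1;

    const unsigned char *nul = memchr(p, '\0', len - (size_t) (p - buf));
    if (nul == NULL || (size_t) (nul - p) >= SKETCH_GIDSIZE)
        return -1;
    memcpy(m->gid, p, (size_t) (nul - p) + 1);
    return 0;
}
```

Because the two new fields change the message layout, the patch also updates the protocol documentation for this message.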
On Thu, Jul 15, 2021 at 3:50 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
I started to test this patch, but here is some really minor, quick feedback.
(1) pg_logical_slot_get_binary_changes() params.
Technically, wouldn't it be better to pass proto_version 3 and the two_phase option to the function
to test an empty prepare? I felt proto_version 1 doesn't support 2PC.
[1] says "The following messages (Begin Prepare, Prepare, Commit Prepared, Rollback Prepared)
are available since protocol version 3." So, if the test wants to check that empty *prepares* are skipped,
I suggest updating proto_version and setting two_phase to 'on'.
Updated accordingly.
(2) The following sentences may start with a lowercase letter.
There is other similar code for this.
+ elog(DEBUG1, "Skipping replication of an empty transaction");
Fixed this.
I've addressed these comments in version 8 of the patch.
regards,
Ajin Cherian
Fujitsu Australia
Hi Ajin.
I have reviewed the v8 patch and my feedback comments are below:
//////////
1. Applying v8 gave multiple whitespace warnings.
------
2. Commit comment - wording
If (when processing a COMMIT / PREPARE message) we find there had been
no other change for that transaction, then do not send the COMMIT /
PREPARE message. This means that pgoutput will skip BEGIN / COMMIT
or BEGIN PREPARE / PREPARE messages for transactions that are empty.
=>
Shouldn't this also mention some other messages that may be skipped?
- COMMIT PREPARED
- ROLLBACK PREPARED
------
3. doc/src/sgml/logicaldecoding.sgml - wording
@@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct
LogicalDecodingContext *ctx,
The required <function>commit_prepared_cb</function> callback is called
whenever a transaction <command>COMMIT PREPARED</command> has
been decoded.
The <parameter>gid</parameter> field, which is part of the
- <parameter>txn</parameter> parameter, can be used in this callback.
+ <parameter>txn</parameter> parameter, can be used in this callback. The
+ parameters <parameter>prepare_end_lsn</parameter> and
+ <parameter>prepare_time</parameter> can be used to check if the plugin
+ has received this <command>PREPARE TRANSACTION</command> command or not.
+ If yes, it can commit the transaction, otherwise, it can skip the commit.
+ The <parameter>gid</parameter> alone is not sufficient to determine this
+ because the downstream may already have a prepared transaction with the
+ same identifier.
=>
Typo: Should that say "downstream node" instead of just "downstream" ?
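The documentation text quoted above says the prepare_end_lsn and prepare_time pair, rather than the gid alone, tells the subscriber whether it actually received the PREPARE. Below is a minimal sketch of that check. It assumes a simplified in-memory list of prepared transactions; in the patch the real work is done by LookupGXact() in apply_handle_commit_prepared(). SketchPreparedXact and lookup_gxact_sketch() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record of a prepared transaction on the subscriber. */
typedef struct
{
    const char *gid;
    uint64_t    prepare_end_lsn;    /* origin LSN recorded when the PREPARE was applied */
    int64_t     prepare_time;       /* origin timestamp recorded at PREPARE */
} SketchPreparedXact;

/*
 * Return true only if a prepared transaction matches on all three keys.
 * Matching on gid alone could pick up an unrelated transaction that was
 * prepared on the downstream node under the same identifier.
 */
static bool lookup_gxact_sketch(const SketchPreparedXact *xacts, size_t n,
                                const char *gid, uint64_t prepare_end_lsn,
                                int64_t prepare_time)
{
    assert(xacts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(xacts[i].gid, gid) == 0 &&
            xacts[i].prepare_end_lsn == prepare_end_lsn &&
            xacts[i].prepare_time == prepare_time)
            return true;
    }
    return false;
}
```

If the lookup fails, the worker in the patch skips FinishPreparedTransaction() but still advances its flush position, because the walsender dropped the empty PREPARE.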
------
4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn
callback comment
@@ -406,14 +413,38 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
/*
* BEGIN callback
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
=>
Typo: "BEGIN callback" --> "BEGIN callback." (with the period).
And, I think maybe it will be better if it has a separating blank line too.
e.g.
/*
* BEGIN callback.
*
* Don't send BEGIN ....
(NOTE: this review comment applies to other callback function comments
too, so please hunt them all down)
------
5. src/backend/replication/pgoutput/pgoutput.c - data / txndata
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputTxnData *data = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));
=>
There is some inconsistent naming of the local variable in the patch.
Sometimes it is called "data"; sometimes it is called "txndata", etc. It
would be better to just stick with the same variable name everywhere.
(NOTE: this comment applies to several places in this patch)
------
6. src/backend/replication/pgoutput/pgoutput.c - Strange way to use Assert
+ /* If not streaming, should have setup txndata as part of
BEGIN/BEGIN PREPARE */
+ if (!in_streaming)
+ Assert(txndata);
+
=>
This style of Assert code seemed strange to me. In production mode
isn't that going to evaluate to some condition with a ((void) true)
body? IMO it might be better to just include the streaming check as
part of the Assert. For example:
BEFORE
if (!in_streaming)
Assert(txndata);
AFTER
Assert(in_streaming || txndata);
(NOTE: This same review comment applies in at least 3 places in this
patch, so please hunt them all down)
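Both spellings check the same predicate: "not streaming implies txndata is set" is logically equivalent to "streaming or txndata is set". The sketch below places the two forms side by side and checks the equivalence over all inputs. It assumes a stand-in SketchAssert() macro that mimics PostgreSQL's Assert(), which is the no-op ((void) true) unless USE_ASSERT_CHECKING is defined. With assertions disabled, the combined form reduces to a single no-op expression, while the nested form keeps an if statement wrapped around one. An optimizing compiler would normally discard that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for PostgreSQL's Assert(): without
 * USE_ASSERT_CHECKING it expands to ((void) true) and the condition
 * is never evaluated.
 */
#ifdef USE_ASSERT_CHECKING
#define SketchAssert(condition) assert(condition)
#else
#define SketchAssert(condition) ((void) true)
#endif

/* Reviewed form: the if statement survives even when SketchAssert is a no-op. */
static void check_nested(bool in_streaming, const void *txndata)
{
    if (!in_streaming)
        SketchAssert(txndata != NULL);
}

/* Suggested form: the whole predicate lives inside the macro. */
static void check_combined(bool in_streaming, const void *txndata)
{
    SketchAssert(in_streaming || txndata != NULL);
}

/* Returns true if "!s implies t" and "s || t" agree on every input. */
static bool forms_are_equivalent(void)
{
    for (int s = 0; s <= 1; s++)
        for (int t = 0; t <= 1; t++)
        {
            bool nested = s ? true : (bool) t;
            bool combined = s || t;

            if (nested != combined)
                return false;
        }
    return true;
}
```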
------
7. src/backend/replication/pgoutput/pgoutput.c - comment wording
@@ -677,6 +810,18 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Assert(false);
}
+ /*
+ * output BEGIN / BEGIN PREPARE if we haven't yet,
+ * while streaming no need to send BEGIN / BEGIN PREPARE.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
=>
English not really that comment is. The comment should also start with
uppercase.
(NOTE: This same comment was in a couple of places in the patch)
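The comment under review describes the core of the patch. BEGIN is deferred until the first published change, and COMMIT is dropped if BEGIN was never sent. The sketch below models that state machine in isolation, using one character per protocol message. 'B' and 'C' are the bytes 66 and 67 that 020_messages.pl checks for, and 'I' stands in for an Insert message. SketchTxnData, SketchOut and the callback functions are hypothetical simplifications of pgoutput's callbacks. Streaming and the BEGIN PREPARE / PREPARE variants are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-transaction state, modelled on PGOutputTxnData. */
typedef struct
{
    bool sent_begin_txn;
} SketchTxnData;

/* Hypothetical output sink: one character per protocol message sent. */
typedef struct
{
    char   msgs[64];
    size_t len;
} SketchOut;

static void emit(SketchOut *out, char msg)
{
    assert(out->len + 1 < sizeof(out->msgs));
    out->msgs[out->len++] = msg;
    out->msgs[out->len] = '\0';
}

/* BEGIN callback: only reset the per-transaction state; send nothing yet. */
static void begin_txn(SketchTxnData *txndata)
{
    txndata->sent_begin_txn = false;
}

/* Change callback: changes to unpublished tables are filtered out first. */
static void change(SketchOut *out, SketchTxnData *txndata, bool published)
{
    if (!published)
        return;

    /* Send the postponed BEGIN just before the first published change. */
    if (!txndata->sent_begin_txn)
    {
        emit(out, 'B');
        txndata->sent_begin_txn = true;
    }
    emit(out, 'I');
}

/* COMMIT callback: if BEGIN was never sent, the transaction was empty. */
static void commit_txn(SketchOut *out, SketchTxnData *txndata)
{
    if (!txndata->sent_begin_txn)
        return;                 /* skip COMMIT as well */
    emit(out, 'C');
}
```

An empty transaction produces no output at all, while a transaction with published changes is framed exactly once by BEGIN and COMMIT.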
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jul 22, 2021 at 6:11 PM Peter Smith <smithpb2250@gmail.com> wrote:
Hi Ajin.
I have reviewed the v8 patch and my feedback comments are below:
//////////
1. Applying v8 gave multiple whitespace warnings.
------
2. Commit comment - wording
If (when processing a COMMIT / PREPARE message) we find there had been
no other change for that transaction, then do not send the COMMIT /
PREPARE message. This means that pgoutput will skip BEGIN / COMMIT
or BEGIN PREPARE / PREPARE messages for transactions that are empty.
=>
Shouldn't this also mention some other messages that may be skipped?
- COMMIT PREPARED
- ROLLBACK PREPARED
Updated.
------
3. doc/src/sgml/logicaldecoding.sgml - wording
@@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct
LogicalDecodingContext *ctx,
The required <function>commit_prepared_cb</function> callback is called
whenever a transaction <command>COMMIT PREPARED</command> has
been decoded.
The <parameter>gid</parameter> field, which is part of the
- <parameter>txn</parameter> parameter, can be used in this callback.
+ <parameter>txn</parameter> parameter, can be used in this callback. The
+ parameters <parameter>prepare_end_lsn</parameter> and
+ <parameter>prepare_time</parameter> can be used to check if the plugin
+ has received this <command>PREPARE TRANSACTION</command> command or not.
+ If yes, it can commit the transaction, otherwise, it can skip the commit.
+ The <parameter>gid</parameter> alone is not sufficient to determine this
+ because the downstream may already have a prepared transaction with the
+ same identifier.
=>
Typo: Should that say "downstream node" instead of just "downstream" ?
------
Updated.
4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn
callback comment
@@ -406,14 +413,38 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
/*
* BEGIN callback
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
=>
Typo: "BEGIN callback" --> "BEGIN callback." (with the period).
And, I think maybe it will be better if it has a separating blank line too.
e.g.
/*
* BEGIN callback.
*
* Don't send BEGIN ....
(NOTE: this review comment applies to other callback function comments
too, so please hunt them all down)
------
Updated.
5. src/backend/replication/pgoutput/pgoutput.c - data / txndata
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputTxnData *data = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));
=>
There is some inconsistent naming of the local variable in the patch.
Sometimes it is called "data"; sometimes it is called "txndata", etc. It
would be better to just stick with the same variable name everywhere.
(NOTE: this comment applies to several places in this patch)
------
I've renamed all PGOutputTxnData variables to txndata. Note that
there is another structure, PGOutputData, whose variables still use
the name data.
6. src/backend/replication/pgoutput/pgoutput.c - Strange way to use Assert
+ /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
+ if (!in_streaming)
+ Assert(txndata);
+
=>
This style of Assert code seemed strange to me. In production mode
isn't that going to evaluate to some condition with a ((void) true)
body? IMO it might be better to just include the streaming check as
part of the Assert. For example:
BEFORE
if (!in_streaming)
Assert(txndata);
AFTER
Assert(in_streaming || txndata);
(NOTE: This same review comment applies in at least 3 places in this
patch, so please hunt them all down)
Updated.
------
7. src/backend/replication/pgoutput/pgoutput.c - comment wording
@@ -677,6 +810,18 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
Assert(false);
}
+ /*
+ * output BEGIN / BEGIN PREPARE if we haven't yet,
+ * while streaming no need to send BEGIN / BEGIN PREPARE.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
=>
English not really that comment is. The comment should also start with
uppercase.
(NOTE: This same comment was in a couple of places in the patch)
Updated.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v9-0001-Skip-empty-transactions-for-logical-replication.patch (application/octet-stream)
From a9ae97394096b1de31cebd6de0b504619e5a7b34 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 21 Jul 2021 06:29:57 -0400
Subject: [PATCH v9] Skip empty transactions for logical replication.
The current logical replication behaviour is to send every transaction to
the subscriber even though the transaction is empty (because it does not
contain changes from the selected publications). It is a waste of CPU
cycles and network bandwidth to build/transmit these empty transactions.
This patch addresses the above problem by postponing the BEGIN / BEGIN
PREPARE messages until the first change is encountered.
If (when processing a COMMIT / PREPARE message) we find there had been
no other change for that transaction, then do not send the COMMIT /
PREPARE message. This means that pgoutput will skip BEGIN / COMMIT
or BEGIN PREPARE / PREPARE messages for transactions that are empty.
pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions which were skipped.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
contrib/test_decoding/test_decoding.c | 7 +-
doc/src/sgml/logicaldecoding.sgml | 13 +-
doc/src/sgml/protocol.sgml | 15 ++
src/backend/replication/logical/logical.c | 9 +-
src/backend/replication/logical/proto.c | 16 +-
src/backend/replication/logical/reorderbuffer.c | 2 +-
src/backend/replication/logical/worker.c | 38 +++--
src/backend/replication/pgoutput/pgoutput.c | 189 +++++++++++++++++++++++-
src/include/replication/logicalproto.h | 8 +-
src/include/replication/output_plugin.h | 4 +-
src/include/replication/reorderbuffer.h | 4 +-
src/test/subscription/t/020_messages.pl | 5 +-
src/test/subscription/t/021_twophase.pl | 46 +++++-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 316 insertions(+), 41 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e5cd84e..408dbfc 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -86,7 +86,9 @@ static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_lsn);
static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -390,7 +392,8 @@ pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
/* COMMIT PREPARED callback */
static void
pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
TestDecodingData *data = ctx->output_plugin_private;
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 89b8090..beb09ce 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
The required <function>commit_prepared_cb</function> callback is called
whenever a transaction <command>COMMIT PREPARED</command> has been decoded.
The <parameter>gid</parameter> field, which is part of the
- <parameter>txn</parameter> parameter, can be used in this callback.
+ <parameter>txn</parameter> parameter, can be used in this callback. The
+ parameters <parameter>prepare_end_lsn</parameter> and
+ <parameter>prepare_time</parameter> can be used to check if the plugin
+ has received this <command>PREPARE TRANSACTION</command> command or not.
+ If yes, it can commit the transaction, otherwise, it can skip the commit.
+ The <parameter>gid</parameter> alone is not sufficient to determine this
+ because the downstream node may already have a prepared transaction with the
+ same identifier.
<programlisting>
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
</programlisting>
</para>
</sect3>
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index e8cb78f..5e68dfb 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -7550,6 +7550,13 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ The end LSN of the prepare.
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
The LSN of the commit prepared.
</para></listitem>
</varlistentry>
@@ -7564,6 +7571,14 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ Prepare timestamp of the transaction. The value is in number
+ of microseconds since PostgreSQL epoch (2000-01-01).
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
Commit timestamp of the transaction. The value is in number
of microseconds since PostgreSQL epoch (2000-01-01).
</para></listitem>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index d61ef4c..67c762a 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -63,7 +63,8 @@ static void begin_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn
static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn);
static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn, TimestampTz prepare_time);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -936,7 +937,8 @@ prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
static void
commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
LogicalDecodingContext *ctx = cache->private_data;
LogicalErrorCallbackState state;
@@ -972,7 +974,8 @@ commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
"commit_prepared_cb")));
/* do the actual work: call callback */
- ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index a245252..47a7489 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -206,7 +206,9 @@ logicalrep_read_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data)
*/
void
logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
uint8 flags = 0;
@@ -222,8 +224,10 @@ logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, flags);
/* send fields */
+ pq_sendint64(out, prepare_end_lsn);
pq_sendint64(out, commit_lsn);
pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, prepare_time);
pq_sendint64(out, txn->xact_time.commit_time);
pq_sendint32(out, txn->xid);
@@ -244,12 +248,16 @@ logicalrep_read_commit_prepared(StringInfo in, LogicalRepCommitPreparedTxnData *
elog(ERROR, "unrecognized flags %u in commit prepared message", flags);
/* read fields */
+ prepare_data->prepare_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->prepare_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR, "prepare_end_lsn is not set in commit prepared message");
prepare_data->commit_lsn = pq_getmsgint64(in);
if (prepare_data->commit_lsn == InvalidXLogRecPtr)
elog(ERROR, "commit_lsn is not set in commit prepared message");
- prepare_data->end_lsn = pq_getmsgint64(in);
- if (prepare_data->end_lsn == InvalidXLogRecPtr)
- elog(ERROR, "end_lsn is not set in commit prepared message");
+ prepare_data->commit_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->commit_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR, "commit_end_lsn is not set in commit prepared message");
+ prepare_data->prepare_time = pq_getmsgint64(in);
prepare_data->commit_time = pq_getmsgint64(in);
prepare_data->xid = pq_getmsgint(in, 4);
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7378beb..5a707e2 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2794,7 +2794,7 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
txn->origin_lsn = origin_lsn;
if (is_commit)
- rb->commit_prepared(rb, txn, commit_lsn);
+ rb->commit_prepared(rb, txn, commit_lsn, prepare_end_lsn, prepare_time);
else
rb->rollback_prepared(rb, txn, prepare_end_lsn, prepare_time);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b9a7a7f..63e19bc 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -966,27 +966,39 @@ apply_handle_commit_prepared(StringInfo s)
/* Compute GID for two_phase transactions. */
TwoPhaseTransactionGid(MySubscription->oid, prepare_data.xid,
gid, sizeof(gid));
-
- /* There is no transaction when COMMIT PREPARED is called */
- begin_replication_step();
-
/*
- * Update origin state so we can restart streaming from correct position
- * in case of crash.
+ * It is possible that we haven't received the prepare because
+ * the transaction did not have any changes relevant to this
+ * subscription and so was essentially an empty prepare. In this case,
+ * the walsender is optimized to drop the empty transaction and the
+ * accompanying prepare. Silently ignore if we don't find the prepared
+ * transaction.
*/
- replorigin_session_origin_lsn = prepare_data.end_lsn;
- replorigin_session_origin_timestamp = prepare_data.commit_time;
+ if (LookupGXact(gid, prepare_data.prepare_end_lsn,
+ prepare_data.prepare_time))
+ {
- FinishPreparedTransaction(gid, true);
- end_replication_step();
- CommitTransactionCommand();
+ /* There is no transaction when COMMIT PREPARED is called */
+ begin_replication_step();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data.commit_end_lsn;
+ replorigin_session_origin_timestamp = prepare_data.commit_time;
+
+ FinishPreparedTransaction(gid, true);
+ end_replication_step();
+ CommitTransactionCommand();
+ }
pgstat_report_stat(false);
- store_flush_position(prepare_data.end_lsn);
+ store_flush_position(prepare_data.commit_end_lsn);
in_remote_transaction = false;
/* Process any tables that are being synchronized in parallel. */
- process_syncing_tables(prepare_data.end_lsn);
+ process_syncing_tables(prepare_data.commit_end_lsn);
pgstat_report_activity(STATE_IDLE, NULL);
}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index e4314af..66496b0 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -56,7 +56,9 @@ static void pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx,
static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
- ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -130,6 +132,11 @@ typedef struct RelationSyncEntry
TupleConversionMap *map;
} RelationSyncEntry;
+typedef struct PGOutputTxnData
+{
+ bool sent_begin_txn; /* flag indicating whether begin has been sent */
+} PGOutputTxnData;
+
/* Map used to remember which relation schemas we sent. */
static HTAB *RelationSyncCache = NULL;
@@ -405,15 +412,40 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
/*
- * BEGIN callback
+ * BEGIN callback.
+ *
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
*/
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputTxnData *txndata = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));
+
+ txndata->sent_begin_txn = false;
+ txn->output_plugin_private = txndata;
+}
+
+/*
+ * Send BEGIN.
+ * This is where the BEGIN is actually sent. This is called
+ * while processing the first change of the transaction.
+ */
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(txndata);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin(ctx->out, txn);
+ txndata->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -428,23 +460,67 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ bool skip;
+
+ Assert(txndata);
+
+ /*
+ * If a BEGIN message was not yet sent, then it means there were no relevant
+ * changes encountered, so we can skip the COMMIT message too.
+ */
+ skip = !txndata->sent_begin_txn;
+ pfree(txndata);
+ txn->output_plugin_private = NULL;
OutputPluginUpdateProgress(ctx);
+ if (skip)
+ {
+ elog(DEBUG1, "skipping replication of an empty transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
}
/*
- * BEGIN PREPARE callback
+ * BEGIN PREPARE callback.
+ *
+ * Don't send BEGIN PREPARE message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN PREPARE and COMMIT PREPARED messages
+ * to subscribers, using bandwidth on something with little/no use
+ * for logical replication.
*/
static void
pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ /*
+ * Delegate to assign the begin sent flag as false same as for the
+ * BEGIN message.
+ */
+ pgoutput_begin_txn(ctx, txn);
+}
+
+/*
+ * Send BEGIN PREPARE.
+ * This is where the BEGIN PREPARE is actually sent. This is called while
+ * processing the first change of the prepared transaction.
+ */
+static void
+pgoutput_begin_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(txndata);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin_prepare(ctx->out, txn);
+ txndata->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -459,8 +535,21 @@ static void
pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(txndata);
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN was not yet sent, then it means there were no relevant
+ * changes encountered, so we can skip the PREPARE message too.
+ */
+ if (!txndata->sent_begin_txn)
+ {
+ elog(DEBUG1, "skipping replication of an empty prepared transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
@@ -471,12 +560,34 @@ pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
*/
static void
pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the COMMIT PREPARED
+ * messsage too.
+ */
+ if (txndata)
+ {
+ bool skip = !txndata->sent_begin_txn;
+ pfree(txndata);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "skipping replication of COMMIT PREPARED of an empty transaction");
+ return;
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
OutputPluginWrite(ctx, true);
}
@@ -489,8 +600,27 @@ pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_end_lsn,
TimestampTz prepare_time)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the ROLLBACK PREPARED
+ * messsage too.
+ */
+ if (txndata)
+ {
+ bool skip = !txndata->sent_begin_txn;
+ pfree(txndata);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "skipping replication of ROLLBACK of an empty transaction");
+ return;
+ }
+ }
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_rollback_prepared(ctx->out, txn, prepare_end_lsn,
prepare_time);
@@ -639,11 +769,15 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
TransactionId xid = InvalidTransactionId;
Relation ancestor = NULL;
+ /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
+ Assert(in_streaming || txndata);
+
if (!is_publishable_relation(relation))
return;
@@ -677,6 +811,17 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /*
+ * Output BEGIN / BEGIN PREPARE if we haven't yet, unless streaming.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -779,6 +924,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
int nrelations, Relation relations[], ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
int i;
@@ -786,6 +932,9 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Oid *relids;
TransactionId xid = InvalidTransactionId;
+ /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
+ Assert(in_streaming || txndata);
+
/* Remember the xid for the change in streaming mode. See pgoutput_change. */
if (in_streaming)
xid = change->txn->xid;
@@ -822,6 +971,18 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /*
+ * output BEGIN / BEGIN PREPARE if we haven't yet,
+ * while streaming no need to send BEGIN / BEGIN PREPARE.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -854,6 +1015,24 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /*
+ * Output BEGIN if we haven't yet.
+ * Avoid for streaming and non-transactional messages
+ */
+ if (!in_streaming && transactional)
+ {
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(txndata);
+ if (!txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 63de90d..0be0a07 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -148,8 +148,10 @@ typedef struct LogicalRepPreparedTxnData
*/
typedef struct LogicalRepCommitPreparedTxnData
{
+ XLogRecPtr prepare_end_lsn;
XLogRecPtr commit_lsn;
- XLogRecPtr end_lsn;
+ XLogRecPtr commit_end_lsn;
+ TimestampTz prepare_time;
TimestampTz commit_time;
TransactionId xid;
char gid[GIDSIZE];
@@ -188,7 +190,9 @@ extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
extern void logicalrep_read_prepare(StringInfo in,
LogicalRepPreparedTxnData *prepare_data);
extern void logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
extern void logicalrep_read_commit_prepared(StringInfo in,
LogicalRepCommitPreparedTxnData *prepare_data);
extern void logicalrep_write_rollback_prepared(StringInfo out, ReorderBufferTXN *txn,
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 810495e..0d28306 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -128,7 +128,9 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
*/
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/*
* Called for ROLLBACK PREPARED.
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff7..11e2e1e 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -442,7 +442,9 @@ typedef void (*ReorderBufferPrepareCB) (ReorderBuffer *rb,
/* commit prepared callback signature */
typedef void (*ReorderBufferCommitPreparedCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/* rollback prepared callback signature */
typedef void (*ReorderBufferRollbackPreparedCB) (ReorderBuffer *rb,
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index 0e218e0..3d246be 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -87,9 +87,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is( $result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot'
);
diff --git a/src/test/subscription/t/021_twophase.pl b/src/test/subscription/t/021_twophase.pl
index c6ada92..b954630 100644
--- a/src/test/subscription/t/021_twophase.pl
+++ b/src/test/subscription/t/021_twophase.pl
@@ -6,7 +6,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 24;
+use Test::More tests => 25;
###############################
# Setup
@@ -318,10 +318,9 @@ $node_publisher->safe_psql('postgres', "
$node_publisher->wait_for_catchup($appname_copy);
-# Check that the transaction has been prepared on the subscriber, there will be 2
-# prepared transactions for the 2 subscriptions.
+# Check that the transaction has been prepared on the subscriber
$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts;");
-is($result, qq(2), 'transaction is prepared on subscriber');
+is($result, qq(1), 'transaction is prepared on subscriber');
# Now commit the insert and verify that it IS replicated
$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'mygid';");
@@ -337,6 +336,45 @@ is($result, qq(2), 'replicated data in subscriber table');
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_copy;");
$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_copy;");
+##############################
+# Test empty prepares
+##############################
+
+# create a table that is not part of the publication
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab_nopub (a int PRIMARY KEY)");
+
+# disable the subscription so that we can peek at the slot
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE");
+
+# wait for the replication slot to become inactive in the publisher
+$node_publisher->poll_query_until('postgres',
+ "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND active='f'", 1);
+
+# create a transaction with no changes relevant to the slot
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_nopub SELECT generate_series(1,10);
+ PREPARE TRANSACTION 'empty_transaction';
+ COMMIT PREPARED 'empty_transaction';");
+
+# peek at the contents of the slot
+$result = $node_publisher->safe_psql(
+ 'postgres', qq(
+ SELECT get_byte(data, 0)
+ FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL,
+ 'proto_version', '3',
+ 'publication_names', 'tap_pub')
+));
+
+# the empty transaction should be skipped
+is($result, qq(),
+ 'empty transaction dropped on slot'
+);
+
+# enable the subscription to test cleanup
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE");
+
###############################
# check all the cleanup
###############################
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37cf4b2..75639ab 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1606,6 +1606,7 @@ PGMessageField
PGModuleMagicFunction
PGNoticeHooks
PGOutputData
+PGOutputTxnData
PGPROC
PGP_CFB
PGP_Context
--
1.8.3.1
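The core idea of the v9 patch above can be modelled in isolation. pgoutput_begin_txn() no longer writes BEGIN. The first publishable change sends it and sets sent_begin_txn. pgoutput_commit_txn() drops COMMIT when that flag was never set.

The following is a minimal sketch of that state machine, not the pgoutput code itself. The Output buffer, the single-letter message bytes and the callback names are hypothetical stand-ins. Streamed transactions, which the patch explicitly leaves alone (the `in_streaming` checks), are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the output stream: records one protocol
 * message-type byte per message ('B' = BEGIN, 'I' = INSERT, 'C' = COMMIT). */
typedef struct Output
{
    char    buf[64];
    size_t  len;
} Output;

static void
output_init(Output *out)
{
    memset(out, 0, sizeof(*out));
}

static void
emit(Output *out, char msgtype)
{
    /* Keep room for the terminating NUL so buf is always a valid string. */
    assert(out->len + 1 < sizeof(out->buf));
    out->buf[out->len++] = msgtype;
    out->buf[out->len] = '\0';
}

/* Per-transaction state, playing the role of PGOutputTxnData. */
typedef struct TxnState
{
    bool    sent_begin_txn;
} TxnState;

/* BEGIN callback: send nothing, only record that BEGIN is still pending. */
static void
begin_cb(TxnState *txn)
{
    txn->sent_begin_txn = false;
}

/* Change callback: unpublished changes are dropped; the first published
 * change triggers the deferred BEGIN before the change itself. */
static void
change_cb(Output *out, TxnState *txn, bool published)
{
    if (!published)
        return;
    if (!txn->sent_begin_txn)
    {
        emit(out, 'B');
        txn->sent_begin_txn = true;
    }
    emit(out, 'I');
}

/* COMMIT callback: if BEGIN never went out, the transaction was empty
 * for this subscriber, so COMMIT is skipped as well. */
static void
commit_cb(Output *out, TxnState *txn)
{
    if (!txn->sent_begin_txn)
        return;
    emit(out, 'C');
}
```

In this model, a transaction that only touches unpublished tables produces no bytes at all. A transaction with two published inserts produces `BIIC`.

Keeping the flag in per-transaction plugin-private state, as the patch does with `txn->output_plugin_private`, means the reorder buffer needs no changes. Only the output plugin decides whether a transaction was empty for its publications.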
I have reviewed the v9 patch and my feedback comments are below:
//////////
1. Applying v9 gave multiple whitespace warnings
$ git apply v9-0001-Skip-empty-transactions-for-logical-replication.patch
v9-0001-Skip-empty-transactions-for-logical-replication.patch:479:
indent with spaces.
* If the BEGIN PREPARE was not yet sent, then it means there were no
v9-0001-Skip-empty-transactions-for-logical-replication.patch:480:
indent with spaces.
* relevant changes encountered, so we can skip the ROLLBACK PREPARED
v9-0001-Skip-empty-transactions-for-logical-replication.patch:481:
indent with spaces.
* messsage too.
v9-0001-Skip-empty-transactions-for-logical-replication.patch:482:
indent with spaces.
*/
warning: 4 lines add whitespace errors.
------
2. Commit message - wording
pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions which were skipped.
=>
Is that correct? Or did you mean to say:
AFTER
pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions that are empty.
------
3. src/backend/replication/pgoutput/pgoutput.c - typo
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the COMMIT PREPARED
+ * messsage too.
+ */
Typo: "messsage" --> "message"
(NOTE this same typo is in 2 places)
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thu, Jul 22, 2021 at 11:37 PM Ajin Cherian <itsajin@gmail.com> wrote:
I have some minor comments on the v9 patch:
(1) Several whitespace warnings on patch application
(2) Suggested patch comment change:
BEFORE:
The current logical replication behaviour is to send every transaction to
subscriber even though the transaction is empty (because it does not
AFTER:
The current logical replication behaviour is to send every transaction to
subscriber even though the transaction might be empty (because it does not
(3) Comment needed for added struct defn:
typedef struct PGOutputTxnData
(4) Improve comment.
Can you add a comma (or add words) in the below sentence, so we know
how to read it?
+ /*
+ * Delegate to assign the begin sent flag as false same as for the
+ * BEGIN message.
+ */
Regards,
Greg Nancarrow
Fujitsu Australia
On Fri, Jul 23, 2021 at 10:26 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Thu, Jul 22, 2021 at 11:37 PM Ajin Cherian <itsajin@gmail.com> wrote:
I have some minor comments on the v9 patch:
(1) Several whitespace warnings on patch application
Fixed.
(2) Suggested patch comment change:
BEFORE:
The current logical replication behaviour is to send every transaction to
subscriber even though the transaction is empty (because it does not
AFTER:
The current logical replication behaviour is to send every transaction to
subscriber even though the transaction might be empty (because it does not
Changed accordingly.
(3) Comment needed for added struct defn:
typedef struct PGOutputTxnData
Added.
(4) Improve comment.
Can you add a comma (or add words) in the below sentence, so we know
how to read it?
Updated.
regards,
Ajin Cherian
Fujitsu Australia
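The v10 patch attached below keeps the subscriber-side change from v9. apply_handle_commit_prepared() now calls LookupGXact() with the GID, prepare_end_lsn and prepare_time. It silently skips the COMMIT PREPARED when no matching prepared transaction exists, because the publisher may have dropped an empty prepare. Per the documentation hunk, the GID alone is not enough, since the downstream node may already have a prepared transaction with the same identifier.

A minimal sketch of that matching rule follows, assuming a small in-memory table instead of the real two-phase state. PreparedXact, record_prepare(), lookup_prepared(), handle_commit_prepared() and the SKETCH_* sizes are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for XLogRecPtr and TimestampTz. */
typedef uint64_t LsnSketch;
typedef int64_t TimestampSketch;

#define SKETCH_GIDSIZE 64        /* arbitrary size for this sketch */
#define SKETCH_MAX_PREPARED 8

typedef struct PreparedXact
{
    bool            in_use;
    char            gid[SKETCH_GIDSIZE];
    LsnSketch       prepare_end_lsn;
    TimestampSketch prepare_time;
} PreparedXact;

/* Hypothetical list of transactions prepared on the subscriber. */
static PreparedXact prepared[SKETCH_MAX_PREPARED];

/* Remember a PREPARE applied on the subscriber; returns false if full. */
static bool
record_prepare(const char *gid, LsnSketch prepare_end_lsn,
               TimestampSketch prepare_time)
{
    assert(gid != NULL && strlen(gid) < SKETCH_GIDSIZE);
    for (int i = 0; i < SKETCH_MAX_PREPARED; i++)
    {
        if (!prepared[i].in_use)
        {
            prepared[i].in_use = true;
            strcpy(prepared[i].gid, gid);
            prepared[i].prepare_end_lsn = prepare_end_lsn;
            prepared[i].prepare_time = prepare_time;
            return true;
        }
    }
    return false;
}

/* Match on GID, prepare end LSN and prepare time, not on the GID alone. */
static int
lookup_prepared(const char *gid, LsnSketch prepare_end_lsn,
                TimestampSketch prepare_time)
{
    for (int i = 0; i < SKETCH_MAX_PREPARED; i++)
    {
        if (prepared[i].in_use &&
            strcmp(prepared[i].gid, gid) == 0 &&
            prepared[i].prepare_end_lsn == prepare_end_lsn &&
            prepared[i].prepare_time == prepare_time)
            return i;
    }
    return -1;
}

/* COMMIT PREPARED handler: finish the transaction only if its PREPARE was
 * applied here; otherwise ignore it. Returns true if something committed. */
static bool
handle_commit_prepared(const char *gid, LsnSketch prepare_end_lsn,
                       TimestampSketch prepare_time)
{
    int idx = lookup_prepared(gid, prepare_end_lsn, prepare_time);

    if (idx < 0)
        return false;
    prepared[idx].in_use = false;   /* stands in for FinishPreparedTransaction */
    return true;
}
```

A COMMIT PREPARED whose prepare end LSN or prepare time differs from the stored entry is ignored in this model, even when the GID matches. A COMMIT PREPARED for a prepare that was never sent is ignored as well.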
Attachments:
v10-0001-Skip-empty-transactions-for-logical-replication.patch
From f60263204b86df83b8a62294ba587085f75504b3 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 21 Jul 2021 06:29:57 -0400
Subject: [PATCH v10] Skip empty transactions for logical replication.
The current logical replication behaviour is to send every transaction to
subscriber even though the transaction might be empty (because it does not
contain changes from the selected publications). It is a waste of CPU
cycles and network bandwidth to build/transmit these empty transactions.
This patch addresses the above problem by postponing the BEGIN / BEGIN
PREPARE messages until the first change is encountered.
If (when processing a COMMIT / PREPARE message) we find there had been
no other change for that transaction, then do not send the COMMIT /
PREPARE message. This means that pgoutput will skip BEGIN / COMMIT
or BEGIN PREPARE / PREPARE messages for transactions that are empty.
pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions that are empty.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
contrib/test_decoding/test_decoding.c | 7 +-
doc/src/sgml/logicaldecoding.sgml | 13 +-
doc/src/sgml/protocol.sgml | 15 ++
src/backend/replication/logical/logical.c | 9 +-
src/backend/replication/logical/proto.c | 16 +-
src/backend/replication/logical/reorderbuffer.c | 2 +-
src/backend/replication/logical/worker.c | 38 +++--
src/backend/replication/pgoutput/pgoutput.c | 195 +++++++++++++++++++++++-
src/include/replication/logicalproto.h | 8 +-
src/include/replication/output_plugin.h | 4 +-
src/include/replication/reorderbuffer.h | 4 +-
src/test/subscription/t/020_messages.pl | 5 +-
src/test/subscription/t/021_twophase.pl | 46 +++++-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 322 insertions(+), 41 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e5cd84e..408dbfc 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -86,7 +86,9 @@ static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_lsn);
static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -390,7 +392,8 @@ pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
/* COMMIT PREPARED callback */
static void
pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
TestDecodingData *data = ctx->output_plugin_private;
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 89b8090..beb09ce 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
The required <function>commit_prepared_cb</function> callback is called
whenever a transaction <command>COMMIT PREPARED</command> has been decoded.
The <parameter>gid</parameter> field, which is part of the
- <parameter>txn</parameter> parameter, can be used in this callback.
+ <parameter>txn</parameter> parameter, can be used in this callback. The
+ parameters <parameter>prepare_end_lsn</parameter> and
+ <parameter>prepare_time</parameter> can be used to check if the plugin
+ has received this <command>PREPARE TRANSACTION</command> command or not.
+ If yes, it can commit the transaction, otherwise, it can skip the commit.
+ The <parameter>gid</parameter> alone is not sufficient to determine this
+ because the downstream node may already have a prepared transaction with the
+ same identifier.
<programlisting>
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
</programlisting>
</para>
</sect3>
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index e8cb78f..5e68dfb 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -7550,6 +7550,13 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ The end LSN of the prepare.
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
The LSN of the commit prepared.
</para></listitem>
</varlistentry>
@@ -7564,6 +7571,14 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ Prepare timestamp of the transaction. The value is in number
+ of microseconds since PostgreSQL epoch (2000-01-01).
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
Commit timestamp of the transaction. The value is in number
of microseconds since PostgreSQL epoch (2000-01-01).
</para></listitem>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index d61ef4c..67c762a 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -63,7 +63,8 @@ static void begin_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn
static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn);
static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn, TimestampTz prepare_time);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -936,7 +937,8 @@ prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
static void
commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
LogicalDecodingContext *ctx = cache->private_data;
LogicalErrorCallbackState state;
@@ -972,7 +974,8 @@ commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
"commit_prepared_cb")));
/* do the actual work: call callback */
- ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index a245252..47a7489 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -206,7 +206,9 @@ logicalrep_read_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data)
*/
void
logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
uint8 flags = 0;
@@ -222,8 +224,10 @@ logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, flags);
/* send fields */
+ pq_sendint64(out, prepare_end_lsn);
pq_sendint64(out, commit_lsn);
pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, prepare_time);
pq_sendint64(out, txn->xact_time.commit_time);
pq_sendint32(out, txn->xid);
@@ -244,12 +248,16 @@ logicalrep_read_commit_prepared(StringInfo in, LogicalRepCommitPreparedTxnData *
elog(ERROR, "unrecognized flags %u in commit prepared message", flags);
/* read fields */
+ prepare_data->prepare_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->prepare_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR, "prepare_end_lsn is not set in commit prepared message");
prepare_data->commit_lsn = pq_getmsgint64(in);
if (prepare_data->commit_lsn == InvalidXLogRecPtr)
elog(ERROR, "commit_lsn is not set in commit prepared message");
- prepare_data->end_lsn = pq_getmsgint64(in);
- if (prepare_data->end_lsn == InvalidXLogRecPtr)
- elog(ERROR, "end_lsn is not set in commit prepared message");
+ prepare_data->commit_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->commit_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR, "commit_end_lsn is not set in commit prepared message");
+ prepare_data->prepare_time = pq_getmsgint64(in);
prepare_data->commit_time = pq_getmsgint64(in);
prepare_data->xid = pq_getmsgint(in, 4);
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7378beb..5a707e2 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2794,7 +2794,7 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
txn->origin_lsn = origin_lsn;
if (is_commit)
- rb->commit_prepared(rb, txn, commit_lsn);
+ rb->commit_prepared(rb, txn, commit_lsn, prepare_end_lsn, prepare_time);
else
rb->rollback_prepared(rb, txn, prepare_end_lsn, prepare_time);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b9a7a7f..63e19bc 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -966,27 +966,39 @@ apply_handle_commit_prepared(StringInfo s)
/* Compute GID for two_phase transactions. */
TwoPhaseTransactionGid(MySubscription->oid, prepare_data.xid,
gid, sizeof(gid));
-
- /* There is no transaction when COMMIT PREPARED is called */
- begin_replication_step();
-
/*
- * Update origin state so we can restart streaming from correct position
- * in case of crash.
+ * It is possible that we haven't received the prepare because
+ * the transaction did not have any changes relevant to this
+ * subscription and so was essentially an empty prepare. In this case,
+ * the walsender is optimized to drop the empty transaction and the
+ * accompanying prepare. Silently ignore if we don't find the prepared
+ * transaction.
*/
- replorigin_session_origin_lsn = prepare_data.end_lsn;
- replorigin_session_origin_timestamp = prepare_data.commit_time;
+ if (LookupGXact(gid, prepare_data.prepare_end_lsn,
+ prepare_data.prepare_time))
+ {
- FinishPreparedTransaction(gid, true);
- end_replication_step();
- CommitTransactionCommand();
+ /* There is no transaction when COMMIT PREPARED is called */
+ begin_replication_step();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data.commit_end_lsn;
+ replorigin_session_origin_timestamp = prepare_data.commit_time;
+
+ FinishPreparedTransaction(gid, true);
+ end_replication_step();
+ CommitTransactionCommand();
+ }
pgstat_report_stat(false);
- store_flush_position(prepare_data.end_lsn);
+ store_flush_position(prepare_data.commit_end_lsn);
in_remote_transaction = false;
/* Process any tables that are being synchronized in parallel. */
- process_syncing_tables(prepare_data.end_lsn);
+ process_syncing_tables(prepare_data.commit_end_lsn);
pgstat_report_activity(STATE_IDLE, NULL);
}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index e4314af..2cdd9aa 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -56,7 +56,9 @@ static void pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx,
static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
- ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -130,6 +132,17 @@ typedef struct RelationSyncEntry
TupleConversionMap *map;
} RelationSyncEntry;
+/*
+ * Maintain a per-transaction level variable to track whether the
+ * transaction has sent BEGIN or BEGIN PREPARE. BEGIN or BEGIN PREPARE
+ * is only sent when the first change in a transaction is processed.
+ * This make it possible to skip transactions that are empty.
+ */
+typedef struct PGOutputTxnData
+{
+ bool sent_begin_txn; /* flag indicating whether begin has been sent */
+} PGOutputTxnData;
+
/* Map used to remember which relation schemas we sent. */
static HTAB *RelationSyncCache = NULL;
@@ -405,15 +418,40 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
/*
- * BEGIN callback
+ * BEGIN callback.
+ *
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
*/
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputTxnData *txndata = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));
+
+ txndata->sent_begin_txn = false;
+ txn->output_plugin_private = txndata;
+}
+
+/*
+ * Send BEGIN.
+ * This is where the BEGIN is actually sent. This is called
+ * while processing the first change of the transaction.
+ */
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(txndata);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin(ctx->out, txn);
+ txndata->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -428,23 +466,67 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ bool skip;
+
+ Assert(txndata);
+
+ /*
+ * If a BEGIN message was not yet sent, then it means there were no relevant
+ * changes encountered, so we can skip the COMMIT message too.
+ */
+ skip = !txndata->sent_begin_txn;
+ pfree(txndata);
+ txn->output_plugin_private = NULL;
OutputPluginUpdateProgress(ctx);
+ if (skip)
+ {
+ elog(DEBUG1, "skipping replication of an empty transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
}
/*
- * BEGIN PREPARE callback
+ * BEGIN PREPARE callback.
+ *
+ * Don't send BEGIN PREPARE message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN PREPARE and COMMIT PREPARED messages
+ * to subscribers, using bandwidth on something with little/no use
+ * for logical replication.
*/
static void
pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ /*
+ * Delegate to assign the begin sent flag as false, same as for the
+ * BEGIN message.
+ */
+ pgoutput_begin_txn(ctx, txn);
+}
+
+/*
+ * Send BEGIN PREPARE.
+ * This is where the BEGIN PREPARE is actually sent. This is called while
+ * processing the first change of the prepared transaction.
+ */
+static void
+pgoutput_begin_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(txndata);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin_prepare(ctx->out, txn);
+ txndata->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -459,8 +541,21 @@ static void
pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(txndata);
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN was not yet sent, then it means there were no relevant
+ * changes encountered, so we can skip the PREPARE message too.
+ */
+ if (!txndata->sent_begin_txn)
+ {
+ elog(DEBUG1, "skipping replication of an empty prepared transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
@@ -471,12 +566,34 @@ pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
*/
static void
pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the COMMIT PREPARED
+ * message too.
+ */
+ if (txndata)
+ {
+ bool skip = !txndata->sent_begin_txn;
+ pfree(txndata);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "skipping replication of COMMIT PREPARED of an empty transaction");
+ return;
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
OutputPluginWrite(ctx, true);
}
@@ -489,8 +606,27 @@ pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_end_lsn,
TimestampTz prepare_time)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the ROLLBACK PREPARED
+ * message too.
+ */
+ if (txndata)
+ {
+ bool skip = !txndata->sent_begin_txn;
+ pfree(txndata);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "skipping replication of ROLLBACK of an empty transaction");
+ return;
+ }
+ }
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_rollback_prepared(ctx->out, txn, prepare_end_lsn,
prepare_time);
@@ -639,11 +775,15 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
TransactionId xid = InvalidTransactionId;
Relation ancestor = NULL;
+ /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
+ Assert(in_streaming || txndata);
+
if (!is_publishable_relation(relation))
return;
@@ -677,6 +817,17 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /*
+ * Output BEGIN / BEGIN PREPARE if we haven't yet, unless streaming.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -779,6 +930,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
int nrelations, Relation relations[], ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
int i;
@@ -786,6 +938,9 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Oid *relids;
TransactionId xid = InvalidTransactionId;
+ /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */
+ Assert(in_streaming || txndata);
+
/* Remember the xid for the change in streaming mode. See pgoutput_change. */
if (in_streaming)
xid = change->txn->xid;
@@ -822,6 +977,18 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /*
+ * output BEGIN / BEGIN PREPARE if we haven't yet,
+ * while streaming no need to send BEGIN / BEGIN PREPARE.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -854,6 +1021,24 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /*
+ * Output BEGIN if we haven't yet.
+ * Avoid for streaming and non-transactional messages
+ */
+ if (!in_streaming && transactional)
+ {
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(txndata);
+ if (!txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 63de90d..0be0a07 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -148,8 +148,10 @@ typedef struct LogicalRepPreparedTxnData
*/
typedef struct LogicalRepCommitPreparedTxnData
{
+ XLogRecPtr prepare_end_lsn;
XLogRecPtr commit_lsn;
- XLogRecPtr end_lsn;
+ XLogRecPtr commit_end_lsn;
+ TimestampTz prepare_time;
TimestampTz commit_time;
TransactionId xid;
char gid[GIDSIZE];
@@ -188,7 +190,9 @@ extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
extern void logicalrep_read_prepare(StringInfo in,
LogicalRepPreparedTxnData *prepare_data);
extern void logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
extern void logicalrep_read_commit_prepared(StringInfo in,
LogicalRepCommitPreparedTxnData *prepare_data);
extern void logicalrep_write_rollback_prepared(StringInfo out, ReorderBufferTXN *txn,
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 810495e..0d28306 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -128,7 +128,9 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
*/
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/*
* Called for ROLLBACK PREPARED.
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff7..11e2e1e 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -442,7 +442,9 @@ typedef void (*ReorderBufferPrepareCB) (ReorderBuffer *rb,
/* commit prepared callback signature */
typedef void (*ReorderBufferCommitPreparedCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/* rollback prepared callback signature */
typedef void (*ReorderBufferRollbackPreparedCB) (ReorderBuffer *rb,
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index 0e218e0..3d246be 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -87,9 +87,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is( $result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot'
);
diff --git a/src/test/subscription/t/021_twophase.pl b/src/test/subscription/t/021_twophase.pl
index c6ada92..b954630 100644
--- a/src/test/subscription/t/021_twophase.pl
+++ b/src/test/subscription/t/021_twophase.pl
@@ -6,7 +6,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 24;
+use Test::More tests => 25;
###############################
# Setup
@@ -318,10 +318,9 @@ $node_publisher->safe_psql('postgres', "
$node_publisher->wait_for_catchup($appname_copy);
-# Check that the transaction has been prepared on the subscriber, there will be 2
-# prepared transactions for the 2 subscriptions.
+# Check that the transaction has been prepared on the subscriber
$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts;");
-is($result, qq(2), 'transaction is prepared on subscriber');
+is($result, qq(1), 'transaction is prepared on subscriber');
# Now commit the insert and verify that it IS replicated
$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'mygid';");
@@ -337,6 +336,45 @@ is($result, qq(2), 'replicated data in subscriber table');
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_copy;");
$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_copy;");
+##############################
+# Test empty prepares
+##############################
+
+# create a table that is not part of the publication
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab_nopub (a int PRIMARY KEY)");
+
+# disable the subscription so that we can peek at the slot
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE");
+
+# wait for the replication slot to become inactive in the publisher
+$node_publisher->poll_query_until('postgres',
+ "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND active='f'", 1);
+
+# create a transaction with no changes relevant to the slot
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_nopub SELECT generate_series(1,10);
+ PREPARE TRANSACTION 'empty_transaction';
+ COMMIT PREPARED 'empty_transaction';");
+
+# peek at the contents of the slot
+$result = $node_publisher->safe_psql(
+ 'postgres', qq(
+ SELECT get_byte(data, 0)
+ FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL,
+ 'proto_version', '3',
+ 'publication_names', 'tap_pub')
+));
+
+# the empty transaction should be skipped
+is($result, qq(),
+ 'empty transaction dropped on slot'
+);
+
+# enable the subscription to test cleanup
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE");
+
###############################
# check all the cleanup
###############################
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37cf4b2..75639ab 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1606,6 +1606,7 @@ PGMessageField
PGModuleMagicFunction
PGNoticeHooks
PGOutputData
+PGOutputTxnData
PGPROC
PGP_CFB
PGP_Context
--
1.8.3.1
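The pgoutput.c hunks above defer BEGIN until it is needed. pgoutput_begin_txn() only allocates PGOutputTxnData. pgoutput_change() sends BEGIN just before the first published change. pgoutput_commit_txn() drops COMMIT when sent_begin_txn is still false. This idea can be modelled outside the server. The following is a minimal standalone sketch, not PostgreSQL code. Every Sketch*/sketch_* name is hypothetical, and instead of writing to ctx->out the "callbacks" just record one character per protocol message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified model of pgoutput's deferred BEGIN. Messages are
 * recorded as single characters: 'B' (BEGIN), 'I' (a published change) and
 * 'C' (COMMIT). None of these names exist in PostgreSQL.
 */
typedef struct SketchTxnData
{
    bool        sent_begin_txn;     /* has BEGIN been emitted for this txn? */
} SketchTxnData;

typedef struct SketchOutput
{
    char        msgs[64];           /* emitted message types, NUL-terminated */
    size_t      len;
} SketchOutput;

static void
sketch_emit(SketchOutput *out, char msg)
{
    /* keep room for the terminating NUL; silently drop on overflow */
    if (out->len + 1 < sizeof(out->msgs))
    {
        out->msgs[out->len++] = msg;
        out->msgs[out->len] = '\0';
    }
}

/* BEGIN callback: only initialize per-transaction state, send nothing. */
static void
sketch_begin_txn(SketchTxnData *txndata)
{
    txndata->sent_begin_txn = false;
}

/* Change callback: send BEGIN lazily, just before the first published change. */
static void
sketch_change(SketchOutput *out, SketchTxnData *txndata, bool publishable)
{
    assert(txndata != NULL);
    if (!publishable)
        return;                     /* table is not part of the publication */
    if (!txndata->sent_begin_txn)
    {
        sketch_emit(out, 'B');
        txndata->sent_begin_txn = true;
    }
    sketch_emit(out, 'I');
}

/* COMMIT callback: if BEGIN was never sent, the COMMIT is dropped too. */
static void
sketch_commit_txn(SketchOutput *out, const SketchTxnData *txndata)
{
    if (!txndata->sent_begin_txn)
        return;                     /* empty transaction: neither 'B' nor 'C' */
    sketch_emit(out, 'C');
}

/* Decode one transaction; publishable[i] says whether change i is published. */
static const char *
sketch_messages_for(const bool *publishable, size_t nchanges)
{
    static SketchOutput out;
    SketchTxnData txndata;

    memset(&out, 0, sizeof(out));
    sketch_begin_txn(&txndata);
    for (size_t i = 0; i < nchanges; i++)
        sketch_change(&out, &txndata, publishable[i]);
    sketch_commit_txn(&out, &txndata);
    return out.msgs;
}
```

In this model, a transaction that touches only unpublished tables produces no messages at all. A transaction with one unpublished and two published changes produces "BIIC". This matches the updated expectation in 020_messages.pl, where the empty transaction no longer yields the 'B' and 'C' rows.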
On Fri, Jul 23, 2021 at 10:13 AM Peter Smith <smithpb2250@gmail.com> wrote:
I have reviewed the v9 patch and my feedback comments are below:
//////////
1. Apply v9 gave multiple whitespace warnings
Fixed.
------
2. Commit comment - wording
pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions which were skipped.
=>
Is that correct? Or did you mean to say:
AFTER
pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions that are empty.
------
Updated.
3. src/backend/replication/pgoutput/pgoutput.c - typo
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the COMMIT PREPARED
+ * messsage too.
+ */
Typo: "messsage" --> "message"
(NOTE this same typo is in 2 places)
Fixed.
I have made these changes in v10 of the patch.
regards,
Ajin Cherian
Fujitsu Australia
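With the patch, apply_handle_commit_prepared() on the subscriber commits only if LookupGXact() finds a prepared transaction that matches the gid together with prepare_end_lsn and prepare_time. Otherwise it assumes the PREPARE was skipped as empty and silently does nothing. As the logicaldecoding.sgml text in the patch explains, the gid alone is not enough. The sketch below models only that matching rule. It is not the real LookupGXact(), whose internals are not part of this excerpt. SketchPreparedXact, sketch_lookup_gxact() and the demo values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t SketchLSN;         /* stand-in for XLogRecPtr */
typedef int64_t SketchTimestamp;    /* stand-in for TimestampTz */

/* One prepared transaction known to the subscriber (hypothetical). */
typedef struct SketchPreparedXact
{
    const char     *gid;
    SketchLSN       prepare_end_lsn;    /* end LSN of the PREPARE upstream */
    SketchTimestamp prepare_time;       /* prepare timestamp from upstream */
} SketchPreparedXact;

/*
 * Hypothetical stand-in for the LookupGXact(gid, prepare_end_lsn,
 * prepare_time) check: a match needs all three values, because the gid
 * alone may belong to an older, unrelated prepared transaction.
 */
static bool
sketch_lookup_gxact(const SketchPreparedXact *xacts, size_t n, const char *gid,
                    SketchLSN prepare_end_lsn, SketchTimestamp prepare_time)
{
    assert(xacts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(xacts[i].gid, gid) == 0 &&
            xacts[i].prepare_end_lsn == prepare_end_lsn &&
            xacts[i].prepare_time == prepare_time)
            return true;            /* our PREPARE arrived: COMMIT PREPARED it */
    }
    return false;                   /* PREPARE was skipped as empty: ignore */
}

/* Example subscriber state; the gid and values are made up. */
static const SketchPreparedXact sketch_subscriber_state[] = {
    {"demo_gid", 0x3000060, 700000000},
};
```

The second case in the checks below is the one the documentation warns about. The gid matches, but the prepare LSN and time differ, so the existing prepared transaction is not the one being committed.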
I have reviewed the v10 patch.
Apply / build / test was all OK.
Just one review comment:
//////////
1. Typo
@@ -130,6 +132,17 @@ typedef struct RelationSyncEntry
TupleConversionMap *map;
} RelationSyncEntry;
+/*
+ * Maintain a per-transaction level variable to track whether the
+ * transaction has sent BEGIN or BEGIN PREPARE. BEGIN or BEGIN PREPARE
+ * is only sent when the first change in a transaction is processed.
+ * This make it possible to skip transactions that are empty.
+ */
=>
typo: "make it possible" --> "makes it possible"
------
Kind Regards,
Peter Smith.
Fujitsu Australia
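The proto.c hunk changes the COMMIT PREPARED payload. After the flags byte, logicalrep_write_commit_prepared() now sends the following fields, in order:

- prepare_end_lsn
- commit_lsn
- txn->end_lsn (read back as commit_end_lsn)
- prepare_time
- commit_time
- xid

The reader rejects an invalid prepare_end_lsn, commit_lsn or commit_end_lsn. Below is a rough standalone sketch of that fixed-width layout. It assumes big-endian (network order) integers, as written by pq_sendint64() and pq_sendint32(). It omits the message type byte and anything sent after the xid, since neither appears in the hunk. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width part of COMMIT PREPARED as ordered in the patch (sketch). */
typedef struct SketchCommitPrepared
{
    uint8_t     flags;
    uint64_t    prepare_end_lsn;
    uint64_t    commit_lsn;
    uint64_t    commit_end_lsn;
    int64_t     prepare_time;
    int64_t     commit_time;
    uint32_t    xid;
} SketchCommitPrepared;

/* Append v in network (big-endian) byte order; returns the new offset. */
static size_t
sketch_put_int(uint8_t *buf, size_t off, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--)
        buf[off++] = (uint8_t) (v >> (i * 8));
    return off;
}

static uint64_t
sketch_get_int64(const uint8_t *buf, size_t off)
{
    uint64_t    v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | buf[off + i];
    return v;
}

/* Writes 1 + 5 * 8 + 4 = 45 bytes; buf must be at least that large. */
static size_t
sketch_write_commit_prepared(uint8_t *buf, const SketchCommitPrepared *m)
{
    size_t      off = 0;

    assert(buf != NULL && m != NULL);
    buf[off++] = m->flags;
    off = sketch_put_int(buf, off, m->prepare_end_lsn, 8);  /* new field */
    off = sketch_put_int(buf, off, m->commit_lsn, 8);
    off = sketch_put_int(buf, off, m->commit_end_lsn, 8);
    off = sketch_put_int(buf, off, (uint64_t) m->prepare_time, 8);  /* new */
    off = sketch_put_int(buf, off, (uint64_t) m->commit_time, 8);
    off = sketch_put_int(buf, off, m->xid, 4);
    return off;
}

/* Returns false on unknown flags or an invalid (zero) LSN, like the patch. */
static bool
sketch_read_commit_prepared(const uint8_t *buf, SketchCommitPrepared *m)
{
    m->flags = buf[0];
    if (m->flags != 0)
        return false;
    m->prepare_end_lsn = sketch_get_int64(buf, 1);
    m->commit_lsn = sketch_get_int64(buf, 9);
    m->commit_end_lsn = sketch_get_int64(buf, 17);
    m->prepare_time = (int64_t) sketch_get_int64(buf, 25);
    m->commit_time = (int64_t) sketch_get_int64(buf, 33);
    m->xid = ((uint32_t) buf[41] << 24) | ((uint32_t) buf[42] << 16) |
        ((uint32_t) buf[43] << 8) | (uint32_t) buf[44];
    return m->prepare_end_lsn != 0 && m->commit_lsn != 0 &&
        m->commit_end_lsn != 0;
}
```

The new fields are inserted at the front and in the middle of the message, so every later field moves. A reader that does not know about them would take the wrong bytes for commit_lsn and the following fields.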
On Fri, Jul 23, 2021 at 7:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
I have reviewed the v10 patch.
Apply / build / test was all OK.
Just one review comment:
//////////
1. Typo
@@ -130,6 +132,17 @@ typedef struct RelationSyncEntry
TupleConversionMap *map;
} RelationSyncEntry;
+/*
+ * Maintain a per-transaction level variable to track whether the
+ * transaction has sent BEGIN or BEGIN PREPARE. BEGIN or BEGIN PREPARE
+ * is only sent when the first change in a transaction is processed.
+ * This make it possible to skip transactions that are empty.
+ */
=>
typo: "make it possible" --> "makes it possible"
fixed.
regards,
Ajin Cherian
Fujitsu Australia
Attachments:
v11-0001-Skip-empty-transactions-for-logical-replication.patch (application/octet-stream)
From f60263204b86df83b8a62294ba587085f75504b3 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 21 Jul 2021 06:29:57 -0400
Subject: [PATCH v10] Skip empty transactions for logical replication.
The current logical replication behaviour is to send every transaction to the
subscriber, even though the transaction might be empty (because it does not
contain changes from the selected publications). It is a waste of CPU
cycles and network bandwidth to build/transmit these empty transactions.
This patch addresses the above problem by postponing the BEGIN / BEGIN
PREPARE messages until the first change is encountered.
If (when processing a COMMIT / PREPARE message) we find there had been
no other change for that transaction, then do not send the COMMIT /
PREPARE message. This means that pgoutput will skip BEGIN / COMMIT
or BEGIN PREPARE / PREPARE messages for transactions that are empty.
pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions that are empty.
Discussion:
https://postgr.es/m/CAMkU=1yohp9-dv48FLoSPrMqYEyyS5ZWkaZGD41RJr10xiNo_Q@mail.gmail.com
---
contrib/test_decoding/test_decoding.c | 7 +-
doc/src/sgml/logicaldecoding.sgml | 13 +-
doc/src/sgml/protocol.sgml | 15 ++
src/backend/replication/logical/logical.c | 9 +-
src/backend/replication/logical/proto.c | 16 +-
src/backend/replication/logical/reorderbuffer.c | 2 +-
src/backend/replication/logical/worker.c | 38 +++--
src/backend/replication/pgoutput/pgoutput.c | 195 +++++++++++++++++++++++-
src/include/replication/logicalproto.h | 8 +-
src/include/replication/output_plugin.h | 4 +-
src/include/replication/reorderbuffer.h | 4 +-
src/test/subscription/t/020_messages.pl | 5 +-
src/test/subscription/t/021_twophase.pl | 46 +++++-
src/tools/pgindent/typedefs.list | 1 +
14 files changed, 322 insertions(+), 41 deletions(-)
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index e5cd84e..408dbfc 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -86,7 +86,9 @@ static void pg_decode_prepare_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_lsn);
static void pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -390,7 +392,8 @@ pg_decode_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
/* COMMIT PREPARED callback */
static void
pg_decode_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
TestDecodingData *data = ctx->output_plugin_private;
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 89b8090..beb09ce 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
The required <function>commit_prepared_cb</function> callback is called
whenever a transaction <command>COMMIT PREPARED</command> has been decoded.
The <parameter>gid</parameter> field, which is part of the
- <parameter>txn</parameter> parameter, can be used in this callback.
+ <parameter>txn</parameter> parameter, can be used in this callback. The
+ parameters <parameter>prepare_end_lsn</parameter> and
+ <parameter>prepare_time</parameter> can be used to check if the plugin
+ has received this <command>PREPARE TRANSACTION</command> command or not.
+ If yes, it can commit the transaction, otherwise, it can skip the commit.
+ The <parameter>gid</parameter> alone is not sufficient to determine this
+ because the downstream node may already have a prepared transaction with the
+ same identifier.
<programlisting>
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
</programlisting>
</para>
</sect3>
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index e8cb78f..5e68dfb 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -7550,6 +7550,13 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ The end LSN of the prepare.
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
The LSN of the commit prepared.
</para></listitem>
</varlistentry>
@@ -7564,6 +7571,14 @@ are available since protocol version 3.
<varlistentry>
<term>Int64</term>
<listitem><para>
+ Prepare timestamp of the transaction. The value is in number
+ of microseconds since PostgreSQL epoch (2000-01-01).
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
Commit timestamp of the transaction. The value is in number
of microseconds since PostgreSQL epoch (2000-01-01).
</para></listitem>
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index d61ef4c..67c762a 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -63,7 +63,8 @@ static void begin_prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn
static void prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn);
static void commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void rollback_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn, TimestampTz prepare_time);
static void change_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
@@ -936,7 +937,8 @@ prepare_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
static void
commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
LogicalDecodingContext *ctx = cache->private_data;
LogicalErrorCallbackState state;
@@ -972,7 +974,8 @@ commit_prepared_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
"commit_prepared_cb")));
/* do the actual work: call callback */
- ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn);
+ ctx->callbacks.commit_prepared_cb(ctx, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
/* Pop the error context stack */
error_context_stack = errcallback.previous;
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index a245252..47a7489 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -206,7 +206,9 @@ logicalrep_read_prepare(StringInfo in, LogicalRepPreparedTxnData *prepare_data)
*/
void
logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
uint8 flags = 0;
@@ -222,8 +224,10 @@ logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
pq_sendbyte(out, flags);
/* send fields */
+ pq_sendint64(out, prepare_end_lsn);
pq_sendint64(out, commit_lsn);
pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, prepare_time);
pq_sendint64(out, txn->xact_time.commit_time);
pq_sendint32(out, txn->xid);
@@ -244,12 +248,16 @@ logicalrep_read_commit_prepared(StringInfo in, LogicalRepCommitPreparedTxnData *
elog(ERROR, "unrecognized flags %u in commit prepared message", flags);
/* read fields */
+ prepare_data->prepare_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->prepare_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR, "prepare_end_lsn is not set in commit prepared message");
prepare_data->commit_lsn = pq_getmsgint64(in);
if (prepare_data->commit_lsn == InvalidXLogRecPtr)
elog(ERROR, "commit_lsn is not set in commit prepared message");
- prepare_data->end_lsn = pq_getmsgint64(in);
- if (prepare_data->end_lsn == InvalidXLogRecPtr)
- elog(ERROR, "end_lsn is not set in commit prepared message");
+ prepare_data->commit_end_lsn = pq_getmsgint64(in);
+ if (prepare_data->commit_end_lsn == InvalidXLogRecPtr)
+ elog(ERROR, "commit_end_lsn is not set in commit prepared message");
+ prepare_data->prepare_time = pq_getmsgint64(in);
prepare_data->commit_time = pq_getmsgint64(in);
prepare_data->xid = pq_getmsgint(in, 4);
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7378beb..5a707e2 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2794,7 +2794,7 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
txn->origin_lsn = origin_lsn;
if (is_commit)
- rb->commit_prepared(rb, txn, commit_lsn);
+ rb->commit_prepared(rb, txn, commit_lsn, prepare_end_lsn, prepare_time);
else
rb->rollback_prepared(rb, txn, prepare_end_lsn, prepare_time);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b9a7a7f..63e19bc 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -966,27 +966,39 @@ apply_handle_commit_prepared(StringInfo s)
/* Compute GID for two_phase transactions. */
TwoPhaseTransactionGid(MySubscription->oid, prepare_data.xid,
gid, sizeof(gid));
-
- /* There is no transaction when COMMIT PREPARED is called */
- begin_replication_step();
-
/*
- * Update origin state so we can restart streaming from correct position
- * in case of crash.
+ * It is possible that we haven't received the prepare because
+ * the transaction did not have any changes relevant to this
+ * subscription and so was essentially an empty prepare. In this case,
+ * the walsender is optimized to drop the empty transaction and the
+ * accompanying prepare. Silently ignore if we don't find the prepared
+ * transaction.
*/
- replorigin_session_origin_lsn = prepare_data.end_lsn;
- replorigin_session_origin_timestamp = prepare_data.commit_time;
+ if (LookupGXact(gid, prepare_data.prepare_end_lsn,
+ prepare_data.prepare_time))
+ {
- FinishPreparedTransaction(gid, true);
- end_replication_step();
- CommitTransactionCommand();
+ /* There is no transaction when COMMIT PREPARED is called */
+ begin_replication_step();
+
+ /*
+ * Update origin state so we can restart streaming from correct position
+ * in case of crash.
+ */
+ replorigin_session_origin_lsn = prepare_data.commit_end_lsn;
+ replorigin_session_origin_timestamp = prepare_data.commit_time;
+
+ FinishPreparedTransaction(gid, true);
+ end_replication_step();
+ CommitTransactionCommand();
+ }
pgstat_report_stat(false);
- store_flush_position(prepare_data.end_lsn);
+ store_flush_position(prepare_data.commit_end_lsn);
in_remote_transaction = false;
/* Process any tables that are being synchronized in parallel. */
- process_syncing_tables(prepare_data.end_lsn);
+ process_syncing_tables(prepare_data.commit_end_lsn);
pgstat_report_activity(STATE_IDLE, NULL);
}
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index e4314af..2cdd9aa 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -56,7 +56,9 @@ static void pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx,
static void pgoutput_prepare_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx,
- ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
static void pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_end_lsn,
@@ -130,6 +132,17 @@ typedef struct RelationSyncEntry
TupleConversionMap *map;
} RelationSyncEntry;
+/*
+ * Maintain a per-transaction level variable to track whether the
+ * transaction has sent BEGIN or BEGIN PREPARE. BEGIN or BEGIN PREPARE
+ * is only sent when the first change in a transaction is processed.
+ * This makes it possible to skip transactions that are empty.
+ */
+typedef struct PGOutputTxnData
+{
+ bool sent_begin_txn; /* flag indicating whether begin has been sent */
+} PGOutputTxnData;
+
/* Map used to remember which relation schemas we sent. */
static HTAB *RelationSyncCache = NULL;
@@ -405,15 +418,40 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
}
/*
- * BEGIN callback
+ * BEGIN callback.
+ *
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN and COMMIT messages to subscribers,
+ * using bandwidth on something with little/no use for logical replication.
*/
static void
pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ PGOutputTxnData *txndata = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));
+
+ txndata->sent_begin_txn = false;
+ txn->output_plugin_private = txndata;
+}
+
+/*
+ * Send BEGIN.
+ * This is where the BEGIN is actually sent. This is called
+ * while processing the first change of the transaction.
+ */
+static void
+pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(txndata);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin(ctx->out, txn);
+ txndata->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -428,23 +466,67 @@ static void
pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr commit_lsn)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ bool skip;
+
+ Assert(txndata);
+
+ /*
+ * If a BEGIN message was not yet sent, then it means there were no relevant
+ * changes encountered, so we can skip the COMMIT message too.
+ */
+ skip = !txndata->sent_begin_txn;
+ pfree(txndata);
+ txn->output_plugin_private = NULL;
OutputPluginUpdateProgress(ctx);
+ if (skip)
+ {
+ elog(DEBUG1, "skipping replication of an empty transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_commit(ctx->out, txn, commit_lsn);
OutputPluginWrite(ctx, true);
}
/*
- * BEGIN PREPARE callback
+ * BEGIN PREPARE callback.
+ *
+ * Don't send BEGIN PREPARE message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on
+ * table(s) that are not published will produce empty transactions. These
+ * empty transactions will send BEGIN PREPARE and COMMIT PREPARED messages
+ * to subscribers, using bandwidth on something with little/no use
+ * for logical replication.
*/
static void
pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
{
+ /*
+ * Delegate to assign the begin sent flag as false, same as for the
+ * BEGIN message.
+ */
+ pgoutput_begin_txn(ctx, txn);
+}
+
+/*
+ * Send BEGIN PREPARE.
+ * This is where the BEGIN PREPARE is actually sent. This is called while
+ * processing the first change of the prepared transaction.
+ */
+static void
+pgoutput_begin_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
bool send_replication_origin = txn->origin_id != InvalidRepOriginId;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+ Assert(txndata);
OutputPluginPrepareWrite(ctx, !send_replication_origin);
logicalrep_write_begin_prepare(ctx->out, txn);
+ txndata->sent_begin_txn = true;
send_repl_origin(ctx, txn->origin_id, txn->origin_lsn,
send_replication_origin);
@@ -459,8 +541,21 @@ static void
pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(txndata);
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN was not yet sent, then it means there were no relevant
+ * changes encountered, so we can skip the PREPARE message too.
+ */
+ if (!txndata->sent_begin_txn)
+ {
+ elog(DEBUG1, "skipping replication of an empty prepared transaction");
+ return;
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);
@@ -471,12 +566,34 @@ pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
*/
static void
pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn)
+ XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the COMMIT PREPARED
+ * message too.
+ */
+ if (txndata)
+ {
+ bool skip = !txndata->sent_begin_txn;
+ pfree(txndata);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "skipping replication of COMMIT PREPARED of an empty transaction");
+ return;
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
- logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn);
+ logicalrep_write_commit_prepared(ctx->out, txn, commit_lsn, prepare_end_lsn,
+ prepare_time);
OutputPluginWrite(ctx, true);
}
@@ -489,8 +606,27 @@ pgoutput_rollback_prepared_txn(LogicalDecodingContext *ctx,
XLogRecPtr prepare_end_lsn,
TimestampTz prepare_time)
{
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
OutputPluginUpdateProgress(ctx);
+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the ROLLBACK PREPARED
+ * message too.
+ */
+ if (txndata)
+ {
+ bool skip = !txndata->sent_begin_txn;
+ pfree(txndata);
+ txn->output_plugin_private = NULL;
+ if (skip)
+ {
+ elog(DEBUG1,
+ "skipping replication of ROLLBACK of an empty transaction");
+ return;
+ }
+ }
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_rollback_prepared(ctx->out, txn, prepare_end_lsn,
prepare_time);
@@ -639,11 +775,15 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Relation relation, ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
TransactionId xid = InvalidTransactionId;
Relation ancestor = NULL;
+ /* If not streaming, should have set up txndata as part of BEGIN/BEGIN PREPARE */
+ Assert(in_streaming || txndata);
+
if (!is_publishable_relation(relation))
return;
@@ -677,6 +817,17 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Assert(false);
}
+ /*
+ * Output BEGIN / BEGIN PREPARE if we haven't yet, unless streaming.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);
@@ -779,6 +930,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
int nrelations, Relation relations[], ReorderBufferChange *change)
{
PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
MemoryContext old;
RelationSyncEntry *relentry;
int i;
@@ -786,6 +938,9 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
Oid *relids;
TransactionId xid = InvalidTransactionId;
+ /* If not streaming, should have set up txndata as part of BEGIN/BEGIN PREPARE */
+ Assert(in_streaming || txndata);
+
/* Remember the xid for the change in streaming mode. See pgoutput_change. */
if (in_streaming)
xid = change->txn->xid;
@@ -822,6 +977,18 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (nrelids > 0)
{
+ /*
+ * Output BEGIN / BEGIN PREPARE if we haven't yet; there is no need
+ * to send BEGIN / BEGIN PREPARE while streaming.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_truncate(ctx->out,
xid,
@@ -854,6 +1021,24 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (in_streaming)
xid = txn->xid;
+ /*
+ * Output BEGIN / BEGIN PREPARE if we haven't yet.
+ * Avoid for streaming and non-transactional messages.
+ */
+ if (!in_streaming && transactional)
+ {
+ PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
+
+ Assert(txndata);
+ if (!txndata->sent_begin_txn)
+ {
+ if (rbtxn_prepared(txn))
+ pgoutput_begin_prepare(ctx, txn);
+ else
+ pgoutput_begin(ctx, txn);
+ }
+ }
+
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_message(ctx->out,
xid,
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index 63de90d..0be0a07 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -148,8 +148,10 @@ typedef struct LogicalRepPreparedTxnData
*/
typedef struct LogicalRepCommitPreparedTxnData
{
+ XLogRecPtr prepare_end_lsn;
XLogRecPtr commit_lsn;
- XLogRecPtr end_lsn;
+ XLogRecPtr commit_end_lsn;
+ TimestampTz prepare_time;
TimestampTz commit_time;
TransactionId xid;
char gid[GIDSIZE];
@@ -188,7 +190,9 @@ extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
extern void logicalrep_read_prepare(StringInfo in,
LogicalRepPreparedTxnData *prepare_data);
extern void logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
extern void logicalrep_read_commit_prepared(StringInfo in,
LogicalRepCommitPreparedTxnData *prepare_data);
extern void logicalrep_write_rollback_prepared(StringInfo out, ReorderBufferTXN *txn,
diff --git a/src/include/replication/output_plugin.h b/src/include/replication/output_plugin.h
index 810495e..0d28306 100644
--- a/src/include/replication/output_plugin.h
+++ b/src/include/replication/output_plugin.h
@@ -128,7 +128,9 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx,
*/
typedef void (*LogicalDecodeCommitPreparedCB) (struct LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/*
* Called for ROLLBACK PREPARED.
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5b40ff7..11e2e1e 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -442,7 +442,9 @@ typedef void (*ReorderBufferPrepareCB) (ReorderBuffer *rb,
/* commit prepared callback signature */
typedef void (*ReorderBufferCommitPreparedCB) (ReorderBuffer *rb,
ReorderBufferTXN *txn,
- XLogRecPtr commit_lsn);
+ XLogRecPtr commit_lsn,
+ XLogRecPtr prepare_end_lsn,
+ TimestampTz prepare_time);
/* rollback prepared callback signature */
typedef void (*ReorderBufferRollbackPreparedCB) (ReorderBuffer *rb,
diff --git a/src/test/subscription/t/020_messages.pl b/src/test/subscription/t/020_messages.pl
index 0e218e0..3d246be 100644
--- a/src/test/subscription/t/020_messages.pl
+++ b/src/test/subscription/t/020_messages.pl
@@ -87,9 +87,8 @@ $result = $node_publisher->safe_psql(
'publication_names', 'tap_pub')
));
-# 66 67 == B C == BEGIN COMMIT
-is( $result, qq(66
-67),
+# no message and no BEGIN and COMMIT because of empty transaction optimization
+is($result, qq(),
'option messages defaults to false so message (M) is not available on slot'
);
diff --git a/src/test/subscription/t/021_twophase.pl b/src/test/subscription/t/021_twophase.pl
index c6ada92..b954630 100644
--- a/src/test/subscription/t/021_twophase.pl
+++ b/src/test/subscription/t/021_twophase.pl
@@ -6,7 +6,7 @@ use strict;
use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 24;
+use Test::More tests => 25;
###############################
# Setup
@@ -318,10 +318,9 @@ $node_publisher->safe_psql('postgres', "
$node_publisher->wait_for_catchup($appname_copy);
-# Check that the transaction has been prepared on the subscriber, there will be 2
-# prepared transactions for the 2 subscriptions.
+# Check that the transaction has been prepared on the subscriber
$result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts;");
-is($result, qq(2), 'transaction is prepared on subscriber');
+is($result, qq(1), 'transaction is prepared on subscriber');
# Now commit the insert and verify that it IS replicated
$node_publisher->safe_psql('postgres', "COMMIT PREPARED 'mygid';");
@@ -337,6 +336,45 @@ is($result, qq(2), 'replicated data in subscriber table');
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_copy;");
$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_copy;");
+##############################
+# Test empty prepares
+##############################
+
+# create a table that is not part of the publication
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab_nopub (a int PRIMARY KEY)");
+
+# disable the subscription so that we can peek at the slot
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub DISABLE");
+
+# wait for the replication slot to become inactive in the publisher
+$node_publisher->poll_query_until('postgres',
+ "SELECT COUNT(*) FROM pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND active='f'", 1);
+
+# create a transaction with no changes relevant to the slot
+$node_publisher->safe_psql('postgres', "
+ BEGIN;
+ INSERT INTO tab_nopub SELECT generate_series(1,10);
+ PREPARE TRANSACTION 'empty_transaction';
+ COMMIT PREPARED 'empty_transaction';");
+
+# peek at the contents of the slot
+$result = $node_publisher->safe_psql(
+ 'postgres', qq(
+ SELECT get_byte(data, 0)
+ FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL,
+ 'proto_version', '3',
+ 'publication_names', 'tap_pub')
+));
+
+# the empty transaction should be skipped
+is($result, qq(),
+ 'empty transaction dropped on slot'
+);
+
+# enable the subscription to test cleanup
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION tap_sub ENABLE");
+
###############################
# check all the cleanup
###############################
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37cf4b2..75639ab 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1606,6 +1606,7 @@ PGMessageField
PGModuleMagicFunction
PGNoticeHooks
PGOutputData
+PGOutputTxnData
PGPROC
PGP_CFB
PGP_Context
--
1.8.3.1
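Jeff's original suggestion was to hold 'B' until something else comes along, and to drop both messages if that next thing is 'C'. The patch implements this with the per-transaction sent_begin_txn flag. Below is a minimal standalone simulation of the idea, not pgoutput code. SketchOutput, SketchTxn, emit, sketch_begin_txn, sketch_change and sketch_commit_txn are hypothetical names, and each protocol message is reduced to its action byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer: one char per protocol message (action byte only). */
typedef struct SketchOutput
{
    char        buf[256];
    size_t      len;
} SketchOutput;

/* Hypothetical per-transaction state, modelled on PGOutputTxnData. */
typedef struct SketchTxn
{
    bool        sent_begin_txn;
} SketchTxn;

/* Append one action byte, keeping the buffer NUL-terminated. */
static void
emit(SketchOutput *out, char action)
{
    assert(out != NULL);
    if (out->len + 1 < sizeof(out->buf))
    {
        out->buf[out->len++] = action;
        out->buf[out->len] = '\0';
    }
}

/* BEGIN callback: only reset the flag; nothing is written yet. */
static void
sketch_begin_txn(SketchTxn *txn)
{
    txn->sent_begin_txn = false;
}

/* Change callback: changes to unpublished tables are filtered out first. */
static void
sketch_change(SketchOutput *out, SketchTxn *txn, bool table_is_published)
{
    if (!table_is_published)
        return;

    /* First relevant change: this is where the deferred 'B' goes out. */
    if (!txn->sent_begin_txn)
    {
        emit(out, 'B');
        txn->sent_begin_txn = true;
    }
    emit(out, 'I');             /* e.g. an INSERT message */
}

/* COMMIT callback: if 'B' was never sent, the transaction was empty. */
static void
sketch_commit_txn(SketchOutput *out, SketchTxn *txn)
{
    if (!txn->sent_begin_txn)
        return;                 /* skip 'C' as well: nothing reaches the wire */
    emit(out, 'C');
}
```

With this, a transaction that only touches unpublished tables writes nothing at all, while one with two published INSERTs writes "BIIC". The flag is cheaper than buffering the 'B' message itself, because BEGIN can be regenerated from the ReorderBufferTXN at the moment the first relevant change arrives.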
FYI - I have checked the v11 patch. Everything applies, builds, and
tests OK for me, and I have no more review comments. So v11 LGTM.
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Fri, Jul 23, 2021 at 8:09 PM Ajin Cherian <itsajin@gmail.com> wrote:
fixed.
The v11 patch LGTM.
Regards,
Greg Nancarrow
Fujitsu Australia
On Friday, July 23, 2021 7:10 PM Ajin Cherian <itsajin@gmail.com> wrote:
On Fri, Jul 23, 2021 at 7:38 PM Peter Smith <smithpb2250@gmail.com> wrote:
I have reviewed the v10 patch.
The patch v11 looks good to me as well.
Thanks for addressing my past comments.
Best Regards,
Takamichi Osumi
Hi Ajin.
I have spent some time studying how your "empty transaction" (v11)
patch will affect network traffic and transaction throughput.
BLUF
====
For my test environment the general observations with the patch applied are:
- There is a potentially large reduction of network traffic (depends
on the number of empty transactions sent)
- Transaction throughput improved up to 7% (average ~2% across
mixtures) for Synchronous mode
- Transaction throughput improved up to 7% (average ~3% across
mixtures) for NOT Synchronous mode
So this patch LGTM.
TEST INFORMATION
================
Overview
-------------
1. There are 2 similar tables. One table is published; the other is not.
2. Equivalent simple SQL operations are performed on these tables. E.g.
- INSERT/UPDATE/DELETE using normal COMMIT
- INSERT/UPDATE/DELETE using 2PC COMMIT PREPARED
3. pgbench is used to measure the throughput for different mixes of
empty and not-empty transactions sent. E.g.
- 0% are empty
- 25% are empty
- 50% are empty
- 75% are empty
- 100% are empty
4. The apply_dispatch code has been temporarily modified to log the
number of protocol messages/bytes being processed.
- At the conclusion of the test run the logs are processed to extract
the numbers.
5. Each test run is 15 minutes elapsed time.
6. The tests are repeated without/with your patch applied
- So, there are 2 (without/with patch) x 5 (different mixes) = 10 test results
- Transaction throughput results are from pgbench
- Protocol message bytes are extracted from the logs (from modified
apply_dispatch)
7. Also, the entire set of 10 test cases was repeated with
synchronous_standby_names enabled/disabled.
- Enabled, so the results are for total round-trip processing of the pub/sub.
- Disabled, so there is no waiting at the publisher side.
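Item 4 only says that the logged message counts and bytes were post-processed at the end of the run. One way to do that aggregation is sketched below. It assumes that each log line has already been parsed into an action byte and a message length. MsgLogRecord, MsgTally, tally_init and tally_records are hypothetical names and are not part of the modified apply_dispatch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed log record: one per message seen by apply_dispatch. */
typedef struct MsgLogRecord
{
    unsigned char action;       /* e.g. 'B', 'C', 'I', 'U', 'D', 'R' */
    size_t      nbytes;         /* message length as logged */
} MsgLogRecord;

/* Per-action-byte counters plus a grand total. */
typedef struct MsgTally
{
    size_t      count[256];
    size_t      bytes[256];
    size_t      total_bytes;
} MsgTally;

static void
tally_init(MsgTally *t)
{
    memset(t, 0, sizeof(*t));
}

/* Accumulate message counts and bytes, keyed by protocol action byte. */
static void
tally_records(const MsgLogRecord *recs, size_t nrecs, MsgTally *t)
{
    assert(recs != NULL || nrecs == 0);
    for (size_t i = 0; i < nrecs; i++)
    {
        t->count[recs[i].action]++;
        t->bytes[recs[i].action] += recs[i].nbytes;
        t->total_bytes += recs[i].nbytes;
    }
}
```

Keying on the action byte makes it easy to see how much of the total comes from 'B'/'C' pairs alone, which is exactly the traffic the patch removes for empty transactions.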
Configuration
-------------------
My environment is a single test machine with 2 PG instances (for pub and sub).
Using default configs except:
PUB-node
- wal_level = logical
- max_wal_senders = 10
- logical_decoding_work_mem = 64kB
- checkpoint_timeout = 30min
- min_wal_size = 10GB
- max_wal_size = 20GB
- shared_buffers = 2GB
- synchronous_standby_names = 'sync_sub' (for synchronous testing only)
SUB-node
- max_worker_processes = 11
- max_logical_replication_workers = 10
- checkpoint_timeout = 30min
- min_wal_size = 10GB
- max_wal_size = 20GB
- shared_buffers = 2GB
SQL files
-------------
Contents of test_empty_not_published.sql:
-- Operations for table not published
BEGIN;
INSERT INTO test_tab_nopub VALUES(1, 'foo');
UPDATE test_tab_nopub SET b = 'bar' WHERE a = 1;
DELETE FROM test_tab_nopub WHERE a = 1;
COMMIT;
-- 2PC operations for table not published
BEGIN;
INSERT INTO test_tab_nopub VALUES(2, 'fizz');
UPDATE test_tab_nopub SET b = 'bang' WHERE a = 2;
DELETE FROM test_tab_nopub WHERE a = 2;
PREPARE TRANSACTION 'gid_nopub';
COMMIT PREPARED 'gid_nopub';
~~
Contents of test_empty_published.sql:
(same as above but the table is called test_tab)
SQL Tables
----------------
(tables are the same apart from the name)
CREATE TABLE test_tab (a int primary key, b text, c timestamptz
DEFAULT now(), d bigint DEFAULT 999);
CREATE TABLE test_tab_nopub (a int primary key, b text, c timestamptz
DEFAULT now(), d bigint DEFAULT 999);
Example pgbench command
-----------------------
(this example shows a test for a 25% mix of empty transactions)
pgbench -s 100 -T 900 -c 1 -f test_empty_not_published.sql@5 -f test_empty_published.sql@15 test_pub
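pgbench picks among the -f scripts at random, with probability proportional to the weight given after '@'. So the weights 5 and 15 give an expected empty share of 5 / (5 + 15) = 25%. The small helper below (empty_mix_percent is a hypothetical name) makes that arithmetic explicit for the other mixes. It gives only the expected share. The mix actually realised over a 15-minute run will be close to this value, but not exact.

```c
#include <assert.h>

/*
 * Expected percentage of pgbench script executions that run the
 * not-published (empty transaction) script, given the two -f weights.
 */
static double
empty_mix_percent(int weight_nopub, int weight_pub)
{
    assert(weight_nopub >= 0 && weight_pub >= 0);
    assert(weight_nopub + weight_pub > 0);
    return 100.0 * weight_nopub / (weight_nopub + weight_pub);
}
```

With a total weight of 20, @10/@10 gives 50% and @15/@5 gives 75%. The 0% and 100% cases can simply be run with a single -f script.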
RESULTS / OBSERVATIONS
======================
Synchronous Mode
----------------
- As the percentage mix of empty transactions increases, so does the
transaction throughput. I assume this is because we are using
synchronous mode, so when there is less waiting time there is
more time available for transaction processing.
- The performance was generally similar before/after the patch, but
there was an observed throughput improvement of ~2% (averaged across
all mixes)
- The number of protocol bytes is associated with the number of
transactions that are processed during the test time of 15 minutes.
This adds up to a significant number of bytes even when the
transactions are empty.
- For the unpatched code, as the transaction rate increases, so
does the number of traffic bytes.
- The patch improves this significantly by eliminating all the empty
transaction traffic.
- Before the patch, even "empty transactions" are processing some
bytes, so the traffic can never reach zero. After the patch, empty
transaction traffic is eliminated entirely.
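To put a rough number on "some bytes": the BEGIN and COMMIT writers in proto.c send the fields listed below. This is an estimate of the logical protocol payload only. It excludes the per-message streaming framing (XLogData header and CopyData wrapper) and any Origin message. It is not claimed to match the byte counts in the report. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Field sizes used by the logical replication message formats. */
enum
{
    SZ_BYTE1 = 1,
    SZ_INT8 = 1,
    SZ_INT32 = 4,
    SZ_INT64 = 8
};

/* Begin: Byte1('B'), Int64 final LSN, Int64 commit timestamp, Int32 xid */
static size_t
begin_msg_bytes(void)
{
    return SZ_BYTE1 + SZ_INT64 + SZ_INT64 + SZ_INT32;
}

/* Commit: Byte1('C'), Int8 flags, Int64 commit LSN, Int64 end LSN,
 * Int64 commit timestamp */
static size_t
commit_msg_bytes(void)
{
    return SZ_BYTE1 + SZ_INT8 + SZ_INT64 + SZ_INT64 + SZ_INT64;
}

/* Logical-protocol payload of one empty transaction before the patch. */
static size_t
empty_txn_bytes(void)
{
    assert(begin_msg_bytes() > 0 && commit_msg_bytes() > 0);
    return begin_msg_bytes() + commit_msg_bytes();
}
```

That is 21 + 26 = 47 bytes of payload per empty transaction. This is small per transaction, but it is paid at the full commit rate of the publisher.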
NOT Synchronous Mode
--------------------
- Since there is no synchronous waiting for round trips, the
transaction throughput is generally consistent regardless of the empty
transaction mix.
- There is a hint of a small overall improvement in throughput as the
empty transaction mix approaches 100%. For my test environment
both the pub/sub nodes are using the same machine/CPU, so my guess is
that when less CPU is spent processing messages in the Apply
Worker, more CPU is available to pump transactions at the
publisher side.
- With the patch, transaction throughput seems ~3% better than without it.
This might also be attributable to the same reason
mentioned above: less CPU spent processing empty messages at the
subscriber side leaves more CPU available to pump transactions from
the publisher side.
- The number of protocol bytes is associated with the number of
transactions that are processed during the test time of 15 minutes.
- Because the transaction throughput is consistent, the traffic of
protocol bytes here is determined mainly by the proportion of "empty
transactions" in the mixture.
- Before the patch, even "empty transactions" are processing some
bytes, so the traffic can never reach zero. After the patch, the empty
transaction traffic is eliminated entirely.
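Because throughput is roughly constant here, the traffic follows a simple linear model. For N transactions, a fraction p of which are empty, total bytes ~= N * (p * E + (1 - p) * F). Here E and F are the per-transaction bytes for empty and non-empty transactions, and the patch makes E effectively 0. This model is a back-of-the-envelope sketch and does not come from the report. traffic_bytes is a hypothetical name, and E and F are parameters rather than measured values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Linear traffic model: ntxns transactions, a fraction empty_frac of which
 * touch only unpublished tables. With the patch, empty ones cost 0 bytes.
 */
static double
traffic_bytes(double ntxns, double empty_frac,
              double bytes_per_empty, double bytes_per_nonempty,
              bool patched)
{
    double      per_empty;

    assert(empty_frac >= 0.0 && empty_frac <= 1.0);
    per_empty = patched ? 0.0 : bytes_per_empty;
    return ntxns * (empty_frac * per_empty +
                    (1.0 - empty_frac) * bytes_per_nonempty);
}
```

At a 100% empty mix the unpatched traffic is N * E, which can never reach zero, while the patched traffic is zero. This matches the shape of the observations above.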
ATTACHMENTS
===========
PSA
A1. A PDF version of my test report (also includes raw result data)
A2. Sync: Graph of Transaction throughput
A3. Sync: Graph of Protocol bytes (total)
A4. Sync: Graph of Protocol bytes (per transaction)
A5. Not-Sync: Graph of Transaction throughput
A6. Not-Sync: Graph of Protocol bytes (total)
A7. Not-Sync: Graph of Protocol bytes (per transaction)
------
Kind Regards,
Peter Smith.
Fujitsu Australia.
Attachments:
PS-empty-tx-testing-15min.pdf (application/pdf)
%PDF-1.7
%����
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en-AU) /StructTreeRoot 55 0 R/MarkInfo<</Marked true>>/Metadata 1660 0 R/ViewerPreferences 1661 0 R>>
endobj
2 0 obj
<</Type/Pages/Count 12/Kids[ 3 0 R 26 0 R 30 0 R 34 0 R 38 0 R 40 0 R 42 0 R 44 0 R 46 0 R 48 0 R 50 0 R 52 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 9 0 R/F3 11 0 R/F4 16 0 R/F5 18 0 R>>/ExtGState<</GS7 7 0 R/GS8 8 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/Annots[ 23 0 R 24 0 R 25 0 R] /MediaBox[ 0 0 595.2 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 1983>>
stream
x��Z�O�F�����RU)����w]�H��IU�#�CU?�� �#��P���wfl�A�\HL�D�gw~��]�{<������u�g�tx�]�?�'��lr�gw��!�^�7w�tv7w/�f8t>���i��NN�������Bp �&�+���b�����o����dpx�=�LJ.�$&�W\(���.�{������Vf7t�����\D���7Q�:��d�7�g�������Q5`�U�`C�Z���Nv)$���^���3\5��D�3j]�RZ�u=O�����\����s��v?��x�m0�i.�a��f=��$ D>���mE�2Ft�2�& ������u/0��������Z3�u���\9���`
B��=��p�LO�#!]R\�iq����������*�� ����8~FtB�{�#��I
���<�����4�y��^����i�'T9�|K>8.\%���� ����<m�<%�A�U��|���=^�^���:K�0��F��d9;��U<k.#wrS�%\�-]�����R;��<q��,=��2.k��k������5<�b{�L-#�������n~�S�3���B��"l(�q�2�T(��a����0�ut�+�1��2��Y��Z�*��\�y�)p{�pI ��$"B��mdA��:�
���8�q��U��s�X������m���s��1w����6�-%�M]/�?�%0��=���/�$����dM����@#�����p���5t ��.qP�$��j���F��n�!=I#�F��������N"i;�c��\������\����K��m[�Y��]�� J:������<��%[�C�.4�.r�}j���H�8(���(���J�I�:��+�Gsz_�qe�!���81�P�U%O
},��qE�8��/����,���l��A�KdB���b}�*T!�,�b�/�W����#Z�q[�������I���zM��2�V���Z-��16��"��R�y������wQ:?2����m�[Y�
w��[?(0�p�Fv�2��� ��fSv��Z�|���u:�@�8@@8���7� L�>;Ho��]�B
v\���n�������4��E��L����u�j�3�#�h���AH�Mz���
��t|��p�N�:YX�8�7z:���d:�c���5[.:��`��m�K{�}aJ�B��<RCjh;>E�Fn�&k�@h��3>��F��d�O��d�iv��t�7�UY��x+�D
7W���VSA���@���$m��@9�uhb�z.P�4����|EK;7c<����������G����� `A��6}��/R�!t�lq����O���������Y=������S���]�K�����RH�S�8�_���z����uG�8 �(���b�Z�����cI�]����
��s����s�O��/g�|���|~�Z�[/�Z�-%^��I��2}��z/�!w{���������.R��P���[v}��8�*��G���w4x�1��.$�i�eS����1���}H�����.X�
.��������ZVo4`A�8���| wm��{[6=�W�`x���u,�"��5�>�����?T[D�OA?�����W�~z�~�E��/��<��(��r��Le����Y��0�2;��48���w��j~���&�C�,��X��F#j��V���@=���H�������68��O��?��H.�^`�����|���vP�x�h��s�)���� ?��
��y��,##UT>���\5C�����������~m`� /#�H�/�~�?"_�GIR��-���V_]'Q�l#2
�x��Gs�B|f ����l'Wj������HbS3��;��5�v]�2W�5�#�F
���K��(���N��R!���im�AI6zE�W�E�������� j���R�Xz]1��0����'/�L�8��\�b ��62Z������!��l�.�_����
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/BCDEEE+Calibri/Encoding/WinAnsiEncoding/FontDescriptor 6 0 R/FirstChar 32/LastChar 126/Widths 1649 0 R>>
endobj
6 0 obj
<</Type/FontDescriptor/FontName/BCDEEE+Calibri/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 521/MaxWidth 1743/FontWeight 400/XHeight 250/StemV 52/FontBBox[ -503 -250 1240 750] /FontFile2 1647 0 R>>
endobj
7 0 obj
<</Type/ExtGState/BM/Normal/ca 1>>
endobj
8 0 obj
<</Type/ExtGState/BM/Normal/CA 1>>
endobj
9 0 obj
<</Type/Font/Subtype/TrueType/Name/F2/BaseFont/BCDFEE+Calibri-Bold/Encoding/WinAnsiEncoding/FontDescriptor 10 0 R/FirstChar 32/LastChar 126/Widths 1650 0 R>>
endobj
10 0 obj
<</Type/FontDescriptor/FontName/BCDFEE+Calibri-Bold/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 536/MaxWidth 1781/FontWeight 700/XHeight 250/StemV 53/FontBBox[ -519 -250 1263 750] /FontFile2 1651 0 R>>
endobj
11 0 obj
<</Type/Font/Subtype/Type0/BaseFont/BCDGEE+Calibri-Light/Encoding/Identity-H/DescendantFonts 12 0 R/ToUnicode 1652 0 R>>
endobj
12 0 obj
[ 13 0 R]
endobj
13 0 obj
<</BaseFont/BCDGEE+Calibri-Light/Subtype/CIDFontType2/Type/Font/CIDToGIDMap/Identity/DW 1000/CIDSystemInfo 14 0 R/FontDescriptor 15 0 R/W 1654 0 R>>
endobj
14 0 obj
<</Ordering(Identity) /Registry(Adobe) /Supplement 0>>
endobj
15 0 obj
<</Type/FontDescriptor/FontName/BCDGEE+Calibri-Light/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 520/MaxWidth 1820/FontWeight 300/XHeight 250/StemV 52/FontBBox[ -511 -250 1309 750] /FontFile2 1653 0 R>>
endobj
16 0 obj
<</Type/Font/Subtype/TrueType/Name/F4/BaseFont/BCDHEE+Calibri-Light/Encoding/WinAnsiEncoding/FontDescriptor 17 0 R/FirstChar 32/LastChar 121/Widths 1655 0 R>>
endobj
17 0 obj
<</Type/FontDescriptor/FontName/BCDHEE+Calibri-Light/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 520/MaxWidth 1820/FontWeight 300/XHeight 250/StemV 52/FontBBox[ -511 -250 1309 750] /FontFile2 1653 0 R>>
endobj
18 0 obj
<</Type/Font/Subtype/Type0/BaseFont/BCDIEE+Calibri/Encoding/Identity-H/DescendantFonts 19 0 R/ToUnicode 1646 0 R>>
endobj
19 0 obj
[ 20 0 R]
endobj
20 0 obj
<</BaseFont/BCDIEE+Calibri/Subtype/CIDFontType2/Type/Font/CIDToGIDMap/Identity/DW 1000/CIDSystemInfo 21 0 R/FontDescriptor 22 0 R/W 1648 0 R>>
endobj
21 0 obj
<</Ordering(Identity) /Registry(Adobe) /Supplement 0>>
endobj
22 0 obj
<</Type/FontDescriptor/FontName/BCDIEE+Calibri/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 521/MaxWidth 1743/FontWeight 400/XHeight 250/StemV 52/FontBBox[ -503 -250 1240 750] /FontFile2 1647 0 R>>
endobj
https://www.postgresql.org/message-id/flat/CAFPTHDaQFuASQPjxrYTcRPjF6exewjxXVyfuz1hCWJeCJpOSsQ%40mail.gmail.com#d9dbe3d195f1acccddbd81e46ded2315