repeated decoding of prepared transactions

Started by Markus Wanner almost 5 years ago, 72 messages
#1Markus Wanner
markus.wanner@enterprisedb.com

Amit, Ajin, hackers,

testing logical decoding for two-phase transactions, I stumbled over
what I first thought was a bug. But the comments seem to indicate this is
intended behavior. Could you please clarify or elaborate on the design
decision? Or indicate this indeed is a bug?

What puzzled me is that if a decoder is restarted in between the PREPARE
and the COMMIT PREPARED, it repeats the entire transaction, despite it
being already sent and potentially prepared on the receiving side.

In terms of `pg_logical_slot_get_changes` (and roughly from the
prepare.sql test), this looks as follows:

data
----------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:1
PREPARE TRANSACTION 'test_prepared#1'
(3 rows)

This is the first delivery of the transaction. After a restart, it will
get all of the changes again, though:

data
----------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:1
PREPARE TRANSACTION 'test_prepared#1'
COMMIT PREPARED 'test_prepared#1'
(4 rows)

I did not expect this, as any receiver that wants to have decoded 2PC is
likely supporting some kind of two-phase commits itself. And would
therefore prepare the transaction upon its first reception. Potentially
receiving it a second time would require complicated filtering on every
prepared transaction.

Furthermore, this clearly and unnecessarily holds back the restart LSN.
Meaning even just a single prepared transaction can block advancing the
restart LSN. In most cases, these are short lived. But on the other
hand, there may be an arbitrary amount of other transactions in between
a PREPARE and the corresponding COMMIT PREPARED in the WAL. Not being
able to advance over a prepared transaction seems like a bad thing in
such a case.

I fail to see where this repetition would ever be useful. Is there any
reason for the current implementation that I'm missing or can this be
corrected? Thanks for elaborating.

Regards

Markus

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Markus Wanner (#1)
Re: repeated decoding of prepared transactions

On Mon, Feb 8, 2021 at 2:01 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

Amit, Ajin, hackers,

testing logical decoding for two-phase transactions, I stumbled over
what I first thought was a bug. But the comments seem to indicate this is
intended behavior. Could you please clarify or elaborate on the design
decision? Or indicate this indeed is a bug?

What puzzled me is that if a decoder is restarted in between the PREPARE
and the COMMIT PREPARED, it repeats the entire transaction, despite it
being already sent and potentially prepared on the receiving side.

In terms of `pg_logical_slot_get_changes` (and roughly from the
prepare.sql test), this looks as follows:

data
----------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:1
PREPARE TRANSACTION 'test_prepared#1'
(3 rows)

This is the first delivery of the transaction. After a restart, it will
get all of the changes again, though:

data
----------------------------------------------------
BEGIN
table public.test_prepared1: INSERT: id[integer]:1
PREPARE TRANSACTION 'test_prepared#1'
COMMIT PREPARED 'test_prepared#1'
(4 rows)

I did not expect this, as any receiver that wants to have decoded 2PC is
likely supporting some kind of two-phase commits itself. And would
therefore prepare the transaction upon its first reception. Potentially
receiving it a second time would require complicated filtering on every
prepared transaction.

The reason was mentioned in ReorderBufferFinishPrepared(). See below
comments in code:
/*
* It is possible that this transaction is not decoded at prepare time
* either because by that time we didn't have a consistent snapshot or it
* was decoded earlier but we have restarted. We can't distinguish between
* those two cases so we send the prepare in both the cases and let
* downstream decide whether to process or skip it. We don't need to
* decode the xact for aborts if it is not done already.
*/
This won't happen when we replicate via pgoutput (the patch for which
is still not committed) because it won't restart from a previous point
(unless the server needs to be restarted due to some reason) as you
are doing via logical decoding APIs. Now, we don't send the prepared
xacts again on repeated calls of pg_logical_slot_get_changes() unless
we encounter the commit. This behavior is already explained in the docs
[1], along with a way to skip the prepare.

Furthermore, this clearly and unnecessarily holds back the restart LSN.
Meaning even just a single prepared transaction can block advancing the
restart LSN. In most cases, these are short lived. But on the other
hand, there may be an arbitrary amount of other transactions in between
a PREPARE and the corresponding COMMIT PREPARED in the WAL. Not being
able to advance over a prepared transaction seems like a bad thing in
such a case.

That anyway is true without this work as well where restart_lsn can be
advanced on commits. We haven't changed anything in that regard.

[1]: https://www.postgresql.org/docs/devel/logicaldecoding-output-plugin.html

--
With Regards,
Amit Kapila.

#3Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Amit Kapila (#2)
Re: repeated decoding of prepared transactions

Hello Amit,

thanks for your very quick response.

On 08.02.21 11:13, Amit Kapila wrote:

/*
* It is possible that this transaction is not decoded at prepare time
* either because by that time we didn't have a consistent snapshot or it
* was decoded earlier but we have restarted. We can't distinguish between
* those two cases so we send the prepare in both the cases and let
* downstream decide whether to process or skip it. We don't need to
* decode the xact for aborts if it is not done already.
*/

The way I read the surrounding code, the only case a 2PC transaction
does not get decoded at prepare time is if the transaction is empty. Or
are you aware of any other situation that might currently happen?

(unless the server needs to be restarted due to some reason)

Right, the repetition occurs only after a restart of the walsender in
between a prepare and a commit prepared record.

That anyway is true without this work as well where restart_lsn can be
advanced on commits. We haven't changed anything in that regard.

I did not mean to blame the patch, but merely try to understand some of
the design decisions behind it.

And as I just learned, even if we managed to avoid the repetition, a
restarted walsender still needs to see prepared transactions as
in-progress in its snapshots. So we cannot move forward the restart_lsn
to after a prepare record (until the final commit or rollback is consumed).

Best Regards

Markus

#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Markus Wanner (#3)
Re: repeated decoding of prepared transactions

On Mon, Feb 8, 2021 at 8:36 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

Hello Amit,

thanks for your very quick response.

On 08.02.21 11:13, Amit Kapila wrote:

/*
* It is possible that this transaction is not decoded at prepare time
* either because by that time we didn't have a consistent snapshot or it
* was decoded earlier but we have restarted. We can't distinguish between
* those two cases so we send the prepare in both the cases and let
* downstream decide whether to process or skip it. We don't need to
* decode the xact for aborts if it is not done already.
*/

The way I read the surrounding code, the only case a 2PC transaction
does not get decoded at prepare time is if the transaction is empty. Or
are you aware of any other situation that might currently happen?

We also skip decoding at prepare time if we haven't reached a
consistent snapshot by that time. See below code in DecodePrepare().
DecodePrepare()
{
    ..
    /* We can't start streaming unless a consistent state is reached. */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
    {
        ReorderBufferSkipPrepare(ctx->reorder, xid);
        return;
    }
    ..
}

There are other reasons as well like the output plugin doesn't want to
allow decoding at prepare time but I don't think those are relevant to
the discussion here.

(unless the server needs to be restarted due to some reason)

Right, the repetition occurs only after a restart of the walsender in
between a prepare and a commit prepared record.

That anyway is true without this work as well where restart_lsn can be
advanced on commits. We haven't changed anything in that regard.

I did not mean to blame the patch, but merely try to understand some of
the design decisions behind it.

And as I just learned, even if we managed to avoid the repetition, a
restarted walsender still needs to see prepared transactions as
in-progress in its snapshots. So we cannot move forward the restart_lsn
to after a prepare record (until the final commit or rollback is consumed).

Right. Say we forget the prepared transactions and move restart_lsn
forward as soon as we get the prepare for any transaction. Then we
open up a window where we haven't actually sent the prepared xact,
say because the snapshot had not yet reached a consistent state, but
have already moved the restart_lsn. Later, when we get the commit
corresponding to the prepared transaction, by which time the snapshot
has reached a consistent state, we will miss sending the transaction
contents and the prepare for it. I think for such reasons we allow
restart_lsn to be moved only once the transaction is finished
(committed or rolled back).
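
To illustrate the window described above, here is a minimal sketch in
self-contained C. It is a toy model and not PostgreSQL code: the WalRec
type, the record kinds and changes_visible_from() are all hypothetical
names invented for this example. It only shows that a decoder which
starts reading after the PREPARE record can no longer see the
transaction's changes when it later reaches COMMIT PREPARED.

```c
#include <stddef.h>

/* Toy WAL record kinds; these are NOT PostgreSQL's real record types. */
typedef enum { REC_CHANGE, REC_PREPARE, REC_COMMIT_PREPARED } RecKind;

typedef struct {
    RecKind kind;
    int     xid;    /* transaction the record belongs to */
} WalRec;

/*
 * Count how many change records of transaction `xid` are still readable
 * when decoding starts at index `restart` of the toy WAL (an array index
 * stands in for restart_lsn here).
 */
int
changes_visible_from(const WalRec *wal, size_t n, size_t restart, int xid)
{
    int seen = 0;

    for (size_t i = restart; i < n; i++)
    {
        if (wal[i].kind == REC_CHANGE && wal[i].xid == xid)
            seen++;
    }
    return seen;
}
```

With a toy WAL of two changes, a PREPARE and a COMMIT PREPARED for one
transaction, starting at index 0 still finds both changes, while
starting just after the PREPARE finds none. If the prepare had been
skipped earlier, nothing would be left to send at commit time.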

--
With Regards,
Amit Kapila.

#5Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Amit Kapila (#4)
Re: repeated decoding of prepared transactions

On Tue, Feb 9, 2021 at 8:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Feb 8, 2021 at 8:36 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

Hello Amit,

thanks for your very quick response.

On 08.02.21 11:13, Amit Kapila wrote:

/*
* It is possible that this transaction is not decoded at prepare time
* either because by that time we didn't have a consistent snapshot or it
* was decoded earlier but we have restarted. We can't distinguish between
* those two cases so we send the prepare in both the cases and let
* downstream decide whether to process or skip it. We don't need to
* decode the xact for aborts if it is not done already.
*/

The way I read the surrounding code, the only case a 2PC transaction
does not get decoded at prepare time is if the transaction is empty. Or
are you aware of any other situation that might currently happen?

We also skip decoding at prepare time if we haven't reached a
consistent snapshot by that time. See below code in DecodePrepare().
DecodePrepare()
{
..
/* We can't start streaming unless a consistent state is reached. */
if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
{
ReorderBufferSkipPrepare(ctx->reorder, xid);
return;
}
..
}

Can you please provide steps which can lead to this situation? If
there is an earlier discussion which has example scenarios, please
point us to the relevant thread.

If we are not sending PREPARED transactions that's fine, but sending
the same prepared transaction as many times as the WAL sender is
restarted between sending prepare and commit prepared is a waste of
network bandwidth. The wastage is proportional to the amount of
changes in the transaction and number of such transactions themselves.
Also this will cause performance degradation. So if we can avoid
resending prepared transactions twice that will help.

--
Best Wishes,
Ashutosh Bapat

#6Ajin Cherian
itsajin@gmail.com
In reply to: Ashutosh Bapat (#5)
Re: repeated decoding of prepared transactions

On Tue, Feb 9, 2021 at 4:59 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Can you please provide steps which can lead to this situation? If
there is an earlier discussion which has example scenarios, please
point us to the relevant thread.

If we are not sending PREPARED transactions that's fine, but sending
the same prepared transaction as many times as the WAL sender is
restarted between sending prepare and commit prepared is a waste of
network bandwidth. The wastage is proportional to the amount of
changes in the transaction and number of such transactions themselves.
Also this will cause performance degradation. So if we can avoid
resending prepared transactions twice that will help.

One such scenario is explained in the test case in

postgres/contrib/test_decoding/specs/twophase_snapshot.spec

regards,
Ajin Cherian
Fujitsu Australia

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Bapat (#5)
Re: repeated decoding of prepared transactions

On Tue, Feb 9, 2021 at 11:29 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Tue, Feb 9, 2021 at 8:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Feb 8, 2021 at 8:36 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

Hello Amit,

thanks for your very quick response.

On 08.02.21 11:13, Amit Kapila wrote:

/*
* It is possible that this transaction is not decoded at prepare time
* either because by that time we didn't have a consistent snapshot or it
* was decoded earlier but we have restarted. We can't distinguish between
* those two cases so we send the prepare in both the cases and let
* downstream decide whether to process or skip it. We don't need to
* decode the xact for aborts if it is not done already.
*/

The way I read the surrounding code, the only case a 2PC transaction
does not get decoded at prepare time is if the transaction is empty. Or
are you aware of any other situation that might currently happen?

We also skip decoding at prepare time if we haven't reached a
consistent snapshot by that time. See below code in DecodePrepare().
DecodePrepare()
{
..
/* We can't start streaming unless a consistent state is reached. */
if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
{
ReorderBufferSkipPrepare(ctx->reorder, xid);
return;
}
..
}

Can you please provide steps which can lead to this situation?

Ajin has already shared the example with you.

If
there is an earlier discussion which has example scenarios, please
point us to the relevant thread.

It started in the email [1] and from there you can read later emails
to know more about this.

If we are not sending PREPARED transactions that's fine,

Hmm, I am not sure if that is fine because if the output plugin sets
the two-phase-commit option, it would expect all prepared xacts to
arrive, not only some of them.

but sending
the same prepared transaction as many times as the WAL sender is
restarted between sending prepare and commit prepared is a waste of
network bandwidth.

I think similar happens without any of the work done in PG-14 as well
if we restart the apply worker before the commit completes on the
subscriber. After the restart, we will send the start_decoding_at
point based on some previous commit which will make publisher send the
entire transaction again. I don't think restart of WAL sender or WAL
receiver is such a common thing. It can only happen due to some bug in
code or user wishes to stop the nodes or some crash happened.

[1]: /messages/by-id/CAA4eK1+d3gzCyzsYjt1m6sfGf_C_uFmo9JK=3Wafp6yR8Mg8uQ@mail.gmail.com

--
With Regards,
Amit Kapila.

#8Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#7)
Re: repeated decoding of prepared transactions

On Tue, Feb 9, 2021 at 6:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think similar happens without any of the work done in PG-14 as well
if we restart the apply worker before the commit completes on the
subscriber. After the restart, we will send the start_decoding_at
point based on some previous commit which will make publisher send the
entire transaction again. I don't think restart of WAL sender or WAL
receiver is such a common thing. It can only happen due to some bug in
code or user wishes to stop the nodes or some crash happened.

Really? My impression is that the logical replication protocol is
supposed to be designed in such a way that once a transaction is
successfully confirmed, it won't be sent again. Now if something is
not confirmed then it has to be sent again. But if it is confirmed
then it shouldn't happen.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#8)
Re: repeated decoding of prepared transactions

On Wed, Feb 10, 2021 at 12:08 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Feb 9, 2021 at 6:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think similar happens without any of the work done in PG-14 as well
if we restart the apply worker before the commit completes on the
subscriber. After the restart, we will send the start_decoding_at
point based on some previous commit which will make publisher send the
entire transaction again. I don't think restart of WAL sender or WAL
receiver is such a common thing. It can only happen due to some bug in
code or user wishes to stop the nodes or some crash happened.

Really? My impression is that the logical replication protocol is
supposed to be designed in such a way that once a transaction is
successfully confirmed, it won't be sent again. Now if something is
not confirmed then it has to be sent again. But if it is confirmed
then it shouldn't happen.

If by successfully confirmed, you mean that once the subscriber node
has received, it won't be sent again then as far as I know that is not
true. We rely on the flush location sent by the subscriber to advance
the decoding locations. We update the flush locations after we apply
the transaction's commit successfully. Also, after the restart, we use
the replication origin's last flush location as a point from where we
need the transactions and the origin's progress is updated at commit
time.

OTOH, If by successfully confirmed, you mean that once the subscriber
has applied the complete transaction (including commit), then you are
right that it won't be sent again.
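
A minimal sketch of the rule described above, as a toy model in
self-contained C. ToySubscriber and the toy_* functions are hypothetical
names for this illustration and do not correspond to the apply worker's
actual code. The only assumption modelled is the one stated above: the
flush location moves when a commit has been applied, and after a restart
everything beyond that location is requested again.

```c
#include <stdint.h>

typedef uint64_t Lsn;

typedef struct {
    Lsn flush_lsn;  /* last position reported as flushed; survives restart */
} ToySubscriber;

/* Applying an individual change does not move the flush position. */
void
toy_apply_change(ToySubscriber *sub, Lsn lsn)
{
    (void) sub;
    (void) lsn;
}

/* Only a successfully applied commit advances the flush position. */
void
toy_apply_commit(ToySubscriber *sub, Lsn commit_end_lsn)
{
    if (commit_end_lsn > sub->flush_lsn)
        sub->flush_lsn = commit_end_lsn;
}

/* After a restart the subscriber asks for everything after flush_lsn. */
Lsn
toy_restart_request(const ToySubscriber *sub)
{
    return sub->flush_lsn;
}

/* A transaction is sent again if its commit lies beyond that position. */
int
toy_will_be_resent(const ToySubscriber *sub, Lsn txn_commit_lsn)
{
    return txn_commit_lsn > toy_restart_request(sub);
}
```

In this model a transaction whose changes were received but whose commit
was not yet applied is sent again after a restart, whereas a fully
applied transaction is not.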

--
With Regards,
Amit Kapila.

#10Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Kapila (#9)
Re: repeated decoding of prepared transactions

On Wed, Feb 10, 2021 at 8:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Feb 10, 2021 at 12:08 AM Robert Haas <robertmhaas@gmail.com>
wrote:

On Tue, Feb 9, 2021 at 6:57 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

I think similar happens without any of the work done in PG-14 as well
if we restart the apply worker before the commit completes on the
subscriber. After the restart, we will send the start_decoding_at
point based on some previous commit which will make publisher send the
entire transaction again. I don't think restart of WAL sender or WAL
receiver is such a common thing. It can only happen due to some bug in
code or user wishes to stop the nodes or some crash happened.

Really? My impression is that the logical replication protocol is
supposed to be designed in such a way that once a transaction is
successfully confirmed, it won't be sent again. Now if something is
not confirmed then it has to be sent again. But if it is confirmed
then it shouldn't happen.

If by successfully confirmed, you mean that once the subscriber node
has received, it won't be sent again then as far as I know that is not
true. We rely on the flush location sent by the subscriber to advance
the decoding locations. We update the flush locations after we apply
the transaction's commit successfully. Also, after the restart, we use
the replication origin's last flush location as a point from where we
need the transactions and the origin's progress is updated at commit
time.

OTOH, If by successfully confirmed, you mean that once the subscriber
has applied the complete transaction (including commit), then you are
right that it won't be sent again.

I think we need to treat a prepared transaction slightly different from an
uncommitted transaction when sending downstream. We need to send a whole
uncommitted transaction downstream again because previously applied changes
must have been aborted and hence lost by the downstream and thus it needs
to get all of those again. But when a downstream prepares a transaction,
even if it's not committed, those changes are not lost even after restart
of a walsender. If the downstream confirms that it has "flushed" PREPARE,
there is no need to send all the changes again.

--
Best Wishes,
Ashutosh

#11Ajin Cherian
itsajin@gmail.com
In reply to: Ashutosh Bapat (#10)
Re: repeated decoding of prepared transactions

On Wed, Feb 10, 2021 at 3:43 PM Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

I think we need to treat a prepared transaction slightly different from an uncommitted transaction when sending downstream. We need to send a whole uncommitted transaction downstream again because previously applied changes must have been aborted and hence lost by the downstream and thus it needs to get all of those again. But when a downstream prepares a transaction, even if it's not committed, those changes are not lost even after restart of a walsender. If the downstream confirms that it has "flushed" PREPARE, there is no need to send all the changes again.

But the other side of the problem is that, without this, if the
prepared transaction is prior to a consistent snapshot when decoding
starts/restarts, then only the "commit prepared" is sent to downstream
(as seen in the test scenario I shared above), and downstream has to
error away the commit prepared because it does not have the
corresponding prepared transaction. We did not find an easy way to
distinguish between these two scenarios for prepared transactions.
a. A consistent snapshot being formed in between a prepare and a
commit prepared for the first time.
b. Decoder restarting between a prepare and a commit prepared.

For plugins to be able to handle this, we have added a special
callback "Begin Prepare" as explained in [1] section 49.6.4.10

"The required begin_prepare_cb callback is called whenever the start
of a prepared transaction has been decoded. The gid field, which is
part of the txn parameter can be used in this callback to check if the
plugin has already received this prepare in which case it can skip the
remaining changes of the transaction. This can only happen if the user
restarts the decoding after receiving the prepare for a transaction
but before receiving the commit prepared say because of some error."

The pgoutput plugin is also being updated to be able to handle this
situation of duplicate prepared transactions.
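
As a rough illustration of the GID check the documentation describes,
here is a sketch in self-contained C of how a receiver might remember
the GIDs it has already prepared. It does not use the real output plugin
API: ToyPreparedSet, ToyAction and the toy_* functions are hypothetical,
and a real receiver would presumably consult its own list of prepared
transactions and record a GID only once the prepare is durable.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_GIDS 16
#define TOY_GID_LEN  200    /* arbitrary bound chosen for this sketch */

typedef enum { TOY_APPLY, TOY_SKIP, TOY_ERROR } ToyAction;

typedef struct {
    char gids[TOY_MAX_GIDS][TOY_GID_LEN];
    int  count;
} ToyPreparedSet;

/* Has this GID already been prepared on the receiving side? */
bool
toy_gid_known(const ToyPreparedSet *set, const char *gid)
{
    for (int i = 0; i < set->count; i++)
    {
        if (strcmp(set->gids[i], gid) == 0)
            return true;
    }
    return false;
}

/* Decide at "begin prepare" whether to apply or skip the transaction. */
ToyAction
toy_begin_prepare(ToyPreparedSet *set, const char *gid)
{
    if (toy_gid_known(set, gid))
        return TOY_SKIP;    /* duplicate delivery after a restart */
    if (set->count >= TOY_MAX_GIDS || strlen(gid) >= TOY_GID_LEN)
        return TOY_ERROR;   /* the toy set cannot hold this GID */
    strcpy(set->gids[set->count++], gid);
    return TOY_APPLY;
}

/* Forget a GID once COMMIT PREPARED or ROLLBACK PREPARED was applied. */
void
toy_finish_prepared(ToyPreparedSet *set, const char *gid)
{
    for (int i = 0; i < set->count; i++)
    {
        if (strcmp(set->gids[i], gid) == 0)
        {
            /* Move the last entry into the freed slot and shrink the set. */
            if (i != set->count - 1)
                strcpy(set->gids[i], set->gids[set->count - 1]);
            set->count--;
            return;
        }
    }
}
```

The first delivery of 'test_prepared#1' would be applied, and a repeated
delivery before the commit prepared would be skipped.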

regards,
Ajin Cherian
Fujitsu Australia

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#11)
Re: repeated decoding of prepared transactions

On Wed, Feb 10, 2021 at 11:45 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Feb 10, 2021 at 3:43 PM Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

I think we need to treat a prepared transaction slightly different from an uncommitted transaction when sending downstream. We need to send a whole uncommitted transaction downstream again because previously applied changes must have been aborted and hence lost by the downstream and thus it needs to get all of those again. But when a downstream prepares a transaction, even if it's not committed, those changes are not lost even after restart of a walsender. If the downstream confirms that it has "flushed" PREPARE, there is no need to send all the changes again.

But the other side of the problem is that, without this, if the
prepared transaction is prior to a consistent snapshot when decoding
starts/restarts, then only the "commit prepared" is sent to downstream
(as seen in the test scenario I shared above), and downstream has to
error away the commit prepared because it does not have the
corresponding prepared transaction.

I think it is not only simple error handling, it is required for
data-consistency. We need to send the transactions whose commits are
encountered after a consistent snapshot is reached.

--
With Regards,
Amit Kapila.

#13Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Amit Kapila (#12)
Re: repeated decoding of prepared transactions

On 10.02.21 07:32, Amit Kapila wrote:

On Wed, Feb 10, 2021 at 11:45 AM Ajin Cherian <itsajin@gmail.com> wrote:

But the other side of the problem is that, without this, if the
prepared transaction is prior to a consistent snapshot when decoding
starts/restarts, then only the "commit prepared" is sent to downstream
(as seen in the test scenario I shared above), and downstream has to
error away the commit prepared because it does not have the
corresponding prepared transaction.

I think it is not only simple error handling, it is required for
data-consistency. We need to send the transactions whose commits are
encountered after a consistent snapshot is reached.

I'm with Ashutosh here. If a replica is properly in sync, it knows
about prepared transactions and all the gids of those. Sending the
transactional changes and the prepare again is inconsistent.

The point of a two-phase transaction is to have two phases. An output
plugin must have the chance of treating them as independent events.
Once a PREPARE is confirmed, it must not be sent again. Even if the
transaction is still in-progress and its changes are not yet visible on
the origin node.

Regards

Markus

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Markus Wanner (#13)
Re: repeated decoding of prepared transactions

On Wed, Feb 10, 2021 at 1:40 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

On 10.02.21 07:32, Amit Kapila wrote:

On Wed, Feb 10, 2021 at 11:45 AM Ajin Cherian <itsajin@gmail.com> wrote:

But the other side of the problem is that, without this, if the
prepared transaction is prior to a consistent snapshot when decoding
starts/restarts, then only the "commit prepared" is sent to downstream
(as seen in the test scenario I shared above), and downstream has to
error away the commit prepared because it does not have the
corresponding prepared transaction.

I think it is not only simple error handling, it is required for
data-consistency. We need to send the transactions whose commits are
encountered after a consistent snapshot is reached.

I'm with Ashutosh here. If a replica is properly in sync, it knows
about prepared transactions and all the gids of those. Sending the
transactional changes and the prepare again is inconsistent.

The point of a two-phase transaction is to have two phases. An output
plugin must have the chance of treating them as independent events.

I am not sure I understand what problem you are facing to deal with
this in the output plugin, it is explained in docs and Ajin also
pointed out the same. Ajin and I have explained to you the design
constraints on the publisher-side due to which we have done this way.
Do you have any better ideas to deal with this?

--
With Regards,
Amit Kapila.

#15Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#9)
Re: repeated decoding of prepared transactions

On Tue, Feb 9, 2021 at 9:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

If by successfully confirmed, you mean that once the subscriber node
has received, it won't be sent again then as far as I know that is not
true. We rely on the flush location sent by the subscriber to advance
the decoding locations. We update the flush locations after we apply
the transaction's commit successfully. Also, after the restart, we use
the replication origin's last flush location as a point from where we
need the transactions and the origin's progress is updated at commit
time.

OTOH, If by successfully confirmed, you mean that once the subscriber
has applied the complete transaction (including commit), then you are
right that it won't be sent again.

I meant - once the subscriber has advanced the flush location.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Markus Wanner (#1)
Re: repeated decoding of prepared transactions

On Mon, Feb 8, 2021 at 2:01 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

I did not expect this, as any receiver that wants to have decoded 2PC is
likely supporting some kind of two-phase commits itself. And would
therefore prepare the transaction upon its first reception. Potentially
receiving it a second time would require complicated filtering on every
prepared transaction.

I would like to bring one other scenario to your notice where you
might want to handle things differently for prepared transactions on
the plugin side. Assume we have multiple publications (for simplicity
say 2) on publisher with corresponding subscriptions (say 2, each
corresponding to one publication on the publisher). When a user
performs a transaction on a publisher that involves the tables from
both publications, on the subscriber-side, we do that work via two
different transactions, corresponding to each subscription. But, we
need some way to deal with prepared xacts because they need GID and we
can't use the same GID for both subscriptions. Please see the detailed
example and one idea to deal with the same in the main thread [1]. It
would be really helpful if you or others working on the plugin side
can share your opinion on the same.

Now, coming back to the restart case where the prepared transaction
can be sent again by the publisher. I understand your and others'
point that we should not send the prepared transaction again if there is a
restart between prepare and commit but there are reasons why we have
done that way and I am open to your suggestions. I'll once again try
to explain the exact case to you which is not very apparent. The basic
idea is that we ship/replay all transactions where commit happens
after the snapshot has a consistent state (SNAPBUILD_CONSISTENT), see
atop snapbuild.c for details. Now, for transactions where prepare is
before snapshot state SNAPBUILD_CONSISTENT and commit prepared is
after SNAPBUILD_CONSISTENT, we need to send the entire transaction
including prepare at the commit time. One might think it is quite easy
to detect that, basically if we skip prepare when the snapshot state
was not SNAPBUILD_CONSISTENT, then mark a flag in ReorderBufferTxn and
use the same to detect during commit and accordingly take the decision
to send prepare but unfortunately it is not that easy. There is always
a chance that on restart we reuse the snapshot serialized by some
other Walsender at a location prior to Prepare and if that happens
then this time the prepare won't be skipped due to snapshot state
(SNAPBUILD_CONSISTENT) but due to start_decoding_at point (considering
we have already shipped some of the later commits but not prepare).
Now, this will actually become the same situation where the restart
has happened after we have sent the prepare but not commit. This is
the reason we have to resend the prepare when the subscriber restarts
between prepare and commit.
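
The ambiguity can be sketched with a toy model in self-contained C. The
names ToyDecoder, SkipReason and the toy_* functions are hypothetical
and the conditions are simplified assumptions, not the actual decoding
code. The model takes only what the decoder can see (snapshot state and
start_decoding_at) and shows that "restarted after the prepare was sent"
and "restarted with a reused serialized snapshot, prepare never sent"
produce the same decoder-side state.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;

/* What the decoder knows when it reads a PREPARE record (toy model). */
typedef struct {
    bool snapshot_consistent;   /* has SNAPBUILD_CONSISTENT been reached? */
    Lsn  start_decoding_at;     /* records before this are not sent */
} ToyDecoder;

typedef enum {
    SKIP_NONE,                  /* prepare is decoded and sent now */
    SKIP_SNAPSHOT,              /* snapshot not yet consistent */
    SKIP_ALREADY_CONFIRMED      /* prepare lies before start_decoding_at */
} SkipReason;

/* Why, if at all, is the PREPARE at `prepare_lsn` skipped? */
SkipReason
toy_prepare_skip_reason(const ToyDecoder *d, Lsn prepare_lsn)
{
    if (!d->snapshot_consistent)
        return SKIP_SNAPSHOT;
    if (prepare_lsn < d->start_decoding_at)
        return SKIP_ALREADY_CONFIRMED;
    return SKIP_NONE;
}

/*
 * At COMMIT PREPARED the decoder cannot tell whether a skipped prepare
 * ever reached the downstream, so in this model it resends the whole
 * transaction whenever the prepare was skipped, for either reason.
 */
bool
toy_must_send_prepare_at_commit(SkipReason reason)
{
    return reason != SKIP_NONE;
}
```

Two decoders with a consistent snapshot and the same start_decoding_at
return the same skip reason, whether or not the downstream ever received
the prepare, which is why the decision is left to the downstream.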

You can reproduce the case where we can't distinguish between two
situations by using the test case in twophase_snapshot.spec and
additionally starting a separate session via the debugger. So, the
steps in the test case are as below:

"s2b" "s2txid" "s1init" "s3b" "s3txid" "s2c" "s2b" "s2insert" "s2p"
"s3c" "s1insert" "s1start" "s2cp" "s1start"

Define new steps as

"s4init" {SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot_1',
'test_decoding');}
"s4start" {SELECT data FROM
pg_logical_slot_get_changes('isolation_slot_1', NULL, NULL,
'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit',
'1');}

The first thing we need to do is s4init and stop the debugger in
SnapBuildProcessRunningXacts. Now perform steps from 's2b' till first
's1start' in twophase_snapshot.spec. Then continue in the s4 session
and perform s4start. After this, if you debug (or add the logs) the
second s1start, you will notice that we are skipping prepare not
because of inconsistent snapshot but a forward location in
start_decoding_at. If you don't involve session-4, then it will always
skip prepare due to an inconsistent snapshot state. This involves a
debugger so not easy to write an automated test for it.

I have used a bit tricky scenario to explain this but not sure if
there was any other simpler way.

[1]: /messages/by-id/CAA4eK1+LvkeX=B3xon7RcBwD4CVaFSryPj3pTBAALrDxQVPDwA@mail.gmail.com

--
With Regards,
Amit Kapila.

#17Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Amit Kapila (#16)
Re: repeated decoding of prepared transactions

Hello Amit,

thanks a lot for your extensive explanation and examples, I appreciate
this very much. I'll need to think this through and see how we can make
this work for us.

Best Regards

Markus

#18Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#16)
Re: repeated decoding of prepared transactions

On Thu, Feb 11, 2021 at 5:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

to explain the exact case to you which is not very apparent. The basic
idea is that we ship/replay all transactions where commit happens
after the snapshot has a consistent state (SNAPBUILD_CONSISTENT), see
atop snapbuild.c for details. Now, for transactions where prepare is
before snapshot state SNAPBUILD_CONSISTENT and commit prepared is
after SNAPBUILD_CONSISTENT, we need to send the entire transaction
including prepare at the commit time.

This might be a dumb question, but: why?

Is this because the effects of the prepared transaction might
otherwise be included neither in the initial synchronization of the
data nor in any subsequently decoded transaction, thus leaving the
replica out of sync?

--
Robert Haas
EDB: http://www.enterprisedb.com

#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#18)
Re: repeated decoding of prepared transactions

On Fri, Feb 12, 2021 at 1:10 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 11, 2021 at 5:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

to explain the exact case to you which is not very apparent. The basic
idea is that we ship/replay all transactions where commit happens
after the snapshot has a consistent state (SNAPBUILD_CONSISTENT), see
atop snapbuild.c for details. Now, for transactions where prepare is
before snapshot state SNAPBUILD_CONSISTENT and commit prepared is
after SNAPBUILD_CONSISTENT, we need to send the entire transaction
including prepare at the commit time.

This might be a dumb question, but: why?

Is this because the effects of the prepared transaction might
otherwise be included neither in the initial synchronization of the
data nor in any subsequently decoded transaction, thus leaving the
replica out of sync?

Yes.

--
With Regards,
Amit Kapila.

#20Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#9)
Re: repeated decoding of prepared transactions

Hi,

On 2021-02-10 08:02:17 +0530, Amit Kapila wrote:

On Wed, Feb 10, 2021 at 12:08 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Feb 9, 2021 at 6:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think similar happens without any of the work done in PG-14 as well
if we restart the apply worker before the commit completes on the
subscriber. After the restart, we will send the start_decoding_at
point based on some previous commit which will make publisher send the
entire transaction again. I don't think restart of WAL sender or WAL
receiver is such a common thing. It can only happen due to some bug in
code or user wishes to stop the nodes or some crash happened.

Really? My impression is that the logical replication protocol is
supposed to be designed in such a way that once a transaction is
successfully confirmed, it won't be sent again. Now if something is
not confirmed then it has to be sent again. But if it is confirmed
then it shouldn't happen.

Correct.

If by successfully confirmed, you mean that once the subscriber node
has received, it won't be sent again then as far as I know that is not
true. We rely on the flush location sent by the subscriber to advance
the decoding locations. We update the flush locations after we apply
the transaction's commit successfully. Also, after the restart, we use
the replication origin's last flush location as a point from where we
need the transactions and the origin's progress is updated at commit
time.

That's not quite right. Yes, the flush location isn't guaranteed to be
updated at that point, but a replication client will send the last
location they've received and successfully processed, and that has to
*guarantee* that they won't receive anything twice, or miss
something. Otherwise you've broken the protocol.

Greetings,

Andres Freund

#21Petr Jelinek
petr.jelinek@enterprisedb.com
In reply to: Andres Freund (#20)
Re: repeated decoding of prepared transactions

On 13 Feb 2021, at 17:32, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-02-10 08:02:17 +0530, Amit Kapila wrote:

On Wed, Feb 10, 2021 at 12:08 AM Robert Haas <robertmhaas@gmail.com> wrote:

If by successfully confirmed, you mean that once the subscriber node
has received, it won't be sent again then as far as I know that is not
true. We rely on the flush location sent by the subscriber to advance
the decoding locations. We update the flush locations after we apply
the transaction's commit successfully. Also, after the restart, we use
the replication origin's last flush location as a point from where we
need the transactions and the origin's progress is updated at commit
time.

That's not quite right. Yes, the flush location isn't guaranteed to be
updated at that point, but a replication client will send the last
location they've received and successfully processed, and that has to
*guarantee* that they won't receive anything twice, or miss
something. Otherwise you've broken the protocol.

Agreed, if we relied purely on the flush location of a slot, there would
be no need for origins to track the LSN. AFAIK this is exactly why
origins are WAL-logged along with the transaction: it allows us to
guarantee never receiving again anything that has already been durably
written.
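
A rough model of that guarantee, assuming the origin position is
updated in the same local commit as the applied data. The Subscriber
struct and all function names are made up for illustration, and the
strict comparison in publisher_should_send() is an assumption of this
sketch, not a statement about the real protocol code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* hypothetical stand-in for XLogRecPtr */

typedef struct {
    Lsn origin_lsn;     /* last remote commit LSN durably applied */
    int rows_applied;   /* stand-in for the replicated data */
} Subscriber;

/*
 * Apply one remote transaction. Both fields change together, which
 * models the origin progress being logged with the local commit.
 */
void
apply_commit(Subscriber *sub, Lsn commit_lsn, int rows)
{
    assert(sub != NULL);
    assert(commit_lsn > sub->origin_lsn);  /* commits arrive in LSN order */
    sub->rows_applied += rows;
    sub->origin_lsn = commit_lsn;
}

/* After a restart the subscriber asks to continue from its origin. */
Lsn
restart_position(const Subscriber *sub)
{
    assert(sub != NULL);
    return sub->origin_lsn;
}

/* The publisher only ships commits beyond the requested position. */
bool
publisher_should_send(Lsn commit_lsn, Lsn start_pos)
{
    return commit_lsn > start_pos;
}
```

Because the position survives exactly when the data does, a crash
between receiving and applying a transaction leads to a resend, while a
crash after the local commit does not.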


Petr

#22Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#21)
Re: repeated decoding of prepared transactions

Hi,

On 2021-02-13 17:37:29 +0100, Petr Jelinek wrote:

Agreed, if we relied purely on flush location of a slot, there would
be no need for origins to track the lsn.

And we would be latency bound replicating transactions, which'd not be
fun for single-insert ones for example...

AFAIK this is exactly why origins are WAL-logged along with
transaction, it allows us to guarantee never getting anything that has
been durably written.

I think you'd need something like origins in that case, because
something could still go wrong before the other side has received the
flush (network disconnect, primary crash, ...).

Greetings,

Andres Freund

#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#22)
Re: repeated decoding of prepared transactions

On Sat, Feb 13, 2021 at 10:23 PM Andres Freund <andres@anarazel.de> wrote:

On 2021-02-13 17:37:29 +0100, Petr Jelinek wrote:

AFAIK this is exactly why origins are WAL-logged along with
transaction, it allows us to guarantee never getting anything that has
been durably written.

I think you'd need something like origins in that case, because
something could still go wrong before the other side has received the
flush (network disconnect, primary crash, ...).

We are already using origins in apply-worker to guarantee that and
with each commit, the origin's lsn location is also WAL-logged. That
helps us to send the start location for a slot after the restart. As
far as I understand this is how it works from the apply-worker side. I
am not sure if I am missing something here?

--
With Regards,
Amit Kapila.

#24Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#16)
Re: repeated decoding of prepared transactions

On Thu, Feb 11, 2021 at 4:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Feb 8, 2021 at 2:01 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

Now, coming back to the restart case where the prepared transaction
can be sent again by the publisher. I understand yours and others
point that we should not send prepared transaction if there is a
restart between prepare and commit but there are reasons why we have
done that way and I am open to your suggestions. I'll once again try
to explain the exact case to you which is not very apparent. The basic
idea is that we ship/replay all transactions where commit happens
after the snapshot has a consistent state (SNAPBUILD_CONSISTENT), see
atop snapbuild.c for details. Now, for transactions where prepare is
before snapshot state SNAPBUILD_CONSISTENT and commit prepared is
after SNAPBUILD_CONSISTENT, we need to send the entire transaction
including prepare at the commit time. One might think it is quite easy
to detect that, basically if we skip prepare when the snapshot state
was not SNAPBUILD_CONSISTENT, then mark a flag in ReorderBufferTxn and
use the same to detect during commit and accordingly take the decision
to send prepare but unfortunately it is not that easy. There is always
a chance that on restart we reuse the snapshot serialized by some
other Walsender at a location prior to Prepare and if that happens
then this time the prepare won't be skipped due to snapshot state
(SNAPBUILD_CONSISTENT) but due to start_decoding_at point (considering
we have already shipped some of the later commits but not prepare).
Now, this will actually become the same situation where the restart
has happened after we have sent the prepare but not commit. This is
the reason we have to resend the prepare when the subscriber restarts
between prepare and commit.

After further thinking on this problem and some off-list discussions
with Ajin, there appears to be another way to solve the above problem
by which we can avoid resending the prepare after restart if it has
already been processed by the subscriber. The main reason why we were
not able to distinguish between the two cases ((a) prepare happened
before SNAPBUILD_CONSISTENT state but commit prepared happened after
we reach SNAPBUILD_CONSISTENT state and (b) prepare is already
decoded, successfully processed by the subscriber and we have
restarted the decoding) is that we can re-use the serialized snapshot
at LSN location prior to Prepare of some concurrent WALSender after
the restart. Now, if we ensure that we don't use serialized snapshots
for decoding via slots where two_phase decoding option is enabled then
we won't have that problem. The drawback is that in some cases it can
take a bit more time for initial snapshot building but maybe that is
better than the current solution.
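
As a sketch of this idea only (not the eventual patch): SlotState,
SkipReason and both functions below are hypothetical names. The second
function shows the consequence claimed above, namely that once
serialized snapshots are never reused for two-phase slots, the reason a
PREPARE was skipped is enough to decide what to send at COMMIT PREPARED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool two_phase;                /* slot decodes at PREPARE time */
    bool serialized_snap_on_disk;  /* another walsender left a snapshot */
} SlotState;

typedef enum {
    SKIP_NONE,
    SKIP_SNAPSHOT_NOT_CONSISTENT,  /* case (a) in the text */
    SKIP_BEFORE_START_LSN          /* case (b) in the text */
} SkipReason;

/* Proposed rule: two-phase slots always build their snapshot from scratch. */
bool
may_restore_serialized_snapshot(const SlotState *slot)
{
    assert(slot != NULL);
    return slot->serialized_snap_on_disk && !slot->two_phase;
}

/* With the rule in force, only case (a) needs the full transaction. */
bool
send_full_txn_at_commit_prepared(SkipReason reason)
{
    return reason == SKIP_SNAPSHOT_NOT_CONSISTENT;
}
```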

Any suggestions?

--
With Regards,
Amit Kapila.

#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#24)
Re: repeated decoding of prepared transactions

On Tue, Feb 16, 2021 at 9:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Feb 11, 2021 at 4:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Feb 8, 2021 at 2:01 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

Now, coming back to the restart case where the prepared transaction
can be sent again by the publisher. I understand yours and others
point that we should not send prepared transaction if there is a
restart between prepare and commit but there are reasons why we have
done that way and I am open to your suggestions. I'll once again try
to explain the exact case to you which is not very apparent. The basic
idea is that we ship/replay all transactions where commit happens
after the snapshot has a consistent state (SNAPBUILD_CONSISTENT), see
atop snapbuild.c for details. Now, for transactions where prepare is
before snapshot state SNAPBUILD_CONSISTENT and commit prepared is
after SNAPBUILD_CONSISTENT, we need to send the entire transaction
including prepare at the commit time. One might think it is quite easy
to detect that, basically if we skip prepare when the snapshot state
was not SNAPBUILD_CONSISTENT, then mark a flag in ReorderBufferTxn and
use the same to detect during commit and accordingly take the decision
to send prepare but unfortunately it is not that easy. There is always
a chance that on restart we reuse the snapshot serialized by some
other Walsender at a location prior to Prepare and if that happens
then this time the prepare won't be skipped due to snapshot state
(SNAPBUILD_CONSISTENT) but due to start_decoding_at point (considering
we have already shipped some of the later commits but not prepare).
Now, this will actually become the same situation where the restart
has happened after we have sent the prepare but not commit. This is
the reason we have to resend the prepare when the subscriber restarts
between prepare and commit.

After further thinking on this problem and some off-list discussions
with Ajin, there appears to be another way to solve the above problem
by which we can avoid resending the prepare after restart if it has
already been processed by the subscriber. The main reason why we were
not able to distinguish between the two cases ((a) prepare happened
before SNAPBUILD_CONSISTENT state but commit prepared happened after
we reach SNAPBUILD_CONSISTENT state and (b) prepare is already
decoded, successfully processed by the subscriber and we have
restarted the decoding) is that we can re-use the serialized snapshot
at LSN location prior to Prepare of some concurrent WALSender after
the restart. Now, if we ensure that we don't use serialized snapshots
for decoding via slots where two_phase decoding option is enabled then
we won't have that problem. The drawback is that in some cases it can
take a bit more time for initial snapshot building but maybe that is
better than the current solution.

I see another thing which we need to address if we have to use the
above solution. The issue is if initially the two-pc option for
subscription is off and we skipped prepare because of that and then
some unrelated commit happened which allowed start_decoding_at point
to move ahead. And then the user enabled the two-pc option for the
subscription, then we will again skip prepare because it is behind
start_decoding_at point which becomes the same case where prepare
seems to have already been sent. So, in such a situation with the
above solution, we will miss sending the prepared transaction and its
data and hence risk making replica out-of-sync. Now, this can be
avoided if we don't allow users to alter the two-pc option once the
subscription is created. I am not sure but maybe for the first version
of this feature that might be okay and we can improve it later if we
have better ideas. This will definitely allow us to avoid checks in
the plugins and/or the apply-worker, which seems like a good trade-off,
and it will address the concern most people have raised in this thread.
Any thoughts?
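
The hazard can be replayed with a small model. Everything here is
hypothetical (the Decoder struct, the function names, the LSN values
100 and 130); it only encodes the rule under discussion, where a
two-phase decoder assumes at COMMIT PREPARED that the changes already
went out at PREPARE time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* hypothetical stand-in for XLogRecPtr */

typedef struct {
    bool two_phase;
    Lsn  start_decoding_at;
} Decoder;

/* Changes go out at PREPARE only with two-phase on and PREPARE not skipped. */
bool
changes_sent_at_prepare(const Decoder *d, Lsn prepare_lsn)
{
    assert(d != NULL);
    return d->two_phase && prepare_lsn >= d->start_decoding_at;
}

/* With two-phase on, COMMIT PREPARED assumes the changes were already sent. */
bool
changes_sent_at_commit_prepared(const Decoder *d)
{
    assert(d != NULL);
    return !d->two_phase;
}

/* Count how often the transaction's changes reach the subscriber. */
int
deliveries(bool two_phase_at_prepare, bool two_phase_at_commit)
{
    const Lsn prepare_lsn = 100;
    int       sent = 0;
    Decoder   d = { two_phase_at_prepare, 0 };

    if (changes_sent_at_prepare(&d, prepare_lsn))
        sent++;

    /* An unrelated commit is confirmed, then the option is changed. */
    d.start_decoding_at = 130;
    d.two_phase = two_phase_at_commit;

    /* After the restart the PREPARE lies before start_decoding_at. */
    if (changes_sent_at_prepare(&d, prepare_lsn))
        sent++;
    if (changes_sent_at_commit_prepared(&d))
        sent++;
    return sent;
}
```

In this model deliveries(false, true) is 0, i.e. the transaction is
lost, while keeping the option fixed at either value delivers it once.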

--
With Regards,
Amit Kapila.

#26Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#24)
1 attachment(s)
Re: repeated decoding of prepared transactions

On Tue, Feb 16, 2021 at 3:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

After further thinking on this problem and some off-list discussions
with Ajin, there appears to be another way to solve the above problem
by which we can avoid resending the prepare after restart if it has
already been processed by the subscriber. The main reason why we were
not able to distinguish between the two cases ((a) prepare happened
before SNAPBUILD_CONSISTENT state but commit prepared happened after
we reach SNAPBUILD_CONSISTENT state and (b) prepare is already
decoded, successfully processed by the subscriber and we have
restarted the decoding) is that we can re-use the serialized snapshot
at LSN location prior to Prepare of some concurrent WALSender after
the restart. Now, if we ensure that we don't use serialized snapshots
for decoding via slots where two_phase decoding option is enabled then
we won't have that problem. The drawback is that in some cases it can
take a bit more time for initial snapshot building but maybe that is
better than the current solution.

Based on this suggestion, I have created a patch on HEAD which now
does not allow repeated decoding of prepared transactions. For this,
the code now enforces full_snapshot if two-phase decoding is enabled.
Do have a look at the patch and see if you have any comments.

Currently one problem with this, as you have also mentioned in your
last mail, is that if two-phase is initially disabled in test_decoding
while decoding the prepare (causing the prepared transaction to not be
decoded) and later enabled after the commit prepared (where it assumes
that the transaction was decoded at prepare time), then the
transaction is not decoded at all. For example:

postgres=# begin;
BEGIN
postgres=*# INSERT INTO do_write DEFAULT VALUES;
INSERT 0 1
postgres=*# PREPARE TRANSACTION 'test1';
PREPARE TRANSACTION
postgres=# SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit',
'0');
data
------
(0 rows)
postgres=# commit prepared 'test1';
COMMIT PREPARED
postgres=# SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit',
'1');
data
-------------------------
COMMIT PREPARED 'test1'
(1 row)

The first pg_logical_slot_get_changes is called with two-phase-commit
off, the second with two-phase-commit on. You can see that the
transaction is not decoded at all.
For this, I am planning to change the semantics such that
two-phase-commit can only be specified while creating the slot using
pg_create_logical_replication_slot() and not in
pg_logical_slot_get_changes, thus preventing the two-phase-commit flag
from being toggled between restarts of the decoder. Let me know if
anybody objects to this change, else I will update that in the next
patch.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

0001-Don-t-allow-repeated-decoding-of-prepared-transactio.patch (application/octet-stream)
From 129947ab2d0ba223862ed1c87be0f96b51645ba0 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Thu, 18 Feb 2021 20:18:16 -0500
Subject: [PATCH] Don't allow repeated decoding of prepared transactions.

Enforce full snapshot while decoding with two-phase enabled. This
allows the decoder to differentiate between prepared transaction that
were sent prior to restart and prepared transactions that were not sent
because they were prior to consistent snapshot.
---
 contrib/test_decoding/expected/twophase.out        | 38 +++++++---------------
 contrib/test_decoding/expected/twophase_stream.out | 28 ++--------------
 src/backend/replication/logical/decode.c           |  5 ++-
 src/backend/replication/logical/logical.c          |  8 +++++
 src/backend/replication/logical/reorderbuffer.c    | 15 +++++++++
 src/backend/replication/logical/snapbuild.c        |  9 +++++
 src/include/replication/reorderbuffer.h            |  1 +
 src/include/replication/snapbuild.h                |  1 +
 8 files changed, 53 insertions(+), 52 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index f9f6bed..c51870f 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#1';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
@@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#3';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
@@ -159,14 +152,10 @@ RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -189,13 +178,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd3..d54e640 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -60,32 +60,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df0..00d789d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -789,10 +789,13 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 * SnapBuildProcessRunningXacts. But we need to process cache
 	 * invalidations if there are any for the reasons mentioned in
 	 * DecodeCommit.
+	 *
+	 * We need to mark the transaction as prepared, so that we don't resend it on 
+	 * COMMIT PREPARED.
 	 */
 	if (DecodeTXNNeedSkip(ctx, buf, parsed->dbId, origin_id))
 	{
-		ReorderBufferSkipPrepare(ctx->reorder, xid);
+		ReorderBufferMarkPrepare(ctx->reorder, xid);
 		ReorderBufferInvalidate(ctx->reorder, xid, buf->origptr);
 		return;
 	}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 0977aec..d98bc92 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -431,6 +431,10 @@ CreateInitDecodingContext(const char *plugin,
 		startup_cb_wrapper(ctx, &ctx->options, true);
 	MemoryContextSwitchTo(old_context);
 
+	/* If two-phase is on, then only full snapshot can be used */
+	if (ctx->twophase)
+		SetSnapBuildType(ctx->snapshot_builder, true);
+
 	ctx->reorder->output_rewrites = ctx->options.receive_rewrites;
 
 	return ctx;
@@ -534,6 +538,10 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 
 	ctx->reorder->output_rewrites = ctx->options.receive_rewrites;
 
+	/* If two-phase is on, then only full snapshot can be used */
+	if (ctx->twophase)
+		SetSnapBuildType(ctx->snapshot_builder, true);
+
 	ereport(LOG,
 			(errmsg("starting logical decoding for slot \"%s\"",
 					NameStr(slot->data.name)),
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5a62ab8..4730cf0 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2638,6 +2638,21 @@ ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid)
 	txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
 }
 
+/* Mark this transaction as prepared */
+void
+ReorderBufferMarkPrepare(ReorderBuffer *rb, TransactionId xid)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+	/* unknown transaction, nothing to do */
+	if (txn == NULL)
+		return;
+
+	txn->txn_flags |= RBTXN_PREPARE;
+}
+
 /*
  * Prepare a two-phase transaction.
  *
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 752cf2d..5e6899e 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -357,6 +357,15 @@ SnapBuildCurrentState(SnapBuild *builder)
 }
 
 /*
+ * Set snapshot type
+ */
+void
+SetSnapBuildType(SnapBuild *builder, bool need_full_snapshot)
+{
+	builder->building_full_snapshot = need_full_snapshot;
+}
+
+/*
  * Should the contents of transaction ending at 'ptr' be decoded?
  */
 bool
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bab31bf..8824e60 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -676,6 +676,7 @@ bool		ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
 											 TimestampTz prepare_time,
 											 RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid);
+void		ReorderBufferMarkPrepare(ReorderBuffer *rb, TransactionId xid);
 void		ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid, char *gid);
 ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
 TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index d9f187a..786d0d4 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -69,6 +69,7 @@ extern void SnapBuildSnapDecRefcount(Snapshot snap);
 extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
 extern const char *SnapBuildExportSnapshot(SnapBuild *snapstate);
 extern void SnapBuildClearExportedSnapshot(void);
+extern void SetSnapBuildType(SnapBuild *builder, bool need_full_snapshot);
 
 extern SnapBuildState SnapBuildCurrentState(SnapBuild *snapstate);
 extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
-- 
1.8.3.1

#27Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Ajin Cherian (#26)
1 attachment(s)
Re: repeated decoding of prepared transactions

Ajin, Amit,

thank you both a lot for thinking this through and even providing a patch.

The changes in expectation for twophase.out match exactly with what I
prepared. And the switch with pg_logical_slot_get_changes indeed is
something I had not yet considered, either.

On 19.02.21 03:50, Ajin Cherian wrote:

For this, I am planning to change the semantics such that
two-phase-commit can only be specified while creating the slot using
pg_create_logical_replication_slot()
and not in pg_logical_slot_get_changes, thus preventing
two-phase-commit flag from being toggled between restarts of the
decoder. Let me know if anybody objects to this
change, else I will update that in the next patch.

This sounds like a good plan to me, yes.

However, more generally speaking, I suspect you are overthinking this.
All of the complexity arises because of the assumption that an output
plugin receiving and confirming a PREPARE may not be able to persist
that first phase of transaction application. Instead, you are trying to
somehow resurrect the transactional changes and the prepare at COMMIT
PREPARED time and decode it in a deferred way.

Instead, I'm arguing that a PREPARE is an atomic operation just like a
transaction's COMMIT. The decoder should always feed these in the order
of appearance in the WAL. For example, if you have PREPARE A, COMMIT B,
COMMIT PREPARED A in the WAL, the decoder should always output these
events in exactly that order. And not ever COMMIT B, PREPARE A, COMMIT
PREPARED A (which is currently violated in the expectation for
twophase_snapshot, because the COMMIT for `s1insert` there appears after
the PREPARE of `s2p` in the WAL, but gets decoded before it).

The patch I'm attaching corrects this expectation in twophase_snapshot,
adds an explanatory diagram, and eliminates any danger of sending
PREPAREs at COMMIT PREPARED time. Thereby preserving the ordering of
PREPAREs vs COMMITs.

Given the output plugin supports two-phase commit, I argue there must be
a good reason for it setting the start_decoding_at LSN to a point in
time after a PREPARE. To me that means the output plugin (or its
downstream replica) has processed the PREPARE (and the downstream
replica did whatever it needed to do on its side in order to make the
transaction ready to be committed in a second phase).

(In the weird case of an output plugin that wants to enable two-phase
commit but does not really support it downstream, it's still possible
for it to hold back LSN confirmations for prepared-but-still-in-flight
transactions. However, I'm having a hard time justifying this use case.)

With that line of thinking, the point in time (or in WAL) of the COMMIT
PREPARED does not matter at all to reason about the decoding of the
PREPARE operation. Instead, there are only exactly two cases to consider:

a) the PREPARE happened before the start_decoding_at LSN and must not be
decoded. (But the effects of the PREPARE must then be included in the
initial synchronization. If that's not supported, the output plugin
should not enable two-phase commit.)

b) the PREPARE happens after the start_decoding_at LSN and must be
decoded. (It obviously is not included in the initial synchronization
or decoded by a previous instance of the decoder process.)

The case where the PREPARE lies before SNAPBUILD_CONSISTENT must always
be case a) where we must not repeat the PREPARE, anyway. And in case b)
where we need a consistent snapshot to decode the PREPARE, existing
provisions already guarantee that to be possible (or how would this be
different from a regular single-phase commit?).
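
The two cases can be written down as a small Python sketch. This is only a
model of the rule proposed here, not the decoder's actual code: LSNs are
plain integers, the function name is hypothetical, and treating an LSN
equal to start_decoding_at as "after" is an assumption of the sketch.

```python
def events_for_two_phase_txn(prepare_lsn, commit_prepared_lsn, start_decoding_at):
    """Return the events a decoder would emit for one two-phase transaction.

    Each record is judged on its own LSN. Whether the PREPARE is decoded
    does not depend on where the COMMIT PREPARED lies.
    """
    events = []
    if prepare_lsn >= start_decoding_at:
        # case b): PREPARE at or after the start point, decode it
        events.append("PREPARE")
    # case a): PREPARE before the start point, never decoded here; its
    # effects must come from the initial synchronization instead
    if commit_prepared_lsn >= start_decoding_at:
        events.append("COMMIT PREPARED")
    return events
```

For example, events_for_two_phase_txn(100, 300, 200) returns only
["COMMIT PREPARED"], which corresponds to the corrected expectation in
twophase_snapshot.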

Please let me know what you think and whether this approach is feasible
for you as well.

Regards

Markus

Attachments:

0001-Preserve-ordering-of-PREPAREs-vs-COMMITs.patch (text/x-patch; charset=UTF-8)
From ed03c463175733072edf8afb8d120a1285a3194f Mon Sep 17 00:00:00 2001
From: Markus Wanner <markus.wanner@enterprisedb.com>
Date: Tue, 9 Feb 2021 16:16:13 +0100
Subject: [PATCH] Preserve ordering of PREPAREs vs COMMITs in logical decoding

Decouple decoding of the prepare phase of a two-phase transaction from
the final commit (or rollback) of a two-phase transaction, so that
these are more like atomic operations which preserve the ordering in
WAL.  And so that transactional changes of a PREPARE are not ever
provided to the output plugin unnecessarily.

Correct test expectations to expect no duplication.  Add a variant with
a ROLLBACK PREPARED to twophase_snapshot and illustrate the test case
with an explanatory diagram.
---
 contrib/test_decoding/expected/twophase.out   | 38 ++++---------
 .../expected/twophase_snapshot.out            | 40 ++++++++++++-
 .../expected/twophase_stream.out              | 28 +---------
 .../specs/twophase_snapshot.spec              | 56 ++++++++++++++++---
 doc/src/sgml/logicaldecoding.sgml             | 17 ++++--
 .../replication/logical/reorderbuffer.c       | 29 ----------
 6 files changed, 112 insertions(+), 96 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5a62ab8bbc1..d0d805e5c0e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2695,35 +2695,6 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	/* add the gid in the txn */
 	txn->gid = pstrdup(gid);
 
-	/*
-	 * It is possible that this transaction is not decoded at prepare time
-	 * either because by that time we didn't have a consistent snapshot or it
-	 * was decoded earlier but we have restarted. We can't distinguish between
-	 * those two cases so we send the prepare in both the cases and let
-	 * downstream decide whether to process or skip it. We don't need to
-	 * decode the xact for aborts if it is not done already.
-	 */
-	if (!rbtxn_prepared(txn) && is_commit)
-	{
-		txn->txn_flags |= RBTXN_PREPARE;
-
-		/*
-		 * The prepare info must have been updated in txn even if we skip
-		 * prepare.
-		 */
-		Assert(txn->final_lsn != InvalidXLogRecPtr);
-
-		/*
-		 * By this time the txn has the prepare record information and it is
-		 * important to use that so that downstream gets the accurate
-		 * information. If instead, we have passed commit information here
-		 * then downstream can behave as it has already replayed commit
-		 * prepared after the restart.
-		 */
-		ReorderBufferReplay(txn, rb, xid, txn->final_lsn, txn->end_lsn,
-							txn->commit_time, txn->origin_id, txn->origin_lsn);
-	}
-
 	txn->final_lsn = commit_lsn;
 	txn->end_lsn = end_lsn;
 	txn->commit_time = commit_time;
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index cf705ed9cda..15820883a43 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -821,11 +821,18 @@ typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx
       whenever the start of a prepared transaction has been decoded. The
       <parameter>gid</parameter> field, which is part of the
       <parameter>txn</parameter> parameter can be used in this callback to
-      check if the plugin has already received this prepare in which case it
-      can skip the remaining changes of the transaction. This can only happen
-      if the user restarts the decoding after receiving the prepare for a
-      transaction but before receiving the commit prepared say because of some
-      error.
+      identify the transaction and later match it with invocations
+      of <function>commit_prepared_cb</function>
+      or <function>rollback_prepared_cb</function>.
+     </para>
+
+     <para>
+      Note the start of a logical slot (by LSN) may fall in between
+      a <command>PREPARE</command> and its final <command>COMMIT
+      PREPARED</command> for the same transaction.  Similarly it is possible
+      for a decoder to restart in between.  Therefore, an output plugin needs
+      to be prepared to handle a final commit or rollback for a gid it has not
+      ever seen in its lifetime.
       <programlisting>
        typedef void (*LogicalDecodeBeginPrepareCB) (struct LogicalDecodingContext *ctx,
                                                     ReorderBufferTXN *txn);
diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index f9f6bedd1cf..c51870f8dd1 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#1';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
@@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#3';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
@@ -159,14 +152,10 @@ RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -189,13 +178,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
diff --git a/contrib/test_decoding/expected/twophase_snapshot.out b/contrib/test_decoding/expected/twophase_snapshot.out
index 14d93876462..0b51ead97f7 100644
--- a/contrib/test_decoding/expected/twophase_snapshot.out
+++ b/contrib/test_decoding/expected/twophase_snapshot.out
@@ -32,10 +32,44 @@ step s2cp: COMMIT PREPARED 'test1';
 step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
 data           
 
-BEGIN          
-table public.do_write: INSERT: id[integer]:1
-PREPARE TRANSACTION 'test1'
 COMMIT PREPARED 'test1'
 ?column?       
 
 stop           
+
+starting permutation: s2b s2txid s1init s3b s3txid s2c s2b s2insert s2p s3c s1insert s1start s2rp s1start
+step s2b: BEGIN;
+step s2txid: SELECT pg_current_xact_id() IS NULL;
+?column?       
+
+f              
+step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); <waiting ...>
+step s3b: BEGIN;
+step s3txid: SELECT pg_current_xact_id() IS NULL;
+?column?       
+
+f              
+step s2c: COMMIT;
+step s2b: BEGIN;
+step s2insert: INSERT INTO do_write DEFAULT VALUES;
+step s2p: PREPARE TRANSACTION 'test1';
+step s3c: COMMIT;
+step s1init: <... completed>
+?column?       
+
+init           
+step s1insert: INSERT INTO do_write DEFAULT VALUES;
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+data           
+
+BEGIN          
+table public.do_write: INSERT: id[integer]:2
+COMMIT         
+step s2rp: ROLLBACK PREPARED 'test1';
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+data           
+
+ROLLBACK PREPARED 'test1'
+?column?       
+
+stop           
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd365..d54e640b409 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -60,32 +60,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
diff --git a/contrib/test_decoding/specs/twophase_snapshot.spec b/contrib/test_decoding/specs/twophase_snapshot.spec
index 3e700404e0e..f4df24e7353 100644
--- a/contrib/test_decoding/specs/twophase_snapshot.spec
+++ b/contrib/test_decoding/specs/twophase_snapshot.spec
@@ -28,6 +28,7 @@ step "s2c" { COMMIT; }
 step "s2insert" { INSERT INTO do_write DEFAULT VALUES; }
 step "s2p" { PREPARE TRANSACTION 'test1'; }
 step "s2cp" { COMMIT PREPARED 'test1'; }
+step "s2rp" { ROLLBACK PREPARED 'test1'; }
 
 
 session "s3"
@@ -37,17 +38,56 @@ step "s3b" { BEGIN; }
 step "s3txid" { SELECT pg_current_xact_id() IS NULL; }
 step "s3c" { COMMIT; }
 
-# Force building of a consistent snapshot between a PREPARE and COMMIT PREPARED
-# and ensure that the whole transaction is decoded at the time of COMMIT
-# PREPARED.
+# Force building of a consistent snapshot between a PREPARE and COMMIT
+# PREPARED, where the changes together with the prepare are decoded
+# while the transaction still is in progress at the point in time the
+# consistent snapshot was taken.
 #
 # 's1init' step will initialize the replication slot and cause logical decoding
 # to wait in initial starting point till the in-progress transaction in s2 is
 # committed. 's2c' step will cause logical decoding to go to initial consistent
 # point and wait for in-progress transaction s3 to commit. 's3c' step will cause
-# logical decoding to find a consistent point while the transaction s2 is
-# prepared and not yet committed. This will cause the first s1start to skip
-# prepared transaction s2 as that will be before consistent point. The second
-# s1start will allow decoding of skipped prepare along with commit prepared done
-# as part of s2cp.
+# logical decoding to find a consistent point while the transaction in s2 is
+# prepared but not yet committed.
+#
+# In this case, the prepare of the two-phase transaction is considered
+# to have happened prior to the creation of the logical slot.  A
+# replica or any kind of consumer that is synchronized up to this
+# specific point in time should already know about that prepared
+# transaction (by gid) and be ready to process a final "commit
+# prepared" or "rollback prepared" for it.
+#
+# The following diagram shows the timeline of events.  The '|' (pipe)
+# stands for a transaction in progress that's neither committed nor
+# prepared, while '.' (dot) stands for a prepared, but still
+# uncommitted transaction.
+#
+#              s2b
+#   s1init      |            <---- Start to state SNAPBUILD_BUILDING_SNAPSHOT with
+#     |         |                  one transaction in progress (by s2).
+#     |         |       s3b
+#     |         |        |         The commit in s2 allows s1 to enter
+#     |        s2c       |   <---- SNAPBUILD_FULL_SNAPSHOT with a different
+#     |                  |         transaction still in progress (by s3).
+#     |        s2b       |
+#     |      s2insert    |         The two-phase transaction to test started in s2,
+#     |         |        |   <---- so s2 and s3 now both have a transaction in
+#     |        s2p       |         progress.
+#     |         .        |
+#     |         .       s3c        Immediately after the commit in s3, s1 is allowed
+#     o         .            <---- to proceed to SNAPBUILD_CONSISTENT and terminate
+#               .                  the creation of the logical slot.
+#               .
+#               .                  minimal transaction by s1, the first that starts
+#  s1insert     .            <---- after the slot creation has completed, the
+#               .                  prepared transaction in s2 still in progress
+#  s1start      .
+#              s2cp                since the transaction in s2 started before
+#  s1start                   <---- the consistent point, none of its changes must be
+#                                  decoded.  The final commit prepared must still
+#                                  be delivered.
+
 permutation "s2b" "s2txid" "s1init" "s3b" "s3txid" "s2c" "s2b" "s2insert" "s2p" "s3c" "s1insert" "s1start" "s2cp" "s1start"
+
+# Equivalent test case with ROLLBACK PREPARED instead.
+permutation "s2b" "s2txid" "s1init" "s3b" "s3txid" "s2c" "s2b" "s2insert" "s2p" "s3c" "s1insert" "s1start" "s2rp" "s1start"
-- 
2.30.0

#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Markus Wanner (#27)
Re: repeated decoding of prepared transactions

On Fri, Feb 19, 2021 at 8:23 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

With that line of thinking, the point in time (or in WAL) of the COMMIT
PREPARED does not matter at all to reason about the decoding of the
PREPARE operation. Instead, there are only exactly two cases to consider:

a) the PREPARE happened before the start_decoding_at LSN and must not be
decoded. (But the effects of the PREPARE must then be included in the
initial synchronization. If that's not supported, the output plugin
should not enable two-phase commit.)

I see a problem with this assumption. During the initial
synchronization, this transaction won't be visible to snapshot and we
won't copy it. Then later if we won't decode and send it then the
replica will be out of sync. Such a problem won't happen with Ajin's
patch.

--
With Regards,
Amit Kapila.

#29Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#26)
Re: repeated decoding of prepared transactions

On Fri, Feb 19, 2021 at 8:21 AM Ajin Cherian <itsajin@gmail.com> wrote:

Based on this suggestion, I have created a patch on HEAD which now
does not allow repeated decoding
of prepared transactions. For this, the code now enforces
full_snapshot if two-phase decoding is enabled.
Do have a look at the patch and see if you have any comments.

Few minor comments:
===================
1.
.git/rebase-apply/patch:135: trailing whitespace.
* We need to mark the transaction as prepared, so that we
don't resend it on
warning: 1 line adds whitespace errors.

Whitespace issue.

2.
/*
+ * Set snapshot type
+ */
+void
+SetSnapBuildType(SnapBuild *builder, bool need_full_snapshot)

There is no caller which passes the second parameter as false, so why
have it? Can't we have a function with SetSnapBuildFullSnapshot or
something like that?

3.
@@ -431,6 +431,10 @@ CreateInitDecodingContext(const char *plugin,
startup_cb_wrapper(ctx, &ctx->options, true);
MemoryContextSwitchTo(old_context);

+ /* If two-phase is on, then only full snapshot can be used */
+ if (ctx->twophase)
+ SetSnapBuildType(ctx->snapshot_builder, true);
+
  ctx->reorder->output_rewrites = ctx->options.receive_rewrites;

return ctx;
@@ -534,6 +538,10 @@ CreateDecodingContext(XLogRecPtr start_lsn,

ctx->reorder->output_rewrites = ctx->options.receive_rewrites;

+ /* If two-phase is on, then only full snapshot can be used */
+ if (ctx->twophase)
+ SetSnapBuildType(ctx->snapshot_builder, true);

I think it is better to add a detailed comment on why we are doing
this? You can write the comment in one of the places.

Currently one problem with this, as you have also mentioned in your
last mail, is that if initially two-phase is disabled in
test_decoding while
decoding prepare (causing the prepared transaction to not be decoded)
and later enabled after the commit prepared (where it assumes that the
transaction was decoded at prepare time), then the transaction is not
decoded at all. For example:

postgres=# begin;
BEGIN
postgres=*# INSERT INTO do_write DEFAULT VALUES;
INSERT 0 1
postgres=*# PREPARE TRANSACTION 'test1';
PREPARE TRANSACTION
postgres=# SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit',
'0');
data
------
(0 rows)
postgres=# commit prepared 'test1';
COMMIT PREPARED
postgres=# SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit',
'1');
data
-------------------------
COMMIT PREPARED 'test1'
(1 row)

1st pg_logical_slot_get_changes is called with two-phase-commit off,
2nd is called with two-phase-commit on. You can see that the
transaction is not decoded at all.
For this, I am planning to change the semantics such that
two-phase-commit can only be specified while creating the slot using
pg_create_logical_replication_slot()
and not in pg_logical_slot_get_changes, thus preventing
two-phase-commit flag from being toggled between restarts of the
decoder.

+1.
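
The failure mode quoted above can be reproduced in a toy Python model. It
only mimics the behavior described in this thread and is not the real
decoder: get_changes is a hypothetical stand-in for one
pg_logical_slot_get_changes call, and the WAL is a list of (kind, gid)
tuples.

```python
def get_changes(wal, two_phase):
    """Toy stand-in for one pg_logical_slot_get_changes call.

    wal: list of (kind, gid) records consumed by this call.
    two_phase: the per-call 'two-phase-commit' option.
    """
    out = []
    for kind, gid in wal:
        if kind == "PREPARE":
            if two_phase:
                out += ["BEGIN", f"changes of {gid}",
                        f"PREPARE TRANSACTION '{gid}'"]
            # With two_phase off, nothing is sent at prepare time; the
            # transaction is expected to be sent as a whole at commit.
        elif kind == "COMMIT PREPARED":
            if two_phase:
                # Assumes the changes were already sent at prepare time.
                out.append(f"COMMIT PREPARED '{gid}'")
            else:
                out += ["BEGIN", f"changes of {gid}", "COMMIT"]
    return out


# Flag toggled between calls: the changes are never delivered.
first = get_changes([("PREPARE", "test1")], two_phase=False)
second = get_changes([("COMMIT PREPARED", "test1")], two_phase=True)
```

Here first is empty and second contains only the COMMIT PREPARED line, so
the changes of test1 are lost. Fixing the option at slot creation rules out
this combination.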

--
With Regards,
Amit Kapila.

#30Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#29)
Re: repeated decoding of prepared transactions

On Sat, Feb 20, 2021 at 9:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Feb 19, 2021 at 8:21 AM Ajin Cherian <itsajin@gmail.com> wrote:

Based on this suggestion, I have created a patch on HEAD which now
does not allow repeated decoding
of prepared transactions. For this, the code now enforces
full_snapshot if two-phase decoding is enabled.
Do have a look at the patch and see if you have any comments.

Few minor comments:
===================
1.
.git/rebase-apply/patch:135: trailing whitespace.
* We need to mark the transaction as prepared, so that we
don't resend it on
warning: 1 line adds whitespace errors.

Whitespace issue.

2.
/*
+ * Set snapshot type
+ */
+void
+SetSnapBuildType(SnapBuild *builder, bool need_full_snapshot)

There is no caller which passes the second parameter as false, so why
have it? Can't we have a function with SetSnapBuildFullSnapshot or
something like that?

3.
@@ -431,6 +431,10 @@ CreateInitDecodingContext(const char *plugin,
startup_cb_wrapper(ctx, &ctx->options, true);
MemoryContextSwitchTo(old_context);

+ /* If two-phase is on, then only full snapshot can be used */
+ if (ctx->twophase)
+ SetSnapBuildType(ctx->snapshot_builder, true);
+
ctx->reorder->output_rewrites = ctx->options.receive_rewrites;

return ctx;
@@ -534,6 +538,10 @@ CreateDecodingContext(XLogRecPtr start_lsn,

ctx->reorder->output_rewrites = ctx->options.receive_rewrites;

+ /* If two-phase is on, then only full snapshot can be used */
+ if (ctx->twophase)
+ SetSnapBuildType(ctx->snapshot_builder, true);

I think it is better to add a detailed comment on why we are doing
this? You can write the comment in one of the places.

Few more comments:
==================
1. I think you need to update the examples in the docs as well [1].
2. Also the text in the description of begin_prepare_cb [2] needs some
adjustment. We can say something on lines that if users want they can
check if the same GID exists and then they can either error out or
take appropriate action based on their need.

[1]: https://www.postgresql.org/docs/devel/logicaldecoding-example.html
[2]: https://www.postgresql.org/docs/devel/logicaldecoding-output-plugin.html

--
With Regards,
Amit Kapila.

#31Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Amit Kapila (#28)
Re: repeated decoding of prepared transactions

On 20.02.21 04:38, Amit Kapila wrote:

I see a problem with this assumption. During the initial
synchronization, this transaction won't be visible to snapshot and we
won't copy it. Then later if we won't decode and send it then the
replica will be out of sync. Such a problem won't happen with Ajin's
patch.

You are assuming that the initial snapshot is a) logical and b) dumb.

A physical snapshot very well "sees" prepared transactions and will
restore them to their prepared state. But even in the logical case, I
think it's beneficial to keep the decoder simpler and instead require
some support for two-phase commit in the initial synchronization logic.
For example using the following approach (you will recognize
similarities to what snapbuild does):

1.) create the slot
2.) start to retrieve changes and queue them
3.) wait for the prepared transactions that were pending at the
point in time of step 1 to complete
4.) take a snapshot (by visibility, w/o requiring to "see" prepared
transactions)
5.) apply the snapshot
6.) replay the queue, filtering commits already visible in the
snapshot
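
Steps 5 and 6 can be sketched in Python as follows. This is a simplified
model under stated assumptions, not an existing tool: QueuedCommit and
replay_queue are hypothetical names, rows are plain values, and the
snapshot is assumed to report which transaction ids it already contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedCommit:
    xid: int     # hypothetical transaction id
    rows: tuple  # rows written by that transaction


def replay_queue(snapshot_rows, visible_xids, queue):
    """Apply the snapshot, then replay queued commits it does not contain."""
    replica = list(snapshot_rows)  # step 5: apply the snapshot
    for commit in queue:           # step 6: replay in commit order
        if commit.xid in visible_xids:
            continue               # already visible in the snapshot
        replica.extend(commit.rows)
    return replica
```

With a queue of three commits of which the first two are already visible
in the snapshot, only the third one is applied on top of the snapshot.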

Just as with the solution proposed by Ajin and you, this has the danger
of showing transactions as committed without the effects of the PREPAREs
being "visible" (after step 5 but before 6).

However, this approach of solving the problem outside of the walsender
has two advantages:

* The delay in step 3 can be made visible and dealt with. As there's
no upper boundary to that delay, it makes sense to e.g. inform the
user after 10 minutes and provide a list of two-phase transactions
still in progress.

* Second, it becomes possible to avoid inconsistencies during the
reconciliation window in between steps 5 and 6 by disallowing
concurrent (user) transactions to run until after completion of
step 6.
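
As an illustration of the first point, a synchronization tool could report
long-pending prepared transactions along these lines. The function name and
its input are hypothetical; on a real server the gid and prepare time could
be read from the pg_prepared_xacts view.

```python
from datetime import datetime, timedelta


def overdue_prepared(pending, now, limit=timedelta(minutes=10)):
    """Return the gids of prepared transactions pending longer than `limit`.

    pending maps gid -> time at which the transaction was prepared.
    """
    return sorted(gid for gid, prepared_at in pending.items()
                  if now - prepared_at > limit)
```

The returned list is what would be shown to the user while step 3 is still
waiting.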

Whereas the current implementation hides this in the walsender without
any way to determine how much a PREPARE had been delayed or when
consistency has been reached. (Of course, short of using the very same
initial snapshotting approach outlined above. For which the reordering
logic in the walsender does more harm than good.)

Essentially, I think I'm saying that while I agree that some kind of
snapshot synchronization logic is needed, it should live in a different
place.

Regards

Markus

#32Amit Kapila
amit.kapila16@gmail.com
In reply to: Markus Wanner (#31)
Re: repeated decoding of prepared transactions

On Sat, Feb 20, 2021 at 4:25 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

On 20.02.21 04:38, Amit Kapila wrote:

I see a problem with this assumption. During the initial
synchronization, this transaction won't be visible to snapshot and we
won't copy it. Then later if we won't decode and send it then the
replica will be out of sync. Such a problem won't happen with Ajin's
patch.

You are assuming that the initial snapshot is a) logical and b) dumb.

A physical snapshot very well "sees" prepared transactions and will
restore them to their prepared state. But even in the logical case, I
think it's beneficial to keep the decoder simpler

I think after the patch Ajin proposed decoders won't need any special
checks after receiving the prepared xacts. What additional simplicity
will this approach bring? I rather see that we might need to change
the existing initial sync (copy) with additional restrictions to
support two-pc for subscribers.

and instead require
some support for two-phase commit in the initial synchronization logic.
For example using the following approach (you will recognize
similarities to what snapbuild does):

1.) create the slot
2.) start to retrieve changes and queue them
3.) wait for the prepared transactions that were pending at the
point in time of step 1 to complete
4.) take a snapshot (by visibility, w/o requiring to "see" prepared
transactions)
5.) apply the snapshot

Do you mean to say that after creating the slot we take an additional
pass over WAL (till the LSN where we found a consistent snapshot) to
collect all prepared transactions and wait for them to get
committed/rolled back?

6.) replay the queue, filtering commits already visible in the
snapshot

Just as with the solution proposed by Ajin and you, this has the danger
of showing transactions as committed without the effects of the PREPAREs
being "visible" (after step 5 but before 6).

I think the scheme proposed by you is still not fully clear to me but
can you please explain how in the existing proposed patch there is a
danger of showing transactions as committed without the effects of the
PREPAREs being "visible"?

However, this approach of solving the problem outside of the walsender
has two advantages:

* The delay in step 3 can be made visible and dealt with. As there's
no upper boundary to that delay, it makes sense to e.g. inform the
user after 10 minutes and provide a list of two-phase transactions
still in progress.

* Second, it becomes possible to avoid inconsistencies during the
reconciliation window in between steps 5 and 6 by disallowing
concurrent (user) transactions to run until after completion of
step 6.

This second point sounds like a restriction that users might not like.

Whereas the current implementation hides this in the walsender without
any way to determine how much a PREPARE had been delayed or when
consistency has been reached. (Of course, short of using the very same
initial snapshotting approach outlined above. For which the reordering
logic in the walsender does more harm than good.)

Essentially, I think I'm saying that while I agree that some kind of
snapshot synchronization logic is needed, it should live in a different
place.

But we need something in the existing logic, in the walsender or
elsewhere, to support 2PC for subscriptions, and from your description
above it is not clear to me how we can achieve that.

--
With Regards,
Amit Kapila.

#33Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#28)
Re: repeated decoding of prepared transactions

Hi,

On Fri, Feb 19, 2021, at 19:38, Amit Kapila wrote:

On Fri, Feb 19, 2021 at 8:23 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

With that line of thinking, the point in time (or in WAL) of the COMMIT
PREPARED does not matter at all to reason about the decoding of the
PREPARE operation. Instead, there are only exactly two cases to consider:

a) the PREPARE happened before the start_decoding_at LSN and must not be
decoded. (But the effects of the PREPARE must then be included in the
initial synchronization. If that's not supported, the output plugin
should not enable two-phase commit.)

I see a problem with this assumption. During the initial
synchronization, this transaction won't be visible to snapshot and we
won't copy it. Then later if we won't decode and send it then the
replica will be out of sync. Such a problem won't happen with Ajin's
patch.

Why isn't the more obvious answer to this to not allow/disable 2pc decoding during the initial sync? You can't really make sense of it before you're synced anyway...

Regards,

Andres

#34Andres Freund
andres@anarazel.de
In reply to: Ajin Cherian (#26)
Re: repeated decoding of prepared transactions

Hi,

On 2021-02-19 13:50:52 +1100, Ajin Cherian wrote:

From 129947ab2d0ba223862ed1c87be0f96b51645ba0 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Thu, 18 Feb 2021 20:18:16 -0500
Subject: [PATCH] Don't allow repeated decoding of prepared transactions.

Enforce full snapshot while decoding with two-phase enabled. This
allows the decoder to differentiate between prepared transaction that
were sent prior to restart and prepared transactions that were not sent
because they were prior to consistent snapshot.

Isn't this an *extremely* expensive solution? Maintaining a full
snapshot is pretty darn expensive - so expensive that it's repeatedly
been a problem even just for building the initial snapshot (to the point
of being unable to do so). And that's typically a comparatively rare
operation, not something continual - but what you're proposing is a cost
paid during ongoing replication.

Greetings,

Andres Freund

#35Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#33)
Re: repeated decoding of prepared transactions

On Sat, Feb 20, 2021 at 10:26 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On Fri, Feb 19, 2021, at 19:38, Amit Kapila wrote:

On Fri, Feb 19, 2021 at 8:23 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:

With that line of thinking, the point in time (or in WAL) of the COMMIT
PREPARED does not matter at all to reason about the decoding of the
PREPARE operation. Instead, there are only exactly two cases to consider:

a) the PREPARE happened before the start_decoding_at LSN and must not be
decoded. (But the effects of the PREPARE must then be included in the
initial synchronization. If that's not supported, the output plugin
should not enable two-phase commit.)

I see a problem with this assumption. During the initial
synchronization, this transaction won't be visible to snapshot and we
won't copy it. Then later if we won't decode and send it then the
replica will be out of sync. Such a problem won't happen with Ajin's
patch.

Why isn't the more obvious answer to this to not allow/disable 2pc decoding during the initial sync?

Here, I am assuming you are asking to disable 2PC both via
apply-worker and tablesync worker till the initial sync (aka all
tables are in SUBREL_STATE_READY state) phase is complete. If we do
that and what if commit prepared happened after the initial sync phase
but prepare happened before that? At Commit prepared because the 2PC
is enabled, we will just send Commit Prepared without the actual data
and prepare. Now, to solve that say we remember in TXN that at prepare
time 2PC was not enabled so at commit prepared time consider that 2PC
is disabled for that TXN and send the entire transaction along with
commit as we do for non-2PC TXNs. But it is possible that a restart
might happen before the commit prepared and then it is possible that
prepare falls before start_decoding_at point so we will still skip
sending it even though 2PC is enabled after the restart and just send
the commit prepared. So, again that can lead to replica going out of
sync.

The other thing related to this is to ensure that via SQL APIs
we don't skip any prepared xacts and just return commit prepared.
Basically, the example case I described in my email above [1].

One of the ideas I previously speculated about to overcome these
challenges is to somehow persist the information of Prepares that are
decoded. Say, after sending prepare, we update the slot information on
disk to indicate that the particular GID is sent. Then next time
whenever we have to skip prepare due to whatever reason, we can check
the existence of persistent information on disk for that GID, if it
exists then we need to send just Commit Prepared, otherwise, the
entire transaction. We can remove this information during or after
CheckPointSnapBuild, basically, we can remove the information of all
GID's that are after cutoff LSN computed via
ReplicationSlotsComputeLogicalRestartLSN. But that seems to be costly
so we didn't pursue it.
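
A minimal sketch of the GID-tracking idea described above, assuming a small in-memory set stands in for the on-disk slot information. All names here (SentGids, sent_gids_record, action_for_skipped_prepare, the size limits) are hypothetical and not part of PostgreSQL; cleanup of old GIDs is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SENT_GIDS 16   /* hypothetical capacity for the sketch */
#define GID_LEN 200        /* hypothetical upper bound on GID length */

/* Stand-in for per-slot information about PREPAREs already sent. */
typedef struct SentGids
{
    char gids[MAX_SENT_GIDS][GID_LEN];
    int  count;
} SentGids;

typedef enum
{
    GID_SEND_COMMIT_PREPARED_ONLY,
    GID_REPLAY_WHOLE_TXN
} GidAction;

/* Has a PREPARE with this GID been sent before? */
bool
sent_gids_contains(const SentGids *s, const char *gid)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->gids[i], gid) == 0)
            return true;
    return false;
}

/* Remember that the PREPARE for this GID was sent; false if it does not fit. */
bool
sent_gids_record(SentGids *s, const char *gid)
{
    assert(gid != NULL);
    if (sent_gids_contains(s, gid))
        return true;
    if (s->count >= MAX_SENT_GIDS || strlen(gid) >= GID_LEN)
        return false;
    strcpy(s->gids[s->count++], gid);
    return true;
}

/* When a PREPARE had to be skipped, decide what to send at COMMIT PREPARED. */
GidAction
action_for_skipped_prepare(const SentGids *s, const char *gid)
{
    return sent_gids_contains(s, gid) ? GID_SEND_COMMIT_PREPARED_ONLY
                                      : GID_REPLAY_WHOLE_TXN;
}
```

Every PREPARE sent would need such a record to be made durable, which is presumably where the cost mentioned above comes from.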

[1]: /messages/by-id/CAA4eK1L5aX1BL9Xg-wSULbFeB417G0v9qk5qZ6NbYCkCo6JUGQ@mail.gmail.com

--
With Regards,
Amit Kapila.

#36Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#35)
Re: repeated decoding of prepared transactions

Hi,

On 2021-02-21 11:32:29 +0530, Amit Kapila wrote:

Here, I am assuming you are asking to disable 2PC both via
apply-worker and tablesync worker till the initial sync (aka all
tables are in SUBREL_STATE_READY state) phase is complete. If we do
that and what if commit prepared happened after the initial sync phase
but prepare happened before that?

Isn't that pretty easy to detect? You compare the LSN of the tx prepare
with the LSN of achieving consistency? Any restart will recover the
LSNs, because restart_lsn will be before the start of the tx.

At Commit prepared because the 2PC is enabled, we will just send
Commit Prepared without the actual data and prepare. Now, to solve
that say we remember in TXN that at prepare time 2PC was not enabled
so at commit prepared time consider that 2PC is disabled for that TXN
and send the entire transaction along with commit as we do for non-2PC
TXNs. But it is possible that a restart might happen before the commit
prepared and then it is possible that prepare falls before
start_decoding_at point so we will still skip sending it even though
2PC is enabled after the restart and just send the commit
prepared. So, again that can lead to replica going out of sync.

I don't think that an LSN based approach is susceptible to this. And it
wouldn't require more memory etc than we'd now.

Greetings,

Andres Freund

#37Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#36)
Re: repeated decoding of prepared transactions

On Mon, Feb 22, 2021 at 3:56 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-02-21 11:32:29 +0530, Amit Kapila wrote:

Here, I am assuming you are asking to disable 2PC both via
apply-worker and tablesync worker till the initial sync (aka all
tables are in SUBREL_STATE_READY state) phase is complete. If we do
that and what if commit prepared happened after the initial sync phase
but prepare happened before that?

Isn't that pretty easy to detect? You compare the LSN of the tx prepare
with the LSN of achieving consistency?

I think by LSN of achieving consistency, you mean start_decoding_at
LSN. It is possible that start_decoding_at point has been moved ahead
because of some other unrelated commit that happens between prepare
and commit prepared. Something like below:

LSN for Prepare of xact t1 at 500
LSN for Commit of xact t2 at 520
LSN for Commit Prepared at 550

Say we skipped prepare because 2PC was not enabled but then decoded
and sent Commit of xact t2. I think after this start_decoding_at LSN
will be at 520. So comparing the prepare LSN of xact t1 with
start_decoding_at will lead to skipping the prepare after the restart
and we will just send the commit prepared without prepare and data
when we process LSN of Commit Prepared at 550.
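
The timeline above can be replayed in a small standalone C sketch. It is only an illustration, not PostgreSQL code: Lsn, advance_start_point, prepare_skipped_by_start_point and timeline_loses_prepare are hypothetical names, and the model assumes start_decoding_at simply moves up to the latest commit that was sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* stand-in for a WAL position */

/* Assumption: the start point only ever moves forward, to the sent commit. */
Lsn
advance_start_point(Lsn start_decoding_at, Lsn sent_commit_lsn)
{
    return sent_commit_lsn > start_decoding_at ? sent_commit_lsn
                                               : start_decoding_at;
}

/* The check under discussion: a PREPARE before the start point is skipped. */
bool
prepare_skipped_by_start_point(Lsn prepare_lsn, Lsn start_decoding_at)
{
    return prepare_lsn < start_decoding_at;
}

/* Replays the example; returns true if the PREPARE of t1 would be lost. */
bool
timeline_loses_prepare(void)
{
    const Lsn prepare_t1 = 500;
    const Lsn commit_t2 = 520;
    const Lsn commit_prepared_t1 = 550;
    Lsn       start_decoding_at = 0;

    /* PREPARE of t1 was not sent (2PC disabled), but COMMIT of t2 was. */
    start_decoding_at = advance_start_point(start_decoding_at, commit_t2);

    /* The COMMIT PREPARED itself is still ahead and will be decoded. */
    assert(start_decoding_at < commit_prepared_t1);

    /* After a restart, the PREPARE at 500 lies before the start point. */
    return prepare_skipped_by_start_point(prepare_t1, start_decoding_at);
}
```

In this model the moving start point cannot tell "already sent" apart from "never sent", which is the problem the example is meant to show.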

--
With Regards,
Amit Kapila.

#38Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#37)
Re: repeated decoding of prepared transactions

Hi,

On 2021-02-22 08:22:35 +0530, Amit Kapila wrote:

On Mon, Feb 22, 2021 at 3:56 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-02-21 11:32:29 +0530, Amit Kapila wrote:

Here, I am assuming you are asking to disable 2PC both via
apply-worker and tablesync worker till the initial sync (aka all
tables are in SUBREL_STATE_READY state) phase is complete. If we do
that and what if commit prepared happened after the initial sync phase
but prepare happened before that?

Isn't that pretty easy to detect? You compare the LSN of the tx prepare
with the LSN of achieving consistency?

I think by LSN of achieving consistency, you mean start_decoding_at
LSN.

Kinda, but not in the way you suggest. I mean the LSN at which the slot
reached SNAPBUILD_CONSISTENT. Which also is the point in the WAL stream
we exported the initial snapshot for.

My understanding of why you need to have special handling of 2pc PREPARE
is that the initial snapshot will not contain the contents of the
prepared transaction, therefore you need to send it out at some point
(or be incorrect).

Your solution to this is:
/*
* It is possible that this transaction is not decoded at prepare time
* either because by that time we didn't have a consistent snapshot or it
* was decoded earlier but we have restarted. We can't distinguish between
* those two cases so we send the prepare in both the cases and let
* downstream decide whether to process or skip it. We don't need to
* decode the xact for aborts if it is not done already.
*/
if (!rbtxn_prepared(txn) && is_commit)

but IMO this violates a pretty fundamental tenet of how logical
decoding is supposed to work, i.e. that data that the client
acknowledges as having received (via lsn passed to START_REPLICATION)
shouldn't be sent out again.

What I am proposing is to instead track the point at which the slot
gained consistency - a simple LSN. That way you can change the above
logic to instead be

if (txn->final_lsn > snapshot_was_exported_at_lsn)
ReorderBufferReplay();
else
...

That will easily work across restarts, won't lead to sending data twice,
etc.

Greetings,

Andres Freund

#39Andres Freund
andres@anarazel.de
In reply to: Markus Wanner (#27)
Re: repeated decoding of prepared transactions

Hi,

On 2021-02-19 15:53:32 +0100, Markus Wanner wrote:

However, more generally speaking, I suspect you are overthinking this. All
of the complexity arises because of the assumption that an output plugin
receiving and confirming a PREPARE may not be able to persist that first
phase of transaction application. Instead, you are trying to somehow
resurrect the transactional changes and the prepare at COMMIT PREPARED time
and decode it in a deferred way.

The output plugin should never persist anything. That's the job of the
client, not the output plugin. The output plugin simply doesn't have the
information to know whether the client received data and successfully
applied data or not.

Given the output plugin supports two-phase commit, I argue there must be a
good reason for it setting the start_decoding_at LSN to a point in time
after a PREPARE. To me that means the output plugin (or its downstream
replica) has processed the PREPARE (and the downstream replica did whatever
it needed to do on its side in order to make the transaction ready to be
committed in a second phase).

The output plugin doesn't set / influence start_decoding_at (unless you
want to count just ERRORing out).

With that line of thinking, the point in time (or in WAL) of the COMMIT
PREPARED does not matter at all to reason about the decoding of the PREPARE
operation. Instead, there are only exactly two cases to consider:

a) the PREPARE happened before the start_decoding_at LSN and must not be
decoded. (But the effects of the PREPARE must then be included in the
initial synchronization. If that's not supported, the output plugin should
not enable two-phase commit.)

I don't think that can be made to work without disproportionate
complexity. Especially not in cases where we start to be CONSISTENT
based on pre-existing on-disk snapshots.

Greetings,

Andres Freund

#40Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Amit Kapila (#32)
Re: repeated decoding of prepared transactions

On 20.02.21 13:15, Amit Kapila wrote:

I think after the patch Ajin proposed decoders won't need any special
checks after receiving the prepared xacts. What additional simplicity
this approach will bring?

The API becomes clearer in that all PREPAREs are always decoded in WAL
stream order and are not ever deferred (possibly until after the commits
of many other transactions). No output plugin will need to check
against this peculiarity, but can rely on WAL ordering of events.

(And if an output plugin does not want prepares to be individual events,
it should simply not enable two-phase support. That seems like
something the output plugin could even do on a per-transaction basis.)

Do you mean to say that after creating the slot we take an additional
pass over WAL (till the LSN where we found a consistent snapshot) to
collect all prepared transactions and wait for them to get
committed/rollbacked?

No. A single pass is enough, the decoder won't need any further change
beyond the code removal in my patch.

I'm proposing for the synchronization logic (in e.g. pgoutput) to defer
the snapshot taking. So that there's some time in between creating the
logical slot (at step 1.) and taking a snapshot (at step 4.). Another
CATCHUP phase, if you want.

So that all two-phase commit transactions are delivered via either:

* the transferred snapshot (because their COMMIT PREPARED took place
before the snapshot was taken in (4)), or

* the decoder stream (because their PREPARE took place after the slot
was fully created and snapbuilder reached a consistent snapshot)

No transaction can have PREPAREd before (1) but not committed until
after (4), because we waited for all prepared transactions to commit in
step (3).

I think the scheme proposed by you is still not fully clear to me but
can you please explain how in the existing proposed patch there is a
danger of showing transactions as committed without the effects of the
PREPAREs being "visible"?

Please see the `twophase_snapshot` isolation test. The expected output
there shows the insert from s1 being committed prior to the prepare of
the transaction in s2.

On a replica applying the stream in that order, a transaction in between
these two events would see the results from s1 while still being allowed
to lock the row that s2 is about to update. Something I'd expect the
PREPARE to prevent.

That is (IMO) wrong in `master` and Ajin's patch doesn't correct it.
(While my patch does, so don't look at my patch for this example.)

* Second, it becomes possible to avoid inconsistencies during the
reconciliation window in between steps 5 and 6 by disallowing
concurrent (user) transactions to run until after completion of
step 6.

This second point sounds like a restriction that users might not like.

"It becomes possible" cannot be a restriction. If a user (or
replication solution) wants to allow for these inconsistencies, it still
can. I want to make sure that solutions which *want* to prevent
inconsistencies can be implemented.

Your concern applies to step (3), though. The current approach is
clearly quicker to restore the backup and start to apply transactions.
Until you start to think about reordering the "early" commits until
after the deferred PREPAREs in the output plugin or on the replica side,
so as to lock rows by prepared transactions before making other commits
visible so as to prevent inconsistencies...

But we need something in existing logic in WALSender or somewhere to
allow supporting 2PC for subscriptions and from your above
description, it is not clear to me how we can achieve that?

I agree that some more code is required somewhere, outside of the walsender.

Regards

Markus

#41Markus Wanner
markus.wanner@enterprisedb.com
In reply to: Andres Freund (#39)
Re: repeated decoding of prepared transactions

On 22.02.21 05:22, Andres Freund wrote:

Hi,

On 2021-02-19 15:53:32 +0100, Markus Wanner wrote:

However, more generally speaking, I suspect you are overthinking this. All
of the complexity arises because of the assumption that an output plugin
receiving and confirming a PREPARE may not be able to persist that first
phase of transaction application. Instead, you are trying to somehow
resurrect the transactional changes and the prepare at COMMIT PREPARED time
and decode it in a deferred way.

The output plugin should never persist anything.

Sure, sorry, I was sloppy in formulation. I meant the replica or client
that receives the data from the output plugin. Given it asked for
two-phase commits in the output plugin, it clearly is interested in the
PREPARE.

That's the job of the
client, not the output plugin. The output plugin simply doesn't have the
information to know whether the client received data and successfully
applied data or not.

Exactly. Therefore, it should not randomly reshuffle or reorder
PREPAREs until after other COMMITs.

The output plugin doesn't set / influence start_decoding_at (unless you
want to count just ERRORing out).

Yeah, same sloppiness, sorry.

With that line of thinking, the point in time (or in WAL) of the COMMIT
PREPARED does not matter at all to reason about the decoding of the PREPARE
operation. Instead, there are only exactly two cases to consider:

a) the PREPARE happened before the start_decoding_at LSN and must not be
decoded. (But the effects of the PREPARE must then be included in the
initial synchronization. If that's not supported, the output plugin should
not enable two-phase commit.)

I don't think that can be made to work without disproportionate
complexity. Especially not in cases where we start to be CONSISTENT
based on pre-existing on-disk snapshots.

Well, the PREPARE to happen before the start_decoding_at LSN is a case
the output plugin needs to deal with. I pointed out why the current way
of dealing with it clearly is wrong.

What issues do you see with the approach I proposed?

Regards

Markus

#42Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#38)
Re: repeated decoding of prepared transactions

On Mon, Feb 22, 2021 at 9:39 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-02-22 08:22:35 +0530, Amit Kapila wrote:

On Mon, Feb 22, 2021 at 3:56 AM Andres Freund <andres@anarazel.de> wrote:

On 2021-02-21 11:32:29 +0530, Amit Kapila wrote:

Here, I am assuming you are asking to disable 2PC both via
apply-worker and tablesync worker till the initial sync (aka all
tables are in SUBREL_STATE_READY state) phase is complete. If we do
that and what if commit prepared happened after the initial sync phase
but prepare happened before that?

Isn't that pretty easy to detect? You compare the LSN of the tx prepare
with the LSN of achieving consistency?

I think by LSN of achieving consistency, you mean start_decoding_at
LSN.

Kinda, but not in the way you suggest. I mean the LSN at which the slot
reached SNAPBUILD_CONSISTENT. Which also is the point in the WAL stream
we exported the initial snapshot for.

Okay, that's an interesting idea. I have a few questions on this, see below.

My understanding of why you need to have special handling of 2pc PREPARE
is that the initial snapshot will not contain the contents of the
prepared transaction, therefore you need to send it out at some point
(or be incorrect).

Your solution to this is:
/*
* It is possible that this transaction is not decoded at prepare time
* either because by that time we didn't have a consistent snapshot or it
* was decoded earlier but we have restarted. We can't distinguish between
* those two cases so we send the prepare in both the cases and let
* downstream decide whether to process or skip it. We don't need to
* decode the xact for aborts if it is not done already.
*/
if (!rbtxn_prepared(txn) && is_commit)

but IMO this violates a pretty fundamental tenet of how logical
decoding is supposed to work, i.e. that data that the client
acknowledges as having received (via lsn passed to START_REPLICATION)
shouldn't be sent out again.

I agree that this is not acceptable, which is why we are trying to explore
other solutions, including what you have proposed.

What I am proposing is to instead track the point at which the slot
gained consistency - a simple LSN. That way you can change the above
logic to instead be

if (txn->final_lsn > snapshot_was_exported_at_lsn)
ReorderBufferReplay();
else
...

With this, if the prepare is prior to consistent_snapshot
(snapshot_was_exported_at_lsn) and commit prepared is after, then we
won't send the prepare and data. Won't we need to send such prepares?
If the condition is the other way around (if (txn->final_lsn <
snapshot_was_exported_at_lsn)), then we would send such prepares?

Just to clarify, after the initial copy, say when we start/restart the
streaming and we picked the serialized snapshot of some other
WALSender, we don't need to use snapshot_was_exported_at_lsn
corresponding to the serialized snapshot of some other slot?

I am not sure whether, for this problem, enabling 2PC during
initial sync (initial snapshot + copy) matters. Because, if we follow
the above, then it should be fine even if 2PC is enabled?

That will easily work across restarts, won't lead to sending data twice,
etc.

Yeah, we need to probably store this new point as slot's persistent information.

--
With Regards,
Amit Kapila.

#43Andres Freund
andres@anarazel.de
In reply to: Markus Wanner (#41)
Re: repeated decoding of prepared transactions

Hi,

On 2021-02-22 09:25:52 +0100, Markus Wanner wrote:

What issues do you see with the approach I proposed?

Very significant increase in complexity for initializing a logical
replica, because there's no easy way to just use the initial slot.

- Andres

#44Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#42)
Re: repeated decoding of prepared transactions

Hi,

On 2021-02-22 14:29:09 +0530, Amit Kapila wrote:

On Mon, Feb 22, 2021 at 9:39 AM Andres Freund <andres@anarazel.de> wrote:

What I am proposing is to instead track the point at which the slot
gained consistency - a simple LSN. That way you can change the above
logic to instead be

if (txn->final_lsn > snapshot_was_exported_at_lsn)
ReorderBufferReplay();
else
...

With this, if the prepare is prior to consistent_snapshot
(snapshot_was_exported_at_lsn) and commit prepared is after, then we
won't send the prepare and data. Won't we need to send such prepares?
If the condition is the other way around (if (txn->final_lsn <
snapshot_was_exported_at_lsn)), then we would send such prepares?

Yea, I inverted the condition...

Just to clarify, after the initial copy, say when we start/restart the
streaming and we picked the serialized snapshot of some other
WALSender, we don't need to use snapshot_was_exported_at_lsn
corresponding to the serialized snapshot of some other slot?

Correct.

Yeah, we need to probably store this new point as slot's persistent information.

Should be fine I think...

Greetings,

Andres Freund

#45Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#43)
Re: repeated decoding of prepared transactions

On Mon, Feb 22, 2021 at 2:55 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-02-22 09:25:52 +0100, Markus Wanner wrote:

What issues do you see with the approach I proposed?

Very significant increase in complexity for initializing a logical
replica, because there's no easy way to just use the initial slot.

+1. The solution proposed by Andres seems to be better than other
ideas we have discussed so far.

--
With Regards,
Amit Kapila.

#46Ajin Cherian
itsajin@gmail.com
In reply to: Andres Freund (#44)
Re: repeated decoding of prepared transactions

On Mon, Feb 22, 2021 at 8:27 PM Andres Freund <andres@anarazel.de> wrote:

Yeah, we need to probably store this new point as slot's persistent information.

Should be fine I think...

This idea looks convincing. I'll write up a patch incorporating these changes.

regards,
Ajin Cherian
Fujitsu Australia

#47Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#44)
Re: repeated decoding of prepared transactions

On Mon, Feb 22, 2021 at 2:57 PM Andres Freund <andres@anarazel.de> wrote:

Yeah, we need to probably store this new point as slot's persistent information.

Should be fine I think...

So, we are in agreement that the above solution will work and we won't
need to resend the prepare after the restart. I would like to once
again describe a few other points which we are discussing in this and
other thread [1] to see if you or others have any different opinion on
those:

1. With respect to SQL APIs, currently 'two-phase-commit' is a plugin
option so it is possible that the first time when it gets changes
(pg_logical_slot_get_changes) *without* 2PC enabled it will not get
the prepared even though prepare is after consistent snapshot. Now
next time during getting changes (pg_logical_slot_get_changes) if the
2PC option is enabled it will skip prepare because by that time
start_decoding_at has been moved. So the user will only get commit
prepared as shown in the example in the email above [2]. I think it
might be better to allow enable/disable of 2PC only at create_slot
time. Markus, Ajin, and I seem to be in agreement on this point. I
think the same will be true for subscriber-side solution as well.

2. There is a possibility that subscribers miss some prepared xacts.
Let me explain the problem and solution. Currently, when we create a
subscription, we first launch apply-worker and create the main apply
worker slot and then launch table sync workers as required. Now,
assume, the apply worker slot is created and after that, we launch
tablesync worker, which will initiate its slot (sync_slot) creation.
Then, on the publisher-side, the situation is such that there is a
prepared transaction that happens before we reach a consistent
snapshot for sync_slot.

Because the WALSender corresponding to apply worker is already running
so it will be in consistent state, for it, such a prepared xact can be
decoded and it will send the same to the subscriber. On the
subscriber-side, it can skip applying the data-modification operations
because the corresponding rel is still not in a ready state (see
should_apply_changes_for_rel and its callers) simply because the
corresponding table sync worker is not finished yet. But prepare will
occur and it will lead to a prepared transaction on the subscriber.

In this situation, tablesync worker has skipped prepare because the
snapshot was not consistent and then it exited because it is in sync
with the apply worker. And the apply worker has skipped it because tablesync
was in progress. Later, when the Commit Prepared arrives, the
apply-worker will simply commit the previously prepared transaction
and we will never see the prepared transaction data.

For example, consider below situation:
LSN of Prepare t1 = 490, tablesync skipped because it was prior to a
consistent point
LSN of Commit t2 = 500
LSN of commit t3 = 510
LSN of Commit Prepared t1 = 520.

Tablesync worker initially (via copy) got till xact t3 (LSN = 510).
For the apply worker, we get all the above LSN's as it is started
before tablesync worker and reached a consistent point before it. In
the above example, there is a possibility that we miss applying data
for xact t1 as explained in previous paragraphs.
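
The example can be modeled roughly as follows. This is a deliberately simplified sketch with hypothetical names: it assumes a prepared transaction's data can reach the subscriber by exactly three routes (the initial copy, the tablesync worker's decoding, or the apply worker), and the tablesync consistent point of 495 used in the usage note is an assumed value that the example does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* stand-in for a WAL position */

typedef struct PreparedXactSketch
{
    Lsn prepare_lsn;
    Lsn commit_prepared_lsn;
} PreparedXactSketch;

typedef struct SyncStateSketch
{
    Lsn  tablesync_consistent_lsn;  /* where sync_slot became consistent */
    Lsn  copy_upto_lsn;             /* initial copy holds commits up to here */
    bool rel_ready_at_prepare;      /* was the rel READY when apply saw it? */
} SyncStateSketch;

/* Returns true if at least one route delivers the transaction's data. */
bool
xact_data_reaches_subscriber(const PreparedXactSketch *x,
                             const SyncStateSketch *st)
{
    assert(x->prepare_lsn < x->commit_prepared_lsn);

    /* The copy only sees transactions that were committed by then. */
    bool via_copy = x->commit_prepared_lsn <= st->copy_upto_lsn;

    /* Tablesync skips a PREPARE that is prior to its consistent point. */
    bool via_tablesync = x->prepare_lsn >= st->tablesync_consistent_lsn;

    /* Apply skips data changes while the rel is not in a ready state. */
    bool via_apply = st->rel_ready_at_prepare;

    return via_copy || via_tablesync || via_apply;
}
```

With t1 prepared at 490 and committed at 520, a copy up to 510, an assumed tablesync consistent point of 495 and the relation not yet ready, the function returns false: no route carries the data of t1.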

So, the basic premise is that we can't allow tablesync workers to skip
prepared transactions (which can be processed by apply worker) and
process later commits.

I have one idea to address this. When we get the first begin (for
prepared xact) in the apply-worker, we can check if there are any
relations in "not_ready" state and if so then just wait till all the
relations become in sync with the apply worker. This is to avoid that
any of the tablesync workers might skip prepared xact and we don't
want apply worker to also skip the same.

Now, it is possible that some tablesync worker has copied the data and
moved the sync position ahead of where the current apply worker's
position is. In such a case, we need to process transactions in apply
worker such that we can process commits if any, and write prepared
transactions to file. For prepared transactions, we can take decisions
only once the commit prepared for them has arrived.

The other idea I have thought of for this is to only enable 2PC after
initial sync (when both apply worker and tablesync workers are in
sync) is over but I think that can lead to the problem described in
point 1.

[1]: /messages/by-id/CAA4eK1L=dhuCRvyDvrXX5wZgc7s1hLRD29CKCK6oaHtVCPgiFA@mail.gmail.com
[2]: /messages/by-id/CAFPTHDbbth0XVwf=WXcmp=_2nU5oNaK4CxetUr22qi1UM5v6rw@mail.gmail.com

--
With Regards,
Amit Kapila.

#48Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#47)
1 attachment(s)
Re: repeated decoding of prepared transactions

On Tue, Feb 23, 2021 at 8:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

1. With respect to SQL APIs, currently 'two-phase-commit' is a plugin
option so it is possible that the first time when it gets changes
(pg_logical_slot_get_changes) *without* 2PC enabled it will not get
the prepared even though prepare is after consistent snapshot. Now
next time during getting changes (pg_logical_slot_get_changes) if the
2PC option is enabled it will skip prepare because by that time
start_decoding_at has been moved. So the user will only get commit
prepared as shown in the example in the email above [2]. I think it
might be better to allow enable/disable of 2PC only at create_slot
time. Markus, Ajin, and I seem to be in agreement on this point. I
think the same will be true for subscriber-side solution as well.

Attaching a patch which avoids repeated decoding of prepares using the
approach suggested by Andres. Added snapshot_was_exported_at_lsn
fields to ReplicationSlotPersistentData and SnapBuild, which now store
the LSN at which the slot's snapshot is exported at the time it is created.
This patch also modifies the API pg_create_logical_replication_slot()
to take an extra parameter to enable two-phase commits
and disables pg_logical_slot_get_changes() from enabling two-phase.
I plan to split this into two patches next. But do review and let me
know if you have any comments.

regards,
Ajin

Attachments:

v1-0001-Avoid-repeated-decoding-of-prepared-transactions.patch (application/octet-stream)
From fb03795b01583a6d26ded9674f4e2173b16d7cd8 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Tue, 23 Feb 2021 23:13:53 -0500
Subject: [PATCH v1] Avoid repeated decoding of prepared transactions.

Prepared transactions were decoded again after a restart on COMMIT PREPARED
when two-phase commits were enabled. This was done to avoid missing a prepared
transaction that is not part of initial snapshot. Now, this missing PREPARE is identified
by defining a new LSN called snapshot_was_exported_at_lsn and stored in the
slot and snapbuild structures. Prepared transactions that were prior to this LSN
will be replayed on a COMMIT PREPARED.

This commit also changes the way two-phase commits are enabled in test_decoding plugin.
Two-phase commits can now only be enabled while creating the slot using
pg_create_logical_replication_slot() and cannot be set using pg_logical_slot_get_changes().
For this the API pg_create_logical_replication_slot() is modified to take one more
optional boolean parameter 'twophase', which when set to TRUE enables two-phase commits.
The parameter defaults to FALSE.
---
 contrib/test_decoding/expected/twophase.out        | 72 +++++++++-------------
 .../test_decoding/expected/twophase_snapshot.out   |  6 +-
 contrib/test_decoding/expected/twophase_stream.out | 38 +++---------
 contrib/test_decoding/specs/twophase_snapshot.spec |  4 +-
 contrib/test_decoding/sql/twophase.sql             | 34 +++++-----
 contrib/test_decoding/sql/twophase_stream.sql      | 10 +--
 contrib/test_decoding/test_decoding.c              | 18 ++----
 doc/src/sgml/logicaldecoding.sgml                  | 15 ++---
 src/backend/catalog/system_views.sql               |  1 +
 src/backend/replication/logical/decode.c           | 11 +++-
 src/backend/replication/logical/logical.c          | 13 +++-
 src/backend/replication/logical/logicalfuncs.c     |  8 +++
 src/backend/replication/logical/reorderbuffer.c    | 28 ++-------
 src/backend/replication/logical/snapbuild.c        | 19 +++++-
 src/backend/replication/repl_gram.y                | 14 ++++-
 src/backend/replication/repl_scanner.l             |  1 +
 src/backend/replication/slot.c                     |  3 +-
 src/backend/replication/slotfuncs.c                | 10 ++-
 src/backend/replication/walsender.c                |  6 +-
 src/include/catalog/pg_proc.dat                    |  8 +--
 src/include/nodes/replnodes.h                      |  1 +
 src/include/replication/reorderbuffer.h            |  2 +
 src/include/replication/slot.h                     | 14 ++++-
 src/include/replication/snapbuild.h                |  4 +-
 24 files changed, 178 insertions(+), 162 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index f9f6bed..8d61107 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -15,14 +15,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -32,21 +32,17 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (4 rows)
 
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -55,7 +51,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                 data                 
 -------------------------------------
  ROLLBACK PREPARED 'test_prepared#2'
@@ -78,7 +74,7 @@ WHERE locktype = 'relation'
 (2 rows)
 
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                   data                                   
 -------------------------------------------------------------------------
  BEGIN
@@ -93,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -102,19 +98,16 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                
 --------------------------------------------------------------------
  BEGIN
@@ -146,7 +139,7 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                    data                                    
 ---------------------------------------------------------------------------
  BEGIN
@@ -158,15 +151,11 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -178,7 +167,7 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                             data                            
 ------------------------------------------------------------
  BEGIN
@@ -188,28 +177,25 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                 
 ---------------------------------------------------------------------
  BEGIN
@@ -222,7 +208,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
diff --git a/contrib/test_decoding/expected/twophase_snapshot.out b/contrib/test_decoding/expected/twophase_snapshot.out
index 14d9387..0e8e1f5 100644
--- a/contrib/test_decoding/expected/twophase_snapshot.out
+++ b/contrib/test_decoding/expected/twophase_snapshot.out
@@ -6,7 +6,7 @@ step s2txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
 
 f              
-step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); <waiting ...>
+step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true); <waiting ...>
 step s3b: BEGIN;
 step s3txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
@@ -22,14 +22,14 @@ step s1init: <... completed>
 
 init           
 step s1insert: INSERT INTO do_write DEFAULT VALUES;
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
 table public.do_write: INSERT: id[integer]:2
 COMMIT         
 step s2cp: COMMIT PREPARED 'test1';
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd3..b08bb0e 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -1,6 +1,6 @@
 -- Test streaming of two-phase commits
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -28,7 +28,7 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -59,33 +59,11 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
@@ -103,7 +81,7 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -111,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                             data                             
 -------------------------------------------------------------
  BEGIN
diff --git a/contrib/test_decoding/specs/twophase_snapshot.spec b/contrib/test_decoding/specs/twophase_snapshot.spec
index 3e70040..e8d9567 100644
--- a/contrib/test_decoding/specs/twophase_snapshot.spec
+++ b/contrib/test_decoding/specs/twophase_snapshot.spec
@@ -15,8 +15,8 @@ teardown
 session "s1"
 setup { SET synchronous_commit=on; }
 
-step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');}
-step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');}
+step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true);}
+step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');}
 step "s1insert" { INSERT INTO do_write DEFAULT VALUES; }
 
 session "s2"
diff --git a/contrib/test_decoding/sql/twophase.sql b/contrib/test_decoding/sql/twophase.sql
index 894e4f5..17ada0f 100644
--- a/contrib/test_decoding/sql/twophase.sql
+++ b/contrib/test_decoding/sql/twophase.sql
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE test_prepared1(id integer primary key);
 CREATE TABLE test_prepared2(id integer primary key);
@@ -12,20 +12,20 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
 BEGIN;
@@ -38,7 +38,7 @@ FROM pg_locks
 WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that we decode correctly while an uncommitted prepared xact
 -- with ddl exists.
@@ -47,14 +47,14 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Check 'CLUSTER' (as operation that hold exclusive lock) doesn't block
 -- logical decoding.
@@ -71,11 +71,11 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -87,26 +87,26 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test 8:
 -- cleanup and make sure results are also empty
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/sql/twophase_stream.sql b/contrib/test_decoding/sql/twophase_stream.sql
index e9dd44f..646076d 100644
--- a/contrib/test_decoding/sql/twophase_stream.sql
+++ b/contrib/test_decoding/sql/twophase_stream.sql
@@ -1,7 +1,7 @@
 -- Test streaming of two-phase commits
 
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE stream_test(data text);
 
@@ -18,11 +18,11 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
@@ -35,11 +35,11 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 DROP TABLE stream_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 929255e..28c876d 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -164,7 +164,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	ListCell   *option;
 	TestDecodingData *data;
 	bool		enable_streaming = false;
-	bool		enable_twophase = false;
 
 	data = palloc0(sizeof(TestDecodingData));
 	data->context = AllocSetContextCreate(ctx->context,
@@ -265,16 +264,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
 								strVal(elem->arg), elem->defname)));
 		}
-		else if (strcmp(elem->defname, "two-phase-commit") == 0)
-		{
-			if (elem->arg == NULL)
-				continue;
-			else if (!parse_bool(strVal(elem->arg), &enable_twophase))
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
-								strVal(elem->arg), elem->defname)));
-		}
 		else
 		{
 			ereport(ERROR,
@@ -286,7 +275,12 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	}
 
 	ctx->streaming &= enable_streaming;
-	ctx->twophase &= enable_twophase;
+
+	/*
+	 * Disable two-phase here; it will be set in the core if it was
+	 * enabled while creating the slot.
+	 */
+	ctx->twophase = false;
 }
 
 /* cleanup this plugin's resources */
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index cf705ed..562a7cb 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -55,7 +55,7 @@
 
 <programlisting>
 postgres=# -- Create a slot named 'regression_slot' using the output plugin 'test_decoding'
-postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
     slot_name    |    lsn
 -----------------+-----------
  regression_slot | 0/16B1970
@@ -179,7 +179,7 @@ postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('5');
 postgres=*# PREPARE TRANSACTION 'test_prepared1';
 
-postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -188,7 +188,7 @@ postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# COMMIT PREPARED 'test_prepared1';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -201,7 +201,7 @@ postgres=#-- you can also rollback a prepared transaction
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('6');
 postgres=*# PREPARE TRANSACTION 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/168A180 | 530 | BEGIN 530
@@ -210,7 +210,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# ROLLBACK PREPARED 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                     data                     
 -----------+-----+----------------------------------------------
  0/168A4B8 | 530 | ROLLBACK PREPARED 'test_prepared2', txid 530
@@ -822,10 +822,7 @@ typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx
       <parameter>gid</parameter> field, which is part of the
       <parameter>txn</parameter> parameter can be used in this callback to
       check if the plugin has already received this prepare in which case it
-      can skip the remaining changes of the transaction. This can only happen
-      if the user restarts the decoding after receiving the prepare for a
-      transaction but before receiving the commit prepared say because of some
-      error.
+      can either error out or skip the remaining changes of the transaction.
       <programlisting>
        typedef void (*LogicalDecodeBeginPrepareCB) (struct LogicalDecodingContext *ctx,
                                                     ReorderBufferTXN *txn);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd..f6c5fc5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1318,6 +1318,7 @@ AS 'pg_create_physical_replication_slot';
 CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
     IN slot_name name, IN plugin name,
     IN temporary boolean DEFAULT false,
+    IN twophase boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)
 RETURNS RECORD
 LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df0..1be4715 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -663,6 +663,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	XLogRecPtr	origin_lsn = InvalidXLogRecPtr;
 	TimestampTz commit_time = parsed->xact_time;
 	RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+	XLogRecPtr  snapshot_was_exported_at_lsn = InvalidXLogRecPtr;
 	int			i;
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -715,7 +716,14 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 */
 	if (two_phase)
 	{
+		/*
+		 * Get the LSN at which the snapshot for this slot was exported.
+		 * ReorderBufferFinishPrepared will decide based on this if the
+		 * transaction should be replayed on COMMIT PREPARED.
+		 */
+		snapshot_was_exported_at_lsn = SnapBuildExportLSNAt(ctx->snapshot_builder);
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									snapshot_was_exported_at_lsn,
 									commit_time, origin_id, origin_lsn,
 									parsed->twophase_gid, true);
 	}
@@ -774,7 +782,6 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	/* We can't start streaming unless a consistent state is reached. */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
 	{
-		ReorderBufferSkipPrepare(ctx->reorder, xid);
 		return;
 	}
 
@@ -792,7 +799,6 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 */
 	if (DecodeTXNNeedSkip(ctx, buf, parsed->dbId, origin_id))
 	{
-		ReorderBufferSkipPrepare(ctx->reorder, xid);
 		ReorderBufferInvalidate(ctx->reorder, xid, buf->origptr);
 		return;
 	}
@@ -854,6 +860,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									InvalidXLogRecPtr,
 									abort_time, origin_id, origin_lsn,
 									parsed->twophase_gid, false);
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index baeb45f..8555f5e 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -207,7 +207,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder = ReorderBufferAllocate();
 	ctx->snapshot_builder =
 		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn,
-								need_full_snapshot);
+								need_full_snapshot, slot->data.snapshot_was_exported_at_lsn);
 
 	ctx->reorder->private_data = ctx;
 
@@ -590,6 +590,17 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
+
+	/*
+	 * The snapshot_was_exported_at_lsn point is required in two-phase
+	 * commits to handle prepared transactions that were not part of this
+	 * snapshot at export time. PREPAREs prior to this point need special
+	 * handling if two-phase commits are enabled.
+	 * The snapshot_was_exported_at_lsn is only updated once when
+	 * the slot is created and is not modified on restarts unlike the
+	 * confirmed_flush point.
+	 */
+	slot->data.snapshot_was_exported_at_lsn = ctx->reader->EndRecPtr;
 	SpinLockRelease(&slot->mutex);
 }
 
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index f7e0558..4a919d1 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -239,6 +239,14 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 									LogicalOutputPrepareWrite,
 									LogicalOutputWrite, NULL);
 
+		/* If twophase is set on the slot at create time, then
+		 * make sure the field in the context is also updated
+		 */
+		if (MyReplicationSlot->data.twophase)
+		{
+			ctx->twophase = true;
+		}
+
 		/*
 		 * After the sanity checks in CreateDecodingContext, make sure the
 		 * restart_lsn is valid.  Avoid "cannot get changes" wording in this
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c3b9632..9a95a15 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2623,21 +2623,6 @@ ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
 	return true;
 }
 
-/* Remember that we have skipped prepare */
-void
-ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid)
-{
-	ReorderBufferTXN *txn;
-
-	txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
-
-	/* unknown transaction, nothing to do */
-	if (txn == NULL)
-		return;
-
-	txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
-}
-
 /*
  * Prepare a two-phase transaction.
  *
@@ -2672,6 +2657,7 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
 void
 ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 							XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+							XLogRecPtr snapshot_was_exported_at_lsn,
 							TimestampTz commit_time, RepOriginId origin_id,
 							XLogRecPtr origin_lsn, char *gid, bool is_commit)
 {
@@ -2696,14 +2682,12 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	txn->gid = pstrdup(gid);
 
 	/*
-	 * It is possible that this transaction is not decoded at prepare time
-	 * either because by that time we didn't have a consistent snapshot or it
-	 * was decoded earlier but we have restarted. We can't distinguish between
-	 * those two cases so we send the prepare in both the cases and let
-	 * downstream decide whether to process or skip it. We don't need to
-	 * decode the xact for aborts if it is not done already.
+	 * It is possible that this transaction was not decoded at prepare time
+	 * because we did not yet have a consistent snapshot. In that case we
+	 * need to replay the prepared transaction here, because downstream
+	 * would not have seen this transaction yet.
 	 */
-	if (!rbtxn_prepared(txn) && is_commit)
+	if ((txn->final_lsn < snapshot_was_exported_at_lsn) && is_commit)
 	{
 		txn->txn_flags |= RBTXN_PREPARE;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e117887..7622b1d 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -165,6 +165,12 @@ struct SnapBuild
 	XLogRecPtr	start_decoding_at;
 
 	/*
+	 * In two-phase commits, if the PREPARE is prior to this LSN, then the
+	 * whole transaction needs to be replayed at COMMIT PREPARED.
+	 */
+	XLogRecPtr  snapshot_was_exported_at_lsn;
+
+	/*
 	 * Don't start decoding WAL until the "xl_running_xacts" information
 	 * indicates there are no running xids with an xid smaller than this.
 	 */
@@ -269,7 +275,8 @@ SnapBuild *
 AllocateSnapshotBuilder(ReorderBuffer *reorder,
 						TransactionId xmin_horizon,
 						XLogRecPtr start_lsn,
-						bool need_full_snapshot)
+						bool need_full_snapshot,
+						XLogRecPtr snapshot_was_exported_at_lsn)
 {
 	MemoryContext context;
 	MemoryContext oldcontext;
@@ -297,6 +304,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->start_decoding_at = start_lsn;
 	builder->building_full_snapshot = need_full_snapshot;
+	builder->snapshot_was_exported_at_lsn = snapshot_was_exported_at_lsn;
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -357,6 +365,15 @@ SnapBuildCurrentState(SnapBuild *builder)
 }
 
 /*
+ * Return the LSN at which the snapshot was exported
+ */
+XLogRecPtr
+SnapBuildExportLSNAt(SnapBuild *builder)
+{
+	return builder->snapshot_was_exported_at_lsn;
+}
+
+/*
  * Should the contents of transaction ending at 'ptr' be decoded?
  */
 bool
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index eb283a8..aeec791 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -84,6 +84,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_SLOT
 %token K_RESERVE_WAL
 %token K_TEMPORARY
+%token K_TWOPHASE
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
@@ -102,6 +103,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	plugin_opt_arg
 %type <str>		opt_slot var_name
 %type <boolval>	opt_temporary
+%type <boolval>	opt_twophase
 %type <list>	create_slot_opt_list
 %type <defelt>	create_slot_opt
 
@@ -242,15 +244,16 @@ create_replication_slot:
 					$$ = (Node *) cmd;
 				}
 			/* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
-			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary opt_twophase K_LOGICAL IDENT create_slot_opt_list
 				{
 					CreateReplicationSlotCmd *cmd;
 					cmd = makeNode(CreateReplicationSlotCmd);
 					cmd->kind = REPLICATION_KIND_LOGICAL;
 					cmd->slotname = $2;
 					cmd->temporary = $3;
-					cmd->plugin = $5;
-					cmd->options = $6;
+					cmd->twophase = $4;
+					cmd->plugin = $6;
+					cmd->options = $7;
 					$$ = (Node *) cmd;
 				}
 			;
@@ -365,6 +368,11 @@ opt_temporary:
 			| /* EMPTY */					{ $$ = false; }
 			;
 
+opt_twophase:
+			K_TWOPHASE						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+			;
+
 opt_slot:
 			K_SLOT IDENT
 				{ $$ = $2; }
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index dcc3c3f..3032c28 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -103,6 +103,7 @@ RESERVE_WAL			{ return K_RESERVE_WAL; }
 LOGICAL				{ return K_LOGICAL; }
 SLOT				{ return K_SLOT; }
 TEMPORARY			{ return K_TEMPORARY; }
+TWOPHASE			{ return K_TWOPHASE; }
 EXPORT_SNAPSHOT		{ return K_EXPORT_SNAPSHOT; }
 NOEXPORT_SNAPSHOT	{ return K_NOEXPORT_SNAPSHOT; }
 USE_SNAPSHOT		{ return K_USE_SNAPSHOT; }
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fb4af2e..38c385b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -219,7 +219,7 @@ ReplicationSlotValidateName(const char *name, int elevel)
  */
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
-					  ReplicationSlotPersistency persistency)
+					  ReplicationSlotPersistency persistency, bool twophase)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -277,6 +277,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	namestrcpy(&slot->data.name, name);
 	slot->data.database = db_specific ? MyDatabaseId : InvalidOid;
 	slot->data.persistency = persistency;
+	slot->data.twophase    = twophase;
 
 	/* and then data only present in shared memory */
 	slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d24bb5b..a441fa4 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -50,7 +50,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
 
 	/* acquire replication slot, this will check for conflicting names */
 	ReplicationSlotCreate(name, false,
-						  temporary ? RS_TEMPORARY : RS_PERSISTENT);
+						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
 
 	if (immediately_reserve)
 	{
@@ -124,7 +124,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  */
 static void
 create_logical_replication_slot(char *name, char *plugin,
-								bool temporary, XLogRecPtr restart_lsn,
+								bool temporary, bool twophase,
+								XLogRecPtr restart_lsn,
 								bool find_startpoint)
 {
 	LogicalDecodingContext *ctx = NULL;
@@ -140,7 +141,7 @@ create_logical_replication_slot(char *name, char *plugin,
 	 * error as well.
 	 */
 	ReplicationSlotCreate(name, true,
-						  temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, twophase);
 
 	/*
 	 * Create logical decoding context to find start point or, if we don't
@@ -177,6 +178,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	Name		name = PG_GETARG_NAME(0);
 	Name		plugin = PG_GETARG_NAME(1);
 	bool		temporary = PG_GETARG_BOOL(2);
+	bool		twophase = PG_GETARG_BOOL(3);
 	Datum		result;
 	TupleDesc	tupdesc;
 	HeapTuple	tuple;
@@ -193,6 +195,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	create_logical_replication_slot(NameStr(*name),
 									NameStr(*plugin),
 									temporary,
+									twophase,
 									InvalidXLogRecPtr,
 									true);
 
@@ -796,6 +799,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
 										temporary,
+										false,
 										src_restart_lsn,
 										false);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 8124454..9146e62 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -937,7 +937,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	if (cmd->kind == REPLICATION_KIND_PHYSICAL)
 	{
 		ReplicationSlotCreate(cmd->slotname, false,
-							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT);
+							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
+							  false);
 	}
 	else
 	{
@@ -951,7 +952,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		 * they get dropped on error as well.
 		 */
 		ReplicationSlotCreate(cmd->slotname, true,
-							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
+							  cmd->twophase);
 	}
 
 	if (cmd->kind == REPLICATION_KIND_LOGICAL)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1604412..8459488 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10502,10 +10502,10 @@
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',
   prosrc => 'pg_create_logical_replication_slot' },
 { oid => '4222',
   descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a25..1a933e2 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -56,6 +56,7 @@ typedef struct CreateReplicationSlotCmd
 	ReplicationKind kind;
 	char	   *plugin;
 	bool		temporary;
+	bool		twophase;
 	List	   *options;
 } CreateReplicationSlotCmd;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bab31bf..e1842c0 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -643,6 +643,7 @@ void		ReorderBufferCommit(ReorderBuffer *, TransactionId,
 								TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 										XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+										XLogRecPtr snapshot_was_exported_at_lsn,
 										TimestampTz commit_time,
 										RepOriginId origin_id, XLogRecPtr origin_lsn,
 										char *gid, bool is_commit);
@@ -676,6 +677,7 @@ bool		ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
 											 TimestampTz prepare_time,
 											 RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid);
+void		ReorderBufferMarkPrepare(ReorderBuffer *rb, TransactionId xid);
 void		ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid, char *gid);
 ReorderBufferTXN *ReorderBufferGetOldestTXN(ReorderBuffer *);
 TransactionId ReorderBufferGetOldestXmin(ReorderBuffer *rb);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 38a9a0b..9452604 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -91,6 +91,18 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	confirmed_flush;
 
+	/*
+	 * LSN at which this slot found a consistent point and exported its snapshot.
+	 * This is required for two-phase transactions to decide if the whole
+	 * transaction should be replayed at COMMIT PREPARED.
+	 */
+	XLogRecPtr  snapshot_was_exported_at_lsn;
+
+	/*
+	 * Is the slot two-phase enabled?
+	 */
+	bool        twophase;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
@@ -192,7 +204,7 @@ extern void ReplicationSlotsShmemInit(void);
 
 /* management of individual slots */
 extern void ReplicationSlotCreate(const char *name, bool db_specific,
-								  ReplicationSlotPersistency p);
+								  ReplicationSlotPersistency p, bool twophase);
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index d9f187a..0693115 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -61,7 +61,8 @@ extern void CheckPointSnapBuild(void);
 
 extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
 										  TransactionId xmin_horizon, XLogRecPtr start_lsn,
-										  bool need_full_snapshot);
+										  bool need_full_snapshot,
+										  XLogRecPtr snapshot_was_exported_at_lsn);
 extern void FreeSnapshotBuilder(SnapBuild *cache);
 
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
@@ -75,6 +76,7 @@ extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
 											TransactionId xid);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+extern XLogRecPtr SnapBuildExportLSNAt(SnapBuild *builder);
 
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 							   TransactionId xid, int nsubxacts,
-- 
1.8.3.1

#49Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#48)
2 attachment(s)
Re: repeated decoding of prepared transactions

On Wed, Feb 24, 2021 at 4:48 PM Ajin Cherian <itsajin@gmail.com> wrote:

I plan to split this into two patches next. But do review and let me
know if you have any comments.

I am attaching an updated patch set in which the changes for
snapshot_was_exported_at_lsn are separated from the changes to the
APIs pg_create_logical_replication_slot() and
pg_logical_slot_get_changes(). It is also rebased over a few more
commits since my last patch.
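
As a rough illustration of the rule the first patch introduces, here is a
minimal, standalone sketch of the replay decision. It is not code from the
patch: the helper name needs_replay_at_commit_prepared() is hypothetical,
XLogRecPtr is reduced to a plain 64-bit integer, and prepare_lsn stands in
for txn->final_lsn, on the assumption that final_lsn holds the position of
the PREPARE record when the check runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper (not in the patch): mirrors the condition
 *     (txn->final_lsn < snapshot_was_exported_at_lsn) && is_commit
 * used in ReorderBufferFinishPrepared().
 *
 * A transaction prepared before the slot's snapshot was exported was never
 * sent downstream at PREPARE time, so its changes must be replayed when the
 * COMMIT PREPARED is decoded. Anything prepared at or after that LSN was
 * already delivered, and a ROLLBACK PREPARED never needs the replay.
 */
bool
needs_replay_at_commit_prepared(XLogRecPtr prepare_lsn,
								XLogRecPtr snapshot_was_exported_at_lsn,
								bool is_commit)
{
	return is_commit && prepare_lsn < snapshot_was_exported_at_lsn;
}
```

Because snapshot_was_exported_at_lsn is set once at slot creation and is not
moved on restarts, the answer for a given prepared transaction stays the
same across restarts of the decoder, which is what avoids the repeated
delivery.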

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v2-0001-Avoid-repeated-decoding-of-prepared-transactions.patch (application/octet-stream)
From 017c25df53759f3a88a96b73e2240cfba444d4b4 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 24 Feb 2021 04:34:40 -0500
Subject: [PATCH v2] Avoid repeated decoding of prepared transactions.

Prepared transactions were decoded again after a restart on COMMIT PREPARED
when two-phase commits were enabled. This was done to avoid missing a prepared
transaction that is not part of the initial snapshot. Now, such a missing PREPARE
is identified by defining a new LSN called snapshot_was_exported_at_lsn, which is
stored in the slot and snapbuild structures. Prepared transactions that were
prior to this LSN will be replayed on a COMMIT PREPARED.
---
 contrib/test_decoding/expected/twophase.out        | 38 +++++++---------------
 contrib/test_decoding/expected/twophase_stream.out | 28 ++--------------
 src/backend/replication/logical/decode.c           | 11 +++++--
 src/backend/replication/logical/logical.c          | 13 +++++++-
 src/backend/replication/logical/reorderbuffer.c    | 28 ++++------------
 src/backend/replication/logical/snapbuild.c        | 19 ++++++++++-
 src/include/replication/reorderbuffer.h            |  1 +
 src/include/replication/slot.h                     |  7 ++++
 src/include/replication/snapbuild.h                |  4 ++-
 9 files changed, 71 insertions(+), 78 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index f9f6bed..c51870f 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#1';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
@@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#3';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
@@ -159,14 +152,10 @@ RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -189,13 +178,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd3..d54e640 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -60,32 +60,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df0..9f6e5d5 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -663,6 +663,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	XLogRecPtr	origin_lsn = InvalidXLogRecPtr;
 	TimestampTz commit_time = parsed->xact_time;
 	RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+	XLogRecPtr  snapshot_was_exported_at_lsn = InvalidXLogRecPtr;
 	int			i;
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -715,7 +716,14 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 */
 	if (two_phase)
 	{
+		/*
+		 * Get the LSN at which the snapshot for this slot was exported.
+		 * ReorderBufferFinishPrepared will decide based on this if the
+		 * transaction should be replayed on COMMIT PREPARED.
+		 */
+		snapshot_was_exported_at_lsn = SnapBuildExportLSNAt(ctx->snapshot_builder);
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									snapshot_was_exported_at_lsn,
 									commit_time, origin_id, origin_lsn,
 									parsed->twophase_gid, true);
 	}
@@ -774,7 +782,6 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	/* We can't start streaming unless a consistent state is reached. */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
 	{
-		ReorderBufferSkipPrepare(ctx->reorder, xid);
 		return;
 	}
 
@@ -792,7 +799,6 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 */
 	if (DecodeTXNNeedSkip(ctx, buf, parsed->dbId, origin_id))
 	{
-		ReorderBufferSkipPrepare(ctx->reorder, xid);
 		ReorderBufferInvalidate(ctx->reorder, xid, buf->origptr);
 		return;
 	}
@@ -854,6 +860,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									InvalidXLogRecPtr,
 									abort_time, origin_id, origin_lsn,
 									parsed->twophase_gid, false);
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index baeb45f..5634635 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -207,7 +207,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder = ReorderBufferAllocate();
 	ctx->snapshot_builder =
 		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn,
-								need_full_snapshot);
+								need_full_snapshot, slot->data.snapshot_was_exported_at_lsn);
 
 	ctx->reorder->private_data = ctx;
 
@@ -590,6 +590,17 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
+
+	/*
+	 * The snapshot_was_exported_at_lsn point is required in two-phase
+	 * commits to handle prepared transactions that were not part of this
+	 * snapshot at export time. PREPAREs prior to this point need special
+	 * handling if two-phase commits are enabled.
+	 * The snapshot_was_exported_at_lsn is only updated once when
+	 * the slot is created and is not modified on restarts unlike the
+	 * confirmed_flush point.
+	 */
+	slot->data.snapshot_was_exported_at_lsn = ctx->reader->EndRecPtr;
 	SpinLockRelease(&slot->mutex);
 }
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c3b9632..5a3c986 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2623,21 +2623,6 @@ ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
 	return true;
 }
 
-/* Remember that we have skipped prepare */
-void
-ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid)
-{
-	ReorderBufferTXN *txn;
-
-	txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
-
-	/* unknown transaction, nothing to do */
-	if (txn == NULL)
-		return;
-
-	txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
-}
-
 /*
  * Prepare a two-phase transaction.
  *
@@ -2672,6 +2657,7 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
 void
 ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 							XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+							XLogRecPtr snapshot_was_exported_at_lsn,
 							TimestampTz commit_time, RepOriginId origin_id,
 							XLogRecPtr origin_lsn, char *gid, bool is_commit)
 {
@@ -2696,14 +2682,12 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	txn->gid = pstrdup(gid);
 
 	/*
-	 * It is possible that this transaction is not decoded at prepare time
-	 * either because by that time we didn't have a consistent snapshot or it
-	 * was decoded earlier but we have restarted. We can't distinguish between
-	 * those two cases so we send the prepare in both the cases and let
-	 * downstream decide whether to process or skip it. We don't need to
-	 * decode the xact for aborts if it is not done already.
+	 * It is possible that this transaction was not decoded at prepare time
+	 * because by that time we didn't have a consistent snapshot. In that
+	 * case we need to replay the prepared transaction here because
+	 * downstream would not have seen this transaction yet.
 	 */
-	if (!rbtxn_prepared(txn) && is_commit)
+	if ((txn->final_lsn < snapshot_was_exported_at_lsn) && is_commit)
 	{
 		txn->txn_flags |= RBTXN_PREPARE;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e117887..7622b1d 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -165,6 +165,12 @@ struct SnapBuild
 	XLogRecPtr	start_decoding_at;
 
 	/*
+	 * In two-phase commits, if the PREPARE is prior to this LSN, then the
+	 * whole transaction needs to be replayed at COMMIT PREPARED.
+	 */
+	XLogRecPtr  snapshot_was_exported_at_lsn;
+
+	/*
 	 * Don't start decoding WAL until the "xl_running_xacts" information
 	 * indicates there are no running xids with an xid smaller than this.
 	 */
@@ -269,7 +275,8 @@ SnapBuild *
 AllocateSnapshotBuilder(ReorderBuffer *reorder,
 						TransactionId xmin_horizon,
 						XLogRecPtr start_lsn,
-						bool need_full_snapshot)
+						bool need_full_snapshot,
+						XLogRecPtr snapshot_was_exported_at_lsn)
 {
 	MemoryContext context;
 	MemoryContext oldcontext;
@@ -297,6 +304,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->start_decoding_at = start_lsn;
 	builder->building_full_snapshot = need_full_snapshot;
+	builder->snapshot_was_exported_at_lsn = snapshot_was_exported_at_lsn;
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -357,6 +365,15 @@ SnapBuildCurrentState(SnapBuild *builder)
 }
 
 /*
+ * Return the LSN at which the snapshot was exported
+ */
+XLogRecPtr
+SnapBuildExportLSNAt(SnapBuild *builder)
+{
+	return builder->snapshot_was_exported_at_lsn;
+}
+
+/*
  * Should the contents of transaction ending at 'ptr' be decoded?
  */
 bool
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bab31bf..1dbb50e 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -643,6 +643,7 @@ void		ReorderBufferCommit(ReorderBuffer *, TransactionId,
 								TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 										XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+										XLogRecPtr snapshot_was_exported_at_lsn,
 										TimestampTz commit_time,
 										RepOriginId origin_id, XLogRecPtr origin_lsn,
 										char *gid, bool is_commit);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 38a9a0b..ad8fb37 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -91,6 +91,13 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	confirmed_flush;
 
+	/*
+	 * LSN at which this slot found its consistent point and exported the
+	 * snapshot. This is required for two-phase transactions to decide if the
+	 * whole transaction should be replayed at COMMIT PREPARED.
+	 */
+	XLogRecPtr  snapshot_was_exported_at_lsn;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index d9f187a..0693115 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -61,7 +61,8 @@ extern void CheckPointSnapBuild(void);
 
 extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
 										  TransactionId xmin_horizon, XLogRecPtr start_lsn,
-										  bool need_full_snapshot);
+										  bool need_full_snapshot,
+										  XLogRecPtr snapshot_was_exported_at_lsn);
 extern void FreeSnapshotBuilder(SnapBuild *cache);
 
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
@@ -75,6 +76,7 @@ extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
 											TransactionId xid);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+extern XLogRecPtr SnapBuildExportLSNAt(SnapBuild *builder);
 
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 							   TransactionId xid, int nsubxacts,
-- 
1.8.3.1
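
For readers following the thread, the decision this first patch makes at
COMMIT PREPARED time can be sketched in isolation. This is only a minimal
model, not the patch's code: SketchLsn, SketchSlot and the sketch_* functions
are hypothetical names invented for illustration, and prepare_lsn stands in
for the txn->final_lsn value the patch compares against. The assumption is
that the slot records snapshot_was_exported_at_lsn once at creation, while
confirmed_flush keeps advancing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t SketchLsn;

/* Hypothetical, heavily simplified view of the persistent slot data. */
typedef struct SketchSlot
{
	SketchLsn	confirmed_flush;				/* advances over time */
	SketchLsn	snapshot_was_exported_at_lsn;	/* set once at creation */
} SketchSlot;

/* At slot creation both positions start at the consistent point. */
void
sketch_slot_create(SketchSlot *slot, SketchLsn consistent_point)
{
	slot->confirmed_flush = consistent_point;
	slot->snapshot_was_exported_at_lsn = consistent_point;
}

/* Confirming progress moves confirmed_flush only, never the export LSN. */
void
sketch_slot_confirm(SketchSlot *slot, SketchLsn lsn)
{
	assert(lsn >= slot->confirmed_flush);
	slot->confirmed_flush = lsn;
}

/*
 * Should the whole transaction be replayed at COMMIT PREPARED?
 * Only when it is a commit and the PREPARE happened before the snapshot
 * was exported, i.e. downstream cannot have seen the PREPARE yet.
 */
bool
sketch_replay_at_commit_prepared(const SketchSlot *slot,
								 SketchLsn prepare_lsn, bool is_commit)
{
	return is_commit && prepare_lsn < slot->snapshot_was_exported_at_lsn;
}
```

With a slot created at position 1000 and later confirmed up to 5000, a
transaction prepared at 900 would be replayed at COMMIT PREPARED, one
prepared at 1200 would not (it was already sent at PREPARE time), and a
ROLLBACK PREPARED never triggers a replay.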

Attachment: v2-0002-Add-option-to-enable-two-phase-commits-in-pg_crea.patch (application/octet-stream)
From a5ad5f7789532f70833e0d1fe54c232c5e2ff9bd Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 24 Feb 2021 05:53:14 -0500
Subject: [PATCH v2] Add option to enable two-phase commits in
 pg_create_logical_replication_slot

This commit changes the way two-phase commits are enabled in the test_decoding plugin.
Two-phase commits can now only be enabled while creating the slot using
pg_create_logical_replication_slot() and cannot be set using pg_logical_slot_get_changes().
To support this, pg_create_logical_replication_slot() is modified to take one more
optional boolean parameter 'twophase', which when set to TRUE enables two-phase commits.
The parameter defaults to FALSE.
---
 contrib/test_decoding/expected/twophase.out        | 34 +++++++++++-----------
 .../test_decoding/expected/twophase_snapshot.out   |  6 ++--
 contrib/test_decoding/expected/twophase_stream.out | 10 +++----
 contrib/test_decoding/specs/twophase_snapshot.spec |  4 +--
 contrib/test_decoding/sql/twophase.sql             | 34 +++++++++++-----------
 contrib/test_decoding/sql/twophase_stream.sql      | 10 +++----
 contrib/test_decoding/test_decoding.c              | 18 ++++--------
 doc/src/sgml/logicaldecoding.sgml                  | 19 +++++-------
 src/backend/catalog/system_views.sql               |  1 +
 src/backend/replication/logical/logicalfuncs.c     |  8 +++++
 src/backend/replication/repl_gram.y                | 14 +++++++--
 src/backend/replication/repl_scanner.l             |  1 +
 src/backend/replication/slot.c                     |  3 +-
 src/backend/replication/slotfuncs.c                | 10 +++++--
 src/backend/replication/walsender.c                |  6 ++--
 src/include/catalog/pg_proc.dat                    |  8 ++---
 src/include/nodes/replnodes.h                      |  1 +
 src/include/replication/slot.h                     |  7 ++++-
 18 files changed, 108 insertions(+), 86 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index c51870f..8d61107 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -15,14 +15,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -32,7 +32,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (4 rows)
 
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#1'
@@ -42,7 +42,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -51,7 +51,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                 data                 
 -------------------------------------
  ROLLBACK PREPARED 'test_prepared#2'
@@ -74,7 +74,7 @@ WHERE locktype = 'relation'
 (2 rows)
 
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                   data                                   
 -------------------------------------------------------------------------
  BEGIN
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -98,7 +98,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#3'
@@ -107,7 +107,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                
 --------------------------------------------------------------------
  BEGIN
@@ -139,7 +139,7 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                    data                                    
 ---------------------------------------------------------------------------
  BEGIN
@@ -151,7 +151,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                  data                 
 --------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
@@ -167,7 +167,7 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                             data                            
 ------------------------------------------------------------
  BEGIN
@@ -177,7 +177,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                    data                    
 -------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
@@ -188,14 +188,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                 
 ---------------------------------------------------------------------
  BEGIN
@@ -208,7 +208,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
diff --git a/contrib/test_decoding/expected/twophase_snapshot.out b/contrib/test_decoding/expected/twophase_snapshot.out
index 14d9387..0e8e1f5 100644
--- a/contrib/test_decoding/expected/twophase_snapshot.out
+++ b/contrib/test_decoding/expected/twophase_snapshot.out
@@ -6,7 +6,7 @@ step s2txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
 
 f              
-step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); <waiting ...>
+step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true); <waiting ...>
 step s3b: BEGIN;
 step s3txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
@@ -22,14 +22,14 @@ step s1init: <... completed>
 
 init           
 step s1insert: INSERT INTO do_write DEFAULT VALUES;
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
 table public.do_write: INSERT: id[integer]:2
 COMMIT         
 step s2cp: COMMIT PREPARED 'test1';
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index d54e640..b08bb0e 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -1,6 +1,6 @@
 -- Test streaming of two-phase commits
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -28,7 +28,7 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -59,7 +59,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
           data           
 -------------------------
  COMMIT PREPARED 'test1'
@@ -81,7 +81,7 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                             data                             
 -------------------------------------------------------------
  BEGIN
diff --git a/contrib/test_decoding/specs/twophase_snapshot.spec b/contrib/test_decoding/specs/twophase_snapshot.spec
index 3e70040..e8d9567 100644
--- a/contrib/test_decoding/specs/twophase_snapshot.spec
+++ b/contrib/test_decoding/specs/twophase_snapshot.spec
@@ -15,8 +15,8 @@ teardown
 session "s1"
 setup { SET synchronous_commit=on; }
 
-step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');}
-step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');}
+step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true);}
+step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');}
 step "s1insert" { INSERT INTO do_write DEFAULT VALUES; }
 
 session "s2"
diff --git a/contrib/test_decoding/sql/twophase.sql b/contrib/test_decoding/sql/twophase.sql
index 894e4f5..17ada0f 100644
--- a/contrib/test_decoding/sql/twophase.sql
+++ b/contrib/test_decoding/sql/twophase.sql
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE test_prepared1(id integer primary key);
 CREATE TABLE test_prepared2(id integer primary key);
@@ -12,20 +12,20 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
 BEGIN;
@@ -38,7 +38,7 @@ FROM pg_locks
 WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that we decode correctly while an uncommitted prepared xact
 -- with ddl exists.
@@ -47,14 +47,14 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Check 'CLUSTER' (as operation that hold exclusive lock) doesn't block
 -- logical decoding.
@@ -71,11 +71,11 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -87,26 +87,26 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test 8:
 -- cleanup and make sure results are also empty
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/sql/twophase_stream.sql b/contrib/test_decoding/sql/twophase_stream.sql
index e9dd44f..646076d 100644
--- a/contrib/test_decoding/sql/twophase_stream.sql
+++ b/contrib/test_decoding/sql/twophase_stream.sql
@@ -1,7 +1,7 @@
 -- Test streaming of two-phase commits
 
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE stream_test(data text);
 
@@ -18,11 +18,11 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
@@ -35,11 +35,11 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 DROP TABLE stream_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 929255e..28c876d 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -164,7 +164,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	ListCell   *option;
 	TestDecodingData *data;
 	bool		enable_streaming = false;
-	bool		enable_twophase = false;
 
 	data = palloc0(sizeof(TestDecodingData));
 	data->context = AllocSetContextCreate(ctx->context,
@@ -265,16 +264,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
 								strVal(elem->arg), elem->defname)));
 		}
-		else if (strcmp(elem->defname, "two-phase-commit") == 0)
-		{
-			if (elem->arg == NULL)
-				continue;
-			else if (!parse_bool(strVal(elem->arg), &enable_twophase))
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
-								strVal(elem->arg), elem->defname)));
-		}
 		else
 		{
 			ereport(ERROR,
@@ -286,7 +275,12 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	}
 
 	ctx->streaming &= enable_streaming;
-	ctx->twophase &= enable_twophase;
+
+	/*
+	 * Disable two-phase here; it will be set in the core if it was
+	 * enabled while creating the slot.
+	 */
+	ctx->twophase = false;
 }
 
 /* cleanup this plugin's resources */
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 6455664..019a440 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -55,7 +55,7 @@
 
 <programlisting>
 postgres=# -- Create a slot named 'regression_slot' using the output plugin 'test_decoding'
-postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
     slot_name    |    lsn
 -----------------+-----------
  regression_slot | 0/16B1970
@@ -179,7 +179,7 @@ postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('5');
 postgres=*# PREPARE TRANSACTION 'test_prepared1';
 
-postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -188,7 +188,7 @@ postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# COMMIT PREPARED 'test_prepared1';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -201,7 +201,7 @@ postgres=#-- you can also rollback a prepared transaction
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('6');
 postgres=*# PREPARE TRANSACTION 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/168A180 | 530 | BEGIN 530
@@ -210,7 +210,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# ROLLBACK PREPARED 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                     data                     
 -----------+-----+----------------------------------------------
  0/168A4B8 | 530 | ROLLBACK PREPARED 'test_prepared2', txid 530
@@ -820,12 +820,9 @@ typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx
       The required <function>begin_prepare_cb</function> callback is called
       whenever the start of a prepared transaction has been decoded. The
       <parameter>gid</parameter> field, which is part of the
-      <parameter>txn</parameter> parameter, can be used in this callback to
-      check if the plugin has already received this <command>PREPARE</command>
-      in which case it can skip the remaining changes of the transaction.
-      This can only happen if the user restarts the decoding after receiving
-      the <command>PREPARE</command> for a transaction but before receiving
-      the <command>COMMIT PREPARED</command>, say because of some error.
+      <parameter>txn</parameter> parameter, can be used in this callback to
+      check if the plugin has already received this prepare, in which case it
+      can either error out or skip the remaining changes of the transaction.
       <programlisting>
        typedef void (*LogicalDecodeBeginPrepareCB) (struct LogicalDecodingContext *ctx,
                                                     ReorderBufferTXN *txn);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd..f6c5fc5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1318,6 +1318,7 @@ AS 'pg_create_physical_replication_slot';
 CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
     IN slot_name name, IN plugin name,
     IN temporary boolean DEFAULT false,
+    IN twophase boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)
 RETURNS RECORD
 LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index f7e0558..4a919d1 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -239,6 +239,14 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 									LogicalOutputPrepareWrite,
 									LogicalOutputWrite, NULL);
 
+		/* If twophase is set on the slot at create time, then
+		 * make sure the field in the context is also updated
+		 */
+		if (MyReplicationSlot->data.twophase)
+		{
+			ctx->twophase = true;
+		}
+
 		/*
 		 * After the sanity checks in CreateDecodingContext, make sure the
 		 * restart_lsn is valid.  Avoid "cannot get changes" wording in this
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index eb283a8..aeec791 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -84,6 +84,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_SLOT
 %token K_RESERVE_WAL
 %token K_TEMPORARY
+%token K_TWOPHASE
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
@@ -102,6 +103,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	plugin_opt_arg
 %type <str>		opt_slot var_name
 %type <boolval>	opt_temporary
+%type <boolval>	opt_twophase
 %type <list>	create_slot_opt_list
 %type <defelt>	create_slot_opt
 
@@ -242,15 +244,16 @@ create_replication_slot:
 					$$ = (Node *) cmd;
 				}
 			/* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
-			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary opt_twophase K_LOGICAL IDENT create_slot_opt_list
 				{
 					CreateReplicationSlotCmd *cmd;
 					cmd = makeNode(CreateReplicationSlotCmd);
 					cmd->kind = REPLICATION_KIND_LOGICAL;
 					cmd->slotname = $2;
 					cmd->temporary = $3;
-					cmd->plugin = $5;
-					cmd->options = $6;
+					cmd->twophase = $4;
+					cmd->plugin = $6;
+					cmd->options = $7;
 					$$ = (Node *) cmd;
 				}
 			;
@@ -365,6 +368,11 @@ opt_temporary:
 			| /* EMPTY */					{ $$ = false; }
 			;
 
+opt_twophase:
+			K_TWOPHASE						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+			;
+
 opt_slot:
 			K_SLOT IDENT
 				{ $$ = $2; }
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index dcc3c3f..3032c28 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -103,6 +103,7 @@ RESERVE_WAL			{ return K_RESERVE_WAL; }
 LOGICAL				{ return K_LOGICAL; }
 SLOT				{ return K_SLOT; }
 TEMPORARY			{ return K_TEMPORARY; }
+TWOPHASE			{ return K_TWOPHASE; }
 EXPORT_SNAPSHOT		{ return K_EXPORT_SNAPSHOT; }
 NOEXPORT_SNAPSHOT	{ return K_NOEXPORT_SNAPSHOT; }
 USE_SNAPSHOT		{ return K_USE_SNAPSHOT; }
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fb4af2e..38c385b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -219,7 +219,7 @@ ReplicationSlotValidateName(const char *name, int elevel)
  */
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
-					  ReplicationSlotPersistency persistency)
+					  ReplicationSlotPersistency persistency, bool twophase)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -277,6 +277,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	namestrcpy(&slot->data.name, name);
 	slot->data.database = db_specific ? MyDatabaseId : InvalidOid;
 	slot->data.persistency = persistency;
+	slot->data.twophase    = twophase;
 
 	/* and then data only present in shared memory */
 	slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d24bb5b..a441fa4 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -50,7 +50,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
 
 	/* acquire replication slot, this will check for conflicting names */
 	ReplicationSlotCreate(name, false,
-						  temporary ? RS_TEMPORARY : RS_PERSISTENT);
+						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
 
 	if (immediately_reserve)
 	{
@@ -124,7 +124,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  */
 static void
 create_logical_replication_slot(char *name, char *plugin,
-								bool temporary, XLogRecPtr restart_lsn,
+								bool temporary, bool twophase,
+								XLogRecPtr restart_lsn,
 								bool find_startpoint)
 {
 	LogicalDecodingContext *ctx = NULL;
@@ -140,7 +141,7 @@ create_logical_replication_slot(char *name, char *plugin,
 	 * error as well.
 	 */
 	ReplicationSlotCreate(name, true,
-						  temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, twophase);
 
 	/*
 	 * Create logical decoding context to find start point or, if we don't
@@ -177,6 +178,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	Name		name = PG_GETARG_NAME(0);
 	Name		plugin = PG_GETARG_NAME(1);
 	bool		temporary = PG_GETARG_BOOL(2);
+	bool		twophase = PG_GETARG_BOOL(3);
 	Datum		result;
 	TupleDesc	tupdesc;
 	HeapTuple	tuple;
@@ -193,6 +195,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	create_logical_replication_slot(NameStr(*name),
 									NameStr(*plugin),
 									temporary,
+									twophase,
 									InvalidXLogRecPtr,
 									true);
 
@@ -796,6 +799,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
 										temporary,
+										false,
 										src_restart_lsn,
 										false);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 8124454..9146e62 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -937,7 +937,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	if (cmd->kind == REPLICATION_KIND_PHYSICAL)
 	{
 		ReplicationSlotCreate(cmd->slotname, false,
-							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT);
+							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
+							  false);
 	}
 	else
 	{
@@ -951,7 +952,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		 * they get dropped on error as well.
 		 */
 		ReplicationSlotCreate(cmd->slotname, true,
-							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
+							  cmd->twophase);
 	}
 
 	if (cmd->kind == REPLICATION_KIND_LOGICAL)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1604412..8459488 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10502,10 +10502,10 @@
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',
   prosrc => 'pg_create_logical_replication_slot' },
 { oid => '4222',
   descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a25..1a933e2 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -56,6 +56,7 @@ typedef struct CreateReplicationSlotCmd
 	ReplicationKind kind;
 	char	   *plugin;
 	bool		temporary;
+	bool		twophase;
 	List	   *options;
 } CreateReplicationSlotCmd;
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index ad8fb37..9452604 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -98,6 +98,11 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr  snapshot_was_exported_at_lsn;
 
+	/*
+	 * Is the slot two-phase enabled?
+	 */
+	bool        twophase;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
@@ -199,7 +204,7 @@ extern void ReplicationSlotsShmemInit(void);
 
 /* management of individual slots */
 extern void ReplicationSlotCreate(const char *name, bool db_specific,
-								  ReplicationSlotPersistency p);
+								  ReplicationSlotPersistency p, bool twophase);
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
-- 
1.8.3.1

#50vignesh C
vignesh21@gmail.com
In reply to: Ajin Cherian (#49)
Re: repeated decoding of prepared transactions

On Wed, Feb 24, 2021 at 5:06 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Feb 24, 2021 at 4:48 PM Ajin Cherian <itsajin@gmail.com> wrote:

I plan to split this into two patches next. But do review and let me
know if you have any comments.

Attaching an updated patch-set with the changes for
snapshot_was_exported_at_lsn separated out from the changes for the
APIs pg_create_logical_replication_slot() and
pg_logical_slot_get_changes(). Along with a rebase that takes in a few
more commits since my last patch.

One observation: while verifying the patch, I noticed that most members
of the ReplicationSlotPersistentData structure are displayed in
pg_replication_slots, but snapshot_was_exported_at_lsn is not. Is this
intentional? If not, we can include snapshot_was_exported_at_lsn in
pg_replication_slots.

Regards,
Vignesh

#51Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#49)
2 attachment(s)
Re: repeated decoding of prepared transactions

On Wed, Feb 24, 2021 at 5:06 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Feb 24, 2021 at 4:48 PM Ajin Cherian <itsajin@gmail.com> wrote:

I plan to split this into two patches next. But do review and let me
know if you have any comments.

Attaching an updated patch-set with the changes for
snapshot_was_exported_at_lsn separated out from the changes for the
APIs pg_create_logical_replication_slot() and
pg_logical_slot_get_changes(). Along with a rebase that takes in a few
more commits since my last patch.

A few comments on the first patch:
1. We can't remove ReorderBufferSkipPrepare because we rely on it in
SnapBuildDistributeNewCatalogSnapshot.
2. I have changed the name of the variable from
snapshot_was_exported_at_lsn to snapshot_was_exported_at, but I am
still not very sure about this naming because there are times when we
don't export a snapshot and still set this, for example when creating
slots with CRS_NOEXPORT_SNAPSHOT or when creating them via the SQL APIs.
The other name that comes to mind is initial_consistency_at. What do
you think?
3. I have changed comments in various places.

Please find the above changes as a separate patch; if you like, you can
include these in the main patch.
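
For anyone reviewing, the check that the first patch introduces in
ReorderBufferFinishPrepared can be looked at in isolation. The following
is only a minimal standalone sketch, not the patch code: the XLogRecPtr
typedef is a stand-in for the real one, the helper name
should_replay_at_commit_prepared is hypothetical, and prepare_lsn stands
in for txn->final_lsn as used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper mirroring the condition in the v3 patch:
 * replay the whole transaction at COMMIT PREPARED only if this is a
 * commit (not a rollback) and the PREPARE lies strictly before the LSN
 * recorded for the slot at creation time.
 */
static bool
should_replay_at_commit_prepared(XLogRecPtr prepare_lsn,
                                 XLogRecPtr snapshot_was_exported_at_lsn,
                                 bool is_commit)
{
    return is_commit && prepare_lsn < snapshot_was_exported_at_lsn;
}
```

Under these assumptions, a PREPARE located before the slot's recorded
LSN is replayed at COMMIT PREPARED, a PREPARE at or after that LSN is
not (it was already sent at prepare time), and a ROLLBACK PREPARED never
triggers a replay.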

Apart from the above, I think the comments related to docs in my
previous email [1] are still valid; can you please take care of those?

[1]: /messages/by-id/CAA4eK1Kr34_TiREr57Wpd=3=03x=1n55DAjwJPGpHAEc4dWfUQ@mail.gmail.com

--
With Regards,
Amit Kapila.

Attachments:

v3-0001-Avoid-repeated-decoding-of-prepared-transactions.patch (application/octet-stream)
From b939268386b82214b8a8fcc968885ba90f4e12fd Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Wed, 24 Feb 2021 04:34:40 -0500
Subject: [PATCH v3 1/2] Avoid repeated decoding of prepared transactions.

Prepared transactions were decoded again after a restart on COMMIT PREPARED
when two-phase commits were enabled. This was done to avoid missing a prepared
transaction that is not part of the initial snapshot. Now, such a missing PREPARE
is identified by a new LSN called snapshot_was_exported_at_lsn, which is stored
in the slot and snapbuild structures. Prepared transactions whose PREPARE is
prior to this LSN will be replayed on COMMIT PREPARED.
---
 contrib/test_decoding/expected/twophase.out   | 38 ++++++-------------
 .../expected/twophase_stream.out              | 28 ++------------
 src/backend/replication/logical/decode.c      | 11 +++++-
 src/backend/replication/logical/logical.c     | 13 ++++++-
 .../replication/logical/reorderbuffer.c       | 28 +++-----------
 src/backend/replication/logical/snapbuild.c   | 19 +++++++++-
 src/include/replication/reorderbuffer.h       |  1 +
 src/include/replication/slot.h                |  7 ++++
 src/include/replication/snapbuild.h           |  4 +-
 9 files changed, 71 insertions(+), 78 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index f9f6bedd1c..c51870f8dd 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#1';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
@@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#3';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
@@ -159,14 +152,10 @@ RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -189,13 +178,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd36..d54e640b40 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -60,32 +60,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..9f6e5d52ab 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -663,6 +663,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	XLogRecPtr	origin_lsn = InvalidXLogRecPtr;
 	TimestampTz commit_time = parsed->xact_time;
 	RepOriginId origin_id = XLogRecGetOrigin(buf->record);
+	XLogRecPtr  snapshot_was_exported_at_lsn = InvalidXLogRecPtr;
 	int			i;
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -715,7 +716,14 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 */
 	if (two_phase)
 	{
+		/*
+		 * Get the LSN at which the snapshot for this slot was exported.
+		 * ReorderBufferFinishPrepared will decide based on this if the
+		 * transaction should be replayed on COMMIT PREPARED.
+		 */
+		snapshot_was_exported_at_lsn = SnapBuildExportLSNAt(ctx->snapshot_builder);
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									snapshot_was_exported_at_lsn,
 									commit_time, origin_id, origin_lsn,
 									parsed->twophase_gid, true);
 	}
@@ -774,7 +782,6 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	/* We can't start streaming unless a consistent state is reached. */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
 	{
-		ReorderBufferSkipPrepare(ctx->reorder, xid);
 		return;
 	}
 
@@ -792,7 +799,6 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 */
 	if (DecodeTXNNeedSkip(ctx, buf, parsed->dbId, origin_id))
 	{
-		ReorderBufferSkipPrepare(ctx->reorder, xid);
 		ReorderBufferInvalidate(ctx->reorder, xid, buf->origptr);
 		return;
 	}
@@ -854,6 +860,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									InvalidXLogRecPtr,
 									abort_time, origin_id, origin_lsn,
 									parsed->twophase_gid, false);
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index baeb45ff43..5634635e94 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -207,7 +207,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder = ReorderBufferAllocate();
 	ctx->snapshot_builder =
 		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn,
-								need_full_snapshot);
+								need_full_snapshot, slot->data.snapshot_was_exported_at_lsn);
 
 	ctx->reorder->private_data = ctx;
 
@@ -590,6 +590,17 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
+
+	/*
+	 * The snapshot_was_exported_at_lsn point is required in two-phase
+	 * commits to handle prepared transactions that were not part of this
+	 * snapshot at export time. PREPAREs prior to this point need special
+	 * handling if two-phase commits are enabled.
+	 * The snapshot_was_exported_at_lsn is only updated once when
+	 * the slot is created and is not modified on restarts unlike the
+	 * confirmed_flush point.
+	 */
+	slot->data.snapshot_was_exported_at_lsn = ctx->reader->EndRecPtr;
 	SpinLockRelease(&slot->mutex);
 }
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c3b963211e..5a3c986c85 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2623,21 +2623,6 @@ ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
 	return true;
 }
 
-/* Remember that we have skipped prepare */
-void
-ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid)
-{
-	ReorderBufferTXN *txn;
-
-	txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
-
-	/* unknown transaction, nothing to do */
-	if (txn == NULL)
-		return;
-
-	txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
-}
-
 /*
  * Prepare a two-phase transaction.
  *
@@ -2672,6 +2657,7 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
 void
 ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 							XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+							XLogRecPtr snapshot_was_exported_at_lsn,
 							TimestampTz commit_time, RepOriginId origin_id,
 							XLogRecPtr origin_lsn, char *gid, bool is_commit)
 {
@@ -2696,14 +2682,12 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	txn->gid = pstrdup(gid);
 
 	/*
-	 * It is possible that this transaction is not decoded at prepare time
-	 * either because by that time we didn't have a consistent snapshot or it
-	 * was decoded earlier but we have restarted. We can't distinguish between
-	 * those two cases so we send the prepare in both the cases and let
-	 * downstream decide whether to process or skip it. We don't need to
-	 * decode the xact for aborts if it is not done already.
+	 * It is possible that this transaction was not decoded at prepare time
+	 * because by that time we didn't have a consistent snapshot. In that
+	 * case, we need to replay the prepared transaction here because
+	 * downstream would not have seen this transaction yet.
 	 */
-	if (!rbtxn_prepared(txn) && is_commit)
+	if ((txn->final_lsn < snapshot_was_exported_at_lsn) && is_commit)
 	{
 		txn->txn_flags |= RBTXN_PREPARE;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e11788795f..7622b1dba1 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -164,6 +164,12 @@ struct SnapBuild
 	 */
 	XLogRecPtr	start_decoding_at;
 
+	/*
+	 * In two-phase commits, if the PREPARE is prior to this LSN, then the
+	 * whole transaction needs to be replayed at COMMIT PREPARED.
+	 */
+	XLogRecPtr  snapshot_was_exported_at_lsn;
+
 	/*
 	 * Don't start decoding WAL until the "xl_running_xacts" information
 	 * indicates there are no running xids with an xid smaller than this.
@@ -269,7 +275,8 @@ SnapBuild *
 AllocateSnapshotBuilder(ReorderBuffer *reorder,
 						TransactionId xmin_horizon,
 						XLogRecPtr start_lsn,
-						bool need_full_snapshot)
+						bool need_full_snapshot,
+						XLogRecPtr snapshot_was_exported_at_lsn)
 {
 	MemoryContext context;
 	MemoryContext oldcontext;
@@ -297,6 +304,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->start_decoding_at = start_lsn;
 	builder->building_full_snapshot = need_full_snapshot;
+	builder->snapshot_was_exported_at_lsn = snapshot_was_exported_at_lsn;
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -356,6 +364,15 @@ SnapBuildCurrentState(SnapBuild *builder)
 	return builder->state;
 }
 
+/*
+ * Return the LSN at which the snapshot was exported
+ */
+XLogRecPtr
+SnapBuildExportLSNAt(SnapBuild *builder)
+{
+	return builder->snapshot_was_exported_at_lsn;
+}
+
 /*
  * Should the contents of transaction ending at 'ptr' be decoded?
  */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bab31bf7af..1dbb50eb53 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -643,6 +643,7 @@ void		ReorderBufferCommit(ReorderBuffer *, TransactionId,
 								TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 										XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+										XLogRecPtr snapshot_consistency_lsn,
 										TimestampTz commit_time,
 										RepOriginId origin_id, XLogRecPtr origin_lsn,
 										char *gid, bool is_commit);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 38a9a0b3fc..ad8fb371a6 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -91,6 +91,13 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	confirmed_flush;
 
+	/*
+	 * LSN at which this slot found consistent point and snapshot exported.
+	 * This is required for two-phase transactions to decide if the whole
+	 * transaction should be replayed at COMMIT PREPARED.
+	 */
+	XLogRecPtr  snapshot_was_exported_at_lsn;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index d9f187a58e..0693115005 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -61,7 +61,8 @@ extern void CheckPointSnapBuild(void);
 
 extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
 										  TransactionId xmin_horizon, XLogRecPtr start_lsn,
-										  bool need_full_snapshot);
+										  bool need_full_snapshot,
+										  XLogRecPtr snapshot_was_exported_at_lsn);
 extern void FreeSnapshotBuilder(SnapBuild *cache);
 
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
@@ -75,6 +76,7 @@ extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
 											TransactionId xid);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+extern XLogRecPtr SnapBuildExportLSNAt(SnapBuild *builder);
 
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 							   TransactionId xid, int nsubxacts,
-- 
2.28.0.windows.1

v3-0002-Fixed-review-comments.patch (application/octet-stream)
From 7e69b0f926743a3b15233b6c9c5b3436b927cb1d Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Thu, 25 Feb 2021 16:46:39 +0530
Subject: [PATCH v3 2/2] Fixed review comments.

---
 src/backend/replication/logical/decode.c      | 11 ++------
 src/backend/replication/logical/logical.c     | 14 ++--------
 .../replication/logical/reorderbuffer.c       | 28 +++++++++++++++----
 src/backend/replication/logical/snapbuild.c   | 18 +++++++-----
 src/include/replication/slot.h                |  7 ++---
 src/include/replication/snapbuild.h           |  4 +--
 6 files changed, 43 insertions(+), 39 deletions(-)

diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 9f6e5d52ab..7f83bbb6a3 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -663,7 +663,6 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	XLogRecPtr	origin_lsn = InvalidXLogRecPtr;
 	TimestampTz commit_time = parsed->xact_time;
 	RepOriginId origin_id = XLogRecGetOrigin(buf->record);
-	XLogRecPtr  snapshot_was_exported_at_lsn = InvalidXLogRecPtr;
 	int			i;
 
 	if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
@@ -716,14 +715,8 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 */
 	if (two_phase)
 	{
-		/*
-		 * Get the LSN at which the snapshot for this slot was exported.
-		 * ReorderBufferFinishPrepared will decide based on this if the
-		 * transaction should be replayed on COMMIT PREPARED.
-		 */
-		snapshot_was_exported_at_lsn = SnapBuildExportLSNAt(ctx->snapshot_builder);
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
-									snapshot_was_exported_at_lsn,
+									SnapBuildExportAt(ctx->snapshot_builder),
 									commit_time, origin_id, origin_lsn,
 									parsed->twophase_gid, true);
 	}
@@ -782,6 +775,7 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	/* We can't start streaming unless a consistent state is reached. */
 	if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
 	{
+		ReorderBufferSkipPrepare(ctx->reorder, xid);
 		return;
 	}
 
@@ -799,6 +793,7 @@ DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	 */
 	if (DecodeTXNNeedSkip(ctx, buf, parsed->dbId, origin_id))
 	{
+		ReorderBufferSkipPrepare(ctx->reorder, xid);
 		ReorderBufferInvalidate(ctx->reorder, xid, buf->origptr);
 		return;
 	}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 5634635e94..2ea82c6513 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -207,7 +207,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder = ReorderBufferAllocate();
 	ctx->snapshot_builder =
 		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn,
-								need_full_snapshot, slot->data.snapshot_was_exported_at_lsn);
+								need_full_snapshot, slot->data.snapshot_was_exported_at);
 
 	ctx->reorder->private_data = ctx;
 
@@ -590,17 +590,7 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
-
-	/*
-	 * The snapshot_was_exported_at_lsn point is required in two-phase
-	 * commits to handle prepared transactions that were not part of this
-	 * snapshot at export time. PREPAREs prior to this point need special
-	 * handling if two-phase commits are enabled.
-	 * The snapshot_was_exported_at_lsn is only updated once when
-	 * the slot is created and is not modified on restarts unlike the
-	 * confirmed_flush point.
-	 */
-	slot->data.snapshot_was_exported_at_lsn = ctx->reader->EndRecPtr;
+	slot->data.snapshot_was_exported_at = ctx->reader->EndRecPtr;
 	SpinLockRelease(&slot->mutex);
 }
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5a3c986c85..8aefc7eaa7 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2623,6 +2623,21 @@ ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
 	return true;
 }
 
+/* Remember that we have skipped prepare */
+void
+ReorderBufferSkipPrepare(ReorderBuffer* rb, TransactionId xid)
+{
+	ReorderBufferTXN* txn;
+
+	txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
+
+	/* unknown transaction, nothing to do */
+	if (txn == NULL)
+		return;
+
+	txn->txn_flags |= RBTXN_SKIPPED_PREPARE;
+}
+
 /*
  * Prepare a two-phase transaction.
  *
@@ -2657,7 +2672,7 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
 void
 ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 							XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
-							XLogRecPtr snapshot_was_exported_at_lsn,
+							XLogRecPtr snapshot_was_exported_at,
 							TimestampTz commit_time, RepOriginId origin_id,
 							XLogRecPtr origin_lsn, char *gid, bool is_commit)
 {
@@ -2682,12 +2697,13 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	txn->gid = pstrdup(gid);
 
 	/*
-	 * It is possible that this transaction was not decoded at prepare time
-	 * because by that time we didn't have a consistent snapshot.
-	 * In which case we need to replay the prepared transaction here because
-	 * downstream would not have seen this transaction yet.
+	 * It is possible that this transaction is not decoded at prepare time
+	 * either because by that time we didn't have a consistent snapshot or it
+	 * was decoded earlier but we have restarted. We only need to send the
+	 * prepare if it was not decoded earlier. We don't need to decode the xact
+	 * for aborts if it is not done already.
 	 */
-	if ((txn->final_lsn < snapshot_was_exported_at_lsn) && is_commit)
+	if ((txn->final_lsn < snapshot_was_exported_at) && is_commit)
 	{
 		txn->txn_flags |= RBTXN_PREPARE;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 7622b1dba1..fe486c4c83 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -165,10 +165,14 @@ struct SnapBuild
 	XLogRecPtr	start_decoding_at;
 
 	/*
-	 * In two-phase commits, if the PREPARE is prior to this LSN, then the
-	 * whole transaction needs to be replayed at COMMIT PREPARED.
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported snapshot for initial copy.
+	 *
+	 * Prepared transactions that are not covered by the initial snapshot need
+	 * to be sent later along with COMMIT PREPARED; their PREPARE must lie
+	 * before this point.
 	 */
-	XLogRecPtr  snapshot_was_exported_at_lsn;
+	XLogRecPtr  snapshot_was_exported_at;
 
 	/*
 	 * Don't start decoding WAL until the "xl_running_xacts" information
@@ -276,7 +280,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 						TransactionId xmin_horizon,
 						XLogRecPtr start_lsn,
 						bool need_full_snapshot,
-						XLogRecPtr snapshot_was_exported_at_lsn)
+						XLogRecPtr snapshot_was_exported_at)
 {
 	MemoryContext context;
 	MemoryContext oldcontext;
@@ -304,7 +308,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->start_decoding_at = start_lsn;
 	builder->building_full_snapshot = need_full_snapshot;
-	builder->snapshot_was_exported_at_lsn = snapshot_was_exported_at_lsn;
+	builder->snapshot_was_exported_at = snapshot_was_exported_at;
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -368,9 +372,9 @@ SnapBuildCurrentState(SnapBuild *builder)
  * Return the LSN at which the snapshot was exported
  */
 XLogRecPtr
-SnapBuildExportLSNAt(SnapBuild *builder)
+SnapBuildExportAt(SnapBuild *builder)
 {
-	return builder->snapshot_was_exported_at_lsn;
+	return builder->snapshot_was_exported_at;
 }
 
 /*
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index ad8fb371a6..5764293555 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -92,11 +92,10 @@ typedef struct ReplicationSlotPersistentData
 	XLogRecPtr	confirmed_flush;
 
 	/*
-	 * LSN at which this slot found consistent point and snapshot exported.
-	 * This is required for two-phase transactions to decide if the whole
-	 * transaction should be replayed at COMMIT PREPARED.
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported snapshot for initial copy.
 	 */
-	XLogRecPtr  snapshot_was_exported_at_lsn;
+	XLogRecPtr  snapshot_was_exported_at;
 
 	/* plugin name */
 	NameData	plugin;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 0693115005..f15ac6639f 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -62,7 +62,7 @@ extern void CheckPointSnapBuild(void);
 extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
 										  TransactionId xmin_horizon, XLogRecPtr start_lsn,
 										  bool need_full_snapshot,
-										  XLogRecPtr snapshot_was_exported_at_lsn);
+										  XLogRecPtr snapshot_was_exported_at);
 extern void FreeSnapshotBuilder(SnapBuild *cache);
 
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
@@ -76,7 +76,7 @@ extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
 											TransactionId xid);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
-extern XLogRecPtr SnapBuildExportLSNAt(SnapBuild *builder);
+extern XLogRecPtr SnapBuildExportAt(SnapBuild *builder);
 
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 							   TransactionId xid, int nsubxacts,
-- 
2.28.0.windows.1

#52Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#51)
1 attachment(s)
Re: repeated decoding of prepared transactions

On Thu, Feb 25, 2021 at 10:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Few comments on the first patch:
> 1. We can't remove ReorderBufferSkipPrepare because we rely on that in
> SnapBuildDistributeNewCatalogSnapshot.
> 2. I have changed the name of the variable from
> snapshot_was_exported_at_lsn to snapshot_was_exported_at but I am
> still not very sure about this naming because there are times when we
> don't export snapshot and we still set this like when creating slots
> with CRS_NOEXPORT_SNAPSHOT or when creating via SQL APIs. The other
> name that comes to mind is initial_consistency_at, what do you think?
> 3. Changed comments at various places.
>
> Please find the above changes as a separate patch, if you like you can
> include these in the main patch.
>
> Apart from the above, I think the comments related to docs in my
> previous email [1] are still valid, can you please take care of those.
>
> [1] - /messages/by-id/CAA4eK1Kr34_TiREr57Wpd=3=03x=1n55DAjwJPGpHAEc4dWfUQ@mail.gmail.com

I've added Amit's changes-patch as well as addressed comments related
to docs in the attached patch.
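
For anyone skimming the patch, the rule it applies at COMMIT PREPARED can be sketched roughly as follows. This is a standalone illustration only, not the patch code: the XLogRecPtr typedef is a simplified stand-in for the PostgreSQL type, replay_at_commit_prepared() is a hypothetical helper name, and prepare_lsn plays the role of txn->final_lsn in ReorderBufferFinishPrepared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns true when the whole transaction has to be sent together with
 * COMMIT PREPARED: the PREPARE lies before the point at which the slot
 * first became consistent, so downstream cannot have seen it.  A PREPARE
 * at or after that point was already decoded once and is not repeated,
 * even after a restart.  Aborts (is_commit == false) never need a replay.
 */
static bool
replay_at_commit_prepared(XLogRecPtr prepare_lsn,
						  XLogRecPtr snapshot_was_exported_at,
						  bool is_commit)
{
	return is_commit && prepare_lsn < snapshot_was_exported_at;
}
```

Because the comparison uses an LSN stored in the slot rather than an in-memory flag, the outcome is the same before and after a decoder restart, which is what removes the repeated decoding.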

On Thu, Feb 25, 2021 at 10:34 PM vignesh C <vignesh21@gmail.com> wrote:

> One observation: while verifying the patch, I noticed that most of the
> ReplicationSlotPersistentData structure members are displayed in
> pg_replication_slots, but I did not see snapshot_was_exported_at_lsn
> being displayed. Is this intentional? If not, we can
> include snapshot_was_exported_at_lsn in pg_replication_slots.

I've added the snapshot_was_exported_at member to pg_replication_slots as well.
Do have a look and let me know if there are any comments.
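
As a rough illustration of how the new column behaves in the view: an unset LSN is reported as NULL, otherwise it is shown in the usual hi/lo hexadecimal form. The sketch below is self-contained and is not the slotfuncs.c code; the typedef and macro are simplified stand-ins, and lsn_column_text() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the PostgreSQL definitions. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns NULL for an unset LSN (the column would be SQL NULL, as for
 * physical slots); otherwise writes the LSN into buf as "hi/lo" in
 * uppercase hex and returns buf.
 */
static const char *
lsn_column_text(XLogRecPtr lsn, char *buf, size_t buflen)
{
	if (lsn == InvalidXLogRecPtr)
		return NULL;

	snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
			 (uint32_t) (lsn >> 32), (uint32_t) lsn);
	return buf;
}
```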

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v4-0001-Avoid-repeated-decoding-of-prepared-transactions.patch (application/octet-stream)
From 17ff5462c3c4137796e044e95e609cbe77df673c Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 26 Feb 2021 02:58:49 -0500
Subject: [PATCH v4] Avoid repeated decoding of prepared transactions.

Prepared transactions were decoded again after a restart on COMMIT PREPARED
when two-phase commits were enabled. This was done to avoid missing a prepared
transaction that is not part of the initial snapshot. Now, such a missing PREPARE
is identified by a new LSN called snapshot_was_exported_at, which is stored in
the slot and snapbuild structures. Prepared transactions prior to this LSN
will be replayed on COMMIT PREPARED.
---
 contrib/test_decoding/expected/twophase.out        | 38 +++++++---------------
 contrib/test_decoding/expected/twophase_stream.out | 28 ++--------------
 doc/src/sgml/catalogs.sgml                         | 11 +++++++
 doc/src/sgml/logicaldecoding.sgml                  |  9 ++---
 src/backend/catalog/system_views.sql               |  1 +
 src/backend/replication/logical/decode.c           |  2 ++
 src/backend/replication/logical/logical.c          |  3 +-
 src/backend/replication/logical/reorderbuffer.c    | 14 ++++----
 src/backend/replication/logical/snapbuild.c        | 23 ++++++++++++-
 src/backend/replication/slotfuncs.c                |  7 +++-
 src/include/catalog/pg_proc.dat                    |  6 ++--
 src/include/replication/reorderbuffer.h            |  1 +
 src/include/replication/slot.h                     |  6 ++++
 src/include/replication/snapbuild.h                |  4 ++-
 src/test/regress/expected/rules.out                |  3 +-
 15 files changed, 83 insertions(+), 73 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index f9f6bed..c51870f 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#1';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
@@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#3';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
@@ -159,14 +152,10 @@ RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -189,13 +178,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd3..d54e640 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -60,32 +60,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index db29905..366a971 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11479,6 +11479,17 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>snapshot_was_exported_at</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       The address (<literal>LSN</literal>) at which the logical
+       slot found a consistent point at the time of slot creation.
+       <literal>NULL</literal> for physical slots.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>wal_status</structfield> <type>text</type>
       </para>
       <para>
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 6455664..18d592d 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -191,9 +191,6 @@ postgres=# COMMIT PREPARED 'test_prepared1';
 postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
- 0/1689DC0 | 529 | BEGIN 529
- 0/1689DC0 | 529 | table public.data: INSERT: id[integer]:3 data[text]:'5'
- 0/1689FC0 | 529 | PREPARE TRANSACTION 'test_prepared1', txid 529
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
 (4 row)
 
@@ -822,10 +819,8 @@ typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx
       <parameter>gid</parameter> field, which is part of the
       <parameter>txn</parameter> parameter, can be used in this callback to
       check if the plugin has already received this <command>PREPARE</command>
-      in which case it can skip the remaining changes of the transaction.
-      This can only happen if the user restarts the decoding after receiving
-      the <command>PREPARE</command> for a transaction but before receiving
-      the <command>COMMIT PREPARED</command>, say because of some error.
+      in which case it can either error out or skip the remaining changes of 
+      the transaction.
       <programlisting>
        typedef void (*LogicalDecodeBeginPrepareCB) (struct LogicalDecodingContext *ctx,
                                                     ReorderBufferTXN *txn);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd..3f94398 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -893,6 +893,7 @@ CREATE VIEW pg_replication_slots AS
             L.catalog_xmin,
             L.restart_lsn,
             L.confirmed_flush_lsn,
+            L.snapshot_was_exported_at,
             L.wal_status,
             L.safe_wal_size
     FROM pg_get_replication_slots() AS L
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df0..7f83bbb 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -716,6 +716,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	if (two_phase)
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									SnapBuildExportAt(ctx->snapshot_builder),
 									commit_time, origin_id, origin_lsn,
 									parsed->twophase_gid, true);
 	}
@@ -854,6 +855,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									InvalidXLogRecPtr,
 									abort_time, origin_id, origin_lsn,
 									parsed->twophase_gid, false);
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index baeb45f..2ea82c6 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -207,7 +207,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder = ReorderBufferAllocate();
 	ctx->snapshot_builder =
 		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn,
-								need_full_snapshot);
+								need_full_snapshot, slot->data.snapshot_was_exported_at);
 
 	ctx->reorder->private_data = ctx;
 
@@ -590,6 +590,7 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
+	slot->data.snapshot_was_exported_at = ctx->reader->EndRecPtr;
 	SpinLockRelease(&slot->mutex);
 }
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c3b9632..8aefc7e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2625,9 +2625,9 @@ ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
 
 /* Remember that we have skipped prepare */
 void
-ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid)
+ReorderBufferSkipPrepare(ReorderBuffer* rb, TransactionId xid)
 {
-	ReorderBufferTXN *txn;
+	ReorderBufferTXN* txn;
 
 	txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
 
@@ -2672,6 +2672,7 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
 void
 ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 							XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+							XLogRecPtr snapshot_was_exported_at,
 							TimestampTz commit_time, RepOriginId origin_id,
 							XLogRecPtr origin_lsn, char *gid, bool is_commit)
 {
@@ -2698,12 +2699,11 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	/*
 	 * It is possible that this transaction is not decoded at prepare time
 	 * either because by that time we didn't have a consistent snapshot or it
-	 * was decoded earlier but we have restarted. We can't distinguish between
-	 * those two cases so we send the prepare in both the cases and let
-	 * downstream decide whether to process or skip it. We don't need to
-	 * decode the xact for aborts if it is not done already.
+	 * was decoded earlier but we have restarted. We only need to send the
+	 * prepare if it was not decoded earlier. We don't need to decode the xact
+	 * for aborts if it is not done already.
 	 */
-	if (!rbtxn_prepared(txn) && is_commit)
+	if ((txn->final_lsn < snapshot_was_exported_at) && is_commit)
 	{
 		txn->txn_flags |= RBTXN_PREPARE;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e117887..fe486c4 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -165,6 +165,16 @@ struct SnapBuild
 	XLogRecPtr	start_decoding_at;
 
 	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported snapshot for initial copy.
+	 *
+	 * Prepared transactions that are not covered by the initial snapshot need
+	 * to be sent later along with COMMIT PREPARED; their PREPARE must lie
+	 * before this point.
+	 */
+	XLogRecPtr  snapshot_was_exported_at;
+
+	/*
 	 * Don't start decoding WAL until the "xl_running_xacts" information
 	 * indicates there are no running xids with an xid smaller than this.
 	 */
@@ -269,7 +279,8 @@ SnapBuild *
 AllocateSnapshotBuilder(ReorderBuffer *reorder,
 						TransactionId xmin_horizon,
 						XLogRecPtr start_lsn,
-						bool need_full_snapshot)
+						bool need_full_snapshot,
+						XLogRecPtr snapshot_was_exported_at)
 {
 	MemoryContext context;
 	MemoryContext oldcontext;
@@ -297,6 +308,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->start_decoding_at = start_lsn;
 	builder->building_full_snapshot = need_full_snapshot;
+	builder->snapshot_was_exported_at = snapshot_was_exported_at;
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -357,6 +369,15 @@ SnapBuildCurrentState(SnapBuild *builder)
 }
 
 /*
+ * Return the LSN at which the snapshot was exported
+ */
+XLogRecPtr
+SnapBuildExportAt(SnapBuild *builder)
+{
+	return builder->snapshot_was_exported_at;
+}
+
+/*
  * Should the contents of transaction ending at 'ptr' be decoded?
  */
 bool
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d24bb5b..f5efdff 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -236,7 +236,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 13
+#define PG_GET_REPLICATION_SLOTS_COLS 14
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -344,6 +344,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 		else
 			nulls[i++] = true;
 
+		if (slot_contents.data.snapshot_was_exported_at != InvalidXLogRecPtr)
+			values[i++] = LSNGetDatum(slot_contents.data.snapshot_was_exported_at);
+		else
+			nulls[i++] = true;
+
 		/*
 		 * If invalidated_at is valid and restart_lsn is invalid, we know for
 		 * certain that the slot has been invalidated.  Otherwise, test
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1604412..e1e4f3e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10496,9 +10496,9 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,pg_lsn,text,int8}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,snapshot_was_exported_at,wal_status,safe_wal_size}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bab31bf..1dbb50e 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -643,6 +643,7 @@ void		ReorderBufferCommit(ReorderBuffer *, TransactionId,
 								TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 										XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+										XLogRecPtr snapshot_consistency_lsn,
 										TimestampTz commit_time,
 										RepOriginId origin_id, XLogRecPtr origin_lsn,
 										char *gid, bool is_commit);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 38a9a0b..5764293 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -91,6 +91,12 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	confirmed_flush;
 
+	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported snapshot for initial copy.
+	 */
+	XLogRecPtr  snapshot_was_exported_at;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index d9f187a..f15ac66 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -61,7 +61,8 @@ extern void CheckPointSnapBuild(void);
 
 extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
 										  TransactionId xmin_horizon, XLogRecPtr start_lsn,
-										  bool need_full_snapshot);
+										  bool need_full_snapshot,
+										  XLogRecPtr snapshot_was_exported_at);
 extern void FreeSnapshotBuilder(SnapBuild *cache);
 
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
@@ -75,6 +76,7 @@ extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
 											TransactionId xid);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+extern XLogRecPtr SnapBuildExportAt(SnapBuild *builder);
 
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 							   TransactionId xid, int nsubxacts,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 10a1f34..10647d4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1476,9 +1476,10 @@ pg_replication_slots| SELECT l.slot_name,
     l.catalog_xmin,
     l.restart_lsn,
     l.confirmed_flush_lsn,
+    l.snapshot_was_exported_at,
     l.wal_status,
     l.safe_wal_size
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size)
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, snapshot_was_exported_at, wal_status, safe_wal_size)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
1.8.3.1

#53Ajin Cherian
itsajin@gmail.com
In reply to: Ajin Cherian (#52)
2 attachment(s)
Re: repeated decoding of prepared transactions

On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian <itsajin@gmail.com> wrote:

I've also added the snapshot_was_exported_at member to pg_replication_slots.
Do have a look and let me know if there are any comments.

Updated with both patches.
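
For readers following along, here is a minimal sketch of how the new column is populated, going by the slotfuncs.c hunk in the patch: an invalid LSN becomes NULL, anything else is shown as a pg_lsn. The typedef, the macro and the helper name exported_at_to_text are stand-ins assumed for this illustration; they are not the actual PostgreSQL definitions or part of the patch.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the PostgreSQL definitions; assumptions for this sketch. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical helper: render the snapshot_was_exported_at column into buf.
 * Returns false when the column would be NULL (no LSN recorded).
 */
static bool
exported_at_to_text(XLogRecPtr lsn, char *buf, size_t buflen)
{
	if (lsn == InvalidXLogRecPtr)
		return false;

	/* pg_lsn text form: high and low 32 bits in hex, separated by '/' */
	snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
			 (uint32_t) (lsn >> 32), (uint32_t) lsn);
	return true;
}
```

Called with a 32-byte buffer, a false return corresponds to the nulls[i++] = true branch in pg_get_replication_slots(), and a true return to the LSNGetDatum() branch.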

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v4-0002-Add-option-to-enable-two-phase-commits-in-pg_crea.patch (application/octet-stream)
From 3c08b2ecda5470d24f595ec920226843563a5a95 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 26 Feb 2021 05:38:09 -0500
Subject: [PATCH v4] Add option to enable two-phase commits in
 pg_create_logical_replication_slot

This commit changes the way two-phase commits are enabled in the test_decoding plugin.
Two-phase commits can now be enabled only when creating the slot with
pg_create_logical_replication_slot(); they can no longer be set through pg_logical_slot_get_changes().
To support this, pg_create_logical_replication_slot() takes one more
optional boolean parameter, 'twophase', which enables two-phase commits when set to TRUE.
The parameter defaults to FALSE.
---
 contrib/test_decoding/expected/twophase.out        | 34 +++++++++++-----------
 .../test_decoding/expected/twophase_snapshot.out   |  6 ++--
 contrib/test_decoding/expected/twophase_stream.out | 10 +++----
 contrib/test_decoding/specs/twophase_snapshot.spec |  4 +--
 contrib/test_decoding/sql/twophase.sql             | 34 +++++++++++-----------
 contrib/test_decoding/sql/twophase_stream.sql      | 10 +++----
 contrib/test_decoding/test_decoding.c              | 18 ++++--------
 doc/src/sgml/logicaldecoding.sgml                  | 19 ++++++------
 src/backend/catalog/system_views.sql               |  1 +
 src/backend/replication/logical/logicalfuncs.c     |  8 +++++
 src/backend/replication/repl_gram.y                | 14 +++++++--
 src/backend/replication/repl_scanner.l             |  1 +
 src/backend/replication/slot.c                     |  3 +-
 src/backend/replication/slotfuncs.c                | 10 +++++--
 src/backend/replication/walsender.c                |  6 ++--
 src/include/catalog/pg_proc.dat                    |  8 ++---
 src/include/nodes/replnodes.h                      |  1 +
 src/include/replication/slot.h                     |  7 ++++-
 18 files changed, 110 insertions(+), 84 deletions(-)
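
The behaviour described in the commit message can be sketched as a small stand-alone model: the plugin's startup callback no longer opts in to two-phase decoding, and the effective setting comes from a flag stored with the slot. All type and function names below are hypothetical, and the extra check that the plugin provides two-phase callbacks is an assumption of this sketch, not something shown in the patch.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-ins for the slot and the decoding context. */
typedef struct SketchSlot
{
	bool		twophase;		/* fixed when the slot is created */
} SketchSlot;

typedef struct SketchContext
{
	bool		twophase;		/* effective setting for this decoding session */
} SketchContext;

/* Plugin startup: like the patched test_decoding, it never opts in itself. */
static void
sketch_plugin_startup(SketchContext *ctx)
{
	ctx->twophase = false;
}

/*
 * Assumed core step after plugin startup: take the setting from the slot,
 * and (an assumption here) only if the plugin can handle two-phase output.
 */
static void
sketch_core_apply_slot(SketchContext *ctx, const SketchSlot *slot,
					   bool plugin_has_twophase_callbacks)
{
	ctx->twophase = slot->twophase && plugin_has_twophase_callbacks;
}

/* Returns whether a decoding session on this slot decodes at PREPARE time. */
static bool
sketch_start_decoding(const SketchSlot *slot,
					  bool plugin_has_twophase_callbacks)
{
	SketchContext ctx;

	sketch_plugin_startup(&ctx);
	sketch_core_apply_slot(&ctx, slot, plugin_has_twophase_callbacks);
	return ctx.twophase;
}
```

In this model nothing passed to an individual decoding call can change the outcome, which is the point of moving the switch from pg_logical_slot_get_changes() to slot creation.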

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index c51870f..8d61107 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -15,14 +15,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -32,7 +32,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (4 rows)
 
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#1'
@@ -42,7 +42,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -51,7 +51,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                 data                 
 -------------------------------------
  ROLLBACK PREPARED 'test_prepared#2'
@@ -74,7 +74,7 @@ WHERE locktype = 'relation'
 (2 rows)
 
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                   data                                   
 -------------------------------------------------------------------------
  BEGIN
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -98,7 +98,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#3'
@@ -107,7 +107,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                
 --------------------------------------------------------------------
  BEGIN
@@ -139,7 +139,7 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                    data                                    
 ---------------------------------------------------------------------------
  BEGIN
@@ -151,7 +151,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                  data                 
 --------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
@@ -167,7 +167,7 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                             data                            
 ------------------------------------------------------------
  BEGIN
@@ -177,7 +177,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                    data                    
 -------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
@@ -188,14 +188,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                 
 ---------------------------------------------------------------------
  BEGIN
@@ -208,7 +208,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
diff --git a/contrib/test_decoding/expected/twophase_snapshot.out b/contrib/test_decoding/expected/twophase_snapshot.out
index 14d9387..0e8e1f5 100644
--- a/contrib/test_decoding/expected/twophase_snapshot.out
+++ b/contrib/test_decoding/expected/twophase_snapshot.out
@@ -6,7 +6,7 @@ step s2txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
 
 f              
-step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); <waiting ...>
+step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true); <waiting ...>
 step s3b: BEGIN;
 step s3txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
@@ -22,14 +22,14 @@ step s1init: <... completed>
 
 init           
 step s1insert: INSERT INTO do_write DEFAULT VALUES;
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
 table public.do_write: INSERT: id[integer]:2
 COMMIT         
 step s2cp: COMMIT PREPARED 'test1';
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index d54e640..b08bb0e 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -1,6 +1,6 @@
 -- Test streaming of two-phase commits
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -28,7 +28,7 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -59,7 +59,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
           data           
 -------------------------
  COMMIT PREPARED 'test1'
@@ -81,7 +81,7 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                             data                             
 -------------------------------------------------------------
  BEGIN
diff --git a/contrib/test_decoding/specs/twophase_snapshot.spec b/contrib/test_decoding/specs/twophase_snapshot.spec
index 3e70040..e8d9567 100644
--- a/contrib/test_decoding/specs/twophase_snapshot.spec
+++ b/contrib/test_decoding/specs/twophase_snapshot.spec
@@ -15,8 +15,8 @@ teardown
 session "s1"
 setup { SET synchronous_commit=on; }
 
-step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');}
-step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');}
+step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true);}
+step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');}
 step "s1insert" { INSERT INTO do_write DEFAULT VALUES; }
 
 session "s2"
diff --git a/contrib/test_decoding/sql/twophase.sql b/contrib/test_decoding/sql/twophase.sql
index 894e4f5..17ada0f 100644
--- a/contrib/test_decoding/sql/twophase.sql
+++ b/contrib/test_decoding/sql/twophase.sql
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE test_prepared1(id integer primary key);
 CREATE TABLE test_prepared2(id integer primary key);
@@ -12,20 +12,20 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
 BEGIN;
@@ -38,7 +38,7 @@ FROM pg_locks
 WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that we decode correctly while an uncommitted prepared xact
 -- with ddl exists.
@@ -47,14 +47,14 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Check 'CLUSTER' (as operation that hold exclusive lock) doesn't block
 -- logical decoding.
@@ -71,11 +71,11 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -87,26 +87,26 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test 8:
 -- cleanup and make sure results are also empty
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/sql/twophase_stream.sql b/contrib/test_decoding/sql/twophase_stream.sql
index e9dd44f..646076d 100644
--- a/contrib/test_decoding/sql/twophase_stream.sql
+++ b/contrib/test_decoding/sql/twophase_stream.sql
@@ -1,7 +1,7 @@
 -- Test streaming of two-phase commits
 
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE stream_test(data text);
 
@@ -18,11 +18,11 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
@@ -35,11 +35,11 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 DROP TABLE stream_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 929255e..28c876d 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -164,7 +164,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	ListCell   *option;
 	TestDecodingData *data;
 	bool		enable_streaming = false;
-	bool		enable_twophase = false;
 
 	data = palloc0(sizeof(TestDecodingData));
 	data->context = AllocSetContextCreate(ctx->context,
@@ -265,16 +264,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
 								strVal(elem->arg), elem->defname)));
 		}
-		else if (strcmp(elem->defname, "two-phase-commit") == 0)
-		{
-			if (elem->arg == NULL)
-				continue;
-			else if (!parse_bool(strVal(elem->arg), &enable_twophase))
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
-								strVal(elem->arg), elem->defname)));
-		}
 		else
 		{
 			ereport(ERROR,
@@ -286,7 +275,12 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	}
 
 	ctx->streaming &= enable_streaming;
-	ctx->twophase &= enable_twophase;
+
+	/*
+	 * Disable two-phase here; the core sets it if it was enabled
+	 * while creating the slot.
+	 */
+	ctx->twophase = false;
 }
 
 /* cleanup this plugin's resources */
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 18d592d..5839d96 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -55,7 +55,7 @@
 
 <programlisting>
 postgres=# -- Create a slot named 'regression_slot' using the output plugin 'test_decoding'
-postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
     slot_name    |    lsn
 -----------------+-----------
  regression_slot | 0/16B1970
@@ -169,17 +169,18 @@ $ pg_recvlogical -d postgres --slot=test --drop-slot
   <para>
   The following example shows SQL interface that can be used to decode prepared
   transactions. Before you use two-phase commit commands, you must set
-  <varname>max_prepared_transactions</varname> to at least 1. You must also set
-  the option 'two-phase-commit' to 1 while calling
-  <function>pg_logical_slot_get_changes</function>. Note that we will stream
-  the entire transaction after the commit if it is not already decoded.
+  <varname>max_prepared_transactions</varname> to at least 1. You must also
+  set the two-phase parameter to 'true' when creating the slot using
+  <function>pg_create_logical_replication_slot</function>.
+  Note that we will stream the entire transaction after the commit if it
+  is not already decoded.
   </para>
 <programlisting>
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('5');
 postgres=*# PREPARE TRANSACTION 'test_prepared1';
 
-postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -188,7 +189,7 @@ postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# COMMIT PREPARED 'test_prepared1';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
@@ -198,7 +199,7 @@ postgres=#-- you can also rollback a prepared transaction
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('6');
 postgres=*# PREPARE TRANSACTION 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/168A180 | 530 | BEGIN 530
@@ -207,7 +208,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# ROLLBACK PREPARED 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                     data                     
 -----------+-----+----------------------------------------------
  0/168A4B8 | 530 | ROLLBACK PREPARED 'test_prepared2', txid 530
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3f94398..1e61274 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1319,6 +1319,7 @@ AS 'pg_create_physical_replication_slot';
 CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
     IN slot_name name, IN plugin name,
     IN temporary boolean DEFAULT false,
+    IN twophase boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)
 RETURNS RECORD
 LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index f7e0558..4a919d1 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -239,6 +239,14 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 									LogicalOutputPrepareWrite,
 									LogicalOutputWrite, NULL);
 
+		/*
+		 * Propagate the slot's create-time twophase setting to the context.
+		 */
+		if (MyReplicationSlot->data.twophase)
+		{
+			ctx->twophase = true;
+		}
+
 		/*
 		 * After the sanity checks in CreateDecodingContext, make sure the
 		 * restart_lsn is valid.  Avoid "cannot get changes" wording in this
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index eb283a8..aeec791 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -84,6 +84,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_SLOT
 %token K_RESERVE_WAL
 %token K_TEMPORARY
+%token K_TWOPHASE
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
@@ -102,6 +103,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	plugin_opt_arg
 %type <str>		opt_slot var_name
 %type <boolval>	opt_temporary
+%type <boolval>	opt_twophase
 %type <list>	create_slot_opt_list
 %type <defelt>	create_slot_opt
 
@@ -242,15 +244,16 @@ create_replication_slot:
 					$$ = (Node *) cmd;
 				}
 			/* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
-			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary opt_twophase K_LOGICAL IDENT create_slot_opt_list
 				{
 					CreateReplicationSlotCmd *cmd;
 					cmd = makeNode(CreateReplicationSlotCmd);
 					cmd->kind = REPLICATION_KIND_LOGICAL;
 					cmd->slotname = $2;
 					cmd->temporary = $3;
-					cmd->plugin = $5;
-					cmd->options = $6;
+					cmd->twophase = $4;
+					cmd->plugin = $6;
+					cmd->options = $7;
 					$$ = (Node *) cmd;
 				}
 			;
@@ -365,6 +368,11 @@ opt_temporary:
 			| /* EMPTY */					{ $$ = false; }
 			;
 
+opt_twophase:
+			K_TWOPHASE						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+			;
+
 opt_slot:
 			K_SLOT IDENT
 				{ $$ = $2; }
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index dcc3c3f..3032c28 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -103,6 +103,7 @@ RESERVE_WAL			{ return K_RESERVE_WAL; }
 LOGICAL				{ return K_LOGICAL; }
 SLOT				{ return K_SLOT; }
 TEMPORARY			{ return K_TEMPORARY; }
+TWOPHASE			{ return K_TWOPHASE; }
 EXPORT_SNAPSHOT		{ return K_EXPORT_SNAPSHOT; }
 NOEXPORT_SNAPSHOT	{ return K_NOEXPORT_SNAPSHOT; }
 USE_SNAPSHOT		{ return K_USE_SNAPSHOT; }
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fb4af2e..38c385b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -219,7 +219,7 @@ ReplicationSlotValidateName(const char *name, int elevel)
  */
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
-					  ReplicationSlotPersistency persistency)
+					  ReplicationSlotPersistency persistency, bool twophase)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -277,6 +277,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	namestrcpy(&slot->data.name, name);
 	slot->data.database = db_specific ? MyDatabaseId : InvalidOid;
 	slot->data.persistency = persistency;
+	slot->data.twophase = twophase;
 
 	/* and then data only present in shared memory */
 	slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index f5efdff..974f3af 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -50,7 +50,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
 
 	/* acquire replication slot, this will check for conflicting names */
 	ReplicationSlotCreate(name, false,
-						  temporary ? RS_TEMPORARY : RS_PERSISTENT);
+						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
 
 	if (immediately_reserve)
 	{
@@ -124,7 +124,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  */
 static void
 create_logical_replication_slot(char *name, char *plugin,
-								bool temporary, XLogRecPtr restart_lsn,
+								bool temporary, bool twophase,
+								XLogRecPtr restart_lsn,
 								bool find_startpoint)
 {
 	LogicalDecodingContext *ctx = NULL;
@@ -140,7 +141,7 @@ create_logical_replication_slot(char *name, char *plugin,
 	 * error as well.
 	 */
 	ReplicationSlotCreate(name, true,
-						  temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, twophase);
 
 	/*
 	 * Create logical decoding context to find start point or, if we don't
@@ -177,6 +178,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	Name		name = PG_GETARG_NAME(0);
 	Name		plugin = PG_GETARG_NAME(1);
 	bool		temporary = PG_GETARG_BOOL(2);
+	bool		twophase = PG_GETARG_BOOL(3);
 	Datum		result;
 	TupleDesc	tupdesc;
 	HeapTuple	tuple;
@@ -193,6 +195,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	create_logical_replication_slot(NameStr(*name),
 									NameStr(*plugin),
 									temporary,
+									twophase,
 									InvalidXLogRecPtr,
 									true);
 
@@ -801,6 +804,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
 										temporary,
+										false,
 										src_restart_lsn,
 										false);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 8124454..9146e62 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -937,7 +937,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	if (cmd->kind == REPLICATION_KIND_PHYSICAL)
 	{
 		ReplicationSlotCreate(cmd->slotname, false,
-							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT);
+							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
+							  false);
 	}
 	else
 	{
@@ -951,7 +952,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		 * they get dropped on error as well.
 		 */
 		ReplicationSlotCreate(cmd->slotname, true,
-							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
+							  cmd->twophase);
 	}
 
 	if (cmd->kind == REPLICATION_KIND_LOGICAL)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e1e4f3e..187418a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10502,10 +10502,10 @@
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',
   prosrc => 'pg_create_logical_replication_slot' },
 { oid => '4222',
   descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a25..1a933e2 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -56,6 +56,7 @@ typedef struct CreateReplicationSlotCmd
 	ReplicationKind kind;
 	char	   *plugin;
 	bool		temporary;
+	bool		twophase;
 	List	   *options;
 } CreateReplicationSlotCmd;
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 5764293..c441af4 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -97,6 +97,11 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr  snapshot_was_exported_at;
 
+	/*
+	 * Is the slot two-phase enabled?
+	 */
+	bool        twophase;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
@@ -198,7 +203,7 @@ extern void ReplicationSlotsShmemInit(void);
 
 /* management of individual slots */
 extern void ReplicationSlotCreate(const char *name, bool db_specific,
-								  ReplicationSlotPersistency p);
+								  ReplicationSlotPersistency p, bool twophase);
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
-- 
1.8.3.1
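
For readers following the patch above, the interplay between the plugin's startup callback and the core can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: the names SlotData, DecodingContext, plugin_startup, core_apply_slot_setting and effective_twophase are hypothetical stand-ins, not the PostgreSQL definitions. Only the two assignments mirror the hunks in test_decoding.c and logicalfuncs.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the persistent slot data; only the new flag. */
typedef struct SlotData
{
	bool		twophase;
} SlotData;

/* Hypothetical stand-in for the decoding context; only the flag. */
typedef struct DecodingContext
{
	bool		twophase;
} DecodingContext;

/* Mirrors the test_decoding.c hunk: the plugin always clears the flag. */
static void
plugin_startup(DecodingContext *ctx)
{
	assert(ctx != NULL);
	ctx->twophase = false;
}

/* Mirrors the logicalfuncs.c hunk: the slot's create-time setting wins. */
static void
core_apply_slot_setting(DecodingContext *ctx, const SlotData *slot)
{
	assert(ctx != NULL && slot != NULL);
	if (slot->twophase)
		ctx->twophase = true;
}

/* Run both steps in order and report the value decoding would use. */
static bool
effective_twophase(bool slot_twophase)
{
	SlotData	slot = {slot_twophase};
	DecodingContext ctx = {true};	/* arbitrary value before startup */

	plugin_startup(&ctx);
	core_apply_slot_setting(&ctx, &slot);
	return ctx.twophase;
}
```

In this model the result depends only on the flag stored with the slot, which is the point of moving the setting from a per-call plugin option to slot creation time.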

v4-0001-Avoid-repeated-decoding-of-prepared-transactions.patch (application/octet-stream)
From 17ff5462c3c4137796e044e95e609cbe77df673c Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 26 Feb 2021 02:58:49 -0500
Subject: [PATCH v4] Avoid repeated decoding of prepared transactions.

Prepared transactions were decoded again after a restart on COMMIT PREPARED
when two-phase commits were enabled. This was done to avoid missing a prepared
transaction that is not part of the initial snapshot. Now, such a missing
PREPARE is identified using a new LSN, snapshot_was_exported_at, which is
stored in the slot and snapbuild structures. Prepared transactions that
precede this LSN will be replayed on COMMIT PREPARED.
---
 contrib/test_decoding/expected/twophase.out        | 38 +++++++---------------
 contrib/test_decoding/expected/twophase_stream.out | 28 ++--------------
 doc/src/sgml/catalogs.sgml                         | 11 +++++++
 doc/src/sgml/logicaldecoding.sgml                  |  9 ++---
 src/backend/catalog/system_views.sql               |  1 +
 src/backend/replication/logical/decode.c           |  2 ++
 src/backend/replication/logical/logical.c          |  3 +-
 src/backend/replication/logical/reorderbuffer.c    | 14 ++++----
 src/backend/replication/logical/snapbuild.c        | 23 ++++++++++++-
 src/backend/replication/slotfuncs.c                |  7 +++-
 src/include/catalog/pg_proc.dat                    |  6 ++--
 src/include/replication/reorderbuffer.h            |  1 +
 src/include/replication/slot.h                     |  6 ++++
 src/include/replication/snapbuild.h                |  4 ++-
 src/test/regress/expected/rules.out                |  3 +-
 15 files changed, 83 insertions(+), 73 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index f9f6bed..c51870f 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#1';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
@@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#3';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
@@ -159,14 +152,10 @@ RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -189,13 +178,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd3..d54e640 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -60,32 +60,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index db29905..366a971 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11479,6 +11479,17 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>snapshot_was_exported_at</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       The address (<literal>LSN</literal>) at which the logical
+       slot found a consistent point at the time of slot creation.
+       <literal>NULL</literal> for physical slots.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>wal_status</structfield> <type>text</type>
       </para>
       <para>
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 6455664..18d592d 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -191,9 +191,6 @@ postgres=# COMMIT PREPARED 'test_prepared1';
 postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
- 0/1689DC0 | 529 | BEGIN 529
- 0/1689DC0 | 529 | table public.data: INSERT: id[integer]:3 data[text]:'5'
- 0/1689FC0 | 529 | PREPARE TRANSACTION 'test_prepared1', txid 529
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
 (4 row)
 
@@ -822,10 +819,8 @@ typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx
       <parameter>gid</parameter> field, which is part of the
       <parameter>txn</parameter> parameter, can be used in this callback to
       check if the plugin has already received this <command>PREPARE</command>
-      in which case it can skip the remaining changes of the transaction.
-      This can only happen if the user restarts the decoding after receiving
-      the <command>PREPARE</command> for a transaction but before receiving
-      the <command>COMMIT PREPARED</command>, say because of some error.
+      in which case it can either error out or skip the remaining changes of 
+      the transaction.
       <programlisting>
        typedef void (*LogicalDecodeBeginPrepareCB) (struct LogicalDecodingContext *ctx,
                                                     ReorderBufferTXN *txn);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd..3f94398 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -893,6 +893,7 @@ CREATE VIEW pg_replication_slots AS
             L.catalog_xmin,
             L.restart_lsn,
             L.confirmed_flush_lsn,
+            L.snapshot_was_exported_at,
             L.wal_status,
             L.safe_wal_size
     FROM pg_get_replication_slots() AS L
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df0..7f83bbb 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -716,6 +716,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	if (two_phase)
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									SnapBuildExportAt(ctx->snapshot_builder),
 									commit_time, origin_id, origin_lsn,
 									parsed->twophase_gid, true);
 	}
@@ -854,6 +855,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									InvalidXLogRecPtr,
 									abort_time, origin_id, origin_lsn,
 									parsed->twophase_gid, false);
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index baeb45f..2ea82c6 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -207,7 +207,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder = ReorderBufferAllocate();
 	ctx->snapshot_builder =
 		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn,
-								need_full_snapshot);
+								need_full_snapshot, slot->data.snapshot_was_exported_at);
 
 	ctx->reorder->private_data = ctx;
 
@@ -590,6 +590,7 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
+	slot->data.snapshot_was_exported_at = ctx->reader->EndRecPtr;
 	SpinLockRelease(&slot->mutex);
 }
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c3b9632..8aefc7e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2625,9 +2625,9 @@ ReorderBufferRememberPrepareInfo(ReorderBuffer *rb, TransactionId xid,
 
 /* Remember that we have skipped prepare */
 void
-ReorderBufferSkipPrepare(ReorderBuffer *rb, TransactionId xid)
+ReorderBufferSkipPrepare(ReorderBuffer* rb, TransactionId xid)
 {
-	ReorderBufferTXN *txn;
+	ReorderBufferTXN* txn;
 
 	txn = ReorderBufferTXNByXid(rb, xid, false, NULL, InvalidXLogRecPtr, false);
 
@@ -2672,6 +2672,7 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
 void
 ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 							XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+							XLogRecPtr snapshot_was_exported_at,
 							TimestampTz commit_time, RepOriginId origin_id,
 							XLogRecPtr origin_lsn, char *gid, bool is_commit)
 {
@@ -2698,12 +2699,11 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	/*
 	 * It is possible that this transaction is not decoded at prepare time
 	 * either because by that time we didn't have a consistent snapshot or it
-	 * was decoded earlier but we have restarted. We can't distinguish between
-	 * those two cases so we send the prepare in both the cases and let
-	 * downstream decide whether to process or skip it. We don't need to
-	 * decode the xact for aborts if it is not done already.
+	 * was decoded earlier but we have restarted. We only need to send the
+	 * prepare if it was not decoded earlier. We don't need to decode the xact
+	 * for aborts if it is not done already.
 	 */
-	if (!rbtxn_prepared(txn) && is_commit)
+	if ((txn->final_lsn < snapshot_was_exported_at) && is_commit)
 	{
 		txn->txn_flags |= RBTXN_PREPARE;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e117887..fe486c4 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -165,6 +165,16 @@ struct SnapBuild
 	XLogRecPtr	start_decoding_at;
 
 	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported snapshot for initial copy.
+	 *
+	 * Prepared transactions that are not covered by the initial snapshot
+	 * need to be sent later along with COMMIT PREPARED; they must precede
+	 * this point.
+	 */
+	XLogRecPtr  snapshot_was_exported_at;
+
+	/*
 	 * Don't start decoding WAL until the "xl_running_xacts" information
 	 * indicates there are no running xids with an xid smaller than this.
 	 */
@@ -269,7 +279,8 @@ SnapBuild *
 AllocateSnapshotBuilder(ReorderBuffer *reorder,
 						TransactionId xmin_horizon,
 						XLogRecPtr start_lsn,
-						bool need_full_snapshot)
+						bool need_full_snapshot,
+						XLogRecPtr snapshot_was_exported_at)
 {
 	MemoryContext context;
 	MemoryContext oldcontext;
@@ -297,6 +308,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->start_decoding_at = start_lsn;
 	builder->building_full_snapshot = need_full_snapshot;
+	builder->snapshot_was_exported_at = snapshot_was_exported_at;
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -357,6 +369,15 @@ SnapBuildCurrentState(SnapBuild *builder)
 }
 
 /*
+ * Return the LSN at which the snapshot was exported
+ */
+XLogRecPtr
+SnapBuildExportAt(SnapBuild *builder)
+{
+	return builder->snapshot_was_exported_at;
+}
+
+/*
  * Should the contents of transaction ending at 'ptr' be decoded?
  */
 bool
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d24bb5b..f5efdff 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -236,7 +236,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 13
+#define PG_GET_REPLICATION_SLOTS_COLS 14
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -344,6 +344,11 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 		else
 			nulls[i++] = true;
 
+		if (slot_contents.data.snapshot_was_exported_at != InvalidXLogRecPtr)
+			values[i++] = LSNGetDatum(slot_contents.data.snapshot_was_exported_at);
+		else
+			nulls[i++] = true;
+
 		/*
 		 * If invalidated_at is valid and restart_lsn is invalid, we know for
 		 * certain that the slot has been invalidated.  Otherwise, test
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1604412..e1e4f3e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10496,9 +10496,9 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,pg_lsn,text,int8}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,snapshot_was_exported_at,wal_status,safe_wal_size}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bab31bf..1dbb50e 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -643,6 +643,7 @@ void		ReorderBufferCommit(ReorderBuffer *, TransactionId,
 								TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 										XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+										XLogRecPtr snapshot_consistency_lsn,
 										TimestampTz commit_time,
 										RepOriginId origin_id, XLogRecPtr origin_lsn,
 										char *gid, bool is_commit);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 38a9a0b..5764293 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -91,6 +91,12 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	confirmed_flush;
 
+	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported snapshot for initial copy.
+	 */
+	XLogRecPtr  snapshot_was_exported_at;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index d9f187a..f15ac66 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -61,7 +61,8 @@ extern void CheckPointSnapBuild(void);
 
 extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
 										  TransactionId xmin_horizon, XLogRecPtr start_lsn,
-										  bool need_full_snapshot);
+										  bool need_full_snapshot,
+										  XLogRecPtr snapshot_was_exported_at);
 extern void FreeSnapshotBuilder(SnapBuild *cache);
 
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
@@ -75,6 +76,7 @@ extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
 											TransactionId xid);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+extern XLogRecPtr SnapBuildExportAt(SnapBuild *builder);
 
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 							   TransactionId xid, int nsubxacts,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 10a1f34..10647d4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1476,9 +1476,10 @@ pg_replication_slots| SELECT l.slot_name,
     l.catalog_xmin,
     l.restart_lsn,
     l.confirmed_flush_lsn,
+    l.snapshot_was_exported_at,
     l.wal_status,
     l.safe_wal_size
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size)
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, snapshot_was_exported_at, wal_status, safe_wal_size)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
1.8.3.1

#54vignesh C
vignesh21@gmail.com
In reply to: Ajin Cherian (#53)
Re: repeated decoding of prepared transactions

On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian <itsajin@gmail.com> wrote:

I've added the snapshot_was_exported_at member to pg_replication_slots as well.
Do have a look and let me know if there are any comments.

Updated with both patches.

Thanks for fixing this and providing an updated patch. The patch applies,
and make check and make check-world pass. I could verify that the issue
is fixed.

A few minor comments:
+       <structfield>snapshot_was_exported_at</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       The address (<literal>LSN</literal>) at which the logical
+       slot found a consistent point at the time of slot creation.
+       <literal>NULL</literal> for physical slots.
+      </para></entry>
+     </row>

I recall we had some earlier discussion about the name
snapshot_was_exported_at. Can we change snapshot_was_exported_at to
snapshot_exported_lsn? If the name includes lsn, users will be able to
interpret it more easily, and it will also be consistent with the other
columns in the pg_replication_slots view.

L.restart_lsn,
L.confirmed_flush_lsn,
+ L.snapshot_was_exported_at,
L.wal_status,
L.safe_wal_size

Looks like there is an indentation issue here.

Regards,
Vignesh

#55Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#54)
Re: repeated decoding of prepared transactions

On Fri, Feb 26, 2021 at 7:26 PM vignesh C <vignesh21@gmail.com> wrote:

On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian <itsajin@gmail.com> wrote:

I've added the snapshot_was_exported_at member to pg_replication_slots as well.
Do have a look and let me know if there are any comments.

Updated with both patches.

Thanks for fixing this and providing an updated patch. The patch applies,
and make check and make check-world pass. I could verify that the issue
is fixed.

A few minor comments:
+       <structfield>snapshot_was_exported_at</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       The address (<literal>LSN</literal>) at which the logical
+       slot found a consistent point at the time of slot creation.
+       <literal>NULL</literal> for physical slots.
+      </para></entry>
+     </row>

I recall we had some earlier discussion about the name
snapshot_was_exported_at. Can we change snapshot_was_exported_at to
snapshot_exported_lsn? If the name includes lsn, users will be able to
interpret it more easily, and it will also be consistent with the other
columns in the pg_replication_slots view.

I have recommended above changing this name to initial_consistency_at
because there are times when we don't export a snapshot and still set
this, for example when creating slots with CRS_NOEXPORT_SNAPSHOT or when
creating them via the SQL APIs. I am not sure why Ajin neither changed the
name nor responded to that comment. What is your opinion?

--
With Regards,
Amit Kapila.

#56Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#50)
Re: repeated decoding of prepared transactions

On Thu, Feb 25, 2021 at 5:04 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Feb 24, 2021 at 5:06 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Wed, Feb 24, 2021 at 4:48 PM Ajin Cherian <itsajin@gmail.com> wrote:

I plan to split this into two patches next. But do review and let me
know if you have any comments.

Attaching an updated patch-set with the changes for
snapshot_was_exported_at_lsn separated out from the changes for the
APIs pg_create_logical_replication_slot() and
pg_logical_slot_get_changes(). Along with a rebase that takes in a few
more commits since my last patch.

One observation: while verifying the patch, I noticed that most of the
ReplicationSlotPersistentData structure members are displayed in
pg_replication_slots, but I did not see snapshot_was_exported_at_lsn
being displayed. Is this intentional? If not, we can include
snapshot_was_exported_at_lsn in pg_replication_slots.

On thinking about this point, I feel we don't need this new parameter
in the view because I am not able to see how it is of any use to the
user. Over time, there won't be any WAL record corresponding to that
LSN, or the WAL may have been overwritten. I think this is primarily
for our internal use, so let's not expose it. I intend to remove it
from the patch unless you have some reason for exposing this to the
user.

--
With Regards,
Amit Kapila.

#57Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#55)
Re: repeated decoding of prepared transactions

On Sat, 27 Feb, 2021, 1:59 pm Amit Kapila, <amit.kapila16@gmail.com> wrote:

I have recommended above changing this name to initial_consistency_at
because there are times when we don't export a snapshot and still set
this, for example when creating slots with CRS_NOEXPORT_SNAPSHOT or when
creating them via the SQL APIs. I am not sure why Ajin neither changed the
name nor responded to that comment. What is your opinion?

I am fine with the name initial_consistency_at. I am also fine with not
showing this in the pg_replication_slots view and keeping it internal.

Regards,
Ajin Cherian
Fujitsu Australia

#58Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#53)
2 attachment(s)
Re: repeated decoding of prepared transactions

On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian <itsajin@gmail.com> wrote:

I've added the snapshot_was_exported_at member to pg_replication_slots as well.
Do have a look and let me know if there are any comments.

Updated with both patches.

Thanks, I have made some minor changes to the first patch and now it
looks good to me. The changes are as below:
1. Removed the changes related to exposing this new parameter via view
as mentioned in my previous email.
2. Changed the variable name to initial_consistent_point.
3. Ran pgindent, minor changes in comments, and modified the commit message.

Let me know what you think about these changes.

Next, I'll review your second patch which allows the 2PC option to be
enabled only at slot creation time.

--
With Regards,
Amit Kapila.

Attachments:

v5-0001-Avoid-repeated-decoding-of-prepared-transactions-.patch (application/x-patch)
From b4d4504b64452ba6cc8602f66acac8209317da0a Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 26 Feb 2021 02:58:49 -0500
Subject: [PATCH v5 1/2] Avoid repeated decoding of prepared transactions after
 the restart.

In commit a271a1b50e, we allowed decoding at prepare time and the prepare
was decoded again if there is a restart after decoding it. It was done
that way because we can't distinguish between the cases where we have not
decoded the prepare because it was prior to consistent snapshot or we have
decoded it earlier but restarted. To distinguish between these two cases,
we have introduced an initial_consistent_point at the slot level which is
an LSN at which we found a consistent point at the time of slot creation.
This is also the point where we have exported a snapshot for the initial
copy. So, prepared transactions prior to this point are sent along with
commit prepared.

Author: Ajin Cherian, based on idea by Andres Freund
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
---
 contrib/test_decoding/expected/twophase.out        | 38 +++++++---------------
 contrib/test_decoding/expected/twophase_stream.out | 28 ++--------------
 doc/src/sgml/logicaldecoding.sgml                  |  9 ++---
 src/backend/replication/logical/decode.c           |  2 ++
 src/backend/replication/logical/logical.c          |  3 +-
 src/backend/replication/logical/reorderbuffer.c    | 10 +++---
 src/backend/replication/logical/snapbuild.c        | 24 +++++++++++++-
 src/include/replication/reorderbuffer.h            |  1 +
 src/include/replication/slot.h                     |  7 ++++
 src/include/replication/snapbuild.h                |  4 ++-
 10 files changed, 60 insertions(+), 66 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index f9f6bed..c51870f 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#1';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
@@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#3';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
@@ -159,14 +152,10 @@ RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -189,13 +178,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd3..d54e640 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -60,32 +60,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 6455664..18d592d 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -191,9 +191,6 @@ postgres=# COMMIT PREPARED 'test_prepared1';
 postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
- 0/1689DC0 | 529 | BEGIN 529
- 0/1689DC0 | 529 | table public.data: INSERT: id[integer]:3 data[text]:'5'
- 0/1689FC0 | 529 | PREPARE TRANSACTION 'test_prepared1', txid 529
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
 (4 row)
 
@@ -822,10 +819,8 @@ typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx
       <parameter>gid</parameter> field, which is part of the
       <parameter>txn</parameter> parameter, can be used in this callback to
       check if the plugin has already received this <command>PREPARE</command>
-      in which case it can skip the remaining changes of the transaction.
-      This can only happen if the user restarts the decoding after receiving
-      the <command>PREPARE</command> for a transaction but before receiving
-      the <command>COMMIT PREPARED</command>, say because of some error.
+      in which case it can either error out or skip the remaining changes of 
+      the transaction.
       <programlisting>
        typedef void (*LogicalDecodeBeginPrepareCB) (struct LogicalDecodingContext *ctx,
                                                     ReorderBufferTXN *txn);
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df0..423188d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -716,6 +716,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	if (two_phase)
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									SnapBuildInitialConsistentPoint(ctx->snapshot_builder),
 									commit_time, origin_id, origin_lsn,
 									parsed->twophase_gid, true);
 	}
@@ -854,6 +855,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									InvalidXLogRecPtr,
 									abort_time, origin_id, origin_lsn,
 									parsed->twophase_gid, false);
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index baeb45f..3f6d723 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -207,7 +207,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder = ReorderBufferAllocate();
 	ctx->snapshot_builder =
 		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn,
-								need_full_snapshot);
+								need_full_snapshot, slot->data.initial_consistent_point);
 
 	ctx->reorder->private_data = ctx;
 
@@ -590,6 +590,7 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
+	slot->data.initial_consistent_point = ctx->reader->EndRecPtr;
 	SpinLockRelease(&slot->mutex);
 }
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c3b9632..91600ac 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2672,6 +2672,7 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
 void
 ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 							XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+							XLogRecPtr initial_consistent_point,
 							TimestampTz commit_time, RepOriginId origin_id,
 							XLogRecPtr origin_lsn, char *gid, bool is_commit)
 {
@@ -2698,12 +2699,11 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	/*
 	 * It is possible that this transaction is not decoded at prepare time
 	 * either because by that time we didn't have a consistent snapshot or it
-	 * was decoded earlier but we have restarted. We can't distinguish between
-	 * those two cases so we send the prepare in both the cases and let
-	 * downstream decide whether to process or skip it. We don't need to
-	 * decode the xact for aborts if it is not done already.
+	 * was decoded earlier but we have restarted. We only need to send the
+	 * prepare if it was not decoded earlier. We don't need to decode the xact
+	 * for aborts if it is not done already.
 	 */
-	if (!rbtxn_prepared(txn) && is_commit)
+	if ((txn->final_lsn < initial_consistent_point) && is_commit)
 	{
 		txn->txn_flags |= RBTXN_PREPARE;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e117887..6087467 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -165,6 +165,17 @@ struct SnapBuild
 	XLogRecPtr	start_decoding_at;
 
 	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported a snapshot for the
+	 * initial copy.
+	 *
+	 * The prepared transactions that are not covered by the initial snapshot
+	 * need to be sent later along with commit prepared and they must be
+	 * before this point.
+	 */
+	XLogRecPtr	initial_consistent_point;
+
+	/*
 	 * Don't start decoding WAL until the "xl_running_xacts" information
 	 * indicates there are no running xids with an xid smaller than this.
 	 */
@@ -269,7 +280,8 @@ SnapBuild *
 AllocateSnapshotBuilder(ReorderBuffer *reorder,
 						TransactionId xmin_horizon,
 						XLogRecPtr start_lsn,
-						bool need_full_snapshot)
+						bool need_full_snapshot,
+						XLogRecPtr initial_consistent_point)
 {
 	MemoryContext context;
 	MemoryContext oldcontext;
@@ -297,6 +309,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->start_decoding_at = start_lsn;
 	builder->building_full_snapshot = need_full_snapshot;
+	builder->initial_consistent_point = initial_consistent_point;
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -357,6 +370,15 @@ SnapBuildCurrentState(SnapBuild *builder)
 }
 
 /*
+ * Return the LSN at which the snapshot was exported
+ */
+XLogRecPtr
+SnapBuildInitialConsistentPoint(SnapBuild *builder)
+{
+	return builder->initial_consistent_point;
+}
+
+/*
  * Should the contents of transaction ending at 'ptr' be decoded?
  */
 bool
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bab31bf..565a961 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -643,6 +643,7 @@ void		ReorderBufferCommit(ReorderBuffer *, TransactionId,
 								TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 										XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+										XLogRecPtr initial_consistent_point,
 										TimestampTz commit_time,
 										RepOriginId origin_id, XLogRecPtr origin_lsn,
 										char *gid, bool is_commit);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 38a9a0b..5c3fde2 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -91,6 +91,13 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	confirmed_flush;
 
+	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported a snapshot for the
+	 * initial copy.
+	 */
+	XLogRecPtr	initial_consistent_point;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index d9f187a..fbabce6 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -61,7 +61,8 @@ extern void CheckPointSnapBuild(void);
 
 extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
 										  TransactionId xmin_horizon, XLogRecPtr start_lsn,
-										  bool need_full_snapshot);
+										  bool need_full_snapshot,
+										  XLogRecPtr initial_consistent_point);
 extern void FreeSnapshotBuilder(SnapBuild *cache);
 
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
@@ -75,6 +76,7 @@ extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
 											TransactionId xid);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+extern XLogRecPtr SnapBuildInitialConsistentPoint(SnapBuild *builder);
 
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 							   TransactionId xid, int nsubxacts,
-- 
1.8.3.1

v5-0002-Add-option-to-enable-two-phase-commits-in-pg_crea.patch (application/x-patch)
From 5a986ec983bea2740e20d95c8ee4db574237b68c Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 27 Feb 2021 11:18:28 +0530
Subject: [PATCH v5 2/2] Add option to enable two-phase commits in
 pg_create_logical_replication_slot.

This commit changes the way two-phase commits are enabled in test_decoding plugin.
Two-phase commits can now only be enabled while creating the slot using
pg_create_logical_replication_slot() and cannot be set using pg_logical_slot_get_changes().
For this the API pg_create_logical_replication_slot() is modified to take one more
optional boolean parameter 'twophase', which when set to TRUE enables two-phase commits.
The parameter defaults to FALSE.
---
 contrib/test_decoding/expected/twophase.out        | 34 +++++++++++-----------
 .../test_decoding/expected/twophase_snapshot.out   |  6 ++--
 contrib/test_decoding/expected/twophase_stream.out | 10 +++----
 contrib/test_decoding/specs/twophase_snapshot.spec |  4 +--
 contrib/test_decoding/sql/twophase.sql             | 34 +++++++++++-----------
 contrib/test_decoding/sql/twophase_stream.sql      | 10 +++----
 contrib/test_decoding/test_decoding.c              | 18 ++++--------
 doc/src/sgml/logicaldecoding.sgml                  | 19 ++++++------
 src/backend/catalog/system_views.sql               |  1 +
 src/backend/replication/logical/logicalfuncs.c     |  8 +++++
 src/backend/replication/repl_gram.y                | 14 +++++++--
 src/backend/replication/repl_scanner.l             |  1 +
 src/backend/replication/slot.c                     |  3 +-
 src/backend/replication/slotfuncs.c                | 10 +++++--
 src/backend/replication/walsender.c                |  6 ++--
 src/include/catalog/pg_proc.dat                    |  8 ++---
 src/include/nodes/replnodes.h                      |  1 +
 src/include/replication/slot.h                     |  7 ++++-
 18 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index c51870f..8d61107 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -15,14 +15,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -32,7 +32,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (4 rows)
 
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#1'
@@ -42,7 +42,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -51,7 +51,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                 data                 
 -------------------------------------
  ROLLBACK PREPARED 'test_prepared#2'
@@ -74,7 +74,7 @@ WHERE locktype = 'relation'
 (2 rows)
 
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                   data                                   
 -------------------------------------------------------------------------
  BEGIN
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -98,7 +98,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#3'
@@ -107,7 +107,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                
 --------------------------------------------------------------------
  BEGIN
@@ -139,7 +139,7 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                    data                                    
 ---------------------------------------------------------------------------
  BEGIN
@@ -151,7 +151,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                  data                 
 --------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
@@ -167,7 +167,7 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                             data                            
 ------------------------------------------------------------
  BEGIN
@@ -177,7 +177,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                    data                    
 -------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
@@ -188,14 +188,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                 
 ---------------------------------------------------------------------
  BEGIN
@@ -208,7 +208,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
diff --git a/contrib/test_decoding/expected/twophase_snapshot.out b/contrib/test_decoding/expected/twophase_snapshot.out
index 14d9387..0e8e1f5 100644
--- a/contrib/test_decoding/expected/twophase_snapshot.out
+++ b/contrib/test_decoding/expected/twophase_snapshot.out
@@ -6,7 +6,7 @@ step s2txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
 
 f              
-step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); <waiting ...>
+step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true); <waiting ...>
 step s3b: BEGIN;
 step s3txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
@@ -22,14 +22,14 @@ step s1init: <... completed>
 
 init           
 step s1insert: INSERT INTO do_write DEFAULT VALUES;
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
 table public.do_write: INSERT: id[integer]:2
 COMMIT         
 step s2cp: COMMIT PREPARED 'test1';
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index d54e640..b08bb0e 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -1,6 +1,6 @@
 -- Test streaming of two-phase commits
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -28,7 +28,7 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -59,7 +59,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
           data           
 -------------------------
  COMMIT PREPARED 'test1'
@@ -81,7 +81,7 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                             data                             
 -------------------------------------------------------------
  BEGIN
diff --git a/contrib/test_decoding/specs/twophase_snapshot.spec b/contrib/test_decoding/specs/twophase_snapshot.spec
index 3e70040..e8d9567 100644
--- a/contrib/test_decoding/specs/twophase_snapshot.spec
+++ b/contrib/test_decoding/specs/twophase_snapshot.spec
@@ -15,8 +15,8 @@ teardown
 session "s1"
 setup { SET synchronous_commit=on; }
 
-step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');}
-step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');}
+step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true);}
+step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');}
 step "s1insert" { INSERT INTO do_write DEFAULT VALUES; }
 
 session "s2"
diff --git a/contrib/test_decoding/sql/twophase.sql b/contrib/test_decoding/sql/twophase.sql
index 894e4f5..17ada0f 100644
--- a/contrib/test_decoding/sql/twophase.sql
+++ b/contrib/test_decoding/sql/twophase.sql
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE test_prepared1(id integer primary key);
 CREATE TABLE test_prepared2(id integer primary key);
@@ -12,20 +12,20 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
 BEGIN;
@@ -38,7 +38,7 @@ FROM pg_locks
 WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that we decode correctly while an uncommitted prepared xact
 -- with ddl exists.
@@ -47,14 +47,14 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Check 'CLUSTER' (as operation that hold exclusive lock) doesn't block
 -- logical decoding.
@@ -71,11 +71,11 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -87,26 +87,26 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test 8:
 -- cleanup and make sure results are also empty
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/sql/twophase_stream.sql b/contrib/test_decoding/sql/twophase_stream.sql
index e9dd44f..646076d 100644
--- a/contrib/test_decoding/sql/twophase_stream.sql
+++ b/contrib/test_decoding/sql/twophase_stream.sql
@@ -1,7 +1,7 @@
 -- Test streaming of two-phase commits
 
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE stream_test(data text);
 
@@ -18,11 +18,11 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
@@ -35,11 +35,11 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 DROP TABLE stream_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 929255e..28c876d 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -164,7 +164,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	ListCell   *option;
 	TestDecodingData *data;
 	bool		enable_streaming = false;
-	bool		enable_twophase = false;
 
 	data = palloc0(sizeof(TestDecodingData));
 	data->context = AllocSetContextCreate(ctx->context,
@@ -265,16 +264,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
 								strVal(elem->arg), elem->defname)));
 		}
-		else if (strcmp(elem->defname, "two-phase-commit") == 0)
-		{
-			if (elem->arg == NULL)
-				continue;
-			else if (!parse_bool(strVal(elem->arg), &enable_twophase))
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
-								strVal(elem->arg), elem->defname)));
-		}
 		else
 		{
 			ereport(ERROR,
@@ -286,7 +275,12 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	}
 
 	ctx->streaming &= enable_streaming;
-	ctx->twophase &= enable_twophase;
+
+	/*
+	 * Disable two-phase here; it will be set in the core if it was
+	 * enabled while creating the slot.
+	 */
+	ctx->twophase = false;
 }
 
 /* cleanup this plugin's resources */
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 18d592d..5839d96 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -55,7 +55,7 @@
 
 <programlisting>
 postgres=# -- Create a slot named 'regression_slot' using the output plugin 'test_decoding'
-postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
     slot_name    |    lsn
 -----------------+-----------
  regression_slot | 0/16B1970
@@ -169,17 +169,18 @@ $ pg_recvlogical -d postgres --slot=test --drop-slot
   <para>
   The following example shows SQL interface that can be used to decode prepared
   transactions. Before you use two-phase commit commands, you must set
-  <varname>max_prepared_transactions</varname> to at least 1. You must also set
-  the option 'two-phase-commit' to 1 while calling
-  <function>pg_logical_slot_get_changes</function>. Note that we will stream
-  the entire transaction after the commit if it is not already decoded.
+  <varname>max_prepared_transactions</varname> to at least 1. You must also have
+  set the two-phase parameter to 'true' while creating the slot using
+  <function>pg_create_logical_replication_slot</function>.
+  Note that we will stream the entire transaction after the commit if it
+  is not already decoded.
   </para>
 <programlisting>
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('5');
 postgres=*# PREPARE TRANSACTION 'test_prepared1';
 
-postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -188,7 +189,7 @@ postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# COMMIT PREPARED 'test_prepared1';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
@@ -198,7 +199,7 @@ postgres=#-- you can also rollback a prepared transaction
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('6');
 postgres=*# PREPARE TRANSACTION 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/168A180 | 530 | BEGIN 530
@@ -207,7 +208,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# ROLLBACK PREPARED 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                     data                     
 -----------+-----+----------------------------------------------
  0/168A4B8 | 530 | ROLLBACK PREPARED 'test_prepared2', txid 530
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd..f6c5fc5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1318,6 +1318,7 @@ AS 'pg_create_physical_replication_slot';
 CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
     IN slot_name name, IN plugin name,
     IN temporary boolean DEFAULT false,
+    IN twophase boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)
 RETURNS RECORD
 LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index f7e0558..4a919d1 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -239,6 +239,14 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 									LogicalOutputPrepareWrite,
 									LogicalOutputWrite, NULL);
 
+		/* If twophase was set on the slot at creation time,
+		 * make sure the field in the context is updated as well.
+		 */
+		if (MyReplicationSlot->data.twophase)
+		{
+			ctx->twophase = true;
+		}
+
 		/*
 		 * After the sanity checks in CreateDecodingContext, make sure the
 		 * restart_lsn is valid.  Avoid "cannot get changes" wording in this
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index eb283a8..aeec791 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -84,6 +84,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_SLOT
 %token K_RESERVE_WAL
 %token K_TEMPORARY
+%token K_TWOPHASE
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
@@ -102,6 +103,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	plugin_opt_arg
 %type <str>		opt_slot var_name
 %type <boolval>	opt_temporary
+%type <boolval>	opt_twophase
 %type <list>	create_slot_opt_list
 %type <defelt>	create_slot_opt
 
@@ -242,15 +244,16 @@ create_replication_slot:
 					$$ = (Node *) cmd;
 				}
 			/* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
-			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary opt_twophase K_LOGICAL IDENT create_slot_opt_list
 				{
 					CreateReplicationSlotCmd *cmd;
 					cmd = makeNode(CreateReplicationSlotCmd);
 					cmd->kind = REPLICATION_KIND_LOGICAL;
 					cmd->slotname = $2;
 					cmd->temporary = $3;
-					cmd->plugin = $5;
-					cmd->options = $6;
+					cmd->twophase = $4;
+					cmd->plugin = $6;
+					cmd->options = $7;
 					$$ = (Node *) cmd;
 				}
 			;
@@ -365,6 +368,11 @@ opt_temporary:
 			| /* EMPTY */					{ $$ = false; }
 			;
 
+opt_twophase:
+			K_TWOPHASE						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+			;
+
 opt_slot:
 			K_SLOT IDENT
 				{ $$ = $2; }
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index dcc3c3f..3032c28 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -103,6 +103,7 @@ RESERVE_WAL			{ return K_RESERVE_WAL; }
 LOGICAL				{ return K_LOGICAL; }
 SLOT				{ return K_SLOT; }
 TEMPORARY			{ return K_TEMPORARY; }
+TWOPHASE			{ return K_TWOPHASE; }
 EXPORT_SNAPSHOT		{ return K_EXPORT_SNAPSHOT; }
 NOEXPORT_SNAPSHOT	{ return K_NOEXPORT_SNAPSHOT; }
 USE_SNAPSHOT		{ return K_USE_SNAPSHOT; }
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fb4af2e..38c385b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -219,7 +219,7 @@ ReplicationSlotValidateName(const char *name, int elevel)
  */
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
-					  ReplicationSlotPersistency persistency)
+					  ReplicationSlotPersistency persistency, bool twophase)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -277,6 +277,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	namestrcpy(&slot->data.name, name);
 	slot->data.database = db_specific ? MyDatabaseId : InvalidOid;
 	slot->data.persistency = persistency;
+	slot->data.twophase = twophase;
 
 	/* and then data only present in shared memory */
 	slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d24bb5b..a441fa4 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -50,7 +50,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
 
 	/* acquire replication slot, this will check for conflicting names */
 	ReplicationSlotCreate(name, false,
-						  temporary ? RS_TEMPORARY : RS_PERSISTENT);
+						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
 
 	if (immediately_reserve)
 	{
@@ -124,7 +124,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  */
 static void
 create_logical_replication_slot(char *name, char *plugin,
-								bool temporary, XLogRecPtr restart_lsn,
+								bool temporary, bool twophase,
+								XLogRecPtr restart_lsn,
 								bool find_startpoint)
 {
 	LogicalDecodingContext *ctx = NULL;
@@ -140,7 +141,7 @@ create_logical_replication_slot(char *name, char *plugin,
 	 * error as well.
 	 */
 	ReplicationSlotCreate(name, true,
-						  temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, twophase);
 
 	/*
 	 * Create logical decoding context to find start point or, if we don't
@@ -177,6 +178,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	Name		name = PG_GETARG_NAME(0);
 	Name		plugin = PG_GETARG_NAME(1);
 	bool		temporary = PG_GETARG_BOOL(2);
+	bool		twophase = PG_GETARG_BOOL(3);
 	Datum		result;
 	TupleDesc	tupdesc;
 	HeapTuple	tuple;
@@ -193,6 +195,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	create_logical_replication_slot(NameStr(*name),
 									NameStr(*plugin),
 									temporary,
+									twophase,
 									InvalidXLogRecPtr,
 									true);
 
@@ -796,6 +799,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
 										temporary,
+										false,
 										src_restart_lsn,
 										false);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 8124454..9146e62 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -937,7 +937,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	if (cmd->kind == REPLICATION_KIND_PHYSICAL)
 	{
 		ReplicationSlotCreate(cmd->slotname, false,
-							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT);
+							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
+							  false);
 	}
 	else
 	{
@@ -951,7 +952,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		 * they get dropped on error as well.
 		 */
 		ReplicationSlotCreate(cmd->slotname, true,
-							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
+							  cmd->twophase);
 	}
 
 	if (cmd->kind == REPLICATION_KIND_LOGICAL)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1487710..1d9e51a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10502,10 +10502,10 @@
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',
   prosrc => 'pg_create_logical_replication_slot' },
 { oid => '4222',
   descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a25..1a933e2 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -56,6 +56,7 @@ typedef struct CreateReplicationSlotCmd
 	ReplicationKind kind;
 	char	   *plugin;
 	bool		temporary;
+	bool		twophase;
 	List	   *options;
 } CreateReplicationSlotCmd;
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 5c3fde2..f524544 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -98,6 +98,11 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	initial_consistent_point;
 
+	/*
+	 * Is the slot two-phase enabled?
+	 */
+	bool        twophase;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
@@ -199,7 +204,7 @@ extern void ReplicationSlotsShmemInit(void);
 
 /* management of individual slots */
 extern void ReplicationSlotCreate(const char *name, bool db_specific,
-								  ReplicationSlotPersistency p);
+								  ReplicationSlotPersistency p, bool twophase);
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
-- 
1.8.3.1

#59Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#58)
2 attachment(s)
Re: repeated decoding of prepared transactions

On Sat, Feb 27, 2021 at 11:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian <itsajin@gmail.com> wrote:

I've updated snapshot_was_exported_at_ member to pg_replication_slots as well.
Do have a look and let me know if there are any comments.

Updated with both patches.

Thanks, I have made some minor changes to the first patch and now it
looks good to me. The changes are as below:
1. Removed the changes related to exposing this new parameter via view
as mentioned in my previous email.
2. Changed the variable name to initial_consistent_point.
3. Ran pgindent, minor changes in comments, and modified the commit message.

Let me know what you think about these changes.

In the attached, I have just bumped SNAPBUILD_VERSION as we are
adding a new member in the SnapBuild structure.

Next, I'll review your second patch which allows the 2PC option to be
enabled only at slot creation time.

Few comments on 0002 patch:
=========================
1.
+
+ /*
+ * Disable two-phase here, it will be set in the core if it was
+ * enabled whole creating the slot.
+ */
+ ctx->twophase = false;

Typo: "whole" should be "while". I think we don't need to initialize this
variable here at all.

2.
+ /* If twophase is set on the slot at create time, then
+ * make sure the field in the context is also updated
+ */
+ if (MyReplicationSlot->data.twophase)
+ {
+ ctx->twophase = true;
+ }
+

For multi-line comments, the first line of the comment should be empty.
Also, I think this is not the right place because the WALSender path
needs to set it separately. I guess you can set it in
CreateInitDecodingContext/CreateDecodingContext by doing something
like

ctx->twophase &= MyReplicationSlot->data.twophase

3. I think we can support this option at the protocol level in a
separate patch where we need to allow it via replication commands (say
when we support it in CreateSubscription). Right now, there is nothing
to test all the code you have added in repl_gram.y.

4. I think we can expose this new option via pg_replication_slots.
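
To illustrate points 1 and 2, here is a minimal standalone sketch of the
masking idea. The struct and function names (SlotDataSketch,
DecodingContextSketch, apply_slot_twophase) are hypothetical stand-ins, not
the real PostgreSQL definitions, and it assumes ctx->twophase already holds
what the output plugin asked for before the slot setting is applied. It also
shows the multi-line comment layout mentioned in point 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the persistent slot data; only the flag is kept. */
typedef struct SlotDataSketch
{
	bool		twophase;
} SlotDataSketch;

/* Hypothetical stand-in for the decoding context; only the flag is kept. */
typedef struct DecodingContextSketch
{
	bool		twophase;
} DecodingContextSketch;

/*
 * Keep two-phase decoding enabled only if it was requested in the context
 * and the slot was created with it.  A slot created without the option
 * always ends up with two-phase decoding disabled.
 */
void
apply_slot_twophase(DecodingContextSketch *ctx, const SlotDataSketch *slot)
{
	assert(ctx != NULL && slot != NULL);

	ctx->twophase &= slot->twophase;
}
```

With this shape there is no need to initialize the context field to false
first: a slot created without the option masks it off, and a slot created
with it leaves the existing value alone.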

--
With Regards,
Amit Kapila.

Attachments:

v6-0001-Avoid-repeated-decoding-of-prepared-transactions-.patch (application/octet-stream)
From aea5b0bc00cb4c282b3e659c492b825880c2e8e6 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Fri, 26 Feb 2021 02:58:49 -0500
Subject: [PATCH v6 1/2] Avoid repeated decoding of prepared transactions after
 the restart.

In commit a271a1b50e, we allowed decoding at prepare time and the prepare
was decoded again if there is a restart after decoding it. It was done
that way because we can't distinguish between the cases where we have not
decoded the prepare because it was prior to consistent snapshot or we have
decoded it earlier but restarted. To distinguish between these two cases,
we have introduced an initial_consistent_point at the slot level which is
an LSN at which we found a consistent point at the time of slot creation.
This is also the point where we have exported a snapshot for the initial
copy. So, prepared transactions prior to this point are sent along with
commit prepared.

This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It
will break existing slots which is fine in a major release.

Author: Ajin Cherian, based on idea by Andres Freund
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
---
 contrib/test_decoding/expected/twophase.out        | 38 +++++++---------------
 contrib/test_decoding/expected/twophase_stream.out | 28 ++--------------
 doc/src/sgml/logicaldecoding.sgml                  |  9 ++---
 src/backend/replication/logical/decode.c           |  2 ++
 src/backend/replication/logical/logical.c          |  3 +-
 src/backend/replication/logical/reorderbuffer.c    | 10 +++---
 src/backend/replication/logical/snapbuild.c        | 26 +++++++++++++--
 src/include/replication/reorderbuffer.h            |  1 +
 src/include/replication/slot.h                     |  7 ++++
 src/include/replication/snapbuild.h                |  4 ++-
 10 files changed, 61 insertions(+), 67 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index f9f6bed..c51870f 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#1';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
@@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#3';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
@@ -159,14 +152,10 @@ RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -189,13 +178,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd3..d54e640 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -60,32 +60,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 6455664..18d592d 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -191,9 +191,6 @@ postgres=# COMMIT PREPARED 'test_prepared1';
 postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
- 0/1689DC0 | 529 | BEGIN 529
- 0/1689DC0 | 529 | table public.data: INSERT: id[integer]:3 data[text]:'5'
- 0/1689FC0 | 529 | PREPARE TRANSACTION 'test_prepared1', txid 529
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
 (4 row)
 
@@ -822,10 +819,8 @@ typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx
       <parameter>gid</parameter> field, which is part of the
       <parameter>txn</parameter> parameter, can be used in this callback to
       check if the plugin has already received this <command>PREPARE</command>
-      in which case it can skip the remaining changes of the transaction.
-      This can only happen if the user restarts the decoding after receiving
-      the <command>PREPARE</command> for a transaction but before receiving
-      the <command>COMMIT PREPARED</command>, say because of some error.
+      in which case it can either error out or skip the remaining changes of 
+      the transaction.
       <programlisting>
        typedef void (*LogicalDecodeBeginPrepareCB) (struct LogicalDecodingContext *ctx,
                                                     ReorderBufferTXN *txn);
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df0..423188d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -716,6 +716,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	if (two_phase)
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									SnapBuildInitialConsistentPoint(ctx->snapshot_builder),
 									commit_time, origin_id, origin_lsn,
 									parsed->twophase_gid, true);
 	}
@@ -854,6 +855,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									InvalidXLogRecPtr,
 									abort_time, origin_id, origin_lsn,
 									parsed->twophase_gid, false);
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index baeb45f..3f6d723 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -207,7 +207,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder = ReorderBufferAllocate();
 	ctx->snapshot_builder =
 		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn,
-								need_full_snapshot);
+								need_full_snapshot, slot->data.initial_consistent_point);
 
 	ctx->reorder->private_data = ctx;
 
@@ -590,6 +590,7 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
+	slot->data.initial_consistent_point = ctx->reader->EndRecPtr;
 	SpinLockRelease(&slot->mutex);
 }
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c3b9632..91600ac 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2672,6 +2672,7 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
 void
 ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 							XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+							XLogRecPtr initial_consistent_point,
 							TimestampTz commit_time, RepOriginId origin_id,
 							XLogRecPtr origin_lsn, char *gid, bool is_commit)
 {
@@ -2698,12 +2699,11 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	/*
 	 * It is possible that this transaction is not decoded at prepare time
 	 * either because by that time we didn't have a consistent snapshot or it
-	 * was decoded earlier but we have restarted. We can't distinguish between
-	 * those two cases so we send the prepare in both the cases and let
-	 * downstream decide whether to process or skip it. We don't need to
-	 * decode the xact for aborts if it is not done already.
+	 * was decoded earlier but we have restarted. We only need to send the
+	 * prepare if it was not decoded earlier. We don't need to decode the xact
+	 * for aborts if it is not done already.
 	 */
-	if (!rbtxn_prepared(txn) && is_commit)
+	if ((txn->final_lsn < initial_consistent_point) && is_commit)
 	{
 		txn->txn_flags |= RBTXN_PREPARE;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e117887..c42005e 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -165,6 +165,17 @@ struct SnapBuild
 	XLogRecPtr	start_decoding_at;
 
 	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported a snapshot for the
+	 * initial copy.
+	 *
+	 * The prepared transactions that are not covered by initial snapshot
+	 * needs to be sent later along with commit prepared and they must be
+	 * before this point.
+	 */
+	XLogRecPtr	initial_consistent_point;
+
+	/*
 	 * Don't start decoding WAL until the "xl_running_xacts" information
 	 * indicates there are no running xids with an xid smaller than this.
 	 */
@@ -269,7 +280,8 @@ SnapBuild *
 AllocateSnapshotBuilder(ReorderBuffer *reorder,
 						TransactionId xmin_horizon,
 						XLogRecPtr start_lsn,
-						bool need_full_snapshot)
+						bool need_full_snapshot,
+						XLogRecPtr initial_consistent_point)
 {
 	MemoryContext context;
 	MemoryContext oldcontext;
@@ -297,6 +309,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->start_decoding_at = start_lsn;
 	builder->building_full_snapshot = need_full_snapshot;
+	builder->initial_consistent_point = initial_consistent_point;
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -357,6 +370,15 @@ SnapBuildCurrentState(SnapBuild *builder)
 }
 
 /*
+ * Return the LSN at which the snapshot was exported
+ */
+XLogRecPtr
+SnapBuildInitialConsistentPoint(SnapBuild *builder)
+{
+	return builder->initial_consistent_point;
+}
+
+/*
  * Should the contents of transaction ending at 'ptr' be decoded?
  */
 bool
@@ -1422,7 +1444,7 @@ typedef struct SnapBuildOnDisk
 	offsetof(SnapBuildOnDisk, version)
 
 #define SNAPBUILD_MAGIC 0x51A1E001
-#define SNAPBUILD_VERSION 3
+#define SNAPBUILD_VERSION 4 
 
 /*
  * Store/Load a snapshot from disk, depending on the snapshot builder's state.
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bab31bf..565a961 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -643,6 +643,7 @@ void		ReorderBufferCommit(ReorderBuffer *, TransactionId,
 								TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 										XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+										XLogRecPtr initial_consistent_point,
 										TimestampTz commit_time,
 										RepOriginId origin_id, XLogRecPtr origin_lsn,
 										char *gid, bool is_commit);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 38a9a0b..5c3fde2 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -91,6 +91,13 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	confirmed_flush;
 
+	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported a snapshot for the
+	 * initial copy.
+	 */
+	XLogRecPtr	initial_consistent_point;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index d9f187a..fbabce6 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -61,7 +61,8 @@ extern void CheckPointSnapBuild(void);
 
 extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
 										  TransactionId xmin_horizon, XLogRecPtr start_lsn,
-										  bool need_full_snapshot);
+										  bool need_full_snapshot,
+										  XLogRecPtr initial_consistent_point);
 extern void FreeSnapshotBuilder(SnapBuild *cache);
 
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
@@ -75,6 +76,7 @@ extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
 											TransactionId xid);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+extern XLogRecPtr SnapBuildInitialConsistentPoint(SnapBuild *builder);
 
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 							   TransactionId xid, int nsubxacts,
-- 
1.8.3.1

v6-0002-Add-option-to-enable-two-phase-commits-in-pg_crea.patch (application/octet-stream)
From a2b3e03e3321a81ed4bf10791cb66fa57c716fe0 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Sat, 27 Feb 2021 11:18:28 +0530
Subject: [PATCH v6 2/2] Add option to enable two-phase commits in
 pg_create_logical_replication_slot.

This commit changes the way two-phase commits are enabled in test_decoding plugin.
Two-phase commits can now only be enabled while creating the slot using
pg_create_logical_replication_slot() and cannot be set using pg_logical_slot_get_changes().
For this the API pg_create_logical_replication_slot() is modified to take one more
optional boolean parameter 'twophase', which when set to TRUE enables two-phase commits.
The parameter defaults to FALSE.
---
 contrib/test_decoding/expected/twophase.out        | 34 +++++++++++-----------
 .../test_decoding/expected/twophase_snapshot.out   |  6 ++--
 contrib/test_decoding/expected/twophase_stream.out | 10 +++----
 contrib/test_decoding/specs/twophase_snapshot.spec |  4 +--
 contrib/test_decoding/sql/twophase.sql             | 34 +++++++++++-----------
 contrib/test_decoding/sql/twophase_stream.sql      | 10 +++----
 contrib/test_decoding/test_decoding.c              | 18 ++++--------
 doc/src/sgml/logicaldecoding.sgml                  | 19 ++++++------
 src/backend/catalog/system_views.sql               |  1 +
 src/backend/replication/logical/logicalfuncs.c     |  8 +++++
 src/backend/replication/repl_gram.y                | 14 +++++++--
 src/backend/replication/repl_scanner.l             |  1 +
 src/backend/replication/slot.c                     |  3 +-
 src/backend/replication/slotfuncs.c                | 10 +++++--
 src/backend/replication/walsender.c                |  6 ++--
 src/include/catalog/pg_proc.dat                    |  8 ++---
 src/include/nodes/replnodes.h                      |  1 +
 src/include/replication/slot.h                     |  7 ++++-
 18 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index c51870f..8d61107 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -15,14 +15,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -32,7 +32,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (4 rows)
 
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#1'
@@ -42,7 +42,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -51,7 +51,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                 data                 
 -------------------------------------
  ROLLBACK PREPARED 'test_prepared#2'
@@ -74,7 +74,7 @@ WHERE locktype = 'relation'
 (2 rows)
 
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                   data                                   
 -------------------------------------------------------------------------
  BEGIN
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -98,7 +98,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#3'
@@ -107,7 +107,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                
 --------------------------------------------------------------------
  BEGIN
@@ -139,7 +139,7 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                    data                                    
 ---------------------------------------------------------------------------
  BEGIN
@@ -151,7 +151,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                  data                 
 --------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
@@ -167,7 +167,7 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                             data                            
 ------------------------------------------------------------
  BEGIN
@@ -177,7 +177,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                    data                    
 -------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
@@ -188,14 +188,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                 
 ---------------------------------------------------------------------
  BEGIN
@@ -208,7 +208,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
diff --git a/contrib/test_decoding/expected/twophase_snapshot.out b/contrib/test_decoding/expected/twophase_snapshot.out
index 14d9387..0e8e1f5 100644
--- a/contrib/test_decoding/expected/twophase_snapshot.out
+++ b/contrib/test_decoding/expected/twophase_snapshot.out
@@ -6,7 +6,7 @@ step s2txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
 
 f              
-step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); <waiting ...>
+step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true); <waiting ...>
 step s3b: BEGIN;
 step s3txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
@@ -22,14 +22,14 @@ step s1init: <... completed>
 
 init           
 step s1insert: INSERT INTO do_write DEFAULT VALUES;
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
 table public.do_write: INSERT: id[integer]:2
 COMMIT         
 step s2cp: COMMIT PREPARED 'test1';
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index d54e640..b08bb0e 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -1,6 +1,6 @@
 -- Test streaming of two-phase commits
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -28,7 +28,7 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -59,7 +59,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
           data           
 -------------------------
  COMMIT PREPARED 'test1'
@@ -81,7 +81,7 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                             data                             
 -------------------------------------------------------------
  BEGIN
diff --git a/contrib/test_decoding/specs/twophase_snapshot.spec b/contrib/test_decoding/specs/twophase_snapshot.spec
index 3e70040..e8d9567 100644
--- a/contrib/test_decoding/specs/twophase_snapshot.spec
+++ b/contrib/test_decoding/specs/twophase_snapshot.spec
@@ -15,8 +15,8 @@ teardown
 session "s1"
 setup { SET synchronous_commit=on; }
 
-step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');}
-step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');}
+step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true);}
+step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');}
 step "s1insert" { INSERT INTO do_write DEFAULT VALUES; }
 
 session "s2"
diff --git a/contrib/test_decoding/sql/twophase.sql b/contrib/test_decoding/sql/twophase.sql
index 894e4f5..17ada0f 100644
--- a/contrib/test_decoding/sql/twophase.sql
+++ b/contrib/test_decoding/sql/twophase.sql
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE test_prepared1(id integer primary key);
 CREATE TABLE test_prepared2(id integer primary key);
@@ -12,20 +12,20 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
 BEGIN;
@@ -38,7 +38,7 @@ FROM pg_locks
 WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that we decode correctly while an uncommitted prepared xact
 -- with ddl exists.
@@ -47,14 +47,14 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Check 'CLUSTER' (as operation that hold exclusive lock) doesn't block
 -- logical decoding.
@@ -71,11 +71,11 @@ WHERE locktype = 'relation'
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding. The
 -- call should return within a second.
 SET statement_timeout = '1s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -87,26 +87,26 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test 8:
 -- cleanup and make sure results are also empty
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/sql/twophase_stream.sql b/contrib/test_decoding/sql/twophase_stream.sql
index e9dd44f..646076d 100644
--- a/contrib/test_decoding/sql/twophase_stream.sql
+++ b/contrib/test_decoding/sql/twophase_stream.sql
@@ -1,7 +1,7 @@
 -- Test streaming of two-phase commits
 
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE stream_test(data text);
 
@@ -18,11 +18,11 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
@@ -35,11 +35,11 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 DROP TABLE stream_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 929255e..28c876d 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -164,7 +164,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	ListCell   *option;
 	TestDecodingData *data;
 	bool		enable_streaming = false;
-	bool		enable_twophase = false;
 
 	data = palloc0(sizeof(TestDecodingData));
 	data->context = AllocSetContextCreate(ctx->context,
@@ -265,16 +264,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
 								strVal(elem->arg), elem->defname)));
 		}
-		else if (strcmp(elem->defname, "two-phase-commit") == 0)
-		{
-			if (elem->arg == NULL)
-				continue;
-			else if (!parse_bool(strVal(elem->arg), &enable_twophase))
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
-								strVal(elem->arg), elem->defname)));
-		}
 		else
 		{
 			ereport(ERROR,
@@ -286,7 +275,12 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	}
 
 	ctx->streaming &= enable_streaming;
-	ctx->twophase &= enable_twophase;
+
+	/*
+	 * Disable two-phase here; it will be set in the core if it was
+	 * enabled while creating the slot.
+	 */
+	ctx->twophase = false;
 }
 
 /* cleanup this plugin's resources */
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 18d592d..5839d96 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -55,7 +55,7 @@
 
 <programlisting>
 postgres=# -- Create a slot named 'regression_slot' using the output plugin 'test_decoding'
-postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
     slot_name    |    lsn
 -----------------+-----------
  regression_slot | 0/16B1970
@@ -169,17 +169,18 @@ $ pg_recvlogical -d postgres --slot=test --drop-slot
   <para>
   The following example shows SQL interface that can be used to decode prepared
   transactions. Before you use two-phase commit commands, you must set
-  <varname>max_prepared_transactions</varname> to at least 1. You must also set
-  the option 'two-phase-commit' to 1 while calling
-  <function>pg_logical_slot_get_changes</function>. Note that we will stream
-  the entire transaction after the commit if it is not already decoded.
+  <varname>max_prepared_transactions</varname> to at least 1. You must also have
+  set the two-phase parameter to 'true' when creating the slot with
+  <function>pg_create_logical_replication_slot</function>.
+  Note that we will stream the entire transaction after the commit if it
+  is not already decoded.
   </para>
 <programlisting>
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('5');
 postgres=*# PREPARE TRANSACTION 'test_prepared1';
 
-postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -188,7 +189,7 @@ postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# COMMIT PREPARED 'test_prepared1';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
@@ -198,7 +199,7 @@ postgres=#-- you can also rollback a prepared transaction
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('6');
 postgres=*# PREPARE TRANSACTION 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/168A180 | 530 | BEGIN 530
@@ -207,7 +208,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# ROLLBACK PREPARED 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                     data                     
 -----------+-----+----------------------------------------------
  0/168A4B8 | 530 | ROLLBACK PREPARED 'test_prepared2', txid 530
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd..f6c5fc5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1318,6 +1318,7 @@ AS 'pg_create_physical_replication_slot';
 CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
     IN slot_name name, IN plugin name,
     IN temporary boolean DEFAULT false,
+    IN twophase boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)
 RETURNS RECORD
 LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index f7e0558..4a919d1 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -239,6 +239,14 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 									LogicalOutputPrepareWrite,
 									LogicalOutputWrite, NULL);
 
+		/* If twophase is set on the slot at create time, then
+		 * make sure the field in the context is also updated.
+		 */
+		if (MyReplicationSlot->data.twophase)
+		{
+			ctx->twophase = true;
+		}
+
 		/*
 		 * After the sanity checks in CreateDecodingContext, make sure the
 		 * restart_lsn is valid.  Avoid "cannot get changes" wording in this
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index eb283a8..aeec791 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -84,6 +84,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_SLOT
 %token K_RESERVE_WAL
 %token K_TEMPORARY
+%token K_TWOPHASE
 %token K_EXPORT_SNAPSHOT
 %token K_NOEXPORT_SNAPSHOT
 %token K_USE_SNAPSHOT
@@ -102,6 +103,7 @@ static SQLCmd *make_sqlcmd(void);
 %type <node>	plugin_opt_arg
 %type <str>		opt_slot var_name
 %type <boolval>	opt_temporary
+%type <boolval>	opt_twophase
 %type <list>	create_slot_opt_list
 %type <defelt>	create_slot_opt
 
@@ -242,15 +244,16 @@ create_replication_slot:
 					$$ = (Node *) cmd;
 				}
 			/* CREATE_REPLICATION_SLOT slot TEMPORARY LOGICAL plugin */
-			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_opt_list
+			| K_CREATE_REPLICATION_SLOT IDENT opt_temporary opt_twophase K_LOGICAL IDENT create_slot_opt_list
 				{
 					CreateReplicationSlotCmd *cmd;
 					cmd = makeNode(CreateReplicationSlotCmd);
 					cmd->kind = REPLICATION_KIND_LOGICAL;
 					cmd->slotname = $2;
 					cmd->temporary = $3;
-					cmd->plugin = $5;
-					cmd->options = $6;
+					cmd->twophase = $4;
+					cmd->plugin = $6;
+					cmd->options = $7;
 					$$ = (Node *) cmd;
 				}
 			;
@@ -365,6 +368,11 @@ opt_temporary:
 			| /* EMPTY */					{ $$ = false; }
 			;
 
+opt_twophase:
+			K_TWOPHASE						{ $$ = true; }
+			| /* EMPTY */					{ $$ = false; }
+			;
+
 opt_slot:
 			K_SLOT IDENT
 				{ $$ = $2; }
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index dcc3c3f..3032c28 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -103,6 +103,7 @@ RESERVE_WAL			{ return K_RESERVE_WAL; }
 LOGICAL				{ return K_LOGICAL; }
 SLOT				{ return K_SLOT; }
 TEMPORARY			{ return K_TEMPORARY; }
+TWOPHASE			{ return K_TWOPHASE; }
 EXPORT_SNAPSHOT		{ return K_EXPORT_SNAPSHOT; }
 NOEXPORT_SNAPSHOT	{ return K_NOEXPORT_SNAPSHOT; }
 USE_SNAPSHOT		{ return K_USE_SNAPSHOT; }
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fb4af2e..38c385b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -219,7 +219,7 @@ ReplicationSlotValidateName(const char *name, int elevel)
  */
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
-					  ReplicationSlotPersistency persistency)
+					  ReplicationSlotPersistency persistency, bool twophase)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -277,6 +277,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	namestrcpy(&slot->data.name, name);
 	slot->data.database = db_specific ? MyDatabaseId : InvalidOid;
 	slot->data.persistency = persistency;
+	slot->data.twophase = twophase;
 
 	/* and then data only present in shared memory */
 	slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d24bb5b..a441fa4 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -50,7 +50,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
 
 	/* acquire replication slot, this will check for conflicting names */
 	ReplicationSlotCreate(name, false,
-						  temporary ? RS_TEMPORARY : RS_PERSISTENT);
+						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
 
 	if (immediately_reserve)
 	{
@@ -124,7 +124,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  */
 static void
 create_logical_replication_slot(char *name, char *plugin,
-								bool temporary, XLogRecPtr restart_lsn,
+								bool temporary, bool twophase,
+								XLogRecPtr restart_lsn,
 								bool find_startpoint)
 {
 	LogicalDecodingContext *ctx = NULL;
@@ -140,7 +141,7 @@ create_logical_replication_slot(char *name, char *plugin,
 	 * error as well.
 	 */
 	ReplicationSlotCreate(name, true,
-						  temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, twophase);
 
 	/*
 	 * Create logical decoding context to find start point or, if we don't
@@ -177,6 +178,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	Name		name = PG_GETARG_NAME(0);
 	Name		plugin = PG_GETARG_NAME(1);
 	bool		temporary = PG_GETARG_BOOL(2);
+	bool		twophase = PG_GETARG_BOOL(3);
 	Datum		result;
 	TupleDesc	tupdesc;
 	HeapTuple	tuple;
@@ -193,6 +195,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	create_logical_replication_slot(NameStr(*name),
 									NameStr(*plugin),
 									temporary,
+									twophase,
 									InvalidXLogRecPtr,
 									true);
 
@@ -796,6 +799,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
 										temporary,
+										false,
 										src_restart_lsn,
 										false);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 8124454..9146e62 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -937,7 +937,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	if (cmd->kind == REPLICATION_KIND_PHYSICAL)
 	{
 		ReplicationSlotCreate(cmd->slotname, false,
-							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT);
+							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
+							  false);
 	}
 	else
 	{
@@ -951,7 +952,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		 * they get dropped on error as well.
 		 */
 		ReplicationSlotCreate(cmd->slotname, true,
-							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
+							  cmd->twophase);
 	}
 
 	if (cmd->kind == REPLICATION_KIND_LOGICAL)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1487710..1d9e51a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10502,10 +10502,10 @@
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',
   prosrc => 'pg_create_logical_replication_slot' },
 { oid => '4222',
   descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a25..1a933e2 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -56,6 +56,7 @@ typedef struct CreateReplicationSlotCmd
 	ReplicationKind kind;
 	char	   *plugin;
 	bool		temporary;
+	bool		twophase;
 	List	   *options;
 } CreateReplicationSlotCmd;
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 5c3fde2..f524544 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -98,6 +98,11 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	initial_consistent_point;
 
+	/*
+	 * Is the slot two-phase enabled?
+	 */
+	bool        twophase;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
@@ -199,7 +204,7 @@ extern void ReplicationSlotsShmemInit(void);
 
 /* management of individual slots */
 extern void ReplicationSlotCreate(const char *name, bool db_specific,
-								  ReplicationSlotPersistency p);
+								  ReplicationSlotPersistency p, bool twophase);
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
-- 
1.8.3.1

#60vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#55)
Re: repeated decoding of prepared transactions

On Sat, Feb 27, 2021 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Feb 26, 2021 at 7:26 PM vignesh C <vignesh21@gmail.com> wrote:

On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian <itsajin@gmail.com> wrote:

I've also added the snapshot_was_exported_at member to pg_replication_slots.
Do have a look and let me know if there are any comments.

Updated with both patches.

Thanks for fixing this and providing an updated patch. The patch applies, and
make check and make check-world pass. I verified that the issue is fixed.

Few minor comments:
+       <structfield>snapshot_was_exported_at</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       The address (<literal>LSN</literal>) at which the logical
+       slot found a consistent point at the time of slot creation.
+       <literal>NULL</literal> for physical slots.
+      </para></entry>
+     </row>

I had seen that we earlier had some discussion on naming
snapshot_was_exported_at. Can we change snapshot_was_exported_at to
snapshot_exported_lsn? I felt that if we include the lsn in the name,
the user will be able to interpret it easily, and it will also be similar
to other columns in the pg_replication_slots view.

I have recommended above changing this name to initial_consistency_at
because there are times when we don't export a snapshot and still set
this, such as when creating slots with CRS_NOEXPORT_SNAPSHOT or when
creating them via the SQL APIs. I am not sure why Ajin neither changed the
name nor responded to that comment. What is your opinion?

initial_consistency_at looks good to me. That is more understandable.

Regards,
Vignesh

#61vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#59)
Re: repeated decoding of prepared transactions

On Sat, Feb 27, 2021 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Feb 27, 2021 at 11:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian <itsajin@gmail.com> wrote:

On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian <itsajin@gmail.com> wrote:

I've also added the snapshot_was_exported_at member to pg_replication_slots.
Do have a look and let me know if there are any comments.

Updated with both patches.

Thanks, I have made some minor changes to the first patch and now it
looks good to me. The changes are as below:
1. Removed the changes related to exposing this new parameter via view
as mentioned in my previous email.
2. Changed the variable name to initial_consistent_point.
3. Ran pgindent, minor changes in comments, and modified the commit message.

Let me know what you think about these changes.

In the attached, I have just bumped SNAPBUILD_VERSION as we are
adding a new member in the SnapBuild structure.
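
To illustrate why adding a member to SnapBuild calls for a version bump, here is a minimal sketch of a version check on a serialized snapshot header. The SNAPBUILD_MAGIC and SNAPBUILD_VERSION values are taken from the patch; the SnapHeader struct and snap_header_is_usable() are hypothetical simplifications, not the actual snapbuild.c code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SNAPBUILD_MAGIC   0x51A1E001
#define SNAPBUILD_VERSION 4		/* bumped from 3 by this patch */

/* Hypothetical, trimmed-down header of a serialized snapshot file. */
typedef struct SnapHeader
{
	uint32_t	magic;
	uint32_t	version;
} SnapHeader;

/*
 * A file written with an older layout carries an older version number,
 * so it is rejected instead of being read with the new struct layout.
 */
bool
snap_header_is_usable(const SnapHeader *hdr)
{
	return hdr->magic == SNAPBUILD_MAGIC &&
		hdr->version == SNAPBUILD_VERSION;
}
```

With this kind of check, a file written under version 3 is refused rather than misinterpreted, which is why the commit message notes that existing slots break across the major release.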

Few minor comments:

git am v6-0001-Avoid-repeated-decoding-of-prepared-transactions-.patch
Applying: Avoid repeated decoding of prepared transactions after the restart.
/home/vignesh/postgres/.git/rebase-apply/patch:286: trailing whitespace.
#define SNAPBUILD_VERSION 4
warning: 1 line adds whitespace errors.

There is one whitespace error.

In commit a271a1b50e, we allowed decoding at prepare time and the prepare
was decoded again if there is a restart after decoding it. It was done
that way because we can't distinguish between the cases where we have not
decoded the prepare because it was prior to consistent snapshot or we have
decoded it earlier but restarted. To distinguish between these two cases,
we have introduced an initial_consisten_point at the slot level which is
an LSN at which we found a consistent point at the time of slot creation.

One minor typo in the commit message: initial_consisten_point should be
initial_consistent_point.
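
The rule described in the commit message can be sketched as a small predicate. This is a simplified illustration that mirrors the condition in the patch's ReorderBufferFinishPrepared() change; the function name must_resend_prepare() and the parameter name prepare_lsn are hypothetical, and XLogRecPtr is reduced to a plain 64-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* simplified stand-in for the PostgreSQL type */

/*
 * At COMMIT PREPARED time, decide whether the transaction's changes and its
 * PREPARE still have to be sent. They were skipped only if the prepare lies
 * before the slot's initial consistent point; a prepare at or after that
 * point was already decoded, even if decoding restarted since then.
 * Aborts never need the prepare to be sent again.
 */
bool
must_resend_prepare(XLogRecPtr prepare_lsn,
					XLogRecPtr initial_consistent_point,
					bool is_commit)
{
	return is_commit && prepare_lsn < initial_consistent_point;
}
```

Because the consistent point is stored with the slot, the answer stays the same across restarts, which is what removes the repeated decoding.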

Regards,
Vignesh

#62Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#59)
2 attachment(s)
Re: repeated decoding of prepared transactions

On Sat, Feb 27, 2021 at 11:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Few comments on 0002 patch:
=========================
1.
+
+ /*
+ * Disable two-phase here, it will be set in the core if it was
+ * enabled whole creating the slot.
+ */
+ ctx->twophase = false;

Typo, /whole/while. I think we don't need to initialize this variable
here at all.

2.
+ /* If twophase is set on the slot at create time, then
+ * make sure the field in the context is also updated
+ */
+ if (MyReplicationSlot->data.twophase)
+ {
+ ctx->twophase = true;
+ }
+

For multi-line comments, the first line of comment should be empty.
Also, I think this is not the right place because the WALSender path
needs to set it separately. I guess you can set it in
CreateInitDecodingContext/CreateDecodingContext by doing something
like

ctx->twophase &= MyReplicationSlot->data.twophase

Updated accordingly.
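
For readers following along, a minimal sketch of what the suggested AND-assignment achieves. The SlotData and DecodingContext structs and apply_slot_twophase() are hypothetical stand-ins, not the real ReplicationSlot or LogicalDecodingContext definitions; it assumes ctx->twophase has already been set according to whether the plugin provides the two-phase callbacks.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the real PostgreSQL structs. */
typedef struct SlotData
{
	bool		twophase;	/* fixed when the slot is created */
} SlotData;

typedef struct DecodingContext
{
	bool		twophase;	/* true if the plugin supports two-phase decoding */
} DecodingContext;

/*
 * Two-phase decoding stays enabled only if the plugin supports it AND the
 * slot was created with the option. The AND-assignment can clear the flag
 * but never set it.
 */
void
apply_slot_twophase(DecodingContext *ctx, const SlotData *slot)
{
	ctx->twophase &= slot->twophase;
}
```

A plain assignment from the slot would instead enable two-phase decoding for a plugin that lacks the callbacks, which is presumably why the AND form was suggested.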

3. I think we can support this option at the protocol level in a
separate patch where we need to allow it via replication commands (say
when we support it in CreateSubscription). Right now, there is nothing
to test all the code you have added in repl_gram.y.

Removed that.

4. I think we can expose this new option via pg_replication_slots.

Done. Added.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v7-0001-Avoid-repeated-decoding-of-prepared-transactions-.patch (application/octet-stream)
From 12fde882b030338484190aa9bad6aa5619f96a2c Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Sun, 28 Feb 2021 04:47:09 -0500
Subject: [PATCH v7] Avoid repeated decoding of prepared transactions after a
 restart.

In commit a271a1b50e, we allowed decoding at prepare time and the prepare
was decoded again if there is a restart after decoding it. It was done
that way because we can't distinguish between the cases where we have not
decoded the prepare because it was prior to consistent snapshot or we have
decoded it earlier but restarted. To distinguish between these two cases,
we have introduced an initial_consistent_point at the slot level which is
an LSN at which we found a consistent point at the time of slot creation.
This is also the point where we have exported a snapshot for the initial
copy. So, prepared transactions prior to this point are sent along with
commit prepared.

This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It
will break existing slots which is fine in a major release.

Author: Ajin Cherian, based on idea by Andres Freund
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
---
 contrib/test_decoding/expected/twophase.out        | 38 +++++++---------------
 contrib/test_decoding/expected/twophase_stream.out | 28 ++--------------
 doc/src/sgml/logicaldecoding.sgml                  |  9 ++---
 src/backend/replication/logical/decode.c           |  2 ++
 src/backend/replication/logical/logical.c          |  3 +-
 src/backend/replication/logical/reorderbuffer.c    | 10 +++---
 src/backend/replication/logical/snapbuild.c        | 26 +++++++++++++--
 src/include/replication/reorderbuffer.h            |  1 +
 src/include/replication/slot.h                     |  7 ++++
 src/include/replication/snapbuild.h                |  4 ++-
 10 files changed, 61 insertions(+), 67 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index afa3566..8a1d06d 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#1';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                        data                        
-----------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:1
- table public.test_prepared1: INSERT: id[integer]:2
- PREPARE TRANSACTION 'test_prepared#1'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#1'
-(5 rows)
+(1 row)
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
@@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared#3';
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                  data                                   
--------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar'
- PREPARE TRANSACTION 'test_prepared#3'
+               data                
+-----------------------------------
  COMMIT PREPARED 'test_prepared#3'
-(4 rows)
+(1 row)
 
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
@@ -158,14 +151,10 @@ RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                                   data                                    
----------------------------------------------------------------------------
- BEGIN
- table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol'
- table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2'
- PREPARE TRANSACTION 'test_prepared_lock'
+                 data                 
+--------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
-(5 rows)
+(1 row)
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -188,13 +177,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
-                            data                            
-------------------------------------------------------------
- BEGIN
- table public.test_prepared_savepoint: INSERT: a[integer]:1
- PREPARE TRANSACTION 'test_prepared_savepoint'
+                   data                    
+-------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
-(4 rows)
+(1 row)
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index 3acc4acd3..d54e640 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -60,32 +60,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
-                            data                             
--------------------------------------------------------------
- BEGIN
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa1'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa2'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa3'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa4'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa5'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa6'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa7'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa8'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa9'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa10'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa11'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa12'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa13'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa14'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa15'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa16'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa17'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa18'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa19'
- table public.stream_test: INSERT: data[text]:'aaaaaaaaaa20'
- PREPARE TRANSACTION 'test1'
+          data           
+-------------------------
  COMMIT PREPARED 'test1'
-(23 rows)
+(1 row)
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 6455664..18d592d 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -191,9 +191,6 @@ postgres=# COMMIT PREPARED 'test_prepared1';
 postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
- 0/1689DC0 | 529 | BEGIN 529
- 0/1689DC0 | 529 | table public.data: INSERT: id[integer]:3 data[text]:'5'
- 0/1689FC0 | 529 | PREPARE TRANSACTION 'test_prepared1', txid 529
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
 (4 row)
 
@@ -822,10 +819,8 @@ typedef bool (*LogicalDecodeFilterPrepareCB) (struct LogicalDecodingContext *ctx
       <parameter>gid</parameter> field, which is part of the
       <parameter>txn</parameter> parameter, can be used in this callback to
       check if the plugin has already received this <command>PREPARE</command>
-      in which case it can skip the remaining changes of the transaction.
-      This can only happen if the user restarts the decoding after receiving
-      the <command>PREPARE</command> for a transaction but before receiving
-      the <command>COMMIT PREPARED</command>, say because of some error.
+      in which case it can either error out or skip the remaining changes of 
+      the transaction.
       <programlisting>
        typedef void (*LogicalDecodeBeginPrepareCB) (struct LogicalDecodingContext *ctx,
                                                     ReorderBufferTXN *txn);
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df0..423188d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -716,6 +716,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	if (two_phase)
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									SnapBuildInitialConsistentPoint(ctx->snapshot_builder),
 									commit_time, origin_id, origin_lsn,
 									parsed->twophase_gid, true);
 	}
@@ -854,6 +855,7 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	{
 		ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, buf->endptr,
+									InvalidXLogRecPtr,
 									abort_time, origin_id, origin_lsn,
 									parsed->twophase_gid, false);
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index baeb45f..3f6d723 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -207,7 +207,7 @@ StartupDecodingContext(List *output_plugin_options,
 	ctx->reorder = ReorderBufferAllocate();
 	ctx->snapshot_builder =
 		AllocateSnapshotBuilder(ctx->reorder, xmin_horizon, start_lsn,
-								need_full_snapshot);
+								need_full_snapshot, slot->data.initial_consistent_point);
 
 	ctx->reorder->private_data = ctx;
 
@@ -590,6 +590,7 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 
 	SpinLockAcquire(&slot->mutex);
 	slot->data.confirmed_flush = ctx->reader->EndRecPtr;
+	slot->data.initial_consistent_point = ctx->reader->EndRecPtr;
 	SpinLockRelease(&slot->mutex);
 }
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c3b9632..91600ac 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2672,6 +2672,7 @@ ReorderBufferPrepare(ReorderBuffer *rb, TransactionId xid,
 void
 ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 							XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+							XLogRecPtr initial_consistent_point,
 							TimestampTz commit_time, RepOriginId origin_id,
 							XLogRecPtr origin_lsn, char *gid, bool is_commit)
 {
@@ -2698,12 +2699,11 @@ ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 	/*
 	 * It is possible that this transaction is not decoded at prepare time
 	 * either because by that time we didn't have a consistent snapshot or it
-	 * was decoded earlier but we have restarted. We can't distinguish between
-	 * those two cases so we send the prepare in both the cases and let
-	 * downstream decide whether to process or skip it. We don't need to
-	 * decode the xact for aborts if it is not done already.
+	 * was decoded earlier but we have restarted. We only need to send the
+	 * prepare if it was not decoded earlier. We don't need to decode the xact
+	 * for aborts if it is not done already.
 	 */
-	if (!rbtxn_prepared(txn) && is_commit)
+	if ((txn->final_lsn < initial_consistent_point) && is_commit)
 	{
 		txn->txn_flags |= RBTXN_PREPARE;
 
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index e117887..ed3acad 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -165,6 +165,17 @@ struct SnapBuild
 	XLogRecPtr	start_decoding_at;
 
 	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported a snapshot for the
+	 * initial copy.
+	 *
+	 * The prepared transactions that are not covered by initial snapshot
+	 * needs to be sent later along with commit prepared and they must be
+	 * before this point.
+	 */
+	XLogRecPtr	initial_consistent_point;
+
+	/*
 	 * Don't start decoding WAL until the "xl_running_xacts" information
 	 * indicates there are no running xids with an xid smaller than this.
 	 */
@@ -269,7 +280,8 @@ SnapBuild *
 AllocateSnapshotBuilder(ReorderBuffer *reorder,
 						TransactionId xmin_horizon,
 						XLogRecPtr start_lsn,
-						bool need_full_snapshot)
+						bool need_full_snapshot,
+						XLogRecPtr initial_consistent_point)
 {
 	MemoryContext context;
 	MemoryContext oldcontext;
@@ -297,6 +309,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->start_decoding_at = start_lsn;
 	builder->building_full_snapshot = need_full_snapshot;
+	builder->initial_consistent_point = initial_consistent_point;
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -357,6 +370,15 @@ SnapBuildCurrentState(SnapBuild *builder)
 }
 
 /*
+ * Return the LSN at which the snapshot was exported
+ */
+XLogRecPtr
+SnapBuildInitialConsistentPoint(SnapBuild *builder)
+{
+	return builder->initial_consistent_point;
+}
+
+/*
  * Should the contents of transaction ending at 'ptr' be decoded?
  */
 bool
@@ -1422,7 +1444,7 @@ typedef struct SnapBuildOnDisk
 	offsetof(SnapBuildOnDisk, version)
 
 #define SNAPBUILD_MAGIC 0x51A1E001
-#define SNAPBUILD_VERSION 3
+#define SNAPBUILD_VERSION 4
 
 /*
  * Store/Load a snapshot from disk, depending on the snapshot builder's state.
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index bab31bf..565a961 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -643,6 +643,7 @@ void		ReorderBufferCommit(ReorderBuffer *, TransactionId,
 								TimestampTz commit_time, RepOriginId origin_id, XLogRecPtr origin_lsn);
 void		ReorderBufferFinishPrepared(ReorderBuffer *rb, TransactionId xid,
 										XLogRecPtr commit_lsn, XLogRecPtr end_lsn,
+										XLogRecPtr initial_consistent_point,
 										TimestampTz commit_time,
 										RepOriginId origin_id, XLogRecPtr origin_lsn,
 										char *gid, bool is_commit);
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 38a9a0b..5c3fde2 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -91,6 +91,13 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	confirmed_flush;
 
+	/*
+	 * LSN at which we found a consistent point at the time of slot creation.
+	 * This is also the point where we have exported a snapshot for the
+	 * initial copy.
+	 */
+	XLogRecPtr	initial_consistent_point;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index d9f187a..fbabce6 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -61,7 +61,8 @@ extern void CheckPointSnapBuild(void);
 
 extern SnapBuild *AllocateSnapshotBuilder(struct ReorderBuffer *cache,
 										  TransactionId xmin_horizon, XLogRecPtr start_lsn,
-										  bool need_full_snapshot);
+										  bool need_full_snapshot,
+										  XLogRecPtr initial_consistent_point);
 extern void FreeSnapshotBuilder(SnapBuild *cache);
 
 extern void SnapBuildSnapDecRefcount(Snapshot snap);
@@ -75,6 +76,7 @@ extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
 											TransactionId xid);
 
 extern bool SnapBuildXactNeedsSkip(SnapBuild *snapstate, XLogRecPtr ptr);
+extern XLogRecPtr SnapBuildInitialConsistentPoint(SnapBuild *builder);
 
 extern void SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn,
 							   TransactionId xid, int nsubxacts,
-- 
1.8.3.1

v7-0002-Add-option-to-enable-two-phase-commits-in-pg_crea.patch (application/octet-stream)
From 2e37345bf690b0a3d62b5e3f0b8b135159726f98 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Sun, 28 Feb 2021 05:07:19 -0500
Subject: [PATCH v7] Add option to enable two-phase commits in
 pg_create_logical_replication_slot.

This commit changes the way two-phase commits are enabled in test_decoding plugin.
Two-phase commits can now only be enabled while creating the slot using
pg_create_logical_replication_slot() and cannot be set using pg_logical_slot_get_changes().
For this the API pg_create_logical_replication_slot() is modified to take one more
optional boolean parameter 'twophase', which when set to TRUE enables two-phase commits.
The parameter defaults to FALSE.
---
 contrib/test_decoding/expected/twophase.out        | 34 +++++++++++-----------
 .../test_decoding/expected/twophase_snapshot.out   |  6 ++--
 contrib/test_decoding/expected/twophase_stream.out | 10 +++----
 contrib/test_decoding/specs/twophase_snapshot.spec |  4 +--
 contrib/test_decoding/sql/twophase.sql             | 34 +++++++++++-----------
 contrib/test_decoding/sql/twophase_stream.sql      | 10 +++----
 contrib/test_decoding/test_decoding.c              | 13 +--------
 doc/src/sgml/catalogs.sgml                         |  9 ++++++
 doc/src/sgml/logicaldecoding.sgml                  | 19 ++++++------
 src/backend/catalog/system_views.sql               |  4 ++-
 src/backend/replication/logical/logical.c          |  6 ++++
 src/backend/replication/slot.c                     |  3 +-
 src/backend/replication/slotfuncs.c                | 14 ++++++---
 src/backend/replication/walsender.c                |  6 ++--
 src/include/catalog/pg_proc.dat                    | 14 ++++-----
 src/include/nodes/replnodes.h                      |  1 +
 src/include/replication/slot.h                     |  7 ++++-
 src/test/regress/expected/rules.out                |  5 ++--
 18 files changed, 111 insertions(+), 88 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index 8a1d06d..e5e0f96 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -15,14 +15,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -32,7 +32,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (4 rows)
 
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#1'
@@ -42,7 +42,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -51,7 +51,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                 data                 
 -------------------------------------
  ROLLBACK PREPARED 'test_prepared#2'
@@ -74,7 +74,7 @@ WHERE locktype = 'relation'
 (2 rows)
 
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                   data                                   
 -------------------------------------------------------------------------
  BEGIN
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -98,7 +98,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#3'
@@ -107,7 +107,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                
 --------------------------------------------------------------------
  BEGIN
@@ -138,7 +138,7 @@ WHERE locktype = 'relation'
 
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding.
 SET statement_timeout = '180s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                    data                                    
 ---------------------------------------------------------------------------
  BEGIN
@@ -150,7 +150,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                  data                 
 --------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
@@ -166,7 +166,7 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                             data                            
 ------------------------------------------------------------
  BEGIN
@@ -176,7 +176,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                    data                    
 -------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
@@ -187,14 +187,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                 
 ---------------------------------------------------------------------
  BEGIN
@@ -207,7 +207,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
diff --git a/contrib/test_decoding/expected/twophase_snapshot.out b/contrib/test_decoding/expected/twophase_snapshot.out
index 14d9387..0e8e1f5 100644
--- a/contrib/test_decoding/expected/twophase_snapshot.out
+++ b/contrib/test_decoding/expected/twophase_snapshot.out
@@ -6,7 +6,7 @@ step s2txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
 
 f              
-step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); <waiting ...>
+step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true); <waiting ...>
 step s3b: BEGIN;
 step s3txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
@@ -22,14 +22,14 @@ step s1init: <... completed>
 
 init           
 step s1insert: INSERT INTO do_write DEFAULT VALUES;
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
 table public.do_write: INSERT: id[integer]:2
 COMMIT         
 step s2cp: COMMIT PREPARED 'test1';
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index d54e640..b08bb0e 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -1,6 +1,6 @@
 -- Test streaming of two-phase commits
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -28,7 +28,7 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -59,7 +59,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
           data           
 -------------------------
  COMMIT PREPARED 'test1'
@@ -81,7 +81,7 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                             data                             
 -------------------------------------------------------------
  BEGIN
diff --git a/contrib/test_decoding/specs/twophase_snapshot.spec b/contrib/test_decoding/specs/twophase_snapshot.spec
index 3e70040..e8d9567 100644
--- a/contrib/test_decoding/specs/twophase_snapshot.spec
+++ b/contrib/test_decoding/specs/twophase_snapshot.spec
@@ -15,8 +15,8 @@ teardown
 session "s1"
 setup { SET synchronous_commit=on; }
 
-step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');}
-step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');}
+step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true);}
+step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');}
 step "s1insert" { INSERT INTO do_write DEFAULT VALUES; }
 
 session "s2"
diff --git a/contrib/test_decoding/sql/twophase.sql b/contrib/test_decoding/sql/twophase.sql
index dacedfe..05f18e8 100644
--- a/contrib/test_decoding/sql/twophase.sql
+++ b/contrib/test_decoding/sql/twophase.sql
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE test_prepared1(id integer primary key);
 CREATE TABLE test_prepared2(id integer primary key);
@@ -12,20 +12,20 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
 BEGIN;
@@ -38,7 +38,7 @@ FROM pg_locks
 WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that we decode correctly while an uncommitted prepared xact
 -- with ddl exists.
@@ -47,14 +47,14 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Check 'CLUSTER' (as operation that hold exclusive lock) doesn't block
 -- logical decoding.
@@ -70,11 +70,11 @@ WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding.
 SET statement_timeout = '180s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -86,26 +86,26 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test 8:
 -- cleanup and make sure results are also empty
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/sql/twophase_stream.sql b/contrib/test_decoding/sql/twophase_stream.sql
index e9dd44f..646076d 100644
--- a/contrib/test_decoding/sql/twophase_stream.sql
+++ b/contrib/test_decoding/sql/twophase_stream.sql
@@ -1,7 +1,7 @@
 -- Test streaming of two-phase commits
 
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE stream_test(data text);
 
@@ -18,11 +18,11 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
@@ -35,11 +35,11 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 DROP TABLE stream_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 929255e..3dfa503 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -164,7 +164,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	ListCell   *option;
 	TestDecodingData *data;
 	bool		enable_streaming = false;
-	bool		enable_twophase = false;
 
 	data = palloc0(sizeof(TestDecodingData));
 	data->context = AllocSetContextCreate(ctx->context,
@@ -265,16 +264,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
 								strVal(elem->arg), elem->defname)));
 		}
-		else if (strcmp(elem->defname, "two-phase-commit") == 0)
-		{
-			if (elem->arg == NULL)
-				continue;
-			else if (!parse_bool(strVal(elem->arg), &enable_twophase))
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
-								strVal(elem->arg), elem->defname)));
-		}
 		else
 		{
 			ereport(ERROR,
@@ -286,7 +275,7 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	}
 
 	ctx->streaming &= enable_streaming;
-	ctx->twophase &= enable_twophase;
+
 }
 
 /* cleanup this plugin's resources */
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index db29905..2d61c351 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11529,6 +11529,15 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        is <literal>-1</literal>.
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>twophase</structfield> <type>bool</type>
+      </para>
+      <para>
+       True if decoding of prepared transactions (two-phase commit) is enabled for this slot.
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 18d592d..5839d96 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -55,7 +55,7 @@
 
 <programlisting>
 postgres=# -- Create a slot named 'regression_slot' using the output plugin 'test_decoding'
-postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
     slot_name    |    lsn
 -----------------+-----------
  regression_slot | 0/16B1970
@@ -169,17 +169,18 @@ $ pg_recvlogical -d postgres --slot=test --drop-slot
   <para>
   The following example shows SQL interface that can be used to decode prepared
   transactions. Before you use two-phase commit commands, you must set
-  <varname>max_prepared_transactions</varname> to at least 1. You must also set
-  the option 'two-phase-commit' to 1 while calling
-  <function>pg_logical_slot_get_changes</function>. Note that we will stream
-  the entire transaction after the commit if it is not already decoded.
+  <varname>max_prepared_transactions</varname> to at least 1. You must also have
+  set the two-phase parameter to 'true' when creating the slot with
+  <function>pg_create_logical_replication_slot</function>.
+  Note that we will stream the entire transaction after the commit if it
+  is not already decoded.
   </para>
 <programlisting>
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('5');
 postgres=*# PREPARE TRANSACTION 'test_prepared1';
 
-postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -188,7 +189,7 @@ postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# COMMIT PREPARED 'test_prepared1';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
@@ -198,7 +199,7 @@ postgres=#-- you can also rollback a prepared transaction
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('6');
 postgres=*# PREPARE TRANSACTION 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/168A180 | 530 | BEGIN 530
@@ -207,7 +208,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# ROLLBACK PREPARED 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                     data                     
 -----------+-----+----------------------------------------------
  0/168A4B8 | 530 | ROLLBACK PREPARED 'test_prepared2', txid 530
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd..c3c7583 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -894,7 +894,8 @@ CREATE VIEW pg_replication_slots AS
             L.restart_lsn,
             L.confirmed_flush_lsn,
             L.wal_status,
-            L.safe_wal_size
+            L.safe_wal_size,
+            L.twophase
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
@@ -1318,6 +1319,7 @@ AS 'pg_create_physical_replication_slot';
 CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
     IN slot_name name, IN plugin name,
     IN temporary boolean DEFAULT false,
+    IN twophase boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)
 RETURNS RECORD
 LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 3f6d723..28d18e8 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -533,6 +533,12 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 
 	ctx->reorder->output_rewrites = ctx->options.receive_rewrites;
 
+	/*
+	 * Keep two-phase decoding enabled in the context only if it was
+	 * also enabled on the slot at creation time.
+	 */
+	ctx->twophase &= MyReplicationSlot->data.twophase;
+
 	ereport(LOG,
 			(errmsg("starting logical decoding for slot \"%s\"",
 					NameStr(slot->data.name)),
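Editorial note on the logical.c hunk above: the line `ctx->twophase &= MyReplicationSlot->data.twophase;` can only clear the context flag, never set it. The following is a minimal standalone sketch of that gating, not PostgreSQL code. `SketchSlot`, `SketchContext` and both functions are hypothetical stand-ins that carry only the one field used here, and the sketch assumes the context flag was already initialized elsewhere before this point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real structs; only the field used here. */
typedef struct SketchSlot
{
	bool		twophase;		/* chosen when the slot is created */
} SketchSlot;

typedef struct SketchContext
{
	bool		twophase;		/* assumed to be initialized before gating */
} SketchContext;

/* Same shape as the patch line: the slot flag can only clear the context flag. */
static void
sketch_apply_slot_twophase(SketchContext *ctx, const SketchSlot *slot)
{
	ctx->twophase &= slot->twophase;
}

/* Walk all four combinations; only (true, true) leaves the flag enabled. */
static void
sketch_check_all(void)
{
	for (int c = 0; c <= 1; c++)
	{
		for (int s = 0; s <= 1; s++)
		{
			SketchContext ctx = {.twophase = (c != 0)};
			SketchSlot	slot = {.twophase = (s != 0)};

			sketch_apply_slot_twophase(&ctx, &slot);
			assert(ctx.twophase == (c && s));
		}
	}
}
```

Under these assumptions, a slot created without the flag always ends up with two-phase decoding disabled in the context, which matches the removal of the per-call 'two-phase-commit' option in test_decoding.c.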
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fb4af2e..38c385b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -219,7 +219,7 @@ ReplicationSlotValidateName(const char *name, int elevel)
  */
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
-					  ReplicationSlotPersistency persistency)
+					  ReplicationSlotPersistency persistency, bool twophase)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -277,6 +277,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	namestrcpy(&slot->data.name, name);
 	slot->data.database = db_specific ? MyDatabaseId : InvalidOid;
 	slot->data.persistency = persistency;
+	slot->data.twophase = twophase;
 
 	/* and then data only present in shared memory */
 	slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d24bb5b..5b2864a 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -50,7 +50,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
 
 	/* acquire replication slot, this will check for conflicting names */
 	ReplicationSlotCreate(name, false,
-						  temporary ? RS_TEMPORARY : RS_PERSISTENT);
+						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
 
 	if (immediately_reserve)
 	{
@@ -124,7 +124,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  */
 static void
 create_logical_replication_slot(char *name, char *plugin,
-								bool temporary, XLogRecPtr restart_lsn,
+								bool temporary, bool twophase,
+								XLogRecPtr restart_lsn,
 								bool find_startpoint)
 {
 	LogicalDecodingContext *ctx = NULL;
@@ -140,7 +141,7 @@ create_logical_replication_slot(char *name, char *plugin,
 	 * error as well.
 	 */
 	ReplicationSlotCreate(name, true,
-						  temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, twophase);
 
 	/*
 	 * Create logical decoding context to find start point or, if we don't
@@ -177,6 +178,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	Name		name = PG_GETARG_NAME(0);
 	Name		plugin = PG_GETARG_NAME(1);
 	bool		temporary = PG_GETARG_BOOL(2);
+	bool		twophase = PG_GETARG_BOOL(3);
 	Datum		result;
 	TupleDesc	tupdesc;
 	HeapTuple	tuple;
@@ -193,6 +195,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	create_logical_replication_slot(NameStr(*name),
 									NameStr(*plugin),
 									temporary,
+									twophase,
 									InvalidXLogRecPtr,
 									true);
 
@@ -236,7 +239,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 13
+#define PG_GET_REPLICATION_SLOTS_COLS 14
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -432,6 +435,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 			values[i++] = Int64GetDatum(failLSN - currlsn);
 		}
 
+		values[i++] = BoolGetDatum(slot_contents.data.twophase);
+
 		Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
@@ -796,6 +801,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
 										temporary,
+										false,
 										src_restart_lsn,
 										false);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 8124454..9146e62 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -937,7 +937,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	if (cmd->kind == REPLICATION_KIND_PHYSICAL)
 	{
 		ReplicationSlotCreate(cmd->slotname, false,
-							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT);
+							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
+							  false);
 	}
 	else
 	{
@@ -951,7 +952,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		 * they get dropped on error as well.
 		 */
 		ReplicationSlotCreate(cmd->slotname, true,
-							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
+							  cmd->twophase);
 	}
 
 	if (cmd->kind == REPLICATION_KIND_LOGICAL)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1487710..b83abda 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10496,16 +10496,16 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,twophase}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',
   prosrc => 'pg_create_logical_replication_slot' },
 { oid => '4222',
   descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a25..1a933e2 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -56,6 +56,7 @@ typedef struct CreateReplicationSlotCmd
 	ReplicationKind kind;
 	char	   *plugin;
 	bool		temporary;
+	bool		twophase;
 	List	   *options;
 } CreateReplicationSlotCmd;
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 5c3fde2..f524544 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -98,6 +98,11 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	initial_consistent_point;
 
+	/*
+	 * Is the slot two-phase enabled?
+	 */
+	bool        twophase;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
@@ -199,7 +204,7 @@ extern void ReplicationSlotsShmemInit(void);
 
 /* management of individual slots */
 extern void ReplicationSlotCreate(const char *name, bool db_specific,
-								  ReplicationSlotPersistency p);
+								  ReplicationSlotPersistency p, bool twophase);
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 10a1f34..9a054d1 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1477,8 +1477,9 @@ pg_replication_slots| SELECT l.slot_name,
     l.restart_lsn,
     l.confirmed_flush_lsn,
     l.wal_status,
-    l.safe_wal_size
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size)
+    l.safe_wal_size,
+    l.twophase
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, twophase)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
1.8.3.1

#63vignesh C
vignesh21@gmail.com
In reply to: Ajin Cherian (#62)
Re: repeated decoding of prepared transactions

On Mon, Mar 1, 2021 at 7:23 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Sat, Feb 27, 2021 at 11:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Few comments on 0002 patch:
=========================
1.
+
+ /*
+ * Disable two-phase here, it will be set in the core if it was
+ * enabled whole creating the slot.
+ */
+ ctx->twophase = false;

Typo, /whole/while. I think we don't need to initialize this variable
here at all.

2.
+ /* If twophase is set on the slot at create time, then
+ * make sure the field in the context is also updated
+ */
+ if (MyReplicationSlot->data.twophase)
+ {
+ ctx->twophase = true;
+ }
+

For multi-line comments, the first line of comment should be empty.
Also, I think this is not the right place because the WALSender path
needs to set it separately. I guess you can set it in
CreateInitDecodingContext/CreateDecodingContext by doing something
like

ctx->twophase &= MyReplicationSlot->data.twophase

Updated accordingly.

3. I think we can support this option at the protocol level in a
separate patch where we need to allow it via replication commands (say
when we support it in CreateSubscription). Right now, there is nothing
to test all the code you have added in repl_gram.y.

Removed that.

4. I think we can expose this new option via pg_replication_slots.

Done. Added,

v7-0002-Add-option-to-enable-two-phase-commits-in-pg_crea.patch adds
twophase to pg_create_logical_replication_slot. I feel this option
should be documented in doc/src/sgml/func.sgml.

Regards,
Vignesh

#64Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#62)
Re: repeated decoding of prepared transactions

On Mon, Mar 1, 2021 at 7:23 AM Ajin Cherian <itsajin@gmail.com> wrote:

Pushed the first patch in the series.

--
With Regards,
Amit Kapila.

#65Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#62)
Re: repeated decoding of prepared transactions

On Mon, Mar 1, 2021 at 7:23 AM Ajin Cherian <itsajin@gmail.com> wrote:

Few minor comments on 0002 patch
=============================
1.
ctx->streaming &= enable_streaming;
- ctx->twophase &= enable_twophase;
+
}

Spurious line addition.

2.
-  proallargtypes =>
'{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames =>
'{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
+  proallargtypes =>
'{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames =>
'{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,twophase}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name
name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',

I think it is better to use two_phase here and at other places as well
to be consistent with similar parameters.

3.
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -894,7 +894,8 @@ CREATE VIEW pg_replication_slots AS
             L.restart_lsn,
             L.confirmed_flush_lsn,
             L.wal_status,
-            L.safe_wal_size
+            L.safe_wal_size,
+ L.twophase
     FROM pg_get_replication_slots() AS L

Indentation issue. Here, you need to use spaces instead of tabs.

4.
@@ -533,6 +533,12 @@ CreateDecodingContext(XLogRecPtr start_lsn,

ctx->reorder->output_rewrites = ctx->options.receive_rewrites;

+ /*
+ * If twophase is set on the slot at create time, then
+ * make sure the field in the context is also updated.
+ */
+ ctx->twophase &= MyReplicationSlot->data.twophase;
+

Why didn't you make a similar change in CreateInitDecodingContext, as I
already suggested in my previous email? If we don't make that
change, then during slot initialization two_phase will always be true
even though the user passed it in as false. It looks inconsistent, and
although there is no direct problem due to that, it could cause
problems in the future.
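
The concern in point 4 can be sketched roughly as follows. This is only an illustration with hypothetical mock types (MockSlotData, MockDecodingContext, and both mock_setup_* functions are not PostgreSQL code), and it assumes the context flag starts out reflecting whether the output plugin provides the two-phase callbacks. Because "&=" can only clear the flag, the gated version honors a slot created with twophase = false, while the ungated version leaves the flag true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ReplicationSlotPersistentData. */
typedef struct MockSlotData
{
	bool		twophase;	/* fixed when the slot is created */
} MockSlotData;

/* Hypothetical stand-in for LogicalDecodingContext. */
typedef struct MockDecodingContext
{
	bool		twophase;
} MockDecodingContext;

/*
 * Gated setup: start from what the plugin supports (an assumption of this
 * sketch), then narrow it by the slot setting.  "&=" can clear the flag
 * but can never set it.
 */
void
mock_setup_twophase(MockDecodingContext *ctx, const MockSlotData *slot,
					bool plugin_has_twophase_callbacks)
{
	assert(ctx != NULL && slot != NULL);

	ctx->twophase = plugin_has_twophase_callbacks;
	ctx->twophase &= slot->twophase;
}

/*
 * Ungated setup: the slot setting is never consulted, so the flag stays
 * true whenever the plugin supports two-phase, whatever the user asked for.
 */
void
mock_setup_twophase_ungated(MockDecodingContext *ctx,
							bool plugin_has_twophase_callbacks)
{
	assert(ctx != NULL);

	ctx->twophase = plugin_has_twophase_callbacks;
}
```

With a slot whose twophase field is false and a plugin that supports two-phase, mock_setup_twophase() leaves ctx.twophase false, whereas mock_setup_twophase_ungated() leaves it true. That is the inconsistency one would expect if only one of the two context-creation paths applied the gating.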

--
With Regards,
Amit Kapila.

#66Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#65)
1 attachment(s)
Re: repeated decoding of prepared transactions

On Mon, Mar 1, 2021 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Few minor comments on 0002 patch
=============================
1.
ctx->streaming &= enable_streaming;
- ctx->twophase &= enable_twophase;
+
}

Spurious line addition.

Deleted.

2.
-  proallargtypes =>
'{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames =>
'{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
+  proallargtypes =>
'{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames =>
'{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,twophase}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name
name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',

I think it is better to use two_phase here and at other places as well
to be consistent with similar parameters.

Updated as requested.

3.
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -894,7 +894,8 @@ CREATE VIEW pg_replication_slots AS
L.restart_lsn,
L.confirmed_flush_lsn,
L.wal_status,
-            L.safe_wal_size
+            L.safe_wal_size,
+ L.twophase
FROM pg_get_replication_slots() AS L

Indentation issue. Here, you need to use spaces instead of tabs.

Updated.

4.
@@ -533,6 +533,12 @@ CreateDecodingContext(XLogRecPtr start_lsn,

ctx->reorder->output_rewrites = ctx->options.receive_rewrites;

+ /*
+ * If twophase is set on the slot at create time, then
+ * make sure the field in the context is also updated.
+ */
+ ctx->twophase &= MyReplicationSlot->data.twophase;
+

Why didn't you make a similar change in CreateInitDecodingContext, as I
already suggested in my previous email? If we don't make that
change, then during slot initialization two_phase will always be true
even though the user passed it in as false. It looks inconsistent, and
although there is no direct problem due to that, it could cause
problems in the future.

Updated.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

v8-0001-Add-option-to-enable-two-phase-commits-in-pg_crea.patch (application/octet-stream)
From 7b4b8c9dbc49dd79defbb55e4b9dae409cb5c368 Mon Sep 17 00:00:00 2001
From: Ajin Cherian <ajinc@fast.au.fujitsu.com>
Date: Mon, 1 Mar 2021 19:42:56 -0500
Subject: [PATCH v8] Add option to enable two-phase commits in
 pg_create_logical_replication_slot.

This commit changes the way two-phase commits are enabled in the test_decoding plugin.
Two-phase commits can now only be enabled while creating the slot using
pg_create_logical_replication_slot() and cannot be set using pg_logical_slot_get_changes().
For this, the pg_create_logical_replication_slot() API is modified to take one more
optional boolean parameter 'twophase', which when set to TRUE enables two-phase commits.
The parameter defaults to FALSE.
---
 contrib/test_decoding/expected/twophase.out        | 34 +++++++++++-----------
 .../test_decoding/expected/twophase_snapshot.out   |  6 ++--
 contrib/test_decoding/expected/twophase_stream.out | 10 +++----
 contrib/test_decoding/specs/twophase_snapshot.spec |  4 +--
 contrib/test_decoding/sql/twophase.sql             | 34 +++++++++++-----------
 contrib/test_decoding/sql/twophase_stream.sql      | 10 +++----
 contrib/test_decoding/test_decoding.c              | 12 --------
 doc/src/sgml/catalogs.sgml                         |  9 ++++++
 doc/src/sgml/logicaldecoding.sgml                  | 19 ++++++------
 src/backend/catalog/system_views.sql               |  4 ++-
 src/backend/replication/logical/logical.c          | 12 ++++++++
 src/backend/replication/slot.c                     |  3 +-
 src/backend/replication/slotfuncs.c                | 14 ++++++---
 src/backend/replication/walsender.c                |  6 ++--
 src/include/catalog/pg_proc.dat                    | 14 ++++-----
 src/include/nodes/replnodes.h                      |  1 +
 src/include/replication/slot.h                     |  7 ++++-
 src/test/regress/expected/rules.out                |  5 ++--
 18 files changed, 116 insertions(+), 88 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index 8a1d06d..e5e0f96 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -15,14 +15,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -32,7 +32,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (4 rows)
 
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#1'
@@ -42,7 +42,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -51,7 +51,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                 data                 
 -------------------------------------
  ROLLBACK PREPARED 'test_prepared#2'
@@ -74,7 +74,7 @@ WHERE locktype = 'relation'
 (2 rows)
 
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                   data                                   
 -------------------------------------------------------------------------
  BEGIN
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -98,7 +98,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#3'
@@ -107,7 +107,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                
 --------------------------------------------------------------------
  BEGIN
@@ -138,7 +138,7 @@ WHERE locktype = 'relation'
 
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding.
 SET statement_timeout = '180s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                    data                                    
 ---------------------------------------------------------------------------
  BEGIN
@@ -150,7 +150,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                  data                 
 --------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
@@ -166,7 +166,7 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                             data                            
 ------------------------------------------------------------
  BEGIN
@@ -176,7 +176,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                    data                    
 -------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
@@ -187,14 +187,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                 
 ---------------------------------------------------------------------
  BEGIN
@@ -207,7 +207,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
diff --git a/contrib/test_decoding/expected/twophase_snapshot.out b/contrib/test_decoding/expected/twophase_snapshot.out
index 14d9387..0e8e1f5 100644
--- a/contrib/test_decoding/expected/twophase_snapshot.out
+++ b/contrib/test_decoding/expected/twophase_snapshot.out
@@ -6,7 +6,7 @@ step s2txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
 
 f              
-step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); <waiting ...>
+step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true); <waiting ...>
 step s3b: BEGIN;
 step s3txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
@@ -22,14 +22,14 @@ step s1init: <... completed>
 
 init           
 step s1insert: INSERT INTO do_write DEFAULT VALUES;
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
 table public.do_write: INSERT: id[integer]:2
 COMMIT         
 step s2cp: COMMIT PREPARED 'test1';
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index d54e640..b08bb0e 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -1,6 +1,6 @@
 -- Test streaming of two-phase commits
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -28,7 +28,7 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -59,7 +59,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
           data           
 -------------------------
  COMMIT PREPARED 'test1'
@@ -81,7 +81,7 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                             data                             
 -------------------------------------------------------------
  BEGIN
diff --git a/contrib/test_decoding/specs/twophase_snapshot.spec b/contrib/test_decoding/specs/twophase_snapshot.spec
index 3e70040..e8d9567 100644
--- a/contrib/test_decoding/specs/twophase_snapshot.spec
+++ b/contrib/test_decoding/specs/twophase_snapshot.spec
@@ -15,8 +15,8 @@ teardown
 session "s1"
 setup { SET synchronous_commit=on; }
 
-step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');}
-step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');}
+step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true);}
+step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');}
 step "s1insert" { INSERT INTO do_write DEFAULT VALUES; }
 
 session "s2"
diff --git a/contrib/test_decoding/sql/twophase.sql b/contrib/test_decoding/sql/twophase.sql
index dacedfe..05f18e8 100644
--- a/contrib/test_decoding/sql/twophase.sql
+++ b/contrib/test_decoding/sql/twophase.sql
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE test_prepared1(id integer primary key);
 CREATE TABLE test_prepared2(id integer primary key);
@@ -12,20 +12,20 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
 BEGIN;
@@ -38,7 +38,7 @@ FROM pg_locks
 WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that we decode correctly while an uncommitted prepared xact
 -- with ddl exists.
@@ -47,14 +47,14 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Check 'CLUSTER' (as operation that hold exclusive lock) doesn't block
 -- logical decoding.
@@ -70,11 +70,11 @@ WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding.
 SET statement_timeout = '180s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -86,26 +86,26 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test 8:
 -- cleanup and make sure results are also empty
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/sql/twophase_stream.sql b/contrib/test_decoding/sql/twophase_stream.sql
index e9dd44f..646076d 100644
--- a/contrib/test_decoding/sql/twophase_stream.sql
+++ b/contrib/test_decoding/sql/twophase_stream.sql
@@ -1,7 +1,7 @@
 -- Test streaming of two-phase commits
 
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE stream_test(data text);
 
@@ -18,11 +18,11 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
@@ -35,11 +35,11 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 DROP TABLE stream_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 929255e..ae5f397 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -164,7 +164,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	ListCell   *option;
 	TestDecodingData *data;
 	bool		enable_streaming = false;
-	bool		enable_twophase = false;
 
 	data = palloc0(sizeof(TestDecodingData));
 	data->context = AllocSetContextCreate(ctx->context,
@@ -265,16 +264,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
 								strVal(elem->arg), elem->defname)));
 		}
-		else if (strcmp(elem->defname, "two-phase-commit") == 0)
-		{
-			if (elem->arg == NULL)
-				continue;
-			else if (!parse_bool(strVal(elem->arg), &enable_twophase))
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
-								strVal(elem->arg), elem->defname)));
-		}
 		else
 		{
 			ereport(ERROR,
@@ -286,7 +275,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	}
 
 	ctx->streaming &= enable_streaming;
-	ctx->twophase &= enable_twophase;
 }
 
 /* cleanup this plugin's resources */
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index db29905..a68f3b8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11529,6 +11529,15 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        is <literal>-1</literal>.
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>two_phase</structfield> <type>bool</type>
+      </para>
+      <para>
+      True if two-phase commits are enabled on this slot.
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index f1f13d8..80eb96d 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -55,7 +55,7 @@
 
 <programlisting>
 postgres=# -- Create a slot named 'regression_slot' using the output plugin 'test_decoding'
-postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
     slot_name    |    lsn
 -----------------+-----------
  regression_slot | 0/16B1970
@@ -169,17 +169,18 @@ $ pg_recvlogical -d postgres --slot=test --drop-slot
   <para>
   The following example shows SQL interface that can be used to decode prepared
   transactions. Before you use two-phase commit commands, you must set
-  <varname>max_prepared_transactions</varname> to at least 1. You must also set
-  the option 'two-phase-commit' to 1 while calling
-  <function>pg_logical_slot_get_changes</function>. Note that we will stream
-  the entire transaction after the commit if it is not already decoded.
+  <varname>max_prepared_transactions</varname> to at least 1. You must also have
+  set the two-phase parameter as 'true' while creating the slot using
+  <function>pg_create_logical_replication_slot</function>
+  Note that we will stream the entire transaction after the commit if it
+  is not already decoded.
   </para>
 <programlisting>
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('5');
 postgres=*# PREPARE TRANSACTION 'test_prepared1';
 
-postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -188,7 +189,7 @@ postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# COMMIT PREPARED 'test_prepared1';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
@@ -198,7 +199,7 @@ postgres=#-- you can also rollback a prepared transaction
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('6');
 postgres=*# PREPARE TRANSACTION 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/168A180 | 530 | BEGIN 530
@@ -207,7 +208,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# ROLLBACK PREPARED 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                     data                     
 -----------+-----+----------------------------------------------
  0/168A4B8 | 530 | ROLLBACK PREPARED 'test_prepared2', txid 530
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd..fc94a73 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -894,7 +894,8 @@ CREATE VIEW pg_replication_slots AS
             L.restart_lsn,
             L.confirmed_flush_lsn,
             L.wal_status,
-            L.safe_wal_size
+            L.safe_wal_size,
+            L.two_phase
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
@@ -1318,6 +1319,7 @@ AS 'pg_create_physical_replication_slot';
 CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
     IN slot_name name, IN plugin name,
     IN temporary boolean DEFAULT false,
+    IN twophase boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)
 RETURNS RECORD
 LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 3f6d723..c2aa46d 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -431,6 +431,12 @@ CreateInitDecodingContext(const char *plugin,
 		startup_cb_wrapper(ctx, &ctx->options, true);
 	MemoryContextSwitchTo(old_context);
 
+	/*
+	 * If two_phase is set on the slot at create time, then
+	 * make sure the field in the context is also updated.
+	 */
+	ctx->twophase &= MyReplicationSlot->data.two_phase;
+
 	ctx->reorder->output_rewrites = ctx->options.receive_rewrites;
 
 	return ctx;
@@ -533,6 +539,12 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 
 	ctx->reorder->output_rewrites = ctx->options.receive_rewrites;
 
+	/*
+	 * If two_phase is set on the slot at create time, then
+	 * make sure the field in the context is also updated.
+	 */
+	ctx->twophase &= MyReplicationSlot->data.two_phase;
+
 	ereport(LOG,
 			(errmsg("starting logical decoding for slot \"%s\"",
 					NameStr(slot->data.name)),
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fb4af2e..acd14b2 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -219,7 +219,7 @@ ReplicationSlotValidateName(const char *name, int elevel)
  */
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
-					  ReplicationSlotPersistency persistency)
+					  ReplicationSlotPersistency persistency, bool two_phase)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -277,6 +277,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	namestrcpy(&slot->data.name, name);
 	slot->data.database = db_specific ? MyDatabaseId : InvalidOid;
 	slot->data.persistency = persistency;
+	slot->data.two_phase    = two_phase;
 
 	/* and then data only present in shared memory */
 	slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d24bb5b..9817b44 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -50,7 +50,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
 
 	/* acquire replication slot, this will check for conflicting names */
 	ReplicationSlotCreate(name, false,
-						  temporary ? RS_TEMPORARY : RS_PERSISTENT);
+						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
 
 	if (immediately_reserve)
 	{
@@ -124,7 +124,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  */
 static void
 create_logical_replication_slot(char *name, char *plugin,
-								bool temporary, XLogRecPtr restart_lsn,
+								bool temporary, bool two_phase,
+								XLogRecPtr restart_lsn,
 								bool find_startpoint)
 {
 	LogicalDecodingContext *ctx = NULL;
@@ -140,7 +141,7 @@ create_logical_replication_slot(char *name, char *plugin,
 	 * error as well.
 	 */
 	ReplicationSlotCreate(name, true,
-						  temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase);
 
 	/*
 	 * Create logical decoding context to find start point or, if we don't
@@ -177,6 +178,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	Name		name = PG_GETARG_NAME(0);
 	Name		plugin = PG_GETARG_NAME(1);
 	bool		temporary = PG_GETARG_BOOL(2);
+	bool		two_phase = PG_GETARG_BOOL(3);
 	Datum		result;
 	TupleDesc	tupdesc;
 	HeapTuple	tuple;
@@ -193,6 +195,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	create_logical_replication_slot(NameStr(*name),
 									NameStr(*plugin),
 									temporary,
+									two_phase,
 									InvalidXLogRecPtr,
 									true);
 
@@ -236,7 +239,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 13
+#define PG_GET_REPLICATION_SLOTS_COLS 14
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -432,6 +435,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 			values[i++] = Int64GetDatum(failLSN - currlsn);
 		}
 
+		values[i++] = BoolGetDatum(slot_contents.data.two_phase);
+
 		Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
@@ -796,6 +801,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
 										temporary,
+										false,
 										src_restart_lsn,
 										false);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index eb3f18e..23baa44 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -938,7 +938,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	if (cmd->kind == REPLICATION_KIND_PHYSICAL)
 	{
 		ReplicationSlotCreate(cmd->slotname, false,
-							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT);
+							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
+							  false);
 	}
 	else
 	{
@@ -952,7 +953,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		 * they get dropped on error as well.
 		 */
 		ReplicationSlotCreate(cmd->slotname, true,
-							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
+							  cmd->two_phase);
 	}
 
 	if (cmd->kind == REPLICATION_KIND_LOGICAL)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1487710..3d3974f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10496,16 +10496,16 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',
   prosrc => 'pg_create_logical_replication_slot' },
 { oid => '4222',
   descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a25..ebc43a0 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -56,6 +56,7 @@ typedef struct CreateReplicationSlotCmd
 	ReplicationKind kind;
 	char	   *plugin;
 	bool		temporary;
+	bool		two_phase;
 	List	   *options;
 } CreateReplicationSlotCmd;
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 5c3fde2..320b2e5 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -98,6 +98,11 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	initial_consistent_point;
 
+	/*
+	 * Is the slot two-phase enabled?
+	 */
+	bool        two_phase;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
@@ -199,7 +204,7 @@ extern void ReplicationSlotsShmemInit(void);
 
 /* management of individual slots */
 extern void ReplicationSlotCreate(const char *name, bool db_specific,
-								  ReplicationSlotPersistency p);
+								  ReplicationSlotPersistency p, bool two_phase);
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 10a1f34..b1c9b7b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1477,8 +1477,9 @@ pg_replication_slots| SELECT l.slot_name,
     l.restart_lsn,
     l.confirmed_flush_lsn,
     l.wal_status,
-    l.safe_wal_size
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size)
+    l.safe_wal_size,
+    l.two_phase
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
1.8.3.1

#67vignesh C
vignesh21@gmail.com
In reply to: Ajin Cherian (#66)
Re: repeated decoding of prepared transactions

On Tue, Mar 2, 2021 at 6:37 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Mon, Mar 1, 2021 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Few minor comments on 0002 patch
=============================
1.
ctx->streaming &= enable_streaming;
- ctx->twophase &= enable_twophase;
+
}

Spurious line addition.

Deleted.

2.
-  proallargtypes =>
'{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames =>
'{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
+  proallargtypes =>
'{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames =>
'{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,twophase}',
prosrc => 'pg_get_replication_slots' },
{ oid => '3786', descr => 'set up a logical replication slot',
proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name
name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',

I think it is better to use two_phase here and at other places as well
to be consistent with similar parameters.

Updated as requested.

3.
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -894,7 +894,8 @@ CREATE VIEW pg_replication_slots AS
L.restart_lsn,
L.confirmed_flush_lsn,
L.wal_status,
-            L.safe_wal_size
+            L.safe_wal_size,
+ L.twophase
FROM pg_get_replication_slots() AS L

Indentation issue. Here, you need to use spaces instead of tabs.

Updated.

4.
@@ -533,6 +533,12 @@ CreateDecodingContext(XLogRecPtr start_lsn,

ctx->reorder->output_rewrites = ctx->options.receive_rewrites;

+ /*
+ * If twophase is set on the slot at create time, then
+ * make sure the field in the context is also updated.
+ */
+ ctx->twophase &= MyReplicationSlot->data.twophase;
+

Why didn't you make a similar change in CreateInitDecodingContext, as I
already suggested in my previous email? Without that change, two_phase
will always be true during slot initialization even if the user passed
false. That is inconsistent, and although it causes no direct problem
today, it could become a source of problems in the future.

Updated.
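
To illustrate comment 4, here is a minimal, self-contained sketch (not PostgreSQL code) of how the `&=` mask behaves. The names `SketchSlot`, `SketchContext` and `sketch_create_context` are hypothetical stand-ins. The sketch assumes that the context's `twophase` field starts out true whenever the output plugin provides prepare callbacks, and that it is then masked with the slot's persisted `two_phase` flag, as the patch does in both context-creation functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the slot's persistent data. */
typedef struct SketchSlot
{
    bool two_phase;   /* chosen once, when the slot is created */
} SketchSlot;

/* Hypothetical stand-in for the logical decoding context. */
typedef struct SketchContext
{
    bool twophase;    /* effective setting used while decoding */
} SketchContext;

/*
 * Build a context for a slot. The flag starts from what the plugin
 * supports (assumption) and is then masked by the slot's setting,
 * mirroring "ctx->twophase &= MyReplicationSlot->data.two_phase".
 */
static SketchContext
sketch_create_context(bool plugin_has_prepare_callbacks,
                      const SketchSlot *slot)
{
    SketchContext ctx;

    ctx.twophase = plugin_has_prepare_callbacks;
    ctx.twophase &= slot->two_phase;
    return ctx;
}
```

If the mask were applied on only one of the two creation paths, a slot created with `two_phase = false` would still see `twophase == true` on the other path whenever the plugin supports prepare callbacks, which is the inconsistency the comment points out.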

I have a minor comment regarding the below:
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>two_phase</structfield> <type>bool</type>
+      </para>
+      <para>
+      True if two-phase commits are enabled on this slot.
+      </para></entry>
+     </row>

Can we change it to something like:
True if the slot is enabled for decoding prepared transaction
information. Refer to link for more information. (The link should point
to where more detailed information on two-phase is available in
pg_create_logical_replication_slot.)

Also, there is a small indentation issue in that line; I think there
should be one space before "True if....".

Regards,
Vignesh

#68Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#67)
1 attachment(s)
Re: repeated decoding of prepared transactions

On Tue, Mar 2, 2021 at 8:20 AM vignesh C <vignesh21@gmail.com> wrote:

I have a minor comment regarding the below:
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>two_phase</structfield> <type>bool</type>
+      </para>
+      <para>
+      True if two-phase commits are enabled on this slot.
+      </para></entry>
+     </row>

Can we change it to something like:
True if the slot is enabled for decoding prepared transaction
information. Refer to link for more information. (The link should point
to where more detailed information on two-phase is available in
pg_create_logical_replication_slot.)

Also, there is a small indentation issue in that line; I think there
should be one space before "True if....".

Okay, fixed these, but I added a slightly different description. I have
also added the parameter description for
pg_create_logical_replication_slot in the docs and changed the comments
at various places in the code. Apart from that, I ran pgindent. The
patch looks good to me now. Let me know what you think.
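
For readers following along, the failure mode described in the commit message of the attached patch can be sketched with a toy model. This is a simplified illustration, not the reorder buffer logic: the types `SketchRecord`, `SketchSlotPos`, `SketchOutput` and the function `sketch_get_changes` are hypothetical, LSNs are plain integers, and the model assumes that a call with the two-phase option enabled sends only the COMMIT PREPARED marker at commit time because it expects the contents to have been sent at PREPARE.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { REC_PREPARE, REC_COMMIT_PREPARED } SketchRecKind;

typedef struct { int lsn; SketchRecKind kind; } SketchRecord;

/* Position up to which the consumer has already received changes. */
typedef struct { int confirmed_lsn; } SketchSlotPos;

/* Counts of what one call handed to the consumer. */
typedef struct
{
    int contents_sent;   /* transaction contents (BEGIN + changes) */
    int prepares_sent;   /* PREPARE TRANSACTION markers */
    int commits_sent;    /* COMMIT or COMMIT PREPARED markers */
} SketchOutput;

/*
 * Toy version of a get-changes call where the two-phase option is
 * passed per call, as it was before the patch.
 */
static SketchOutput
sketch_get_changes(SketchSlotPos *slot, const SketchRecord *wal,
                   size_t nrecords, bool two_phase)
{
    SketchOutput out = {0, 0, 0};

    for (size_t i = 0; i < nrecords; i++)
    {
        if (wal[i].lsn <= slot->confirmed_lsn)
            continue;               /* already consumed */

        if (wal[i].kind == REC_PREPARE)
        {
            if (two_phase)
            {
                out.contents_sent++;
                out.prepares_sent++;
            }
            /* without two-phase, nothing is sent at PREPARE */
        }
        else
        {
            if (!two_phase)
                out.contents_sent++;    /* whole xact sent at commit */
            out.commits_sent++;
        }
        slot->confirmed_lsn = wal[i].lsn;   /* start point moves on */
    }
    return out;
}
```

In this model, a first call with the option off after only the PREPARE has been written sends nothing but still moves the start point past the PREPARE. A second call with the option on, made after COMMIT PREPARED, then sends the commit marker without the contents ever having been delivered. Fixing the option at slot creation time rules out this mixed sequence.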

--
With Regards,
Amit Kapila.

Attachments:

v9-0001-Add-option-to-enable-two_phase-commits-via-pg_cre.patch (application/octet-stream)
From b95a9bfb6c689556bbe9a1d0c2b201f82a06b91f Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Tue, 2 Mar 2021 09:07:07 +0530
Subject: [PATCH v9] Add option to enable two_phase commits via
 pg_create_logical_replication_slot.

Commit 0aa8a01d04 extends the output plugin API to allow decoding of
prepared xacts and allowed the user to enable/disable the two-phase option
via pg_logical_slot_get_changes(). This can lead to a problem such that
the first time when it gets changes via pg_logical_slot_get_changes()
without 2PC enabled it will not get the prepared even though prepare is
after consistent snapshot. Now next time during getting changes, if the
2PC option is enabled it can skip prepare because by that time start
decoding point has been moved. So the user will only get commit prepared.

Allow to enable/disable this option at the create slot time and default
will be false. It will break the existing slots which is fine in a major
release.

Author: Ajin Cherian
Reviewed-by: Amit Kapila and Vignesh C
Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com
---
 contrib/test_decoding/expected/twophase.out        | 34 +++++++++++-----------
 .../test_decoding/expected/twophase_snapshot.out   |  6 ++--
 contrib/test_decoding/expected/twophase_stream.out | 10 +++----
 contrib/test_decoding/specs/twophase_snapshot.spec |  4 +--
 contrib/test_decoding/sql/twophase.sql             | 34 +++++++++++-----------
 contrib/test_decoding/sql/twophase_stream.sql      | 10 +++----
 contrib/test_decoding/test_decoding.c              | 12 --------
 doc/src/sgml/catalogs.sgml                         | 10 +++++++
 doc/src/sgml/func.sgml                             | 10 ++++---
 doc/src/sgml/logicaldecoding.sgml                  | 19 ++++++------
 src/backend/catalog/system_views.sql               |  4 ++-
 src/backend/replication/logical/logical.c          | 12 ++++++++
 src/backend/replication/slot.c                     | 10 ++++++-
 src/backend/replication/slotfuncs.c                | 14 ++++++---
 src/backend/replication/walsender.c                |  6 ++--
 src/include/catalog/pg_proc.dat                    | 14 ++++-----
 src/include/nodes/replnodes.h                      |  1 +
 src/include/replication/slot.h                     |  7 ++++-
 src/test/regress/expected/rules.out                |  5 ++--
 19 files changed, 130 insertions(+), 92 deletions(-)

diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out
index 8a1d06d..e5e0f96 100644
--- a/contrib/test_decoding/expected/twophase.out
+++ b/contrib/test_decoding/expected/twophase.out
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -15,14 +15,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -32,7 +32,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (4 rows)
 
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#1'
@@ -42,7 +42,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -51,7 +51,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                 data                 
 -------------------------------------
  ROLLBACK PREPARED 'test_prepared#2'
@@ -74,7 +74,7 @@ WHERE locktype = 'relation'
 (2 rows)
 
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                   data                                   
 -------------------------------------------------------------------------
  BEGIN
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                         data                        
 ----------------------------------------------------
  BEGIN
@@ -98,7 +98,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 (3 rows)
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                data                
 -----------------------------------
  COMMIT PREPARED 'test_prepared#3'
@@ -107,7 +107,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                
 --------------------------------------------------------------------
  BEGIN
@@ -138,7 +138,7 @@ WHERE locktype = 'relation'
 
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding.
 SET statement_timeout = '180s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                    data                                    
 ---------------------------------------------------------------------------
  BEGIN
@@ -150,7 +150,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                  data                 
 --------------------------------------
  COMMIT PREPARED 'test_prepared_lock'
@@ -166,7 +166,7 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                             data                            
 ------------------------------------------------------------
  BEGIN
@@ -176,7 +176,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                    data                    
 -------------------------------------------
  COMMIT PREPARED 'test_prepared_savepoint'
@@ -187,14 +187,14 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
 
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
                                 data                                 
 ---------------------------------------------------------------------
  BEGIN
@@ -207,7 +207,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
  data 
 ------
 (0 rows)
diff --git a/contrib/test_decoding/expected/twophase_snapshot.out b/contrib/test_decoding/expected/twophase_snapshot.out
index 14d9387..0e8e1f5 100644
--- a/contrib/test_decoding/expected/twophase_snapshot.out
+++ b/contrib/test_decoding/expected/twophase_snapshot.out
@@ -6,7 +6,7 @@ step s2txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
 
 f              
-step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); <waiting ...>
+step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true); <waiting ...>
 step s3b: BEGIN;
 step s3txid: SELECT pg_current_xact_id() IS NULL;
 ?column?       
@@ -22,14 +22,14 @@ step s1init: <... completed>
 
 init           
 step s1insert: INSERT INTO do_write DEFAULT VALUES;
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
 table public.do_write: INSERT: id[integer]:2
 COMMIT         
 step s2cp: COMMIT PREPARED 'test1';
-step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');
+step s1start: SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');
 data           
 
 BEGIN          
diff --git a/contrib/test_decoding/expected/twophase_stream.out b/contrib/test_decoding/expected/twophase_stream.out
index d54e640..b08bb0e 100644
--- a/contrib/test_decoding/expected/twophase_stream.out
+++ b/contrib/test_decoding/expected/twophase_stream.out
@@ -1,6 +1,6 @@
 -- Test streaming of two-phase commits
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
  ?column? 
 ----------
  init
@@ -28,7 +28,7 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -59,7 +59,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
           data           
 -------------------------
  COMMIT PREPARED 'test1'
@@ -81,7 +81,7 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                            data                           
 ----------------------------------------------------------
  streaming message: transactional: 1 prefix: test, sz: 50
@@ -89,7 +89,7 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
                             data                             
 -------------------------------------------------------------
  BEGIN
diff --git a/contrib/test_decoding/specs/twophase_snapshot.spec b/contrib/test_decoding/specs/twophase_snapshot.spec
index 3e70040..e8d9567 100644
--- a/contrib/test_decoding/specs/twophase_snapshot.spec
+++ b/contrib/test_decoding/specs/twophase_snapshot.spec
@@ -15,8 +15,8 @@ teardown
 session "s1"
 setup { SET synchronous_commit=on; }
 
-step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');}
-step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit', '1');}
+step "s1init" {SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding', false, true);}
+step "s1start" {SELECT data  FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false', 'skip-empty-xacts', '1');}
 step "s1insert" { INSERT INTO do_write DEFAULT VALUES; }
 
 session "s2"
diff --git a/contrib/test_decoding/sql/twophase.sql b/contrib/test_decoding/sql/twophase.sql
index dacedfe..05f18e8 100644
--- a/contrib/test_decoding/sql/twophase.sql
+++ b/contrib/test_decoding/sql/twophase.sql
@@ -1,7 +1,7 @@
 -- Test prepared transactions. When two-phase-commit is enabled, transactions are
 -- decoded at PREPARE time rather than at COMMIT PREPARED time.
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE test_prepared1(id integer primary key);
 CREATE TABLE test_prepared2(id integer primary key);
@@ -12,20 +12,20 @@ BEGIN;
 INSERT INTO test_prepared1 VALUES (1);
 INSERT INTO test_prepared1 VALUES (2);
 -- should show nothing because the xact has not been prepared yet.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 PREPARE TRANSACTION 'test_prepared#1';
 -- should show both the above inserts and the PREPARE TRANSACTION.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared#1';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that rollback of a prepared xact is decoded.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (3);
 PREPARE TRANSACTION 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 ROLLBACK PREPARED 'test_prepared#2';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test prepare of a xact containing ddl. Leaving xact uncommitted for next test.
 BEGIN;
@@ -38,7 +38,7 @@ FROM pg_locks
 WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The insert should show the newly altered column but not the DDL.
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that we decode correctly while an uncommitted prepared xact
 -- with ddl exists.
@@ -47,14 +47,14 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two
 -- the ALTER will stop us inserting into the other one.
 --
 INSERT INTO test_prepared2 VALUES (5);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 COMMIT PREPARED 'test_prepared#3';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 -- make sure stuff still works
 INSERT INTO test_prepared1 VALUES (6);
 INSERT INTO test_prepared2 VALUES (7);
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Check 'CLUSTER' (as operation that hold exclusive lock) doesn't block
 -- logical decoding.
@@ -70,11 +70,11 @@ WHERE locktype = 'relation'
   AND relation = 'test_prepared1'::regclass;
 -- The above CLUSTER command shouldn't cause a timeout on 2pc decoding.
 SET statement_timeout = '180s';
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 RESET statement_timeout;
 COMMIT PREPARED 'test_prepared_lock';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test savepoints and sub-xacts. Creating savepoints will create
 -- sub-xacts implicitly.
@@ -86,26 +86,26 @@ INSERT INTO test_prepared_savepoint VALUES (2);
 ROLLBACK TO SAVEPOINT test_savepoint;
 PREPARE TRANSACTION 'test_prepared_savepoint';
 -- should show only 1, not 2
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_savepoint';
 -- consume the commit
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test that a GID containing "_nodecode" gets decoded at commit prepared time.
 BEGIN;
 INSERT INTO test_prepared1 VALUES (20);
 PREPARE TRANSACTION 'test_prepared_nodecode';
 -- should show nothing
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 COMMIT PREPARED 'test_prepared_nodecode';
 -- should be decoded now
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 -- Test 8:
 -- cleanup and make sure results are also empty
 DROP TABLE test_prepared1;
 DROP TABLE test_prepared2;
 -- show results. There should be nothing to show
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
 
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/sql/twophase_stream.sql b/contrib/test_decoding/sql/twophase_stream.sql
index e9dd44f..646076d 100644
--- a/contrib/test_decoding/sql/twophase_stream.sql
+++ b/contrib/test_decoding/sql/twophase_stream.sql
@@ -1,7 +1,7 @@
 -- Test streaming of two-phase commits
 
 SET synchronous_commit = on;
-SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
 
 CREATE TABLE stream_test(data text);
 
@@ -18,11 +18,11 @@ ROLLBACK TO s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1';
 -- should show the inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1';
 --should show the COMMIT PREPARED and the other changes in the transaction
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 -- streaming test with sub-transaction and PREPARE/COMMIT PREPARED but with
 -- filtered gid. gids with '_nodecode' will not be decoded at prepare time.
@@ -35,11 +35,11 @@ ROLLBACK to s1;
 INSERT INTO stream_test SELECT repeat('a', 10) || g.i FROM generate_series(1, 20) g(i);
 PREPARE TRANSACTION 'test1_nodecode';
 -- should NOT show inserts after a ROLLBACK
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 COMMIT PREPARED 'test1_nodecode';
 -- should show the inserts but not show a COMMIT PREPARED but a COMMIT
-SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
 
 DROP TABLE stream_test;
 SELECT pg_drop_replication_slot('regression_slot');
diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 929255e..ae5f397 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -164,7 +164,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	ListCell   *option;
 	TestDecodingData *data;
 	bool		enable_streaming = false;
-	bool		enable_twophase = false;
 
 	data = palloc0(sizeof(TestDecodingData));
 	data->context = AllocSetContextCreate(ctx->context,
@@ -265,16 +264,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
 								strVal(elem->arg), elem->defname)));
 		}
-		else if (strcmp(elem->defname, "two-phase-commit") == 0)
-		{
-			if (elem->arg == NULL)
-				continue;
-			else if (!parse_bool(strVal(elem->arg), &enable_twophase))
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("could not parse value \"%s\" for parameter \"%s\"",
-								strVal(elem->arg), elem->defname)));
-		}
 		else
 		{
 			ereport(ERROR,
@@ -286,7 +275,6 @@ pg_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 	}
 
 	ctx->streaming &= enable_streaming;
-	ctx->twophase &= enable_twophase;
 }
 
 /* cleanup this plugin's resources */
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index db29905..b1de6d0 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11529,6 +11529,16 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        is <literal>-1</literal>.
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>two_phase</structfield> <type>bool</type>
+      </para>
+      <para>
+       True if the slot is enabled for decoding prepared transactions.  Always
+       false for physical slots.
+      </para></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 08f0832..ad676dd 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25556,7 +25556,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
         <indexterm>
          <primary>pg_create_logical_replication_slot</primary>
         </indexterm>
-        <function>pg_create_logical_replication_slot</function> ( <parameter>slot_name</parameter> <type>name</type>, <parameter>plugin</parameter> <type>name</type> <optional>, <parameter>temporary</parameter> <type>boolean</type> </optional> )
+        <function>pg_create_logical_replication_slot</function> ( <parameter>slot_name</parameter> <type>name</type>, <parameter>plugin</parameter> <type>name</type> <optional>, <parameter>temporary</parameter> <type>boolean</type>, <parameter>two_phase</parameter> <type>boolean</type> </optional> )
         <returnvalue>record</returnvalue>
         ( <parameter>slot_name</parameter> <type>name</type>,
         <parameter>lsn</parameter> <type>pg_lsn</type> )
@@ -25568,9 +25568,11 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
         parameter, <parameter>temporary</parameter>, when set to true, specifies that
         the slot should not be permanently stored to disk and is only meant
         for use by the current session. Temporary slots are also
-        released upon any error. A call to this function has the same
-        effect as the replication protocol command
-        <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
+        released upon any error. The optional fourth parameter,
+        <parameter>two_phase</parameter>, when set to true, specifies
+        that the decoding of prepared transactions is enabled for this
+        slot. A call to this function has the same effect as the replication
+        protocol command <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>.
        </para></entry>
       </row>
 
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index f1f13d8..80eb96d 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -55,7 +55,7 @@
 
 <programlisting>
 postgres=# -- Create a slot named 'regression_slot' using the output plugin 'test_decoding'
-postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
+postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
     slot_name    |    lsn
 -----------------+-----------
  regression_slot | 0/16B1970
@@ -169,17 +169,18 @@ $ pg_recvlogical -d postgres --slot=test --drop-slot
   <para>
   The following example shows SQL interface that can be used to decode prepared
   transactions. Before you use two-phase commit commands, you must set
-  <varname>max_prepared_transactions</varname> to at least 1. You must also set
-  the option 'two-phase-commit' to 1 while calling
-  <function>pg_logical_slot_get_changes</function>. Note that we will stream
-  the entire transaction after the commit if it is not already decoded.
+  <varname>max_prepared_transactions</varname> to at least 1. You must also have
+  set the two-phase parameter to 'true' when creating the slot using
+  <function>pg_create_logical_replication_slot</function>.
+  Note that we will stream the entire transaction after the commit if it
+  is not already decoded.
   </para>
 <programlisting>
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('5');
 postgres=*# PREPARE TRANSACTION 'test_prepared1';
 
-postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/1689DC0 | 529 | BEGIN 529
@@ -188,7 +189,7 @@ postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# COMMIT PREPARED 'test_prepared1';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                    data                    
 -----------+-----+--------------------------------------------
  0/168A060 | 529 | COMMIT PREPARED 'test_prepared1', txid 529
@@ -198,7 +199,7 @@ postgres=#-- you can also rollback a prepared transaction
 postgres=# BEGIN;
 postgres=*# INSERT INTO data(data) VALUES('6');
 postgres=*# PREPARE TRANSACTION 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                          data                           
 -----------+-----+---------------------------------------------------------
  0/168A180 | 530 | BEGIN 530
@@ -207,7 +208,7 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 (3 rows)
 
 postgres=# ROLLBACK PREPARED 'test_prepared2';
-postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1');
+postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NULL);
     lsn    | xid |                     data                     
 -----------+-----+----------------------------------------------
  0/168A4B8 | 530 | ROLLBACK PREPARED 'test_prepared2', txid 530
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd..fc94a73 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -894,7 +894,8 @@ CREATE VIEW pg_replication_slots AS
             L.restart_lsn,
             L.confirmed_flush_lsn,
             L.wal_status,
-            L.safe_wal_size
+            L.safe_wal_size,
+            L.two_phase
     FROM pg_get_replication_slots() AS L
             LEFT JOIN pg_database D ON (L.datoid = D.oid);
 
@@ -1318,6 +1319,7 @@ AS 'pg_create_physical_replication_slot';
 CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot(
     IN slot_name name, IN plugin name,
     IN temporary boolean DEFAULT false,
+    IN twophase boolean DEFAULT false,
     OUT slot_name name, OUT lsn pg_lsn)
 RETURNS RECORD
 LANGUAGE INTERNAL
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 3f6d723..37b75de 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -431,6 +431,12 @@ CreateInitDecodingContext(const char *plugin,
 		startup_cb_wrapper(ctx, &ctx->options, true);
 	MemoryContextSwitchTo(old_context);
 
+	/*
+	 * We allow decoding of prepared transactions iff the two_phase option is
+	 * enabled at the time of slot creation.
+	 */
+	ctx->twophase &= MyReplicationSlot->data.two_phase;
+
 	ctx->reorder->output_rewrites = ctx->options.receive_rewrites;
 
 	return ctx;
@@ -531,6 +537,12 @@ CreateDecodingContext(XLogRecPtr start_lsn,
 		startup_cb_wrapper(ctx, &ctx->options, false);
 	MemoryContextSwitchTo(old_context);
 
+	/*
+	 * We allow decoding of prepared transactions iff the two_phase option is
+	 * enabled at the time of slot creation.
+	 */
+	ctx->twophase &= MyReplicationSlot->data.two_phase;
+
 	ctx->reorder->output_rewrites = ctx->options.receive_rewrites;
 
 	ereport(LOG,
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index fb4af2e..75a087c 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -216,10 +216,17 @@ ReplicationSlotValidateName(const char *name, int elevel)
  * name: Name of the slot
  * db_specific: logical decoding is db specific; if the slot is going to
  *	   be used for that pass true, otherwise false.
+ * two_phase: Allows decoding of prepared transactions. We allow this option
+ *     to be enabled only at slot creation time. If we allowed it to be
+ *     changed during decoding, we could skip a PREPARE the first time because
+ *     the option was not yet enabled. The next time changes are fetched, even
+ *     with two_phase enabled, the PREPARE would be skipped again because the
+ *     start decoding point has already moved past it. So the user would only
+ *     get the COMMIT PREPARED.
  */
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
-					  ReplicationSlotPersistency persistency)
+					  ReplicationSlotPersistency persistency, bool two_phase)
 {
 	ReplicationSlot *slot = NULL;
 	int			i;
@@ -277,6 +284,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	namestrcpy(&slot->data.name, name);
 	slot->data.database = db_specific ? MyDatabaseId : InvalidOid;
 	slot->data.persistency = persistency;
+	slot->data.two_phase = two_phase;
 
 	/* and then data only present in shared memory */
 	slot->just_dirtied = false;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index d24bb5b..9817b44 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -50,7 +50,7 @@ create_physical_replication_slot(char *name, bool immediately_reserve,
 
 	/* acquire replication slot, this will check for conflicting names */
 	ReplicationSlotCreate(name, false,
-						  temporary ? RS_TEMPORARY : RS_PERSISTENT);
+						  temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
 
 	if (immediately_reserve)
 	{
@@ -124,7 +124,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
  */
 static void
 create_logical_replication_slot(char *name, char *plugin,
-								bool temporary, XLogRecPtr restart_lsn,
+								bool temporary, bool two_phase,
+								XLogRecPtr restart_lsn,
 								bool find_startpoint)
 {
 	LogicalDecodingContext *ctx = NULL;
@@ -140,7 +141,7 @@ create_logical_replication_slot(char *name, char *plugin,
 	 * error as well.
 	 */
 	ReplicationSlotCreate(name, true,
-						  temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase);
 
 	/*
 	 * Create logical decoding context to find start point or, if we don't
@@ -177,6 +178,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	Name		name = PG_GETARG_NAME(0);
 	Name		plugin = PG_GETARG_NAME(1);
 	bool		temporary = PG_GETARG_BOOL(2);
+	bool		two_phase = PG_GETARG_BOOL(3);
 	Datum		result;
 	TupleDesc	tupdesc;
 	HeapTuple	tuple;
@@ -193,6 +195,7 @@ pg_create_logical_replication_slot(PG_FUNCTION_ARGS)
 	create_logical_replication_slot(NameStr(*name),
 									NameStr(*plugin),
 									temporary,
+									two_phase,
 									InvalidXLogRecPtr,
 									true);
 
@@ -236,7 +239,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS)
 Datum
 pg_get_replication_slots(PG_FUNCTION_ARGS)
 {
-#define PG_GET_REPLICATION_SLOTS_COLS 13
+#define PG_GET_REPLICATION_SLOTS_COLS 14
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
@@ -432,6 +435,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
 			values[i++] = Int64GetDatum(failLSN - currlsn);
 		}
 
+		values[i++] = BoolGetDatum(slot_contents.data.two_phase);
+
 		Assert(i == PG_GET_REPLICATION_SLOTS_COLS);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
@@ -796,6 +801,7 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		create_logical_replication_slot(NameStr(*dst_name),
 										plugin,
 										temporary,
+										false,
 										src_restart_lsn,
 										false);
 	}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index eb3f18e..23baa44 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -938,7 +938,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 	if (cmd->kind == REPLICATION_KIND_PHYSICAL)
 	{
 		ReplicationSlotCreate(cmd->slotname, false,
-							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT);
+							  cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT,
+							  false);
 	}
 	else
 	{
@@ -952,7 +953,8 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
 		 * they get dropped on error as well.
 		 */
 		ReplicationSlotCreate(cmd->slotname, true,
-							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL);
+							  cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL,
+							  cmd->two_phase);
 	}
 
 	if (cmd->kind == REPLICATION_KIND_LOGICAL)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1487710..3d3974f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10496,16 +10496,16 @@
   proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f',
   proretset => 't', provolatile => 's', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
+  proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase}',
   prosrc => 'pg_get_replication_slots' },
 { oid => '3786', descr => 'set up a logical replication slot',
   proname => 'pg_create_logical_replication_slot', provolatile => 'v',
-  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool',
-  proallargtypes => '{name,name,bool,name,pg_lsn}',
-  proargmodes => '{i,i,i,o,o}',
-  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
+  proparallel => 'u', prorettype => 'record', proargtypes => 'name name bool bool',
+  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
+  proargmodes => '{i,i,i,i,o,o}',
+  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',
   prosrc => 'pg_create_logical_replication_slot' },
 { oid => '4222',
   descr => 'copy a logical replication slot, changing temporality and plugin',
diff --git a/src/include/nodes/replnodes.h b/src/include/nodes/replnodes.h
index faa3a25..ebc43a0 100644
--- a/src/include/nodes/replnodes.h
+++ b/src/include/nodes/replnodes.h
@@ -56,6 +56,7 @@ typedef struct CreateReplicationSlotCmd
 	ReplicationKind kind;
 	char	   *plugin;
 	bool		temporary;
+	bool		two_phase;
 	List	   *options;
 } CreateReplicationSlotCmd;
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 5c3fde2..1ad5e6c 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -98,6 +98,11 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	XLogRecPtr	initial_consistent_point;
 
+	/*
+	 * Allow decoding of prepared transactions?
+	 */
+	bool		two_phase;
+
 	/* plugin name */
 	NameData	plugin;
 } ReplicationSlotPersistentData;
@@ -199,7 +204,7 @@ extern void ReplicationSlotsShmemInit(void);
 
 /* management of individual slots */
 extern void ReplicationSlotCreate(const char *name, bool db_specific,
-								  ReplicationSlotPersistency p);
+								  ReplicationSlotPersistency p, bool two_phase);
 extern void ReplicationSlotPersist(void);
 extern void ReplicationSlotDrop(const char *name, bool nowait);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 10a1f34..b1c9b7b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1477,8 +1477,9 @@ pg_replication_slots| SELECT l.slot_name,
     l.restart_lsn,
     l.confirmed_flush_lsn,
     l.wal_status,
-    l.safe_wal_size
-   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size)
+    l.safe_wal_size,
+    l.two_phase
+   FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase)
      LEFT JOIN pg_database d ON ((l.datoid = d.oid)));
 pg_roles| SELECT pg_authid.rolname,
     pg_authid.rolsuper,
-- 
1.8.3.1
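
The gating that the patch adds to CreateInitDecodingContext and
CreateDecodingContext can be sketched in isolation. This is a minimal,
self-contained illustration, not PostgreSQL source: SlotData,
DecodingContext, apply_slot_two_phase and decodes_prepared are hypothetical
stand-ins, and it assumes ctx->twophase has already been set according to
whether the output plugin supports two-phase decoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the slot's persistent data. */
typedef struct SlotData
{
	bool		two_phase;	/* fixed when the slot is created */
} SlotData;

/* Hypothetical, simplified stand-in for the logical decoding context. */
typedef struct DecodingContext
{
	bool		twophase;	/* assumed: plugin supports two-phase decoding */
} DecodingContext;

/*
 * Mirrors the patch's "ctx->twophase &= MyReplicationSlot->data.two_phase":
 * prepared transactions are decoded only if the plugin supports it AND the
 * slot was created with two_phase enabled.
 */
static void
apply_slot_two_phase(DecodingContext *ctx, const SlotData *slot)
{
	assert(ctx != NULL && slot != NULL);
	ctx->twophase &= slot->two_phase;
}

/* Convenience wrapper: returns the effective setting for one combination. */
static bool
decodes_prepared(bool plugin_supports_2pc, bool slot_two_phase)
{
	DecodingContext ctx = {.twophase = plugin_supports_2pc};
	SlotData	slot = {.two_phase = slot_two_phase};

	apply_slot_two_phase(&ctx, &slot);
	return ctx.twophase;
}
```

With these stand-ins, decodes_prepared(true, false) yields false: a plugin
that supports two-phase decoding still gets no PREPARE from a slot created
without two_phase, consistent with the patch comment that the option can
only be enabled at slot creation.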

#69Ajin Cherian
itsajin@gmail.com
In reply to: Amit Kapila (#68)
Re: repeated decoding of prepared transactions

On Tue, Mar 2, 2021 at 3:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

One minor comment:
+      </para>
+      <para>
+       True if the slot is enabled for decoding prepared transactions.  Always
+       false for physical slots.
+      </para></entry>
+     </row>

There is an extra space before "Always". When rendered as HTML this
is not visible, so it might not be a problem.

Other than that, I have no more comments on the patch. It looks good.

regards,
Ajin Cherian
Fujitsu Australia

#70Amit Kapila
amit.kapila16@gmail.com
In reply to: Ajin Cherian (#69)
Re: repeated decoding of prepared transactions

On Tue, Mar 2, 2021 at 10:38 AM Ajin Cherian <itsajin@gmail.com> wrote:

On Tue, Mar 2, 2021 at 3:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

One minor comment:
+      </para>
+      <para>
+       True if the slot is enabled for decoding prepared transactions.  Always
+       false for physical slots.
+      </para></entry>
+     </row>

There is an extra space before "Always". When rendered as HTML this
is not visible, so it might not be a problem.

I am just trying to be consistent with the nearby description. For example, see:
"The number of bytes that can be written to WAL such that this slot is
not in danger of getting in state "lost". It is NULL for lost slots,
as well as if <varname>max_slot_wal_keep_size</varname> is
<literal>-1</literal>."

In the Pg docs and code comments, you will find places where we use
a single space before a new sentence and also places where we use two
spaces. In this case, for the sake of consistency with the nearby
description, I used two spaces.

--
With Regards,
Amit Kapila.

#71vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#68)
Re: repeated decoding of prepared transactions

On Tue, Mar 2, 2021 at 9:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 2, 2021 at 8:20 AM vignesh C <vignesh21@gmail.com> wrote:

I have a minor comment regarding the below:
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>two_phase</structfield> <type>bool</type>
+      </para>
+      <para>
+      True if two-phase commits are enabled on this slot.
+      </para></entry>
+     </row>

Can we change it to something like:
True if the slot is enabled for decoding prepared transaction
information. Refer to the link for more information. (The link should
point to where more detailed information on two-phase is available in
pg_create_logical_replication_slot.)

Also, there is a small indentation issue in that line; I think there
should be one space before "True if....".

Okay, fixed these, but I added a slightly different description. I have
also added the parameter description for
pg_create_logical_replication_slot in the docs and changed the comments at
various places in the code. Apart from that, I ran pgindent. The patch
looks good to me now. Let me know what you think.

The patch applies cleanly, and make check and make check-world pass. I
did not find any other issues. The patch looks good to me.

Regards,
Vignesh

#72Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#71)
Re: repeated decoding of prepared transactions

On Tue, Mar 2, 2021 at 12:43 PM vignesh C <vignesh21@gmail.com> wrote:

The patch applies cleanly, and make check and make check-world pass. I
did not find any other issues. The patch looks good to me.

Thanks, I have pushed this patch.

--
With Regards,
Amit Kapila.